
Smoke Testing: The Quick Check Before Every Release

Roman Kirchmeier - Autemos

A build compiles, the deployment succeeds, and the app still won't start. This is exactly where smoke testing earns its keep: a fast, broad check of core functions before any deeper test begins. For QA leaders in DACH banking, that's no minor detail. According to the 2024 DORA report, low performers fail 64% of their deployments, while elite teams fail only about 5% (Google DORA via Octopus, 2024). The gap rarely comes from code alone. It comes from gates that stop broken builds early. This article shows how a smoke test works as an automated quality gate, how it differs from sanity and regression testing, and what it must deliver in regulated environments.

In brief: A smoke test checks broadly and shallowly, right after the build, whether the system fundamentally works. As a CI/CD gate it should run in under 2 minutes and stop broken builds instantly. Low performers fail 64% of deployments (DORA, 2024); a fast gate helps cut that risk.

Infographic: What a smoke test checks – app start, login, core transaction and critical APIs as a broad, shallow check under 2 minutes following the fail-fast principle.

Figure 1: A smoke test broadly and shallowly checks a build's core functions.

What is a smoke test?

A smoke test, per ISTQB, is a test suite covering the main functionality of a system to determine whether it works properly before planned testing begins (ISTQB, 2024). The term comes from hardware testing: switch the device on and watch for smoke. Applied to software, it asks a single thing – does the system even start?

The test goes by several names. ISTQB lists confidence test, intake test, and build verification test (BVT) as synonyms. In CI/CD contexts, BVT has stuck because it describes the function precisely: it verifies that a build is even testable.

The distinction matters. A smoke test checks breadth, not depth. It answers one question: is this build stable enough to make thorough testing worthwhile? If it finds a fault, the pipeline stops – before expensive regression runs tie up resources.

What does a smoke test cover – and what not?

A smoke test checks the critical happy-path flows of a system: does the application start, does login work, does a core transaction complete, do the critical APIs respond (BrowserStack, 2024). It stays deliberately broad and shallow. Deep validation is explicitly not the goal here.

What belongs in a smoke test:

  • Application start: Does the app load without crashing?

  • Authentication: Does login work with valid credentials?

  • Core transaction: Does the key business process complete (e.g. a transfer)?

  • Critical interfaces: Do core APIs and downstream services respond?

What a smoke test deliberately skips:

  • Edge cases and rare special scenarios

  • Negative tests with invalid input

  • Performance and load tests

  • Detailed validation of individual modules

That sharp scope is the whole point. A smoke test that tries to check too much becomes slow and unreliable – losing the very quality that makes it valuable.
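
The four checks listed above can be sketched as a small, data-driven suite. Treat this as a minimal sketch under stated assumptions, not a reference implementation: the paths (`/health`, `/api/login`, `/api/transfers`, `/api/accounts`), the expected status codes, and the `send` callable are hypothetical placeholders for whatever your HTTP client and system under test actually provide.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional, Tuple


@dataclass(frozen=True)
class SmokeCheck:
    """One broad, shallow check: a single request and the status it should return."""
    name: str
    method: str
    path: str
    expected_status: int = 200


# Hypothetical endpoints, one per item in the "what belongs" list.
SMOKE_CHECKS = [
    SmokeCheck("application start", "GET", "/health"),
    SmokeCheck("authentication", "POST", "/api/login"),
    SmokeCheck("core transaction", "POST", "/api/transfers", expected_status=201),
    SmokeCheck("critical interface", "GET", "/api/accounts"),
]

# `send` performs the request and returns the HTTP status code (assumed interface).
Send = Callable[[str, str], int]


def run_smoke(checks: List[SmokeCheck], send: Send) -> List[Tuple[str, bool]]:
    """Run every check once; any exception counts as a failed check."""
    results = []
    for check in checks:
        status: Optional[int]
        try:
            status = send(check.method, check.path)
        except Exception:
            status = None
        results.append((check.name, status == check.expected_status))
    return results
```

Each check asserts only a status code. That is deliberate: anything deeper, such as validating response bodies or trying invalid input, belongs in the suites that run after the gate.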

Smoke vs. sanity vs. regression – where's the difference?

Comparison table: Smoke vs. sanity vs. regression by purpose, scope, timing and duration – smoke is broad, shallow and runs under 2 minutes.

Figure 3: Smoke, sanity and regression answer three different questions.

Smoke, sanity, and regression are often confused, yet they answer three different questions. The smoke test asks: is the build stable? The sanity test asks: does a specific change work? The regression test asks: did the change break something existing? ISTQB lists sanity partly as a synonym – but in practice most test architects keep the two deliberately separate.


| | Smoke | Sanity | Regression |
|---|---|---|---|
| Purpose | Is the build stable/testable at all? | Does a specific change/fix work? | Did a change break existing behaviour? |
| Scope | broad & shallow (whole system, critical paths) | narrow & deep (changed modules) | very broad (whole system) |
| Timing | right after build, BEFORE all tests (gate) | after fix, before regression | after changes, before release |
| Automation | mostly fully automated | often manual/partial | heavily automated |
| Duration | < 2 min (best practice) | minutes | minutes to hours |

Sources: BrowserStack (2024), CloudBees (2024), CircleCI (2024).

The practical difference: a smoke test is broad and typically fully automated. A sanity test is narrow, deep, and often manual. Treat them as the same and you lose a valuable tool. For more on distinguishing test types, see our overview of software testing types.

How does a smoke test work as a CI/CD gate?

Infographic: Smoke test as a CI/CD gate – build, deploy to staging, smoke gate under 2 minutes; green leads to deeper tests, red stops the pipeline.

Figure 2: As a CI/CD gate, the smoke test stops broken builds following the fail-fast principle.

The smoke test runs immediately after deploy to test or staging – before all deeper suites. That makes it an automated quality gate that stops broken builds early (Harness, 2024). Instead of pushing a broken build through the entire pipeline, the gate aborts at once. The principle is fail-fast.
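
The fail-fast behaviour described above can be illustrated with a toy pipeline runner. This is a sketch only: the stage names and the lambdas are hypothetical stand-ins for real pipeline steps, and a real CI system expresses the same ordering in its own configuration format.

```python
from typing import Callable, List, Tuple

Stage = Tuple[str, Callable[[], bool]]


def run_pipeline(stages: List[Stage]) -> List[str]:
    """Run stages in order and stop at the first one that fails.

    Returns the names of the stages that were actually executed.
    """
    executed = []
    for name, stage in stages:
        executed.append(name)
        if not stage():
            break  # fail fast: later, more expensive stages never start
    return executed


# Hypothetical stage order; each lambda returns True for green, False for red.
stages = [
    ("build", lambda: True),
    ("deploy-to-staging", lambda: True),
    ("smoke-gate", lambda: False),  # simulated red smoke test
    ("regression", lambda: True),
]

print(run_pipeline(stages))  # ['build', 'deploy-to-staging', 'smoke-gate']
```

The regression stage is never executed in this run, which is the resource saving the gate is there to deliver.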

In practice, two patterns have proven themselves. The post-deploy smoke checks right after each deployment whether the environment runs at all. The pre-promotion smoke sits as a gate between staging and production: only a green smoke test permits promotion. In banking especially, where a faulty prod deployment gets costly, that second stage is decisive.
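
For the pre-promotion pattern, the gate usually has to communicate its verdict to the CI system. Many CI tools treat a non-zero process exit code as a failed step, so one plausible shape is a function that turns smoke results into an exit code. The function names and the result format (check name mapped to pass/fail) are assumptions made for this sketch.

```python
import sys
from typing import Dict


def gate_exit_code(smoke_results: Dict[str, bool]) -> int:
    """Return 0 only if every smoke check passed; an empty run counts as red."""
    if not smoke_results:
        return 1
    return 0 if all(smoke_results.values()) else 1


def report(smoke_results: Dict[str, bool]) -> None:
    """Print one line per check so the CI log shows what blocked promotion."""
    for name, passed in smoke_results.items():
        print(f"{'PASS' if passed else 'FAIL'}  {name}", file=sys.stderr)
```

In a pipeline script, the last step would be something like `sys.exit(gate_exit_code(results))`. Treating an empty result set as red is a design choice: a gate that ran nothing has proven nothing, so it should not permit promotion.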

Why the 2-minute rule matters

Best practice is a smoke test that runs in under two minutes (CircleCI, 2024). The reason is both psychological and technical. If the gate takes too long, teams start bypassing it or ignoring red results. A fast, deterministic gate retains the team's trust, and that trust is the real currency.
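
One way to keep the two-minute rule from eroding over time is to make the budget part of the gate itself, so that a suite that has grown too slow turns red just like a failing check. The following is a minimal sketch; `run_with_budget` and the injectable `clock` parameter are illustrative names, not part of any tool mentioned here.

```python
import time
from typing import Callable, List, Tuple

BUDGET_SECONDS = 120.0  # the two-minute rule from the text


def run_with_budget(
    checks: List[Tuple[str, Callable[[], bool]]],
    budget: float = BUDGET_SECONDS,
    clock: Callable[[], float] = time.monotonic,
) -> Tuple[bool, float]:
    """Run checks in order; red if a check fails or the budget is exceeded.

    Returns (gate_is_green, elapsed_seconds).
    """
    start = clock()
    for _name, check in checks:
        if not check():
            return False, clock() - start
        if clock() - start > budget:
            return False, clock() - start  # too slow counts as a failure
    return True, clock() - start
```

Note the limitation: the budget is evaluated between checks, so this sketch does not interrupt a check that hangs. In practice each request would also need its own timeout.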

You implement it with Selenium, Cypress, or Playwright via tagging: a small, tagged subset of tests forms the smoke suite. We describe what a full regression run looks like afterwards in our regression testing guide.
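
The tagging idea itself is independent of the tool. The sketch below shows it with a tiny hand-rolled registry so it runs without any dependencies; the `tag` decorator, the `select` function, and the test names are hypothetical. In a Python test stack, pytest's custom markers (`@pytest.mark.smoke`, selected with `pytest -m smoke`) serve the same purpose.

```python
from typing import Callable, Dict, List, Set

# Maps test name -> set of tags; dicts preserve insertion order.
_REGISTRY: Dict[str, Set[str]] = {}


def tag(*tags: str) -> Callable:
    """Decorator that registers a test function under one or more tags."""
    def decorator(func: Callable) -> Callable:
        _REGISTRY[func.__name__] = set(tags)
        return func
    return decorator


def select(wanted: str) -> List[str]:
    """Return the names of all tests carrying the given tag, in registration order."""
    return [name for name, tags in _REGISTRY.items() if wanted in tags]


@tag("smoke", "regression")
def test_login_with_valid_credentials(): ...


@tag("regression")
def test_login_rejects_expired_password(): ...


@tag("smoke", "regression")
def test_transfer_completes(): ...


print(select("smoke"))
# ['test_login_with_valid_credentials', 'test_transfer_completes']
```

The smoke suite is a subset of the regression suite rather than a separate code base, which keeps maintenance in one place.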

Which metrics show the ROI of a smoke gate?

Bar chart: DORA Change Failure Rate 2024 – elite 5%, high 10%, medium 15%, low 64% of failed deployments.

Figure 4: DORA Change Failure Rate 2024 – low performers fail 64% of deployments.

The key metric is the DORA Change Failure Rate. In 2024 it clearly separates the performance tiers: elite teams fail 5% of deployments, high 10%, medium 15% – low performers 64% (Google DORA via Octopus, 2024). A reliable smoke gate catches a share of these failures before they reach production.
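
Change Failure Rate is simply the share of deployments that fail. As a worked example, the helper below computes it for made-up deployment counts; the numbers 200, 128, and 10 are illustrative and chosen only to reproduce the 64% and 5% tiers quoted above.

```python
def change_failure_rate(failed_deployments: int, total_deployments: int) -> float:
    """Return failed / total as a fraction between 0 and 1."""
    if total_deployments <= 0:
        raise ValueError("total_deployments must be positive")
    if not 0 <= failed_deployments <= total_deployments:
        raise ValueError("failed_deployments must be between 0 and total_deployments")
    return failed_deployments / total_deployments


# Hypothetical team with 200 deployments in the measurement period.
print(f"{change_failure_rate(128, 200):.0%}")  # 64%
print(f"{change_failure_rate(10, 200):.0%}")   # 5%
```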

The distribution of teams has also shifted. In the DORA report 2024 the high-performer cluster shrank from 31% to 22%, while the low cluster grew from 17% to 25% (DORA via getDX, 2024). Stability, then, is no given – it has to be earned deliberately.

We can't quote the exact runtime gain of an Autemos smoke gate here, because we have no reliable benchmark data. Qualitatively, though, a clear pattern holds: the earlier a defect surfaces in the pipeline, the cheaper the fix. A gate that aborts in two minutes can save the cost of a failed prod deployment.

What if the smoke test itself turns flaky?

Flaky tests are an underrated risk: in an industrial study, flakiness caused 11–27% of build failures, with further noise at 5–16% (arXiv 2504.11839, 2025). If the smoke gate of all things becomes unreliable, it undermines the very trust it should protect. Red results then get ignored – and the gate becomes worthless.

The countermeasure starts with design. A smoke test must be small, deterministic, and fast. Fewer moving parts mean a more stable result. Yet UI smoke checks often break on shifted locators when the frontend changes.
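
Determinism can be checked rather than assumed. A simple approach is to run the same check several times against the same build and see whether the outcomes agree. The sketch below assumes a check is a zero-argument callable returning pass or fail; `classify` and the default of five runs are illustrative choices.

```python
from typing import Callable


def classify(check: Callable[[], bool], runs: int = 5) -> str:
    """Run the same check repeatedly against the same build.

    Returns "pass" if every run passed, "fail" if every run failed,
    and "flaky" if the outcomes disagree.
    """
    if runs < 2:
        raise ValueError("need at least two runs to detect disagreement")
    outcomes = {bool(check()) for _ in range(runs)}
    if outcomes == {True}:
        return "pass"
    if outcomes == {False}:
        return "fail"
    return "flaky"
```

Agreement across a handful of runs does not prove a check is stable, since rare flakiness can slip through. Disagreement, however, is a clear signal that the check should be fixed or pulled from the gate before it starts eroding trust.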

This is where AI-assisted automation comes in. Self-healing locators detect changed UI elements and adapt, instead of going red (JetBrains, 2025). That noticeably lowers the maintenance burden of high-upkeep smoke suites. We explain how this works technically in self-healing locators.
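
To make the idea concrete, here is a deliberately simplified sketch of a fallback-based locator. It is not how any particular product implements self-healing; real tools draw on far richer signals. The page is modelled as a list of dictionaries standing in for DOM elements, and all names (`find_with_healing`, the attribute keys, the element values) are hypothetical.

```python
from typing import Dict, List, Optional, Tuple

Element = Dict[str, str]


def find(page: List[Element], attribute: str, value: str) -> Optional[Element]:
    """Return the first element whose attribute equals value, or None."""
    for element in page:
        if element.get(attribute) == value:
            return element
    return None


def find_with_healing(
    page: List[Element], locators: List[Tuple[str, str]]
) -> Tuple[Optional[Element], Optional[str]]:
    """Try locators in priority order.

    Returns (element, note). The note is None when the primary locator
    matched and names the fallback otherwise, so the change is reported
    instead of silently absorbed.
    """
    for index, (attribute, value) in enumerate(locators):
        element = find(page, attribute, value)
        if element is not None:
            note = None if index == 0 else f"healed via {attribute}={value!r}"
            return element, note
    return None, "no locator matched"


# The frontend renamed the button's id; its test id and label are unchanged.
page = [{"id": "btn-submit-v2", "data-testid": "transfer-submit", "text": "Send"}]
locators = [("id", "btn-submit"), ("data-testid", "transfer-submit"), ("text", "Send")]

element, note = find_with_healing(page, locators)
print(note)  # healed via data-testid='transfer-submit'
```

Returning the note matters in a regulated setting: a healed locator keeps the gate green, but the adaptation should still be visible so someone can update the primary locator.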

Why is this especially relevant in DACH banking?

In regulated banking, a smoke gate is more than a convenience: it's part of auditability. Notably, 73% of organisations still use no AI in their CI/CD workflows (JetBrains State of CI/CD, 2025). Adopting low-maintenance, AI-assisted gates early gives you an edge in reliability and provability.

Banking platforms are also fragmented. Per JetBrains, 32% of organisations use two CI/CD tools, and 9% even three or more (JetBrains, 2025). A single customer interacts via web, mobile, API, and sometimes desktop. A smoke gate that covers only one channel leaves blind spots.

The cross-channel approach solves exactly that. A unified smoke gate across all channels – web, mobile, API, desktop – checks the platform consistently. Autemos covers these stacks with a single suite and delivers reproducible gate results of the kind an audit demands. For the broader context, see our guide to AI test automation.
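
A unified gate boils down to two things: a verdict that is green only when every channel is green, and a record that can be reproduced later. The sketch below illustrates both. The channel keys, field names, and the use of a SHA-256 digest are assumptions for illustration; they are not the Autemos format, and what an audit actually requires depends on the institution.

```python
import hashlib
import json
from typing import Dict

CHANNELS = ("web", "mobile", "api", "desktop")


def cross_channel_gate(results: Dict[str, bool]) -> bool:
    """Green only if every required channel reported a passing smoke run."""
    return all(results.get(channel) is True for channel in CHANNELS)


def audit_record(build_id: str, results: Dict[str, bool]) -> Dict[str, object]:
    """Build a record with a digest over its canonical JSON form."""
    body = {
        "build_id": build_id,
        "results": {channel: results.get(channel) for channel in CHANNELS},
        "gate": "green" if cross_channel_gate(results) else "red",
    }
    # Sorted keys and fixed separators give the same bytes for the same content.
    canonical = json.dumps(body, sort_keys=True, separators=(",", ":"))
    digest = hashlib.sha256(canonical.encode("utf-8")).hexdigest()
    return {**body, "sha256": digest}
```

A channel that reports nothing is treated as red. That closes the blind spot described above: a missing mobile run cannot pass the gate just because web and API were green.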

FAQ

How long should a smoke test take?

Best practice is under two minutes (CircleCI, 2024). If the gate takes longer, teams start bypassing it or ignoring red results. A short, deterministic smoke gate keeps developer trust – and only a trusted gate fulfils its purpose in the CI/CD run.

Is a smoke test the same as a sanity test?

No, even though ISTQB lists sanity partly as a synonym. A smoke test is broad, shallow, and fully automated, checking whether the build is stable. A sanity test is narrow, deep, and often manual, checking a specific change. In practice, test architects keep the two deliberately separate.

Can a smoke test be run manually?

In principle yes, but in practice it's mostly fully automated (BrowserStack, 2024). As a CI/CD gate it must run automatically after every build. Manual smoke tests slow the pipeline and contradict the fail-fast principle. Automation here is no luxury but a prerequisite.

What happens when the smoke test fails?

The pipeline stops at once – that's the fail-fast principle (Harness, 2024). Deeper tests like regression never even start, which saves resources. The build is rejected, the team fixes the root issue, and only a green smoke test releases the pipeline again.

Why are flaky smoke tests so dangerous?

Because they destroy trust in the gate. Flakiness causes 11–27% of build failures (arXiv 2504.11839, 2025). When a gate goes red without a real fault, teams get used to ignoring it. Self-healing locators and strict, deterministic design keep the suite stable.

Conclusion

A smoke test is the cheapest insurance in your pipeline. In under two minutes it checks broadly and shallowly whether a build is even testable – and stops broken builds before they trigger expensive regression runs or a faulty prod deployment. The numbers are clear: while elite teams fail 5% of their deployments, low performers fail 64% (DORA, 2024). In DACH banking, a reliable, cross-channel gate decides auditability and release safety. The biggest danger isn't the missing test but the flaky smoke test that destroys trust. Self-healing and a unified cross-stack gate address exactly this. Want to consolidate your smoke gate across web, mobile, API, and desktop? Talk to our team.

Experience Autemos. In just 30 minutes.

See for yourself and experience how simple, flexible, and controlled modern test automation can be today.

Social Connect

© 2026 Autemos. A product of selementrix GmbH.
