Test Coverage and Code Coverage: How to Measure Meaningfully

Roman Kirchmeier - Autemos

A team celebrates 100% code coverage — and weeks later production reports the exact critical bug nobody caught. How does that happen? Coverage measures which lines your tests execute, not whether those tests check anything meaningful. A line can run without a single assertion verifying its behavior. This is where useful measurement and dangerous self-deception part ways. This article explains the difference between code coverage, test coverage, and requirements coverage, walks through the main criteria, and answers the real question: which percentage actually makes sense for your project. For deeper foundations, see our pillar overview.
TL;DR: Code coverage shows which lines ran, not whether they were tested correctly. Google calls 60% acceptable, 75% commendable, and 90% exemplary (Google Testing Blog, 2020) but mandates no fixed number. Use coverage to find untested code, never as proof of quality. Mutation testing gives a stronger signal.

Figure 1: Executed is not verified – code coverage measures reach, not quality.
What separates test coverage, code coverage, and requirements coverage?
The ISTQB defines coverage as the percentage to which a specified coverage item is exercised by a test suite — and that item can be code, a requirement, or a risk (ISTQB Glossary, n.d.). Coverage is therefore an umbrella term. Its concrete meaning depends entirely on what you measure.
Code coverage answers a narrow question: *was this line of code executed during the test run?* The ISTQB describes it as an analysis method that determines which parts of the software the test suite executed — at the statement, decision, or condition level (ISTQB Glossary, n.d.). It says nothing about whether the result was checked.
Requirements coverage asks something entirely different: *is this requirement tested by at least one test at all?* High code coverage can coexist with patchy requirements coverage. You might run every line, yet no test verifies your most important business rule.
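Requirements coverage can be computed from a plain traceability mapping, independent of any line-level tooling. The following is a minimal sketch under stated assumptions: the requirement IDs, test names, and both helper functions are hypothetical and only illustrate the idea.

```python
# Hypothetical traceability data: requirement -> tests that claim to verify it.
TRACEABILITY = {
    "REQ-001 discount applies above 100 EUR": ["test_discount_above_threshold"],
    "REQ-002 no discount at or below 100 EUR": [],
    "REQ-003 invoice total is rounded to cents": ["test_rounding", "test_rounding_negative"],
}


def requirements_coverage(traceability: dict[str, list[str]]) -> float:
    """Share of requirements with at least one linked test, in percent."""
    if not traceability:
        return 0.0
    covered = sum(1 for tests in traceability.values() if tests)
    return 100.0 * covered / len(traceability)


def untested_requirements(traceability: dict[str, list[str]]) -> list[str]:
    """Requirements that no test claims to verify."""
    return [req for req, tests in traceability.items() if not tests]
```

In this sketch two of three requirements are linked to a test, so requirements coverage is about 67%, no matter how many lines the suite executes.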
In practice, the misunderstanding nearly always runs the same way: teams treat code coverage as a proxy for test quality. It only measures execution reach. Three projects with an identical 85% can differ widely in how well they are actually tested.
Per the ISTQB, code coverage measures which parts of the code the tests executed (ISTQB Glossary, n.d.). Requirements coverage instead checks whether every requirement is tested. The two complement each other — neither replaces the other.
For cleaner test cases, see our post on the unit test.
Which code coverage criteria exist?

Figure 2: The five coverage criteria from weak (C0) to strict (Path).
Coverage criteria differ in how deeply they penetrate the code — from simple line execution to checking every logical condition. 100% branch coverage automatically implies 100% statement coverage, but not the reverse (ISTQB Glossary, n.d.). The table below positions the five common criteria.
| Criterion (EN) | Criterion (DE) | Code | What it checks |
|---|---|---|---|
| Statement Coverage | Anweisungsüberdeckung | C0 | Every statement executed at least once |
| Branch / Decision Coverage | Zweig-/Entscheidungsüberdeckung | C1 | Every branch outcome (true/false) taken |
| Condition Coverage | Bedingungsüberdeckung | – | Each sub-condition evaluates true and false |
| MC/DC | Modifizierte Bedingungs-/Entscheidungsüberdeckung | – | Each condition independently affects the outcome (~n+1 tests) |
| Path Coverage | Pfadüberdeckung | – | Every possible execution path taken |
Statement coverage is the weakest measure. An `if (a && b)` without an else branch counts as fully covered after a single test in which the condition is true; the false outcome never has to run. Branch coverage demands that each branch outcome is taken at least once, which is noticeably stricter.
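The gap between C0 and C1 shows up in a small example. This is a sketch under stated assumptions: `shipping_fee` and its business rule are hypothetical, and Python's `and` stands in for `&&`.

```python
def shipping_fee(is_member: bool, order_total: float) -> float:
    """Hypothetical rule: members with orders of 50 or more ship free."""
    fee = 4.99
    if is_member and order_total >= 50:
        fee = 0.0
    return fee


def test_free_shipping_for_members():
    # This single test executes every statement: 100% statement coverage (C0).
    assert shipping_fee(True, 80.0) == 0.0


def test_fee_for_non_members():
    # Only this second test takes the false outcome of the `if`.
    # Branch coverage (C1) requires it; statement coverage does not.
    assert shipping_fee(False, 80.0) == 4.99
```

With only the first test, a statement-coverage report would already show 100%. In coverage.py, branch measurement is switched on with the `--branch` option of `coverage run`; other tools have their own setting.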
MC/DC deserves special attention. It requires that each individual condition can affect the overall outcome independently of the others. Verifysoft describes MC/DC as the criterion demanded for decision-heavy, safety-critical code (Verifysoft, n.d.). For n conditions, roughly n+1 tests usually suffice — efficient yet highly informative.
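The n+1 rule of thumb can be checked by hand for a small decision. The sketch below assumes a hypothetical decision `a and (b or c)` and uses the strict "unique-cause" reading of MC/DC, in which two test vectors form an independence pair if they differ in exactly one condition and produce different outcomes. The helper names are illustrative, not part of any standard tool.

```python
from itertools import combinations


def decision(a: bool, b: bool, c: bool) -> bool:
    """Hypothetical decision with n = 3 conditions."""
    return a and (b or c)


def independence_pairs(func, tests):
    """Map each condition index to the pairs of test vectors that differ only
    in that condition and flip the decision outcome."""
    n = len(tests[0])
    pairs = {i: [] for i in range(n)}
    for t1, t2 in combinations(tests, 2):
        differing = [i for i in range(n) if t1[i] != t2[i]]
        if len(differing) == 1 and func(*t1) != func(*t2):
            pairs[differing[0]].append((t1, t2))
    return pairs


def satisfies_mcdc(func, tests) -> bool:
    """True if every condition has at least one independence pair."""
    return all(independence_pairs(func, tests).values())


# n + 1 = 4 test vectors in the order (a, b, c).
MCDC_TESTS = [
    (True, True, False),   # outcome True
    (False, True, False),  # outcome False: with row 1, shows the effect of a
    (True, False, False),  # outcome False: with row 1, shows the effect of b
    (True, False, True),   # outcome True: with row 3, shows the effect of c
]

# These two vectors reach both decision outcomes, but isolate no condition.
BRANCH_ONLY_TESTS = [(True, True, True), (False, False, False)]
```

Here `satisfies_mcdc(decision, MCDC_TESTS)` holds with four tests, while `BRANCH_ONLY_TESTS` reaches both outcomes of the decision without demonstrating the independent effect of any single condition.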
100% decision coverage (C1) subsumes 100% statement coverage (C0), but not the reverse (ISTQB Glossary, n.d.). Measuring branch coverage covers statement coverage automatically — one reason teams rarely report C0 in isolation.
Why is 100% coverage no proof of quality?
100% code coverage proves only that every line ran — not that it was tested properly. Martin Fowler finds coverage useful for spotting untested code but "of little use as a numeric statement of how good your tests are" (martinfowler.com, 2012). A metric that checks no behavior tells you little about real quality.
This is a textbook case of Goodhart's law: once a measure becomes a target, it stops being a good measure. Fowler puts it plainly: make a certain coverage level a target and people will try to attain it — and "high coverage numbers are too easy to reach with low quality testing" (martinfowler.com, 2012).
We have seen test suites hit 95% without a single assertion. The tests called methods, swallowed exceptions, and never verified a result. The number gleamed on the dashboard. The protection was effectively zero.
The lesson is not to ignore coverage but to read it correctly. Low coverage is a reliable warning: this code is untested. High coverage, by contrast, is no proof — it is necessary, not sufficient. Our post on regression testing shows how this plays out in ongoing operations.
Which coverage percentage actually makes sense?

Figure 4: Coverage guideposts – 60% acceptable, 75% commendable, 90% exemplary (Google, 2020).
There is no universally correct number, but there are established guideposts. Google calls 60% acceptable, 75% commendable, and 90% exemplary — and deliberately mandates no org-wide quota because that is a "business decision" (Google Testing Blog, 2020). The threshold depends on the code's risk, not on a rigid ideal.
Martin Fowler lands in the same range. He expects well-tested code to sit "in the upper 80s or 90s" and is explicitly suspicious of anything near 100% (martinfowler.com, 2012). The jump from 90% to 100% often costs more than it returns.
A differentiated approach works best:
- Prioritize by risk: critical business logic deserves higher coverage than trivial getters.
- Trend over absolute value: falling coverage on new code is more alarming than a stable total.
- Set coverage gates moderately: prevent regressions instead of forcing arbitrary 100%.
- Check assertion quality: a line without a check is executed, not tested.
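The first three points can be combined into a small gate for a CI pipeline. This is a sketch under stated assumptions: the area names, the per-area thresholds (borrowed from the 60/75/90 guideposts above), and the `gate_violations` helper are hypothetical, and the measured percentages would come from whatever coverage tool you already use.

```python
# Hypothetical per-area minimums in percent; tune them to your own risk assessment.
THRESHOLDS = {"billing": 90.0, "api": 75.0, "dto": 60.0}
DEFAULT_THRESHOLD = 60.0


def gate_violations(measured: dict[str, float],
                    baseline: dict[str, float]) -> list[str]:
    """Return readable violations: below the risk threshold, or below the baseline."""
    problems = []
    for area, pct in sorted(measured.items()):
        minimum = THRESHOLDS.get(area, DEFAULT_THRESHOLD)
        if pct < minimum:
            problems.append(f"{area}: {pct:.1f}% is below the {minimum:.1f}% minimum")
        previous = baseline.get(area)
        if previous is not None and pct < previous:
            problems.append(f"{area}: dropped from {previous:.1f}% to {pct:.1f}%")
    return problems
```

A gate of this shape flags a regression in critical code even when the total still looks healthy, and it never demands 100% anywhere. It does not address the fourth point: assertion quality has to be checked in review or with mutation testing.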
Google states it soberly: coverage "does not guarantee the covered lines have been tested correctly, just that they have been executed" (Google Testing Blog, 2020).
Google recommends 60% as acceptable, 75% as commendable, 90% as exemplary, and deliberately avoids a fixed mandate (Google Testing Blog, 2020). The right threshold is risk-based, not universal.
Why is mutation testing a stronger signal?

Figure 3: 73% of real faults are coupled to mutants – mutation testing beats code coverage (Just et al., FSE 2014).
Mutation testing checks whether your tests would actually catch faults, a far harder bar than mere execution. A study by Just et al. found that mutant detection correlates more strongly with real-fault detection than statement or branch coverage; 73% of real faults were coupled to at least one mutant (Just et al., FSE 2014).
The idea is simple. A tool changes the code minimally — `>` becomes `>=`, `+` becomes `-`. These mutants should turn your tests red. If a mutant survives undetected, an assertion is missing. Mutation testing thus measures effectiveness, not just reach.
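A hand-written mutant makes the mechanism visible. The sketch below is not how a real mutation tool works internally; it assumes a hypothetical `exceeds_limit` function, a manually created mutant in which `>` becomes `>=`, and two illustrative test suites expressed as functions that return whether all checks pass.

```python
def exceeds_limit(amount: int) -> bool:
    """Original code under test (hypothetical)."""
    return amount > 1000


def exceeds_limit_mutant(amount: int) -> bool:
    """Hand-written mutant: `>` replaced by `>=`."""
    return amount >= 1000


def weak_suite(func) -> bool:
    """Checks only values far from the boundary."""
    return func(5000) is True and func(10) is False


def strong_suite(func) -> bool:
    """Adds the boundary value, which distinguishes `>` from `>=`."""
    return weak_suite(func) and func(1000) is False


def mutation_score(suite, original, mutants) -> float:
    """Percentage of mutants killed, i.e. mutants on which the suite fails."""
    if not suite(original):
        raise ValueError("suite must pass on the unmutated code")
    killed = sum(1 for mutant in mutants if not suite(mutant))
    return 100.0 * killed / len(mutants)
```

Both suites execute every line of `exceeds_limit`, so their code coverage is identical. The weak suite lets the mutant survive (score 0%), while the strong suite kills it (score 100%) because it checks the boundary value.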
The approach is practical. Google rolled out mutation testing to roughly 6,000 engineers inside its code review process (Google Research, 2018). Rather than mutating the whole codebase, the system surfaces surviving mutants in changed lines — exactly where reviewers can judge them.
We see mutation testing as the logical answer to the Goodhart problem. Coverage can be faked by executing code without checking it. A surviving mutant cannot be faked away — it proves that a real behavior change went unnoticed. More in our guide to AI test automation.
Where do regulated industries mandate specific coverage?
In safety-critical domains, coverage is not a recommendation but a requirement, with MC/DC as the gold standard. For the most critical aviation software, DO-178C Level A requires MC/DC (LDRA, n.d.). This is not about dashboards but about certification and human lives.
The automotive world is similar. For ASIL D, the highest safety level under ISO 26262, MC/DC is highly recommended (Parasoft, n.d.). The reason is precisely the weakness of weaker criteria: statement coverage can wave complex conditions through without ever fully checking their logic.
These requirements show that the choice of criterion depends on risk. A marketing website needs no MC/DC. A brake controller does. For most business applications, the sensible range sits between solid branch coverage and targeted mutation testing for critical paths. For a structured starting point, see our test plan post.
DO-178C Level A requires MC/DC for the most critical avionics software (LDRA, n.d.); ISO 26262 highly recommends MC/DC for the highest automotive level, ASIL D (Parasoft, n.d.). The criterion follows the risk.
Frequently asked questions about code coverage and test coverage
Is 100% code coverage worth pursuing?
Rarely. Martin Fowler is suspicious of anything near 100% and expects good code to sit in the upper 80s or 90s (martinfowler.com, 2012). The final stretch usually costs more effort than it delivers in real protection. Spend that time on better assertions instead.
What is the difference between statement and branch coverage?
Statement coverage (C0) checks whether each statement ran. Branch coverage (C1) requires every branch outcome to be taken. 100% branch coverage automatically subsumes 100% statement coverage, but not the reverse (ISTQB Glossary, n.d.). Branch is therefore the more informative default metric.
Does mutation testing replace code coverage?
No, it complements it. Coverage finds untested code; mutation testing checks whether existing tests catch faults. Mutant detection correlates more strongly with real-fault detection than coverage metrics do (Just et al., FSE 2014). Together they paint a fuller picture of test quality.
What coverage percentage should my team target?
There is no mandatory number. Google calls 60% acceptable, 75% commendable, and 90% exemplary but stresses the threshold is a business decision (Google Testing Blog, 2020). Prioritize by risk: high coverage for critical logic, less for trivial paths. More in our test plan post.
Why can high coverage be misleading?
Because execution is not verification. A test can run a line without checking its result through an assertion. Coverage "does not guarantee the covered lines have been tested correctly, just that they have been executed" (Google Testing Blog, 2020). This is exactly where false confidence and missed bugs are born.
Conclusion
Code coverage is a useful tool with a narrow brief: it finds untested code. It does not prove your tests are good. Treat it as a warning light, not a victory medal. The credible guideposts (Google calls 75% commendable, Fowler expects the upper 80s or 90s) are orientation, not an end in themselves. Anyone serious about measuring quality pairs coverage with requirements coverage and, where it counts, with mutation testing. In regulated industries, the recommendation becomes an MC/DC obligation. The common thread: measure what verifies behavior, not just what runs.
Want to make your test coverage more meaningful with AI-powered automation? Take a look at the AI Recorder or talk to us.


