·
8 min
Unit Testing: Fundamentals, Examples and Best Practices

Roman Kirchmeier - Autemos

The unit test is the foundation of any solid test strategy—and the most frequently written test type of all. According to JetBrains, 95% of developers now test their code, up from 85% the year before (JetBrains Developer Ecosystem, 2024). Yet unit tests are written every day that check nothing, mock too much, or break on every refactor. This guide shows what makes a good unit test: the FIRST principles, the AAA pattern, the disciplined use of test doubles, and the common frameworks side by side. We also explain honestly what unit tests cannot do and how reliable AI-generated tests really are today.
In short: A unit test checks an individually testable component in isolation—in ISTQB terms, a component test (ISTQB, 2024). Good unit tests follow the FIRST principles and the AAA pattern. They do not catch integration or configuration errors, though. AI helps you draft tests but does not replace the review.

Figure 1: A unit test checks one unit in isolation – no DB, service or UI.
What is a unit test?
A unit test checks an individually testable component in isolation. ISTQB calls this level "component testing" and lists unit test and module test as synonyms (ISTQB Glossary, 2024). The "unit" is the smallest meaningfully testable building block—a function, a method, or a class.
Isolation is the core idea. A unit test does not load the database, call an external service, or boot a UI. It checks only the logic of the unit itself. That is what makes it fast, deterministic, and precisely locatable: when it fails, you know immediately which building block broke.
Why this discipline? Because reliability grows from the bottom. A component test is the lowest layer of the test pyramid and gives you second-by-second feedback while you code.
Our overview of software testing fundamentals provides the bigger picture.
What makes a good unit test?
Good unit tests follow the FIRST principles: Fast, Independent, Repeatable, Self-validating, Timely. Robert C. Martin coined them in *Clean Code* (Clean Code, 2008, Chapter 9). The book is a 2008 classic—the principles still hold, even if the exact terms vary today.
Here are the five properties in brief:
Fast – tests run in milliseconds so you run them constantly.
Independent (sometimes "Isolated") – no test depends on another or on execution order.
Repeatable – same result on every machine, with no external dependency.
Self-validating (sometimes "Self-verifying") – the test decides pass or fail itself, with no manual inspection.
Timely – ideally written close to the code, and in TDD even before it.
In client projects we see the most common break at "Independent." Tests that quietly share a database state pass on their own and fail in the suite. Take isolation seriously and you skip the flaky-test hunt.
Our article on test coverage digs into how much of this you can measurably secure.
How does the AAA pattern work?

Figure 2: The AAA pattern splits every test into Arrange, Act and Assert.
The AAA pattern (Arrange-Act-Assert) splits every test into three phases and makes it readable at a glance. Bill Wake named the pattern in 2001 (XP123, 2001). The structure is now the de facto standard for readable unit tests—framework-independent.
The three phases follow a clear logic:
1. Arrange – set up preconditions: create objects, prepare inputs and test doubles.
2. Act – run the action under test once, usually a single method call.
3. Assert – check the result against the expectation.
A pytest example makes it concrete – a test for an addition function:
Arrange: create the object under test, for example calculator = Calculator().
Act: run the action once, for example result = calculator.add(2, 3).
Assert: check the result against the expectation, for example assert result == 5.
A proven rule: one logical assert per test. If a test checks five things at once, a red run will not tell you which one broke. Small, focused tests are the opposite of maintenance debt.
What are mocks, stubs and test doubles?

Figure 3: The five test doubles per Meszaros and what each one checks.
Test doubles are placeholders that replace real dependencies in a test so a unit truly runs in isolation. Gerard Meszaros systematized the taxonomy in *xUnit Test Patterns* (xUnitPatterns, 2007). The work is from 2007, but the terms remain the reference frame today.
Meszaros distinguishes five types with clearly separated jobs:
Test double | Purpose | Checks |
|---|---|---|
Dummy | only fills a parameter, never used | nothing |
Fake | lightweight working version (e.g. in-memory DB) | indirectly via state |
Stub | returns canned values | state |
Spy | a stub that records calls | state + calls (after the fact) |
Mock | has expectations, fails on an unexpected call | interaction (behavior) |
The key distinction: a stub only returns answers, while a mock also sets expectations and fails on an unexpected call (xUnitPatterns, 2007). Stubs verify state, mocks verify behavior. Our practical advice: mock only real outer boundaries, not every internal class—over-mocking makes tests brittle.
Which unit testing frameworks exist?
Every major language ships an established framework—the choice almost always follows the ecosystem, not taste. Unit testing is, per JetBrains, the most-used test type of all (JetBrains Developer Ecosystem, 2024). The tools differ in syntax and style but share the same concepts: assertions, fixtures, test runners.
Framework | Language | Style trait |
|---|---|---|
JUnit (5) | Java | annotations, de facto standard on the JVM |
pytest | Python | plain `assert`, fixtures, low boilerplate |
NUnit | C#/.NET | attribute-based, expressive constraints |
xUnit | C#/.NET | minimalist, one object per test run |
Jest | JavaScript/TS | built-in mocking, snapshots, fast |
Pick the framework of your ecosystem, not the one with the most features. A Java team takes JUnit, a Python team takes pytest—team consistency beats any single feature. The FIRST principles and the AAA pattern apply identically in all of them.
What do unit tests not cover?
Unit tests check a unit in isolation—and that is exactly why they miss everything that only emerges in interplay. They do not catch integration or contract mismatches between components, wiring and configuration errors, race conditions, performance problems, or environment-specific bugs. That is precisely why the test pyramid adds service and UI layers (Martin Fowler, n.d.).
This is not a weakness but a division of labor. A module can be perfect in isolation and still fail to talk to its neighbor—wrong data format, unexpected status code, broken contract. Such relationship errors exist only in combination.
The consequence: build on many unit tests, but do not rely on them alone. Integration tests check the interfaces; end-to-end tests check the real business flows.
Our article on the test pyramid shows how these layers fit together. And the integration testing guide covers the interface layer in depth.
How reliable are AI-generated unit tests?

Figure 4: AI-generated unit tests – only 45% pass, review stays mandatory (TU Delft, 2024).
AI writes unit tests impressively fast—but not impressively reliably. A study from TU Delft found that within an existing suite only about 45% of the Python tests generated by GitHub Copilot passed, while roughly 55% were broken, empty, or failing. Without an existing suite, about 92% failed (TU Delft, AST 2024, 2024).
The typical weaknesses are nameable. AI tests sometimes contain no assertion at all—they run code but check nothing. Others over-mock so heavily that they test only the mock setup, not the logic. Both look green and prove nothing.
Our read: AI is a good draft generator, not a replacement for review. Let it propose tests, but check every assertion and every mock by hand. The human decides what "correct" means—the AI only guesses plausibly. At higher layers the bigger AI lever is stability anyway: self-healing locators cut brittle tests where unit tests no longer reach.
Our guide to AI test automation provides the wider frame here.
Frequently asked questions
What is the difference between a unit test and an integration test?
A unit test checks a single component in isolation; an integration test checks the interplay of several components and their interfaces. Unit tests are faster and more precisely locatable but miss interface errors. Both belong in the test pyramid (Martin Fowler, n.d.). Details in the integration testing guide.
How many unit tests do I need per function?
There is no fixed number, but a rule of thumb: one test per behavior case, not per line. Cover the happy path, relevant edge cases, and error paths. One logical assert per test keeps the suite readable. How meaningful your coverage is gets covered in our test coverage article.
Stub or mock—which do I use when?
Use a stub when you only need a canned return value and want to verify state. Use a mock when you want to verify whether and how a dependency was called (xUnitPatterns, 2007). Stub verifies state, mock verifies interaction. Do not over-mock—it makes tests brittle.
Is test-driven development worth it?
A study by Nagappan et al. found 40–90% lower defect density on TDD teams, though with 15–35% longer development time (Empirical Software Engineering, 2008). The study is from 2008 and team-dependent—the direction is solid, the exact figures are not universal. TDD pays off especially for critical logic.
Can AI write my unit tests completely?
No, not without review. In a TU Delft study, about 55% of Copilot tests within an existing suite were broken or empty, and about 92% without a suite (TU Delft, AST 2024, 2024). AI delivers fast drafts, but you must check every assertion and every mock.
Conclusion
A good unit test is fast, independent, and checks exactly one thing—structured by the FIRST principles and the AAA pattern. Test doubles keep the unit isolated, as long as you mock only real outer boundaries and not every internal class. The framework follows the ecosystem, not the fashion. With 95% of developers testing, the question is no longer whether but how well (JetBrains Developer Ecosystem, 2024). Stay honest about the limits: unit tests miss integration and configuration errors, and AI-generated tests need human review. Build unit tests cleanly and complement them with higher layers, and you gain reliable feedback in seconds. Talk to our team about automating your tests reliably.


