Test Pyramid: Layers, Anti-Patterns & Strategy

Jun 4, 2026


The Test Pyramid: How to Build a Test Strategy Right

Roman Kirchmeier - Autemos


Many teams automate at the wrong end. They build dozens of slow UI tests, skimp on fast unit tests, then wonder why their pipelines turn brittle and sluggish. The test pyramid offers the counter-model: a simple picture of how automated tests should be distributed. Many small, fast tests at the bottom, few broad tests at the top. The model is over fifteen years old and holds up despite the AI hype. This guide walks you through its origin, the three layers, the widespread ice-cream-cone anti-pattern, honest alternatives like the Testing Trophy — and what actually changes in the AI era.

TL;DR: The test pyramid organizes automated tests by granularity: many fast unit tests at the base, some integration tests in the middle, few end-to-end tests at the top. Martin Fowler calls it "a way of thinking about how different kinds of automated tests should be used" (Fowler, 2012). UI tests are expensive and brittle — hence few at the top.

Test pyramid with three layers: many unit tests at the wide base, some service/integration tests in the middle, few UI/E2E tests at the narrow tip.

Figure 1: The test pyramid – many fast tests at the bottom, few slow ones at the top.

What Is the Test Pyramid?

The test pyramid is a way of thinking about how automated tests spread across levels of granularity. Martin Fowler sums it up as "a way of thinking about how different kinds of automated tests should be used," stressing far more low-level unit tests than broad, stack-spanning tests (Fowler, 2012). The shape is the message.

The logic behind it is economics, not dogma. Tests at the base are fast, cheap, and stable — you can run thousands of them on every commit. Tests at the top check the whole system through the interface, but they run slowly and break easily. So keep them few and deliberate.
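
A back-of-the-envelope calculation makes the economics concrete. The sketch below multiplies test counts by per-test durations. The durations and counts are assumed round numbers chosen for illustration, not measurements, and `serial_runtime_seconds` is a hypothetical helper.

```python
# Illustrative per-test durations in seconds. Assumed values, not measurements.
ASSUMED_SECONDS_PER_TEST = {"unit": 0.005, "integration": 2.0, "e2e": 120.0}


def serial_runtime_seconds(counts):
    """Estimate the serial runtime per layer for the given test counts."""
    return {layer: n * ASSUMED_SECONDS_PER_TEST[layer] for layer, n in counts.items()}


# Same total number of tests, distributed in opposite ways.
bottom_heavy = {"unit": 3000, "integration": 300, "e2e": 30}
top_heavy = {"unit": 30, "integration": 300, "e2e": 3000}

for name, counts in (("bottom-heavy", bottom_heavy), ("top-heavy", top_heavy)):
    per_layer = serial_runtime_seconds(counts)
    total_minutes = sum(per_layer.values()) / 60
    print(f"{name}: about {total_minutes:.0f} min of serial runtime")
```

With these assumed numbers the bottom-heavy suite needs roughly 70 minutes of serial runtime, and the 30 E2E tests alone account for 60 of those minutes. The top-heavy suite with the same 3,330 tests needs about 100 hours. Parallel execution shrinks both figures, but not the ratio between them.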

Why use a picture at all? Because test strategy otherwise becomes a gut call. The pyramid gives teams a shared language: does this new test belong at the bottom, the middle, or the top? That single question prevents the most costly misallocations.

We cover the bigger picture in our overview of software testing fundamentals.

Where Does the Model Come From? Mike Cohn and Martin Fowler

The concept traces back to Mike Cohn, who coined it in 2009 in *Succeeding with Agile* as the "Test Automation Pyramid"; Martin Fowler then popularized and sharpened it (Fowler, 2012). Cohn's original named three levels: Unit, Service, and UI — bottom to top.

Cohn's core aim was corrective. In the late 2000s, agile teams preferred to automate through the interface because it felt intuitive. That is exactly what the pyramid warns against: chasing stability mainly through UI tests builds an expensive, fragile suite.

"Write lots of small and fast unit tests. Write some more coarse-grained tests and very few high-level tests." — Ham Vocke (Fowler, 2018)

Ham Vocke's widely cited "Practical Test Pyramid" updated the model in 2018 for microservices and modern CI/CD pipelines. The layer names have varied since — Service is now often called Integration — but the core message stayed the same: lots at the bottom, few at the top.

What Are the Three Layers of the Test Pyramid?

Comparison table of the three pyramid layers by speed and cost: Unit (milliseconds, cheap), Integration (seconds, medium), UI/E2E (minutes, expensive and brittle).

Figure 2: Three layers compared – the higher the layer, the slower and more expensive.

The classic pyramid has three layers that differ clearly in scope, speed, and stability (Fowler, 2018). Bottom to top: unit tests form the wide base, service or integration tests the middle, UI and end-to-end tests the narrow tip. The higher you go, the fewer you keep.

Layer	Scope	Speed / Typical count	Cost / Stability
Unit (base)	single function, class, method	milliseconds each, thousands per run	cheap, very stable
Service / Integration (middle)	modules, APIs, DB working together	seconds each, a few hundred per run	medium, moderately stable
UI / E2E (tip)	full stack through the interface	minutes each, a few dozen per run	expensive, brittle

Why more at the bottom? Because feedback speed drives productivity. A failing unit test pinpoints the cause in milliseconds. A failing E2E test only says "something is broken" — and then you go searching.

Each layer complements the others without replacing them. Unit tests check logic in isolation. Integration tests check that modules talk to each other. E2E tests check that critical business flows hold through the whole system — sparingly and deliberately.
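
The difference between the two lower layers is easiest to see side by side. The following is a minimal sketch using only Python's standard library. `apply_discount` and `OrderRepository` are hypothetical names invented for illustration, and an in-memory SQLite database stands in for whatever database a real system uses.

```python
import sqlite3
import unittest


def apply_discount(total_cents, percent):
    """Pure business rule: reduce a total by a whole-number percentage."""
    if not 0 <= percent <= 100:
        raise ValueError("percent must be between 0 and 100")
    return total_cents - (total_cents * percent) // 100


class OrderRepository:
    """Hypothetical persistence layer backed by a SQLite connection."""

    def __init__(self, conn):
        self.conn = conn
        self.conn.execute(
            "CREATE TABLE IF NOT EXISTS orders "
            "(id INTEGER PRIMARY KEY, total_cents INTEGER)"
        )

    def save(self, total_cents):
        cur = self.conn.execute(
            "INSERT INTO orders (total_cents) VALUES (?)", (total_cents,)
        )
        self.conn.commit()
        return cur.lastrowid

    def load_total(self, order_id):
        row = self.conn.execute(
            "SELECT total_cents FROM orders WHERE id = ?", (order_id,)
        ).fetchone()
        return row[0]


class ApplyDiscountUnitTest(unittest.TestCase):
    """Base layer: logic in isolation, no I/O."""

    def test_ten_percent_off(self):
        self.assertEqual(apply_discount(2000, 10), 1800)

    def test_rejects_invalid_percent(self):
        with self.assertRaises(ValueError):
            apply_discount(2000, 150)


class OrderRepositoryIntegrationTest(unittest.TestCase):
    """Middle layer: business rule and database working together."""

    def test_discounted_total_round_trips_through_db(self):
        conn = sqlite3.connect(":memory:")
        self.addCleanup(conn.close)
        repo = OrderRepository(conn)
        order_id = repo.save(apply_discount(2000, 10))
        self.assertEqual(repo.load_total(order_id), 1800)
```

Saved as a module, the tests can be run with `python -m unittest`. If the discount rule breaks, the unit test names the function at fault. If only the integration test fails, the cause lies somewhere between the rule, the SQL, and the schema, which is already a wider search.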

For what a good unit test looks like in practice, read our unit test guide. For the middle, we go deeper in the integration test guide; for the tip, in the end-to-end test article.

Why Are UI and E2E Tests So Scarce at the Top?

E2E tests belong at the top because, per Fowler, they are "brittle, expensive to write, and time consuming to run" (Fowler, 2012). They carry real value — they check the system from the user's view — but each one costs a lot and breaks easily. So keep them few, and keep the most important ones.

Google sharpened the argument in 2015: end-to-end tests are slow, flaky, and hard to debug, since one red run can have dozens of possible causes (Google Testing Blog, 2015). Flakiness is the trickiest problem here: tests that pass sometimes and fail sometimes erode trust in the entire gate.
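
One simple way to make flakiness visible is to look for tests that produced both a pass and a fail on the same commit. The sketch below assumes the CI system can export run history as `(commit, test_name, passed)` tuples. That format and the function name `find_flaky_tests` are assumptions for illustration, not the API of any particular tool.

```python
from collections import defaultdict


def find_flaky_tests(runs):
    """Return sorted names of tests with mixed outcomes on the same commit.

    `runs` is an iterable of (commit, test_name, passed) tuples.
    """
    outcomes = defaultdict(set)
    for commit, test_name, passed in runs:
        outcomes[(commit, test_name)].add(passed)
    # Two distinct outcomes (True and False) for identical code means flaky.
    return sorted({name for (_, name), seen in outcomes.items() if len(seen) == 2})


history = [
    ("abc123", "test_checkout_flow", True),
    ("abc123", "test_checkout_flow", False),   # same code, different result
    ("abc123", "test_price_rounding", True),
    ("def456", "test_price_rounding", False),  # different commit: a real change
    ("def456", "test_price_rounding", False),
]

print(find_flaky_tests(history))  # → ['test_checkout_flow']
```

A test that fails consistently after a code change is doing its job. Only the test that disagrees with itself on identical code is reported.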

One clarification, because it gets misquoted constantly: the popular 70/20/10 split is a starting point, not a rule. The Google post mentions it only as a first guess (70% unit, 20% integration, 10% end-to-end), adds that the exact mix will differ from team to team, and asks only that the suite "retain that pyramid shape" (Google Testing Blog, 2015). Use concrete numbers as rough orientation, not as law.

For keeping those brittle surface tests stable, see our regression testing article.

What Is the Ice-Cream-Cone Anti-Pattern?

Side-by-side of the upright test pyramid (green check, wide base, right) and the inverted ice-cream cone (red cross, top-heavy, anti-pattern).

Figure 3: Pyramid vs. ice-cream cone – the most common test-strategy anti-pattern.

The "software testing ice-cream cone" describes the inverted pyramid: wide at the top, narrow at the bottom. Alister Scott coined the term in 2012 for teams with many manual and automated UI tests but barely any unit tests (WatirMelon, 2012). The cone is the most common test-strategy failure pattern.

It usually creeps in. A team has little unit-test culture but wants to "test" — so it automates through the interface, because that is visible and intuitive. Every sprint, the UI tests grow and the base stays thin. The suite gets slower, more brittle, and more expensive.

In client projects we almost always find the cone where test automation started late and exclusively at the E2E level. The symptoms are unmistakable:

pipelines that run ten minutes or more before any feedback arrives
flaky tests that get re-run to green instead of being repaired
developers who ignore red builds because they no longer trust them
refactorings that nobody dares to start, because the safety net is missing

The way out is not a big rewrite. Stop the tip from growing, and extend the base with every new feature. The shape slowly tips back.
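
The "stop the tip, extend the base" rule can be enforced mechanically as a ratchet in the pipeline. The sketch below is one possible shape for such a check. `check_ratchet` is a hypothetical helper, the counts are invented, and the strict cap on E2E tests is an assumption that a team may want to relax, for example when one E2E test replaces another.

```python
def check_ratchet(baseline, current):
    """Compare test counts per layer and return a list of violations.

    Both arguments map a layer name ("unit", "e2e", ...) to a test count.
    An empty list means the change keeps the suite moving toward a pyramid.
    """
    violations = []
    if current["e2e"] > baseline["e2e"]:
        violations.append(
            f"E2E tests grew from {baseline['e2e']} to {current['e2e']}"
        )
    if current["unit"] < baseline["unit"]:
        violations.append(
            f"unit tests shrank from {baseline['unit']} to {current['unit']}"
        )
    return violations


baseline = {"unit": 120, "integration": 40, "e2e": 25}
after_feature = {"unit": 135, "integration": 42, "e2e": 25}
after_ui_only_change = {"unit": 120, "integration": 40, "e2e": 31}

print(check_ratchet(baseline, after_feature))         # → []
print(check_ratchet(baseline, after_ui_only_change))  # → ['E2E tests grew from 25 to 31']
```

Updating the baseline after every merge that passes the check lets the shape tip back gradually, without a rewrite.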

Pyramid, Trophy, or Test Shapes — Which Model Is Right?

There is more than one picture, and the fight over them is often about definitions. Kent C. Dodds proposed the "Testing Trophy", which has a thicker integration belly instead of a wide unit base and follows a maxim he borrowed from Guillermo Rauch: "Write tests. Not too many. Mostly integration." (Kent C. Dodds, 2021). For frontend and component work, that sounds compelling.

Martin Fowler framed the debate in 2021: the dispute between pyramid and trophy is largely a definitional one about what counts as "unit" versus "integration" (Fowler, 2021). Define "unit" narrowly and you need more integration tests; define it broadly and you land back at the pyramid.
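
The definitional point can be shown with one class tested two ways. In the sketch below, `Invoice` and `TaxCalculator` are hypothetical classes and the flat 8% rate is an assumption. The labels "solitary" and "sociable" are borrowed from the wider testing literature and do not come from the sources cited in this article.

```python
import unittest
from unittest.mock import Mock


class TaxCalculator:
    def tax_cents(self, net_cents):
        return net_cents * 8 // 100  # assumed flat 8% rate, for illustration


class Invoice:
    def __init__(self, calculator):
        self.calculator = calculator

    def gross_cents(self, net_cents):
        return net_cents + self.calculator.tax_cents(net_cents)


class SolitaryInvoiceTest(unittest.TestCase):
    """Narrow definition of "unit": the collaborator is replaced by a double."""

    def test_invoice_adds_the_tax_it_is_given(self):
        calculator = Mock(spec=TaxCalculator)
        calculator.tax_cents.return_value = 100
        self.assertEqual(Invoice(calculator).gross_cents(1000), 1100)
        calculator.tax_cents.assert_called_once_with(1000)


class SociableInvoiceTest(unittest.TestCase):
    """Broad definition of "unit": the real collaborator takes part."""

    def test_invoice_with_real_calculator(self):
        self.assertEqual(Invoice(TaxCalculator()).gross_cents(1000), 1080)
```

A team that only accepts the first style as a "unit test" would file the second under integration and end up with a trophy-like count. A team that accepts both as unit tests reports a pyramid for the very same suite.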

Fowler's actual advice is therefore shape-agnostic: write tests that run quickly and reliably and fail only for useful reasons (Fowler, 2021). That is the common denominator across all models. The shape is a means; the goal is fast, dependable feedback.

For which test types belong in which layer, see our overview of testing types.

Does the Test Pyramid Still Hold in the AI Era?

Two columns on the test pyramid in the AI era: AI changes (generate tests, suggest gaps, self-healing) versus stays the same (shape of the pyramid, lots at bottom few at top, E2E stays slow).

Figure 4: The pyramid in the AI era – AI lowers the cost, not the physics.

Yes, and a peer-reviewed paper backs that up. The "Test Pyramid 2.0" argues that AI improves *what* gets tested at each layer (generation, defect prediction) while the pyramid's *shape*, meaning the speed and volume ratio between the layers, stays the same (Frontiers in AI, 2025). In short: AI changes how tests get produced, not which layer they belong in.

This is the central, often misunderstood distinction. AI can generate unit tests, suggest gaps, and stabilize brittle surface tests via self-healing. What AI does not change: an E2E test stays slow and runs through the whole stack — whether a human or a model wrote it. Physics beats hype.

From that follows a sober expectation. AI lowers the *cost of writing* tests, not their *cost of running*. Inflate the tip with generated E2E tests and you simply build an ice-cream cone faster, not a better safety net. The sensible lever is to use AI to speed up writing tests at the base and to keep the few brittle surface tests low-maintenance.

This is exactly where Autemos comes in: self-healing locators keep the few but important UI and E2E tests at the tip stable when interfaces change — instead of letting them run red. The pyramid shape stays; the maintenance effort drops. More under test workflows.

Frequently Asked Questions

How many tests belong in each layer?

There is no universal number. The often-cited 70/20/10 rule is a rough first guess, not a binding standard: Google's post offers it as a starting point and otherwise asks only that the mix "retain that pyramid shape" (Google Testing Blog, 2015). Orient on the shape (lots of base, little tip) rather than on fixed quotas.

What is the difference between the test pyramid and the ice-cream cone?

The pyramid stands on a wide unit base with a narrow UI tip. The ice-cream cone is the exact reverse: many slow UI tests on top, barely any unit tests at the bottom (WatirMelon, 2012). The cone is an anti-pattern because it produces slow, brittle, and expensive test suites.

Is the Testing Trophy better than the test pyramid?

Not inherently — it depends on context. The trophy emphasizes integration tests and fits frontend components well (Kent C. Dodds, 2021). Fowler considers the dispute mostly definitional (Fowler, 2021). More important than the picture: fast, reliable tests that fail only for useful reasons.

Does AI make the test pyramid obsolete?

No. AI improves *what* gets tested at each layer but does not change the *shape* (Frontiers in AI, 2025). E2E tests stay slow and brittle regardless of who writes them. AI lowers writing and maintenance costs — the "lots at the bottom, few at the top" logic still holds.

Where do regression tests sit in the pyramid?

Regression testing is not its own layer but a purpose. Regression checks happen at every level — as unit, integration, or E2E tests (Fowler, 2018). In practice, most regression tests should run at the bottom so your regression suite stays fast. More in the regression testing article.

Conclusion

The test pyramid is over fifteen years old and surprisingly durable, because it describes not a tool but an economy: feedback must be fast, cheap, and reliable. Lots at the bottom, few at the top — that keeps pipelines fast and trust high. The ice-cream cone is the expensive flip side, and it almost always creeps in. Whether you say pyramid, trophy, or "test shapes" is secondary; Fowler's measure is what counts: tests that run quickly and fail only for useful reasons (Fowler, 2021). In the AI era the shape stays stable — AI lowers the cost, not the physics. Keep your tip lean and stable. Talk to our team about building your test strategy along the pyramid.


