Cut Test Maintenance With AI: Less Upkeep

Jun 15, 2026

8 min

Cutting Test Maintenance With AI: How to Reduce the Burden

Roman Kirchmeier - Autemos

Average test automation coverage sits at just 33 percent, and only 8 percent of organizations have a fully established automation strategy (Capgemini World Quality Report, 2025). One reason stands out: maintenance. Every UI change, every renamed button breaks tests, and someone has to fix them. That's exactly where AI comes in. Self-healing mechanisms and AI-assisted test upkeep promise to defuse automation's biggest cost driver. This article shows how heavy the maintenance load really is, why tests break, and what AI can realistically deliver.

TL;DR: Test maintenance stalls automation, and average coverage is stuck at 33% (Capgemini, 2025). AI and self-healing repair UI-driven breaks automatically and can absorb much of the UI maintenance. But individual vendor claims aren't comparable; realistically, locator self-healing addresses around 70 to 85% of UI-change failures.

Infographic on the state of test automation: 33% average coverage, 8% mature strategy, 64% legacy barrier, 57% no-strategy barrier.

Figure 1: The state of test automation in numbers – coverage, maturity and barriers.

How big is the test maintenance burden really?

The burden is big enough to stall automation outright: average coverage is just 33 percent, roughly half of organizations are still planning, and only 8 percent have a fully established strategy (Capgemini World Quality Report, 2025). Writing tests is easy. Keeping them green is the real work.

So what holds teams back? The year before, 64 percent of respondents named legacy systems as a key barrier, and 57 percent cited the lack of an automation strategy (Capgemini World Quality Report, 2024). Both hurdles share one root cause: tests that constantly break eventually stop paying off.

You may have heard that QA teams spend 50 to 80 percent of their time on maintenance. Practitioners commonly cite this estimate, but the range isn't well sourced; it mostly comes from vendor blogs. The 33 percent coverage figure is more reliable. When two-thirds of an application has no automated coverage, the capacity to close that gap is missing because maintenance eats it up.

Across our pilots with DACH teams, the recurring bottleneck wasn't creating tests but repairing them after releases, often several hours per sprint just for locator fixes.

It helps to separate two things that often get lumped together. Maintenance means repairing existing, broken tests. Upkeep means adapting tests to deliberately changed requirements. AI mainly helps with the former; the functional upkeep stays a human job. Blur the two and you'll quickly overestimate what automation alone can deliver.

Why do automated tests break so often?

Comparison table of maintenance versus upkeep: AI fixes broken locators and UI breaks, while humans handle changed requirements and test logic.

Figure 3: Maintenance vs. upkeep – what AI handles and what stays the team's job.

Automated tests break mostly because they're tied to fragile locators: when a DOM element changes, the test fails even though the feature still works. Flaky tests add to this. At Google, roughly 1.5 percent of all test runs were flaky, and about 16 percent of 4.2 million tests showed at least intermittent instability (Micco/Google, 2017).

Three causes dominate in practice:

UI and DOM changes: Renamed IDs, moved elements, or changed class names trigger most breaks.
Timing and asynchronicity: Tests don't wait long enough for loads and fail unpredictably.
Test data and environment: Shifted data states or configurations produce false alarms.

The cost of this instability is measurable. At Atlassian, flaky tests cause around 21 percent of Jira frontend build failures, and reruns in the Jira backend alone waste over 150,000 developer hours per year (Atlassian Engineering, 2025). What happens to that lost time? It's missing for new tests, so coverage stagnates.

If you want to understand why locators are the core problem, we cover it in our piece on self-healing locators and flaky tests.

How does AI reduce the maintenance effort?

Process diagram of self-healing in two steps: locator breaks, AI proposes a repair, a human confirms, and the heal is logged.

Figure 2: Self-healing as a traceable two-step flow with approval and logging.

AI reduces maintenance mainly through self-healing: when a locator breaks, the system uses multiple attributes to automatically find the right element and repairs the test at runtime. Vendors report substantial reductions, but these numbers aren't directly comparable because they rest on different tools and applications.

What can self-healing mechanisms do?

Self-healing addresses the most common break cause: UI changes. Instead of relying on a single rigid locator, the AI weighs several identifying attributes and picks the most likely target when something changes.
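
To make the idea of weighing several attributes concrete, here is a minimal Python sketch. The attribute names, the weights, and the threshold are assumptions chosen for this example; it doesn't describe how any particular vendor's engine works, and real tools draw on richer signals such as position or surrounding structure.

```python
from typing import Dict, List, Optional, Tuple

Element = Dict[str, str]

# Hypothetical weights in percentage points; they sum to 100.
WEIGHTS = {"id": 40, "text": 30, "tag": 10, "css_class": 10, "name": 10}


def score(fingerprint: Element, candidate: Element) -> int:
    """Add up the weights of all attributes that still match the stored fingerprint."""
    return sum(
        weight
        for attr, weight in WEIGHTS.items()
        if attr in fingerprint and fingerprint[attr] == candidate.get(attr)
    )


def find_replacement(
    fingerprint: Element, candidates: List[Element], threshold: int = 50
) -> Optional[Tuple[Element, int]]:
    """Return the best-scoring candidate and its score, or None if none reaches the threshold."""
    if not candidates:
        return None
    best = max(candidates, key=lambda c: score(fingerprint, c))
    best_score = score(fingerprint, best)
    if best_score < threshold:
        return None
    return best, best_score


# Fingerprint recorded while the test still passed.
fingerprint = {
    "id": "submit-btn", "text": "Submit", "tag": "button",
    "css_class": "btn primary", "name": "submit",
}

# Elements found on the page after a release renamed the button's ID.
page = [
    {"id": "cancel-btn", "text": "Cancel", "tag": "button", "css_class": "btn", "name": "cancel"},
    {"id": "send-btn", "text": "Submit", "tag": "button", "css_class": "btn primary", "name": "submit"},
]

match = find_replacement(fingerprint, page)
```

In this example the renamed button still matches on text, tag, class, and name and scores 60 of 100, while the cancel button matches only on tag and scores 10. The threshold matters: without it, the sketch would always pick some element, which is exactly how false heals happen.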

The vendor numbers warrant skepticism. Mabl cites "up to 95%" less maintenance, Functionize "85% less maintenance," Virtuoso/DXC "83%" (Functionize; Virtuoso QA). Squeezing these into a single metric would mislead, because they measure different things. A more honest, practical framing: locator self-healing typically addresses around 70 to 85 percent of UI-change failures; the rest involves data, timing, and architecture (Virtuoso QA).

Is self-healing alone enough?

No. Self-healing repairs UI breaks but not problems with test data, timing, or faulty test logic. Anyone hoping for a 100 percent solution will be disappointed. The AI takes over routine work; responsibility for correct tests stays with the team.

Transparency is decisive here. A repair nobody can trace is worthless in regulated industries. When every heal is logged and approved, you can see what changed and why. To see how traceable, non-blackbox healing works, look at our self-healing feature.

In practice, a two-step model works well: the AI proposes a repair, and a human confirms it on the next run. Effort drops noticeably without false heals slipping unnoticed into the suite. Banks and insurers in particular only accept AI maintenance once that approval step is documented.
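
The two-step model can be sketched as a small log in which every proposed heal stays pending until a person approves or rejects it. This is a minimal Python illustration under stated assumptions: the class names HealProposal and HealLog, the fields, and the status values are hypothetical and not taken from any product.

```python
from dataclasses import dataclass
from datetime import datetime, timezone
from typing import List, Optional


@dataclass
class HealProposal:
    """One proposed locator repair (hypothetical structure for illustration)."""
    test_name: str
    old_locator: str
    new_locator: str
    proposed_at: str
    status: str = "pending"  # pending -> approved | rejected
    reviewer: Optional[str] = None


class HealLog:
    """Keeps every proposal so that each repair remains traceable."""

    def __init__(self) -> None:
        self.entries: List[HealProposal] = []

    def propose(self, test_name: str, old_locator: str, new_locator: str) -> HealProposal:
        # Step 1: the tool records its suggestion; nothing is adopted yet.
        proposal = HealProposal(
            test_name, old_locator, new_locator,
            proposed_at=datetime.now(timezone.utc).isoformat(),
        )
        self.entries.append(proposal)
        return proposal

    def review(self, proposal: HealProposal, reviewer: str, approve: bool) -> None:
        # Step 2: a human decides; the decision and the reviewer are logged.
        if proposal.status != "pending":
            raise ValueError("proposal was already reviewed")
        proposal.status = "approved" if approve else "rejected"
        proposal.reviewer = reviewer

    def active_locator(self, test_name: str, original: str) -> str:
        """Return the most recently approved replacement, else the original locator."""
        for entry in reversed(self.entries):
            if (entry.test_name == test_name and entry.old_locator == original
                    and entry.status == "approved"):
                return entry.new_locator
        return original


log = HealLog()
proposal = log.propose("login_test", "#submit-btn", "#send-btn")
before = log.active_locator("login_test", "#submit-btn")  # still the original
log.review(proposal, reviewer="qa-lead", approve=True)
after = log.active_locator("login_test", "#submit-btn")   # now the approved heal
```

Rejected and pending proposals stay in the log but never replace the original locator, so an auditor can see what was suggested, who decided, and when.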

Is AI test maintenance worth it economically?

Economically, the move pays off mainly because saved maintenance time flows straight into coverage, and because the market points in a clear direction. The AI-enabled testing market is projected to reach around $1.63 billion by 2030, at roughly 18.4 percent annual growth (Grand View Research, 2024); other analysts cite figures between $1.4 and $2.04 billion.

ROI is often justified with the rule of thumb that a bug costs 100 times more in production than in design, attributed to the "IBM Systems Sciences Institute." Honesty is warranted here. This "100x" figure traces back to internal training material from 1981, not a study (The Register, 2021). So the specific number is folklore. The direction still holds: bugs found late cost more, which is supported separately by sources like NIST and Capers Jones.

That leads to the real case for shift-left. The earlier and more stably you test, the cheaper fixes become, not by a factor of 100, but substantially. AI helps twice over: it keeps early tests alive and reduces the friction that otherwise keeps teams from testing early.

Why does this matter especially in the DACH region?

Infographic on the DACH leverage: 109,000 missing IT specialists, classic tools in only 50% of projects, 10.8% dedicated test managers.

Figure 4: Why the DACH region has the biggest leverage for AI test maintenance.

In the DACH region the leverage is especially large because skilled staff is scarce. Germany recently lacked around 109,000 IT specialists, with over 137,000 open IT roles (Bitkom, 2025). Teams already short on people can't pour them into repetitive test maintenance.

There's also a structural shift in test management. Classic test management tools are now used in only about 50 percent of DACH projects, and the share of dedicated test managers has fallen to 10.8 percent, down from roughly 28 percent a decade ago (Software Testing Survey 2024 via mgm-tp, 2025).

In conversations with Swiss and German QA leads, we hear the same thing again and again: the will to automate more isn't missing; the capacity to maintain what exists is. AI closes exactly that gap, not as hype but out of necessity. For the broader picture, see our guide to AI test automation.

Test automation: the current state

Metric	Value
Average test automation coverage	33%
Organizations with a mature automation strategy	8%
Barrier: legacy systems	64%
Barrier: missing strategy	57%

Frequently Asked Questions

How much test maintenance can AI realistically save?

Realistically, locator self-healing addresses around 70 to 85 percent of UI-change failures (Virtuoso QA). Vendors advertise up to 95 percent, but those numbers aren't comparable. Data, timing, and architecture problems remain human tasks.

Why do automated tests break in the first place?

They mostly break because of UI and DOM changes that invalidate fragile locators. Flaky tests add to this: at Google, about 16 percent of 4.2 million tests showed instability (Micco/Google, 2017). Timing and test data also cause false alarms.

Is the rule that a production bug costs 100x more true?

The specific "100x" figure isn't proven; it comes from IBM training material from 1981, not a study (The Register, 2021). The direction holds, though: bugs found late cost more. Testing early pays off, just not exactly by a factor of 100.

Does self-healing make test upkeep entirely obsolete?

No. Self-healing repairs UI breaks automatically but doesn't handle faulty test logic, data problems, or architectural changes. In regulated industries, traceability is also decisive: every repair should be logged and approved, not run as a blackbox.

Conclusion

Test maintenance is automation's quiet cost factor. As long as average coverage stays stuck at 33 percent (Capgemini, 2025) and legacy systems rank as a key barrier (Capgemini, 2024), more scripting won't solve the problem. AI and self-healing target the most common break cause and give teams capacity back, realistically across 70 to 85 percent of UI breaks, not the advertised 95.

In the DACH region, where 109,000 IT specialists are missing, this isn't a gimmick but a question of feasibility. One thing stays essential: repairs must be traceable, not a blackbox promise. If you'd like to see how logged self-healing performs in your environment, book a demo.
