Assumptions of the Test Pyramid

This may be heresy to some: while the Test Automation Pyramid may be the right model in many contexts, it will be just as wrong in other test automation contexts.

First let’s look at one of the assumptions of the Test Automation Pyramid:

[Figure: the Test Automation Pyramid. Martin Fowler, 2012]

Fowler's assumption (2012) is that UI automation is slow and expensive. Similarly, Cohn (2009) writes that testing in the UI is “brittle, expensive, and time consuming”. Recently (2019), at least two types of tools have emerged that break those assumptions and make automated GUI tests relatively faster and cheaper than before.

Example 1: Tools like Applitools Eyes let you write test automation code that compares images of the UI. Angie Jones has an excellent code example of how to compare PDF files.
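To make the idea of an image-comparing check concrete, here is a minimal sketch in plain Python. It is not the Applitools Eyes API and not how that tool works internally; the function names `mismatch_ratio` and `visual_check`, the pixel-grid representation and the threshold values are all assumptions made for illustration.

```python
from typing import List, Tuple

Pixel = Tuple[int, int, int]   # (red, green, blue), each 0-255
Image = List[List[Pixel]]      # rows of pixels (hypothetical in-memory format)


def mismatch_ratio(baseline: Image, candidate: Image, tolerance: int = 0) -> float:
    """Fraction of pixels that differ by more than `tolerance` in any colour channel."""
    same_shape = len(baseline) == len(candidate) and all(
        len(b_row) == len(c_row) for b_row, c_row in zip(baseline, candidate)
    )
    if not same_shape:
        return 1.0  # different dimensions: treat as a complete mismatch
    total = sum(len(row) for row in baseline)
    if total == 0:
        return 0.0
    differing = sum(
        1
        for b_row, c_row in zip(baseline, candidate)
        for b_px, c_px in zip(b_row, c_row)
        if any(abs(b - c) > tolerance for b, c in zip(b_px, c_px))
    )
    return differing / total


def visual_check(baseline: Image, candidate: Image,
                 max_ratio: float = 0.01, tolerance: int = 8) -> bool:
    """Pass when at most `max_ratio` of the pixels differ (assumed thresholds)."""
    return mismatch_ratio(baseline, candidate, tolerance) <= max_ratio


WHITE, RED = (255, 255, 255), (255, 0, 0)
baseline = [[WHITE] * 4 for _ in range(4)]   # stored screenshot of the UI
unchanged = [row[:] for row in baseline]     # new screenshot, identical
changed = [row[:] for row in baseline]
changed[0][0] = RED                          # one pixel out of 16 differs

print(visual_check(baseline, unchanged))     # True
print(mismatch_ratio(baseline, changed))     # 0.0625
print(visual_check(baseline, changed))       # False
```

A real tool would capture the screenshots itself and be far smarter about what counts as a meaningful difference. The point is only that such a check needs nothing but the rendered UI, not the code behind it.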

Example 2: Robotic Desktop Automation (RDA) tools make it possible to automate end-user business processes. These kinds of tools can be used to write, maintain and schedule end-user activities.

I have performed an analysis showing that using RDA for test automation has similar cost and speed to using Selenium, but then not all projects are web projects.

Still, the underlying assumption of both Applitools and the pyramid above is that the system under test consists of accessible code on the service and unit layers.

UI testing may be all there is!

With Software-as-a-Service and standard commercially packaged applications and solutions, the business still wants to test the system it is starting to use, but it has no access to the code. While the business may reasonably expect the vendor to have tested the solution, it will still want to test the package in its own setting, using its own people.

As testing professionals, we can help the business avoid requesting the kitchen sink while still testing all the things (that matter). As with all other testing, even the dreaded UAT, some of it is simple repeatable tasks (checks), while other parts are more subtle experiments (tests).

Perhaps we can estimate a ratio between the checks and the tests? Perhaps that ratio has more checks..? That would depend on what the business would like to know (what is their perception of quality) and how well the domain is codified (Genesis / Commodity).

There is a discussion and collection of alternative pyramids on “The Club”.

14 thoughts on “Assumptions of the Test Pyramid”

  1. To propose a ratio between checks and tests would be, to me, like proposing a ratio between violin bow strokes and musical performances. That is, it wouldn’t make sense. The problem is one of reification; turning something into a thing inappropriately.

    A test is not an artifact, not a widget, not a unit of production, because so many important parts of a test cannot be encoded. A test includes a bunch of tacit knowledge and motivations and activity and context that can’t be written down.

    A check can be encoded (and therefore checks can be counted in some sense; as lines in a spreadsheet, or as assertions, or as function calls). But even to do that would be a mistake, since a check also has tons of context around it. The conditions under which a check is performed can vary enormously, and the relevance of those conditions changes from one check to the next.

    If we want to talk about quality, let’s reduce our emphasis on number and increase our emphasis on story. What risks are we examining? What have we done to examine them? What experiments have we performed? What happened? What have we learned? What are we still worried about? To ask and answer those questions would represent real service to our clients.


    • Michael, the above blog post was not about the details of checking and testing. There are other posts on that topic, and similarly other posts on the topic of asking the stakeholders open questions about risk and quality.

      When I mention counting tests/checks, I do know both spaces are infinite – but I must also work with the stakeholders to figure out which ones are the ones that matter. And that will always be a fixed set. In the context of the automation, it can be relevant to consider the ratio between the tedious task and the intricate parts. Some projects will have this ratio skewed to one side, other projects skewed to the other, thus breaking the assumption of the pyramid model.

      That being said, the Test Automation Pyramid also has the underlying assumption that the performance of a test is countable on each layer. That is a reduction/abstraction(?) we have to make to fit the theory to finite projects.


  2. Hi, Jesper…

    OK; I’m confused, and it seems I need your help in understanding something. The fancy word that social scientists use for the problem I’m having is “operationalization” — the ways in which we distinguish and define and categorize things so that we can quantify them and compare them.

    When you’re talking about ratios, it seems to me you’re talking about some kind of relationship between two things. That’s presumably a quantitative relationship, and you’re saying that the relationship involves counting tests and checks. Can you help me understand how you count them or characterize them such that your client can be clear on what you mean by the ratio between one and another?

    Similarly, when you’re talking about “the ratio between the tedious task and the intricate parts”, can you help me understand how you express that ratio? In units of what compared to what?


    —Michael B.


  3. Whenever we place things on a scale or attempt to depict a ratio, we all know that we are now measuring, and we all know that measuring a thing inevitably impacts that thing. I think it’s a deeper understanding when we explain that process to another human.

    I like Michael’s deeper delve, but for anyone who has only been testing professionally for a single product, or for a single-digit number of years, awareness of keeping this balance will still be a new thing. For example, the scientific community judges its output by the number of journal-published, and thus peer-reviewed, articles, but fewer scientists will look at whether a scientist spread a finding over a series of four papers to boost that metric rather than publishing it all in one long paper. I hope most testers stop playing this game after their first job promotion. Clients (in non-regulated environments) are probably uninterested in the number of checks, or even in the triangle, which does help.

    I am more interested in the reason the triangle inverted itself. I found myself starting to do this inversion about 5 years ago, but was always unclear about what caused the inversion in the ecosystem. I just knew from my day-to-day work, and from the defect rates per day I worked, that I had to invert it to speed up regression-defect detection and get faster triage.


    • Thanks Conrad!
      Bottom line for this post: the test pyramid may be wrong. In some contexts, automating the UI is all there is.
      What to automate is another discussion – see “a ratio” link.

      All models are wrong, some are useful – in some contexts. :)

