#265 – Using MTTR to Understand When to Test

It interests me deeply to explore why testing is happening. Often it’s because some decision-maker or framework dictates – “This is the Way”. And off we go on the quest to slay the dragon – or move items from point A to point B – without much thought about how the side quests help to address the main risks of the story.

The main risks are usually around something irreplaceable – and hence we test and try our best to shield it. But not all risks are equally dangerous. In IT we can build implicit testing into repeatable deliveries and reduce the time to fix things. The faster things are fixed, the better the time to information for the business.

Grogu agrees

Everything comes with a risk, even for the best fighter in (beskar) armour. Not all risks make us prepare deliberate activities to be performed in advance of starting a fight. Not all risks make us prepare testing activities. The risks that we do prepare and test for are the ones where we feel that something may happen that will be hard to fix. If it wasn’t hard to fix, it wouldn’t be much of a risk. And if it was certain, not a feeling, it wouldn’t be a risk either: it would be an addressable fact. Risks are about relative time to fix.

High MTTR = more explicit testing

Put in engineering terms, we usually prepare explicit testing activities when Mean Time To Repair (MTTR) is relatively higher than we would like it to be. It’s all relative – like quality. Mean Time to Repair is probably relatively high for:

  • Grogu
  • Lives of friends
  • Unique spaceships
  • Space shuttles and SpaceX rockets
  • Other big things that there are only one of
  • Events that happen every once in a while
  • Live presentations
  • Quarterly earnings calls

And IT releases – when they are one of a kind – darlings of the CTO. Over the last 10 years, though, both research and plenty of practical examples have shown that there is an alternative to the “Measure Twice, Cut Once” approach of high MTTR systems and solutions. Working with a relatively lower Mean Time To Repair is more effective and more adaptable to an ever-changing landscape.

Low MTTR = less explicit testing, yet testing happens

When did you last explicitly test something that has a low MTTR? You shouldn’t have had to. If you did, there is probably some nagging feeling that sending this exact blog post out into the world will be hard to fix. Or that shipping this exact feature will do more harm than good. What you might long for are some ways to control the experiment. Set the limits of impact on time, budget, quality – and a way to repair the situation should everything explode.

  • This is why successful IT teams use sprints as timeboxed experiments in delivering customer value.
  • This is why frequent and early feedback (unit tests and continuous integration) works.
  • This is why even adding a little test automation in every sprint is better than adding none.

When there are hard things that you have to do – do them again and again. Practise. Dare to take the first step. Build pervasive activities that test and confirm that you are on course. Fight the urge to postpone – and to see only the dark side. Dare to evolve the things you work on from unique darlings and pets into established practices and commodity solutions. It’s just an IT system in the end.

IT systems are a side quest to the business – don’t lose track of the main goal. Use MTTR to understand when to add explicit testing.
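To make the rule of thumb concrete, here is a minimal sketch in Python. It assumes you keep a simple log of when each problem was detected and when it was fixed, and that the business can name a tolerance for how long a repair may take. The function names, the sample systems and the tolerances are all made up for illustration; they are not taken from any tool or framework.

```python
from datetime import datetime, timedelta
from statistics import mean


def mean_time_to_repair(incidents):
    """Average time from detection to fix over (detected, fixed) pairs."""
    seconds = [(fixed - detected).total_seconds() for detected, fixed in incidents]
    return timedelta(seconds=mean(seconds))


def needs_explicit_testing(incidents, tolerance):
    """True when repairs take longer on average than the business tolerates."""
    return mean_time_to_repair(incidents) > tolerance


# Hypothetical incident logs: (detected, fixed)
web_shop = [
    (datetime(2024, 1, 3, 9, 0), datetime(2024, 1, 3, 9, 30)),
    (datetime(2024, 1, 10, 14, 0), datetime(2024, 1, 10, 15, 30)),
]
quarterly_report = [
    (datetime(2024, 1, 2, 8, 0), datetime(2024, 1, 4, 8, 0)),
    (datetime(2024, 4, 2, 8, 0), datetime(2024, 4, 6, 8, 0)),
]

# Assumed tolerances: four hours for the shop, one day for the report.
print(needs_explicit_testing(web_shop, timedelta(hours=4)))
print(needs_explicit_testing(quarterly_report, timedelta(days=1)))
```

The web shop is repaired in an hour on average, well within its tolerance, so the implicit testing in the delivery pipeline may be enough. The quarterly report takes three days on average to repair against a one-day tolerance, so it is a candidate for explicit testing up front. The threshold is relative, just like the MTTR itself: the same repair time can be fine for one system and too slow for another.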
