This is a guest post by Peter G. Walen.
Reliability engineering is an interesting thought exercise for me. I think that is because people often conflate quality with testing and quality assurance with quality control. They use terms interchangeably, ignoring what they mean.
I get it. It is easy to fall into a shorthand around complex ideas and default to buzzwords. Let’s take a look at what software reliability engineering (SRE) really is and what those engineers do.
SRE focuses on the dependability (or reliability) of the system in question. For most organizations, the ideas of reliability and availability are nearly synonymous. For people using the system, it is reliable if it is available. As reliability is often defined (for the mathematically inclined) as Reliability = 1 – Probability of failure, we can begin to see a relationship with concepts in software testing.
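For the mathematically inclined, here is a minimal Python sketch of that definition. It assumes the failure probability is estimated from observed failures over a number of attempts, and it treats availability as the fraction of time the system was up. The function names and the sample numbers are illustrative assumptions, not taken from any particular tool or from a real system.

```python
def reliability(failures: int, attempts: int) -> float:
    """Estimate reliability as 1 minus the observed failure rate.

    Assumption: failures / attempts is an acceptable stand-in for
    the true probability of failure.
    """
    if attempts <= 0:
        raise ValueError("attempts must be positive")
    if not 0 <= failures <= attempts:
        raise ValueError("failures must be between 0 and attempts")
    return 1.0 - failures / attempts


def availability(uptime_hours: float, downtime_hours: float) -> float:
    """Fraction of the observed period during which the system was up."""
    if uptime_hours < 0 or downtime_hours < 0:
        raise ValueError("hours must not be negative")
    total = uptime_hours + downtime_hours
    if total <= 0:
        raise ValueError("total observed time must be positive")
    return uptime_hours / total


if __name__ == "__main__":
    # Hypothetical numbers, for illustration only.
    print(f"reliability:  {reliability(failures=5, attempts=1000):.3f}")
    print(f"availability: {availability(uptime_hours=99, downtime_hours=1):.3f}")
```

Both numbers are only as good as the observations behind them, which is where the "What happens if…?" questions come in.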
Both SRE and testing are looking at risks. Both are looking at areas of vulnerability and how we might model these to make them better understood so we can, in turn, mitigate them. Both have questions we need to consider, first and foremost being, “What happens if…?”
Can we see SRE, then, as a replacement for or an extension of software testing? Or something else?
Early in my career in software, I bought the book “Software Reliability” and studied it diligently. It had loads of heavy mathematical equations and lots of careful analysis around every aspect imaginable for how software might go wrong. It included models for predicting failure based on known behavior and limitations of the software in question. It was thick and heavy, both in weight and in gravity.
In my first flush of eager confidence, I read this book, and when it was finished, I thought on it. The thinking took maybe five minutes. I decided it was absolute rubbish. It sat at the bottom of my bookshelf for some time, but still, I hung onto it.
A few years ago I was talking with an experienced mechanical engineer, the kind who designs equipment, tools, and physical things. In that conversation I realized that my younger self was the one full of rubbish, not the author of the book.
I thought about what the author was trying to tell me. He warned again and again about limitations due to an imperfect understanding of the system in question. He laid out how the equations and models used to predict how the system would behave in different circumstances could be turned topsy-turvy by the probable unknowns within the system.
This sent me down another thought path, and back to the question of whether SRE is the new testing.
I recognize that the scope, nature, and approaches for measuring and evaluating software system stability, availability, and reliability have changed over the last 15 years, since I first began looking at reliability engineering. I also acknowledge that practices will vary between organizations.
Still, the question of evaluating software for stability, dependability, resilience, and availability is not that dissimilar from evaluating software behavior for broader usage and usability aspects. My working definition of software testing, for some time, has been “a systematic evaluation of the behavior of a piece of software, based on some model.”
With that definition of software testing and the general, albeit broad, definition of reliability engineering, these concepts are closely aligned. They are not identical, nor do they examine the same questions or use the same models to evaluate software.
But they do have the shared goal of understanding how the software works currently and determining whether this is acceptable.
Given that, then yes: software reliability engineering is software testing. However, it is not new. It has always been part of testing, even if SREs and general testers have not recognized it until now.
Peter G. Walen has over 25 years of experience in software development, testing, and agile practices. He works hard to help teams understand how their software works and interacts with other software and the people using it. He is a member of the Agile Alliance, the Scrum Alliance, and the American Society for Quality (ASQ), an active participant in software meetups, and a frequent conference speaker.