Consider “Reasonable” UI Test Automation


The stories I hear about User Interface (UI) test automation are often fantastic or tragic. Some of those fantastic projects have thousands of tests running on parallel virtual machines that start with every new code check-in. These stories usually involve sophisticated tools to discover whether a test is flaky or whether there is really a bug in the product. I have also heard stories of UI test automation projects gone wrong, such as the horrors of building a framework that has more bugs than it finds. People also talk about constant battles with management, where conflict arises over the return on investment of automating a user interface. Developers are sometimes even asked to hide problems to make reports to management look good, and to “just do what they are told”.

Some of these success stories are so fantastic that they are hard to believe, while the tragic stories are full of stress and disappointment. Tales of UI test automation tend to feature either an unbelievably brilliant unicorn or an utter disaster. However, I’d like to tell you a different story based on my experience, about reasonable UI automation that is neither a technical wonderland nor a complete shambles. This story describes a context where UI test automation can succeed without the intense hype or tragedy, and shows the value that it brings.

The Context


Our story starts with a software product that has been around for years and is currently in maintenance mode. Over time, the development team produced a main development branch, a few code branches for specific customers, and a few customer-specific configuration branches. We tested three or four versions of the software for each release, and discovered problems whilst integrating features into the different customer branches. Little did we know that we were in for a few surprises we could never have imagined.

It is fairly common practice today to perform some type of testing close to the code, normally unit testing, in addition to more customer-focused testing in the user interface. This product is built on a technology stack that is not very popular anymore: SQL stored procedures, a pinch of CSS, and a dash of JavaScript. Unit testing SQL stored procedures can still be done, but it is difficult, time-consuming, and requires changes to the code to the point that the tests become ineffective. The programming language and design were not built with testability in mind, but the software has paying customers, so a rewrite is not in the cards any time soon. As a result, we had some unit tests, but not as many as we needed, and not at the level of effectiveness required.

Modern software shops usually aim to deliver new products to customers every couple of weeks. This strategy makes it easier to manage risk, easier to plan a release, and generally improves quality of life for the development team. Our release schedule is different: we release a new version of the product to our customers approximately once every three months, and sometimes less often. Rather than frequent releases with small changes, we have a longer release cycle with a lot of changes, along with the risk and uncertainty that go with it.

The user interface of this product is relatively mature. It rarely changes, and when it does, it is not by much. Of course, there are occasional updates to add new features, and to refactor old features that are suffering from performance problems. The direction of development involves chasing what the customer needs, not the hottest JavaScript library or layout style. My automation project is not at risk of massive failure and refactor because someone decided to change from AngularJS to React, or a customer decided that a new feature is not what they imagined.

The Work


My daily work as an automation person can be broken into three broad categories:

  • Building tests
  • Refactoring
  • Bug hunts and investigation

As of today, I have a suite of about 150 tests that run nightly on three different environments: one Microsoft SQL Server release branch, one Oracle release branch, and one Oracle environment that we are preparing to release. Each test run takes about 2.5 hours to complete.

First thing in the morning I do two things: I check the email containing results from each run, and I read the application logs looking for anything interesting to investigate. The nightly test results give a list of everything that failed overnight. This includes some combination of software bugs and tests that failed either because of a problem with the test itself or because the product changed.

For me, bug hunts start with the application logs. The logs help me to associate a software problem with a failed test using the timestamp. When there is something interesting in the application logs, I’ll copy the timestamp into a text file and find the tests that were running around that time. I will start the test and observe the flow while watching the logs to discover exactly where the problem is. This usually gets me close to isolating the problem. After watching the test and looking for obvious failures, I’ll perform the test myself. This helps me observe the product, explore, and find exactly what is triggering the exception.
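The timestamp matching described above can also be scripted. The following is only a rough sketch, not the tooling used on this project: it assumes the test runner records a start and finish time for each test, and the names `RunRecord`, `find_tests_running_at`, the 30-second slack, and the sample data are all hypothetical.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass
class RunRecord:
    """One test execution from a nightly run (hypothetical structure)."""
    name: str
    started: datetime
    finished: datetime


def find_tests_running_at(runs, log_time, slack=timedelta(seconds=30)):
    """Return names of tests whose run window, padded by slack, contains log_time.

    The slack allows for small clock differences between the test
    machine and the application server.
    """
    return [
        run.name
        for run in runs
        if run.started - slack <= log_time <= run.finished + slack
    ]


# Made-up sample data standing in for a nightly run report.
runs = [
    RunRecord("test_create_order",
              datetime(2024, 1, 15, 2, 10, 0), datetime(2024, 1, 15, 2, 11, 30)),
    RunRecord("test_lookup_customer",
              datetime(2024, 1, 15, 2, 11, 40), datetime(2024, 1, 15, 2, 13, 5)),
    RunRecord("test_edit_invoice",
              datetime(2024, 1, 15, 2, 13, 10), datetime(2024, 1, 15, 2, 14, 0)),
]

# Timestamp copied from an interesting line in the application log.
error_time = datetime.strptime("2024-01-15 02:12:20", "%Y-%m-%d %H:%M:%S")

candidates = find_tests_running_at(runs, error_time)
print(candidates)
```

A list like this only narrows the search. Watching the candidate test run and then performing it by hand is still what isolates the problem.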

Running through the rest of the failed tests leads me to refactoring. The tests normally fail because of a software change or a problem in the test itself. Last week, the development team made a change to our datepicker control. It was time for a new one; the old datepicker was slow and unattractive. The new datepicker is opened by clicking on an icon rather than setting focus on a date field, and it also has different IDs in the DOM. I spent three quarters of a day updating test methods that used the datepicker to get them working again. Other tests fail intermittently, usually because of timing issues: the test code tries to manipulate the web browser before the DOM is fully rendered.

Refactoring is a regular activity: updating tests to make them stable again, and removing old tests that are no longer meaningful. This protects the test suite from decay and obsolescence. My test suite is small by some standards, at approximately 150 tests, and that small number is what keeps it manageable. I have enough time in a day to assess failures and perform some refactoring for stability and performance. This isn’t possible when test suites are allowed to grow uncontrollably.
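The intermittent timing failures mentioned above are commonly handled by having the test wait for a condition instead of acting immediately. Browser automation libraries usually ship their own explicit waits (Selenium's `WebDriverWait`, for example). The sketch below shows the general idea in plain Python and is not the code from this suite; `wait_until`, `FakePage`, and `find_datepicker_icon` are hypothetical names, and the fake page merely stands in for a browser that needs a few polls before the element appears.

```python
import time


def wait_until(condition, timeout=10.0, poll_interval=0.25):
    """Poll condition() until it returns a truthy value, then return that value.

    Raises TimeoutError if the condition is still falsy once `timeout`
    seconds have passed.
    """
    deadline = time.monotonic() + timeout
    while True:
        result = condition()
        if result:
            return result
        if time.monotonic() >= deadline:
            raise TimeoutError(f"condition not met within {timeout} seconds")
        time.sleep(poll_interval)


class FakePage:
    """Stand-in for a page whose datepicker icon appears after a few polls."""

    def __init__(self, ready_after_polls):
        self._polls_left = ready_after_polls

    def find_datepicker_icon(self):
        # Returns None until the "page" has finished rendering.
        self._polls_left -= 1
        return "datepicker-icon" if self._polls_left <= 0 else None


page = FakePage(ready_after_polls=3)

# The test proceeds only once the element can actually be found.
icon = wait_until(page.find_datepicker_icon, timeout=2.0, poll_interval=0.01)
print(icon)
```

Waiting on a condition, rather than sleeping for a fixed time, keeps the test fast when the page renders quickly and still fails clearly when the element never shows up.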

Adding new tests is done occasionally and with careful consideration. Typically, the development team and I meet to discuss missing coverage. We try to design a test that will discover important problems in the product, while at the same time being as simple as possible. After building the test, I review it again with the development team to make sure it isn’t missing anything important. I then check it into our source code repository and add it to the nightly test suites. There is a reasonable argument here that we are building simple, shallow tests because that’s what automation in the user interface affords. I combat the shallowness by exploring, layering coverage, and being alert to the different signals bugs can send (application log errors and intermittent test failures, for example).

The Value


Last month the development team made a change to a data lookup tool. This tool is used in many different places throughout the product, particularly in modal dialogs. The email from my nightly test run showed a high number of failures the morning after this change was introduced. Changing that lookup spoiled the layout of the modal dialogs where it was used. My automated tests failed every time they tried to find the field and enter a text string.

In legacy products, it is common for new software changes to create new and surprising problems. That creates a new kind of work, or I would say re-work, that only exists because the change failed in the first place. In our Lean Software Testing course, we call this “Failure Demand”: demand that only happens because of a failure. This UI automation suite helps us discover those problems much faster, and a problem found the day after it was introduced is easier to track down and correct. The suite performs a nightly reconnaissance mission over a significant portion of the product. In the morning, I review the findings and let people know about any important problems.

What the suite offers is consistency and quick coverage. This is, of course, different from what a person investigating software can provide, but it is certainly no less valuable.

Reasonable Automation


Automating the User Interface doesn’t make sense in a lot of contexts. This strategy often introduces considerable costs to development organizations in exchange for questionable value. Reasonable automation isn’t always flashy, and doesn’t always use the latest tool set or technology. What it does provide is consistent value to the development team in the form of confidence and early, accurate bug reports.

This is a guest posting by Justin Rohrman. Justin has been a professional software tester in various capacities since 2005. In his current role, Justin is a consulting software tester and writer working with Excelon Development. Outside of work, he is currently serving on the Association For Software Testing Board of Directors as President, helping to facilitate and develop various projects.