Desk Tests, Continuous Tests, and Everything Between: How Much Automation Different Tests Deserve

This is a guest post by Cameron Laird.

Some tests deserve automation. Some tests are worth running on developers’ desktops. And — rarely — some code doesn’t deserve to be tested at all.

Here’s how to get from “red” to “green” in each case.

Fully automated tests

Unit tests generally merit full automation. They typically run quickly (often under a second each); don’t depend on network connections or secrets such as passwords; are often portable enough to run in different environments, including development, staging, and production; and are largely parallelizable. Test-driven development (TDD) naturally generates unit tests of this sort.

Unit tests have their difficulties, though. Graphical user interfaces (GUIs), including native mobile applications and web applications, often deserve their reputation for being harder to drive automatically than such other deliverables as libraries or services. A rich ecosystem of tools, frameworks, and fixtures for driving GUIs exists. I largely leave those for consideration on another day, with one exception: GUIs often result in tests that deserve decomposition.

Suppose, for instance, that an application has been defined to provide the functionality to register a new user. An end-user points and clicks their way through a sequence of actions to create a database record of one individual profile. It’s natural to define a test that corresponds to that requirement, and several testing tools or platforms make it feasible, even when considering the difficulties GUIs introduce.

Consider, though, decomposition of this one requirement into two:

Collect user profile through GUI
Save user profile to database

You could even consider three or more steps:

Collect user profile through GUI
Validate user profile (confirm the username and password meet any restrictions on them, and so on)
Save the validated user profile to database

Even if complete automation through the GUI exists for the whole collect-and-save sequence, the GUI-free subtests can be valuable. They might be quick enough to run in continuous testing (CT) contexts, are probably easier to run in parallel with other tests than anything involving a GUI, and are likely easier to maintain.
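As a minimal sketch of what the GUI-free subtests might look like, the Python below separates the validate and save steps from any GUI code. Everything named here (`UserProfile`, `validate_profile`, `save_profile`, the length limits, and the dict standing in for a database) is an assumption made for illustration, not part of any particular application.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class UserProfile:
    username: str
    password: str


def validate_profile(profile: UserProfile) -> list[str]:
    """Return a list of problems; an empty list means the profile is valid."""
    problems = []
    # The limits below are illustrative assumptions, not real requirements.
    if not (3 <= len(profile.username) <= 32):
        problems.append("username must be 3 to 32 characters")
    if len(profile.password) < 12:
        problems.append("password must be at least 12 characters")
    return problems


def save_profile(profile: UserProfile, store: dict) -> None:
    """Save a validated profile; a plain dict stands in for the database."""
    if validate_profile(profile):
        raise ValueError("refusing to save an invalid profile")
    if profile.username in store:
        raise KeyError(f"user {profile.username!r} already exists")
    store[profile.username] = profile


def test_validate_rejects_short_password():
    assert validate_profile(UserProfile("alice", "short")) == [
        "password must be at least 12 characters"
    ]


def test_save_stores_valid_profile():
    store = {}
    profile = UserProfile("alice", "correct horse battery")
    save_profile(profile, store)
    assert store == {"alice": profile}
```

Neither test needs a browser, a device, or a display, so both can run on every commit, while the slower GUI test covers only the collection step.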

GUIs aren’t the only “inflexibility” that complicates testing; requirements bound to time or place also deserve refactoring and segmenting when tested. Think of a backup that is supposed to run an hour past midnight each morning, or content rules that vary from country to country. The right test for the former is almost surely not an end-to-end test. Instead, the backup part of the requirement should be tested to ensure it works correctly at all hours, while a separate verification establishes that the backup actually launches at 1 a.m. This example also illustrates that some tests are easier to write and maintain when considered more generally: It’s almost always easier to test a backup whose functionality works at all hours than one restricted to a single time each day.
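One way to sketch that split in Python is to pass the current time in as a parameter, so the backup logic and the schedule check can be tested separately. The names `run_backup` and `is_backup_due`, and the dict used as the data source, are hypothetical stand-ins.

```python
from datetime import datetime, time


def run_backup(source: dict, now: datetime) -> dict:
    """Copy the data and stamp the copy; works at whatever time it is given."""
    return {"taken_at": now.isoformat(), "data": dict(source)}


def is_backup_due(now: datetime, scheduled: time = time(1, 0)) -> bool:
    """True only during the scheduled minute (1 a.m. by default)."""
    return (now.hour, now.minute) == (scheduled.hour, scheduled.minute)


def test_backup_works_at_every_hour():
    source = {"orders": 3}
    for hour in range(24):
        snapshot = run_backup(source, datetime(2024, 1, 15, hour, 30))
        assert snapshot["data"] == source


def test_backup_is_due_only_at_one_am():
    assert is_backup_due(datetime(2024, 1, 15, 1, 0))
    assert not is_backup_due(datetime(2024, 1, 15, 13, 0))
    assert not is_backup_due(datetime(2024, 1, 15, 1, 1))
```

Because neither function reads the system clock, no test has to wait for 1 a.m. to run.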

One variation on a completely automated test, therefore, is a test that has several parts, where each of the parts is fully or partially automated.

Partial automation by policy

Another incompletely automated test we frequently encounter is one that relies on human judgment by policy.

A test of specific database actions might admit automation in principle, but a decision has been made to keep passwords or similar credentials separate from the test framework. Such a test might be completely automated, except that an individual tester must enter their credentials.
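A sketch of how such a test might be arranged in Python: the check itself is automated, and the only manual step is a password prompt through the standard library's `getpass.getpass`. The function `run_database_check` and its `query_row_count` parameter are assumptions made for illustration; a real check would call whatever database client the project uses.

```python
import getpass
from typing import Callable


def run_database_check(
    query_row_count: Callable[[str], int],
    prompt: Callable[[str], str] = getpass.getpass,
) -> bool:
    """Ask the tester for a password, then run the automated check with it."""
    # The password lives only in this local variable; it is never stored.
    password = prompt("Database password: ")
    return query_row_count(password) > 0
```

Passing the prompt in as a parameter keeps the credential out of the test framework, and still lets the surrounding logic be exercised with a stand-in prompt.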

Expert system administrator Tom Limoncelli underlines that all tests live on a spectrum, from casual and unsystematic to fully automated. One of the healthiest things we can do is think of our tests in terms of a lifecycle, where they start as an idea, advance to a human-readable script and, sometimes, reach full automation.

Another common example of incomplete automation is a load test that might be entirely automated, except it’s only allowed to run when a senior engineer confirms that network and related runtime conditions are favorable. This deserves a full outline for humans to read:

Engineer with role $ROLE authorizes test $NETWORK-LOAD-1.
Tester $TESTER launches $NETWORK-LOAD-1.

All our tests, even the ones not automated by computer, should be reproducible by qualified humans. As Limoncelli explains, an effective tactic to achieve automation is to segment an individual test into sensible steps, then automate individual steps as that becomes possible.
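The step-by-step tactic can be sketched as a small Python runbook in which each step is either automated or left to a human. `Step`, `run_steps`, `launch_load_test`, and the `confirm` callback are hypothetical names invented for this sketch.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class Step:
    description: str
    # None marks a step that a human still performs by hand.
    action: Optional[Callable[[], bool]] = None


def run_steps(steps: list[Step], confirm: Callable[[str], bool]) -> bool:
    """Run steps in order; stop at the first one that fails or is declined."""
    for step in steps:
        ok = step.action() if step.action else confirm(step.description)
        if not ok:
            return False
    return True


def launch_load_test() -> bool:
    # Placeholder for whatever actually starts the load test.
    return True


NETWORK_LOAD_1 = [
    Step("Engineer with role $ROLE authorizes test $NETWORK-LOAD-1"),
    Step("Tester $TESTER launches $NETWORK-LOAD-1", action=launch_load_test),
]
```

At a desk, `confirm` could be something like `lambda text: input(text + " [y/N] ").lower() == "y"`. Automating a manual step later means giving it an `action`, with no change to the outline humans read.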

Manual tests, automation, or both

Some tests are easier for humans to run than computers. Subjective requirements, such as that a webpage be “readable,” or pattern detections, like distinguishing photographs of dogs from those of cats, can be quick and easy for trained humans but hard to capture through automation.

Other requirements might appear routine for computers but then turn out to be unexpectedly thorny because their solutions don’t fit in a conventional testing framework. Consider these examples:

Remove records older than six months
Log a timeout if a particular operation lasts longer than 20 seconds, and report an exception to the end-user
Credentials must be renewed at least every 20 hours
Reliably handle financial transactions, no matter the order in which remote services (such as balance look-up, authorization, inventory availability, and so on) reply
Sensible reporting even during temporary loss of a network connection to back-end hosts

An experienced tester might come up with a valuable manual test — sometimes by disconnecting a cable! — for some of these, without a satisfying way to automate the manual test.

In fact, all of these individual requirements admit automation, given the right background or tooling. When we have trustworthy manual tests, though, for situations that are tedious to automate further, it’s time to recognize that full automation doesn’t deserve to be an immediate priority.
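As one illustration of the tooling involved, the reply-order requirement can be approached by feeding a handler every possible ordering of the replies with `itertools.permutations`. The `settle` function and its three service names are simplified assumptions; a real transaction handler would be far more involved.

```python
from itertools import permutations


def settle(replies: list[tuple[str, bool]]) -> str:
    """Approve only when all three services have replied favorably."""
    seen = {}
    for service, ok in replies:
        seen[service] = ok
    required = ("balance", "authorization", "inventory")
    if any(name not in seen for name in required):
        return "pending"
    return "approved" if all(seen[name] for name in required) else "declined"


def test_outcome_ignores_reply_order():
    replies = [("balance", True), ("authorization", True), ("inventory", False)]
    # Three replies give six orderings; every one must reach the same outcome.
    outcomes = {settle(list(order)) for order in permutations(replies)}
    assert outcomes == {"declined"}
```

The number of orderings grows factorially, so this brute-force approach suits a handful of services and becomes impractical well before a dozen.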

A requirement in an infrequently exercised segment of code or in part of an application where failure is easily recovered might get by with light testing at the time of first creation. It’s OK to concentrate automation efforts on more fragile parts of an application, at least until they reach reliability.

Complement testing with good inspection habits. While zealots sometimes argue the superiority of testing vs. inspection, smart engineers know to look for ways to make the most of both techniques, often in combination. Exhaustive tests of a software adder are impractical to write and too time-consuming to run. However, thoughtful inspection combined with automated verification of a few examples — “1 + 3 = 4,” “8 + (-8) = 0” and “100,000 + 1,000 = 101,000”, for instance — is a good way to succeed.
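The automated half of that combination might look like the following Python sketch, which checks the three examples from the paragraph above. The `add` function is a trivial stand-in for the adder under test.

```python
def add(a: int, b: int) -> int:
    """The adder under test; inspection covers what the examples cannot."""
    return a + b


# The three examples named in the text: (a, b, expected sum).
EXAMPLES = [
    (1, 3, 4),
    (8, -8, 0),
    (100_000, 1_000, 101_000),
]


def test_add_examples():
    for a, b, expected in EXAMPLES:
        assert add(a, b) == expected
```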

Take a measured approach

A sensible approach to testing, then, aspires to completeness achieved by tests that combine manual and automated strengths. Make sure any manual work is documented well enough that any knowledgeable worker can independently execute it. Also, break requirements into testable segments.

Over time, more and more of the manual tests can be captured through automation, liberating the time and effort of testers — so they can come up with even more tests and automation!

Cameron Laird is an award-winning software developer and author. Cameron participates in several industry support and standards organizations, including voting membership in the Python Software Foundation. A long-time resident of the Texas Gulf Coast, Cameron’s favorite applications are for farm automation.
