Drop a Little AI on It: Random Test Data Generation

This is a guest post by Matthew Heusser.

The world of software testing is awash in hype about Artificial Intelligence. On one hand, we have the people who seem to think that within a month there will be a tool that takes a web page address and returns test results, no humans involved. On the other, there are the folks declaring that the emperor has no clothes.

The truth, as always, will likely be somewhere in the middle.

Let’s talk about how to use the simplest kind of Artificial Intelligence (AI) in test tooling.

Artificial Intelligence (AI) But Not Machine Learning (ML)

If you look for a dictionary definition of AI, you may find it is any use of a computer to simulate intelligence. Thus, when you type a series of symptoms into WebMD and come back with a potential diagnosis, that is a kind of AI. When your test code has an “if” statement in it, that is a kind of AI. Most of the time, when people talk about AI, they mean a system that is capable of “learning” in some capacity. For example, a system could look at a series of numbers and make predictions about what might come next, look at what actually came next, and then use that data to make further predictions. We call that training, and it is a particular form of Machine Learning (ML).

Before we get to ML, we can inject randomness into our test data. This increases coverage of the application and can find errors with very little effort. It might not be actual intelligence, but it certainly fits the definition of artificial intelligence.

Let’s begin with an example.

A Simple Password Example

Say, for example, that we are testing the create profile screen with automation tooling. One classic way to do this is to write a valid_account_create method that reads from a table of valid passwords. For our purposes, say a valid password must be exactly eight characters long and contain at least three of the following four character types: lowercase letters, uppercase letters, numbers, and symbols. As a team, before the project begins, we create a table like this:

Password	What It Tests
A!bc5678	All four types
Abcd5678	Upper, lower, number
!!bc5678	Symbol, lower, number
A!bcdefg	Upper, symbol, lower
A!CD5678	Upper, symbol, number

We would likely have another test for invalid passwords – seven characters, nine characters, no characters, only two-out-of-the-four, only one, and so on.
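As a rough sketch, the password rule and the table-driven check could look like the Python below. The function names (character_classes, is_valid_password) and the use of string.punctuation as the set of symbols are assumptions made for illustration; a real test would call the application under test rather than a local function.

```python
import string

# Assumption: "symbols" means the ASCII punctuation characters.
SYMBOLS = set(string.punctuation)


def character_classes(password):
    """Return the set of character types that appear in the password."""
    classes = set()
    for ch in password:
        if ch in string.ascii_lowercase:
            classes.add("lower")
        elif ch in string.ascii_uppercase:
            classes.add("upper")
        elif ch in string.digits:
            classes.add("number")
        elif ch in SYMBOLS:
            classes.add("symbol")
    return classes


def is_valid_password(password):
    """Exactly eight characters, drawn from at least three of the four types."""
    return len(password) == 8 and len(character_classes(password)) >= 3


# The hand-built table from the article.
VALID_PASSWORDS = ["A!bc5678", "Abcd5678", "!!bc5678", "A!bcdefg", "A!CD5678"]

# Hypothetical invalid cases: seven characters, nine characters,
# no characters, two of the four types, and one of the four types.
INVALID_PASSWORDS = ["A!bc567", "A!bc56789", "", "abcd5678", "abcdefgh"]

if __name__ == "__main__":
    for password in VALID_PASSWORDS:
        assert is_valid_password(password), password
    for password in INVALID_PASSWORDS:
        assert not is_valid_password(password), password
    print("table-driven checks passed")
```

Notice that the table only ever exercises these ten values, no matter how many times the suite runs.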

The problem here is what Dr. Cem Kaner calls the “problem of local optima.” A fixed table checks only the handful of values we thought of in advance. It is possible that the programmer keyed in on something else, some other condition, that we cannot see from the requirements.

What if we could randomly generate valid passwords instead, put that value in a variable, and then use it for the rest of the test?

Let’s talk about it.

The Random Password Generator

First, for each of the eight characters, randomly select whether the character will be an uppercase letter, a lowercase letter, a symbol, or a number. Then select a random value of that character type. Finally, run through the string to check that it is valid, meaning it contains at least three of the four types. If the password must be valid, wrap the generation in a while loop that repeats until it produces a valid password.
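Here is a minimal sketch of those steps in Python. The names (CHARACTER_CLASSES, random_candidate, random_valid_password) are made up for this example, and the symbol set is assumed to be string.punctuation, which happens to include characters such as <, >, and /. Passing in a seeded random.Random is one way to make a failing run repeatable.

```python
import random
import string

# Assumption: these four pools define the character types.
CHARACTER_CLASSES = {
    "lower": string.ascii_lowercase,
    "upper": string.ascii_uppercase,
    "number": string.digits,
    "symbol": string.punctuation,
}


def count_classes(password):
    """Count how many of the four character types appear in the password."""
    return sum(
        1
        for chars in CHARACTER_CLASSES.values()
        if any(ch in chars for ch in password)
    )


def random_candidate(rng, length=8):
    """Pick a random type for each position, then a random character of that type."""
    picks = []
    for _ in range(length):
        class_name = rng.choice(list(CHARACTER_CLASSES))
        picks.append(rng.choice(CHARACTER_CLASSES[class_name]))
    return "".join(picks)


def random_valid_password(rng=None, length=8, min_classes=3):
    """Keep generating candidates until one satisfies the password rule."""
    if rng is None:
        rng = random.Random()
    while True:
        candidate = random_candidate(rng, length)
        if count_classes(candidate) >= min_classes:
            return candidate


if __name__ == "__main__":
    seed = random.randrange(2**32)
    print("seed:", seed)  # log the seed so the same passwords can be regenerated
    rng = random.Random(seed)
    for _ in range(5):
        print(random_valid_password(rng))
```

Most candidates pass on the first try, so the loop rarely runs more than a couple of times.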

The amazing thing about this sort of test is that it will find bugs. Doug Hoffman, a founding member of the Association for Software Testing, once used a similar approach to test a 32-bit square root function. Instead of random data, he used a for loop to test every possible input, and found two problems.

In this case, we will find a defect: the <, >, and / characters, which are part of HTML, generate errors. The JavaScript on the front end that checks the password calls it valid, but the server removes those characters. Thus, in the database, the password is encrypted without them. Users will create the account with no error, but will not be able to log in.

To be more specific, if someone’s password is “<BR>1234”, it will be saved as “BR1234”, and only “BR1234” will work to log in. During an overnight test run, the test will create an account with such a password but fail to log in. Because the generated values are stored in variables that are saved during the test run, the programmer can easily reproduce and fix the issue.
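To make the failure concrete, here is a small simulation. FakeAccountService is a hypothetical stand-in for the real application, written to mimic the defect by dropping <, >, and / before storing the password. The find_login_failures helper is also invented for this sketch; the point is that it keeps every password that failed the round trip so someone can reproduce the problem in the morning.

```python
# Assumption: these are the characters the simulated server strips.
STRIPPED_BY_SERVER = "<>/"


class FakeAccountService:
    """Hypothetical stand-in for the application under test."""

    def __init__(self):
        self._stored = {}

    def create_account(self, username, password):
        # Simulated bug: markup-like characters are silently dropped before storage.
        cleaned = "".join(ch for ch in password if ch not in STRIPPED_BY_SERVER)
        self._stored[username] = cleaned

    def login(self, username, password):
        # Login compares against what was actually stored.
        return self._stored.get(username) == password


def find_login_failures(service, passwords):
    """Create an account per password, try to log in, and collect the failures."""
    failures = []
    for index, password in enumerate(passwords):
        username = f"user{index}"
        service.create_account(username, password)
        if not service.login(username, password):
            failures.append(password)
    return failures


if __name__ == "__main__":
    failed = find_login_failures(FakeAccountService(), ["<BR>1234", "A!bc5678"])
    print(failed)  # prints ['<BR>1234']
```

In a real suite the password list would come from the random generator, and the failures would go into the test log along with the seed.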

That’s not an idle comment; I recently ran into this problem personally when changing my password at a regional bank.

Other Uses for Random Data

Bug Magnet is a free browser plugin that can generate random but valid data of various kinds, from names to dates to valid email addresses. Faker is a PHP library that can do similar things for automation. Phone numbers, addresses, colors, and lorem ipsum text are just a few examples of the kinds of easy-to-generate random values you might use. While it is a little dated, perlclip by James Bach and Danny Faught can generate random data of many types from the command line, and either print it to the screen or copy it to the paste buffer. Because it runs at the command line, any program can run it, redirect the output, and use the result. For that matter, if your tool supports a programming language like C#, you can create your own.
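If you do roll your own, it does not take much. The sketch below uses only the Python standard library; the function names, the date range, the reserved example.com-style domains, and the fictional 555-01xx phone format are all assumptions picked for illustration, not the output format of any of the tools above.

```python
import datetime
import random
import string


def random_email(rng):
    """Random lowercase local part at a reserved example domain."""
    length = rng.randint(3, 10)
    local = "".join(rng.choice(string.ascii_lowercase) for _ in range(length))
    tld = rng.choice(["com", "org", "net"])
    return f"{local}@example.{tld}"


def random_date(rng, start=datetime.date(1970, 1, 1), end=datetime.date(2030, 12, 31)):
    """Random date between start and end, inclusive."""
    span_days = (end - start).days
    return start + datetime.timedelta(days=rng.randint(0, span_days))


def random_phone(rng):
    """Fictional-style number in the form 555-01xx."""
    return f"555-01{rng.randint(0, 99):02d}"


def random_profile(seed=None):
    """Build one profile; the seed is returned so a failing run can be replayed."""
    if seed is None:
        seed = random.randrange(2**32)
    rng = random.Random(seed)
    return {
        "seed": seed,
        "email": random_email(rng),
        "birth_date": random_date(rng),
        "phone": random_phone(rng),
    }


if __name__ == "__main__":
    print(random_profile())
```

Saving the seed with each generated profile means any surprising value can be regenerated later by calling random_profile with that seed.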

Those are a handful of ways to create random test data, along with a story about the kind of bugs it can find. Have you used random test data in your projects? If you did, what kind of errors did you find?

Matthew Heusser is the Managing Director of Excelon Development, with expertise in project management, development, writing, and systems improvement. And yes, he does software testing too.
