Before we explain pairwise testing, let’s describe the problem it solves
Very large and complex systems can be very difficult and expensive to test. We inherit legacy systems with multiple man-years of development effort already in place. These systems are in the field and of unknown quality. With these systems, there are frequently huge gaps in the requirements documentation. Pairwise testing provides a way to test these large, existing systems. And on many projects, we’re called in because there is a quality problem.
We are faced with the challenge of quickly improving, or at least quickly demonstrating momentum and improvement in the quality of this existing software. We may not have the time to go re-gather the requirements, document them, and validate them through testing before our sponsor pulls the plug (or gets fired). We’re therefore faced with the need to approach the problem with blackbox (or black box) testing techniques.
For a complex system, the amount of testing required can be overwhelming. Imaging a product with 20 controls in the user interface, each of which has 5 possible values. We would have to test 5^20 different combinations (95,367,431,640,625) to cover every possible set of user inputs.
The power of pairwise
With pairwise programming, we can achieve on the order of 90% coverage of our code in this example with 54 tests! The exact amount of coverage will vary from application to application, but analysis consistently puts the value in the neighborhood of 90%. The following are some results from pairwise.org.
We measured the coverage of combinatorial design test sets for 10 Unix commands: basename, cb, comm, crypt, sleep, sort, touch, tty, uniq, and wc. […] The pairwise tests gave over 90 percent block coverage.
Our initial trial of this was on a subset Nortel’s internal e-mail system where we able cover 97% of branches with less than 100 valid and invalid testcases, as opposed to 27 trillion exhaustive testcases.
[…] a set of 29 pair-wise AETG tests gave 90% block coverage for the UNIX sort command. We also compared pair-wise testing with random input testing and found that pair-wise testing gave better coverage.
Got our attention!
How does pairwise testing work?
Pairwise testing builds upon an understanding of the way bugs manifest in software. Usually, a bug is caused not by a single variable causing a bug, but by the unique combination of two variables causing a bug. For example, imagine a control that calculates and displays shipping charges in an eCommerce website. The website also calculates taxes for shipped products (when there is a store in the same state as the recipient, sales taxes are charged, otherwise, they are not). Both controls were implemented and tested and work great. However, when shipping to a customer in a state that charges taxes, the shipping calculation is incorrect. It is the interplay of the two variables that causes the bug to manifest.
If we test every unique combination of every pair of variables in the application, we will uncover all of these bugs. Studies have shown that the overwhelming majority of bugs are caused by the interplay of two variables. We can increase the number of combinations to look at every three, four, or more variables as well – this is called N-wise testing. Pairwise testing is N-wise testing where N=2.
How do we determine the set of tests to run?
There are several commercial and free software packages that will calculate the required pairwise test suite for a given set of variables, and some that will calculate N-wise tests as well. Our favorite is a public domain (free) software package called jenny, written by Bob Jenkins. jenny will calculate N-wise test suites, and its default mode is to calculate pairwise tests. jenny is a command line tool, written in C, and is very easy to use. To calculate the pairwise tests for our example (20 controls, each with 5 possible inputs), we simply type the following:
jenny 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 > output.txt
And jenny generates results that look like the following:
1a 2d 3c 4d 5c 6b 7c 8c 9a 10c 11b 12e 13b 14d 15a 16c 17a 18d 19a 20e
1b 2e 3a 4a 5d 6c 7b 8e 9d 10a 11e 12d 13c 14c 15c 16e 17c 18a 19d 20d
1c 2b 3e 4b 5e 6a 7a 8d 9e 10d 11d 12a 13e 14e 15b 16b 17e 18e 19b 20c
1d 2a 3d 4c 5a 6d 7d 8b 9b 10e 11c 12b 13d 14b 15d 16d 17d 18b 19e 20a
1e 2c 3b 4e 5b 6e 7e 8a 9c 10b 11a 12c 13a 14a 15e 16a 17b 18c 19c 20b
1a 2a 3c 4e 5e 6a 7b 8c 9d 10b 11b 12b 13e 14a 15d 16d 17c 18c 19b 20d [...]
Where the numbers represent each of the 20 controls, and the letters represent each of the five possible selections.
What’s the catch?
There are two obvious catches. First, when you use a tool like jenny, we must run all of the tests that it identifies, we can’t pick and choose. Second, pairwise testing doesn’t find everything. What if our example bug before about taxes and shipping only manifested when the user is a first time customer? Pairwise testing would not catch it. We would need to use N-wise testing with N >= 3. Our experience has been that N=3 is effective for almost all bugs.
There is also a sneaky catch – test generators like jenny assume that the order of variables is irrelevant. Sometimes we are testing dynamic user interfaces, where the order of value selection in controls is relevant. There is a solution to this, and we will update this post with a link to that solution when it is available.
- – -
Check out the index of software testing series posts for more testing articles.