Tag Archives: pairwise testing

Test Smarter, Not Harder – Part 3

interconnected controls

This series is a reprint of an article by Scott Sehlhorst, written for developer.* in March 2006. A recent article on dailytech about “new” methods for software testing points to some very interesting research by the NIST (the National Institute of Standards and Technology) Information Technology Lab. We’ve split the original article into three articles to be more consistent in length with other Tyner Blain articles.

This is part 3 of a 3 part series.

Continue reading

Test Smarter, Not Harder – Part 2

interconnected controls

This series is a reprint of an article by Scott Sehlhorst, written for developer.* in March 2006. A recent article on dailytech about “new” methods for software testing points to some very interesting research by the NIST (the National Institute of Standards and Technology) Information Technology Lab. We’ve split the original article into three articles to be more consistent in length with other Tyner Blain articles.

This is part 2 of a 3 part series.
Continue reading

Test Smarter, Not Harder – A Detailed Article

Scott Sehlhorst

developer.* has just published a 15-page article, Test Smarter, Not Harder by Scott Sehlhorst on test automation. We present the background for that article here (why you should read it) as well as a summary of what is covered in the article. Check it out at developer.*, and thanks to Dan Read, both for starting developer.* (which is a GREAT site) and for publishing the article. Skip to the end of this post to see how you can help.

Automated Testing Must Be a Core Competence

Automated testing has become a critical component of software product success. Processes like continuous integration require test automation to be effective. Unit testing requires automated testing to be effective as a rapid-development tool. Whitebox testing is almost always done with automation. More and more blackbox testing is being done with automation. A team without a good approach to automated testing is working at a severe disadvantage, and is arguably doomed to extinction.

Functional Testing (or system testing) is where a team can achieve process differentiation with automation. 80% of teams rely on manual functional testing today. These teams argue that automating functional tests is too expensive. The reasons most cited are complexity and rapid change in the underlying product.

Testing Complex Software

Complex software presents very challenging problems for test automation. First – how do we deal with the extreme complexity of many of today’s software applications? Billions or trillions of combinations (scripts) of user actions can be made. And software that works with one combination may be buggy in hundreds of others.

We can’t realistically test all of the combinations. Even if we did have the hardware needed to run them we would not be able to validate the results of all of those tests. So we can’t achieve exhaustive test coverage.

Without exhaustive test coverage, how do we know when we’ve tested enough?

Test Smarter Not Harder

The article starts with an analysis of the complexity of modern software and a brief discussion of the organizational realities and typical responses. Then the article explores common approaches and techniques in place today:

  • Random Sampling
  • Pairwise Testing
  • N-wise Testing

The article explores the math, benefits, and limitations of each approach. While N-wise testing is the most effective approach in common use today, it has a limitation – it assumes that the user’s actions (input variables) happen in a prescribed sequence or are irrelevant. In most complex software this assumption is invalid. The article presents a technique for incorporating order-dependence into the statistical approaches for developing test coverage.

The article also demonstrates techniques for simplifying the testing representation using whitebox testing techniques to redefine the problem space, and then applying blackbox testing (statistical) approaches to execution of the resultant test parameters.

By combining the statistical techniques explained in the article with a communication plan for sharing the appropriate quality metrics with stakeholders, we can deliver higher quality software, at a lower cost, and with improved organizational support and visibility.

Why developer.*?

While many Tyner Blain readers are interested in test automation, most of our readers would not want to go into this much depth. The audience at developer.* has a more technical focus, and their site is a more natural place for this article. Personally, I am honored to be included as an article author there – and all of you who want more technical depth than we go into at Tyner Blain should really check out the stuff Dan Read has put together. They’ve also just published their first indie-publisher ‘real book‘, which you can get via their site. My copy is on the way as I type.

How can you help?

This is a great opportunity for PR for Tyner Blain – please share the article with your associates because we want our community to grow. Also, it would be great if you ‘digg’ the article at digg (just follow the link, and ‘digg it’), del.icio.us, blink, or any other networking site. In addition to being good content, I hope to make this an experiment in viral marketing. Lets see if we can collectively reach the tipping point that brings this article, and the Tyner Blain community to the next group of people.

[update: the article just made the front page at digg and has 374 diggs and 32 brutal comments :). Welcome to all diggers, mobile, agile, or hostile!]

Software testing series: Pairwise testing

testing equipment
Before we explain pairwise testing, let’s describe the problem it solves

Very large and complex systems can be very difficult and expensive to test. We inherit legacy systems with multiple man-years of development effort already in place.  These systems are in the field and of unknown quality. With these systems, there are frequently huge gaps in the requirements documentation. Pairwise testing provides a way to test these large, existing systems. And on many projects, we’re called in because there is a quality problem.

We are faced with the challenge of quickly improving, or at least quickly demonstrating momentum and improvement in the quality of this existing software. We may not have the time to go re-gather the requirements, document them, and validate them through testing before our sponsor pulls the plug (or gets fired). We’re therefore faced with the need to approach the problem with blackbox (or black box) testing techniques.

For a complex system, the amount of testing required can be overwhelming. Imaging a product with 20 controls in the user interface, each of which has 5 possible values. We would have to test 5^20 different combinations (95,367,431,640,625) to cover every possible set of user inputs.

The power of pairwise

With pairwise programming, we can achieve on the order of 90% coverage of our code in this example with 54 tests! The exact amount of coverage will vary from application to application, but analysis consistently puts the value in the neighborhood of 90%. The following are some results from pairwise.org.

We measured the coverage of combinatorial design test sets for 10 Unix commands: basename, cb, comm, crypt, sleep, sort, touch, tty, uniq, and wc. […] The pairwise tests gave over 90 percent block coverage.

Our initial trial of this was on a subset Nortel’s internal e-mail system where we able cover 97% of branches with less than 100 valid and invalid testcases, as opposed to 27 trillion exhaustive testcases.

[…] a set of 29 pair-wise AETG tests gave 90% block coverage for the UNIX sort command. We also compared pair-wise testing with random input testing and found that pair-wise testing gave better coverage.

Got our attention!

How does pairwise testing work?

Pairwise testing builds upon an understanding of the way bugs manifest in software. Usually, a bug is caused not by a single variable causing a bug, but by the unique combination of two variables causing a bug. For example, imagine a control that calculates and displays shipping charges in an eCommerce website. The website also calculates taxes for shipped products (when there is a store in the same state as the recipient, sales taxes are charged, otherwise, they are not). Both controls were implemented and tested and work great. However, when shipping to a customer in a state that charges taxes, the shipping calculation is incorrect. It is the interplay of the two variables that causes the bug to manifest.

If we test every unique combination of every pair of variables in the application, we will uncover all of these bugs. Studies have shown that the overwhelming majority of bugs are caused by the interplay of two variables. We can increase the number of combinations to look at every three, four, or more variables as well – this is called N-wise testing. Pairwise testing is N-wise testing where N=2.

How do we determine the set of tests to run?

There are several commercial and free software packages that will calculate the required pairwise test suite for a given set of variables, and some that will calculate N-wise tests as well. Our favorite is a public domain (free) software package called jenny, written by Bob Jenkins. jenny will calculate N-wise test suites, and its default mode is to calculate pairwise tests. jenny is a command line tool, written in C, and is very easy to use. To calculate the pairwise tests for our example (20 controls, each with 5 possible inputs), we simply type the following:

jenny 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 > output.txt

And jenny generates results that look like the following:

1a 2d 3c 4d 5c 6b 7c 8c 9a 10c 11b 12e 13b 14d 15a 16c 17a 18d 19a 20e
1b 2e 3a 4a 5d 6c 7b 8e 9d 10a 11e 12d 13c 14c 15c 16e 17c 18a 19d 20d
1c 2b 3e 4b 5e 6a 7a 8d 9e 10d 11d 12a 13e 14e 15b 16b 17e 18e 19b 20c
1d 2a 3d 4c 5a 6d 7d 8b 9b 10e 11c 12b 13d 14b 15d 16d 17d 18b 19e 20a
1e 2c 3b 4e 5b 6e 7e 8a 9c 10b 11a 12c 13a 14a 15e 16a 17b 18c 19c 20b
1a 2a 3c 4e 5e 6a 7b 8c 9d 10b 11b 12b 13e 14a 15d 16d 17c 18c 19b 20d […]

Where the numbers represent each of the 20 controls, and the letters represent each of the five possible selections.

What’s the catch?

There are two obvious catches. First, when you use a tool like jenny, we must run all of the tests that it identifies, we can’t pick and choose. Second, pairwise testing doesn’t find everything. What if our example bug before about taxes and shipping only manifested when the user is a first time customer? Pairwise testing would not catch it. We would need to use N-wise testing with N >= 3. Our experience has been that N=3 is effective for almost all bugs.

There is also a sneaky catch – test generators like jenny assume that the order of variables is irrelevant. Sometimes we are testing dynamic user interfaces, where the order of value selection in controls is relevant. There is a solution to this, and we will update this post with a link to that solution when it is available.

– – –

Check out the index of software testing series posts for more testing articles.