A look back at the best from this week in the past.
It was software testing week this time a year ago…
Check out the index of software testing series posts for more articles.
Software testing can be most simply described as “for a given set of inputs into a software application, evaluate a set of outputs.” Software testing is a cause-and-effect analysis.
There’s a piece of North American folklore about John Henry, who was a manual laborer during the expansion of the railroads in our country. His job was being replaced by steam-driven heavy equipment, as the railroad industry applied technology to become more efficient. The same dynamics are happening today with manual testers. We need to make sure that manual testers avoid John Henry’s fate – read on to see why.
Manual and Automated Testing
Software cannot be tested efficiently with only manual testing. And it cannot be tested completely with only automated testing. James Bach writes an insightful article about how he worked his way to enlightenment over several years, and how he has incorporated this perspective into his training classes.
My understanding of cognitive skills of testing and my understanding of test automation are linked, so it was some years before I came to understand what I now propose as the first rule of test automation:
Test Automation Rule #1: A good manual test cannot be automated.
James goes on to explain that if a test can be automated, it is not a good manual test.
We’ve discussed the benefits of automated testing repeatedly; one example is The Code Freeze is Killing the Dinosaurs. We’ve also stressed the importance of doing functional testing, where we discuss the dynamics and tradeoffs of using manual and automated approaches to test software.
Generally, we’ve approached the problem from the perspective of blackbox versus whitebox testing (and more details), and thought about our development process from the perspective of continuous integration as a means to deliver more efficiently.
We’ve not thought about it from James’ perspective before. Even with our improved approaches to automated testing, we are still no different than the inventor of the steam-hammer in Henry’s fable.
James puts things in perspective:
Rather than banishing human qualities, another approach to process improvement is to harness them. I train testers to take control of their mental models and devise powerful questions to probe the technology in front of them. This is a process of self-programming. In this way of working, test automation is seen as an extension of the human mind, not a substitute.
He highlights a very interesting point – we are missing a key benefit of having manual testers – the ability to gather feedback on intangibles, heuristics, usability, affordances, and other elements that we might classify as design bugs.
A Broken Model
In the 1970s, when the American automotive manufacturers were getting their clocks cleaned by the newly exporting Japanese companies like Honda, Datsun (now Nissan), and Toyota, a lot of people thought they understood the problem. Many more people pretended there wasn’t a problem, until the US government bailed out Chrysler.
The experts decided that the main problem was that quality was not “Job One”, that American manufacturers ignored the advice of W. Edwards Deming, and the Japanese did not. Toyota’s lean manufacturing philosophy is credited with much of the success.
It is true that the oil crisis of the time gave the Japanese companies an opportunity to penetrate the US market with smaller, more efficient cars. But that crisis ended (about the time that the American companies killed the muscle car and began building more efficient cars). But the Japanese cars didn’t go away. Once gas prices dropped, they lost a differentiating element – but they maintained two others. Cost and Quality.
Target of Opportunity Enables Strategic Advantage
Cost matters. But it matters tactically, as in “I can afford $X right now – what are my choices?” Quality creates loyalty. And loyalty is strategic. The Japanese manufacturers gained, and continue to gain, success after success and award after award in the automotive industry. Why? Because they have good quality.
Automated and Manual Testing
As an engineer, I know that we can specify tolerances, inspect components, and test assemblies to make sure that products are “within specification.” And all of this testing can be automated – just like automated software testing. But passing these tests doesn’t make a product good, it merely indicates that the product is “within specification.” Did the Japanese manufacturers have tighter tolerances? Yes. But did they have better designs? Yes. And those better designs were about more than miles-per-gallon and horsepower and torque.
They were about qualitative items that many engineers struggle to absorb. “Feels Good” and “The controls are where I expect them to be” and “Makes sense” and “I like it.” Software developers often struggle with the same issues.
And this is where manual testing matters. Getting qualitative feedback on your product designs is the only way to improve qualitative elements of those designs. The American manufacturers showed their disdain and hubris by leaving it to their customers to provide that feedback after the sale. The Japanese companies got feedback before they sold their products.
We can use manual testing to provide us with the kinds of feedback that can’t be automated. Things like “This UI is confusing” and “Why won’t it let me…?”
Don’t give up on manual testing as we strive to achieve software product success. Automate everything we can, and use people to test what we can’t. We need to make sure that we don’t lose the ability (or make sure that we find the ability) to test manually, for qualitative insight into our products. We can’t let the testers burst their hearts like poor John Henry, trying to manually perform an automatable test.
Thanks again, James, for starting us down this path!
developer.* has just published a 15-page article, Test Smarter, Not Harder by Scott Sehlhorst on test automation. We present the background for that article here (why you should read it) as well as a summary of what is covered in the article. Check it out at developer.*, and thanks to Dan Read, both for starting developer.* (which is a GREAT site) and for publishing the article. Skip to the end of this post to see how you can help.
Automated Testing Must Be a Core Competence
Automated testing has become a critical component of software product success. Processes like continuous integration require test automation to be effective. Unit testing requires automated testing to be effective as a rapid-development tool. Whitebox testing is almost always done with automation. More and more blackbox testing is being done with automation. A team without a good approach to automated testing is working at a severe disadvantage, and is arguably doomed to extinction.
Functional Testing (or system testing) is where a team can achieve process differentiation with automation. 80% of teams rely on manual functional testing today. These teams argue that automating functional tests is too expensive. The reasons most cited are complexity and rapid change in the underlying product.
Testing Complex Software
Complex software presents very challenging problems for test automation. First – how do we deal with the extreme complexity of many of today’s software applications? Billions or trillions of combinations (scripts) of user actions can be made. And software that works with one combination may be buggy in hundreds of others.
We can’t realistically test all of the combinations. Even if we did have the hardware needed to run them we would not be able to validate the results of all of those tests. So we can’t achieve exhaustive test coverage.
Without exhaustive test coverage, how do we know when we’ve tested enough?
Test Smarter Not Harder
The article starts with an analysis of the complexity of modern software and a brief discussion of the organizational realities and typical responses. Then the article explores the common approaches and techniques in place today.
The article explores the math, benefits, and limitations of each approach. While N-wise testing is the most effective approach in common use today, it has a limitation – it assumes that the user’s actions (input variables) happen in a prescribed sequence or are irrelevant. In most complex software this assumption is invalid. The article presents a technique for incorporating order-dependence into the statistical approaches for developing test coverage.
The article also demonstrates techniques for simplifying the testing representation using whitebox testing techniques to redefine the problem space, and then applying blackbox testing (statistical) approaches to execution of the resultant test parameters.
By combining the statistical techniques explained in the article with a communication plan for sharing the appropriate quality metrics with stakeholders, we can deliver higher quality software, at a lower cost, and with improved organizational support and visibility.
While many Tyner Blain readers are interested in test automation, most of our readers would not want to go into this much depth. The audience at developer.* has a more technical focus, and their site is a more natural place for this article. Personally, I am honored to be included as an article author there – and all of you who want more technical depth than we go into at Tyner Blain should really check out the stuff Dan Read has put together. They’ve also just published their first indie-publisher ‘real book‘, which you can get via their site. My copy is on the way as I type.
How can you help?
This is a great opportunity for PR for Tyner Blain – please share the article with your associates because we want our community to grow. Also, it would be great if you ‘digg’ the article at digg (just follow the link, and ‘digg it’), del.icio.us, blink, or any other networking site. In addition to being good content, I hope to make this an experiment in viral marketing. Let’s see if we can collectively reach the tipping point that brings this article, and the Tyner Blain community, to the next group of people.
Functional Testing, also referred to as System Testing, is the practice of testing completed software to confirm that it meets the requirements defined for it. A functional test is typically a test of user interactions, but can also involve communication with external systems. We contrast functional testing with unit testing. We also show how functional testing provides different benefits than unit testing.
This is a relatively long post for a Foundation Series post, so sit back with some coffee and relax. This primer will be worth it if it’s a new topic. If you know this stuff already, check out the links to other articles that go into more depth on points we make.
An Application is a Series of Flows
We can think of an application from the perspective of a user, as a series of interactions, or flows through the user interface.
People are not usually forced to follow a fixed set of instructions, or a predefined sequence of actions in an application. They can interact with controls in a random order, skip controls entirely, or otherwise do stuff that developers don’t expect.
Unit Tests are Whitebox Tests
Unit testing, as we detailed in our telephone example, provides targeted test coverage of specific areas of the code inside the application. Unit tests are written by developers, to allow them to test that the implementation that they created is behaving as they intended. Unit tests don’t implicitly provide the start-to-finish coverage that functional tests usually provide. Unit tests are whitebox tests that assure that a specific behavior intended by the developer is happening. A weakness of using unit tests alone is that they will not identify when the developer misinterpreted the requirements.
Functional Tests are Blackbox Tests
A functional test, however, is designed without insight into how the implementation works. It is a blackbox test. A functional test represents a set of user interactions with the application. The concept behind a functional test is to validate something about the state of the application after a series of events. According to Aberro Software, 80% of all functional tests are performed manually. That means that the most common functional test involves a tester making selections in a series of controls, and then evaluating a condition. This evaluation is called an assertion. The tester asserts that the software is in a specific state (an output is created, a control is filtered in a specific way, a control is enabled, a navigation option is disabled, etc).
Good functional requirements are written as concisely as possible. A requirement that supports a particular use case might state that the user specifies A, B, and C, and the application responds with D. A functional test designed to validate that requirement will almost always mimic this most common flow of events. The script that the tester follows will be to specify A, then B, then C. The tester will then evaluate the assertion that D is true. If D is false, then the test has failed.
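To make the script-plus-assertion idea concrete, here is a minimal sketch in Python. The OrderForm class and its methods are hypothetical stand-ins for an application under test (they are not from any real product or testing tool), and A, B, C, and D are mapped onto made-up controls purely for illustration.

```python
class OrderForm:
    """Hypothetical application under test (not from any real product)."""

    def __init__(self):
        self.quantity = None
        self.state = None
        self.shipping = None

    def set_quantity(self, quantity):      # input A
        self.quantity = quantity

    def set_state(self, state):            # input B
        self.state = state

    def set_shipping(self, shipping):      # input C
        self.shipping = shipping

    def submit_enabled(self):              # output D
        return None not in (self.quantity, self.state, self.shipping)


def test_submit_enabled_after_a_b_c():
    """Script: specify A, then B, then C; then assert that D is true."""
    form = OrderForm()
    form.set_quantity(2)
    form.set_state("TX")
    form.set_shipping("ground")
    assert form.submit_enabled(), "D should be true after A, B, and C"


test_submit_enabled_after_a_b_c()
```

The test only touches the public controls and the observable result, which is what keeps it a blackbox test: it would not change if the internals of the form were rewritten.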
A functional test may not cover the entire set of likely user interactions, but rather a subset of them.
One problem with this approach is that it does not account for a user specifying (A, B, X, C) or (A, C, B). These variations in order of operations might cause the underlying code to execute differently, and might uncover a bug. For a tester to get complete coverage of the requirement (A + B + C => D), he would have to create multiple scripts. This is expensive, tedious, and often redundant. But a tester has no way to know if the multiple scripts are redundant, or required.
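A quick way to see how fast the scripts multiply is to count them. The sketch below assumes just one unrelated action X that a user might slip in, which is our simplification; a real interface would offer many more.

```python
from itertools import permutations

required = ["A", "B", "C"]   # the actions named in the requirement
extra = "X"                  # one unrelated action the user might slip in

# Every order in which the user could perform just A, B, and C.
required_only = list(permutations(required))

# Every order in which the user could perform A, B, C, and X.
with_extra = list(permutations(required + [extra]))

scripts = required_only + with_extra
print(len(required_only), len(with_extra), len(scripts))  # 6 24 30
```

One requirement with three inputs has already turned into thirty candidate scripts, and nothing in the blackbox view tells the tester which of them exercise different code.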
Combining Unit Tests and Functional Tests
When we combine both unit testing and functional testing approaches, we are implementing what is called graybox testing (greybox testing). This is also referred to as layered testing. Graybox testing provides two types of feedback into the software development process. The unit tests provide feedback to the developer that her implementation is working as designed. The functional tests provide feedback to the tester that the application is working as required.
Graybox testing is the ideal approach for any software project, and is a key component of any continuous integration strategy. Continuous integration is a process where the software is compiled and tested every day throughout the release cycle – instead of waiting until the end of the cycle to test. Read this plan for implementing continuous integration if you want more details.
Automating Functional Tests
Automating unit testing is both straightforward, and relatively inexpensive. Automating functional testing is more expensive to set up, and much more expensive to maintain. Each functional test represents a script of specific actions. A tester (with programming skills) can utilize software packages like WinRunner to create scripts of actions followed by assertions. This represents an upfront cost of programming a script to match the application, in parallel with the development of the application – and it requires a tester with specialized skills to program the script.
The maintenance cost of automating functional tests is magnified in the early development stages of any application, and throughout the life of any application developed with an agile process. Whenever an element of the user interface is changed, every script that interacts with that element can be broken (depending on the nature of the change). These broken scripts have to be manually updated to reflect these ongoing changes. In periods of heavy interface churn, the cost of maintaining the test suite can quickly become overwhelming.
In the real world, apparently 80% of teams find that this overwhelming cost of automated testing outweighs even the high cost of manual functional testing.
Improved Automation of Functional Tests
We can reduce the maintenance cost of keeping automated scripts current with the user interface by separating the script-coding from the script-definition. This is referred to as keyword and table scripting. A set of objects is coded by the tester, and each object is given a keyword. Each object represents an element in the user interface. Script behavior (sequence of interaction) is defined in terms of these keywords. Now, when a UI element is changed, only the keyword-object is updated, and all of the scripts that reference it are repaired.
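Here is a rough sketch of the keyword-and-table idea in Python. Everything in it is made up for illustration: the locator strings, the FakeDriver class (a stand-in that only records calls), and the run_script helper do not come from WinRunner, QTP, or any other real tool.

```python
# Keyword layer: the one place that knows how to find each UI element.
# The locator strings are invented for this example.
LOCATORS = {
    "quantity": "id=qty_input",
    "state": "id=state_dropdown",
    "submit": "id=submit_btn",
}

# Table layer: the script is written only in terms of keywords.
SCRIPT = [
    ("quantity", "2"),
    ("state", "TX"),
    ("submit", "click"),
]


class FakeDriver:
    """Stand-in for a UI automation driver; it just records what it is told."""

    def __init__(self):
        self.log = []

    def set_value(self, locator, value):
        self.log.append((locator, value))


def run_script(driver, script, locators):
    """Translate each keyword into its current locator and perform the step."""
    for keyword, value in script:
        driver.set_value(locators[keyword], value)


before = FakeDriver()
run_script(before, SCRIPT, LOCATORS)

# The UI changes: the state dropdown is renamed. Only the keyword layer is edited.
LOCATORS["state"] = "id=region_dropdown"

after = FakeDriver()
run_script(after, SCRIPT, LOCATORS)
print(before.log[1], after.log[1])
```

The script table is untouched by the rename, and every other script that uses the "state" keyword picks up the fix at the same time.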
This, however, does not address issues where one control is refactored into two controls, the adding or removing of controls, or changes in the desired flow of interaction. There is still a very large (albeit smaller) maintenance burden. And the applications that use this approach (such as QTP) can cost in the tens of thousands of dollars. Another reason to do functional testing manually.
Functional testing is important to validating requirements. It is an important element of assuring a level of software quality. And it is still expensive with the best of today’s proven solutions. Even with the high cost, it is much cheaper than the risk of delivering a solution with poor quality. Plan on having functional testing as a component of any process to achieve software product success.
– – –
Check out the index of the Foundation series posts which will be updated whenever new posts are added.
We can reach the next step in our software process evolution by automating much of our process. Flying squirrels evolved a technique* to quickly move from one tree to another without all the tedious climbing and dangerous running. Software teams that automate their processes achieve similar benefits. Automation allows us to increase efficiency while improving quality. And we spend less time on tedious and mundane tasks.
Benefits of process automation
Tim Kitchens has a great article at developer.* where he highlights the benefits of process automation. Here are our thoughts on the benefits he lists:
What and when should we automate?
The short answer is automate everything, unless there’s not enough ROI. We have to examine each process that we use to make a final decision – some automation will not make sense due to uncommon situations. Also, if we’re nearing the end of an existing project, there is less time to enjoy the benefits of automation, so we may not be able to justify the costs. We may be under pressure to deliver ROI in a short payback period. We would suggest exploring the automation of the following activities:
Automate the build process
Most people underestimate the benefits of an automated build. The obvious benefit is time savings during the normal build cycle. Imagine the build takes an hour, is scheduled monthly, and usually ends up happening twice per month. Two hours per month doesn’t seem like a lot of savings. However, chasing down a bug caused by the build process is at best expensive, and at worst nightmarishly expensive (because we aren’t looking in the right place to find the problem). Add an estimate of the probability of this happening to the expected value calculation for the savings.
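As a back-of-the-envelope sketch of that expected value calculation: the two builds per month and one hour per build come from the example above, while the 5% chance of a build-caused bug and the 16 hours to chase one down are numbers we invented for illustration. The sketch also assumes automation removes both costs entirely, which is optimistic.

```python
def monthly_build_cost(builds_per_month, hours_per_build,
                       p_build_bug, hours_per_build_bug):
    """Expected hours per month spent on manual builds and build-caused bugs."""
    routine = builds_per_month * hours_per_build
    bug_hunting = builds_per_month * p_build_bug * hours_per_build_bug
    return routine + bug_hunting


# Counting only the routine build time.
naive = monthly_build_cost(2, 1.0, 0.0, 0.0)

# Adding the assumed 5% chance per build of a 16-hour bug hunt.
expected = monthly_build_cost(2, 1.0, 0.05, 16.0)

print(f"{naive:.1f} hours vs {expected:.1f} hours")  # 2.0 hours vs 3.6 hours
```

Even with a small probability, the bug-hunting term nearly doubles the estimate, so it is worth plugging in our own team's numbers before dismissing the savings.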
The largest potential benefit of an automated build is changing the way we support our customers. Monthly builds aren’t scheduled because the business only wants updates once per month. They are scheduled at a monthly rate because that’s a balance someone has achieved between the cost-of-delivering and the cost-of-delaying a delivery. When we automate our delivery process, we dramatically reduce the cost of delivery, and can explore more frequent release schedules.
Automate unit testing
We significantly improve the efficiency of our team at delivering by shortening the feedback loop for developers. On a Utopian dev team, we would run our test suite as often as we compiled our code. Realistically, developers should run relevant automated whitebox tests every time they compile. They should run the suite of whitebox tests every time they promote code. And an automated process should run the full suite against the latest tip on a nightly basis (to catch oversights). It would be great if the check-in process initiated an automated test run and only allowed a promotion if all the tests passed.
Automate system and functional testing
End to end and blackbox tests should be automated next. These are the big picture tests, and should be run nightly on a dedicated box against the latest code base. We’ve had the most success with teams that used a nightly testing process, which sent an email with test results to the entire team whenever results changed. We’ve had the pleasure of working with a team that included performance testing on the nightly runs, and reported statistically significant improvement or degradation of performance.
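The "only email when results change" rule is simple to sketch. The test names and statuses below are invented, and the actual emailing is left out; the point is just the comparison between two nightly runs.

```python
def changed_results(previous, current):
    """Return {test name: (old status, new status)} for tests that changed."""
    names = sorted(previous.keys() | current.keys())
    return {
        name: (previous.get(name), current.get(name))
        for name in names
        if previous.get(name) != current.get(name)
    }


# Made-up results from two consecutive nightly runs.
last_night = {"login": "pass", "checkout": "pass", "search": "fail"}
tonight = {"login": "pass", "checkout": "fail", "search": "pass", "export": "pass"}

changes = changed_results(last_night, tonight)
should_notify = bool(changes)  # send the team email only when this is True
print(changes)
```

A test that is new tonight shows up with an old status of None, so additions to the suite also trigger a notification.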
Generate tactical documentation whenever possible. Use javadoc or the equivalent to automatically generate well formatted and organized reference materials for future developers.
Marginally relevant reporting
If our team is asked to report metrics like lines of code, cyclomatic complexity, or code coverage, we should automate this. This work is the definition of tedium, while presenting tenuous value to the manager who requested it. If we can’t convince someone that they don’t want this data, we should at least eliminate the pain of creating it.
Code coverage statistics can provide better-than-nothing insight into how much testing is being done, or how much functionality is exercised by the test suite. But code coverage metrics have the danger of false precision. There’s no way to say that a project with 90% code coverage has higher quality than a project with 80% coverage.
Automation makes sense. We save time, increase quality, and ensure a more robust process. We also spend less time on turn-the-crank activities and more time creating differentiated software.
*Technically, they don’t fly – they fall. With style.
Before we explain pairwise testing, let’s describe the problem it solves
Very large and complex systems can be very difficult and expensive to test. We inherit legacy systems with multiple man-years of development effort already in place. These systems are in the field and of unknown quality. With these systems, there are frequently huge gaps in the requirements documentation. Pairwise testing provides a way to test these large, existing systems. And on many projects, we’re called in because there is a quality problem.
We are faced with the challenge of quickly improving, or at least quickly demonstrating momentum and improvement in the quality of this existing software. We may not have the time to go re-gather the requirements, document them, and validate them through testing before our sponsor pulls the plug (or gets fired). We’re therefore faced with the need to approach the problem with blackbox (or black box) testing techniques.
For a complex system, the amount of testing required can be overwhelming. Imagine a product with 20 controls in the user interface, each of which has 5 possible values. We would have to test 5^20 different combinations (95,367,431,640,625) to cover every possible set of user inputs.
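The arithmetic is easy to check, and it helps to put the number in human terms. The one-test-per-second rate below is our own (generous) assumption, not a measurement.

```python
controls = 20
values_per_control = 5

# Every control can independently take any of its values.
combinations = values_per_control ** controls
print(f"{combinations:,}")  # 95,367,431,640,625

# Assumed rate: one test per second, around the clock.
SECONDS_PER_YEAR = 60 * 60 * 24 * 365
years = combinations // SECONDS_PER_YEAR
print(f"{years:,} years")   # roughly three million years
```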
The power of pairwise
With pairwise testing, we can achieve on the order of 90% coverage of our code in this example with 54 tests! The exact amount of coverage will vary from application to application, but analysis consistently puts the value in the neighborhood of 90%. The following are some results from pairwise.org.
We measured the coverage of combinatorial design test sets for 10 Unix commands: basename, cb, comm, crypt, sleep, sort, touch, tty, uniq, and wc. […] The pairwise tests gave over 90 percent block coverage.
Our initial trial of this was on a subset Nortel’s internal e-mail system where we able cover 97% of branches with less than 100 valid and invalid testcases, as opposed to 27 trillion exhaustive testcases.
[…] a set of 29 pair-wise AETG tests gave 90% block coverage for the UNIX sort command. We also compared pair-wise testing with random input testing and found that pair-wise testing gave better coverage.
Got our attention!
How does pairwise testing work?
Pairwise testing builds upon an understanding of the way bugs manifest in software. Usually, a bug is caused not by the value of a single variable, but by a particular combination of two variables. For example, imagine a control that calculates and displays shipping charges in an eCommerce website. The website also calculates taxes for shipped products (when there is a store in the same state as the recipient, sales taxes are charged, otherwise, they are not). Both controls were implemented and tested and work great. However, when shipping to a customer in a state that charges taxes, the shipping calculation is incorrect. It is the interplay of the two variables that causes the bug to manifest.
If we test every unique combination of every pair of variables in the application, we will uncover all of these bugs. Studies have shown that the overwhelming majority of bugs are caused by the interplay of two variables. We can increase the number of combinations to look at every three, four, or more variables as well – this is called N-wise testing. Pairwise testing is N-wise testing where N=2.
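To see why a few dozen tests can be enough, it helps to count what pairwise testing actually has to cover. This is our own back-of-the-envelope sketch for the 20-control, 5-value example, not output from any tool.

```python
from math import comb

controls = 20
values = 5

# Number of distinct pairs of controls.
control_pairs = comb(controls, 2)                # 190

# Each pair of controls has values * values possible value pairs.
value_pairs = control_pairs * values * values    # 4,750

# A single test sets all 20 controls, so it covers one value pair
# for every pair of controls at once.
pairs_per_test = control_pairs                   # 190

# Any two controls alone need 25 value pairs, and each test supplies
# only one of them, so no pairwise suite can have fewer than 25 tests.
lower_bound = values * values                    # 25

print(control_pairs, value_pairs, pairs_per_test, lower_bound)  # 190 4750 190 25
```

There are only 4,750 value pairs to cover and every test knocks out 190 of them, which is why a well-chosen suite of a few dozen tests can cover them all.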
How do we determine the set of tests to run?
There are several commercial and free software packages that will calculate the required pairwise test suite for a given set of variables, and some that will calculate N-wise tests as well. Our favorite is a public domain (free) software package called jenny, written by Bob Jenkins. jenny will calculate N-wise test suites, and its default mode is to calculate pairwise tests. jenny is a command line tool, written in C, and is very easy to use. To calculate the pairwise tests for our example (20 controls, each with 5 possible inputs), we simply type the following:
jenny 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 > output.txt
And jenny generates results that look like the following:
1a 2d 3c 4d 5c 6b 7c 8c 9a 10c 11b 12e 13b 14d 15a 16c 17a 18d 19a 20e
1b 2e 3a 4a 5d 6c 7b 8e 9d 10a 11e 12d 13c 14c 15c 16e 17c 18a 19d 20d
1c 2b 3e 4b 5e 6a 7a 8d 9e 10d 11d 12a 13e 14e 15b 16b 17e 18e 19b 20c
1d 2a 3d 4c 5a 6d 7d 8b 9b 10e 11c 12b 13d 14b 15d 16d 17d 18b 19e 20a
1e 2c 3b 4e 5b 6e 7e 8a 9c 10b 11a 12c 13a 14a 15e 16a 17b 18c 19c 20b
1a 2a 3c 4e 5e 6a 7b 8c 9d 10b 11b 12b 13e 14a 15d 16d 17c 18c 19b 20d […]
Where the numbers represent each of the 20 controls, and the letters represent each of the five possible selections.
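To drive an automated test from this output, each line needs to be turned into concrete selections. Here is a small parsing sketch that assumes the output looks exactly like the sample above (a control number followed by a lowercase letter); the parse_jenny name is ours and is not part of jenny.

```python
import re

# First two lines of the sample output shown above.
sample_output = """\
1a 2d 3c 4d 5c 6b 7c 8c 9a 10c 11b 12e 13b 14d 15a 16c 17a 18d 19a 20e
1b 2e 3a 4a 5d 6c 7b 8e 9d 10a 11e 12d 13c 14c 15c 16e 17c 18a 19d 20d
"""

# A token is a control number followed by a selection letter, e.g. "10c".
TOKEN = re.compile(r"(\d+)([a-z])")


def parse_jenny(text):
    """Return one dict per test: {control number: zero-based selection index}."""
    cases = []
    for line in text.splitlines():
        tokens = TOKEN.findall(line)
        if tokens:
            cases.append({int(control): ord(letter) - ord("a")
                          for control, letter in tokens})
    return cases


cases = parse_jenny(sample_output)
print(len(cases), cases[0][1], cases[0][20])  # 2 0 4
```

Each dict can then be handed to whatever script sets the controls, with the index picking the value to select.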
What’s the catch?
There are two obvious catches. First, when we use a tool like jenny, we must run all of the tests that it identifies; we can’t pick and choose. Second, pairwise testing doesn’t find everything. What if our earlier example bug about taxes and shipping only manifested when the user is a first-time customer? Pairwise testing would not be guaranteed to catch it. We would need to use N-wise testing with N >= 3. Our experience has been that N=3 is effective for almost all bugs.
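A tiny example shows how a three-variable bug can slip past a perfectly valid pairwise suite. The three yes/no variables and the four-test suite below are our own illustration (think "taxed state", "shipped", and "first-time customer"); the uncovered helper is hypothetical and not part of jenny.

```python
from itertools import combinations, product


def uncovered(suite, values, n):
    """Return the n-way value combinations that no test in the suite exercises."""
    missing = []
    for cols in combinations(range(len(values)), n):
        for combo in product(*(values[c] for c in cols)):
            hit = any(all(test[c] == v for c, v in zip(cols, combo))
                      for test in suite)
            if not hit:
                missing.append((cols, combo))
    return missing


# Three variables, each 0 (no) or 1 (yes).
values = [(0, 1), (0, 1), (0, 1)]

# Four tests that together cover every pair of values.
suite = [(0, 0, 0), (0, 1, 1), (1, 0, 1), (1, 1, 0)]

missing_pairs = uncovered(suite, values, 2)
missing_triples = uncovered(suite, values, 3)
print(len(missing_pairs), len(missing_triples))  # 0 4
```

Every pair is covered, yet the case where all three variables are "yes" is never run, so a bug that needs all three conditions would go unnoticed by this suite.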
There is also a sneaky catch – test generators like jenny assume that the order of variables is irrelevant. Sometimes we are testing dynamic user interfaces, where the order of value selection in controls is relevant. There is a solution to this, and we will update this post with a link to that solution when it is available.
– – –
Check out the index of software testing series posts for more testing articles.
Should I use black box testing or white box testing for my software?
You will hear three answers to this question – black, white, and gray. We recently published a foundation series post on black box and white box testing – which serves as a good background document. We also mention greybox (or gray box) testing as a layered approach to combining both disciplines.
Given those definitions, let’s look at the pros and cons of each style of testing.
Black box software testing
White box software testing
Which testing approach should we use?
There is also the concept of gray box testing, or layered testing – using both black box and white box techniques to balance the pros and cons for a project. We have seen this approach work very effectively for larger teams. Developers utilize white box tests to prevent submission of bugs to a testing team that uses black box tests to validate that requirements have been met (and to perform system level testing). This approach also allows for a mixture of manual and automated testing. Any continuous integration strategy should utilize both forms of testing.
Weekend reading (links with more links warning):
White box vs. black box testing by Grig Gheorghiu. Includes links to a debate and examples.
Black box testing by Steve Rowe.
A case study of effective black box testing from the Agile Testing blog
Benefits of automated testing from the Quality Assurance and Automated Testing blog
What book should I read to learn more?
Here’s a review from Randy Rice “Software Testing Consultant & Trainer” (Oklahoma City, OK)
Software Testing is a book oriented toward people just entering or considering the testing field, although there are nuggets of information that even seasoned professionals will find helpful. Perhaps the greatest value of this book would be a resource for test team leaders to give to their new testers or test interns. To date, I haven’t seen a book that gives a better introduction to software testing with this amount of coverage. Ron Patton has written this book at a very understandable level and gives practical examples of every test type he discusses in the book. Plus, Patton uses examples that are accessible to most people, such as basic Windows utilities.
I like the simplicity and practicality of this book. There are no complex formulas or processes to confuse the reader that may be getting into testing for the first time. However, the important of process is discussed. I also have to say a big THANK YOU to Ron Patton for drawing the distinction between QA and testing! Finally, the breadth of coverage in Software Testing is super. Patton covers not only the most important topics, such as basic functional testing, but also attribute testing, such as usability and compatibility. He also covers web-based testing and test automation – and as in all topics covered in the book, Patton knew when to stop. If you want to drill deeper on any of the topics in this book, there are other fine books that can take you there!
I love this book because it is practical, gives a good introduction to software testing, and has some things that even experienced testers will find of interest. This book is also a tool to communicate what testing and QA are all about. This is something that test organizations need as they make the message to management, developers and users. No test library should be without a copy of Software Testing by Ron Patton!