Tag Archives: graybox testing

John Henry, Manual Tester

John Henry

There’s a piece of North American folklore about John Henry, who was a manual laborer during the expansion of the railroads in our country. His job was being replaced by steam-driven heavy equipment, as the railroad industry applied technology to become more efficient. The same dynamics are happening today with manual testers. We need to make sure that manual testers avoid John Henry’s fate – read on to see why.

Manual and Automated Testing

Software can not be tested efficiently with only manual testing. And it can not be tested completely with only automated testing. James Bach writes an insightful article about how he worked his way to enlightenment over several years, and how he has incorporated this perspective into his training classes.

My understanding of cognitive skills of testing and my understanding of test automation are linked, so it was some years before I came to understand what I now propose as the first rule of test automation:

Test Automation Rule #1: A good manual test cannot be automated.

James goes on to explain that if a test can be automated, it is not a good manual test.

We’ve discussed repeatedly the benefits of automated testing, one example is The Code Freeze is Killing the Dinosaurs. We’ve also stressed the importance of doing functional testing where we discuss the dynamics and tradeoffs of using manual and automated approaches to test software.

Generally, we’ve approached the problem from the perspective of blackbox versus whitebox testing (and more details), and thought about our development process from the perspective of continuous integration as a means to deliver more efficiently.

We’ve not thought about it from James’ perspective before. Even with our improved approaches to automated testing, we are still no different than the inventor of the steam-hammer in Henry’s fable.


James puts things in perspective:

Rather than banishing human qualities, another approach to process improvement is to harness them. I train testers to take control of their mental models and devise powerful questions to probe the technology in front of them. This is a process of self-programming. In this way of working, test automation is seen as an extension of the human mind, not a substitute.

He highlights a very interesting point – we are missing a key benefit of having manual testers – the ability to gather feedback on intangibles, heuristics, usability, affordances, and other elements that we might classify as design bugs.

We talk about the importance of usable software, and different approaches to creating great software. But we’ve never really talked about how to test for software greatness.

A Broken Model

In the 1970’s, when the American automotive manufacturers were getting their clocks cleaned by the newly exporting Japanese companies like Honda, Datsun (now Nissan), and Toyota, a lot of people thought they understood the problem. Many more people pretended there wasn’t a problem, until the US government bailed out Chrysler.

The experts decided that the main problem was that quality was not “Job One”, that American manufacturers ignored the advice of W. Edwards Demming, and the Japanese did not. Toyota’s lean manufacturing philosophy is credited with much of the success.

It is true that the oil crisis of the time gave the Japanese companies an opportunity to penetrate the US market with smaller, more efficient cars. But that crisis ended (about the time that the American companies killed the muscle car and began building more efficient cars). But the Japanese cars didn’t go away. Once gas prices dropped, they lost a differentiating element – but they maintained two others. Cost and Quality.

Target of Opportunity Enables Strategic Advantage

Cost matters. But it matters tactically, as in “I can afford $X right now – what are my choices?” Quality creates Loyalty. And loyalty is strategic. The Japanese manufacturers gain and continue to gain success after success and award after award in the automotive industry. Why? Because they have good quality.

Automated and Manual Testing

As an engineer, I know that we can specify tolerances, inspect components, and test assemblies to make sure that products are “within specification.” And all of this testing can be automated – just like automated software testing. But passing these tests doesn’t make a product good, it merely indicates that the product is “within specification.” Did the Japanese manufacturers have tighter tolerances? Yes. But did they have better designs? Yes. And those better designs were about more than miles-per-gallon and horsepower and torque.

They were about qualitative items that many engineers struggle to absorb. “Feels Good” and “The controls are where I expect them to be” and “Makes sense” and “I like it.” Software developers often struggle with the same issues.

And this is where manual testing matters. Getting qualitative feedback on your product designs is the only way to improve qualitative elements of those designs. The American manufacturers showed their disdain and hubris in allowing their customers to provide that feedback. The Japanese companies got feedback before they sold their products.

We can use manual testing to provide us with the kinds of feedback that can’t be automated. Things like “This UI is confusing” and “Why won’t it let me…?”


Don’t give up on manual testing as we strive to achieve software product success. Automate everything we can, and use people to test what we can’t.  We need to make sure that we don’t lose the ability (or make sure that we find the ability) to test manually, for qualitative insight into our products.  We can’t let the testers burst their hearts like poor John Henry, trying to manually perform an automatable test.
Thanks again, James, for starting us down this path!

Foundation Series: Functional Testing of Software

Functional testing class

Functional Testing, also referred to as System Testing of software is the practice of testing the completed software to confirm that it meets the requirements defined for the software. A functional test is typically a test of user interactions, but can also involve communication with external systems. We contrast functional testing with unit testing. We also show how functional testing provides different benefits than unit testing.

This is a relatively long post for a Foundation Series post, so sit back with some coffee and relax. This primer will be worth it if its a new topic. If you know this stuff already, check out the links to other articles that go into more depth on points we make.

An Application is a Series of Flows

We can think of an application from the perspective of a user, as a series of interactions, or flows through the user interface.

Application flow

People are not usually forced to follow a fixed set of instructions, or a predefined sequence of actions in an application. They can interact with controls in a random order, skip controls entirely, or otherwise do stuff that developers don’t expect.

Unit Tests are Whitebox Tests

Unit testing, as we detailed in our telephone example, provides targeted test coverage of specific areas of the code inside the application. Unit tests are written by developers, to allow them to test that the implementation that they created is behaving as they intended. Unit tests don’t implicitly provide the start-to-finish coverage that functional tests usually provide. Unit tests are whitebox tests that assure that a specific behavior intended by the developer is happening. A weakness of using unit tests alone is that they will not identify when the developer misinterpreted the requirements.

unit testing

Functional Tests are Blackbox Tests

A functional test, however, is designed without insight into how the implementation works. It is a blackbox test. A functional test represents a set of user interactions with the application. The concept behind a functional test is to validate something about the state of the application after a series of events. According to Aberro Software, 80% of all functional tests are performed manually. That means that the most common functional test involves a tester making selections in a series of controls, and then evaluating a condition. This evaluation is called an assertion. The tester asserts that the software is in a specific state (an output is created, a control is filtered in a specific way, a control is enabled, a navigation option is disabled, etc).

full functional testing

Good functional requirements are written as concisely as possible. A requirement that supports a particular use case might state that the user specifies A, B, and C, and the application responds with D. A functional test designed to validate that requirement will almost always mimic this most common flow of events. The script that the tester follows will be to specify A, then B, then C. The tester will then evaluate the assertion that D is true. If D is false, then the test has failed.

A functional test may not cover the entire set of likely user interactions, but rather a subset of them.

Targeted functional testing

One problem with this approach is that it does not account for a user specifying (A, B, X, C) or (A, C, B). These variations in order of operations might cause the underlying code to execute differently, and might uncover a bug. For a tester to get complete coverage of the requirement (A + B + C => D), he would have to create multiple scripts. This is expensive, tedious, and often redundant. But a tester has no way to know if the multiple scripts are redundant, or required.

Combining Unit Tests and Functional Tests

When we combine both unit testing and functional testing approaches, we are implementing what is called graybox testing (greybox testing). This is also referred to as layered testing. Graybox testing provides two types of feedback into the software development process. The unit tests provide feedback to the developer that her implementation is working as designed. The functional tests provide feedback to the tester that the application is working as required.

Layered testing

Graybox testing is the ideal approach for any software project, and is a key component of any continuous integration strategy. Continuous integration is a process where the software is compiled and tested every day throughout the release cycle – instead of waiting until the end of the cycle to test. Read this plan for implementing continuous integration if you want more details.

Automating Functional Tests

Automating unit testing is both straightforward, and relatively inexpensive. Automating functional testing is more expensive to set up, and much more expensive to maintain. Each functional test represents a script of specific actions. A tester (with programming skills) can utilize software packages like WinRunner to create scripts of actions followed by assertions. This represents an upfront cost of programming a script to match the application, in parallel with the development of the application – and it requires a tester with specialized skills to program the script.

The maintenance cost of automating functional tests is magnified in the early development stages of any application, and throughout the life of any application developed with an agile process. Whenever an element of the user interface is changed, every script that interacts with that element can be broken (depending on the nature of the change). These broken scripts have to be manually updated to reflect these ongoing changes. In periods of heavy interface churn, the cost of maintaining the test suite can quickly become overwhelming.

In the real world, apparently 80% of teams find that this overwhelming cost of automated testing outweighs even the high cost of manual functional testing.

Improved Automation of Functional Tests

We can reduce the maintenance cost of keeping automated scripts current with the user interface by abstracting the script-coding from the script-definition. This is referred to as keyword and table scripting. A set of objects are coded by the tester and given keywords. Each object represents an element in the user interface. Script behavior (sequence of interaction) is defined in terms of these keywords. Now, when a UI element is changed, the keyword-object is updated and all of the scripts that reference it are repaired.

This, however, does not address issues where one control is refactored into two controls, the adding or removing of controls, or changes in the desired flow of interaction. There is still a very large (albeit smaller) maintenance burden. And the applications that use this approach (such as QTP) can cost in the tens of thousands of dollars. Another reason to do functional testing manually.


Functional testing is important to validating requirements. It is an important element of assuring a level of software quality. And it is still expensive with the best of today’s proven solutions. Even with the high cost, it is much cheaper than the risk of delivering a solution with poor quality. Plan on having functional testing as a component of any process to achieve software product success.

– – –

Check out the index of the Foundation series posts which will be updated whenever new posts are added.

Software testing series: A case study


This post is a test automation case study, but at a higher level.

We’ll talk about it in terms of defining the problem, and then discuss the objective (what we proposed to do to solve the problem), the strategy (how we went about doing it) and the tactics (how we executed the strategy). Since this happened in the real world, we’ll also identify the constraints within which we had to operate.

Our hope is that it will spur questions that allow us to dive deeper in conversation on topics mentioned, implied and inspired.

Why we needed something (aka The Problem)
I was working with a client manager in the past, and he had a “quality problem.” This manager was getting pressure from his VP, who was getting negative feedback from users about the quality of one of the manager’s software products. Bugs in this software regularly led to 10s to 100s of hours (per bug) in cost when they reached the field. These bugs would also introduce a risk of lost sales or profits. This manager was responsible for development and testing, but not requirements.

This existing enterprise application was written about ten years ago, had significant changes in every monthly release, and had a small development team averaging about five people, with regular rotations onto and off the project. There were over a quarter of a million lines of code in this application. The application had a very large user interface, and complicated integration with other systems. The team had an existing process of manual testing, both by developers and dedicated testers, and a large suite of automated blackbox system tests. The developers did not have practical experience in creating unit tests or applying unit test automation.

An analysis of the bugs revealed a majority of them as being introduced in the development cycle, with requirements bugs in second place.

The Objective

  1. Immediately improve the perception of quality of the software by outside organizations.
  2. Improve quality measurably for the long term.
  3. Reduce the cost of quality for the software from existing levels.

The Constraints

  1. No personnel changes – any changes must be supported by the current team (no permanent additions or replacements).
  2. No changes in existing committments to stakeholders – commitments are in place for 6 months (at the full capacity of the team).
  3. Small budget for the project – a one-time cost of less than 5% of the operating budget (for the current team), with long term costs offset by other gains in productivity.

The Strategy

  1. Improve existing automated regression testing to improve quality for the long term.
  2. Change the development process to include regression testing as part of code-promotion (versus the current practice of regression testing release candidates).

The Tactics

  1. Use unit testing (specifically whitebox testing) to augment existing test framework – overall, a gray box testing process. To minimize the maintenance effort over time, the testing framework was developed to use data-driven scripts that represent user sessions with the software. This allowed the team to easily create (and delegate creation of) scripts that represented user sessions. These scripts were combined with a set of inspections that tested the application for particular parameters, outputs, and behaviors. The intersections of scripts and inspections result in unit tests.
  2. Immediately start writing test for all new code. We flipped a switch and required developers “from this day forward” to replace their current manual feature testing for ongoing development with creation of automated unit tests for ongoing development. Kent Beck first suggested this technique to me about five years ago as a way to “add testing” to an existing application. His theory is that the areas of the code that are being modified are the areas of the code most likely to be broken – existing code is less likely to spontaneously break, and is not the top priority for testing. Over time, if all of the code gets modified, then all of the code gets tested.
  3. Jump start a small initial test suite. We timeboxed a small inital effort to identify the “high risk” areas of the application usage by defining the most common usage patterns. These patterns were then embodied in a set of scripts that were used in the testing framework. We also set aside time for creating a set of initial inspections designed to provide valuable insight into the guts of the application. The developers identified those things that they “commonly looked at” when making changes to the application. These inspections instrumented elements of the application (like a temperature gauge in your car – it tells you if the coolant is too hot, even if it doesn’t tell you why).

Unfortunately, we can’t share the results beyond the client’s internal team. Anecdotally, a very similar approach for a different client, team, and application netted a 10% reduction in development effort and had a dramatically positive affect on both quality and perceived quality. At Tyner Blain, we strongly encourage using this approach.

Top seven tips for rolling out this plan effectively

  • Set expectations. A key constraint for this approach was “don’t spend a bunch of money.” The process is designed to improve quality over time, with little (and dwindling) incremental cost. Every month, as more tests are added along with new code, the opportunity for bugs to be released to the test team (much less to the field) goes down. The rate of quality improvement will be proportional to the rate of change in the code base. Also point out that only those bugs introduced in the development cycle will be caught – requirements bugs will not be caught.
  • Educate the development team. When asking developers to change the way they’ve been writing and releasing code for a decade, it can be tricky to get acceptance. Responses can be as bad as “We don’t have a problem with quality” or “I know how to do my job – you’re telling me that I don’t?” if this isn’t done. Start with education about the techniques and highlight the tangible benefits to developers. There will be reduced complaints about the quality of the code – most developers are proud of their work, and will gladly adopt any technique that helps them improve it – as long as they don’t feel defensive about it.
  • Educate the managers. Help managers understand that unit testing isn’t a silver bullet – it can’t catch every bug, but done correctly, unit testing will catch the most bugs per dollar invested.
  • Educate the test team. No, we’re not automating you out of a job. A gray box testing strategy is comprehensive. Automating regression testing effectively allows manual testers to focus on system level testing and overall quality assurance. The time saved can be applied to testing that should be, but isn’t being done today.
  • Establish ownership. The developers are being asked to take ownership explicitly for something they already own implicitly. Before incorporating regression testing as part of the development cycle, the contract with the test team was “The new stuff works. Let me know if I broke any of the old stuff.” With this process in place, the contract between the development team and the test team becomes “The new stuff works, some of the old stuff still works, and the new stuff will continue to work forever.”
  • Provide feedback. Track the metrics, such as bugs versus lines of code (existing and modified) or bugs versus user sessions. Absolute numbers (bugs in the field, bugs found in test, numbers of inspections and scripts and unit tests) are also good. Communicate these and other metrics to everyone on the team – managers, developers, testers. Provide the feedback regularly (at least with every release). This will help the project gain momentum and visibility. That will validate the ideas, and help propogate the approach to other software development cycles.
  • Leverage the feedback cycle to empower the team members to make it even better.

[Update: The series of posts, Organizing a test suite with tags recounts a real-world followup to the solution implemented as described in this post. We explore a design concept associated with refactoring the solution from above. The first of those posts is here, or you can follow the links below]
– – –

Check out the index of software testing series posts for more articles.

Software Testing Series: Black Box vs White Box Testing


Should I use black box testing or white box testing for my software?

You will hear three answers to this question – black, white, and gray. We recently published a foundation series post on black box and white box testing – which serves as a good background document. We also mention greybox (or gray box) testing as a layered approach to combining both disciplines.

Given those definitions, let’s look at the pros and cons of each style of testing.

Black box software testing

Black box


  • The focus is on the goals of the software with a requirements-validation approach to testing. Thanks Roger for pointing that out on the previous post. These tests are most commonly used for functional testing.
  • Easier to staff a team. We don’t need software developers or other experts to perform these tests (note: expertise is required to identify which tests to run, etc). Manual testers are also easier to find at lower rates than developers – presenting an opportunity to save money, or test more, or both.


  • Higher maintenance cost with automated testing. Application changes tend to break black-box tests, because of their reliance on the constancy of the interface.
  • Redundancy of tests. Without insight into the implementation, the same code paths can get tested repeatedly, while others are not tested at all.

White box software testing

White box


  • More efficient automated testing. Unit tests can be defined that isolate particular areas of the code, and they can be tested independently. This enables faster test suite processing
  • More efficient debugging of problems. When a regression error is introduced during development, the source of the error can be more efficiently found – the tests that identify an error are closely related (or directly tied) to the troublesome code. This reduces the effort required to find the bug.
  • A key component of TDD. Test driven development (an Agile practice) depends upon the creation of tests during the development process – implicitly dependent upon knowledge of the implementation. Unit tests are also a critical element for continuous integration.


  • Harder to use to validate requirements. White box tests incorporate (and often focus on) how something is implemented, not why it is implemented. Since product requirements express “full system” outputs, black box tests are better suited to validating requirements. Carefull white box tests can be designed to test requirements.
  • Hard to catch misinterpretation of requirements. Developers read the requirements. They also design the tests. If they implement the wrong idea in the code because the requirement is ambiguous, the white box test will also check for the wrong thing. Specifically, the developers risk testing that the wrong requirement is properly implemented.
  • Hard to test unpredictable behavior. Users will do the strangest things. If they aren’t anticipated, a white box test won’t catch them. I recently saw this with a client, where a bug only showed up if the user visited all of the pages in an application (effectively caching them) before going back to the first screen to enter values in the controls.
  • Requires more expertise and training. Before someone can run tests that utilize knowledge of the implementation, that person needs to learn about how the software is implemented.

Which testing approach should we use?

There is also the concept of gray box testing, or layered testing – using both black box and white box techniques to balance the pros and cons for a project. We have seen this approach work very effectively for larger teams. Developers utilize white box tests to prevent submission of bugs to a testing team that uses black box tests to validate that requirements have been met (and to perform system level testing). This approach also allows for a mixture of manual and automated testing. Any continuous integration strategy should utilize both forms of testing.
Weekend reading (links with more links warning):

White box vs. black box testing by Grig Gheorghiu. Includes links to a debate and examples.

Black box testing by Steve Rowe.

A case study of effective black box testing from the Agile Testing blog

Benefits of automated testing from the Quality Assurance and Automated Testing blog

What book should I read to learn more?

Software Testing, by Ron Patton (the eBook version, which is cheaper).

Here’s a review from Randy Rice “Software Testing Consultant & Trainer” (Oklahoma City, OK)

Software Testing is a book oriented toward people just entering or considering the testing field, although there are nuggets of information that even seasoned professionals will find helpful. Perhaps the greatest value of this book would be a resource for test team leaders to give to their new testers or test interns. To date, I haven?t seen a book that gives a better introduction to software testing with this amount of coverage. Ron Patton has written this book at a very understandable level and gives practical examples of every test type he discusses in the book. Plus, Patton uses examples that are accessible to most people, such as basic Windows utilities.

I like the simplicity and practicality of this book. There are no complex formulas or processes to confuse the reader that may be getting into testing for the first time. However, the important of process is discussed. I also have to say a big THANK YOU to Ron Patton for drawing the distinction between QA and testing! Finally, the breadth of coverage in Software Testing is super. Patton covers not only the most important topics, such as basic functional testing, but also attribute testing, such as usability and compatibility. He also covers web-based testing and test automation ? and as in all topics covered in the book, Patton knew when to stop. If you want to drill deeper on any of the topics in this book, there are other fine books that can take you there!

I love this book because it is practical, gives a good introduction to software testing, and has some things that even experienced testers will find of interest. This book is also a tool to communicate what testing and QA are all about. This is something that test organizations need as they make the message to management, developers and users. No test library should be without a copy of Software Testing by Ron Patton!

– – –

Check out the index of software testing series posts for more articles.

Foundation Series: Black Box and White Box Software Testing

Blackbox tests and whitebox tests.

These terms get thrown about quite a bit. In a previous post, we referenced Marc Clifton’s advanced unit testing series. If you were already familiar with the domain, his article could immediately build on that background knowledge and extend it.Software testing can be most simply described as “for a given set of inputs into a software application, evaluate a set of outputs.”
Software testing is a cause-and-effect analysis.

Continue reading Foundation Series: Black Box and White Box Software Testing