Advanced PERT Estimation

Why You Cannot Add Two PERT Estimates

Consider two independent tasks with PERT estimates. The first task is estimated at 2/4/6. The second task is independently estimated at 2/4/6. How long will it take you to complete both tasks?

This is where most people make their mistake by just adding all the values. That mistake would create a PERT estimate of (2+2)/(4+4)/(6+6) = 4/8/12. At first glance, this looks like it makes sense. Here’s proof that it does not work this way.

  • There is a less than 1% chance that the first task will take less than two hours with a 2/4/6 PERT estimate.
  • There is a less than 1% chance that the second task will take less than two hours with a 2/4/6 PERT estimate.

The statements above reflect the definition of what a PERT estimate represents. The new PERT estimate (4/8/12) would explicitly state that there is the exact same chance that performing both tasks would take less than four hours. Because the tasks are independent, this is impossible. Here’s where the statisticians help you. The two tasks are independent, which means that the amount of time one task takes has no bearing on the amount of time the second task will take.

This is the same math, by the way, that applies to dice-rolls in craps. Each die is independent. There’s 1 chance in 6 that the roll of a single die will be a 1. If the “just add the values” math were correct, there would also be 1 chance in 6 of rolling two dice and having them both be 1s. The real chance is 1 in 36.
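If you want to check the dice claim for yourself, here is a minimal sketch that simply counts outcomes. The variable names are made up for illustration, and it assumes two fair six-sided dice.

```python
from fractions import Fraction
from itertools import product

# Every equally likely outcome of throwing two independent six-sided dice.
outcomes = list(product(range(1, 7), repeat=2))

# Chance that a single die shows a 1.
p_single_one = Fraction(1, 6)

# Chance that both dice show a 1: matching outcomes divided by all outcomes.
matches = sum(1 for first, second in outcomes if first == 1 and second == 1)
p_both_ones = Fraction(matches, len(outcomes))

print(p_single_one)  # 1/6
print(p_both_ones)   # 1/36
```

Only one of the 36 possible throws is a pair of 1s, which is why the chance for the pair cannot match the chance for a single die.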

You can think of combining PERT estimates like rolling dice. Each estimate you wish to combine is a die. When you want to combine independent tasks into a combined PERT estimate, it is like rolling multiple dice in the same throw. Each project (combination of tasks) is a handful of dice, thrown once.

The same logic applies to the last number in a PERT estimate. If you will complete the first task in under 6 hours over 99% of the time, and complete the second task in under 6 hours over 99% of the time, you will complete both tasks in under 12 hours much more frequently.

You can see from the above that you cannot just add the PERT estimate values for the “best case” and “worst case” scenarios, and still maintain the integrity of what a PERT estimate represents. The two numbers (4 and 12, in this example) would no longer represent the same chance of beating or exceeding the estimate range.
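To put rough numbers on this, here is a small sketch using Python’s standard-library NormalDist. It rests on an assumption: each 2/4/6 estimate is approximated as a normal curve centered on 4, with the 2-to-6 range spanning six standard deviations. The variable names are just for illustration.

```python
from statistics import NormalDist

# Assumption: approximate a 2/4/6 estimate as a normal curve centered on 4
# whose best-to-worst range (2 to 6) spans six standard deviations.
task = NormalDist(mu=4, sigma=(6 - 2) / 6)

# Adding two NormalDist objects models the sum of two independent tasks.
both_tasks = task + task

p_one_under_2 = task.cdf(2)               # one task beats its best case
p_both_under_4 = both_tasks.cdf(4)        # both together finish in under 2 + 2
p_one_over_6 = 1 - task.cdf(6)            # one task exceeds its worst case
p_both_over_12 = 1 - both_tasks.cdf(12)   # both together exceed 6 + 6

print(f"one task under 2 hours:    {p_one_under_2:.5%}")
print(f"both tasks under 4 hours:  {p_both_under_4:.5%}")
print(f"one task over 6 hours:     {p_one_over_6:.5%}")
print(f"both tasks over 12 hours:  {p_both_over_12:.5%}")
```

Under that assumption, finishing both tasks in under 4 hours is far less likely than finishing one task in under 2 hours, so 4 and 12 cannot mean the same thing for the pair that 2 and 6 mean for a single task.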

The “intuitive math” of simple addition does not always work for the middle number of the PERT estimate either. It only works when the estimates are symmetrical.

Based on the above, adding a 2/4/6 PERT estimate to a 2/4/6 PERT estimate results in a new PERT estimate of “more than 4”/8/“less than 12.”

The following diagrams show visually the distributions that result from the bad math described above, and the good math (described later in this article).

[Figure: the distributions that result from the bad math and from the good math]

And in CDF form

[Figure: the same distributions in CDF form]

How You Can Add Two PERT Estimates

To properly add two PERT estimates you have to understand one thing: are they independent? In other words, for the two 2/4/6 tasks, does the amount of time it actually takes you (say 3.5 hours) to complete the first task have anything to do with the amount of time it takes you to complete the second task? The answer to that question determines how you have to do the math to combine them.

An example: You estimate that you can write a 2000 word article  in 2/4/6 hours (as a PERT estimate).  You determined those numbers by thinking about your past experiences:

  • Sometimes, the ideas just flow, and you can write 1000 words per hour.
  • Sometimes, you have to do research to back up your arguments, and that takes time.
  • Generally, you write about 500 words an hour, except when interrupted.

If you’re asked to write one article, you would provide a 2/4/6 hour PERT estimate. What if you were asked to write two articles? Those are independent estimates. What if you are asked to write two articles on the same topic (where you can presumably leverage the same research investment across both articles)? What if you’re asked to write them on the same day (where the likelihood of interruption is “the same” during the writing of both articles)? Depending on the details of a particular estimation activity, the estimates may be independent, or they may be correlated (because they are both similarly influenced by the same circumstances).

Perhaps you’re estimating programming tasks.  You’re estimating the creation of a database for a CMS, and the creation of an AJAX UI for that CMS.  If the bulk of the variation in your individual PERT estimates comes from “shared unknowns” like being interrupted, there is likely a correlation between the estimates.  If the variations in your PERT estimates come from independent factors (learning AJAX and learning MySQL), the estimates may be independent.

In both situations, you need to be able to combine the individual estimates, to produce an overall estimate.  Here’s how to do that math.

Combining Independent PERT Estimates

When adding two independent normal distributions (combining two PERT estimates), you need to do five things:

  1. Determine the mean of each PERT estimate (find the center of each curve)
  2. Determine the standard deviation of each PERT estimate (describe the shape of each curve).
  3. Calculate the combined mean of the combined PERT estimate (find the center of the combined curve).
  4. Calculate the standard deviation of the combined PERT estimate (describe the shape of the combined curve).
  5. Calculate the PERT values for the combined PERT estimate (express the center and shape of the combined curve as a PERT estimate).

Determining the mean of a PERT estimate is not just picking the middle value; it is picking the average value. If we describe a three-value PERT estimate as B/L/W for best case / likely case / worst case, the following equation is used:

The mean, M, equals (B + 4*L + W)/6. For a 2/4/6 estimate, M = (2 + 4*4 + 6)/6 = 4. For a 3/5/10 estimate, M = (3 + 4*5 + 10)/6 = 5.5.

The definition of the PERT estimate is to provide a six-sigma range, so one standard deviation, StdDev = (W-B)/6.  For a 2/4/6 estimate, StdDev = (6-2)/6 = 0.67.  For a 3/5/10 estimate, StdDev = (10-3)/6 = 1.17.
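As a quick check on the arithmetic, the two formulas above can be sketched as a pair of small functions. The names pert_mean and pert_std are made up for this illustration.

```python
def pert_mean(best, likely, worst):
    """Weighted average of a B/L/W estimate: (B + 4*L + W) / 6."""
    return (best + 4 * likely + worst) / 6


def pert_std(best, worst):
    """One standard deviation, treating best-to-worst as a six-sigma range."""
    return (worst - best) / 6


print(pert_mean(2, 4, 6))            # 4.0
print(pert_mean(3, 5, 10))           # 5.5
print(round(pert_std(2, 6), 2))      # 0.67
print(round(pert_std(3, 10), 2))     # 1.17
```

Notice that the mean of the 3/5/10 estimate (5.5) is not its middle value (5), because the estimate is not symmetrical.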

When combining a 2/4/6 PERT estimate with another 2/4/6 PERT estimate, the combined mean = the sum of the two means.  4+4=8.

The standard deviation of combining two 2/4/6 PERT estimates requires more complicated math.  The combined standard deviation equals the square root of the sum of the squares of the individual standard deviations.  StdDev = (StdDev1^2 + StdDev2^2)^0.5.  StdDev = (0.67^2 + 0.67^2)^0.5 = 0.94.

The new PERT values are the mean plus or minus three standard deviations. B = M - 3*StdDev = 8 - 3*0.94 = 5.2. W = M + 3*StdDev = 8 + 3*0.94 = 10.8.

The combined PERT estimate is 5.2/8/10.8.
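The five steps above can be sketched in a few lines of code. This is only an illustration, not a finished tool: the function names are made up, and each estimate is passed in as a (best, likely, worst) tuple.

```python
import math


def pert_mean(best, likely, worst):
    """Steps 1 and 3 use the weighted average (B + 4*L + W) / 6."""
    return (best + 4 * likely + worst) / 6


def pert_std(best, worst):
    """Step 2: treat best-to-worst as a six-sigma range."""
    return (worst - best) / 6


def combine_independent(first, second):
    """Combine two independent (best, likely, worst) estimates into one."""
    # Step 3: the combined mean is the sum of the individual means.
    mean = pert_mean(*first) + pert_mean(*second)
    # Step 4: square root of the sum of the squared standard deviations.
    std = math.hypot(pert_std(first[0], first[2]), pert_std(second[0], second[2]))
    # Step 5: express the curve as mean plus or minus three standard deviations.
    return (mean - 3 * std, mean, mean + 3 * std)


combined = combine_independent((2, 4, 6), (2, 4, 6))
print([round(value, 1) for value in combined])  # [5.2, 8.0, 10.8]
```

The middle value returned here is the combined mean, which is also the likely value in this example because both estimates are symmetrical.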

Notice that the variation in the combined estimate is higher (+/- 2.8) than in the individual estimates (+/- 2), but lower than the “bad math” described above would predict (+/- 4). This is because independent variations partly offset one another (like rolling a 2 with one die and a 5 with the other), so the standard deviations combine as the square root of the sum of the squares rather than as a simple sum.

Combining Correlated PERT Estimates

When PERT estimates are correlated, you still combine them with a similar set of steps:

  1. Determine the mean of each PERT estimate (find the center of each curve)
  2. Determine the standard deviation of each PERT estimate (describe the shape of each curve).
  3. Determine the correlation coefficient (how much the variation in one estimate is related to the variation in the other estimate).
  4. Calculate the combined mean of the combined PERT estimate (find the center of the combined curve).
  5. Calculate the standard deviation of the combined PERT estimate (describe the shape of the combined curve).
  6. Calculate the PERT values for the combined PERT estimate (express the center and shape of the combined curve as a PERT estimate).

Notice that step 3 has been added, but the rest of the list is the same.  Step 5 – calculating the new standard deviation, will be different, as it takes into account the results from step 3.

Estimating a correlation coefficient (step 3) is an entirely subjective effort when it comes to PERT estimation. For this purpose, the correlation coefficient (rho) is a value between 0 and 1, expressing the degree of “shared unknowns” in your estimates. Zero represents completely independent estimates. One represents completely dependent estimates.

As an example, you may estimate two programming tasks in building a new CMS.  You create two PERT estimates,  2/4/6 and 3/5/10, where the variation is mostly based on using a new (to you) programming language.  You speculate that 75% of the variation is due to the same source – in other words, there’s a 0.75 correlation between taking extra time to do the first task and taking extra time to do the second task.

To calculate the new standard deviation (step 5), you need to use a more complicated equation that includes this correlation coefficient. StdDev = (StdDev1^2 + StdDev2^2 + 2*rho*StdDev1*StdDev2)^0.5

Combining two PERT estimates of 2/4/6 and 3/5/10 with rho=0.75 works as follows:

  1. M1 = (2+4*4+6)/6 = 4.  M2 = (3+4*5+10)/6 = 5.5
  2. StdDev1 = (6-2)/6 = 0.67.  StdDev2 = (10-3)/6 = 1.17
  3. Rho = 0.75 (estimated)
  4. M = M1 + M2 = 9.5
  5. StdDev = (0.67^2 + 1.17^2 + 2*0.75*0.67*1.17)^0.5 = 1.72
  6. B = 9.5 - 3*1.72 = 4.33.  W = 9.5 + 3*1.72 = 14.67

The combined PERT estimate = 4.3/9.5/14.7.

Imagine if the two PERT estimates were much more independent – say rho=0.1.  Using the same process, you would calculate a combined PERT estimate of 5.3/9.5/13.7.

If the two PERT estimates were completely independent, rho = 0.  This would yield a combined PERT estimate of 5.5/9.5/13.5.
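The six steps can be sketched as one function that takes rho as an input. The function name and the tuple layout are made up for illustration, and rho is assumed to be your own subjective estimate between 0 and 1.

```python
import math


def combine_correlated(first, second, rho):
    """Combine two (best, likely, worst) estimates with correlation rho."""
    # Steps 1 and 2: mean and standard deviation of each estimate.
    mean1 = (first[0] + 4 * first[1] + first[2]) / 6
    mean2 = (second[0] + 4 * second[1] + second[2]) / 6
    std1 = (first[2] - first[0]) / 6
    std2 = (second[2] - second[0]) / 6
    # Step 4: the combined mean is still the sum of the means.
    mean = mean1 + mean2
    # Step 5: the extra 2*rho*std1*std2 term accounts for shared variation.
    std = math.sqrt(std1**2 + std2**2 + 2 * rho * std1 * std2)
    # Step 6: mean plus or minus three standard deviations.
    return (mean - 3 * std, mean, mean + 3 * std)


for rho in (0.75, 0.1, 0.0):
    best, mean, worst = combine_correlated((2, 4, 6), (3, 5, 10), rho)
    print(rho, round(best, 1), round(mean, 1), round(worst, 1))
# 0.75 4.3 9.5 14.7
# 0.1 5.3 9.5 13.7
# 0.0 5.5 9.5 13.5
```

With rho set to 0, the extra term disappears and the function gives the same answer as the independent math.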

Combining Multiple PERT Estimates

What if you need to combine multiple PERT estimates (say 200 of them)?  You can use a combination of the above approaches.  One way to approach this is to identify pairs of related PERT estimates, and combine them using the correlated form of the equation.  Keep doing this until all of the remaining activities are independent.  The other approach is to assume that all of the estimates are independent.

You can combine more than two independent PERT estimates using the same math as when combining two independent PERT estimates.

  • The combined mean is the sum of the means of all estimates.  M = M1 + M2 + M3 + …
  • The combined standard deviation is the square root of the sum of the squares of all standard deviations.  StdDev = (StdDev1^2 + StdDev2^2 + StdDev3^2 + …)^0.5.

This becomes very powerful when estimating the overall effort for a large collection of tasks.  When you have a large task – say “deploy a CMS”, you decompose it into many smaller tasks.  Then you create PERT estimates for those smaller tasks, then use this approach to combine the estimates into a single PERT estimate.  The math will show a much smaller PERT range than you would otherwise expect for the large task.
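Here is a rough sketch of that roll-up for any number of independent estimates. The function name is made up, and the nine identical 2/4/6 tasks are an invented example chosen only to keep the arithmetic easy to follow.

```python
import math


def combine_many_independent(estimates):
    """Combine any number of independent (best, likely, worst) estimates."""
    # The combined mean is the sum of the individual means.
    mean = sum((best + 4 * likely + worst) / 6 for best, likely, worst in estimates)
    # The combined variance is the sum of the squared standard deviations.
    variance = sum(((worst - best) / 6) ** 2 for best, _, worst in estimates)
    std = math.sqrt(variance)
    return (mean - 3 * std, mean, mean + 3 * std)


# Hypothetical example: nine independent tasks, each estimated at 2/4/6.
tasks = [(2, 4, 6)] * 9

combined = combine_many_independent(tasks)
naive = tuple(sum(task[i] for task in tasks) for i in range(3))

print([round(value, 1) for value in combined])  # [30.0, 36.0, 42.0]
print(naive)                                    # (18, 36, 54)
```

In this example, simply adding the values would give a range of +/- 18, while the combined estimate has a range of +/- 6. That is the “much smaller PERT range” described above.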

I initially developed / applied the correlated distributions models as part of some product design & process control work I did for Texas Instruments in 1994-95.  The findings were published as Six Sigma Design and Statistical Process Analysis, 1995 (pdf, 2MB) in the TI Technical Journal Vol 12, #6. I refined and validated the application of this math to project management over the course of a couple years leading teams and implementing complex software solutions in the early 2000s.


I last used this approach on a project where I led half a dozen people estimating and completing hundreds of tasks in a series of three-week sprints.  We were able to use our estimates to very effectively time-box our deliverables, manage customer expectations, and deliver “full content” on time for each sprint.  My management team was very skeptical of the initial estimates (the estimates were more precise than they were used to seeing), but completely convinced as delivery after delivery met commitments and expectations.  The people on the team had varying degrees of experience (both at development and at estimation) and we had a large variation in the quality and size of individual estimates.  Because of the central limit theorem, most of that variation disappeared as “overly ambitious” and “overly pessimistic” estimates cancelled each other out.

For me, this was the crucible through which I became sold that this math would work effectively for team management.

24 thoughts on “Advanced PERT Estimation”

  9. Thanks Scott.

    If you’re doing iterative development you might find interesting the calculators (“Release Calculator” and “Wiggle Room Calculator”) that you can find in the sidebar of my website. Tweet me (@asplake) if you find them interesting/useful or have suggestions for others.


    1. Hey, thanks Mike. Pretty neat calculators.

      Somewhere (or some-when) I wrote up an analysis of agile estimation, based on Mike Cohn’s work. Maybe an old blog article that I can’t find, but I think it was something for a client. The key element was that using velocity to manage / predict deliverability based on “rough estimates”, distinctly from detailed (PERT) estimates for tasks, made sense, because you still get the feedback loop that removes uncertainty from the estimation process. Can’t find the write-up.

      Long story short, I think the calculator can come in handy for that ‘planning estimation’ phase.

  12. Hi Scott,

    Your articles about PERT are the best ones I have ever read! Very well explained. Your advanced article is great explaining why PERT estimates cannot be added. No other article I have read has mentioned this.

    I have looked all over internet to find a spreadsheet that calculates PERT correctly and doesn’t just add the estimates. Despite my efforts I have not been able to find a good one. Do you have a good spreadsheet or can recommend a good PM tool for this?


    Kind Regards

    1. Thanks, Cecilia!

      I don’t have a template-spreadsheet for doing this, I’ve just built them on the fly as projects required them. I’ll put one together for a future article. It will be tricky to come up with a general way to identify related tasks (versus unrelated tasks) – but I have an idea about how to do it. If you subscribe to the blog, you’ll get an email when it shows up.

      Thanks again,

  13. Hi Scott,

    How do I combine 3 or more correlated PERT estimates, e.g. 2/4/6 and 3/5/10 and 4/6/12 with rho=0.75?



    1. Hey Nic,

      Thanks for the question. What I’ve learned about correlation between distributions (e.g. the amount of shared-variation) has always been in terms of looking at two distributions, never three. So, you raised an interesting question. I sort of alluded to the answer (when I talked about repeatedly combining pairs of estimates), but definitely didn’t explain it.

      I’m assuming that your rho=0.75 is the same between A (2/4/6) and B (3/5/10), as it is between A and C (4/6/12), as it is between B and C.

      I couldn’t find any “real math” (e.g. from other people, or otherwise formally proven) to back me up on it, but my suspicion is that you just combine any pair (AB, BC, or AC) to create a new distribution, then combine that resultant distribution with the remaining initial set (C, A, or B, respectively).

      The combinations of distributions are combinations that input normal distribution approximations of the beta distributions (represented by the individual PERT estimates) and output normal distributions. So, the math should work for repeated combinations.

      I did a quick test, trying the above approach with all three sequences, and got the same result, regardless of the order in which I combined distributions.

      My resultant estimate for your example is 7.6/16.2/24.8 (std of 2.9).

      My approach was to determine the combined distribution for A+B -> AB (4.3/9.5/14.7) and for B+C -> BC (5.1/12.2/19.2) and C+A -> CA (5.0/10.7/16.3), then combine them again with C, A, and B, respectively. The second combination for ABC yielded 7.6/16.2/24.8 with all three approaches.

      That makes me feel confident that the order is independent.

      My “sniff test” makes sense too. Combining the distributions causes you to incorporate the first relationship (between A and B, in AB), but does not address the other correlations (between C and either A or B). That correlation is accounted for in the second combination of distributions.

      Does that help?
