Advanced PERT Estimation

Creating a PERT estimate for a single task is easy and straightforward. Creating an estimate for a set of tasks is still easy, but requires a little math. Combining PERT estimates for tasks is also easy, but less obvious. Roll up your sleeves and dive in.

PERT Estimate Refresher

When estimating how long it will take you to complete a task, you shouldn’t estimate with a single value like “four hours.” A single value does not provide enough information. Every estimate carries uncertainty, and a single value gives no insight into how uncertain your estimate is. Your estimate could be “four hours plus or minus one hour” or it could be “at least four hours, maybe as much as sixteen hours.”

A PERT estimate, as we showed in our earlier PERT estimation article, represents a distribution of likely effort for a particular task. To create a PERT estimate, you determine three values:

  1. The “best case” (shortest) amount of time it will take to complete the task.
  2. The most likely amount of time it will take to complete the task.
  3. The “worst case” (longest) amount of time it will take to complete the task.

A PERT estimate is presented in the form “Best Case / Most Likely Case / Worst Case.” If your estimate is “four plus or minus two hours,” you would write 2/4/6 as a PERT estimate. Sharing “2/4/6” as an estimate still doesn’t tell anyone else what you mean, because a PERT estimate embodies specific probabilities about completion time. For a 2/4/6 PERT estimate, you are precisely saying:

  1. There is a less than 1% chance that this task will take less than 2 hours.
  2. There is a 50% chance that this task will take less than 4 hours.
  3. There is a greater than 99% chance that this task will take less than 6 hours.
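Under the normal approximation discussed later in this article, where the best and worst cases are treated as -3 and +3 standard deviations, you can check these three statements with a few lines of Python. This is a minimal sketch. It assumes the common PERT-weighted mean (best + 4 × most likely + worst) / 6. The helper names pert_to_normal and normal_cdf are hypothetical.

```python
from math import erf, sqrt

def pert_to_normal(best, likely, worst):
    """Approximate a best/likely/worst PERT estimate as a normal distribution.

    Assumptions: mean = (best + 4 * likely + worst) / 6, and best/worst sit at
    -3 and +3 standard deviations, so sigma = (worst - best) / 6.
    """
    mean = (best + 4 * likely + worst) / 6
    sigma = (worst - best) / 6
    return mean, sigma

def normal_cdf(x, mean, sigma):
    """P(T < x) for a normal distribution with the given mean and standard deviation."""
    return 0.5 * (1 + erf((x - mean) / (sigma * sqrt(2))))

mean, sigma = pert_to_normal(2, 4, 6)
for hours in (2, 4, 6):
    print(f"P(task takes less than {hours} hours) = {normal_cdf(hours, mean, sigma):.4f}")
```

For 2/4/6 this gives a mean of 4 hours and a standard deviation of 2/3 hour. The three probabilities come out near 0.13%, 50%, and 99.87%, which is consistent with the statements above.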

This explicit statement of probabilities represents a distribution of likely outcomes. Statisticians would phrase it in a confusing (but mathematically important) way. They would say “if we sampled a large population of people (like you) doing this task, no more than 1% of them would complete the task in under 2 hours, no more than 1% of them would spend more than 6 hours on this task, and half of the people would complete the task in under 4 hours.” For the purpose of estimation, you can ignore the statistics-speak and just look at the “odds” of how long it will take you to complete the task once.

[Figure: distribution (histogram) of outcomes for the 2/4/6 PERT estimate]

If you were to create a histogram of thousands of executions of the task, it would look like the diagram above. This is the traditional bell-curve shape we associate with a normal distribution, but it actually represents a PERT estimate: a distribution of possible outcomes based on a beta distribution. Another way to look at a PERT estimate is in terms of the cumulative probability of a task being completed. The same data, presented as a cumulative distribution function (CDF), looks like the following:
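A Monte Carlo simulation is a quick way to produce that histogram yourself. The sketch below assumes the commonly used beta-PERT parameterization, alpha = 1 + 4(m - a)/(b - a) and beta = 1 + 4(b - m)/(b - a), where a, m, and b are the best, most likely, and worst cases. As the discussion that follows points out, other parameterizations exist. The code uses Python's random.betavariate and rescales each draw from [0, 1] to [a, b]. The helper names are hypothetical.

```python
import random

def pert_beta_params(best, likely, worst):
    """Shape parameters for the common beta-PERT parameterization (an assumption;
    other procedures for turning best/likely/worst into a beta distribution exist)."""
    span = worst - best
    alpha = 1 + 4 * (likely - best) / span
    beta = 1 + 4 * (worst - likely) / span
    return alpha, beta

def simulate_pert(best, likely, worst, runs=10_000, seed=42):
    """Draw simulated task durations (hours) from a beta-PERT distribution."""
    rng = random.Random(seed)  # fixed seed so the output is repeatable
    alpha, beta = pert_beta_params(best, likely, worst)
    span = worst - best
    # betavariate returns a value between 0 and 1; rescale it to [best, worst].
    return [best + span * rng.betavariate(alpha, beta) for _ in range(runs)]

best, likely, worst = 2, 4, 6
samples = simulate_pert(best, likely, worst)

# Text histogram with half-hour buckets; each '#' stands for 100 simulated runs.
bucket_width = 0.5
counts = {}
for t in samples:
    bucket = int((t - best) // bucket_width)
    counts[bucket] = counts.get(bucket, 0) + 1
for bucket in sorted(counts):
    start = best + bucket * bucket_width
    print(f"{start:4.1f}-{start + bucket_width:4.1f} h | {'#' * (counts[bucket] // 100)}")

# The same samples viewed cumulatively: share of runs finished before each time.
for hours in (3, 4, 5):
    share = sum(t < hours for t in samples) / len(samples)
    print(f"finished in under {hours} h: {share:.3f}")
```

For 2/4/6 the shape parameters come out as alpha = beta = 3. The simulated histogram is therefore symmetric around 4 hours, and roughly half of the runs finish in under 4 hours.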

[Figure: cumulative distribution function (CDF) for the 2/4/6 PERT estimate]

Here’s the rub: you don’t really know if the beta distribution is the right one, primarily because you can’t precisely know the actual probabilities. So you have to pick a distribution function to represent those probabilities. There is debate about the right distribution function to use for estimation; check out this great article for hard-core stats guys. Here’s a warning from their conclusions:

Although bias stemming from misspecified activity time probability models is rarely mentioned in introductory discussions, we have seen several instances of this bias in simple examples. First, and perhaps most important is the uncertainty as regards the underlying activity time probability models. The literature offers no less than five procedures for translating the subjective estimates (a,m,b) into specific β-distributions. As shown, the methods lead to distinct β-distributions, and the PERT approximation need not satisfactorily estimate any of them.

Essentially, they are saying “you can’t really know the right shape for a distribution curve of possible outcomes, so don’t get carried away.” They go on to suggest a simpler model than the beta distribution for estimation: a triangular distribution (shaped like a triangle instead of a bell curve), because the math is easy. With spreadsheets, we can still do “easy” math with a better distribution.
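The quoted point is that different procedures produce distinct distributions from the same three numbers. You can see this by comparing the classic PERT approximations with the exact mean and standard deviation of a triangular distribution built on the same a/m/b. The PERT approximations are mean = (a + 4m + b) / 6 and standard deviation = (b - a) / 6. This is a minimal sketch. The function names are hypothetical, and 3/5/10 is only an illustrative asymmetric estimate.

```python
from math import sqrt

def pert_mean_sd(a, m, b):
    """Classic PERT approximations: mean = (a + 4m + b) / 6, sd = (b - a) / 6."""
    return (a + 4 * m + b) / 6, (b - a) / 6

def triangular_mean_sd(a, m, b):
    """Exact mean and standard deviation of a triangular distribution on [a, b] with mode m."""
    mean = (a + m + b) / 3
    variance = (a * a + m * m + b * b - a * m - a * b - m * b) / 18
    return mean, sqrt(variance)

# The means agree for a symmetric estimate (2/4/6) but drift apart for an
# asymmetric one (3/5/10); the spreads differ in both cases.
for estimate in [(2, 4, 6), (3, 5, 10)]:
    pm, ps = pert_mean_sd(*estimate)
    tm, ts = triangular_mean_sd(*estimate)
    label = "/".join(map(str, estimate))
    print(f"{label}: PERT mean {pm:.2f}, sd {ps:.2f} | triangular mean {tm:.2f}, sd {ts:.2f}")
```

For 3/5/10 the PERT mean is 5.5 hours and the triangular mean is 6 hours, and the triangular model also reports a wider spread.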

The beta distribution for a symmetrical PERT estimate looks very much like a normal distribution. You can see this by comparing the cumulative distribution functions of the PERT and normal models for this 2/4/6 example.

[Figure: CDFs of the PERT (beta) and normal models for the 2/4/6 estimate]
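You can reproduce this comparison numerically. The sketch below assumes the beta-PERT parameterization, which for 2/4/6 gives a Beta(3, 3) shape stretched over 2 to 6 hours. It computes the beta CDF through the identity between the regularized incomplete beta function and a binomial tail sum, and that identity holds only for whole-number shape parameters. The normal model uses a mean of 4 and a standard deviation of (6 - 2) / 6. The function names are hypothetical.

```python
from math import comb, erf, sqrt

def beta_cdf_integer(x, alpha, beta):
    """Regularized incomplete beta function I_x(alpha, beta), valid only for whole-number
    alpha and beta: it equals P(Binomial(alpha + beta - 1, x) >= alpha)."""
    n = alpha + beta - 1
    return sum(comb(n, j) * x**j * (1 - x) ** (n - j) for j in range(alpha, n + 1))

def normal_cdf(x, mean, sigma):
    """P(T < x) for a normal distribution."""
    return 0.5 * (1 + erf((x - mean) / (sigma * sqrt(2))))

best, likely, worst = 2, 4, 6
# Assumed beta-PERT shape for 2/4/6: alpha = beta = 1 + 4 * (4 - 2) / (6 - 2) = 3.
alpha, beta = 3, 3
# Normal approximation: mean 4, with 2 and 6 treated as -3 and +3 sigma.
mean, sigma = 4, (worst - best) / 6

hours = [best + 0.25 * k for k in range(17)]  # 2.00, 2.25, ..., 6.00
for h in hours:
    pert = beta_cdf_integer((h - best) / (worst - best), alpha, beta)
    normal = normal_cdf(h, mean, sigma)
    print(f"{h:5.2f} h   PERT {pert:.3f}   normal {normal:.3f}   difference {pert - normal:+.3f}")
```

Both curves pass through 0.5 at 4 hours. Across the 2-to-6-hour range they differ by at most about 5 percentage points, and the next paragraph relies on that similarity.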

Given this similarity, and the inherent uncertainty about the true shape of the distribution, there are benefits to treating the PERT estimate (2/4/6) as a normal distribution whose low and high values mark the +/- 3 sigma bounds. The main benefit shows up when aggregating individual task estimates into combined project estimates.
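Here is a minimal sketch of that benefit. It assumes the tasks are independent. Under the normal approximation the means add and the variances (not the standard deviations) add, and the result can be expressed again as a best/likely/worst estimate at +/- 3 sigma. The helper names are hypothetical. Correlated tasks need an extra covariance term, which this sketch does not include.

```python
from math import sqrt

def pert_to_normal(best, likely, worst):
    # Assumed conversion: PERT-weighted mean, and best/worst treated as -3 / +3 sigma.
    return (best + 4 * likely + worst) / 6, (worst - best) / 6

def combine_independent(estimates):
    """Sum independent task estimates: means add, variances (sigma squared) add."""
    total_mean = 0.0
    total_variance = 0.0
    for best, likely, worst in estimates:
        mean, sigma = pert_to_normal(best, likely, worst)
        total_mean += mean
        total_variance += sigma ** 2
    total_sigma = sqrt(total_variance)
    # Express the result back in best/likely/worst form using the same +/- 3 sigma convention.
    return total_mean - 3 * total_sigma, total_mean, total_mean + 3 * total_sigma

low, mid, high = combine_independent([(2, 4, 6), (2, 4, 6)])
print(f"{low:.1f}/{mid:.1f}/{high:.1f}")
```

Two independent 2/4/6 tasks combine to about 5.2/8.0/10.8 rather than 4/8/12, because both tasks are unlikely to hit their worst case at the same time.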

Continued on the next page…

24 thoughts on “Advanced PERT Estimation”

  1. Pingback: Scott Sehlhorst
  2. Pingback: Yama
  3. Pingback: thepmp
  4. Pingback: Andre
  5. Pingback: Andre B.
  6. Pingback: david prince
  7. Pingback: david prince
  8. Pingback: Rolf Götz
  9. Thanks Scott.

    If you’re doing iterative development, you might be interested in the calculators (“Release Calculator” and “Wiggle Room Calculator”) in the sidebar of my website. Tweet me (@asplake) if you find them interesting or useful, or if you have suggestions for others.


    1. Hey, thanks Mike. Pretty neat calculators.

      Somewhere (or some-when) I wrote up an analysis of agile estimation, based on Mike Cohn’s work. Maybe it was an old blog article that I can’t find, but I think it was something for a client. The key point was that it made sense to use velocity to manage and predict deliverability based on “rough estimates,” separately from detailed (PERT) estimates for tasks, because you still get the feedback loop that removes uncertainty from the estimation process. Can’t find the write-up.

      Long story short, I think the calculator can come in handy for that ‘planning estimation’ phase.

  10. Pingback: Brain Washer - PMP
  11. Pingback: Askar Baybuzov
  12. Hi Scott,

    Your articles about PERT are the best ones I have ever read! Very well explained. Your advanced article does a great job of explaining why PERT estimates cannot simply be added. No other article I have read has mentioned this.

    I have looked all over the internet to find a spreadsheet that calculates PERT correctly and doesn’t just add the estimates. Despite my efforts I have not been able to find a good one. Do you have a good spreadsheet, or can you recommend a good PM tool for this?


    Kind Regards

    1. Thanks, Cecilia!

      I don’t have a template spreadsheet for doing this; I’ve just built them on the fly as projects required them. I’ll put one together for a future article. It will be tricky to come up with a general way to identify related tasks (versus unrelated tasks), but I have an idea about how to do it. If you subscribe to the blog, you’ll get an email when it shows up.

      Thanks again,

  13. Hi Scott,

    How do I combine 3 or more correlated PERT estimates, e.g. 2/4/6 and 3/5/10 and 4/6/12 with rho=0.75?



    1. Hey Nic,

      Thanks for the question. What I’ve learned about correlation between distributions (i.e. the amount of shared variation) has always been in terms of looking at two distributions, never three. So you raised an interesting question. I sort of alluded to the answer (when I talked about repeatedly combining pairs of estimates), but definitely didn’t explain it.

      I’m assuming that your rho=0.75 is the same between A (2/4/6) and B (3/5/10), as it is between A and C (4/6/12), as it is between B and C.

      I couldn’t find any “real math” (e.g. from other people, or otherwise formally proven) to back me up on it, but my suspicion is that you just combine any pair (AB, BC, or AC) to create a new distribution, then combine that resultant distribution with the remaining estimate (C, A, or B, respectively).

      Each combination takes normal-distribution approximations of the beta distributions (represented by the individual PERT estimates) as inputs and produces a normal distribution as output. So the math should work for repeated combinations.

      I did a quick test, trying the above approach with all three sequences, and got the same result, regardless of the order in which I combined distributions.

      My resultant estimate for your example is 7.6/16.2/24.8 (std of 2.9).

      My approach was to determine the combined distribution for A+B -> AB (4.3/9.5/14.7) and for B+C -> BC (5.1/12.2/19.2) and C+A -> CA (5.0/10.7/16.3), then combine them again with C, A, and B, respectively. The second combination for ABC yielded 7.6/16.2/24.8 with all three approaches.
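The pairwise procedure described here can be sketched in Python. This is a reconstruction under stated assumptions, not the exact spreadsheet that was used. Each PERT estimate is converted to a normal distribution, using the PERT-weighted mean and placing best/worst at +/- 3 sigma. Two distributions with correlation ρ are then combined with Var(X + Y) = σx² + σy² + 2ρσxσy. The same ρ = 0.75 is applied again when each combined pair is joined with the remaining estimate. All names are hypothetical.

```python
from math import sqrt

def to_normal(best, likely, worst):
    """Convert a PERT estimate to (mean, sigma): PERT-weighted mean, best/worst at -3/+3 sigma."""
    return (best + 4 * likely + worst) / 6, (worst - best) / 6

def combine_pair(x, y, rho):
    """Sum two (mean, sigma) distributions with correlation rho.

    Var(X + Y) = sx^2 + sy^2 + 2 * rho * sx * sy
    """
    (mx, sx), (my, sy) = x, y
    return mx + my, sqrt(sx**2 + sy**2 + 2 * rho * sx * sy)

def as_pert(dist):
    """Format (mean, sigma) back into best/likely/worst form at +/- 3 sigma."""
    mean, sigma = dist
    return f"{mean - 3 * sigma:.2f}/{mean:.2f}/{mean + 3 * sigma:.2f}"

A, B, C = to_normal(2, 4, 6), to_normal(3, 5, 10), to_normal(4, 6, 12)
rho = 0.75  # assumed to be the same for every pair, as in the question

for first, second, third, label in [(A, B, C, "AB then C"), (B, C, A, "BC then A"), (C, A, B, "CA then B")]:
    pair = combine_pair(first, second, rho)
    total = combine_pair(pair, third, rho)  # reuses rho for the combined pair (the reply's approach)
    print(f"{label}: pair {as_pert(pair)}, all three {as_pert(total)}, sigma {total[1]:.2f}")
```

Run as written, the first-pair results match the AB, BC, and CA figures above to one decimal place. The three orderings agree with each other to within about 0.03 hours, and each has a combined standard deviation of about 2.9.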

      That makes me feel confident that the order is independent.

      My “sniff test” makes sense too. Combining the distributions causes you to incorporate the first relationship (between A and B, in AB), but does not address the other correlations (between C and either A or B). Those correlations are accounted for in the second combination of distributions.
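As a cross-check that does not depend on order, the variance of a sum of correlated variables can also be computed in one step. You add every variance, then add 2ρσiσj for each pair of estimates. The sketch below assumes the same PERT-to-normal conversion and a single shared ρ = 0.75. The names combine_all and to_normal are hypothetical.

```python
from itertools import combinations
from math import sqrt

def to_normal(best, likely, worst):
    """Convert a PERT estimate to (mean, sigma): PERT-weighted mean, best/worst at -3/+3 sigma."""
    return (best + 4 * likely + worst) / 6, (worst - best) / 6

def combine_all(dists, rho):
    """Sum any number of (mean, sigma) distributions that share one correlation coefficient.

    Var(X1 + ... + Xn) = sum of sigma_i^2 + 2 * sum over pairs i < j of rho * sigma_i * sigma_j
    """
    mean = sum(m for m, _ in dists)
    variance = sum(s**2 for _, s in dists)
    variance += 2 * sum(rho * si * sj for (_, si), (_, sj) in combinations(dists, 2))
    return mean, sqrt(variance)

tasks = [to_normal(2, 4, 6), to_normal(3, 5, 10), to_normal(4, 6, 12)]
mean, sigma = combine_all(tasks, rho=0.75)
print(f"{mean - 3 * sigma:.2f}/{mean:.2f}/{mean + 3 * sigma:.2f} (sigma {sigma:.2f})")
```

For this example it gives roughly 7.46/16.17/24.87 with a standard deviation of about 2.90. That agrees with the 2.9 reported above, although the bounds are slightly wider than 7.6/16.2/24.8.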

      Does that help?

  14. Pingback: fifth.sentinel
