Failure To Launch (Your Product)

Jump forward in time to the day of your next big product launch (first release, new features, new market segment, etc.).  Your site/application crashes due to the “unexpected” demand.  All you can do now is look for a bucket of water to put out the fire.  What could you have done to prevent this disaster?  Jump back to today and start doing it!

Backwards Planning

Depending on how you look at things, this is a backwards planning exercise, a variation of the “remember the future” innovation game, risk management, or proactive product management.  You can avoid a disaster by imagining what might happen, then hypothetically figuring out why it (would have) happened.  That leads to planning how you could prevent it.  And now you’ve left the dream world of a Gedanken experiment and returned to the real world of product management.

Problem Triage

The way to approach this is straightforward.  Imagine some failure scenarios and the importance of preventing them:


  1. Imagine a failure scenario.
  2. “Predict” the likelihood of the failure.
  3. “Estimate” the impact of the failure.
  4. Repeat for each scenario.

You can prioritize your failure scenarios by multiplying the likelihood of each by its impact, and sorting them from largest to smallest.  Then determine which ones you’re willing to address, and which ones you’re willing to risk.  You may not be able to predict the likelihood of some failures (at least until you do a root cause analysis).  Take each of these and put it directly above the scenario with the next highest impact.  The rationale is that these are so bad that you really want to find out how likely they are to happen.  Once you predict likelihood (see below) you can reprioritize.
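
The ranking above can be sketched in a few lines of Python.  This is only an illustration: the `Scenario` class, the 1-to-10 impact scale, and the sample scenarios are all made up for the example.  It also assumes one reading of the placement rule, namely that a scenario with unknown likelihood goes directly above the known scenario whose impact is the largest one not exceeding its own.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Scenario:
    name: str
    impact: float                # assumed scale: 1 (annoying) to 10 (catastrophic)
    likelihood: Optional[float]  # 0.0 to 1.0, or None when you can't predict it yet


def prioritize(scenarios: List[Scenario]) -> List[Scenario]:
    """Rank by likelihood x impact, then slot in the unknown-likelihood scenarios."""
    known = [s for s in scenarios if s.likelihood is not None]
    unknown = [s for s in scenarios if s.likelihood is None]

    # Largest likelihood x impact first.
    ranked = sorted(known, key=lambda s: s.likelihood * s.impact, reverse=True)

    for u in unknown:
        # Known scenarios whose impact does not exceed this one's.
        candidates = [s for s in ranked
                      if s.likelihood is not None and s.impact <= u.impact]
        if candidates:
            # Directly above the known scenario with the next highest impact.
            target = max(candidates, key=lambda s: s.impact)
            ranked.insert(ranked.index(target), u)
        else:
            # Nothing with a lower impact: it goes to the bottom.
            ranked.append(u)
    return ranked


# Hypothetical scenarios, for illustration only.
launch_risks = [
    Scenario("Typo on the landing page", impact=1, likelihood=0.8),
    Scenario("Promo code rejected", impact=6, likelihood=0.5),
    Scenario("Site crashes during signup burst", impact=10, likelihood=None),
    Scenario("Slow page loads", impact=4, likelihood=0.9),
]

for scenario in prioritize(launch_risks):
    print(scenario.name)
```

Once the root cause analysis gives you a likelihood for the unknown scenario, fill it in and run the ranking again.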

Root Cause Analysis

For the failure scenarios you choose to address, the next step is to do a root cause analysis that identifies why it might have happened.  The best tool for capturing this analysis is an Ishikawa diagram.  Consider that one problem you might face is your website crashing.

[Ishikawa diagram: a website crash caused by too many users, too many concurrent users, or too many concurrent sign-ups]

Essentially, you can crash your site by having too many users, too many concurrent users, or too many concurrent sign-ups.  Developing a cause-and-effect diagram (another name for an Ishikawa diagram) is usually an iterative and exploratory process.  You probably won’t create the simple version above first.  You may ask your implementation team “What can cause the website to crash?”  For each of their answers, you identify when that situation can happen.  Or you start top down.  Most likely, a mix of the two.  Your completed root cause analysis may look like the following:

[Ishikawa diagram: completed root cause analysis of a website crash]

At this point, your team can probably predict many (maybe all) of the root causes of a website crash.  The predictions may be conditional – “we can handle 10 concurrent users, but 20 would probably kill us, and 100 definitely would.”  Developers are notoriously good at answering questions with conditional statements that reveal the nuances of their thinking.

Remember that you’re looking back from the future.  At product launch, what are you hoping for / reasonably expecting?  For this example, assume it is 10,000 total users, with 100 concurrent users (normally) and 500 concurrent signups.  You determine these numbers by working with your PR, marketing, or mar-com people (or wearing those hats, when it is all you).  Your plan is to do a big launch with a demo and a promo code for signup.  You know your audience will have internet connections, and will have Twitter running at the time of your presentation.  You expect/dream of an immediate burst of signups, followed by tweets and word of mouth, and eventually blog articles causing additional growth over the next couple of weeks.

Use this data to feed back into the developers’ conditional responses.  If you’re like me, you will have found “absolute certainty of failure” from something.  And you may have even identified the thresholds for each element.  For example, database loading can handle 75 concurrent users, but with the current implementation, you only have enough database connections available to support 25 concurrent users.
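
Feeding the launch numbers back into the team’s thresholds is a simple comparison, and a rough sketch might look like this.  The numbers come from the example above; the data layout and the `find_bottlenecks` function are hypothetical, and a real list would have many more rows.

```python
# Expected launch load, from the example above.
expected_load = {"concurrent_users": 100, "concurrent_signups": 500}

# Thresholds gathered from the team: (component, metric, maximum supported).
thresholds = [
    ("database loading", "concurrent_users", 75),
    ("database connections", "concurrent_users", 25),
]


def find_bottlenecks(expected, limits):
    """Return every limit the launch would exceed, tightest first.

    Each row is (component, metric, limit, expected value).
    """
    exceeded = [
        (component, metric, limit, expected[metric])
        for component, metric, limit in limits
        if metric in expected and expected[metric] > limit
    ]
    # The smaller the limit relative to expected load, the worse the bottleneck.
    return sorted(exceeded, key=lambda row: row[2] / row[3])


for component, metric, limit, load in find_bottlenecks(expected_load, thresholds):
    print(f"{component}: supports {limit} {metric}, launch expects {load}")
```

Sorting by how far each limit falls short puts the database connections ahead of database loading, which is a reasonable first guess at the order in which to fix them.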

Jumping back to the present, you now have some very discrete, and very important things to do before your launch.  If you need to, revisit the prioritized list of failure scenarios.  By looking at the next level of detail, have you found that the order of importance (to fix) has changed?  What about the “must fix” versus “willing to risk” line?  Has it moved?

Fold the “must fix” items into your backlog, and prioritize them relative to the other capabilities on your roadmap.  As a side note – make sure you’ve built in some testing to make sure you actually prevent the problems.  This might even be a great opportunity to implement “performance regression tests” – it is not enough to prevent bugs, you have to prevent slowdowns.
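
A “performance regression test” can be as small as a timing check that fails when a code path gets slower than an agreed budget.  Here is a minimal sketch, assuming a made-up `handle_signup` function standing in for your real signup code and an arbitrary budget of 50 milliseconds; your team would pick the real function and the real number.

```python
import statistics
import time

SIGNUP_BUDGET_SECONDS = 0.05  # assumed budget, agree on the real one with your team


def handle_signup(email):
    """Hypothetical stand-in for the real signup code path."""
    return {"email": email.strip().lower(), "status": "pending"}


def median_seconds(func, *args, runs=50):
    """Call func repeatedly and return the median duration in seconds."""
    samples = []
    for _ in range(runs):
        start = time.perf_counter()
        func(*args)
        samples.append(time.perf_counter() - start)
    return statistics.median(samples)


def test_signup_stays_within_budget():
    elapsed = median_seconds(handle_signup, "User@Example.com")
    assert elapsed <= SIGNUP_BUDGET_SECONDS, (
        f"signup took {elapsed:.4f}s, budget is {SIGNUP_BUDGET_SECONDS}s"
    )
```

Using the median of several runs makes the check less sensitive to a single slow run.  A test like this only covers one request at a time; the concurrent-user thresholds would still need a proper load test.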

Rethinking the Problem

Without going into details on how the team will solve each problem, make sure that together you keep the Ishikawa diagrams in mind, and see how any proposed solutions might “reappear” on the diagram.  For example, rewriting your database connections to use asynchronous processes and a set of pooled connections may prevent a crash, but it may really hurt performance.  You may not have time to find an elegant solution.  So stop and rethink the problem.

At this point, you’ve said:

  1. Given a marketing plan / launch strategy, we would crash the website.
  2. We can make changes between now and the launch that will double the number of concurrent users we can support (or whatever), but that is not enough to support the launch strategy.
  3. Solution: Change the launch strategy.

Maybe you can’t support a wide-open promo-code based signup.  You should modify your launch so that it can only create as much demand as your product (including pending improvements) can support.  Maybe you limit it to the first 1,000 new users (probably more code to write to enforce the limit).  Maybe you launch with per-user invitations, where you can control the speed of propagation of invites (start with 100, when those have been sent, make another 100 available, etc).
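
To give a feel for the “more code to write,” here is a rough sketch of an invitation gate.  The `InviteGate` class and its method names are invented for illustration, and combining the batch release with an overall cap in one object is my assumption; the two ideas above could just as easily be built separately.

```python
class InviteGate:
    """Releases invitations in batches and enforces an overall cap."""

    def __init__(self, batch_size=100, max_users=1000):
        self.batch_size = batch_size
        self.max_users = max_users
        # Invitations made available so far (never more than the cap).
        self.released = min(batch_size, max_users)
        # Invitations actually sent.
        self.sent = 0

    def send_invite(self):
        """Send one invitation if any are available; return True on success."""
        if self.sent >= self.released:
            return False
        self.sent += 1
        return True

    def release_next_batch(self):
        """Make another batch available once the current one is used up."""
        if self.sent < self.released or self.released >= self.max_users:
            return False
        self.released = min(self.released + self.batch_size, self.max_users)
        return True


gate = InviteGate(batch_size=100, max_users=1000)
while gate.send_invite():
    pass  # the first 100 invitations go out
gate.release_next_batch()  # you decide when the next 100 become available
```

Because the next batch is only released when someone calls `release_next_batch`, the team controls how fast invitations spread and can hold off if the site is struggling.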

Entire Team Problem

This is a problem that is solved collaboratively, by the entire team.  It is not just a “go write the code” problem.  What your product can support at a launch should drive how you choose to launch, just as how you choose to launch should drive what you want your product to support.  

You may have to delay a key capability in order to scale.  Does your marketing team know this?  Slightly less bad than crashing would be announcing a feature that is disabled.  Still need to announce the feature?  Pre-announce it: “Coming in a month…”

This stuff is important for every company and product, but it is especially critical for start-ups.  As a start-up, you have limited opportunities to grow, and a limited safety-net to catch you when you fail to capitalize on those opportunities.  So make sure everyone (not just the development team) is aligned to make the best use of each opportunity.


You have an opportunity to prevent problems.  All you have to do is imagine that they have happened in the future, figure out why they would have happened, then do what it takes (in software, or organizationally) to prevent them.

20 thoughts on “Failure To Launch (Your Product)”

  1. Sure sounds like a Risk Exercise.
    1) Identify your potential risk events.
    2) Determine the probability of each event happening
    3) Establish the expected impact if the event occurs
    4) Score all events
    5) Develop mitigation events for the risks that keep you up

    Only problem is, identifying potential events or problems that might happen.
    “If only I’d seen it coming”

    1. Yup – back in my waterfall days, that was how we dealt with it too. The spin feels different when working with an agile team, but it is still the same stuff. Thanks, Val! The other difference is that now we explicitly fold stuff into the backlog and prioritize (helping the other parts of the company to course-correct if needed), instead of just tracking line items in a spreadsheet.

  2. Each of those problems identified in your Ishikawa diagrams is part of either your offer or your marketing. Regardless, they are part of your user’s experience. The user’s experience drives your P&L regardless of your profit models and business model.

    All of those problems are within the scope of the product manager’s responsibilities. They are well beyond the scope of development and marketing. They are why the product manager is the CEO of the (product or) offer. Some of the problem owners won’t feel like they are on your team, but in a price-based competition market, the offer broadens and reaches the desks of the non-customer contact people. You, as the PM, will need them, so start gaining influence with them now.

    Reach out and grab your organization, then reach further.

    1. Thanks David! Yes, these are definitely things that I consider to be “on my plate” when playing a product owner / product manager role. There are so many long term, and strategic things we focus on as product managers, that I thought it was important to highlight that some things that appear to be tactical really are strategic. I remember the epiphany I had (or heard, can’t remember, so I’ll assume “had”) when I was a technical consultant for an enterprise software provider: People don’t get credit for preventing fires, only extinguishing them. I definitely saw that in compensation and accolades, and I never liked it. The problem is that putting out a fire is just mitigating damage. Preventing it can be huge.

      Most of the start-ups I know of hire primarily talented but inexperienced people, who usually haven’t had the experiences to put this stuff on their radar. And the “gray hairs” they do have are usually too diluted by the other thousand things that have to happen for the company to succeed. If one company reads this, immediately thinks about their next big marketing event, and asks a couple questions to get the ball rolling, then this article is a “success.”

      Thanks again for reading and commenting!

  3. If you mitigate the problems you can think of, then you’ll have fewer to mitigate later. It’s the nature of proactivity that you mitigate things that might never happen. Reactive crisis management is no fun.

    In operations, ITIL change management can reduce staff beeper moments. They make fewer mistakes when they sleep more.

    Eventually, over a career, you’ll have built up a catalog of risks. You can put those in MS Project and just roll them in. With each new project, you explore risks not yet mitigated or discovered. Have a risk day.

  4. I like this approach. I am a pessimist by nature and I spend a lot more time thinking about the bad things that might happen than the good things. This should be pretty easy for me to follow. Thanks.

  5. How true. Risk and mitigation are so vital before the launch. I have been reading your posts for some time now, and appreciate every article you’ve posted. Coincidentally, I am stuck in a similar situation: the launch is near and we have uncovered major roadblocks. My doubt is whether the roadblocks need to be cleared fully, putting off the launch to a later date, or whether to keep the launch date as originally planned and handle the roadblocks in a controlled manner. Your thoughts on this are appreciated.

  6. What’s interesting about your approach is that it’s very similar to how business continuity planners go about preparing an entire business to deal with the unexpected.

    The one additional point that I’d add to your list is that as a PM you need to test, test, test your ability to deal with a disaster should it occur.

    – Dr. Jim Anderson
    The Accidental PM Blog
    “Home Of The Billion Dollar Product Manager”

    1. Thanks, Dr. Jim! Yeah, I saw similar approaches to defining disaster recovery requirements for enterprise clients. What I didn’t see in those situations was a “compelling event” like a product launch as a means of working backwards into specific deliverables per release. If those had been agile projects, then yes, I guess we would have used the same approach.

      The other difference was that the disaster recovery priorities were focused on “be able to fail over” and not “predetermine our limitations / capabilities.” You’re right that it is a very similar thought process, though. Great observation. I wrote this article with one of my start-up clients in mind, where Maslow’s hierarchy of needs usually has the company focused on “get it built” and “find users.” Thanks for the insight.

  7. Hey Scott,

    “…assume it is 10,000 total users, with 100 concurrent users (normally) and 500 concurrent signups.”

    I am a little confused on why the product manager in this example didn’t know this information during the business proposal phase of market readiness, and therefore would have included user loads as a non-functional requirement for the development team ahead of building the software.

    Was it just a miss? If so, a launch risk mitigation strategy would certainly be appropriate to catch it. If those sorts of numbers were known ahead of time, though, I would believe that this is a good candidate for a non-functional requirement.

    1. Hey Patrick – great question. It wasn’t really a miss, just a change in the market – in the most recent case I’m thinking about. There was already a product launch (and load to near expected levels). There was a decision to add a public launch of some potentially compelling capabilities with some fanfare, added recently. As part of adding the new event, this new analysis was done based on new expectations. Basically revisiting the “business proposal phase” for the near term.

      These are definitely good candidates for non-functional requirements, and that’s how they have been handled.

  8. That’s exactly what risk management is. You project what may go wrong and look for ways to avoid the problem.

    I really like your approach to look for solutions not only in the code but also in the whole product management machine (including changing a launch date). It happens oh so often development is forced to launch the product which isn’t ready. Then it takes a lot more of effort to straighten things up than it would if all was done properly.

    1. @Pawel – thanks! In a start-up, it can be a lot easier to remember that the whole company is trying to achieve something. Larger companies often have too much overhead and reporting structures and middle-management stakeholders getting in the way of global optimization.

  9. Many of the failure scenario types referenced would be common problems experienced by various members of the team. You would hope that the people working on the project would have these types of issues in mind, and raise the risks during the business requirements gathering phase. If they are discussed with the product owner, many of these risks can be mitigated initially, or at least be known upfront to not be a shock during testing, with the view of having everything working correctly for the final product.


    1. @David: Thanks for the comment! I have seen organizations with mature processes that have frameworks that catch all this stuff. I’ve also seen individuals raise these issues, as you suggest. I’ve also seen people raise these issues “too late” to really address them. Raising them early (gathering the inputs from the team, asking the probing questions) with 3 sprints worth of runway to address them made a huge difference for one of my clients. SXSW was a successful launch, with no signs of the “fail whale” and no need to say “we’ve been so popular that our site crashed.”
