## Friday, August 31, 2018

### Probability of Ruin in Pictures

William Bengen calculated sustainable withdrawal rates (SWR) using historical S&P500 market returns since 1928, leading to the “4% Rule.”[1] More recently, Robert Shiller published stock market returns data back to 1871 using the S&P Composite Index.[2] In this post, I’ll explore the “probability of ruin” using the more extensive Shiller data.

Probability of ruin is typically used in retirement planning to estimate the probability that a retiree will outlive her portfolio based on some set of assumptions, such as a fixed planning horizon (often 30 years), market return expectations, and a constant-dollar spending strategy. Bengen studied rolling 10-, 20- and 30-year retirements using historical S&P500 market returns and a constant-dollar spending strategy.[3]

He found that, assuming a fixed 30-year retirement and annual withdrawals of 4% of the retiree’s portfolio value at retirement, only about 5% of the rolling historical periods would have depleted a portfolio in less than 30 years. The worst-case scenario was a 30-year retirement beginning in 1966. Hence, the “4% Rule.”

The following chart shows the terminal portfolio value (TPV) after 30 years for a retiree spending \$42,000 (4.2%) annually from an initial portfolio valued at \$1M for 110 overlapping thirty-year periods from 1872 to 1982. (Shiller’s data ends in 2012 so the last 30-year period began in 1982.) The red bars mark retirement start years whose portfolios funded fewer than 30 years.

Six of the 110 periods (5.5%, the historical “probability of ruin”) were depleted in fewer than 30 years. TPV charts typically and reasonably assume a retiree’s portfolio can’t drop below zero but I continued withdrawals for the full 30 years to show the extent to which they failed. Another way to read this is that the deeper the red column, the sooner the portfolio was depleted.
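The calculation behind a chart like this can be sketched in a few lines of Python. This is a minimal sketch, not the code used for the figures: the function names are hypothetical, the withdrawal is assumed to come out at the start of each year before that year's return is applied, and `annual_returns` stands in for real (inflation-adjusted) annual returns such as those derived from Shiller's data. Withdrawals continue after the balance goes negative, as described above.

```python
def terminal_value(returns, initial=1_000_000, spend=42_000):
    """Portfolio value after len(returns) years of constant-dollar spending.

    Withdrawals continue even after the balance goes negative, so the
    result shows how badly a failed period fell short.
    """
    balance = initial
    for r in returns:
        # Assumed timing: withdraw at the start of the year, then apply the return.
        balance = (balance - spend) * (1 + r)
    return balance


def rolling_tpvs(annual_returns, first_year, horizon=30, **kwargs):
    """Map each retirement start year to its terminal portfolio value."""
    n_periods = len(annual_returns) - horizon + 1
    return {
        first_year + i: terminal_value(annual_returns[i:i + horizon], **kwargs)
        for i in range(n_periods)
    }


def probability_of_ruin(tpvs):
    """Fraction of periods whose terminal value is below zero."""
    failures = sum(1 for value in tpvs.values() if value < 0)
    return failures / len(tpvs)


# Toy input only: 31 years of flat 0% real returns, not historical data.
toy = rolling_tpvs([0.0] * 31, first_year=1872)
print(toy[1872])                 # 1,000,000 - 30 * 42,000 = -260000.0
print(probability_of_ruin(toy))  # 1.0, both toy periods run dry
```

With real return data in place of the toy list, the dictionary returned by `rolling_tpvs` holds one bar per start year, and the sign of each value decides whether that bar would be blue or red.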

Take a longing glance at those tall columns, the ones with really large terminal portfolio values. Then, compare them to the little stubby blue guys. Both are probability of ruin “successes”.

Probability of ruin assumes that you’ll be happy simply not retiring in one of those red years. You’re either in the 5% of scenarios that start a losing period or the 95% of winners and so as long as your bar turns out blue, you’re good, right?

Not really. Wouldn’t you be at least a little happier with a tall blue bar than a short, stubby blue bar, even though both avoid portfolio depletion? I would. Probability of ruin assumes that you’ll be just as happy successfully funding retirement and leaving a hundred bucks to your heirs as you would be leaving them a million. And, that you’d be as dissatisfied with a portfolio that funds 29 years as with one that only funds 15.

I wouldn’t. If a planner said, “Hey, great news! Your retirement is funded 95% of the time”, my response would be, “That sounds great but how well does it turn out when it is completely funded and how badly when it isn’t?”

Sequence risk affects all outcomes, sometimes positively and sometimes negatively. Probability of ruin flags only the worst outcomes. Probability of ruin is sort of an upside-down “tip of the iceberg” in that most of the information is hidden from view by condensing all that information into a single data point, the percentage of failures.

(For a better iceberg effect, turn your phone upside down while you view the chart below. If you’re reading this on an iMac or PC, probably better to just use your imagination.)

In Figure 2 below, I increased spending from 4.2% of initial portfolio value to 4.75% which, of course, creates more red bars indicating more depleted portfolios.

Note that the red bars appear in four distinct clusters in both Figures 1 and 2. A “5% probability of ruin” might suggest that ruin appears sporadically about every 20 years (5% of periods). It does not, although that is how sequence risk is most often (incorrectly) modeled.

When I increase spending to 5.5%, the result is even more red bars, as expected, but they’re still all within those four clusters. Ruin isn’t a uniformly distributed event. Probability of ruin is quite high in certain periods of economic distress but relatively low at any other time.
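One way to make the clustering concrete is to group failing start years into runs of consecutive (or nearly consecutive) years and count the runs. The sketch below is illustrative only: `cluster_years` is a hypothetical helper, and the years in the example are placeholders chosen to show the mechanics, not values read from the figures.

```python
def cluster_years(years, max_gap=1):
    """Group years into clusters whose neighbors are at most max_gap apart."""
    clusters = []
    for year in sorted(years):
        if clusters and year - clusters[-1][-1] <= max_gap:
            clusters[-1].append(year)   # extends the current cluster
        else:
            clusters.append([year])     # starts a new cluster
    return clusters


# Placeholder failure years (hypothetical, not taken from the charts).
failing_starts = [1906, 1907, 1929, 1930, 1937, 1966, 1967, 1968]
print(len(cluster_years(failing_starts)))   # 4 clusters from 8 failing years

# Failures spaced evenly every 20 years would never share a cluster.
evenly_spaced = list(range(1880, 1980, 20))
print(len(cluster_years(evenly_spaced)))    # 5 clusters from 5 failing years
```

If ruin really arrived about once every 20 years, the number of clusters would roughly equal the number of failures. Far fewer clusters than failures is the signature of bunching.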

Here's an analogy. Kentucky averages about 12 snowfall days per year but we don’t predict snowfall in July. It’s more likely to snow in winter in Kentucky and high sequence risk is more likely to deplete a portfolio when spending starts in an "economic winter". Many models of sequence risk predict snow in July.

Unless you retired just prior to the Panic of 1910, the Great Depression, a bad 1937 bear market (squeezed between two really good market years, by the way) or during the inflationary 1965 to 1975 period, the 4% Rule would not have depleted your portfolio. Unfortunately, these periods are not predictable. The jury is still out on the 2000s.

In the next chart, Figure 3, the y-axis scale changes from \$M to \$K so we can better see the near misses. I arbitrarily set the definition of success in this test to include TPVs greater than \$150,000 and the definition of failure to include TPVs worse than -\$150,000. My reasoning is that, given the margin of error in a 30-year retirement plan, the scenarios in between might have gone either way IRL (in real life, as Millennials say). This is arbitrary, but so is drawing the failure line at precisely zero dollars, and this definition factors in more of the uncertainty of the analysis.

Note the number of portfolios that barely avoided depletion (3) and the number that very nearly avoided depletion (2). If we omit these five scenarios from the calculation because they are too close to call, the probability of ruin becomes 3.8% instead of 5.5%. That’s more than a 30% change in the estimate of ruin and represents a big change in sustainable spending.
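The arithmetic behind that change can be checked directly. The sketch below is an illustration under stated assumptions: `classify` and `ruin_rate` are hypothetical names, and the TPV list is synthetic, built only to reproduce the counts in the text (110 periods, 6 below zero, and 3 small positive plus 2 small negative outcomes inside the \$150,000 band).

```python
def classify(tpv, band=150_000):
    """Three-way label for a terminal portfolio value."""
    if tpv > band:
        return "probably succeeded"
    if tpv < -band:
        return "probably failed"
    return "too close to call"


def ruin_rate(tpvs, band=0):
    """Share of decided periods that failed; periods inside the band are dropped."""
    labels = [classify(value, band) for value in tpvs]
    decided = [label for label in labels if label != "too close to call"]
    return decided.count("probably failed") / len(decided)


# Synthetic values that only match the counts quoted above, not real TPVs.
tpvs = [500_000] * 101 + [50_000] * 3 + [-50_000] * 2 + [-500_000] * 4

two_way = ruin_rate(tpvs)                    # 6 failures out of 110
three_way = ruin_rate(tpvs, band=150_000)    # 4 failures out of 105
print(f"{two_way:.1%}")                      # 5.5%
print(f"{three_way:.1%}")                    # 3.8%
print(f"{1 - three_way / two_way:.1%}")      # 30.2%
```

Moving just five periods out of the tally shifts the headline number by almost a third, which is the point of the three-category view.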

I'm not advocating ignoring these data but simply viewing them in three categories instead of two: probably succeeded, probably failed and too-close-to-call, based on our degree of confidence in the outcomes.

When you have only a few failures, a few close calls make a large difference in probability of ruin. Portfolios that come up just a little short probably aren’t losers, and a small bequest left to heirs is probably too close to call a winner, as well. Thinking we can predict a 30-year retirement much more accurately than plus or minus a few years is overconfidence.

Why do I question “near misses”? Because they probably would have funded most of the 30 years. Only 6% of men and 13% of women aged 65 live another 30 years and all of those who died sooner would have successfully funded their retirements in these scenarios.

The following chart, Figure 4, brings bear markets (the yellow bars) into the picture.

Retirees are often told that retiring into a bear market is deadly, but bear markets don’t appear to be particularly highly correlated with failing portfolio periods. Robert Shiller doesn’t even consider the 1960s and 1970s to be bear markets because they were so gradual.[4] Paint those bars blue and the correlation of bear markets to portfolio ruin is even less obvious.

If portfolio depletion isn’t necessarily caused by bear markets, what does cause it? The EarlyRetirementNow.com website found that the sustainable withdrawal rate is nearly completely explained by portfolio returns for the first five and first ten years of 30-year periods.[5] This explains SWR but not ruin — portfolio depletion is completely explained by sequence risk.

Nonetheless, a chart of SWRs is informative. Figure 5 shows the SWRs that would have depleted a portfolio in precisely 30 years from 1872 to 1982.

This is the view of the iceberg below the surface. Sustainable withdrawal rates that deplete portfolios in precisely 30 years are unpredictable and vary widely from 3.8% to 12.6% historically.
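Because the terminal value is linear in the annual withdrawal, the rate that exhausts a portfolio in exactly N years can be solved directly rather than by trial and error. The following is a minimal sketch, assuming constant real withdrawals taken at the start of each year and a hypothetical function name. The charts may have been produced differently.

```python
def exact_depletion_rate(returns):
    """Withdrawal, as a fraction of the initial portfolio, that leaves
    exactly zero after len(returns) years.

    After each year the balance equals
        growth * initial - withdrawal_growth * withdrawal,
    so setting it to zero gives withdrawal / initial = growth / withdrawal_growth.
    """
    growth = 1.0             # what 1 unit of initial portfolio has grown to
    withdrawal_growth = 0.0  # what 1 unit withdrawn every year would have grown to
    for r in returns:
        growth *= 1 + r
        withdrawal_growth = (withdrawal_growth + 1) * (1 + r)
    return growth / withdrawal_growth


# Toy input: with flat 0% returns, 30 equal withdrawals of 1/30 empty the portfolio.
print(f"{exact_depletion_rate([0.0] * 30):.2%}")   # 3.33%
```

Applying this function to each rolling 30-year slice of returns gives one rate per start year, which is the quantity a chart of exact-depletion SWRs plots.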

Figure 5 above provides a visual explanation of the “4% Rule” probabilist school of retirement finance. That approach recommends spending the amount that would only fail in no more than 5% of retirement periods. Using the Shiller data, that amount of spending would be about 4.2% of initial portfolio value.

There are two potential risks with this strategy. The obvious one is that you might fall into the unlucky 5% (one in twenty) and outlive your savings. An equally important concern is that you would almost always underspend. All of the blue bars above the red line represent underspending. If you retired in 1950 planning to live 30 years, for example, you would have spent 4.2% when you could have spent 11.8%. Of course, you couldn’t have known that in 1950.

Some planners have suggested that sequence risk goes away after 10 years. Alas, it does not. The following chart shows the value of portfolios at the end of the first 10 years for historical data.

The smallest TPV after 10 years was \$340,000 (retirement in 1973) and the largest was \$3.8M (1949). Surely the latter has less sequence risk ten years into retirement.

If both scenarios are assumed to complete the remaining 20 years of a 30-year retirement and both continue to spend the \$42,000 they calculated as sustainable back in year one, the larger portfolio would have survived all rolling 20-year historical periods with continued annual spending of 1.1% (42,000 / 3,800,000), while the smaller portfolio would have failed nearly all of those periods with 12.4% annual spending (42,000 / 340,000).
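The two percentages are simply the unchanged \$42,000 divided by each portfolio's value at year ten. Here is a short check using only the figures quoted above (the function name is hypothetical):

```python
def current_withdrawal_rate(annual_spending, portfolio_value):
    """Spending as a fraction of what the portfolio is worth today."""
    return annual_spending / portfolio_value


spending = 42_000
for start_year, value_after_10 in [(1949, 3_800_000), (1973, 340_000)]:
    rate = current_withdrawal_rate(spending, value_after_10)
    print(start_year, f"{rate:.1%}")
# 1949 1.1%
# 1973 12.4%
```

The dollar amount never changed, but measured against the current balance the two retirees are now following very different plans.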

Sequence risk might appear to go away after 10 years from the perspective of the start of a 30-year period but after 10 years much will have changed. Sequence risk will change accordingly and become greater or smaller. We can’t know which.

As I mentioned above, the EarlyRetirementNow blog found that the returns for the first 5 years of a 30-year retirement best explain the sustainable withdrawal rate. Figure 7a shows 5-year annualized market growth rates with the same time period on the x-axis. The panel below, Figure 7b, shows 30-year TPV with portfolio failures in red in the top chart. Note how well very low growth rates for the next five years align with portfolio depletion.[6]
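An annualized growth rate over a window is the geometric mean of the yearly growth factors, minus one. The following is a minimal sketch with hypothetical function names. The inputs are assumed to be annual returns expressed as decimals, and the example values are made up to show the arithmetic.

```python
def annualized_growth(returns):
    """Geometric average annual growth rate over the given returns."""
    growth = 1.0
    for r in returns:
        growth *= 1 + r
    return growth ** (1 / len(returns)) - 1


def rolling_annualized_growth(annual_returns, first_year, window=5):
    """Map each start year to the annualized growth of the next `window` years."""
    n_windows = len(annual_returns) - window + 1
    return {
        first_year + i: annualized_growth(annual_returns[i:i + window])
        for i in range(n_windows)
    }


# Made-up example: a doubling followed by a halving nets out to 0% a year.
print(annualized_growth([1.0, -0.5]))   # 0.0
```

Keying each window by its start year puts the growth rates on the same x-axis as the 30-year TPVs, so low early growth and later depletion line up visually.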

Portfolio failures are caused by poor market returns early in a series of returns. The low returns can result from a quick, precipitous shock like The Crash of October 1929, from a single terrible year of returns like 1937, or from a long, gradual sideways series of mediocre real returns like 1966 to 1975.

These growth rates are explanatory, not predictive. In these charts we are explaining the past, not predicting the future. We have no idea what the next five years of market returns will bring but we can see that low early returns — sequence risk — are not a good way to start.

To summarize, probability of ruin is an interesting rule of thumb with severe limitations. Sequence risk affects all portfolios from which the retiree periodically spends, but probability of ruin only measures the extreme outcomes, those that result in premature portfolio depletion. It treats all failures alike and all successes alike, hiding the extent of both. The thin line separating success from failure is arbitrary.

Portfolio ruin isn’t sporadic and doesn’t uniformly occur once every 20 years or so as a 5% failure rate might imply. Most of the time, sequence risk is quite low but during major economic upheavals, it occurs in bouts.

Models of probability of ruin are not robust. They provide a significantly different answer every time they are run even when nothing changes except the Monte Carlo random number draw.
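The effect is easy to reproduce with a toy simulation. The sketch below is an assumption-heavy illustration rather than any planner's actual model: returns are drawn independently from a normal distribution with an assumed 5% mean and 18% standard deviation (both numbers are placeholders), spending is a constant 4.2% of the initial portfolio, and only the random seed changes between runs.

```python
import random


def simulated_ruin_probability(seed, trials=200, years=30,
                               mean=0.05, stdev=0.18, spend_rate=0.042):
    """Fraction of simulated retirements that run out of money."""
    rng = random.Random(seed)
    failures = 0
    for _trial in range(trials):
        balance = 1.0  # portfolio expressed as a multiple of its starting value
        for _year in range(years):
            # Clamp at -99% so a rare extreme draw cannot flip the balance's sign.
            annual_return = max(rng.gauss(mean, stdev), -0.99)
            balance = (balance - spend_rate) * (1 + annual_return)
            if balance <= 0:
                failures += 1
                break
    return failures / trials


# Same model, same parameters, different seeds.
for seed in range(5):
    print(seed, f"{simulated_ruin_probability(seed):.1%}")
```

With only 200 trials per run the estimates will usually differ from seed to seed. Raising `trials` narrows the spread but does not remove it, and it does nothing about uncertainty in the assumed mean and standard deviation.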

Probability of ruin is based on some strange assumptions about human behavior, like assuming we will continue to spend the same amount when ruin becomes apparent or that we don’t care how much wealth we have as long as it’s more than zero. It’s also based on fewer than five unique sequences of 30-year historical returns, a truly small sample.

Put all this together and probability of ruin looks like a very poor metric by which to predict, model, or manage retirement finances.

REFERENCES

[2] Annual Data on US Stock Market, Robert J. Shiller.

[3] This analysis uses the S&P Composite Index, data from 1871 to 2012, and 100% equity allocation.

[5] The Ultimate Guide to Safe Withdrawal Rates – Part 15, Early Retirement Now blog.

[6] The market grew about 0% from 1927 to 1931 as shown in the bottom panel, for example, and portfolios with spending beginning in 1927 failed sooner than 30 years with 4.75% spending, as the top chart shows.