Saturday, September 28, 2019

The Prevalent but Problematic Probability of Ruin

About 10 years ago, in the course of a conversation with two retirement researchers whom I greatly respect, someone mentioned the 4% Rule. One of those researchers said, "William Bengen did great work showing us that sequence risk exists but trying to turn it into a retirement plan was a huge mistake."

Bengen's work gave us the 4% Rule, derived from the so-called probability of ruin. Probability of ruin, or p(ruin) for short, is the estimated probability that a retiree spending a fixed real dollar amount from a volatile portfolio will outlive her portfolio. Somehow, despite its many shortcomings, p(ruin) has become the most common metric in retirement planning.

The 4% Rule provides a "sustainable withdrawal rate" (SWR) that a retiree can supposedly spend from a volatile portfolio with a 95% probability of not outliving his savings. How much is the SWR? Bengen estimated a range around 4.4%. Wade Pfau, Michael Finke and David Blanchett[1] found that the SWR is currently closer to 3%, primarily due to a low-interest-rate regime. If they are correct, that would result in annual withdrawals nearly 32% lower than Bengen's estimate. That's quite a range.

Some question the implications of that research, notably Michael Kitces, but interestingly, William Bengen believes that valuations are probably important and that "Pfau may be on to something."

The Shiller CAPE 10 ratio[2], a measure of stock market valuation, was around 10 when Bengen's data series began in 1926 and today suggests a much higher market valuation of around 30. A higher CAPE 10 suggests lower future market returns and vice versa. Had the market return data series studied by Bengen begun when valuations were relatively high, the results may have suggested a lower SWR. (It is not uncommon for economics studies to improperly ignore initial conditions like market valuations.)

I will toss yet another monkey wrench into these analyses and note that both studies make assumptions about future asset returns, so neither can be proven correct ex ante. Still, Pfau et al. provide evidence that Bengen's SWR may be overestimated. This uncertainty is the essence of risk.

What are these shortcomings of p(ruin)? Let's start with p(ruin) being a one-dimensional measure of risk. By that I mean it estimates the probability (risk) of outliving a consumption portfolio, which I will define as a volatile portfolio of investments from which a retiree withdraws cash periodically to pay his bills, without measuring the magnitude of that risk.

Some research I'm currently coauthoring serves as an example. We compare two consumption-portfolio spending strategies. Each estimates a p(ruin) near 5%. On this basis, we would say that the two strategies are equally risky. However, when scenarios fail using the first strategy, the mean number of underfunded years is about 15. When scenarios fail using the second strategy, the mean number of underfunded years is about 21. The second strategy is riskier because when it fails, it leaves the retiree underfunded for 6 more years on average. This magnitude of risk isn't captured by p(ruin).
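To make the distinction concrete, here is a minimal sketch in Python. The per-scenario numbers are invented for illustration and are not the results of the research mentioned above; they are chosen only so that both strategies fail in 5 of 100 scenarios while differing in how badly they fail. The function name summarize is likewise my own.

```python
def summarize(underfunded_years):
    """Return (p_ruin, mean underfunded years among failed scenarios).

    `underfunded_years` holds one number per simulated scenario:
    0 if the portfolio lasted, otherwise the count of years it fell short.
    """
    failures = [y for y in underfunded_years if y > 0]
    p_ruin = len(failures) / len(underfunded_years)
    mean_shortfall = sum(failures) / len(failures) if failures else 0.0
    return p_ruin, mean_shortfall

# Hypothetical outcomes: 95 successes and 5 failures for each strategy.
strategy_a = [0] * 95 + [13, 14, 15, 16, 17]
strategy_b = [0] * 95 + [19, 20, 21, 22, 23]

print(summarize(strategy_a))  # (0.05, 15.0)
print(summarize(strategy_b))  # (0.05, 21.0)
```

The first number in each pair is identical, which is all p(ruin) reports. Only the second number reveals that one strategy fails harder than the other.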

Another problem with p(ruin) is that it is based on a very limited sample of historical equity returns. Robert Shiller has reconstructed equity returns back to 1871[3], providing a little less than 150 years of data, but this historical data contains very few unique long-term sequences of returns of 30 years or more, which is what we need for retirement studies. Fewer than 150 years cannot hold more than four 30-year periods that don't overlap. We simply don't have enough data to draw statistically significant conclusions about the future probability of ruin. Many argue that only the more recent years of Shiller's historic returns are truly reliable.

Researchers have tried multiple strategies to get around this lack of data. Bengen used overlapping 30-year periods of returns. This strategy is flawed because the first and last years of the equity return time series are each used only once, the second and next-to-last twice, etc., while the returns in the middle of the series are included up to 30 times.
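The uneven weighting is easy to verify by counting. The sketch below assumes a hypothetical 90-year return series (the length is my choice for illustration, not the length of Bengen's data set) and tallies how many rolling 30-year windows include each year.

```python
def window_usage(n_years, window=30):
    """Count how many rolling `window`-year periods contain each year."""
    counts = [0] * n_years
    for start in range(n_years - window + 1):
        for year in range(start, start + window):
            counts[year] += 1
    return counts

counts = window_usage(90)  # hypothetical 90-year series

print(counts[0], counts[1])    # first and second years
print(counts[-2], counts[-1])  # next-to-last and last years
print(max(counts))             # years in the middle of the series
```

With these assumptions the first and last years are each counted once, their neighbors twice, and the middle years 30 times, so whatever happened in the middle of the series dominates the results.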

Another strategy is to generate 30-year series of returns by resampling, or randomly choosing returns from the entire historical data set with replacement. This strategy will provide results similar to the experience of the handful of available unique historical 30-year sequences of returns but doesn't generate "out-of-sample" series.

In other words, it assumes that the limited number of 30-year historical periods of data we have contains all of the information we will ever need to know about future market returns. It is more likely that the future will throw something at us that we have never seen before. Said a third way, our limited set of historical long-term data series has very little predictive power. It can only tell us what might happen in the future if the future is very much like our limited past.
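A small sketch of resampling shows why it cannot step outside the sample. The list of returns below is made up and merely stands in for the historical record, and the function name bootstrap_path is my own. The only library call is random.choices from the Python standard library, which draws with replacement.

```python
import random

# Hypothetical annual real returns standing in for the historical record.
historical_returns = [-0.37, -0.22, -0.09, 0.01, 0.05,
                      0.11, 0.16, 0.21, 0.26, 0.33]

def bootstrap_path(history, years=30, rng=None):
    """Draw `years` annual returns from `history` at random, with replacement."""
    rng = rng or random.Random()
    return rng.choices(history, k=years)

rng = random.Random(2019)  # fixed seed so the sketch is repeatable
path = bootstrap_path(historical_returns, rng=rng)

# Every simulated year is a year that has already happened, so no
# simulated year can be worse than the worst year in the record.
print(all(r in historical_returns for r in path))
print(min(path) >= min(historical_returns))
```

Both checks print True for any seed. Resampling reshuffles the years we have seen; it never invents a year we haven't.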

Let's focus now on a term I just introduced, "sequence of returns." The success or failure of a consumption portfolio is primarily a function of the sequence of the portfolio returns and not of the returns themselves. To quote BigErn, "Precisely what I mean by SRR (sequence of returns risk) matters more than average returns: 31% of the fit is explained by the average return, an additional 64% is explained by the sequence of returns!"[4]

While we can generate realistic market returns from historical data using statistical methods like resampling, we cannot capture the most important characteristic of that data relative to portfolio ruin: the sequence of those returns. Resampling and most Monte Carlo models simply create randomly ordered sequences of returns, and these are often quite unlike the few long sequences we observe in historical data.
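Why order matters can be shown with a few lines of arithmetic. The sketch below uses seven invented annual returns, a $1,000,000 starting balance and $40,000 of annual spending taken at the start of each year. All of those figures, and the function name ending_balance, are assumptions for illustration only.

```python
def ending_balance(returns, start=1_000_000.0, spend=40_000.0):
    """Withdraw `spend` at the start of each year, then apply that year's return."""
    balance = start
    for r in returns:
        balance = max(balance - spend, 0.0) * (1.0 + r)
    return balance

# Hypothetical returns: the same seven years in two different orders.
returns = [-0.20, -0.10, 0.00, 0.05, 0.10, 0.15, 0.25]
bad_first = sorted(returns)
good_first = sorted(returns, reverse=True)

# With no spending, order is irrelevant: the two ending balances agree
# apart from floating-point rounding.
print(ending_balance(bad_first, spend=0.0), ending_balance(good_first, spend=0.0))

# With spending, the retiree who meets the bad years first ends up poorer.
print(ending_balance(bad_first) < ending_balance(good_first))
```

The average return is identical in both orderings. The only thing that changes is when the bad years arrive relative to the withdrawals, and that alone separates the two outcomes.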

This leaves two possibilities. One possibility is that the sequence of market returns is truly purely random, as we most commonly model it, in which case we have been extremely lucky not to have received a catastrophic sequence of returns over the past 150 years. Another possibility, and the one I favor, is that sequences of returns are not purely random but are limited by market forces that we don't yet understand. In that case, we may never see catastrophic sequences of returns, but our models are wrong.

I can't leave this topic without noting that consumption-portfolio failure doesn't require really bad negative returns. A long sequence of sub-par returns will do the trick. The worst-case series of 30-year returns beginning in 1964 that defines the 4% Rule was simply a long period of mostly positive but mediocre real returns.

Not long after the Great Recession, some SWR advocates were quick to note that the market had rebounded rather quickly, supporting the idea of a 4.5% SWR. While this is true, there are two important caveats. First, consumption portfolios recover much more slowly than a market index because we aren't spending from the market index. Second, the Great Recession was a three-year sequence and, as I note in the previous paragraph, portfolio failure typically results from long periods of mediocre returns and not short periods of negative returns. The Great Recession may not portend future portfolio failure for today's recent retirees.

Lastly, I think it is important that we consider the ability of humans to "internalize" probabilities. Clearly, there are some of us like Nate Silver, who can see a probability and intuitively interpret it. Most of us can't.

Most people tend to round small percentages to zero and large percentages to 100. The 2016 presidential election is a perfect example. In his final forecast before the November 8, 2016 election, Nate Silver gave Trump a 28.6% probability of winning and Hillary Clinton a 71.4% probability. Many read this and concluded that Trump had no chance of winning, i.e., they rounded 28.6% to zero and 71.4% to 100%. When Trump won, they were outraged at Silver. I saw a poster at the Women's March saying, "I will never believe Nate Silver again."

The election was a one-time event and clearly not random. Silver's probabilities weren't based on counting who won past elections between Trump and Clinton. They represented Silver's belief that these were the odds and he believed that Trump's chances of winning were significantly greater than zero. It appears that many people didn't understand that.

This raises the issue of one-time events like a presidential election or your retirement. It's simple enough to look at a roomful of one hundred 65-year-olds and say that a 4% Rule strategy means five of them will outlive their savings, but it is impossible to say in advance which five it will be. It is, therefore, difficult to internalize how 5% of retirees outliving their savings translates into your individual probability of failure.

(This is a poor analogy in one sense but I hope it makes the point. The 4% Rule says that 5% of 30-year periods will result in a failed portfolio, so if everyone in that room were 65 years old, they presumably all would go broke or none would. They will all experience the same future market returns.)

Your retirement differs from the 2016 election, although both are one-time events. We can use historical market data to count how often you might have succeeded in the past, given some withdrawal rate. The problem is that we don't have nearly enough of that data. Even if we did, we could only predict how many retirees would fail and not whether you would be one of them.

The point of our ability or inability to intuitively understand probabilities is that many people will round a 5% chance of ruin to zero and feel perfectly safe, while others (like me) will feel that a 1-in-20 chance of ending up destitute in their dotage is completely unacceptable. In either case, p(ruin) is frequently problematic because of our inability to intuit it.

There are a couple of other shortcomings of p(ruin) that I will briefly mention in conclusion. Many argue that no retiree would ever do what the 4% Rule requires, that is, continue to spend the same amount from a consumption portfolio even when it is obviously failing. First of all, I would note that if the retiree doesn't do this, then the 4% Rule is not predictive at all because the retiree isn't adhering to the strategy. But I also have anecdotal evidence that there are rational reasons a retiree would continue spending the same amount.

At some point, a retiree with a failing portfolio will reach an amount of spending that is necessary to meet non-discretionary expenses. Spending more than the portfolio can sustain in order to pay necessary expenses will then be the rational response, even if it will undoubtedly lead to portfolio depletion in the near future (see Why a Rational Retiree Might Keep Going Back to that ATM).

If the 4% Rule says I can spend no more than $1,000 or else I will probably go broke in the near future, but my necessary expenses total $1,500, I will spend the $1,500. In this scenario of continued fixed spending, the portfolio either is chaotic or behaves chaotically, and it doesn't matter much which (see Retirement Income and Chaos Theory).

Economist Laurence Kotlikoff believes the 4% Rule estimates both the wrong amount to save and the wrong amount to spend compared to an economics approach. He explains it better than I could in The 4% Retirement-Asset Spend-Down Rule Is Rubbish.[5]

Lastly, probability of ruin is a number that we intentionally try to make as small as practical. It's a measure of "tail risk," or the area of low-probability outcomes of a model. Nassim Taleb, in testimony before Congress no less[6], stated that "the more remote the event, the less we can predict it." Taleb goes on to say, "Financial risks, particularly those known as Black Swan events cannot be measured in any possible quantitative and predictive manner; they can only be dealt with non-predictive ways." But predicting unlikely events is precisely what p(ruin) purports to do.

The 4% Rule has achieved cult status to the extent that I hear retirees with virtually no other knowledge of retirement finance casually refer to it as if it is a universal law. It is not. It is a questionable but unfortunately prevalent retirement finance metric.

A better approach is recommended by life-cycle economics (see, for example, Risk Less and Prosper by Zvi Bodie), sometimes referred to as "safety-first." The safety-first strategy is to assume that portfolio failure is an unacceptable outcome with a (perhaps) small probability, one that Taleb would say is unquantifiable. It deals with the risk of portfolio depletion "in non-predictive ways." The retiree is encouraged to plan for an acceptable standard of living in the event of that outcome without having to roll the dice and simply hope the future looks a lot like the past.


[1] The 4 Percent Rule Is Not Safe in a Low-Yield World, Michael Finke, Ph.D., CFP®; Wade D. Pfau, Ph.D., CFA; and David M. Blanchett, CFP®, CFA.

[2] Shiller PE Ratio.

[3] Online Data, Robert Shiller, Yale Economics.

[4] The Ultimate Guide to Safe Withdrawal Rates – Part 15: More Thoughts on Sequence of Return Risk.

[5] The 4% Retirement-Asset Spend-Down Rule Is Rubbish, Laurence Kotlikoff.

[6] The Risks of Financial Modeling: VAR and the Economic Meltdown, House Subcommittee on Investigations and Oversight, GPO.