The Retirement Café: The Prevalent but Problematic Probability of Ruin

Saturday, September 28, 2019

The Prevalent but Problematic Probability of Ruin

About 10 years ago, in the course of a conversation with two retirement researchers whom I greatly respect, someone mentioned the 4% Rule. One of those researchers said, "William Bengen did great work showing us that sequence risk exists but trying to turn it into a retirement plan was a huge mistake."

Bengen's work gave us the 4% Rule, derived from the so-called probability of ruin. Probability of ruin, or p(ruin) for short, is the estimated probability that a retiree spending a fixed real dollar amount from a volatile portfolio will outlive her portfolio. Somehow, despite its many shortcomings, p(ruin) has become the most common metric in retirement planning.

The 4% Rule provides a "sustainable withdrawal rate" (SWR) that a retiree can supposedly spend from a volatile portfolio with a 95% probability of not outliving his savings. How much is the SWR? Bengen estimated a range around 4.4%. Wade Pfau, Michael Finke and David Blanchett[1] found that the SWR is currently closer to 3%, primarily due to a low-interest-rate regime. If they are correct, that would result in annual withdrawals nearly 32% lower than Bengen's estimate. That's quite a range.
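To make that gap concrete, here is a small sketch of the arithmetic. The $1,000,000 starting balance is a hypothetical figure chosen only for illustration; the two rates are the ones quoted above.

```python
# Relative cut in annual spending if the SWR is 3% rather than 4.4%.
portfolio = 1_000_000      # hypothetical starting balance, for illustration only
bengen_swr = 0.044         # Bengen's estimate quoted above
low_yield_swr = 0.03       # Pfau, Finke and Blanchett's estimate quoted above

bengen_spend = portfolio * bengen_swr
low_yield_spend = portfolio * low_yield_swr
relative_cut = (bengen_spend - low_yield_spend) / bengen_spend

print(f"{bengen_spend:,.0f} vs {low_yield_spend:,.0f} per year: {relative_cut:.1%} lower")
```

The relative cut does not depend on the starting balance, only on the ratio of the two rates.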

Some question the implications of that research, notably Michael Kitces, but interestingly, William Bengen believes that valuations are probably important and that "Pfau may be on to something."

The Shiller CAPE 10 ratio[2], a measure of stock market valuation, was around 10 when Bengen's data series began in 1926 and today suggests a much higher market valuation of around 30. A higher CAPE 10 suggests lower future market returns and vice versa. Had the market return data series studied by Bengen begun when valuations were relatively high, the results may have suggested a lower SWR. (It is not uncommon for economics studies to improperly ignore initial conditions like market valuations.)

I will toss yet another monkey wrench into these analyses and note that both studies make assumptions about future asset returns, so neither can be proven correct ex-ante. Still, Pfau et al. provide evidence that Bengen's SWR may be overestimated. This uncertainty is the essence of risk.

What are these shortcomings of p(ruin)? Let's start with p(ruin) being a one-dimensional measure of risk. By that I mean it estimates the probability (risk) of outliving a consumption portfolio, which I will define as a volatile portfolio of investments from which a retiree withdraws cash periodically to pay his bills, without measuring the magnitude of that risk.

Some research I'm currently coauthoring serves as an example. We compare two consumption-portfolio spending strategies. Each estimates a p(ruin) near 5%. On this basis, we would say that the two strategies are equally risky. However, when scenarios fail using the first strategy, the mean number of underfunded years is about 15. When scenarios fail using the second strategy, the mean number of underfunded years is about 21. The second strategy is riskier because when it fails, it leaves the retiree underfunded for 6 more years on average. This magnitude of risk isn't captured by p(ruin).
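The difference between the probability of failure and its magnitude can be sketched in a few lines. This is not the model from the research mentioned above. It is a toy simulation that assumes independent, normally distributed real returns, and the function names and parameter values (`underfunded_years`, `ruin_metrics`, a 5% mean, a 12% standard deviation) are hypothetical choices for illustration.

```python
import random

def underfunded_years(real_returns, start_balance, annual_spend):
    """Count the years in which a fixed real withdrawal cannot be paid in full."""
    balance = start_balance
    shortfall_years = 0
    for r in real_returns:
        if balance < annual_spend:
            shortfall_years += 1       # portfolio cannot cover this year's spending
            balance = 0.0
        else:
            balance = (balance - annual_spend) * (1 + r)
    return shortfall_years

def ruin_metrics(n_paths=5000, years=30, mean=0.05, stdev=0.12,
                 start_balance=100.0, annual_spend=4.0, seed=1):
    """Return (p_ruin, mean underfunded years among the paths that failed)."""
    rng = random.Random(seed)          # seeded so the sketch is repeatable
    failures = []
    for _ in range(n_paths):
        # Assumption: i.i.d. normal real returns, a common simplification.
        path = [rng.gauss(mean, stdev) for _ in range(years)]
        short = underfunded_years(path, start_balance, annual_spend)
        if short > 0:
            failures.append(short)
    p_ruin = len(failures) / n_paths
    mean_short = sum(failures) / len(failures) if failures else 0.0
    return p_ruin, mean_short

p_ruin, mean_short = ruin_metrics()
print(f"p(ruin) = {p_ruin:.1%}, mean underfunded years when failing = {mean_short:.1f}")
```

Two strategies can report the same first number and very different second numbers, which is the point: p(ruin) alone says nothing about how bad a failure is.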

Another problem with p(ruin) is that it is based on a very limited sample of historical equity returns. Robert Shiller has reconstructed equity returns back to 1871[3], providing a little less than 150 years of data, but this historical data contains very few unique long-term sequences of returns of 30 years or more, which are what we need for retirement studies. We simply don't have enough data to draw statistically significant conclusions about the future probability of ruin. Many argue that only the more recent years of Shiller's historical returns are truly reliable.

Researchers have tried multiple strategies to get around this lack of data. Bengen used overlapping 30-year periods of returns. This strategy is flawed because the first and last years of the equity return time series are each used only once, the second and next-to-last twice, etc., while the returns in the middle of the series are included up to 30 times.
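The uneven weighting is easy to verify by counting. The sketch below assumes a hypothetical annual series running from 1926 through 2018 and tallies how many rolling 30-year windows contain each year; `window_usage` is an illustrative name, not anything from Bengen's papers.

```python
from collections import Counter

def window_usage(first_year, last_year, window=30):
    """Count how many overlapping windows of the given length include each year."""
    usage = Counter()
    # The last window that fits starts at last_year - window + 1.
    for start in range(first_year, last_year - window + 2):
        for year in range(start, start + window):
            usage[year] += 1
    return usage

usage = window_usage(1926, 2018)       # hypothetical 93-year series
for year in (1926, 1927, 1970, 2017, 2018):
    print(year, usage[year])
```

The end years appear once, their neighbors twice, and the middle years thirty times, so the middle of the series dominates any statistic computed across the windows.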

Another strategy is to generate 30-year series of returns by resampling, or randomly choosing returns from the entire historical data set with replacement. This strategy will provide results similar to the experience of the handful of available unique historical 30-year sequences of returns but doesn't generate "out-of-sample" series.

In other words, it assumes that the limited number of 30-year historical periods of data we have contain all of the information we will ever need to know about future market returns. It is more likely that the future will throw something at us that we have never seen before. Said a third way, our limited amount of historical long-term data series has very little predictive power. It can only tell us what might happen in the future if the future is very much like our limited past.
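A minimal sketch of resampling shows why it cannot go "out of sample." The eight returns in `history` are made-up values standing in for a real historical series, and `resample_path` is a hypothetical helper name.

```python
import random

def resample_path(historical_returns, years=30, rng=random):
    """Build one synthetic path by drawing annual returns with replacement."""
    return rng.choices(historical_returns, k=years)

# Made-up annual real returns standing in for the historical record.
history = [0.12, -0.08, 0.21, 0.03, -0.15, 0.09, 0.17, -0.02]

rng = random.Random(42)                # seeded so the sketch is repeatable
path = resample_path(history, years=30, rng=rng)

# Every simulated year is a copy of some year that already happened.
print(set(path) <= set(history))
print(min(path) >= min(history), max(path) <= max(history))
```

No single year in a resampled path can be worse than the worst year on record or better than the best, and each draw is independent of the one before it.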

Let's focus now on a term I just introduced, "sequence of returns." The success or failure of a consumption portfolio is primarily a function of the sequence of the portfolio returns and not of the returns themselves. To quote BigErn at EarlyRetirementNow.com, "Precisely what I mean by SRR (sequence of returns risk) matters more than average returns: 31% of the fit is explained by the average return, an additional 64% is explained by the sequence of returns!"[4]
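Here is a small illustration of why the order matters once withdrawals begin. The returns are hypothetical round numbers, not historical data, and `ending_balance` is an illustrative helper that withdraws a fixed real amount at the start of each year.

```python
import math

def ending_balance(real_returns, start_balance=100.0, annual_spend=4.0):
    """Withdraw a fixed real amount each year, then apply that year's return."""
    balance = start_balance
    for r in real_returns:
        # A depleted portfolio stays at zero.
        balance = max(balance - annual_spend, 0.0) * (1 + r)
    return balance

# Hypothetical real returns: the same 30 values in two different orders.
good_years_first = [0.15] * 10 + [0.02] * 10 + [-0.05] * 10
bad_years_first = list(reversed(good_years_first))

# Without withdrawals, the order has no effect on the ending balance.
no_spend_a = ending_balance(good_years_first, annual_spend=0.0)
no_spend_b = ending_balance(bad_years_first, annual_spend=0.0)
print(math.isclose(no_spend_a, no_spend_b))

# With withdrawals, the order decides whether the portfolio survives.
print(round(ending_balance(good_years_first), 2))
print(ending_balance(bad_years_first))
```

Both orderings have the same average return, yet with spending of 4 per year the good-years-first path ends above its starting balance while the bad-years-first path is depleted before the strong returns arrive.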

While we can generate realistic market returns from historical data using statistical methods like resampling, we cannot capture the most important characteristic of that data relative to portfolio ruin, the sequence of those returns. Resampling and most Monte Carlo models simply create random uniform sequences of returns and these are often quite unlike the few long sequences we observe from historical data.

This leaves two possibilities. One possibility is that the sequence of market returns is purely random, as we most commonly model it, in which case we have been extremely lucky not to have received a catastrophic sequence of returns over the past 150 years. Another possibility, and the one I favor, is that sequences of returns are not purely random but are limited by market forces that we don't yet understand. In that case, we may never see catastrophic sequences of returns, but our models are wrong.

I can't leave this topic without noting that consumption-portfolio failure doesn't require really bad negative returns. A long sequence of sub-par returns will do the trick. The worst-case series of 30-year returns beginning in 1964 that defines the 4% rule was simply a long period of mostly-positive but mediocre real returns.

Not long after the Great Recession, some SWR advocates were quick to note that the market had rebounded rather quickly, supporting the idea of a 4.5% SWR. While this is true, there are two important caveats. First, consumption portfolios recover much more slowly than a market index because we aren't spending from the market index. Second, the Great Recession was a three-year sequence and, as I noted in the previous paragraph, portfolio failure typically results from long periods of mediocre returns and not short periods of negative returns. The Great Recession may not portend future portfolio failure for today's recent retirees.

Lastly, I think it is important that we consider the ability of humans to "internalize" probabilities. Clearly, there are some of us like Nate Silver, who can see a probability and intuitively interpret it. Most of us can't.

Most people tend to round small percentages to zero and large percentages to 100. The 2016 presidential election is a perfect example. On November 8, 2016, Nate Silver published a prediction that Trump had a 28.6% probability of winning the election and Hillary Clinton had a 71.4% probability. Many read this and concluded that Trump had no chance of winning, i.e., they rounded 28.6% to zero and 71.4% to 100%. When Trump won, they were outraged at Silver. I saw a poster at the Women's March saying, "I will never believe Nate Silver again."

The election was a one-time event and clearly not random. Silver's probabilities weren't based on counting who won past elections between Trump and Clinton. They represented Silver's belief that these were the odds and he believed that Trump's chances of winning were significantly greater than zero. It appears that many people didn't understand that.

This raises the issue of one-time events like a presidential election or your retirement. It's simple enough to look at a roomful of one hundred 65-year-olds and say that a 4% Rule strategy means five of them will outlive their savings, but it is impossible to say in advance which five it will be. It is, therefore, difficult to internalize how 5% of retirees outliving their savings translates into your individual probability of failure.

(This is a poor analogy in one sense but I hope it makes the point. The 4% Rule says that 5% of 30-year periods will result in a failed portfolio, so if everyone in that room were 65 years old, they presumably all would go broke or none would. They will all experience the same future market returns.)

Your retirement differs from the 2016 election, although both are one-time events. We can use historical market data to count how often you might have succeeded in the past, given some withdrawal rate. The problem is that we don't have nearly enough of that data. Even if we did, we could only predict how many retirees would fail and not whether you would be one of them.

The point of our ability or inability to intuitively understand probabilities is that many people will round a 5% chance of ruin to zero and feel perfectly safe, while others (like me) will feel that a 1-in-20 chance of ending up destitute in their dotage is completely unacceptable. In either case, p(ruin) is frequently problematic because of our inability to intuit it.

There are a couple of other shortcomings of p(ruin) that I will briefly mention in conclusion. Many argue that no retiree would ever do what the 4% Rule requires, that is, continue to spend the same amount from a consumption portfolio even when it is obviously failing. First of all, I would note that if the retiree doesn't do this, then the 4% Rule is not predictive at all because the retiree isn't adhering to the strategy. But I also have anecdotal evidence that there are rational reasons a retiree would continue spending the same amount.

At some point, a retiree with a failing portfolio will reach an amount of spending that is necessary to meet non-discretionary expenses and spending too much to pay necessary expenses will be the rational response even if it will undoubtedly lead to portfolio depletion in the near future (see Why a Rational Retiree Might Keep Going Back to that ATM).

If the 4% Rule says I can spend no more than $1,000 or else I will probably go broke in the near future but my necessary expenses total $1,500, I will spend the $1,500. In this scenario of continued fixed spending, the portfolio either is chaotic or behaves chaotically, and it doesn't matter much which (see Retirement Income and Chaos Theory).

Economist Laurence Kotlikoff believes the 4% Rule estimates both the wrong amount to save and the wrong amount to spend compared to an economics approach. He explains it better than I could in The 4% Retirement-Asset Spend-Down Rule Is Rubbish.[5]

Lastly, probability of ruin is a number that we intentionally try to make as small as practical. It's a measure of "tail risk", or the area of low-probability outcomes of a model. Nassim Taleb, in testimony before Congress no less[6], stated that "the more remote the event, the less we can predict it." Taleb goes on to say, "Financial risks, particularly those known as Black Swan events cannot be measured in any possible quantitative and predictive manner; they can only be dealt with non-predictive ways." But, predicting unlikely events is precisely what p(ruin) purports to do.

The 4% Rule has achieved cult status to the extent that I hear retirees with virtually no other knowledge of retirement finance casually refer to it as if it were a universal law. It is not. It is a questionable but unfortunately prevalent retirement finance metric.

A better approach is recommended by life-cycle economics (see, for example, Risk Less and Prosper by Zvi Bodie), sometimes referred to as "safety-first." The safety-first strategy is to assume that portfolio failure is a (perhaps) small — Taleb would say unquantifiable — probability of an unacceptable outcome. It deals with the risk of portfolio depletion "in non-predictive ways." The retiree is encouraged to plan for an acceptable standard-of-living in the event of that outcome without having to roll the dice and simply hope the future looks a lot like the past.

REFERENCES

[1] The 4 Percent Rule Is Not Safe in a Low-Yield World, Michael Finke, Ph.D., CFP®; Wade D. Pfau, Ph.D., CFA; and David M. Blanchett, CFP®, CFA.

[2] Shiller PE Ratio, Multpl.com.

[3] Online Data, Robert Shiller, Yale Economics.

[4] The Ultimate Guide to Safe Withdrawal Rates – Part 15: More Thoughts on Sequence of Return Risk, EarlyRetirementNow.com.

[5] The 4% Retirement-Asset Spend-Down Rule Is Rubbish, Laurence Kotlikoff, Forbes.com.

[6] The Risks of Financial Modeling: VAR and the Economic Meltdown, House Subcommittee on Investigations and Oversight, GPO.

9 comments:

Anonymous, October 1, 2019 at 11:07 AM
I found your blog recently and have read through your posts with interest. I appreciate your sharing your research and insights with those of us hoping to retire with a good degree of security. Interestingly, with regard to Kotlikoff's article, when I put my data into his MaxFi planner, it comes up with about an 8% withdrawal rate when Social Security is included in the calculations. (I'm looking at a 30 year retirement). When I subtract out SS, it reduces to about a 4.2% withdrawal rate, not so different from the 4% rule. Of course, I take his point, that it entirely depends on individual circumstances. In my case his Monte Carlo gives a 1 in 510 trajectory of failure with a 50:50 equity:bond split. The lower standard of living in the worst trajectories that don't fail is ca. 2.5% of assets, which is within my ability to maintain but might not be for others. Clearly, this requires more attention than blind application of a SWR rule.
Anonymous, October 2, 2019 at 10:11 AM
I find the recent usage of the 4% safe withdrawal rate in the financial planning community quite frustrating. I think it highlights many of the challenges going on in planning for retirement.

I consider the 4% Rule developed by Bill Bengen in 1994 to be one of the most important findings in personal finance, up there with Charlie Ellis's "The Loser's Game" and Bogle's development of the S&P 500 index fund. Not because it was "right" but because it uncovered the fundamental sequence of returns risk in retirement income, which was generally not previously considered.

Bill Bengen moved on, publishing additional articles, and by 2001 was leaning towards a dynamic "floor and ceiling" approach as appropriate for most cases. Unfortunately, most of personal finance doesn't appear to have moved on with him but is still citing his initial 1994 finding of 4% as a "Rule" despite its data set limitations, which people like Pfau have been pointing out as they update it using recent data.

Another major issue with it is that the data available to Bengen was seriously limited (which he acknowledges). Very few portfolios are structured just with the Ibbotson database of US common stocks and Treasury bonds. Most people's portfolios now will have a significant percentage of foreign stocks. Bengen tried to model that in the late 90s and concluded the data was insufficient to do much modeling. We have some additional data now, but it is still a pretty small data set to look at 30-year time frames. It is also typically from the 1970s to now and misses many of the big shocks that would have impacted international investing in the 20th century (world wars, rise of communism, nationalization of industries in some countries, etc.). So ultimately, inclusion of international stocks and bonds is likely to be beneficial if sufficiently diversified, but is generally more of an article of faith than data-driven.

So in general, my perspective on the 4% safe withdrawal rate is that it is a useful tool for an off-the-top-of-the-head evaluation of the potential long-term income from a portfolio with a certain current value, but it should be viewed with great skepticism in actual use. It is basically a crude estimate based on a limited data set of a limited portion of a typical portfolio, so it will have significant error bars around any conclusions. In today's low interest rate, high stock valuation environment, it might even be dangerously over-optimistic and could lead people to ruin (we'll find out in 30 years).

My big disappointment with the personal finance industry is that it generally appears to keep flogging this over-simplistic, old, disreputable tool as the Gold Standard. It appears to be just a few solo practitioners and academics that are really looking seriously at realistic and usable alternatives. It's almost as if the medical profession were still resorting to bleeding for addressing many maladies.
David Merkel, October 2, 2019 at 1:22 PM
I've written about this topic a lot, and my shorthand answer on withdrawal rates is that they can't be too far above 1% over the 10-year Treasury yield. If you want the link, I can provide it. When the CIO of the Johns Hopkins endowment heard that one, she said "I have enough problems getting the trustees to think lower than 4%."

But I really came to write about return sequences, and why resampling is bogus. It is just another way of saying the future will be like the past, while ignoring valuations, which create a degree of mean reversion. The sequence of returns is not random. Processes that generate returns are not random. We may not be able to forecast the returns, but they are not random. It also assumes that the macroeconomics of the past will be that of the future.

Simulation analyses are tough, and almost always reflect the idea that the future will be like the past, with some slight modification like the current yield curves and valuations.

Good article.
Anonymous, October 6, 2019 at 1:19 PM
Your example of the Trump-Clinton election highlights the fact that in the case of "safe withdrawal rate" economists are providing probabilities whereas would-be retirees are looking for accurate predictions applicable to them. From a personal viewpoint a one in twenty likelihood of bankruptcy is very scary, but in population terms that meets the standard for reliable advice.

The problem is using a one-component model (a single withdrawal rate) for a complicated situation. If you turn the situation on its head, while a 5% probability of failure to meet requirements is the scary thing to consider, that also represents something like a 90% probability of failure by having worked longer than you needed to create a retirement fund (or forgoing more while working in order to save into the fund).

Unfortunately, having two components will greatly complicate model building, let alone provide advice that can be summarised in a headline. But surely it is time the experts moved to trying it, despite the shortcomings of back testing on overlapping historical periods, to establish approaches involving a baseline withdrawal for the anticipated needs of daily life (ideally obtained from social security, plus annuities or bond ladders) and a discretionary withdrawal for the desired but not absolutely essential quality of life. My guess is it would be possible to come up with a strategy that, like the 4% rule, required a retirement fund of 25 times desired expenditure but would have a minuscule failure rate in the bankruptcy direction.
Anonymous, October 18, 2019 at 11:13 AM
I think something that is not factored into the models is that we have somewhat random stock market returns that are coupled with multi-decade secular trends in interest rates and economic environments. We can see the secular trends in the rearview mirror but it is hard to predict what it will be for the next couple of decades.

The late 1800s and early 1900s were an unregulated financial system with frequent booms and "panics". Pretty much anything associated with money careened around precipitously during that period and it is probably the most random period (at least in the US) in our financial history.

That period culminated in the Great Depression and WW II, out of which arose the modern financial system with regulation, strong central banks, etc. Stocks, real estate, and interest rates had low valuations in the mid-30s and steadily rose for decades, culminating in the inflationary 70s and double-digit interest rates.

Stocks restarted at relatively low values in the early 80s, but this time interest rates were very high. The next 35 years were a steady decline in interest rates to the lowest in US history and negative rates in Europe and Japan. Stocks have gyrated wildly, but have maintained a pretty high valuation plateau over the past 20 years.

However, over the past 20 years, financial systems have deregulated and not coincidentally, we are seeing much more chaos in anything to do with money while productivity growth has been systematically lower than the previous 50 years.

So I have been very cautious about any predictions I see based on data from the post WW II period (the best quality data) or the past 35 years (secular interest rate decline and demographic bulge of post WW II babies). The focus on deregulation, nationalism, US withdrawing from the world, etc. may make the next 30 years look more like the 50 years prior to WW II than the 50 years after. This is just speculation, but it is why I am leery of the "permanently high plateau" and "this time is different" crowd of predictions.

The one thing I am pretty certain about is that 2% interest rates and relatively high PE ratios make achieving historical average returns in stocks, bonds, or diversified portfolios highly unlikely over at least the next 10 years and possibly 30 years. So my working assumption has been that, at best, we will have an OK sequence of returns over the next 10 years with the distinct possibility of something like the 1970s or late 1800s/early 1900s lurking in the wings. I think the central banks have learned enough that they should be able to avoid 1932 (2008 was primed to go there).

So I think Monte Carlo simulations, etc. assuming average conditions are unlikely to be successful. However, a dynamic withdrawal system with the ability to be conservative in spending at times with portfolio growth assumed to be low single digits will likely be the optimum approach for somebody retiring in the next few years.