The Retirement Café: Monte Carlo and Tales of Fat Tails

I recently read a white paper[1] claiming to show that Monte Carlo (MC) simulation "creates fat tails" and suggesting that constant-dollar withdrawals (the "4% Rule") are historically 100% safe.

Before you log onto E*TRADE for that stock-buying binge, let me explain how I come to a totally different conclusion.

The paper asserts that the reason Monte Carlo models produce different results than the historical data model is the absence of mean reversion in the paper's MC model or perhaps a general flaw in the Monte Carlo technique. The paper presents no statistical evidence, however, of either fat tails or mean reversion and I can't find any in the paper or in my own MC models.

Let's start with a definition of "fat tails." The term has multiple meanings[2] but in this context, it describes a sample that is more likely to include extreme draws than a normal distribution would predict. A few extreme draws from a normal distribution aren't evidence of fat tails; they are simply evidence of tails.

For example, it is possible (though improbable) to draw an annual market return of 80% from a normal distribution with a mean of 5% and a standard deviation of 12% because the tails of a normal distribution extend infinitely in both directions. A single draw, however, tells us nothing about the probability of extreme draws, and that probability is what defines fat tails. If our model were to produce many extreme draws – more than a normal distribution would predict – then we would have evidence of fat tails. There are also statistical measures that indicate fat tails, though the paper doesn't report any.[2]
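To put rough numbers on that example, here is a minimal sketch using only Python's standard library. It computes how likely an 80% draw is under the normal distribution described above and applies one common fat-tail measure, excess kurtosis, to a sample of simulated draws. The choice of kurtosis, the sample size, and the seed are my assumptions for illustration; the paper reports no such measure.

```python
import math
import random
import statistics

MEAN, SD = 0.05, 0.12  # the example above: 5% mean, 12% standard deviation

def normal_upper_tail(x, mean=MEAN, sd=SD):
    """P(X >= x) for a normal distribution, via the complementary error function."""
    z = (x - mean) / sd
    return 0.5 * math.erfc(z / math.sqrt(2))

def excess_kurtosis(sample):
    """Sample excess kurtosis: near 0 for normal data, positive for fat tails."""
    m = statistics.fmean(sample)
    n = len(sample)
    m2 = sum((v - m) ** 2 for v in sample) / n
    m4 = sum((v - m) ** 4 for v in sample) / n
    return m4 / m2 ** 2 - 3.0

rng = random.Random(42)  # arbitrary seed, assumed only for repeatability
draws = [rng.gauss(MEAN, SD) for _ in range(100_000)]

p_80 = normal_upper_tail(0.80)
kurt = excess_kurtosis(draws)
print(f"P(annual return >= 80%) = {p_80:.2e}")
print(f"excess kurtosis of the normal draws = {kurt:.3f}")
```

An 80% return sits 6.25 standard deviations above the mean, so its probability is tiny but not zero. A sample with genuinely fat tails would show excess kurtosis well above zero, which is the kind of evidence one would want to see before claiming a model "creates fat tails."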

The major flaw in the analysis appears to be the use of a naive Monte Carlo model based solely on normally-distributed market returns. (I say "appears" because the paper reveals little about how the model was constructed, but the results are telling.) Portfolio survivability is too complex to be modeled by such a simple strategy and it is wrong to blame "Monte Carlo" for the results of a poorly constructed model that happens to use Monte Carlo.

David Blanchett and Wade Pfau wrote on this topic in 2014[3]:

"But this argument is like saying all cars are slow. There are no constraints to Monte Carlo simulation, only constraints users create in a model (or constraints that users are forced to deal with when using someone else's model). Non-normal asset-class returns and autocorrelations can be incorporated into Monte Carlo simulations, albeit with proper care. Like any model, you need quality inputs to get quality outputs."

There are no normal distributions in the real world, only samples that seem likely to have been drawn from a normal distribution. Historical annual market returns, as you can see in the following histogram, appear to be such draws.

The historical data model doesn't use this distribution to create sequences of returns, though. It uses rolling 30-year sequences of these returns, changing only the first and last of 30 years for each new sequence, which distorts the distribution significantly, as shown below. That red distribution doesn't look very normal, does it? Rolling sequences also reduce sequence risk, so we won't find as much as we might otherwise. MC-generated sequences of market returns will be independent and that is a primary reason that MC provides different results than the historical data model, not fat tails or mean reversion.
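The overlap between rolling sequences is easy to see by counting. The sketch below builds every rolling 30-year window from a hypothetical 90-year history (the year range is an assumption chosen only for illustration) and counts how many windows each calendar year lands in.

```python
from collections import Counter

def rolling_windows(years, length=30):
    """Every consecutive `length`-year window, each starting one year later."""
    return [years[i:i + length] for i in range(len(years) - length + 1)]

years = list(range(1928, 2018))  # hypothetical 90-year history
windows = rolling_windows(years)

# Adjacent windows differ only in their first and last year.
shared = len(set(windows[0]) & set(windows[1]))

# Count how many windows each calendar year appears in.
appearances = Counter(y for w in windows for y in w)

print(len(windows), "windows; adjacent windows share", shared, "years")
print("the first year appears in", appearances[years[0]], "window")
print("a middle year (1970) appears in", appearances[1970], "windows")
```

With these assumed dates, years in the middle of the record are reused in 30 different sequences while the earliest and latest years appear only once, so the rolling sequences are far from independent samples.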

While our only available sample of historical annual returns data seems likely to have been drawn from a normal distribution, not all draws from that normal distribution create a realistic market return sample. A draw from a normal distribution of annual market returns might legitimately represent a theoretical 120% annual market loss or gain but the former would be impossible for a real portfolio and the latter extremely unlikely.

These are not draws that should be used by an MC model of retirement portfolio returns, at least not when the goal is to measure tail risk. As Blanchett and Pfau note above, "There are no constraints to Monte Carlo simulation, only constraints users create in a model. . ." There is no constraint that says an MC model must use unrealistic scenarios simply because they are drawn from a normal distribution. This MC model is meant to model real-life capital markets, not a distribution that exists only in theory.
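One simple way to keep such draws out of a model is to redraw whenever a return falls outside a plausible range (rejection sampling). This is only a sketch: the function name and the bounds of -50% and +60% are hypothetical, not taken from the paper or from any particular model, and the mean and standard deviation reuse the earlier example.

```python
import random

def bounded_return(rng, mean=0.05, sd=0.12, low=-0.50, high=0.60):
    """Draw a normal return, redrawing until it falls inside [low, high].

    The bounds are illustrative assumptions; a real model would need to
    justify them from empirical market data.
    """
    while True:
        r = rng.gauss(mean, sd)
        if low <= r <= high:
            return r

rng = random.Random(7)  # arbitrary seed
sample = [bounded_return(rng) for _ in range(10_000)]
print(f"min = {min(sample):.3f}, max = {max(sample):.3f}")
```

Truncating this way slightly changes the mean and variance of the draws, so the bounds are a modeling choice to be made deliberately rather than a free fix.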

The sequence of market returns is critical to portfolio survivability. The historical data shows no strings of more than four consecutive annual market losses or more than 15 consecutive annual gains. This isn't predicted by a normal distribution in which the sequence of returns is purely random but it can be modeled with Monte Carlo. There appear to be market forces that constrain normally-distributed market return sequences and a model based solely on a normal distribution of market returns will not account for these market forces.
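A run-length check like the one described above is straightforward to code. The sketch below measures the longest string of consecutive losses in a sequence and estimates how often purely independent normal draws exceed a given limit. The function names, the trial count, and the 5%/12% return parameters are assumptions for illustration.

```python
import random

def longest_loss_run(returns):
    """Length of the longest string of consecutive negative returns."""
    longest = current = 0
    for r in returns:
        current = current + 1 if r < 0 else 0
        longest = max(longest, current)
    return longest

def share_exceeding(max_run=4, years=30, trials=5_000, seed=1):
    """Fraction of independent normal sequences with a loss run longer than max_run."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(trials):
        seq = [rng.gauss(0.05, 0.12) for _ in range(years)]
        if longest_loss_run(seq) > max_run:
            hits += 1
    return hits / trials

print(longest_loss_run([0.10, -0.02, -0.05, 0.03, -0.01]))  # 2
print(f"share of sequences with a run of 5+ losses: {share_exceeding():.3f}")
```

The same helper could be used as a filter, discarding or redrawing any simulated sequence whose longest loss run exceeds whatever limit the modeler considers realistic.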

Blanchett and Pfau note that autoregression can be incorporated into MC models. This is important for interest rates and inflation rates, which tend to be persistent. Mean reversion, or "long-term" memory of market returns, can also be modeled if one has a strong opinion regarding the existence of mean reversion in the stock market and a strong opinion of the lag time. The authors further note that a proper MC retirement model also incorporates random life expectancy rather than assuming fixed 30-year retirements.
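For readers unfamiliar with autoregression, here is a minimal first-order (AR(1)) sketch of a persistent series such as inflation. Each year's value is pulled partway back toward a long-run level and then hit with a random shock. Every parameter value below is a hypothetical placeholder, not an estimate from data or from Blanchett and Pfau's article.

```python
import random

def ar1_path(rng, years=30, long_run=0.03, phi=0.6, shock_sd=0.01, start=0.03):
    """Simulate x[t] = long_run + phi * (x[t-1] - long_run) + shock.

    phi between 0 and 1 controls persistence: values near 1 mean this
    year's rate looks a lot like last year's. All defaults are assumed.
    """
    path, x = [], start
    for _ in range(years):
        x = long_run + phi * (x - long_run) + rng.gauss(0.0, shock_sd)
        path.append(x)
    return path

rng = random.Random(3)  # arbitrary seed
inflation = ar1_path(rng, start=0.05)
print([round(v, 3) for v in inflation[:5]])
```

Setting phi to zero recovers independent draws around the long-run level, which is effectively what a naive model assumes.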

In short, the things the paper complains about "Monte Carlo" not doing are all things an MC model can do but the researcher's model simply doesn't.

An MC model that limits market returns and sequences of returns to appropriately reflect empirical market performance will eliminate most of the anomalies cited in the white paper but it raises another concern: the paper's analysis appears to be a comparison of the historical data model results to a single MC simulation.

I am referring to the paper's citation of a (single) maximum terminal portfolio value (TPV) of "$26M" generated by the MC model and of a single probability of failure. MC models should provide a distribution of possible maximum TPVs and probabilities of ruin, not a single result, and that requires running the model many times.

Running the MC model once might produce a maximum TPV of $26M but a second run with different random market returns might produce a maximum TPV of $6M. We run the MC model many times to estimate how likely various TPVs and probabilities of ruin are. There is no single answer.

(To explain more simply, I have a basic MC probability of ruin model much like the one in the paper. I set it to run 1,000 thirty-year scenarios. The first time I ran this model it calculated a maximum terminal portfolio value of $6.8M. I ran the same model again with nothing changed except that it calculated a new set of random market returns for another 1,000 scenarios. The maximum TPV was $10.4M. The third time it produced $9.5M. The maximum TPV changes each time the random market returns are updated.

I automated the process and ran the MC model 1,000 times with 1,000 different random market returns each. Maximum TPVs ranged from $4.7M to $41M but the most common maximum TPV was around $10M. This is why we don't stop after running the MC model once and estimating a maximum TPV (in this case) of $6.8M, or a single probability of ruin, for that matter.)
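The repeated-run idea can be sketched in a few lines. To be clear, this is not the model that produced the figures quoted above: the $1M starting balance, the $40,000 constant withdrawal, and the 5%/12% normal returns are all assumptions, and only 20 runs are used here to keep it quick.

```python
import random

def run_model(rng, scenarios=1_000, years=30, start=1_000_000.0,
              withdrawal=40_000.0, mean=0.05, sd=0.12):
    """One MC run: returns (maximum terminal value, fraction of scenarios ruined)."""
    max_tpv, ruined = 0.0, 0
    for _ in range(scenarios):
        balance = start
        for _ in range(years):
            balance -= withdrawal                 # spend at the start of the year
            if balance <= 0.0:
                balance = 0.0                     # portfolio is exhausted
                break
            # A portfolio cannot lose more than 100%, so floor the growth factor at 0.
            balance *= max(0.0, 1.0 + rng.gauss(mean, sd))
        if balance <= 0.0:
            ruined += 1
        max_tpv = max(max_tpv, balance)
    return max_tpv, ruined / scenarios

rng = random.Random(2018)  # arbitrary seed
results = [run_model(rng) for _ in range(20)]     # 20 independent runs
max_tpvs = sorted(m for m, _ in results)
ruin_rates = sorted(p for _, p in results)

print(f"maximum TPV ranged from ${max_tpvs[0]:,.0f} to ${max_tpvs[-1]:,.0f}")
print(f"probability of ruin ranged from {ruin_rates[0]:.1%} to {ruin_rates[-1]:.1%}")
```

Each run yields a different maximum TPV and a different probability of ruin, so reporting either from a single run hides the spread across runs.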

This extremely large, improbable terminal portfolio value is not a fault of Monte Carlo analysis but the result of a naive model of market returns and sequences of those returns that poorly approximates capital markets as we currently understand them. It is also a point estimate.

(As an aside, I'm not sure why we should be concerned about overly-optimistic TPVs in this context. This is an analysis of portfolio survivability, which is a function of poorly-performing scenarios.)

Is a $26M terminal portfolio evidence of fat tails? Many portfolios that large over many MC simulations might be but a single result tells us nothing about whether it is more or less likely than a normal distribution would predict. Then there's the other issue – terminal portfolio values aren't normally distributed.

Following is a histogram of TPVs created by the historical data model and a log-normal distribution of those results in red.

The white paper notes that some MC-generated terminal portfolio values are larger than a normal distribution would predict. However, TPVs, as you can see in the chart above, are log-normally distributed, not normally distributed, and should be expected to be larger than a normal distribution predicts. An approximately log-normal distribution is the expected result of compounding n (30) normally distributed annual returns, and a fat right tail is characteristic of a log-normal probability density. If TPVs were normally distributed, some would be less than zero.
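A quick simulation illustrates why compounding produces a long right tail. The sketch below multiplies 30 normal annual returns together (with no withdrawals, a simplifying assumption) and compares the skewness of the resulting growth factors with the skewness of their logarithms. The 5%/12% parameters, the sample size, and the seed are also assumptions.

```python
import math
import random
import statistics

def skewness(sample):
    """Sample skewness: about 0 for a symmetric distribution, positive for a long right tail."""
    m = statistics.fmean(sample)
    sd = statistics.pstdev(sample)
    return sum(((v - m) / sd) ** 3 for v in sample) / len(sample)

rng = random.Random(30)  # arbitrary seed
growth = []
for _ in range(20_000):
    g = 1.0
    for _ in range(30):
        # A return below -100% is more than 8 standard deviations out here,
        # so the growth factor stays positive in practice.
        g *= 1.0 + rng.gauss(0.05, 0.12)
    growth.append(g)

skew_raw = skewness(growth)
skew_log = skewness([math.log(g) for g in growth])
print(f"skewness of 30-year growth factors: {skew_raw:.2f}")
print(f"skewness of their logarithms:       {skew_log:.2f}")
```

The raw growth factors are strongly right-skewed while their logs are close to symmetric, which is the signature of an approximately log-normal outcome rather than of fat tails in the underlying annual returns.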

Is accepting unrealistic scenarios always a bad thing? This depends on the model's purpose. William Sharpe's RISMAT model[5], for instance, doesn't bother excluding them nor does the research I'm currently co-authoring. The same unrealistic scenarios are included in every strategy tested and filtering them out wouldn't change the comparisons. A small number of unrealistic scenarios is easy to deal with.

The paper in question, however, uses Monte Carlo analysis specifically to measure probability of ruin and this purpose is overly sensitive to unrealistic scenarios because they're the ones that generate results counted as portfolio failures (and large TPV). There will probably be only a relative handful of failed scenarios and adding in a few more failures from unrealistic scenarios can have a dramatic impact on the percent of failures (probability of ruin). If you insist on trying to estimate tail risk this way, then you should use only realistic scenarios.
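The arithmetic behind that sensitivity is simple. With purely hypothetical counts (30 genuine failures and 15 more from unrealistic scenarios, out of 1,000), a short sketch shows how much the headline number moves:

```python
def ruin_probability(failures, scenarios):
    """Share of simulated scenarios in which the portfolio is depleted."""
    return failures / scenarios

# Hypothetical counts, chosen only to illustrate the sensitivity.
base = ruin_probability(30, 1_000)
inflated = ruin_probability(30 + 15, 1_000)
relative_increase = inflated / base - 1.0

print(f"{base:.1%} -> {inflated:.1%} (a {relative_increase:.0%} relative increase)")
```

Fifteen scenarios are only 1.5% of the simulation, yet in this illustration they raise the estimated probability of ruin by half.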

To my earlier point, the questionable validity of using MC models specifically to estimate tail risk doesn't disqualify all MC models of retirement finance. As Blanchett and Pfau say, not all cars are slow.

Back to the white paper's claims, no statistical evidence of fat tails or mean reversion is provided and I can find neither of these in these results. I certainly see no evidence of 100% success in the results. I mostly see evidence that a naive MC model provides strange results but I would have guessed that.

Joe Tomlinson wrote a follow-on post[4] to that Blanchett-Pfau piece in which he raised several important points. One is that the selection of metrics is critical when analyzing MC results. In fact, I would argue that estimating a probability of ruin metric is a poor use of MC models since low-probability events are unpredictable.

Tomlinson also makes the point that "The measures being applied by researchers may be more useful than those provided in financial-planning software packages, which provides an opportunity for software developers to introduce new measures to improve the usefulness of their products." So, perhaps an important finding of this paper can be gleaned from the phrase "Monte Carlo analysis (as typically implemented in financial planning software). . ."

If most MC models available to planners are indeed as naive as this white paper suggests and we are using those models to calculate probability of ruin (not my preferred use), then we really do have an MC problem. But it isn't fat tails or the lack of mean-reversion modeling.

So, do Monte Carlo models of retirement finance generate fat tails? I don't see evidence of that. Do they create unrealistic scenarios? Maybe, but that depends on the specific software you're using and its purpose, not on the Monte Carlo statistical tool.

Monte Carlo can be a powerful tool for retirement planning but only if used correctly and for the right application. Estimating tail risk is probably not a good application.

REFERENCES

[1] Fat Tails In Monte Carlo Analysis vs Safe Withdrawal Rates. Nerd's Eye View blog.

[2] Fat Tail Distribution: Definition, Examples.

[3] The Power and Limitations of Monte Carlo Simulations, David Blanchett and Wade Pfau, Advisor Perspectives.

[4] The Key Problem with Monte Carlo Software - The Need for Better Performance Metrics, Joe Tomlinson.

[5] Retirement Income Scenario Matrices (RISMAT), William F. Sharpe.

2 comments:

Francis, July 26, 2018 at 8:30 PM
I don't think much of MC analysis as a retirement planning tool, because the one I received did not do a lot for me as I planned my escape from the 40-hour work-week. I know that “experience is not evidence,” and n=1 doesn’t mean much. Still, about all I gleaned from the MC I received was that the odds of outliving my savings probably were small, while the odds of my savings growing probably were large. So, I’m afraid I shrugged when I read your conclusion that, “the major flaw in the analysis appears to be the use of a naive Monte Carlo model based solely on normally-distributed market returns.”

However, I do have a comment and a question. Both are more statistical nits than they are responses to your critique of the Nerd's Eye paper.

My first statistical nit is your assertion that, "... I would argue that estimating a probability of ruin metric is a poor use of MC models since low-probability events are unpredictable." I think that is an overly strong statement. Low-probability events are *difficult* to predict, but they are not unpredictable. It is common to need to predict whether an event or outcome lands in the far end of a tail (e.g., fraud detection, anomaly detection, medical diagnosis, oil spillage detection, fault detection). There are lots of techniques (e.g., over-sampling, under-sampling, hybrids) to overcome the “class imbalance” problem. I expect some of them eventually will show up in the retirement finance literature. Perhaps there eventually will be a machine-learning formula relating MC predictions of ruin to the characteristics of those who exceed a specified probability of failure.

Second, in commenting on terminal portfolio values you say that, “TPVs, as you can see in the chart above, are log-normally distributed, not normally-distributed, and should be expected to be larger than a normal distribution predicts.” I suspect you have a good reason for fitting a log-normal curve to the histogram of TPVs created by the historical data model. Still, I could visualize other types of distributions fitting that curve. Are you fitting a log-normal distribution because of the number of trials in your simulations, or did you perhaps use some goodness-of-fit statistic? It seems that other types of distributions (Bernoulli?) might be used, since you are asking whether outcomes are successes or failures.

Best regards,

Francis


Thursday, July 12, 2018
