Hey guys, Barnes & Noble is doing a promotion taking 25% off select book preorders on 4/26, 4/27, 4/28, which is a great way to preorder my new book if you haven’t already. Use coupon code PREORDER25 to get the deal.
The sports analytics movement has proven time and again that it helps teams win games, across sports and leagues, and so, unsurprisingly, essentially every team in every major sport employs an analytics department. For that reason, I find it very annoying that there are still statheads who act like they’re David and not Goliath. I also think that the impact of analytics on baseball has been a disaster from an entertainment standpoint. There’s a whole lot one could say about the general topic. (I frequently think about the fact that Moneyball helped advance the cause of analytics, and analytics is fundamentally correct in its claims, and yet the fundamental narrative of the book was wrong¹.) But while the predictive claims of analytics continue to evolve, they’ve been wildly successful.
I want to address one particular bugaboo I have with the way analytical concepts are discussed. It was inevitable that popularizing these concepts was going to lead to some distortion. One topic that I see misused all the time is regression/reversion to the mean, or the tendency of outlier performances to be followed up by performances that are closer to the average (mean) performance for that player or league. (I may use reversion and regression interchangeably here, mostly because I’m too forgetful to keep one in my head at a time.) A guy plays pro baseball for five years, he hits around 10 or 12 home runs a year, then he has a year where he hits 30, then he goes back to hitting 10 or 12 again in the following seasons - that’s an example of regression to the mean. After deviations from trend we tend (tend) to see returns to trend. Similarly, if the NFL has a league average of about 4.3 yards per carry for a decade, and then one year the league average jumps to 4.8 without a rule change or other obvious change in underlying conditions, that’s a good candidate for regression to the mean the following year, trending back towards that lower average. It certainly doesn’t have to happen, but it’s likely to happen for reasons we’ll talk about.
Intuitively, the actual tendency isn’t hard to understand. But I find that people talk about it in a way that suggests a misunderstanding of why regression to the mean happens, and I want to work through that here.
So. We have a system, like “major league baseball” or “K-12 public education in Baltimore” or “the world.” Within those systems we have quantitative phenomena (like on-base percentage, test scores, or the price of oil) that are explainable by multiple variables, AKA the conditions in which the observed phenomena occur. Over time, we observe trends in those phenomena, which can be in the system as a whole (leaguewide batting average), in subgroups (team batting average), or in individuals (a player’s batting average). Those trends are the result of underlying variables/conditions, which include internal factors like an athlete’s level of ability, as well as elements of chance and unaccounted-for variability. (We could go into a big thing about what “chance” really refers to in a complex system, but… let’s not.) The more time goes on, and the more data is collected, the more confidently we can say that a trend is an accurate representation of some underlying reality, again like an athlete’s level of ability. When we say a baseball player is a good hitter, it’s because we’ve observed over time that he has produced good statistics in hitting, and we feel confident that this consistency is the product of his skill and attributes rather than exogenous factors.
However, we know that good hitters have bad games, just as bad hitters have good games. We know that good hitters have slumps where they have bad three-, five-, or ten-game stretches. We even acknowledge that someone can be a good hitter and have a bad season, or at least a season that’s below their usual standards. But if a hitter has two or three bad seasons, we’re likely to stop seeing poor performance as an outlier and change our overall perception of the player. The outlier becomes the trend. There is no certain or objective place where that transition happens.
Here’s the really essential point I want to make: outliers tend to revert to the mean because the initial outlier performance was statistically unlikely; a repeat of that outlier performance is statistically unlikely for the same reasons, but not because of the previous outlier. For ease of understanding let’s pretend underlying conditions stay exactly the same, which of course will never happen in a real-world scenario. If that’s true, then an equally unlikely outcome is exactly as likely as it was the first time; repetition of outliers is not made any less likely by the fact that the initial outlier happened. That is, there’s no inherent reason why a repetition of the outlier becomes more unlikely, given consistent underlying conditions. I think it’s really important to avoid the Gambler’s Fallacy here, thinking that a roulette wheel is somehow more likely to come up red because it’s come up black a hundred times in a row. Statistically unlikely outcomes in the past don’t make statistically unlikely outcomes any less likely in the future. The universe doesn’t “remember” that there’s been an outlier before. Reversion to the mean is not a force in the universe. It’s not a matter of results being bent back into the previous trend by the gods. Rather, if underlying conditions are similar (if a player is about as good as he was the previous year and the role of variability and chance remains the same), and he had an unlikely level of success or failure the prior year, he’s unlikely to repeat that performance because reaching that level of performance was unlikely in the first place.
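To make that concrete, here’s a minimal simulation sketch in Python - all the numbers (a “true talent” batting average of .270, a standard deviation of 20 points) are invented for illustration. If seasons are drawn from the same fixed distribution, then conditioning on an outlier season tells you nothing about the next one: the follow-up season simply centers on the underlying mean, which is all “regression” amounts to.

```python
import random

random.seed(0)

TRUE_TALENT = 0.270   # assumed underlying ability (batting average)
NOISE = 0.020         # assumed season-to-season variability (standard deviation)
N = 100_000           # simulated pairs of consecutive seasons

def season():
    """One season's observed average: fixed talent plus random variability."""
    return random.gauss(TRUE_TALENT, NOISE)

pairs = [(season(), season()) for _ in range(N)]

# Keep only the cases where the first season was an outlier (two SDs above talent).
cutoff = TRUE_TALENT + 2 * NOISE
followups = [second for first, second in pairs if first >= cutoff]

print(f"mean follow-up season after an outlier year: {sum(followups) / len(followups):.3f}")
# Prints roughly 0.270: the next season centers on true talent, not on the
# outlier, even though nothing "pulled" it back toward the mean.
```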
I see this casual mistake all the time, so I repeat: the universe does not remember or care that a statistically unlikely thing happened last game/month/year, and the fact that it did doesn’t make it any less likely that the statistically unlikely thing will happen again. It’s just that statistically unlikely things were, are, and remain unlikely. If you’re a gambler and you think that you’ve developed a really powerful model to predict future outcomes, and you’ve tried to account for every possible variable, but then you notice that a player outperformed their trend last year and artificially adjust your projection down, you’re making a mistake. Again, remember the Gambler’s Fallacy - if I flip a coin and get heads fifty times in a row, the odds that I get tails the next time are 50/50, exactly the same as the first time. If I flip it another fifty times it’s very unlikely that I’ll get fifty straight heads again, for the exact same reasons it was unlikely the first time. And that’s the heart of reversion to the mean.
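Here’s the coin version of the same point as a sketch (again Python, again purely illustrative): among simulated fair-coin flips, the flips that immediately follow a streak of heads still come up heads about half the time, and the probability of fifty straight heads is the same tiny number on every attempt.

```python
import random

random.seed(1)

flips = [random.random() < 0.5 for _ in range(1_000_000)]  # True = heads

# Of the flips that immediately follow five heads in a row, what fraction are
# heads? The Gambler's Fallacy predicts "well under half"; independence says ~0.5.
after_streak = [flips[i] for i in range(5, len(flips)) if all(flips[i - 5:i])]
print(f"heads after a five-heads streak: {sum(after_streak) / len(after_streak):.3f}")

# The chance of fifty straight heads is identical on every fresh attempt,
# regardless of what the coin has done before:
print(f"P(fifty straight heads) = {0.5 ** 50:.2e}")  # about 8.9e-16
```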
Additionally…
Reversion to the mean does not, does not, does not imply reduced variability over time.
This video is a 42-minute exploration of this point, but the short of it is this: the fact that we see regression to the mean does not mean that everyone is converging to the mean at all times. It’s very easy to think “OK, this player is far from the league average, so they’re going to regress closer to the mean, and so is this player, and actually every player will be moving towards the mean over time….” That would suggest that over a long enough time frame, every MLB player would hit for the exact same league-average batting average, which is not the case. We live in a world of variability; regression to the mean doesn’t change that. We also live in a world where different players/teams/eras have different underlying conditions, most prominently the differences in ability (speed/strength/coordination/awareness/etc) between individual players. Remember, a player can set a trend in the first five years of their career, outperform that trend in the sixth year, regress to the mean in the seventh, and then go right back to outperforming the trend in the eighth year. Because of the nature of variability, even if we say that underlying conditions make sticking to trend likely, it would also be very unlikely to see too much conformity to trend. It’s like that old trick stats professors play on their students: have them flip a coin 50 times and record the results, then identify the students who faked their flips by looking for a lack of long runs of heads or tails - variability makes unlikely results likely when you have enough repetitions/observations.
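That professors’ trick is easy to check with a quick simulation (a Python sketch, nothing rigorous): in genuinely random sequences of 50 flips, a run of five or more identical results shows up roughly four times out of five - exactly the kind of streak fakers are too timid to write down.

```python
import random

random.seed(2)

def longest_run(flips):
    """Length of the longest streak of identical results in a sequence."""
    best = current = 1
    for prev, nxt in zip(flips, flips[1:]):
        current = current + 1 if prev == nxt else 1
        best = max(best, current)
    return best

trials = 100_000
runs = [longest_run([random.random() < 0.5 for _ in range(50)]) for _ in range(trials)]

share = sum(r >= 5 for r in runs) / trials
print(f"share of honest 50-flip sequences containing a run of 5+: {share:.0%}")
# Comes out to roughly 80%: long streaks aren't evidence that something is
# wrong; they're what genuine randomness looks like.
```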
Conditions change. We expect reversion to the mean because past performance is a pretty good guide to future performance, but conditions change and so we also expect changing results. The NBA has made a lot of changes to increase scoring in the past decade, and scoring has indeed increased. In the context of historical trends, a high-scoring year following a rule change like that might look like an outlier. But it would be foolish to expect reversion to the mean (or at least reversion to any particular degree), because conditions have changed. Rules change underlying conditions, strategies and playcalling change underlying conditions, juiced balls and juiced players change underlying conditions, and so nothing stays static. But changes to trend that are the product of statistically unlikely expressions of variability are likely to return to trend, to some degree.
Changing conditions apply to individual players too. It would be very dopey to look at Johnny Unitas’s disastrous final 1973 season and that season’s quarterback rating of 40.0, note that his career rating had been twice that, and conclude that he would be sure to get better the next year. Because conditions changed; he got old and injured. Rick Allen played a lot fewer fills after 1984. Things change.
You revert to the new mean! It’s very important to say that we don’t throw out the data of the outlier. In other words, the data that we perceive to be an outlier isn’t “bad” data, and we don’t just ignore it because we call it an outlier. The outlier data gets folded into career or league averages and our perception of the overall trend. (I always hated it when an announcer on TV would say “you know, if we throw out that one long run, he’s only averaging X yards per run….” The thing is, we can’t throw out that run!) So if you have a third-year player and he just leaves his first two years in the dust, statistically, it’s likely that there will be some regression to the mean. But it will be to the new mean, which includes his dramatically better performance and will thus represent a significantly better trend, depending on the number of data points and the size of the improvement. If we could say conclusively that this outlier year was purely the product of variability effects or chance, we’d probably discount it in our analysis, but we can’t know that. A batter who’s a career .250 hitter and then hits .350 one year is a batter we’d assume is more like a .250 hitter than a .350 hitter, but not one we’d think of as exactly a .250 hitter. He’d be at least a little better than that.
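As a toy version of the arithmetic (the at-bat counts are invented to keep the numbers simple): fold the .350 season into the career record instead of throwing it out, and the mean you expect reversion toward moves up.

```python
# Hypothetical career: five seasons batting .250, then one season batting .350,
# with an assumed 500 at-bats in every season.
season_averages = [0.250] * 5 + [0.350]
at_bats_per_season = 500

total_hits = sum(avg * at_bats_per_season for avg in season_averages)
career_average = total_hits / (at_bats_per_season * len(season_averages))

print(f"career average including the outlier year: {career_average:.3f}")  # 0.267
```

A real projection system would weight seasons and playing time far more carefully than this; the point is only that the outlier season stays in the denominator.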
There has to be adequate data to assess an outlier. I find it very odd when people look at a second-year wide receiver who’s popping, compare it to an underwhelming first season, and conclude that we’re likely to see regression. There just hasn’t been enough data. Of course we have expectations about who’s going to be good based on college or minor league performance, but those expectations are subverted all the time, and the difference between those levels of play and the majors is too great to establish trends, I think. I’ve sometimes seen Jeremy Lin offered as an example of reversion to the mean - he was a guy who was not expected to play well, he had a sudden burst of all-star caliber play, and then he spent the rest of his career as a journeyman. But Lin only played 29 games before the Linsanity year with the Knicks, averaged less than 10 minutes a game in the ones he did play in, and never started. That isn’t sufficient data to establish a trend that could give context for his level of play. I also think that his career performance was more like his level of play with the Knicks than people assume.
You can’t assume periodicity. I think there’s an unconscious tendency to assume that deviation from the mean will be periodic, like a sine wave - that a guy who averages out to a career .300 hitter might have a .275 season, then a .300 season, then a .325 season, then a .300 season…. But there’s no reason that outliers that are really outliers have to fall into a particular pattern. In other words, if you’re psychic and you know a guy will have a 15-year career, and you know that ten of those years will be right on trend for a given stat but five of them will be outliers, you can’t assume that those outliers will be spaced out at any particular or regular intervals. Again, the fact that one year was an outlier does not influence whether the following year will be an outlier, assuming underlying conditions are the same. Two outliers can (and do) come one after another, and then you might get a return to trend that’s very consistent.
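If you want to see the no-periodicity point for yourself, here’s a sketch (Python, with made-up numbers for talent and variability): simulate a 15-year career from one fixed talent level, flag the seasons that land unusually far from it, and look at where they fall. The outlier years cluster, gap, and occasionally come back to back; there’s no sine wave.

```python
import random

random.seed(3)

TRUE_TALENT = 0.300   # assumed career-long ability (batting average)
NOISE = 0.020         # assumed season-to-season variability (standard deviation)

career = [random.gauss(TRUE_TALENT, NOISE) for _ in range(15)]
outlier_years = [year for year, avg in enumerate(career, start=1)
                 if abs(avg - TRUE_TALENT) > 1.5 * NOISE]

print("outlier seasons:", outlier_years)
# Re-run with different seeds: sometimes the outliers bunch together, sometimes
# they're spread out, and no regular spacing ever emerges.
```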
¹ This is well-trod territory at this point, but the quick version is:
- the A’s not only were not a uniquely bad franchise, they had won the most games of any team in major league baseball in the ten years prior to the Moneyball season
- major league baseball had entered an era of unusual parity at that time, belying Michael Lewis’s implication that it was a game of haves and have-nots
- readers come away from the book convinced that the A’s won so many games because of Scott Hatteberg and Chad Bradford, the players who epitomize the Moneyball ethos, but the numbers tell us they were so successful because of a remarkably effective rotation in Tim Hudson, Barry Zito, and Mark Mulder, and the offensive skill of shortstop Miguel Tejada - all of whom were very highly regarded players according to the old-school scouting approach that the book has such disdain for.
- Art Howe was not an obstructionist asshole.
A minor tweak: yes, if you flip a coin fifty times and get Heads each time, then *conditional on the coin being fair* there’s a 50/50 chance of Heads on the 51st throw. But conditional on the coin being fair, the chance of getting a straight run of fifty Heads in the first 50 throws is less than one in a quadrillion (similar to the odds of winning the lottery twice in a row). So that initial string of Heads is pretty good evidence that the coin actually isn’t fair, but has some bias that causes it to land Heads all the time. If I flip a coin fifty times and get Heads each time, I’m betting on Heads the 51st time too.
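Putting rough numbers on that comment’s logic (the prior here is invented purely for illustration): even a heavy prior belief that the coin is fair gets swamped by fifty straight Heads.

```python
# Toy Bayesian update. Suppose we start out thinking there's only a one-in-a-million
# chance the coin is a trick coin that always lands Heads (an invented prior).
prior_trick = 1e-6
prior_fair = 1 - prior_trick

# Likelihood of fifty straight Heads under each hypothesis:
likelihood_fair = 0.5 ** 50   # about 8.9e-16
likelihood_trick = 1.0

evidence = likelihood_trick * prior_trick + likelihood_fair * prior_fair
posterior_fair = likelihood_fair * prior_fair / evidence

print(f"P(coin is fair | fifty straight Heads) = {posterior_fair:.1e}")  # about 8.9e-10
```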
This has me thinking of the Michael Jordan fans who insist that we omit his Washington Wizard years when we consider the greatness of his career. This omission, of course, makes his stellar career even more stellar. It also has me thinking of the Jordan fans who use his six titles as the sole or primary argument for his greatness—an argument that omits the seasons when his teams were average or below average. So what are the mean and median of Jordan's career?