Hey guys, Barnes & Noble is doing a promotion taking 25% off select book preorders on 4/26, 4/27, 4/28, which is a great way to preorder my new book if you haven’t already.
A minor tweak: yes, if you flip a coin fifty times and get Heads each time, then *conditional on the coin being fair* there’s a 50/50 chance of Heads on the 51st throw. But conditional on the coin being fair, the chance of getting fifty Heads in a row on the first 50 throws is less than one in a quadrillion (similar to the odds of winning the lottery twice in a row). So that initial string of Heads is pretty good evidence that the coin actually isn’t fair, but has some bias that causes it to land Heads all the time. If I flip a coin fifty times and get Heads each time, I’m betting on Heads the 51st time too.
Well, yes. Obviously that's a needlessly extreme example and would offer good evidence that the coin is weighted. I'm just trying to underline the point that the coin doesn't "remember."
"But conditional on the coin being fair...."
Pretty much every probability problem boils down to correctly conditioning. 😂
Seeing 50 heads in a row and then betting on heads is definitely good Bayesian reasoning (as you say, unless you're infinitely confident in your prior). The way I read the post though, it's critical of someone who sees 50 heads and then bets on tails because of ideas about regression - which is essentially a more confused version of "it's about due".
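For anyone who wants to see the arithmetic behind this thread, here is a minimal sketch in Python. It assumes only two hypotheses (a fair coin, or a biased coin with a known chance of Heads), and the prior and bias values are made up for illustration; `posterior_fair` and `p_next_heads` are hypothetical helper names, not anything from the post.

```python
from fractions import Fraction

# Probability of 50 straight Heads if the coin really is fair: 1 in 2**50.
p_fair_run = Fraction(1, 2) ** 50
print(float(p_fair_run))  # a bit under 1e-15, i.e. less than one in a quadrillion


def posterior_fair(prior_fair, p_heads_biased, n_heads):
    """Posterior probability the coin is fair after n_heads straight Heads.

    Assumption for illustration: the coin is either fair or lands Heads
    with probability p_heads_biased; no other hypotheses are considered.
    """
    like_fair = Fraction(1, 2) ** n_heads
    like_biased = p_heads_biased ** n_heads
    numerator = prior_fair * like_fair
    return numerator / (numerator + (1 - prior_fair) * like_biased)


def p_next_heads(prior_fair, p_heads_biased, n_heads):
    """Chance of Heads on the next flip, averaging over both hypotheses."""
    post = posterior_fair(prior_fair, p_heads_biased, n_heads)
    return post * Fraction(1, 2) + (1 - post) * p_heads_biased


# Made-up prior: 999,999 in a million that the coin is fair,
# one in a million that it is two-headed.
prior = Fraction(999_999, 1_000_000)
two_headed = Fraction(1)

print(float(posterior_fair(prior, two_headed, 50)))  # tiny: fairness is no longer credible
print(float(p_next_heads(prior, two_headed, 50)))    # very close to 1, so bet on Heads

# An infinitely confident prior never moves, so the next flip stays 50/50.
print(p_next_heads(Fraction(1), two_headed, 50))
```

Nothing here says the coin "remembers" anything. The run of Heads changes what we believe about the coin, not how the next flip behaves.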
This has me thinking of the Michael Jordan fans who insist that we omit his Washington Wizard years when we consider the greatness of his career. This omission, of course, makes his stellar career even more stellar. It also has me thinking of the Jordan fans who use his six titles as the sole or primary argument for his greatness—an argument that omits the seasons when his teams were average or below average. So what are the mean and median of Jordan's career?
I always argue that Jordan averaging 20-6-4, at 38-39 years old, after not playing basketball for three years, on a terrible team, is not something that should diminish his greatness. I love when athletes keep playing when everyone says they should hang it up.
It was amazing. And complicated. The behind the scenes story is so compelling. Michael Leahy's book, When Nothing Else Matters, tells that story. https://www.amazon.com/When-Nothing-Else-Matters-Comeback/dp/0743254279/ref=nodl_?dplnkId=993cef84-0367-474e-b56d-d4956077bab3
Mean and median of his career is that he's GOAT. He did things other people simply couldn't, and he did it consistently. And he had a knack for coming up big in big moments. And he wasn't a sell-out bitch like LeBron. The more time passes, the more I respect that.
Freddie, I hate to pick nits, but this is a good piece, so to make it more perfect: about halfway down (in the section below the video) the phrase "in the seventh year." should really be "in the eighth year." I normally do not worry about little things like that, but it is in the development of the core examples, so it should probably be fixed. And thanks for all you do. I always go to your pieces first because they are virtually always interesting and I almost always learn something.
Right, thanks, will fix.
One thing that drives me absolutely nuts is how people use statistics to prove a point, and they have no idea what the statistic means, or they don't understand the numbers driving the statistics.
A common (but sadly true) situation is when a new company, or a company with a new product, experiences 'double digit sales growth'. Well heck, if I sold one product the first year, and two the next year, I'm achieving 100% sales growth. It would be absurd for me to project that rate of growth for very long.
But, I've been in many meetings where people who should know better show these kinds of growth forecasts. Fortunately, there have been people in the meetings who understand statistics and the real world, and these forecasts get shot down. But, apparently, this doesn't always happen (Enron), and the results can be embarrassing at best, catastrophic at worst.
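To put a number on how absurd that kind of projection gets, here is a quick sketch. The function name `project_units` and the 20-year horizon are my own illustrative choices.

```python
def project_units(first_year_units, annual_growth, years):
    """Units sold each year if the same growth rate held every single year."""
    return [first_year_units * (1 + annual_growth) ** t for t in range(years + 1)]


# One unit in year 0, then 100% growth (doubling) sustained for 20 years.
units = project_units(1, 1.0, 20)
print(units[1], units[10], units[20])  # year 1, year 10, year 20
```

Going from one sale to two is easy. Holding the same rate for twenty years means selling 2**20 units, over a million, in the final year.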
One other potential meaning of "regression to the mean" comes from the process of using Bayesian inference to estimate a player's ability. Using Bayesian inference, the estimate of a player's ability is a weighted combination of the population mean (over all players) and the player's mean (over their own performances), with the weights determined by the sample size, the variance of abilities across the population, and the variance of an individual player's performance. In that way, for small sample sizes the estimate of a player's ability is "regressed" heavily towards the population average; for larger sample sizes it is "regressed" less, because we have more confidence that the player's average truly represents the player's ability and is not just noise.
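A minimal sketch of that weighted combination, assuming the standard normal-normal model (normally distributed abilities across players, normally distributed noise around each player's ability). The function name and all the numbers are hypothetical, loosely in the range of batting averages.

```python
def shrunk_estimate(player_mean, n, pop_mean, ability_var, obs_var):
    """Posterior mean of a player's ability under a normal-normal model.

    ability_var: variance of true ability across the population (assumed known)
    obs_var:     variance of a single observed performance (assumed known)
    The weight on the player's own mean grows with the sample size n.
    """
    weight = n * ability_var / (n * ability_var + obs_var)
    return weight * player_mean + (1 - weight) * pop_mean


# Hypothetical numbers: league average .260, a player hitting .400.
POP_MEAN = 0.260
ABILITY_VAR = 0.03 ** 2   # spread of true talent between players
OBS_VAR = 0.19            # rough variance of one at-bat outcome

after_20 = shrunk_estimate(0.400, 20, POP_MEAN, ABILITY_VAR, OBS_VAR)
after_600 = shrunk_estimate(0.400, 600, POP_MEAN, ABILITY_VAR, OBS_VAR)
print(round(after_20, 3), round(after_600, 3))
```

After 20 at-bats the estimate sits close to the league average. After 600 it moves most of the way toward the player's own .400, which is the "less regressed" case described above.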
Great post! A related point is that people try interventions after an extreme outcome—such as a goalie having a horrible game—and if performance improves, we give credit to the intervention when regression to the mean is a likely explanation.
Any time an intervention is a reaction to unusually poor performance, this is an issue.
It’s even harder for people to wrap their minds around this when we don’t have trend data. For example, I’ve seen a few news stories about racism in home appraisals using the following anecdote: A Black homeowner gets a low appraisal, then stages the house to make it look like a white family lives there, and gets a much higher appraisal. They conclude that “being Black cost me $200k in the appraisal.”
But homeowners only seek a second appraisal when they feel they have been lowballed, for example if the neighbors’ houses have higher values. It was probably an unusually or extremely low appraisal. So it makes sense that the second appraisal is higher (it’s probably closer to the average you’d get if the home were appraised over and over), but you can’t attribute the extra $200k to photographs of white people.
Of course I’m willing to believe there is racial bias in home appraisals, but to measure it we would need a study with many data points and (ideally) randomization. We definitely need more than anecdotes about people who got an unusually low appraisal and then tried again.
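Here is a small simulation sketch of the selection effect I mean. Every number is invented for illustration: one "true" home value, random appraisal noise, and a rule that only homeowners with a low first appraisal ask for a second one. No intervention of any kind happens between the two appraisals.

```python
import random
import statistics

rng = random.Random(0)  # fixed seed so the run is repeatable

TRUE_VALUE = 500_000      # hypothetical "true" value of every home
NOISE_SD = 50_000         # hypothetical appraisal-to-appraisal noise
LOWBALL_CUTOFF = 425_000  # only appraisals below this trigger a second try

first_low, second_after_low = [], []
for _ in range(100_000):
    first = rng.gauss(TRUE_VALUE, NOISE_SD)
    if first < LOWBALL_CUTOFF:
        # Second appraisal drawn from the same distribution: nothing changed.
        second = rng.gauss(TRUE_VALUE, NOISE_SD)
        first_low.append(first)
        second_after_low.append(second)

avg_first = statistics.mean(first_low)
avg_second = statistics.mean(second_after_low)
print(round(avg_first), round(avg_second), round(avg_second - avg_first))
```

Among the homes that got re-appraised, the second number comes out far higher on average even though nothing was done to the house. This says nothing either way about whether bias exists; it only shows why the anecdote alone cannot measure it.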
How does the saying go? Data is not the plural of anecdote. Or something like that.
It may amuse you to know that the original quote was "the plural of anecdote IS data."
Thanks, I knew I heard all those words in a phrase, couldn't remember the context or the exact wording.
Is there a non-DRM ebook version of your book? I have a Remarkable and can't do Kindle/Nook.
What I find interesting is how markets are supposedly efficient, and baseball is about as close as one can get to a theoretically perfect market (winners and losers are clearly determined and winners get more dough, number of teams remains relatively constant, all statistics going back to the dead-ball era are all publicly available and nobody has any stats that are not available to everyone else, etc.) and yet it took over 100 years for Sabermetrics to be invented.
For that matter, Sabermetrics was scoffed at for decades after its invention.
I think that's taking the market metaphor quite a bit too literally. Professional sports are *far* from perfect markets; there is a *lot* of nepotism and ego happening among decision makers.
That sounds like pretty much every business. If anything, there's less in sports, because the players typically don't have any capital tied up in the business.
Yes, and terrible strategies that used up outs prevailed for decades. You sort of see the same thing with sub-optimal chess openings historically. My sense is that football is better in this respect than baseball but that’s probably because I don’t know enough about football.
One of the otherwise-smartest guys I ever knew just could *not* not assume periodicity. It took him real, genuine, veins-popping-out physical effort not to say things like "Yeah but take it from the last two years..." whenever we discussed anything statistical. It was fascinating to see such a massive failing take place in real-time.
On the subject of the universe remembering, and outliers being checked off a list by the stat gods, it's like the old joke: you should always take a bomb on the airplane, because after all, what are the odds of there being two on one flight?
Interestingly, there seems to be some reversion to the mean with the analytics themselves as well. Take the 3pt shot or the shift in baseball: analytics departments correctly identified the value of prioritizing these concepts, to the point of insisting on a reduction in coaching input during games. We went through a really long time of "see, the oldheads don't know what they are talking about!", when people were skeptical about some of the surface level stats, but now that we've been in this experiment for some time it's becoming more obvious that the old guard was right on many counts. Analytics based teams were clearly advantaged in the beginning; however, being the most analytically analytical team to ever analyze doesn't seem to amount to much now.
The 3pt revolution really started with Mike D'Antoni in PHX, which failed because it still relied on a heliocentric PG (Nash) in order to run the quick paced offense that created so much spacing. The next evolutionary step was in S.A. with coach Pop, who realized that everyone on the court had to buy into the offense, and that hunting for the best look could make up for massive gaps in talent and physical prowess. I wouldn't say this failed, but the analytics weren't able to figure out that in most cases, a decent three is just more valuable than a great 2pt, and you didn't have to build a roster full of Jacks. Golden State with Steve Kerr and Curry are largely credited with perfecting the system, although I think that that's actually a little premature. They still had many of S.A.'s great principles of ball movement and efficient scoring; however, they looked outside-in vs. inside-out, a product of their differently aged coaches.
The last evolution, and truly what made it a league-wide formula, was Mike D'Antoni's redemption tour in Houston with James Harden. They took 3300 threes, almost 800 more than Golden State had when they set the record the previous year. They combined the outside-in approach with the heliocentric offense and the catch and shoot principles. If we sent a tape back in time, a 90s NBA fan would probably assume they were watching basketball from the 2100s on a lunar colony.
Strictly speaking that's not "reversion to the mean with the analytics themselves...", it's just that once everyone is exploiting an edge that edge will tend to go away. Sort of a variant of the efficient market hypothesis for sports.
Semi-related: curious about thoughts on the MLB rule changes from an entertainment perspective? I'm finding the games to be much more fun to watch/listen to on the radio and in person due to the faster pace of play. (The secondary benefit of the pitch clock is that way less down time between pitches means less time for crappy TV announcer blather.) My hope is that the impact of analytics moving forward will reinforce the positive changes (more steals, for example) but we'll have to wait and see.
so far, so good!
Only downside thus far is the abbreviated walk up songs... which would be a great article: best walk up songs. I'd go with Can I Kick It by A Tribe Called Quest and Iron Man by Black Sabbath.
The process deBoer describes has been explored mathematically in the study of stochastic partial differential equations (SPDEs). It turns out that as such processes evolve over time, they are not chaotic but show some semblance of order and predictability. SPDEs are a difficult subject, even by the standards of modern mathematics. But it is still interesting that what deBoer describes with intuition is actually a very rigorous mathematical subject.
I really loved this article and the overall conversation around statistical probability. I think the most underrated point, as it pertains to real life, is that outliers are expected to happen. So go for jobs, or relationships, or situations with high upside potential. While it might not be statistically likely to have a $1M earning year in any given year, by joining a field like software sales, you have created the possibility of it happening. And if you stick with this field for a long enough time, it becomes unlikely that you won't have a good year at some point. The money does not count any less because it came during a "lucky" outlier year. Give yourself as many at bats as possible in fields with high upside, and some wildly cool shit can occur.
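The "enough at bats" point is just the complement rule. A rough sketch, assuming each year is an independent draw with the same chance of an outlier (a big simplification), and with a made-up 5% per-year chance:

```python
def p_at_least_one(p_per_year, years):
    """Chance of at least one outlier year, assuming independent years
    that each have the same probability p_per_year (a simplification)."""
    return 1 - (1 - p_per_year) ** years


# Made-up 5% chance of a huge year in any single year.
for years in (1, 10, 20, 30):
    print(years, round(p_at_least_one(0.05, years), 3))
```

With those invented numbers, a single year is a long shot, but over a 20 or 30 year career at least one big year becomes more likely than not.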
"the A’s not only were not a uniquely bad franchise, they had won the most games of any team in major league baseball in the ten years prior to the Moneyball season"
Not the point, but it struck me immediately that this can't possibly be true (having grown up and followed baseball in the 90's). So I downloaded team total wins from baseball-reference, and it looks like the A's had 779 wins from 1992-2001, which is fewer than Atlanta, Baltimore, Boston, Chicago (AL), Cincinnati, Cleveland, Houston, LAD, NYY, San Francisco, Seattle, Texas, and Toronto. They were basically exactly middle of the pack in wins over that span. Unless I'm completely losing my mind here.
It’s also a tad disingenuous, since none of the players that drove ‘88-‘90 AL Champ teams were on the 2000’s teams. McGwire was probably the last one, and he had foot issues before he was traded in ‘97.
I recognize that "writers, journalists, and other take-havers in the humanities don't like math" is a hasty and unfair stereotype.
That being said, much like the conversation with the totally fictional and definitely not real crime activist from a couple of weeks ago, I feel like this was subtly pointing fingers at a certain kind of person that Freddie has almost certainly interacted with consistently enough to be an archetype.
Freddie-
As you mention in the post, a lot of this can be applied to sports as well as education.
You have written extensively about the limitations of educators to change the underlying fundamentals regarding the abilities of the students that they have. Do you feel that a coach and a teacher have the same ability to affect outcomes?
In my opinion (based on no analysis, just my experience) a good teacher or coach may help the student or athlete achieve their personal best. But, given a bad teacher or coach, it can be really hard for even the best students or athletes to achieve their best.
This is not to say that a good athlete won't do better than a poor athlete in the same situation. It is to say an athlete's career can certainly be hampered by having a bad coach.
And, I don't think this is at odds with what Freddie has written about the limitations of educators to affect relative outcomes.
There’s been a lot of reporting on baseball players who have used outside coaching from places like Driveline (for pitchers; don’t remember the complement for hitters) and dramatically changed their career paths, JD Martinez and Justin Turner being the two examples I can think of top of mind. The book *The MVP Machine* goes into this in some detail, I believe.
While caveats about self-selecting samples apply (certainly with “signed by an MLB team”, probably also with “went to outside groups to improve”), it does also make me think of post-pandemic interventions on struggling students by giving them one-on-one tutoring. So while the central point of Freddie’s work is inevitable (differences in innate talent), outcomes could improve further with *intense* interventions. This, however, is unrealistic (those one-on-one tutors for literally everyone).
It’s just interesting to see a population where the incentives to improve are *so* strong, on both an individual and organizational basis, that you can see very quick adoptions of successful methods and intense individualized training.