You Aren't Actually Mad at the SATs

you're mad at what they reveal

May 17, 2021

The University of California system is getting rid of its SAT/ACT requirement. More will follow.

There’s a lot to say. First, we must distinguish between two types of tests, or really two types of testing. When people say “standardized tests,” they think of the SAT, but they also think of state-mandated exams (usually bought, at great taxpayer expense, from Pearson and other for-profit companies) that are designed to serve as assessments of public K-12 schools, of aggregates and averages of students. The SAT, ACT, GRE, GMAT, LSAT, MCAT, and similar tests are oriented towards individual ability or aptitude; they exist to show prerequisite skills to admissions officers. (And, in one of the most essential purposes of college admissions, to employers, who are restricted in the types of testing they can perform thanks to Griggs v Duke Power Co.) Sure, sometimes researchers will use SAT data to reflect on, for example, the fact that there’s no underlying educational justification for higher graduation rates1, but SATs are really about the individual. State K-12 testing is about cities and districts, and exists to provide (typically dubious) justification for changes to education policy2. SATs and similar help admissions officers sort students for spots in undergraduate and graduate programs. This post is about those predictive entrance tests like the SAT.

Liberals repeat several types of myths about the SAT/ACT with such utter confidence and repetition that they’ve become a kind of holy writ. But myths they are.

SATs/ACTs don’t predict college success. They do, indeed. This one is clung to so desperately by liberals that you’d think there was some sort of compelling empirical basis to believe this. There isn’t. There never has been. They’re making it up. They want it to be true, and so they believe it to be true.

The predictive validity of the SAT is typically understated because the comparison we’re making has an inherent range restriction problem. If you ask “how well do the SATs predict college performance?,” you are necessarily restricting your independent variable to those who took the SAT and then went to college. But many take the SAT and do not go to college. By leaving out their data, you’re cropping the potential strength of correlation and underselling the predictive power of the SAT. When we correct statistically for this range restriction, which is not difficult, the predictive validity of the SAT and similar tests becomes remarkably strong. Range restriction is a known issue, it’s not remotely hard to understand, and your average New York Times digital subscription holder has every ability to learn about it and use that knowledge to adjust their understanding of the tests. The fact that they don’t points to the reality that liberals long ago decided that any information that does not confirm their priors can be safely discarded.
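To make the range restriction point concrete, here is a minimal simulation sketch. Everything in it is an assumption for illustration: the "true" correlation of 0.5, the hard cutoff that decides who enrolls, and the function names are all hypothetical, and real admissions are not a clean cutoff on one score. It uses the standard Thorndike Case 2 correction, which assumes direct selection on the predictor.

```python
import math
import random


def pearson(xs, ys):
    """Plain Pearson correlation coefficient."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


def stdev(xs):
    """Sample standard deviation."""
    m = sum(xs) / len(xs)
    return math.sqrt(sum((x - m) ** 2 for x in xs) / (len(xs) - 1))


def correct_for_range_restriction(r_restricted, sd_full, sd_restricted):
    """Thorndike Case 2 correction for direct selection on the predictor."""
    u = sd_full / sd_restricted
    return r_restricted * u / math.sqrt(1 + r_restricted ** 2 * (u ** 2 - 1))


def simulate(n=20000, true_r=0.5, cutoff=0.25, seed=0):
    """Return (full-population r, restricted r, corrected r) for toy data."""
    rng = random.Random(seed)
    test = [rng.gauss(0, 1) for _ in range(n)]
    # Outcome is built so its population correlation with the test is true_r.
    noise_sd = math.sqrt(1 - true_r ** 2)
    outcome = [true_r * t + noise_sd * rng.gauss(0, 1) for t in test]
    r_full = pearson(test, outcome)

    # Hypothetical selection rule: only test takers above the cutoff enroll,
    # so only they have an observed college outcome.
    kept = [(t, o) for t, o in zip(test, outcome) if t > cutoff]
    kept_test = [t for t, _ in kept]
    kept_outcome = [o for _, o in kept]
    r_restricted = pearson(kept_test, kept_outcome)

    r_corrected = correct_for_range_restriction(
        r_restricted, stdev(test), stdev(kept_test)
    )
    return r_full, r_restricted, r_corrected


if __name__ == "__main__":
    full, restricted, corrected = simulate()
    print(f"full: {full:.2f}  restricted: {restricted:.2f}  corrected: {corrected:.2f}")
```

In this toy setup the correlation among the "enrolled" group comes out noticeably lower than the correlation in the full pool of test takers, and the correction brings it back close to the full-pool value. How large the effect is in real data depends on how selective the schools in the sample are.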
There is such a movement to deny the predictive validity of these tests that researchers at eminently-respected institutions now appear to be contriving elaborate statistical justifications for denying that validity. Last year the University of Chicago’s Elaine Allensworth and Kallie Clark published a paper, to great media fanfare, that was represented as proving that ACT scores provide no useful predictive information about college performance. But as pseudonymous researcher Dynomight shows, this result was a mirage. The paper’s authors purported to be measuring the predictive validity of the ACT and then went through a variety of dubious statistical techniques that seem to have been performed only to… reduce the demonstrated predictive validity of the ACT. As someone on Reddit put it, the paper essentially showed that if you condition for ACT scores, ACT scores aren’t predictive. Well, yeah. Conditioning on a collider is a thing. Has any publication in the mainstream press followed up critically about this much-ballyhooed study? Of course not.
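For readers unfamiliar with the collider idea, here is a toy sketch of how stratifying on a variable that two causes both feed into can shrink an estimated coefficient. It is not a reconstruction of the Allensworth and Clark analysis; the variables (a test score, an unmeasured trait, an admission rule based on their sum) and all the numbers are assumptions made up for illustration.

```python
import random


def slope(xs, ys):
    """OLS slope of y on a single predictor x."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    return sxy / sxx


def simulate_collider(n=20000, threshold=1.0, seed=1):
    """Return (slope in everyone, slope among admitted, induced slope of other on score)."""
    rng = random.Random(seed)
    scores, others, outcomes, admitted = [], [], [], []
    for _ in range(n):
        score = rng.gauss(0, 1)   # measured test score
        other = rng.gauss(0, 1)   # unmeasured trait, independent of score
        # By construction the true effect of score on the outcome is 1.
        outcome = score + other + rng.gauss(0, 1)
        scores.append(score)
        others.append(other)
        outcomes.append(outcome)
        # Admission depends on both causes, which makes it a collider.
        admitted.append(score + other > threshold)

    slope_all = slope(scores, outcomes)

    adm_scores = [s for s, a in zip(scores, admitted) if a]
    adm_others = [o for o, a in zip(others, admitted) if a]
    adm_outcomes = [y for y, a in zip(outcomes, admitted) if a]

    slope_admitted = slope(adm_scores, adm_outcomes)
    # Among the admitted, score and the unmeasured trait become negatively related.
    induced = slope(adm_scores, adm_others)
    return slope_all, slope_admitted, induced


if __name__ == "__main__":
    everyone, admitted_only, induced = simulate_collider()
    print(f"slope, everyone: {everyone:.2f}")
    print(f"slope, admitted only: {admitted_only:.2f}")
    print(f"induced slope of other on score: {induced:.2f}")
```

Among the admitted group, a high score tends to go with a lower value of the unmeasured trait, because a low scorer needed a lot of the other trait to get in. Since that trait is left out of the regression, the estimated slope on the score is pulled below its true value of 1.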

Why did so many publications simply accept the Allensworth and Clark paper as given? Well, 1) most education reporters lack even basic statistical literacy and 2) the paper found the outcome that confirms the worldview of media liberals. As for the researchers themselves, I emailed them a month ago to give them a chance to defend their work; predictably, they did not respond. Does this paper constitute research fraud? No, I don’t think that would be fair. I’m sure they think the results are genuine. But aside from the jury-rigged conclusion, the paper itself, as is increasingly the case, simply doesn’t make the claims the press release made with anything like equal strength. Allensworth and Clark allowed the media to circulate a false claim using their statistical machinations as justification. That’s an ethical problem on its own. They will, of course, pay no professional penalty for this, as (again) the field of Education wants this result to be true.

We are, in fact, quite good at predicting which students will perform well and which won’t in later educational contexts, in all manner of contexts and with many different tools, and we have been for 100 years. I’m sorry that this is injurious to some people’s sense of how the world should be but it’s true. The SAT and ACT work to predict college performance.
The SATs only tell you how well a student takes the SAT. This is perhaps a corollary to the first myth, and is equally wrong. They tell us what they were designed to tell us: how well students are likely to perform in college. But the SATs tell us about much more than college success. Let me run this graphic again.
[Figure: Robertson et al. 2010]
The SAT doesn’t just predict college performance, though it does that very well. It predicts all manner of major life events we associate with higher intelligence. And since the SAT and ACT are proxy IQ tests, they almost certainly provide useful information about all manner of other outcomes as well. Intelligence testing has been demonstrated to predict constructs like work performance over and over again. The SAT/ACT are predictive, and they predict differences between test takers because not all people are academically equal. Obviously.
SATs just replicate the income distribution. No. Again, asserted with utter confidence by liberals despite overwhelming evidence that this is not true. I believe that this research represents the largest publicly-available sample of SAT scores and income information, with an n of almost 150,000, and the observed correlation between family income and SAT score is .25. This is not nothing. It is a meaningful predictor. But it means that the large majority of the variance in SAT scores is not explainable by income information. A correlation of .25 means that there are vast numbers of lower-income students outperforming higher-income students. Other analyses find similar correlations. If SAT critics wanted to say that “there is a relatively small but meaningful correlation between family income and SAT scores and we should talk about that,” fair game. But that’s not how they talk. They routinely make far stronger claims than that in an effort to dismiss these tests altogether, such as here by Yale’s Paul Bloom. (Whose work I generally like.) It’s just not that hard to correlate two variables together, guys. I don’t know why you wouldn’t ever ask yourselves “is this thing I constantly assert as absolute fact actually true?” Well, maybe I do.
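The arithmetic behind a correlation of .25 can be sketched in a few lines. The variance-explained figure is just r squared. The second function is a rougher illustration that assumes income and scores follow a bivariate normal distribution (an assumption made here for the sketch, not something the data above establish), in which case the share of below-median-income students who score above the overall median is 1/2 - arcsin(r)/pi. The function names are hypothetical.

```python
import math


def variance_explained(r):
    """Share of variance in one variable linearly accounted for by the other."""
    return r ** 2


def share_low_x_above_median_y(r):
    """Share of below-median-X cases that sit above the median on Y.

    Assumes X and Y are bivariate normal with correlation r.
    """
    return 0.5 - math.asin(r) / math.pi


if __name__ == "__main__":
    r = 0.25
    print(f"variance explained: {variance_explained(r):.4f}")
    print(f"below-median income, above-median score: {share_low_x_above_median_y(r):.3f}")
```

With r = .25 this gives roughly 6 percent of variance explained and, under the normality assumption, about 42 percent of below-median-income students scoring above the median. With no correlation at all that share would be 50 percent.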
In general, progressive and left types routinely overstate the power of the relationship between family wealth and academic performance on all manner of educational outcomes. The political logic is obvious: if you generally want to redistribute money (as I do) then the claim that educational problems are really economic problems provides ammo for your position. But the fact that there is a generic socioeconomic effect does not mean that giving people money will improve their educational outcomes very much, particularly if richer people are actually mildly but consistently better at school than poorer for sorting reasons that are not the direct product of differences in income. That is, what correlation does exist between SES and academic indicators might simply be the metrics accurately measuring the constructs they were designed to measure.
And throwing money at our educational problems, while noble in intent, hasn’t worked. (People react violently to this, but for example poorer and Blacker public schools receive significantly higher per-pupil funding than richer and whiter schools, which should not be a surprise given that the policy apparatus has been shoveling money at the racial performance gap for 40 years.) All manner of major interventions in student socioeconomic status, including adoption into dramatically different home and family conditions, have failed to produce the benefits you’d expect if academic outcomes were a simple function of money. I believe in redistribution as a way to ameliorate the consequences of poor academic performance. There is no reason to think that redistribution will ameliorate poor academic performance itself.
SATs are easily gamed with expensive tutoring. They are not. This one is perhaps less empirically certain than the prior two, and it is the one on which I’m most amenable to counterargument, but the preponderance of the evidence seems clear to me in saying that the benefits of tutoring/coaching for these tests are vastly overstated. Again, a simplistic proffered explanation for a troublesome set of facts that then implies simplistic solutions that would not work.
Going test optional increases racial diversity. This one, I think, must be called scientifically unsettled. However, two studies (Sweitzer, Blalock, and Sharma; Belasco, Rosinger, and Hearn) find no appreciable increase in racial diversity after universities go test-optional. “Holistic” application criteria like admissions essays almost certainly benefit richer students anyway. What’s more, we have to ask ourselves what “diversity” really means in this context. Private colleges and universities keep the relevant data close to the vest, for obvious reasons, but it’s widely believed that many elite schools satisfy their internal diversity goals for Black students by aggressively pursuing wealthy Kenyan and Nigerian international students, whose parents have the means to be the kind of reliable donors that such schools rely on so heavily. I’m not aware of a really comprehensive study that examines this issue, and it would be hard to pull off, but the relevant question is “do various policies intended to improve diversity on campus actually increase the enrollment of American-born descendants of African slaves?” I can’t say, but you can guess where my suspicions lie.

Any useful discussion of these issues has to start with getting past the mountains of fake facts and folk wisdom that progressive people have been peddling since forever. If you’re anti-SAT/ACT, say so - but stop making empirically indefensible claims.

All of that is prologue to the bigger point: the controversy over college entrance examinations stems not from the examinations themselves, but from the fact that they reveal profound differences in human capital that make progressives uncomfortable. The SATs don’t create inequality. They reveal inequality.

Why do people have such revulsion towards the SATs? Because they produce unequal results; some students perform better than others on the test. Of course, this is the very function of testing, to reveal underlying inequality, in this case underlying academic aptitude or ability. In fact, the more valid a test is, the more powerful it is, the more inequality it reveals, as it becomes capable of demonstrating finer and finer-grained distinctions between test takers. Most people are bothered by this tendency to reveal inequality because of troublesome and persistent group differences. Traditionally the gender education gap was cited as a source of concern, but because the gender positions have flipped (outside of a few stubborn fields), most progressive people don’t care much3. The racial achievement gap, however, is still the singular obsession of the American education politics, policy, and research world, and despite periodic predictions that it will soon close, it remains stubbornly real. And that’s ultimately where the anti-SAT/ACT animus comes from: Black and Hispanic students significantly underperform white and Asian, and this is vexing for obvious reasons.

The racial achievement/performance gap is a curious thing even in the context of an American political discourse that seems to get more bizarre by the day. That the gap exists is, on balance, not controversial. Gaps in performance are observed on essentially every measured academic metric, though the size of the effects varies from context to context, and the general distribution is Asian American students at the top, white students next, then Hispanic, then Black. The Black-white gap in particular has shrunk from the era of (explicitly) segregated schools but progress has not been consistent or linear. Most people in academia and politics admit it exists: prominent Black politicians like Barack Obama and Kamala Harris reference it, every major think tank and foundation operating in the educational space identifies it as a major priority, and the NAACP used to address it often, though their Education and Education Strategy pages have recently disappeared so it’s hard to know where they stand now. These things are faddish but once upon a time every other dissertation written by someone getting a PhD in Education was about the gap. We can observe it even outside of reference to controversial tests, such as noting that the white high school graduation rate is 10% higher than that for Black students. The achievement gap is a thing.

And yet I also find a rapidly-congealing social prohibition against talking about these gaps in progressive spaces. If you refer to a racial achievement gap in a lot of liberal or left contexts now, you’ll find that people clam up fast and get visibly uncomfortable, even if you take pains to point out that an academic achievement gap does not imply an academic potential gap. People just don’t want to acknowledge that gaps exist at all; our racial discourse appears to have become such a blunt instrument that the acknowledgement of racial difference is controversial even when you preface discussion with the belief (that I hold) that the gap is the product of innumerable environmental and sociocultural factors rather than genetics or other inherent differences. Simply saying “Black students consistently score lower on tests like the SATs, have lower average GPAs, and have worse metrics on ancillary concerns like truancy” - again, Barack Obama’s position, Kamala Harris’s position, Cory Booker’s position - is enough for people to start launching into harangues about the inherent violence of those comparisons. People just do not want to talk about this stuff.

The rush to rid the world of the SAT is based on this dynamic. Because Black and white students are not equal in academic preparedness, and because we have failed to close the gap in a half-century of concerted policy effort to do so, we must eliminate the tools that reveal it, such as the SAT. Similarly, the movement to shutter gifted and talented programs4, due to the racial inequalities therein, demonstrates an attempt to shut down those structures that make educational inequality visible. It should go without saying that this will not do anything to close the gap in actual ability.

Of course, if eliminating entrance examinations is an effort to improve the standing of Black students, it is necessarily also an effort to hurt the standing of students from certain other races. (When we’re talking about changing relative performance - or, really, perception of relative performance - everything is zero sum. For some to rise, others must fall, by definition.) In today’s political climate, hurting white kids to benefit Black is not a hard pull, politically, at least in academia. But claims that the SAT is an artifact of white supremacy, which are common these days, are hard to square with the fact that Asian Americans handily outperform white, even when controlling for income band. Asian test takers on average score 100 points higher than white, and are 3.5 times more likely to score in the 1400-1600 range than white. (Data.) Whatever else the effort to eliminate entrance testing may be, it is absolutely, undeniably, objectively an effort to disadvantage Asian students. If people want to fight this battle, fine, but anyone who does so without acknowledging its negative effects on Asian American applicants is a propagandist.

Those concerns with group differences, at least, have some sort of basic political logic and are amenable to complaints that they are the product of systemic inequality. (They are, but not the inequalities that people think, and again the SAT gap is a result of systemic inequality, not a cause of systemic inequality.) More disturbing to me is the rise of resistance within academia to the notion of inequalities between individuals. When I was in grad school more than a half-decade ago, I observed with some considerable unhappiness that it had become increasingly socially unacceptable to speak of some students as simply better students than others, as being more talented, harder working, or more prepared. All of this was seen as inegalitarian and, eventually, as “white supremacist” even if every student being compared in a given context was white. There were many instructors back then who bragged about giving all students As, etc., and I must assume this practice has only grown over time. In the humanities and social sciences especially there is a growing movement to reject assessment, including grading - the means through which we sort better students from worse - as the hand of illegitimate power that “does violence” to the students who voluntarily attend college.

When I worked at Brooklyn College, many faculty members resisted assessment efforts. This is typically quite sensible: internally-generated college assessment data, required by all of the regional accrediting agencies, is usually of such poor methodology and data quality that it is simply not worth gathering. But some went further and said that student performance should not be assessed, period, that making performance distinctions between students is a type of injustice, complicity in the neoliberal machine, or similar. Assessment is as old as education, but many in academia now don’t argue about how to do it (which is an essential conversation) but that it should not be done at all.

Of course, that complicity in the neoliberal machine is not some recent injustice; it is the very reason that colleges and universities are funded by our society at all. If this trend continues, not just eliminating SAT requirements or increasingly refusing to hierarchize students with grades but in rejecting the entire sorting function of the university, academia will collapse. Wealthy parents aren’t paying Harvard to enrich their children in the humanistic sense. They’re paying Harvard to act as a marker of their child’s superiority in the labor market and the social hierarchy. Employers value college because it provides at least some meaningful information about who will succeed as a worker; remove that function and the financial justification for a hideously expensive system dies. I would love if education dropped its association with meritocracy, but that cannot occur within our current system. The professors who self-aggrandize through their rejection of their hierarchizing function, if successful, would cause the doom of the modern university. (These tenured radicals, of course, never are so moved by the inherent inequities of academia that they quit the profession.)

Today, it is somehow controversial to say “some people are smarter than others,” a reflection of one of the simple brute realities of human life and something that has been accepted as true for thousands of years.

Here is the essence of it: hierarchies of relative academic performance are remarkably stable throughout life, due to differences in inherent or intrinsic academic ability of whatever origin, and the SATs and similar mechanisms reveal those differences in a way that liberal America is increasingly unable to accept. This is the source of all of this angst, not the technical details of whether a test is fair or valid or just, but a liberal intelligentsia that is incapable of honestly confronting the fact that different human beings have fundamentally different intrinsic abilities. I believe in political equality, social equality, equality of rights, equality of dignity, equality of protection under the law. But the notion that all people are equally talented, in academics or anything else, is an absurdity, and as much as people will rush to deny intrinsic difference, I suspect that pretty much everybody knows that they are real. When you were a child you casually assumed that some of your classmates were naturally better at school than others, and you did because it was true.

This is the conversation that I tried, and failed, to force with my book: left-of-center political movements, from center-left to radically socialist, cannot achieve the goal of the greater good for everyone, including greater political and economic equality, while pretending that we believe in equality of human ability. The only way to intelligently address various social, economic, and political inequalities related to differences in human potential is to acknowledge that those differences exist. The current rending of garments regarding inequalities within our education system has led to certifiably bizarre situations like the movement, currently gathering steam, to teach math as if it is as subjective as literature or art. But this won’t make Black kids or poor kids or girls or anyone else actually better at math. And if the universities really give up their function of creating an academic hierarchy for political reasons, employers will find new systems that do that, or a lot of people will get hired and quickly fired for not being competent. This is not an intelligent policy approach. Getting rid of the SATs won’t make unprepared kids prepared. It won’t make naturally untalented students naturally talented. It won’t make kids who aren’t smart into smart kids. All it will do is hide the reality of those unpleasant inequalities.

Trying to fight educational inequality by getting rid of the SAT is like trying to fight climate change by getting rid of thermometers. It is as indicative of a heads-in-the-sand attitude as I can possibly imagine. For those who think that I’m “leaving children behind,” again, I wrote a whole book to detail what to do for the untalented. And I do have some allies here, including in unexpected places [LINK CORRECTED]. But the bottom line is that dropping the SAT, or exams for elite high schools, or college grades can never help those who struggle at the bottom of the totem pole. The only way to help them is to acknowledge that some of them will always exist.

Campbell’s law, friends, Campbell’s law.

This post is not about census-style state standardized K-12 tests, but the official Freddie position is that this testing is harmful and unnecessary, as we enjoy the power of inferential statistics. One thing we can do very effectively in ed research is to impute performance of an untested majority of a population through careful stratification and sampling of a tested minority. We don’t need to test everyone to know how our schools, districts, or states are doing; you test a sample and use responsible procedures to generalize. We know how to do this. Indeed, the NAEP is widely considered the gold standard of American educational testing. The NAEP is not a census-style test, and yet it provides us excellent information about how our schools are doing. So why do we do census-style testing, when the resources required to do so are immense and the disruption to schools is so obvious? Because doing so is conducive to the broken “education reform” movement in this country, and that movement represents bipartisan elite consensus.
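The sample-then-generalize idea can be sketched with a toy stratified sample. This is only an illustration under made-up assumptions: the three strata, their sizes, and their score distributions are hypothetical, and real designs like the NAEP's are considerably more elaborate than this.

```python
import random


def build_population(rng):
    """Build a toy population of student scores grouped into strata."""
    # Hypothetical strata: (name, number of students, mean score, sd).
    spec = [
        ("urban", 5000, 250, 30),
        ("suburban", 3000, 270, 30),
        ("rural", 2000, 290, 30),
    ]
    return {
        name: [rng.gauss(mean, sd) for _ in range(size)]
        for name, size, mean, sd in spec
    }


def census_mean(population):
    """Mean score if every student were tested."""
    all_scores = [s for scores in population.values() for s in scores]
    return sum(all_scores) / len(all_scores)


def stratified_estimate(population, per_stratum, rng):
    """Estimate the overall mean from a small random sample in each stratum."""
    total = sum(len(scores) for scores in population.values())
    estimate = 0.0
    for scores in population.values():
        sample = rng.sample(scores, per_stratum)
        weight = len(scores) / total  # stratum's share of the population
        estimate += weight * (sum(sample) / per_stratum)
    return estimate


def demo(seed=2, per_stratum=200):
    """Return (census mean, stratified-sample estimate) for the toy population."""
    rng = random.Random(seed)
    population = build_population(rng)
    return census_mean(population), stratified_estimate(population, per_stratum, rng)


if __name__ == "__main__":
    true_mean, estimate = demo()
    print(f"census mean: {true_mean:.1f}  sampled estimate: {estimate:.1f}")
```

Here 600 tested students out of 10,000 are enough to land within a few points of the census mean, because each stratum's sample mean is weighted by that stratum's share of the population.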

College administrators care, and in fact many schools are going to greater and greater lengths to contrive incoming freshman classes with greater gender parity, but I suspect that this is because schools with large gender imbalances are considered less attractive to potential students for social reasons.

I am on record as saying that I’m not super beat up about losing gifted and talented programs, as the students in them are going to be fine no matter what. They are, after all, gifted and talented.


Lamoille

This was a great read. Unfortunately, it seems very rare to address counter arguments in good faith as is done here.

I would propose one addition to #3: SES correlates with academic performance and parental academic performance is also correlated with household SES. There's so much noise here, but I would suggest three things are going on here among the elite: an advantage in material resources, likely a stronger culture of valuing education and some hereditary inheritance of ability. The last point (hereditary inheritance) seems the hardest to talk about (I can feel the thunderclouds forming over my head). It also suggests that there will be some limit in generational economic mobility, i.e. the economic quintile you were born into will almost always be correlated with where you wind up no matter how much redistribution you have.


Michael Weissman

Excellent article.

Here's my technical case along the same lines for the GRE for physics grad school:

Here's some of my work that bolsters your case, although it concerns GRE's rather than SATs. The common theme is the extraordinary lengths of embarrassing pseudo-scientific foolery to which the anti-test types resort.

Princeton Merton seminar: https://www.youtube.com/channel/UCHEvLUxTWGsAjNjR3epRiQw

arXiv paper with references to earlier technical publication in Science Advances: https://arxiv.org/abs/1902.09442

discussion on Andrew Gelman's blog : https://statmodeling.stat.columbia.edu/2020/12/14/debate-involving-a-bad-analysis-of-gre-scores/

My one quibble is that I think you've sort of muddled the description of collider stratification bias. It's distinguished from simple range restriction because unlike simple range restriction the stratification induces a correlation between the suspected cause and the other variables with which it collides on the stratified variable. If those other variables are outside the model, that typically induces a negative bias in the coefficient estimate.
