The School Reformer "Accountability Era" Narrative Simply Does Not Add Up
neither history nor international data supports their theories of decline
Recently, the usual suspects (such as Matt Yglesias and Jon Chait and the crew at the Argument) have been going hard on the idea that American education improved during the “accountability era” - the era of No Child Left Behind and the larger movement it epitomized - and that a later walking back of those efforts has led to declines. NCLB and the whole concept of turning the screws on teachers and students led to better American educational metrics, their story goes, and since those efforts were abandoned we’ve slid back again. This is wrong on many levels - for one thing, the “accountability era” is still with us - and it looks particularly foolish when we look at the broader international context. But let’s go through things one by one.
The “Accountability” Era Never Ended!
One of the hallmarks of the post-social justice era is watching people wrestle with the material consequences of the social justice era. A big part of doing so intelligently is understanding that those material consequences were in fact very limited relative to the discursive dominance of social justice signals from 2012ish to 2022ish. And nowhere is the divide between rhetoric and reality larger than in public education. Yes, a lot of loud voices were calling standardized testing racist and demanding that we decolonize curricula. But when you go looking for how that actually influenced brick & mortar policy, there’s not much there.
The reform movement’s claim that the accountability era has ended doesn’t survive contact with the facts. The 2015 passage of the Every Student Succeeds Act (ESSA) is sometimes nominated as the moment where the accountability era died, thanks to the fact that ESSA superseded No Child Left Behind, which is the North Star of accountability types. (As you would expect, partisan Republicans who hate Barack Obama tend to favor this timeline.) But as their similar titles imply, NCLB and ESSA are very similar bills, with the latter’s most consequential changes amounting to a softening of the absurd performance mandates that district after district simply couldn’t meet. (The Department of Education’s ESSA explainer page takes pains to highlight continuity with NCLB, which the Obama admin stressed in their push to get ESSA passed, but does say “over time, NCLB's prescriptive requirements became increasingly unworkable for schools and educators.” You don’t say.) The Obama administration had been forced to issue waiver after waiver when it became clear that NCLB’s procrustean standards simply couldn’t be achieved by actual schools and actual students. ESSA was largely a reflection of bipartisan recognition of those limits.
Otherwise, ESSA preserved NCLB’s core testing regime nearly intact: annual standardized assessments in reading and math in grades 3-8 and once in high school, plus science testing three times during a student’s K–12 career, with the 95 percent participation requirement still attached to federal Title I funding. (Though in comparison to NCLB, enforcement of that requirement is largely left to states.) States still have to disaggregate results by race, income, disability, and ESL status, identify their lowest-performing five percent of schools for so-called “comprehensive support and improvement,” and flag schools with persistent racial achievement gaps for “targeted support.” That’s all core NCLB stuff. ESSA is better understood as a reform of how states meet federal accountability requirements than as a repeal of the requirements themselves. And the clue is in the names: No Child Left Behind, Every Student Succeeds…. The only way the Obama administration was going to get very hostile Congressional Republicans to pass the bill was by emphasizing continuity with Bush’s NCLB.
What changed at the federal level after 2015 was largely a) rhetorical and b) administrative; the substance of test-based accountability was picked up and carried forward by the states. Every state continues to operate a federally required accountability system that rates schools using student test performance as the dominant input, though ESSA provoked the addition of “school quality” and “student success” measures. The large majority of states still assign schools A–F letter grades, 1–5 star ratings, or similar summative labels, driven primarily by proficiency and growth on state assessments. (Some states break this information down into “dashboards” of data, but the practical difference for schools and districts is slight.) Florida, Ohio, Texas, Arizona, Louisiana, Indiana, and Mississippi, among others, actually tightened their rating formulas after ESSA, and several expanded high-stakes uses such as third-grade reading-retention laws and high school exit expectations that depend on the same tests NCLB mandated. When Texas rolled out its revamped A–F ratings in 2023, the result was not a retreat from accountability but a lawsuit from dozens of districts because the state had made the stakes so much more severe.
Obama admin teacher-accountability infrastructure (the Race to the Top stuff) has likewise proved far stickier than reformers now acknowledge. Even after the Department of Ed lifted the waiver-driven pressure to tie evaluations to value-added scores (because VAMs are an absolute mess), a majority of states still require student-growth measures in evaluations, and something like half still require them to be a “significant” factor. Charter authorization decisions, school-closure policies, state-takeover regimes… all remained premised on the same test-score triggers pioneered under NCLB. The “Nation’s Report Card” still gets published on its NCLB-era schedule, and districts continue to live and die by those numbers in the local press. It’s just weird to act as though we’re in a dramatically different era of American public schooling; we are not.
This continuity matters for the empirical question of whether accountability “worked.” The basic architecture - annual census-level testing, achievement-gap reporting, consequences for low performers, evaluation regimes tied to scores - has been running continuously for almost twenty-five years. So, can flat long-term standardized test trend lines, persistent (and in some subjects widening) racial achievement gaps, and continued middling U.S. performance on PISA honestly be blamed on a premature retreat from “accountability”? What does that even really mean? The ed reformers claim that the accountability project was abandoned before it could deliver what it was intended to, but that’s a claim the statutes, state rating systems, and evaluation policies do not support. A more candid reading of the record is that accountability was sustained long enough and consistently enough to be judged on its own terms - and on those terms, the theory of action has not been vindicated. Because while federal policy can have big impacts on fairness and quality of life for teachers and schools, there’s simply no reason to think that it has consistent or meaningful effects on test scores.
Were there some bad educational ideas batted around in the 2010s? Sure. Ending algebra in 8th grade out of specious equity concerns, for example, was misguided in profoundly predictable ways. (The affluent kids just get algebra instruction outside of public schools, whether in private schools or with private tutors, rendering the whole thing a farce.) But there’s a reason that San Francisco is so often singled out for this bad policy: the number of districts that enacted it was tiny. I can’t get straight numbers, but I believe that the number of districts that eliminated the option for 8th graders to take algebra is in the single digits; there are more than 13,000 public school districts in the United States. Even San Francisco has rolled this policy back. Yes, participation rates are down, but that’s largely due to changes in standards, particularly Common Core sequencing - yes, the de facto national curricular standards beloved by the accountability people. This is one of the weird things about this whole debate, the way that the rhetoric of a loud fringe and the actions of a tiny number of outlier schools and districts are mistaken for actual meaningful pedagogical and policy change. They aren’t. More than a decade after its repeal, it’s remarkable, the degree to which NCLB still determines national ed policy.
The Causal Claims Are Contentious at Best
Probably the most important thing to say in this whole debate is that post hoc, ergo propter hoc is fallacious reasoning. The NCLB-caused-the-gains argument commits perhaps the most elementary error in causal inference: mistaking a temporal correlation for a mechanism, “after this, therefore because of this.” Some degree of that reasoning is inevitable in public policy, but it’s always fraught, and especially so here given that educational performance is so immensely multivariate, contextual, class-stratified, and historically contingent.
To begin with, NAEP scores in math and reading were already rising before NCLB was signed in January 2002. The 1990s had seen consistent gains, particularly among Black and Hispanic students. Why those gains were occurring is subject to as much debate as all the rest of this stuff; it will not surprise you to hear that I suspect that the remarkable decline in concentrated poverty during that decade played a large role. Whatever the causes, to credit NCLB with gains that began a decade prior doesn’t make much sense. More fundamentally, there is no counterfactual: every American public school was subject to NCLB simultaneously, so there is no control group against which to measure its effect. Without one, the claim that NCLB caused gains is simply not falsifiable in any rigorous sense, which means it’s a story imposed on a trend line rather than a scientific claim - a story of obvious ideological convenience for those in America’s still-dominant neoliberal policy establishment.
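To make the missing-counterfactual problem concrete, here is the textbook difference-in-differences estimator, offered only as an illustrative sketch. The notation (the symbols $\hat{\tau}$ and $\bar{Y}$, and the "treated" and "control" labels) is assumed for this example and is not drawn from any study discussed here.

```latex
\hat{\tau}_{\text{DiD}}
  = \underbrace{\left(\bar{Y}^{\text{treated}}_{\text{post}} - \bar{Y}^{\text{treated}}_{\text{pre}}\right)}_{\text{change where the policy applied}}
  - \underbrace{\left(\bar{Y}^{\text{control}}_{\text{post}} - \bar{Y}^{\text{control}}_{\text{pre}}\right)}_{\text{change where it did not}}
```

Because NCLB applied to every public school at once, there is no second term to subtract. All we observe is the first bracket, which bundles any policy effect together with whatever trend was already under way, including the gains of the 1990s.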
The historical picture also cuts against the narrative in inconvenient ways. NAEP gains during the NCLB era were heavily concentrated in elementary grades and in math (precisely the subjects and levels where the test-and-punish pressure was most intense) while reading gains at the 8th grade level were much weaker, and 12th grade scores barely moved at all. This is exactly the pattern you’d expect not from genuine learning improvements but from score inflation through fraud and teaching to the test. Schools under threat of sanctions a) participated in out-and-out test score manipulation, the extent of which we can really only guess at, and b) drilled students on the narrow content most likely to appear on state assessments. Both fed into modestly higher NAEP performance at tested grades while likely leaving deeper literacy and higher-order reasoning largely untouched. It’s not like this dynamic was a surprise; researchers like Aubrey Amrein and David Berliner documented this pattern extensively years before NCLB was enacted. Likewise RAND, an organization that has published a healthy amount of skeptical takes but generally endorses the accountability narrative, had found Texas’s own high-stakes testing regime (the model for NCLB) showed larger gains on its own state tests than on NAEP, a very telling sign of score inflation rather than learning. If NCLB had genuinely improved student performance, you would expect broad, deep, sustained gains; what we got was narrow, shallow, and concentrated exactly where the incentive pressure was highest.
I am 100% aware that it’s very difficult to establish real causality in an educational research setting - I’ve made this point explicitly myself - but this difficulty does not excuse making broad claims about wide-ranging and complex social phenomena based on “X came after Y, therefore Y caused X.” National trends outside of the classroom, like those relating to food insecurity, often have the biggest impact on test scores. Given that knowledge, ascribing noisy NAEP score changes to national policies that were implemented piecemeal and at very different rates is irresponsible, especially given the surge in scores from the 1990s and how it complicates the simplistic narrative.
How Could American Pedagogical & Policy Decisions Explain Educational Declines Seen Across the Developed World?
These arguments about recent American educational decline, and the policy and pedagogical changes that supposedly provoked it, almost always fixate on NAEP scores, which is defensible to some degree because the NAEP is indeed a high-quality national assessment. But the NAEP is, indeed, national - it only tells us about the United States - and as such it can’t help us ascertain which educational trends are likely to be the result of national policy and which are the product of larger social forces. The Programme for International Student Assessment (PISA), the comparative educational assessment put together by the Organization for Economic Co-operation and Development, provides exactly that opportunity. And when we look at close analogs to the United States, we find that almost all of them have seen the same broad trends as our schools have.
So, here. Take a look at the PISA Math trends in Western European countries, which is the region that’s the closest match with the United States in terms of economics, culture, and shared history. Look at how uniform these trends have been! The baseline where each country starts from is different, but the trends over time are remarkably similar.
Now, here’s the same chart with the USA and OECD average highlighted. As is mathematically inevitable, the individual country USA line is more variable than the OECD average. And yet you can see a broad symmetry in how the line has trended, other than the 2006 dip for the United States. The 2015 American decline is steeper than that of the OECD average, but again, that’s how averages work. The 2018 to 2022 drop is actually larger for the OECD overall, which has had the effect of causing America’s math ranking to rise. (Curiously, the ed reform crowd doesn’t mention this.) Overall, though the American baseline is meaningfully lower than the OECD average, the general direction and intensity of the change is quite similar. I find it powerfully difficult to justify explaining these trends via reference to American local, state, or federal education policy. Just how powerful is the San Francisco school board?
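The point about averages being smoother than any single country's line can be checked with a quick simulation. This is a minimal sketch on entirely synthetic numbers, not PISA data: the shared downward trend, the noise level, the country count, and the function names `simulate` and `wobble` are all assumptions made up for illustration. It also assumes each country's wave-to-wave noise is independent, which is the case where averaging smooths most strongly.

```python
import random
import statistics


def simulate(n_countries=30, n_waves=7, noise_sd=8.0, seed=0):
    """Build synthetic score series: one shared trend plus independent noise per country."""
    rng = random.Random(seed)
    # Hypothetical shared trend: start at 500 and lose 3 points per test wave.
    shared_trend = [500.0 - 3.0 * wave for wave in range(n_waves)]
    countries = [
        [level + rng.gauss(0.0, noise_sd) for level in shared_trend]
        for _ in range(n_countries)
    ]
    # Cross-country average at each wave (the stand-in for an "OECD average" line).
    average = [
        statistics.fmean([series[wave] for series in countries])
        for wave in range(n_waves)
    ]
    return shared_trend, countries, average


def wobble(series, trend):
    """Standard deviation of a series' deviations from the shared trend."""
    return statistics.pstdev([s - t for s, t in zip(series, trend)])


shared_trend, countries, average = simulate()
typical_country_wobble = statistics.fmean(
    [wobble(series, shared_trend) for series in countries]
)
average_wobble = wobble(average, shared_trend)
print(f"typical single-country wobble: {typical_country_wobble:.2f}")
print(f"wobble of the cross-country average: {average_wobble:.2f}")
```

Under these assumptions the averaged line hugs the shared trend far more tightly than a typical individual line, because independent noise shrinks roughly with the square root of the number of countries averaged. So a steeper single-wave move in one country's line than in the average is what you would expect even when the underlying trend is identical.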
If you think it’s cherry-picking to only look at Western Europe, here are the ten highest-performing countries.
Unsurprisingly, looking at just the top 10 performers plus the United States does us no favors. (Remember, we have always done very badly on international educational comparisons.) This graph does, however, demonstrate that this decline is a broad-based phenomenon across countries and performance levels. The countries that buck the trend here are Singapore, an incredibly wealthy city-state that is an outlier in every possible sense and which hosts an educational system that has repeatedly been called inhumane; Taiwan, which has a population 1/15th the size of the United States and is also an outlier in myriad ways; South Korea, which saw a pitched mid-2010s decline and has since largely treaded water; and Japan, which has oscillated around a similar point of excellence throughout the history of the PISA. Meanwhile, the bottom six countries in the top ten, European nations with more in common with our country than East Asian outliers, saw declines similar to ours. Again, I ask you to consider what’s more likely: that all of these countries independently made policy and pedagogical mistakes that produced declines remarkably similar to those among America’s students in the same timeframe, or that some non-policy, non-pedagogy force was at play?
How about reading? Well, this is a little more difficult; the PISA reading test has been less regimented over time than the math test, and in fact in 2006 a printing error in the test booklets led to the United States being removed from the comparison. Still, let’s take a look.
Well, look at that! The USA is on top when compared to Western Europe. That doesn’t look like a country that’s suffering uniquely from an abandonment of accountability to me. Our modest increase from 2018 to 2022 looks pretty damn good in the face of all that decline.
In fact compared to the OECD average American reading looks downright rosy, although that’s against a baseline of declining advanced-world literacy.
Even within the top ten countries, with the Singapore outlier again easy to exclude, we’re looking ordinary, running to pretty good. There is no help for the “accountability” narrative here.
The PISA declines visible in American math and reading scores over the 2003–2022 period aren’t remotely anomalous; they’re part of a near-universal pattern among wealthy, developed democracies. In particular, the Netherlands, Finland, Belgium, Canada, and Australia - that is, countries with many economic and social similarities but radically different curriculum philosophies, funding structures, pedagogical traditions, etc - all show trajectories strikingly similar to that of the United States. (In fact Finland, long held up as the gold standard of education reform and frequently invoked as a rebuke to American approaches, has seen some of the steepest reading declines in the developed world.) If policy and pedagogy were the primary drivers of American underperformance, one would expect American trends to diverge from those of peer nations, to look distinctively bad in ways that track distinctively American choices. Instead, what the data show is convergence: a broad, shared downward drift across the developed world that almost certainly reflects forces operating above the level of any individual nation’s classroom policy. Pinning these trends on American policy choices, without accounting for why virtually identical trends appear in countries that made very different choices, is not serious analysis.
What could those “forces operating above the level of any individual nation’s classroom policy” be? Well, I was just telling you not to make broad claims about the causes of widespread changes in educational metrics without strong evidence. But what do I suspect? I suspect that it’s related to the fact that children and adolescents have, in the past ten or fifteen years, almost universally adopted a kind of technology that has unique capacity to suck up their attention, drain their mental energy, and waste their time. I think in a decade we’re going to have very strong evidence that it was always the smartphones.
Which means that, once again, American teachers and schools are not guilty of the horrible crimes against children’s potential that they have been accused of. Then again, “accountability” was always less about education policy in the substantive sense and more of a political and moral narrative. Demanding accountability allowed elites to believe that compassion consisted of demanding more from teachers who were asked to do the impossible and students struggling against major socioeconomic barriers. But politicians and neoliberal wonks found that this profoundly unfair behavior towards public educators could be effectively rebranded as high expectations. Accountability rhetoric allowed politicians to posture as champions of children while systematically undermining the working conditions of teachers and narrowing the curriculum to whatever could be cheaply measured. We allowed pundits to talk endlessly about “what works” to improve test scores while refusing to confront the most basic empirical fact in all of education: that schools are downstream of society, not the other way around.

This all makes sense to anyone who’s talked at all to teachers who have been in the classroom over the last ten or fifteen years. No, the schools didn’t somehow do away with “accountability.” Teachers have always been expected to do the impossible by making sure every child is above average and the bottom quarter of all-but-unteachable kids are nevertheless regarded as future potential college graduates. The challenge has always been to teach kids who don’t want to learn or who don’t come from a home environment that supports learning. That’s it. That’s the whole game. More tests / assessments / curricula / benchmarks were never going to change that.
I continue to be bemused that Singapore is regarded as a model for other countries to attempt to emulate. The entire area of Singapore is only slightly more than half of the Brisbane City Local Government Area, which is itself only a fraction of the area of metropolitan Brisbane. As well as geographic compactness, Singapore has the kind of political economy and political ecology that is only possible in an entirely urban polity. It does not face the complications faced by a country with a countryside, a rural population, and land-based and resource-based industries. This is not to mention the other aspects of Singapore that simply can't be replicated in other countries.