Khan Academy at 15: What Worked, What Didn’t
In 2004, a hedge fund analyst named Salman Khan started tutoring his cousin Nadia in New Orleans from his apartment in Boston. She was twelve, stuck on unit conversions, and too self-conscious to ask her classroom teacher the same question twice. Khan, an MIT engineering grad with three degrees and a mild obsession with explaining things clearly, used Yahoo Doodle and a telephone. By 2006 he was recording the sessions as YouTube videos so Nadia could rewatch them and so her younger brothers could benefit too. He drew in Microsoft Paint. His voice was calm, slightly amused, and improvised. The lessons sat on a public channel because making them private seemed like more trouble than it was worth.
That casual decision became one of the largest single acts of educational distribution in history. By 2009 Khan had quit his job. In 2010 the project became a 501(c)(3) called Khan Academy, bankrolled first by Ann Doerr and then by the Bill & Melinda Gates Foundation, which reportedly wrote a check for $1.5 million after Bill Gates said in an interview that he used the videos with his own kids. Fifteen years on, Khan Academy serves somewhere north of 160 million registered learners in dozens of languages, is used in classrooms across many U.S. school districts, and partners with the College Board on the official SAT prep product. A reasonable question, given the scale and the decade and a half of runway, is what the evidence actually shows.
Start with what plainly worked. The video library itself, now vastly expanded beyond Khan’s original recordings, gave a generation of students a free, calm, non-judgmental tutor they could pause and rewind. Anyone who has watched a bright seventh-grader get humiliated at the board for missing a step in long division can appreciate the psychological value of that alone. The product’s second act, the adaptive practice system, was more ambitious. Built on mastery learning principles that Benjamin Bloom described in his 1984 two-sigma paper, the Khan model required students to answer a streak of problems correctly before advancing, with hints and step-by-step walkthroughs available on demand. Teachers got dashboards. Parents got progress emails. For students who actually engaged with the system in a sustained way, a reasonable body of research suggests small-to-moderate gains in procedural math skill.
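For readers who want the streak rule made concrete, here is a minimal sketch in Python. It is an illustration, not Khan Academy's actual implementation. The class name `StreakTracker`, the threshold of five, and the `count_hinted` flag are all assumptions introduced here. The flag captures one design choice: whether an answer reached with a hint counts toward the streak.

```python
from dataclasses import dataclass


@dataclass
class StreakTracker:
    """Hypothetical mastery gate: advance after `required` correct answers in a row."""

    required: int = 5           # assumed threshold, not a documented Khan Academy value
    count_hinted: bool = False  # assumed policy: do hinted answers count toward the streak?
    streak: int = 0

    def record(self, correct: bool, used_hint: bool = False) -> bool:
        """Record one attempt and return True once the streak is complete."""
        counts = correct and (self.count_hinted or not used_hint)
        if counts:
            self.streak += 1
        else:
            # A miss (or a hinted answer under the strict policy) resets the streak.
            self.streak = 0
        return self.streak >= self.required


tracker = StreakTracker(required=3)
attempts = [True, True, False, True, True, True]
print([tracker.record(a) for a in attempts])
# prints [False, False, False, False, False, True]
```

With `count_hinted=True`, a student can complete the streak without solving a single problem unaided. With the default, every hint resets the count.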
The most cited study came out of the SAT partnership. College Board researchers reported in 2017 that students who used the free Official SAT Practice on Khan Academy for 20 hours gained an average of 115 points on their scores, and that the pattern of gains was consistent across income levels and racial groups. That finding received a lot of press and deserves some scrutiny: the study was observational, not randomized, and students who choose to practice for 20 hours may differ from students who don’t in ways that also predict score gains. Still, subsequent analyses have been broadly consistent, and the SAT partnership did something politically important by puncturing the assumption that only families who could afford Kaplan could compete on the test.
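The self-selection worry can be illustrated with a toy simulation. This is a sketch under invented assumptions and uses no College Board data. The function name `practice_gap`, the latent "motivation" variable, and every number in it (the 30-point true effect, the 80 and 20 percent practice rates, the noise levels) are made up for illustration. The point is only that when motivated students both practice more and improve more, a simple comparison overstates what the practice itself did.

```python
import random
from statistics import mean


def practice_gap(randomized: bool, n: int = 20000,
                 true_effect: float = 30.0, seed: int = 0) -> float:
    """Average score gain of practicers minus non-practicers in a toy population.

    All parameters are hypothetical. `true_effect` is the gain that practice
    causes by construction.
    """
    rng = random.Random(seed)
    practiced, skipped = [], []
    for _ in range(n):
        motivation = rng.gauss(0, 1)  # unobserved trait, assumed
        if randomized:
            # Coin-flip assignment: practice is independent of motivation.
            practices = rng.random() < 0.5
        else:
            # Self-selection: motivated students are likelier to put in the hours.
            practices = rng.random() < (0.8 if motivation > 0 else 0.2)
        # Motivation raises the gain whether or not the student practices.
        gain = 60 + 25 * motivation + (true_effect if practices else 0.0) + rng.gauss(0, 20)
        (practiced if practices else skipped).append(gain)
    return mean(practiced) - mean(skipped)


print("observational gap:", round(practice_gap(randomized=False), 1))
print("randomized gap:   ", round(practice_gap(randomized=True), 1))
```

In this toy setup the observational gap comes out well above the 30 points that practice actually causes, while the randomized gap lands close to it. The sketch says nothing about how large the bias is in the real SAT data.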
The classroom research is messier. The best-known implementation study, conducted by SRI International in 2011 and 2012 across nine Bay Area districts, found that teachers appreciated the platform, students generally liked it, and measured achievement effects were small and inconsistent. A 2020 study by researchers at Brown and the University of Pennsylvania looking at a statewide rollout found modest positive effects on state test scores. Other district-level evaluations have found essentially no effect. If you zoom out, the honest summary is that Khan Academy tends to produce small benefits when it is used seriously and thoughtfully, and no benefit when it is used as worksheets-with-a-login. This is not scandalous. It is the predictable finding for almost every classroom technology ever studied.
Where the platform runs into trouble is in the gap between its theory of learning and how students actually behave on it. The cognitive load involved in watching a video is genuinely lower than the load involved in wrestling with a textbook, which is part of the appeal, and part of the problem. Passive video consumption feels like studying without being the real thing. The practice problems partly compensate by forcing retrieval, but many students rush through them with hints until the streak completes. Researchers who study the testing effect have watched this pattern and winced. If you hint your way through twelve problems, you have practiced hint-taking, not math.
The mastery-learning promise also collides with the reality of motivation. Bloom’s two-sigma result came from one-on-one human tutors who could read a student’s face and escalate the challenge at exactly the right moment. A software system cannot do this yet. It can present progressively harder items, but it cannot tell when a student is bored, when they are frustrated in the productive way, and when they are frustrated in the way that precedes giving up. Keith Devlin, the Stanford mathematician who was an early and vocal Khan skeptic, argued in a 2012 essay that the videos reinforce a procedural view of mathematics at the expense of conceptual understanding. Fifteen years later, the critique has mellowed but not vanished. The exercises are better at teaching you how to compute than at teaching you what computation is for.
What Khan Academy has done, quietly and well, is become infrastructure. Many American middle schools now assume every student has access to it the way they assume access to a pencil. Homebound kids use it. Parents reteaching their children algebra during the pandemic used it. Adults going back for a GED use it. The recent push into Khanmigo, an AI tutor built on GPT-4 and piloted in districts including Newark and Hobart, Indiana, is the most interesting chapter the organization has opened in years, partly because it attempts to close exactly the motivational and diagnostic gap that the video library could not. Early reports are cautiously optimistic, and it is worth tracking how schools think about AI as a study partner rather than an outsourcing service.
A fairer verdict than either the hagiography or the backlash sounds like this. Khan Academy did not replace teachers, equalize outcomes, or crack the two-sigma problem. It was never going to. What it did was give tens of millions of students a second chance at a topic they felt lost in, and a first chance at topics their schools did not offer. It professionalized the genre of the educational explainer, dragged the SAT prep industry toward something resembling fairness, and demonstrated that a nonprofit could build durable software infrastructure without going public or selling out. For a project that began with a guy drawing fractions in Microsoft Paint for his cousin, that is a lot.
The next fifteen years will test whether the organization can pivot from video-plus-practice toward something more adaptive, more conversational, and more honest about where learning actually breaks. The evidence will, as always, be complicated.