Spaced Repetition, Done Wrong

A first-year medical student writes to an Anki forum in early fall. He is 14,000 cards into the AnKing deck. His daily review queue is 340 cards. He is rating almost everything as Good because the Hard button makes his workload explode and the Again button feels like failure. He sleeps six hours a night. His exam scores are only modest, which surprises him, because on paper he is doing exactly what the upperclassmen recommended. Something is wrong with the system, he thinks. Maybe the algorithm is broken.

The algorithm is not broken. He is.

This is the most common and least discussed failure mode of spaced repetition, and it has very little to do with the software. Anki, Mnemosyne, RemNote, the various SuperMemo descendants — all of them implement an algorithm that is mathematically sound given honest input. The problem is that honest input is cognitively expensive, and students under exam pressure find a thousand small ways to lie to it without realizing they are doing so.

The algorithm itself deserves a quick sketch, because the failure modes only make sense if you understand what it is trying to do. Hermann Ebbinghaus, testing nonsense syllables on himself in 1880s Berlin, mapped the forgetting curve: retention falls steeply in the first hours after encoding and then levels off at a much lower floor. What Ebbinghaus saw, and what a century of replications confirmed, is that if you review the material right before it would have been forgotten, retention is restored and then decays more slowly the second time. Review it again at the right moment, and the curve flattens further. The intervals between successful reviews can lengthen by factors of two, three, ten. A card you have reviewed eight times successfully might not need to be seen again for a year.
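The shape of that argument can be sketched in a few lines of Python. This is a toy model, not Ebbinghaus’s fitted curve: it assumes recall decays exponentially toward zero (ignoring the floor described above), that a card’s "stability" is the number of days until recall falls to 90 percent, and that each successful review multiplies stability by an invented factor of 2.5. The function names and all of the constants are illustrative assumptions.

```python
import math

TARGET_RECALL = 0.9      # assumed: review when predicted recall hits 90%
STABILITY_GROWTH = 2.5   # assumed: stability multiplier per successful review


def recall_probability(days_elapsed: float, stability: float) -> float:
    """Predicted recall after days_elapsed, where stability is the
    number of days it takes recall to fall to 90%."""
    return 0.9 ** (days_elapsed / stability)


def interval_for_target(stability: float, target: float = TARGET_RECALL) -> float:
    """Days to wait before predicted recall decays to the target level."""
    return stability * math.log(target) / math.log(0.9)


# Eight successful reviews, each one stretching the next wait.
stability = 1.0
for review in range(1, 9):
    wait = interval_for_target(stability)
    print(f"review {review}: next review in about {wait:.0f} days")
    stability *= STABILITY_GROWTH
```

Under these assumptions the wait after the eighth review is well over a year. The exact figures mean nothing outside the toy model; the point is only that a fixed multiplier per successful review compounds quickly.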

Piotr Woźniak, a Polish polyglot and programmer, formalized this into the SM-2 algorithm in the late 1980s for his program SuperMemo. Most modern spaced repetition schedulers descend from SM-2 or were built as improvements on it. Anki’s long-standing default scheduler is essentially SM-2 with a small set of user-adjustable parameters. FSRS, the Free Spaced Repetition Scheduler that has been built into Anki as an alternative to that scheduler since 2023, is a statistical model trained on millions of real user reviews that does the same job with better per-user calibration.

The algorithm asks you, at each review, to report how well you remembered the card. Anki’s four buttons — Again, Hard, Good, Easy — map to confidence levels, and the scheduler uses them to adjust the next interval. If the input is honest, the schedule stabilizes into something efficient. If the input is not honest, the schedule drifts, and the drift compounds in nasty ways.
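For the curious, the published SM-2 rules are short enough to sketch in Python. What follows is one reading of Woźniak’s description, not Anki’s actual code: SM-2 grades recall from 0 to 5 rather than with four buttons, and implementations disagree on details such as whether the new interval uses the ease from before or after the update (this sketch uses the updated value). The names `CardState` and `sm2_review` are invented for the illustration.

```python
from dataclasses import dataclass


@dataclass
class CardState:
    repetitions: int = 0     # consecutive successful reviews
    interval_days: int = 0   # current gap between reviews
    ease: float = 2.5        # SM-2 "E-Factor"; starts at 2.5, floor of 1.3


def sm2_review(state: CardState, quality: int) -> CardState:
    """Apply one review. quality runs from 0 (blackout) to 5 (perfect)."""
    if not 0 <= quality <= 5:
        raise ValueError("quality must be between 0 and 5")

    if quality < 3:
        # Failed recall: restart the repetition sequence, leave ease alone.
        return CardState(repetitions=0, interval_days=1, ease=state.ease)

    # Ease drifts down for hesitant answers and up for perfect ones.
    ease = state.ease + (0.1 - (5 - quality) * (0.08 + (5 - quality) * 0.02))
    ease = max(1.3, ease)

    repetitions = state.repetitions + 1
    if repetitions == 1:
        interval = 1
    elif repetitions == 2:
        interval = 6
    else:
        interval = round(state.interval_days * ease)
    return CardState(repetitions, interval, ease)


card = CardState()
for quality in (4, 4, 4):
    card = sm2_review(card, quality)
    print(card.interval_days)  # 1, then 6, then 15
```

Everything the scheduler knows about a card arrives through that single `quality` argument, which is why the honesty of the rating matters so much.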

The first and most common dishonesty is rating cards too high. A student half-remembers a drug’s mechanism and hits Good because Hard would shorten the interval and he already has 340 reviews to clear. The algorithm, trusting him, pushes the card out two weeks. In two weeks he has forgotten it entirely, hits Again, and the card comes back into daily rotation. He has learned nothing, wasted time on both the inflated review and the eventual failure, and damaged the scheduler’s model of his memory. Do this across a deck for a few months and the scheduler’s intervals become nearly meaningless. The student experiences this as “the algorithm is giving me too many reviews,” which is not wrong, but the cause is him.

The second dishonesty is the opposite: hitting Again on cards you actually knew, because the card felt uncomfortable or you were not sure enough of the answer. This sounds harmless. It is not. Every Again pulls the card back to the beginning of the learning queue, wiping out the intervals you had earned. A student who over-uses Again builds review debt the way the over-rater builds forgetting debt. His workload balloons because he is reviewing the same small set of cards over and over at short intervals.
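To put a rough number on what a needless Again costs, here is a small Python sketch using the textbook SM-2 interval sequence (1 day, 6 days, then the previous interval times an ease factor). It assumes a fixed ease of 2.5 and a full restart after the lapse; Anki’s relearning steps and ease penalties differ in the details, and `reviews_to_regain` is a made-up helper, so treat the result as an order-of-magnitude illustration.

```python
def reviews_to_regain(target_days: int, ease: float = 2.5) -> int:
    """Count the successful reviews needed, starting from scratch, before
    the scheduled interval is at least target_days again."""
    interval = 0
    reviews = 0
    while interval < target_days:
        reviews += 1
        if reviews == 1:
            interval = 1
        elif reviews == 2:
            interval = 6
        else:
            interval = round(interval * ease)
    return reviews


# A card that had earned a 180-day interval, then received a needless Again.
print(reviews_to_regain(180))  # prints 6
```

In this model one unnecessary Again on a mature card buys six extra reviews before the card is back where it started.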

A third dishonesty, subtler than the first two, is the drift of what counts as a successful review. The student is supposed to recall the answer before flipping the card. If he flips early, sees the answer, and then rates based on whether it looked familiar, he has just replaced retrieval with recognition. This is the same mechanism that ruins rereading, and it ruins flashcards for the same reason. The card now measures nothing useful. The distinction between recognition and retrieval is load-bearing here. Recognition is cheap, retrieval is expensive, and the cost of retrieval is precisely what makes it stick.

Beyond the rating problems, the card itself can be broken in ways that guarantee failure. The classic bad card is a cloze deletion with the answer partially given away by the context: “The enzyme {{c1::amylase}} breaks down starch in the mouth.” After a few exposures, the student no longer needs to recall amylase; he is pattern-matching on the sentence structure. The card now trains nothing. A better card splits the question from its answer and makes the retrieval mandatory.

Another common card failure is density. Students, trying to save time, pack multiple facts onto a single card. One side of the card asks for five drugs that cause hyperkalemia. The student retrieves four. He hits Again, which correctly recognizes that he did not have complete knowledge, but now the card comes back every other day and every other day he misses one of the five. He is not learning which one he keeps missing; he is just grinding. The card should have been split into five separate cards, each tied to a specific drug. One fact per card is not a style preference. It is what the algorithm was designed around.
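The arithmetic behind that claim is easy to check. As a sketch, assume each fact on a card is recalled independently with the same probability (a simplification, since facts sharing a card are usually correlated) and that the card only passes when every fact comes back. The function name is hypothetical.

```python
def all_or_nothing_pass_rate(per_fact_recall: float, facts_on_card: int) -> float:
    """Chance of passing a card that demands every one of its facts,
    assuming the facts are recalled independently."""
    if not 0.0 <= per_fact_recall <= 1.0:
        raise ValueError("per_fact_recall must be between 0 and 1")
    if facts_on_card < 1:
        raise ValueError("a card needs at least one fact")
    return per_fact_recall ** facts_on_card


for facts in (1, 3, 5):
    rate = all_or_nothing_pass_rate(0.9, facts)
    print(f"{facts} fact(s) per card: pass rate {rate:.2f}")
```

With 90 percent recall on each fact, the five-fact card passes only about 59 percent of the time. The scheduler sees a weak card even though each individual fact is reasonably well learned, and it cannot tell which fact is causing the failures.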

The last pattern is the most insidious: review debt. A student misses three days because of a rough clinical week. He returns to a queue of 900 cards. He has two options, both bad. He can power through the 900 in a single day, in which case his ratings are unreliable because his brain is exhausted by card three hundred, and the scheduler gets poisoned data. Or he can stretch the 900 over a week at his normal daily cap, in which case all of those cards are reviewed at intervals much longer than the algorithm intended, and many slip from memory entirely. The structural problems of shared decks amplify this: the student is rarely reviewing cards he authored, so the emotional cost of throwing them away is low while the workload they generate is high, and review debt has a way of becoming deck abandonment.
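The dig-out arithmetic can be sketched in Python. The 900-card backlog and the 340 daily reviews come from the scenario in this piece; the 470-card daily capacity is an invented figure, the helper name is hypothetical, and the model assumes the number of cards falling due each day stays constant, which a real scheduler would not guarantee.

```python
import math
from typing import Optional


def days_to_clear(backlog: int, daily_due: int, daily_capacity: int) -> Optional[int]:
    """Days needed to clear a backlog when daily_due cards keep arriving
    and the student can honestly review daily_capacity cards per day.
    Returns None when there is no spare capacity, so the backlog never shrinks."""
    if backlog <= 0:
        return 0
    surplus = daily_capacity - daily_due
    if surplus <= 0:
        return None
    return math.ceil(backlog / surplus)


print(days_to_clear(900, 340, 470))  # prints 7
print(days_to_clear(900, 340, 340))  # prints None
```

The denominator is the part worth noticing: the backlog shrinks only by whatever capacity is left after the day’s regular reviews, so anything that lowers the daily inflow shortens the recovery.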

There are disciplined responses to each of these failure modes, and most serious users converge on them within a year. Rate cards honestly, even when it hurts the daily load. Keep cards atomic. Treat new cards as a flow rate to manage rather than a quantity to hit. If you fall behind, do not binge-review; reduce the new-card rate to zero until the backlog clears, accepting that the scheduler will have drifted for a few weeks afterward. Suspend cards that have failed four or five times in a row — they are telling you that something about the card is broken, and no number of additional repetitions will fix it. Rebuild the card from scratch or drop it.
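The suspension rule is mechanical enough to sketch in Python. This assumes a hypothetical review log stored as a list of rating strings, oldest first, and takes the lower bound of "four or five times in a row" as its threshold. Anki has its own leech settings, which this sketch does not try to reproduce.

```python
from typing import Sequence

FAILURE_THRESHOLD = 4  # assumed: lower bound of "four or five times in a row"


def trailing_failures(ratings: Sequence[str]) -> int:
    """Count how many of the most recent reviews in a row were 'again'."""
    count = 0
    for rating in reversed(ratings):
        if rating != "again":
            break
        count += 1
    return count


def should_suspend(ratings: Sequence[str], threshold: int = FAILURE_THRESHOLD) -> bool:
    """Flag a card whose latest failure streak has reached the threshold."""
    return trailing_failures(ratings) >= threshold


history = ["good", "again", "again", "again", "again"]
print(should_suspend(history))  # prints True
```

A card that trips the check is a candidate for rewriting, not for a sixth attempt.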

The larger lesson is that spaced repetition is not, despite its reputation, a passive technology. The software does not learn for you; it allocates the retrieval opportunities you then take or fail to take. The same distinction that separates AI as a study partner from AI as a crutch separates Anki as a retrieval trainer from Anki as a familiarity loop. In both cases the student has to be the one doing the cognitive work. The tool only measures whether he did.

The first-year with 14,000 cards and middling exam scores is usually one conversation away from fixing the whole thing. He needs to stop rating cards above his honest confidence, split a few dozen overstuffed cards into atomic ones, and accept a smaller daily new-card rate for the rest of the semester. His workload drops within a month. His retention climbs within two. The algorithm was doing its job the entire time. It was waiting for him to do his.
