Duolingo’s Methodology, Honestly Assessed

A phone on a small cafe table in Paris showing a language-learning lesson, half-empty espresso cup beside it.

A friend of mine has a 1,247-day Duolingo streak in French. She has done a lesson every day, without fail, since the spring of her pandemic. Last summer we were in a cafe near the Canal Saint-Martin and the waiter asked her a gentle question about whether she wanted her coffee with milk. She looked at me, panicked, and said “sorry, I don’t actually speak French.” The waiter smiled and switched to English. On the metro home she pulled out her phone and did the day’s lesson so the owl would not be disappointed in her.

This is not a story about my friend’s failure. It is a story about the gap between what Duolingo trains and what speaking a language requires, and the gap is structural. To understand why, it helps to know where the product comes from.

Luis von Ahn and Severin Hacker founded Duolingo at Carnegie Mellon in 2011. Von Ahn is the computer scientist who co-invented CAPTCHA and then reCAPTCHA, the clever scheme that turned the effort people spend proving they are not robots into free OCR for Google Books. Duolingo’s original business idea was similar. Users would learn a language by translating real sentences from the web, and the translations would be sold to news outlets and businesses. The translation-as-homework model did not pan out, but the gamified learning product did, spectacularly. By the time Duolingo went public in 2021 it had tens of millions of monthly active users, a talking green owl with its own brand of menace, and a market cap that briefly exceeded three billion dollars.

The design is genuinely clever. Lessons are short enough to do while waiting for an elevator. The scoring system rewards consistency more than intensity, which flatters a certain kind of self-image and produces the famous streak. Spaced-repetition logic, which Duolingo calls “strength” or now “crown leveling” depending on which product iteration you encounter, tries to resurface vocabulary at scientifically useful intervals. The lesson content has improved dramatically from the mechanical early days, with actual character-driven stories, podcasts in Spanish and French that are well-produced, and grammar notes that finally resemble instruction.

And yet. The honest question, asked by applied linguists and by many learners themselves, is whether time spent on Duolingo produces speakers. The answer is mostly no, and the reasons are well understood.

The first reason is the difference between recognition and production. Most Duolingo exercises ask you to recognize the correct translation from a set of options, or to tap tiles into the correct order. Tapping tiles is easier than generating a sentence from scratch. The app’s own typed-answer exercises help, but they still accept input in a forgiving way, and they provide the target meaning in your native language, which removes most of the cognitive challenge. A learner who can recognize “je voudrais un café au lait” on a screen has not necessarily learned to summon it, unprompted, while a waiter stares at her. This is the same problem that haunts badly used flashcard apps for medical vocabulary: if you only ever test in the easy direction, you build a lopsided model of the material.

The second reason is that Duolingo almost never asks you to produce open-ended output. Real conversation requires formulating novel sentences under time pressure, coordinating grammar and vocabulary and pragmatics in real time, and repairing when things go wrong. None of this happens in a lesson. The app cannot evaluate free-form speech at the level needed to give useful feedback, so it does not ask for it in the way a conversation partner would. The Duolingo Max tier added AI role-play features in 2023 and 2024, and these are a genuine improvement, but they are still not the same as talking to a bored tabac owner who does not care about your learning journey.

The third reason is that language acquisition, unlike vocabulary memorization, appears to require large quantities of comprehensible input, not short bursts of comprehension. Stephen Krashen’s input hypothesis, however contested in its strong form, pointed at something real. The learners I know who actually became conversational in a new language did it by watching hundreds of hours of shows in that language, reading graded novels, listening to podcasts during their commutes, and having stumbling conversations with actual humans who were willing to tolerate them. Duolingo can be one piece of that ecosystem, particularly for building a vocabulary floor, but a 1,247-day streak inside the app does not add up to the total hours of input that a language requires.

The company itself is more honest about this than you might expect. Duolingo research scientists, including Burr Settles, have published work showing that lessons within the app produce measurable learning on the kinds of tasks the app tests. That is a much narrower claim than “our users become speakers.” Independent studies have found that committed Duolingo users can achieve rough equivalence to one or two semesters of university language coursework after many months of use, which is meaningful but also consistent with the broader point. A semester of university Spanish does not make you conversational either.

Where the app works best is as a maintenance tool for someone who already has a foundation in a language, as a vocabulary feeder for someone who is getting input elsewhere, and as a low-commitment entry point for people who want to see if they enjoy studying a language at all. It is also, genuinely, a motivator. The streak is manipulative by design, and the streak works. Getting a reluctant learner to do anything language-related for 1,247 consecutive days is not nothing. The related question is whether that momentum can be redirected into more productive practice, and this is where the design starts to feel self-defeating. The app’s incentives pull you into the app. They do not push you toward the harder, messier practice that would make you fluent.

There is a deeper point here about gamification in learning generally. Streaks, points, and leaderboards are effective at driving engagement and famously poor at driving transfer. The principles that researchers describe as desirable difficulties almost always run against the grain of game feel. The things that make learning stick (retrieval from memory, spacing with real forgetting, interleaving of confusable material) all feel bad in the moment. A well-designed game smooths away bad feelings. A well-designed learning experience, sometimes, leans into them. Duolingo splits the difference, and the split shows up in outcomes.

None of this is a condemnation. The app has brought tens of millions of people into contact with second languages, many of whom would otherwise have done nothing. Its existence has changed what people expect from free software in education. And the company’s willingness to publish its own research, including studies that are not flattering, puts it in a better ethical position than most of its competitors.

The kind thing to say to a friend with a 1,247-day streak is that she should keep it going, because the alternative is probably not more serious study but no study at all. The truer thing to say is that if she actually wants to answer the waiter next summer, she needs to add something messier to the routine. A tutor on italki. A French television show with French subtitles. A book she reads slowly with a dictionary beside her. The owl, for all its menace, cannot get her there alone.

Photo via Unsplash.