
AI Transcription in the Lecture Hall: Otter, Whisper, and What Changes

A tiered university lecture hall viewed from the back row with half-full seats, a speaker at the distant lectern, and open laptops glowing on desks.

Walk into an undergraduate lecture at almost any large research university this semester and count the laptops. Most of them are open to notes, but a quiet minority (the fraction is growing) are running Otter.ai in the background, microphone pointed at the lectern, a live transcript scrolling past the slides. The student isn’t typing much. Sometimes they’re not typing at all. The screen is there for reference, in case something flagged as unclear needs a look later. The actual record of the lecture is being made in the cloud, time-stamped and speaker-labeled and keyword-indexed by a service that will, by the end of the semester, contain something like forty hours of audio and, at typical lecture speaking rates, several hundred thousand words of transcribed text per course.

This is a quiet shift, and like most quiet shifts in how people learn, it has happened faster than the research literature can chase it. Whisper, OpenAI’s open-source speech recognition model, was released in September 2022. It was among the first freely available systems that could transcribe classroom audio with reasonable accuracy across accents and domain vocabulary. Otter.ai, which predates Whisper by several years, rebuilt much of its stack on top of the new generation of models and has since integrated with Zoom, Google Meet, and, critically for universities, several major lecture-capture platforms. The practical consequence is that automated, searchable, reasonably accurate transcripts of a lecture are now close to free at the margin and widely available.

The question worth asking is what this changes in how students learn, and the starting point is the study that anyone writing about note-taking has to reckon with. In 2014, Pam Mueller and Daniel Oppenheimer published a paper in Psychological Science titled “The Pen Is Mightier Than the Keyboard.” Across three experiments, students who took notes by hand outperformed students who typed when both were tested on conceptual questions. The authors attributed the gap not to writing speed but to the fact that longhand note-takers were forced to summarize rather than transcribe. Typists, they argued, wrote too fluently; they produced near-verbatim records that required less encoding. The paper has been cited thousands of times. A 2019 replication by Kayla Morehead and colleagues found the effect was smaller and less reliable than the original paper suggested, but the underlying mechanism, that summarization in real time forces cognitive work, has held up.

Automatic transcription pushes the Mueller and Oppenheimer scenario one step further. The typist at least types. The Otter user doesn’t even do that. The cognitive work of encoding the lecture into notes has been delegated entirely to a model, and the student’s relationship to the lecture is now closer to the relationship a podcast listener has to a podcast: ambient, skimmable, returnable. That might be fine, or it might not be. It depends on what the student does next.

Here the evidence gets genuinely interesting. Students who use transcripts as a targeted review tool, searching for a specific concept after class, jumping to the timestamp, listening again, tend to do roughly as well as students who took structured notes, at least in the studies that exist so far. Students who use transcripts as a substitute for attention during class tend to do worse. The transcript becomes a false promise: the material is captured, therefore it’s been learned. It hasn’t. Rereading a whole transcript after class is passive rereading, which the cognitive-science literature has been unkind to for at least twenty years. Our piece on the testing effect covers the landmark studies on passive rereading, and the mechanism is the same here: you can’t build durable memory by simply re-consuming material you’ve already consumed.
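For readers curious what “search for a concept, then jump to the timestamp” looks like in practice, here is a minimal Python sketch. It assumes the transcript has been exported as a list of segments, each with a start time in seconds and a line of text. That shape, the sample lines, and the names Segment, format_timestamp, and find_concept are illustrative assumptions, not the export format of Otter or any other service.

```python
from dataclasses import dataclass


@dataclass
class Segment:
    start: float  # seconds from the beginning of the recording
    text: str     # what was said in this stretch of audio


def format_timestamp(seconds: float) -> str:
    """Render seconds as H:MM:SS so the listener can jump to that point."""
    total = int(seconds)
    hours, remainder = divmod(total, 3600)
    minutes, secs = divmod(remainder, 60)
    return f"{hours}:{minutes:02d}:{secs:02d}"


def find_concept(segments: list[Segment], term: str) -> list[tuple[str, str]]:
    """Return (timestamp, text) for every segment that mentions the term."""
    needle = term.lower()
    return [
        (format_timestamp(seg.start), seg.text)
        for seg in segments
        if needle in seg.text.lower()
    ]


# Made-up sample data standing in for an exported lecture transcript.
lecture = [
    Segment(12.0, "Today we start with cellular respiration."),
    Segment(1845.5, "The Krebs cycle happens in the mitochondrial matrix."),
    Segment(3720.0, "Remember, the Krebs cycle feeds the electron transport chain."),
]

for stamp, text in find_concept(lecture, "krebs cycle"):
    print(stamp, text)
```

The point of the sketch is the shape of the habit: a narrow query that returns two places to listen again, rather than an hour of text to reread.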

The most interesting use of transcripts I’ve seen is by students who combine them with active study tools rather than treating them as the study object. A biology student at a university in the Midwest told me her routine: she attends class without a laptop and takes sparse notes by hand. After class, she pulls the Otter transcript, skims it for the five or six concepts she didn’t fully grasp, and uses those sections, not the whole transcript, to generate Anki cards. The transcript is a gap-finder, not a study artifact. Her approach works because she’s read enough of the cognitive-science literature to know the risk; most students haven’t.
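A routine like hers can be sketched in a few lines of Python. This is a rough illustration under stated assumptions, not her actual workflow: it assumes the student has already picked the gaps by hand and written each question and answer in her own words, and it writes them out as tab-separated text, a format Anki can import. The sample cards and the names build_cards and to_tsv are hypothetical.

```python
import csv
import io


def build_cards(gaps: list[dict]) -> list[tuple[str, str]]:
    """Turn hand-picked gaps into (front, back) pairs, skipping unfinished ones."""
    cards = []
    for gap in gaps:
        question = gap["question"].strip()
        answer = gap["answer"].strip()
        if question and answer:  # a card needs both sides to be useful
            cards.append((question, answer))
    return cards


def to_tsv(cards: list[tuple[str, str]]) -> str:
    """Serialize cards as tab-separated lines, one card per line."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, delimiter="\t", lineterminator="\n")
    writer.writerows(cards)
    return buffer.getvalue()


# Made-up examples: the gaps a student flagged after skimming the transcript.
gaps = [
    {"question": "Where does the Krebs cycle take place?",
     "answer": "In the mitochondrial matrix."},
    {"question": "What does the Krebs cycle feed into?",
     "answer": "The electron transport chain."},
    {"question": "What is chemiosmosis?", "answer": ""},  # not yet understood
]

print(to_tsv(build_cards(gaps)))
```

The design choice worth noticing is what the code does not do: it does not generate the questions or the answers. Writing those is the cognitive work, and the transcript only tells her where to aim it.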

A different trade-off appears for students with disabilities, and it’s the piece of this story that deserves to come first in a different kind of essay. Automatic transcription has been transformative for students with hearing impairments, students with attention-related disabilities, and non-native speakers following a fast-talking professor. Accommodations that used to require a human note-taker, a request to Disability Services, and a week of paperwork now happen automatically, in real time, for free. The accuracy floor of Whisper in 2022 was approximately what a professional transcriptionist produced in 2015. For students who genuinely needed the text, this is an unambiguous improvement.

The more contested question is what happens at the class level when lecture-capture and transcription become default. Faculty at several institutions have reported attendance declines that coincide, plausibly but not provably, with the rollout of comprehensive lecture-capture systems. If the transcript is available tomorrow, the argument goes, why be in the room today? The counter-argument, from educational-technology researchers, is that attendance was already declining and capture is a convenient scapegoat. Both are probably half right. What’s clearer is that the students who attend and use capture as review do best; the students who skip and plan to catch up from the transcript do worst, and usually don’t catch up at all.

There’s also a change in what gets said in lectures. Faculty who know their lectures are being captured and made searchable are, by most accounts I’ve heard, more careful. Fewer tangents, fewer asides, more polished pacing. Whether that’s a loss or a gain depends on your view of the lecture as a form. The memorable asides and half-formed thoughts that sometimes turn out to be the most useful part of a class are the parts that tend to get self-censored when the room is, functionally, always on.

For students deciding how to integrate transcription into their own study, here are a few durable suggestions. Don’t use the transcript as the primary record. Take notes in the room, longhand if possible, sparse and summarized, focused on the points that seem to matter. Use the transcript later to fill specific gaps, not to reread the whole lecture. Convert what you’ve learned into retrieval practice, through cards or self-testing, rather than re-consuming the transcript a second time. The transcript is an archive. The learning still has to happen in your head, which connects to the broader argument in our piece on different note-taking systems and what they actually do for memory.

The models get better every year. The pedagogy has not kept up. That’s not a reason for alarm, but it is a reason to be slightly skeptical of any claim that transcription has made traditional note-taking obsolete. What transcription has done is raise the relative price of attention, because inattention is now cheaper than ever.

Photo via Unsplash.