AI-Powered Closed Captions Could Open Up New Possibilities – and Pitfalls

Closed captions have become a staple of the TV- and movie-watching experience. For some, it’s a way to decipher muddled dialogue. For others, like those who are deaf or hard of hearing, it’s a critical accessibility tool. But captions aren’t perfect, and tech companies and studios are increasingly looking to AI to change that.
Captioning for TV shows and movies is still largely done by real people, who help ensure accuracy and preserve nuance. But there are challenges. Anyone who’s watched a live event with closed captions knows that on-screen text often lags, and errors creep in during the rush of the process. Scripted programming offers more time for accuracy and detail, but the process can still be labor-intensive and, in the eyes of studios, costly.
In September, Warner Bros. Discovery announced it’s teaming up with Google Cloud to develop AI-powered closed captions, “coupled with human oversight for quality assurance.” In a press release, the company said using AI in captioning lowered costs by up to 50% and cut the time it takes to caption a file by up to 80%. Experts say this is a peek into the future.
“Anybody that’s not doing it is just waiting to be displaced,” Joe Devon, a web accessibility advocate and co-founder of Global Accessibility Awareness Day, said of using AI in captioning. The quality of today’s manual captions is “sort of all over the place, and it definitely needs to improve.”
As AI continues to transform our world, it’s also reshaping how companies approach accessibility. Google’s Expressive Captions feature, for instance, uses AI to better convey emotion and tone in videos. Apple added transcriptions for voice messages and memos in iOS 18, which double as ways to make audio content more accessible. Both Google and Apple have real-time captioning tools to help deaf or hard-of-hearing people access audio content on their devices, and Amazon added text-to-speech and captioning features to Alexa.
Warner Bros. Discovery is teaming up with Google Cloud to roll out AI-powered captions. A human oversees the process.
In the entertainment space, Amazon launched a feature in 2023 called Dialogue Boost in Prime Video, which uses AI to identify and enhance speech that might be hard to hear above background music and effects. The company also announced a pilot program in March that uses AI to dub movies and TV shows “that would not have been dubbed otherwise,” it said in a blog post. And in a mark of just how collectively reliant viewers have become on captioning, Netflix in April rolled out a dialogue-only subtitles option for anyone who simply wants to understand what’s being said in conversations, while leaving out sound descriptions.
As AI continues to develop, and as we consume more content on screens both big and small, it’s only a matter of time before more studios, networks and tech companies tap into AI’s potential — hopefully, while remembering why closed captions exist in the first place.
Keeping accessibility at the forefront
The development of closed captioning in the US began as an accessibility measure in the 1970s, ultimately making everything from live television broadcasts to movie blockbusters more equitable for a wider audience. But many viewers who aren’t deaf or hard of hearing also prefer watching movies and TV shows with captions, especially when production dialogue is hard to decipher. (Captions are commonly referred to as subtitles, though that term technically refers to translated dialogue.)
Half of Americans say they usually watch content with subtitles, according to a 2024 survey by language learning site Preply, and 55% of total respondents said it’s become harder to hear dialogue in movies and shows. Those habits aren’t limited to older viewers; a 2023 YouGov survey found that 63% of adults under 30 prefer to watch TV with subtitles on — compared to 30% of people aged 65 and older.
“People, and also content creators, tend to assume captions are only for the deaf or hard of hearing community,” said Ariel Simms, president and CEO of Disability Belongs. But captions can also make it easier for anyone to process and retain information.
By speeding up the captioning process, AI can help make more content accessible, whether it’s a TV show, movie or social media clip, Simms notes. But quality could suffer, especially in the early days.
“We have a name for AI-generated captions in the disability community — we call them ‘craptions,’” Simms laughed.
That’s because automated captions still struggle with things like punctuation, grammar and proper names. The technology might not be able to pick up on different accents, dialects or patterns of speech the way a human would.
Ideally, Simms said, companies that use AI to generate captions will still have a human on board to maintain accuracy and quality. Studios and networks should also work directly with the disability community to ensure accessibility isn’t compromised in the process.
“I’m not sure we can ever take humans entirely out of the process,” Simms said. “I do think the technology will continue to get better and better. But at the end of the day, if we’re not partnering with the disability community, we’re leaving out an incredibly important perspective on all of these accessibility tools.”
Studios like Warner Bros. Discovery and Amazon emphasize the role of humans in ensuring AI-powered captioning and dubbing are accurate.
“You’re going to lose your reputation if you allow AI slop to dominate your content,” Devon said. “That’s where the human is going to be in the loop.”
But given how rapidly the technology is developing, human involvement may not last forever, he predicts.
“Studios and broadcasters will do whatever costs the least, that’s for sure,” Devon said. But, he added, “If technology empowers an assistive technology to do the job better, who is anyone to stand in the way of that?”
The line between detailed and overwhelming
It’s not just TV and movies where AI is supercharging captioning. Social media platforms like TikTok and Instagram have implemented auto-caption features to help make more content accessible.
These native captions often show up as plain text, but sometimes, creators opt for flashier displays in the editing process. One common “karaoke” style involves highlighting each individual word as it’s being spoken, while using different colors for the text. But this more dynamic approach, while eye-catching, can compromise readability. People aren’t able to read at their own pace, and all the colors and motion can be distracting.
“There’s no way to make 100% of the users happy with captions, but only a small percentage benefits from and prefers karaoke style,” said Meryl K. Evans, an accessibility marketing consultant, who is deaf. She says she has to watch videos with dynamic captions multiple times to get the message. “The most accessible captions are boring. They let the video be the star.”
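For a sense of the mechanics, WebVTT, the standard caption format on the web, supports inline timestamps within a cue, which is what players use to highlight words one by one in karaoke style. Below is a minimal Python sketch that emits such a cue; the words and timings are invented for illustration.

```python
# Minimal sketch: emitting a WebVTT cue with inline timestamp tags,
# the mechanism browsers use for karaoke-style word highlighting.
# The words and timings here are made up for illustration.

def format_ts(seconds: float) -> str:
    """Format seconds as a WebVTT timestamp (HH:MM:SS.mmm)."""
    h, rem = divmod(seconds, 3600)
    m, s = divmod(rem, 60)
    return f"{int(h):02d}:{int(m):02d}:{s:06.3f}"

def karaoke_cue(words, start: float, end: float) -> str:
    """Build one cue in which each word carries the moment it is
    spoken, so players can highlight words as they go by."""
    lines = [f"{format_ts(start)} --> {format_ts(end)}"]
    text = ""
    for word, spoken_at in words:
        text += f"<{format_ts(spoken_at)}>{word} "
    lines.append(text.strip())
    return "\n".join(lines)

print("WEBVTT\n")
print(karaoke_cue(
    [("We", 1.0), ("call", 1.3), ("them", 1.6), ("captions", 1.9)],
    start=1.0, end=3.0,
))
```

Plain cues, by contrast, simply omit the inline timestamps, which is the static, read-at-your-own-pace style Evans describes as boring but most accessible.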
But there are ways to maintain simplicity while adding helpful context. Google’s Expressive Captions feature uses AI to emphasize certain sounds and give viewers a better idea of what’s happening on their phones. An excited “HAPPY BIRTHDAY!” might appear in all caps, for instance, or a sports announcer’s enthusiasm may be relayed by adding extra letters onscreen to say, “amaaazing shot!” Expressive Captions also labels sounds like applause, gasping and whistling. All on-screen text appears in black and white, so it’s not distracting.
Expressive Captions puts some words in all-caps to convey excitement.
Accessibility was a primary focus when developing the feature, but Angana Ghosh, Android’s director of product management, said the team was aware that users who aren’t deaf or hard of hearing would benefit from it, too. (Think of all the times you’ve been out in public without headphones but still wanted to follow what was happening in a video.)
“When we develop for accessibility, we are actually building a much better product for everyone,” Ghosh says.
Still, some people might prefer more lively captions. In April, ad agency FCB Chicago debuted an AI-powered platform called Caption with Intention, which uses animation, color and variable typography to convey emotion, tone and pacing. Distinct text colors represent different characters’ lines, and words are highlighted and synchronized to the actor’s speech. Shifting type sizes and weight help to relay how loud someone is speaking, as well as their intonation. The open-source platform is available for studios, production companies and streaming platforms to implement.
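As a rough sketch of the underlying idea, rather than the platform’s actual code, a pipeline along these lines might map speaker identity and measured loudness to caption styling. The character names, decibel thresholds and style values below are assumptions for illustration.

```python
# Hypothetical sketch of the idea behind styled captions like
# Caption with Intention's: map who is speaking and how loudly
# to color, weight and size. Not the platform's actual API;
# names, thresholds and values are invented for illustration.

SPEAKER_COLORS = {"ELLA": "#f5c542", "MARCUS": "#42b0f5"}  # one hue per character

def caption_style(speaker: str, loudness_db: float) -> dict:
    """Return CSS-like styling for one caption line."""
    if loudness_db > -10:      # shouting: heavy, enlarged text
        weight, size = 800, "130%"
    elif loudness_db > -25:    # normal speech
        weight, size = 400, "100%"
    else:                      # whispering: lighter, smaller text
        weight, size = 300, "85%"
    return {
        "color": SPEAKER_COLORS.get(speaker, "#ffffff"),
        "font-weight": weight,
        "font-size": size,
    }

print(caption_style("ELLA", -6))     # a shouted line
print(caption_style("MARCUS", -30))  # a whispered line
```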
FCB partnered with the Chicago Hearing Society to develop and test captioning variations with people who are deaf and hard of hearing. Bruno Mazzotti, executive creative director at FCB Chicago, said his own experience being raised by two deaf parents also helped shape the platform.
“Closed caption was very much a part of my life; it was a deciding factor of what we were going to watch as a family,” Mazzotti said. “Having the privilege of hearing, I always could notice when things didn’t work well,” he noted, like when captions were lagging behind dialogue or when text got jumbled when multiple people were speaking at once. “The key objective was to bring more emotion, pacing, tone and speaker identity to people.”
Caption with Intention is a platform that uses animation, color and different typography to convey tone, emotion and pacing.
Eventually, Mazzotti said, the goal is to offer more customization options so viewers can adjust caption intensity. Still, that more animated approach might be too distracting for some viewers, and could make it harder for them to follow what’s happening onscreen. It ultimately boils down to personal preference.
“That’s not to say that we should categorically reject such approaches,” said Christian Vogler, director of the Technology Access Program at Gallaudet University. “But we need to carefully study them with deaf and hard of hearing viewers to ensure that they are a net benefit.”
No easy fix
Despite its current drawbacks, AI could ultimately help to expand the availability of captioning and offer greater customization, Vogler said.
YouTube’s auto-captions are one example of how, despite a rough start, AI can make more video content accessible, especially as the technology improves over time. There could be a future in which captions are tailored to different reading levels and speeds. Non-speech information could become more descriptive, too, so that instead of generic labels like “SCARY MUSIC,” you’ll get more details that convey the mood.
But the learning curve is steep.
“AI captions still perform worse than the best of human captioners, especially if audio quality is compromised, which is very common in both TV and movies,” Vogler said. Hallucinations could also serve up inaccurate captions that end up isolating deaf and hard-of-hearing viewers. That’s why humans should remain part of the captioning process, he added.
What will likely happen is that jobs will adapt, said Deborah Fels, director of the Inclusive Media and Design Centre at Toronto Metropolitan University. Rather than creating captions from scratch, humans will review and correct what AI churns out, she predicts.
“So now, we have a different kind of job that is needed in captioning,” Fels said. “Humans are much better at finding errors and deciding how to correct them.”
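In practice, that reviewing job might look like an AI pass that drafts captions with per-segment confidence scores, with only the shakiest segments routed to a human. Here is a hypothetical sketch; the data shapes and the 0.9 threshold are assumptions.

```python
# A minimal sketch of the human-in-the-loop workflow described here:
# an AI pass drafts captions with per-segment confidence scores, and
# a reviewer sees only the segments the model was unsure about.
# The data shapes and the 0.9 threshold are assumptions.

from dataclasses import dataclass

@dataclass
class Segment:
    start: float        # seconds
    end: float
    text: str           # AI-drafted caption text
    confidence: float   # model's own score, 0.0 to 1.0

def queue_for_review(segments: list[Segment], threshold: float = 0.9):
    """Return the segments a human captioner should check first."""
    return [s for s in segments if s.confidence < threshold]

draft = [
    Segment(0.0, 2.1, "Welcome back to the show.", 0.97),
    Segment(2.1, 4.0, "Our guest is Joaquin Phoenix.", 0.62),  # proper noun trips the model
    Segment(4.0, 5.5, "[APPLAUSE]", 0.95),
]

for seg in queue_for_review(draft):
    print(f"{seg.start:.1f}-{seg.end:.1f}s  ({seg.confidence:.2f})  {seg.text}")
```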
And while AI for captioning is still a nascent technology that’s limited to a handful of companies, that likely won’t be the case for long.
“They’re all going in that direction,” Fels said. “It’s a matter of time — and not that much time.”