Proof of Concept: Generative Subtitles and Captioning

Shirley Coady 27 Sep 2023 5 mins read
It's no secret: video localization is an expensive process. It's complex: subtitles are difficult to translate, quality assurance is time-consuming, and even a millisecond counts! Read this blog to learn how AI and LLMs can help transform video subtitling processes.

Videos and other media forms are increasingly becoming the best way to reach a wide audience. Who keeps their instruction manuals around anymore? If I need to change the thingamabob on my doohickey, that’s what YouTube is for. Someone isn’t just going to tell me how to do it, they’ll show me!

Of course, I have the privilege of being a native English speaker, and more than half of web content is in English. The world is my oyster!  

But what about non-English speakers? Their thingamabobs also break. Sure, most modern browsers have a built-in capability to translate text into other languages, but that doesn't extend to videos. Back to those instruction manuals!

Why aren’t content creators localizing their content? Surely it would be good for everyone: those who are looking for exposure and to promote their personal and professional brands, and those who would consume far more content if it were accessible to them. Well, there’s a simple answer… video localization is expensive.

So, why is it so expensive?

  • It’s very complex to do. So much so that there are specialty applications, such as our partner CaptionHub, built to let professionals do this work. 
  • Subtitles are difficult to translate. There are constraints around the length of the text, the text has to fit to the video, sentences are cut off because of pauses or scene changes… the list goes on. There are good reasons why this is typically handled by professionals. 
  • Automatic translation doesn’t cope well with segment fragments, but subtitles have to match what you see on screen in a video, necessitating fragments. 
  • Subtitling alone doesn’t make your video accessible to those who are hearing-impaired. You also need captioning, which provides an accessible way for viewers who can’t hear audio to watch your video. 
  • Quality Assurance is time-consuming. You need to view, listen, and read again and again… and again… through the whole video in all languages. 
  • A millisecond counts. Viewers will notice if the subtitle is offset from the actual speech. 
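
To make the length and timing constraints above a little more concrete, here is a minimal Python sketch that reads subtitles in the common SRT layout and flags cues that are too long for the screen or too fast to read. The names (`Cue`, `parse_srt`, `check_cue`) are made up for illustration, and the limits of 42 characters per line and 17 characters per second are assumed defaults rather than figures from this post, so treat them as knobs to tune.

```python
import re
from dataclasses import dataclass

# SRT timestamps look like 00:01:02,345 (hours:minutes:seconds,milliseconds).
TIMESTAMP = re.compile(r"(\d{2}):(\d{2}):(\d{2}),(\d{3})")


@dataclass
class Cue:
    index: int
    start_ms: int
    end_ms: int
    text: str


def to_ms(stamp: str) -> int:
    """Convert an SRT timestamp to integer milliseconds."""
    match = TIMESTAMP.fullmatch(stamp.strip())
    if match is None:
        raise ValueError(f"not an SRT timestamp: {stamp!r}")
    hours, minutes, seconds, millis = (int(group) for group in match.groups())
    return ((hours * 60 + minutes) * 60 + seconds) * 1000 + millis


def parse_srt(srt: str) -> list[Cue]:
    """Parse cues of the form: number, timing line, one or more text lines."""
    cues = []
    for block in re.split(r"\n\s*\n", srt.strip()):
        lines = block.splitlines()
        if len(lines) < 3:
            continue  # skip malformed blocks in this sketch
        start, end = lines[1].split("-->")
        cues.append(Cue(int(lines[0]), to_ms(start), to_ms(end), "\n".join(lines[2:])))
    return cues


def check_cue(cue: Cue, max_chars_per_line: int = 42,
              max_chars_per_second: float = 17.0) -> list[str]:
    """Return human-readable problems; the default limits are assumptions."""
    problems = []
    for line in cue.text.splitlines():
        if len(line) > max_chars_per_line:
            problems.append(f"cue {cue.index}: line has {len(line)} characters")
    duration_s = (cue.end_ms - cue.start_ms) / 1000
    if duration_s <= 0:
        problems.append(f"cue {cue.index}: end is not after start")
    elif len(cue.text.replace("\n", "")) / duration_s > max_chars_per_second:
        problems.append(f"cue {cue.index}: text is too fast to read")
    return problems
```

A check like this won't replace a human reviewer, but it can point that reviewer at the cues most likely to need attention, in every language.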

Everywhere you turn, there’s hype about AI. It can write your thesis! It can drive your car! It can wash your dishes – no, wait, I still need to do that. Well, can it at least make my videos accessible to anyone? 

The answer is, YES! 

Video subtitling

We can take a video and use AI to subtitle it. There are generally available Large Language Models (LLMs) that do a good job of this. Be mindful though – these are public, and you should be careful about what content you send to them. In our use case? It’s great news if anyone, anywhere knows how to change the thingamabob on their doohickey. This isn’t confidential information. I want everyone to know who I am and how to do this. 

Are these LLMs and subtitles perfect? Likely not. Many LLMs struggle with brand names and other proper nouns. Does it matter? Well, that depends on the situation. I certainly wouldn’t want my highly polished product marketing videos, or something that could be used by HR or Legal, to have flaws. I’ll continue to hire professionals for that. Working with my thingamabob? Automated subtitling may exceed your expectations. 

Excellent, I have my subtitles. Now what? I don’t want to use any of the freely available machine translation services: not only will they mess up my subtitle format, they’re also not well suited to segment fragments. That dramatic pause in your original video? It sure looks and sounds good, but now I have two half-sentences to translate instead of a full sentence. 

How will AI do it better? First, you want to ensure your LLM knows what you’re talking about. The glasses I use to read aren’t the same as the glasses I’m drinking out of. You can’t coach your public machine translation on the context, but you can tell your LLM what you’re talking about. So, we can take that subtitle file, and summarize the content. 
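
As a rough sketch of that summarizing step, the Python below strips the cue numbers and timing lines out of an SRT-style subtitle file and wraps what's left in a summary request. The function names and the prompt wording are illustrative assumptions, not the prompts used in our proof of concept, and the resulting string would still need to be sent to whichever LLM you use.

```python
import re

# A timing line such as "00:00:01,000 --> 00:00:02,500".
TIMING_LINE = re.compile(
    r"\d{2}:\d{2}:\d{2},\d{3}\s*-->\s*\d{2}:\d{2}:\d{2},\d{3}.*"
)


def srt_to_transcript(srt: str) -> str:
    """Keep only the spoken text: whatever follows each cue's timing line."""
    spoken = []
    for block in re.split(r"\n\s*\n", srt.strip()):
        lines = [line.strip() for line in block.splitlines()]
        for position, line in enumerate(lines):
            if TIMING_LINE.fullmatch(line):
                spoken.extend(text for text in lines[position + 1:] if text)
                break
    return " ".join(spoken)


def build_summary_prompt(transcript: str, max_sentences: int = 3) -> str:
    """Build a summary request; the wording is an assumption to adapt."""
    return (
        f"Summarize the following video transcript in at most "
        f"{max_sentences} sentences. Name the subject matter and any key "
        "terms so that a translator can resolve ambiguous words.\n\n"
        f"Transcript:\n{transcript}"
    )
```

Joining the fragments before summarizing means the model sees whole sentences, so it has a fair chance of working out which kind of glasses I mean.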

But AI hallucinates! We all know that. I really don’t want my thingamabob to have magically grown out of a whirligig. No problem. Let’s take that summary, and tell the LLM to use that context, and only that context, to translate my subtitles. 
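
Here is one way that instruction could look in Python, under some loud assumptions: `llm` is a hypothetical stand-in for whatever function sends a prompt to your model and returns its reply, and the prompt wording and numbered-line format are my own illustration rather than a documented recipe. Numbering the fragments lets us check that every half-sentence came back, one for one, before the translations go anywhere near the timestamps.

```python
from typing import Callable


def build_translation_prompt(summary: str, fragments: list[str],
                             target_language: str) -> str:
    """Ask for a line-by-line translation grounded in the summary."""
    numbered = "\n".join(
        f"{number}: {text}" for number, text in enumerate(fragments, start=1)
    )
    return (
        f"Translate the numbered subtitle fragments into {target_language}.\n"
        "Use the context below, and only that context, to resolve ambiguous "
        "words.\nFragments may be partial sentences. Return one line per "
        "input line with the same number, and do not merge or split lines.\n\n"
        f"Context:\n{summary}\n\nFragments:\n{numbered}"
    )


def parse_numbered_reply(reply: str, expected: int) -> list[str]:
    """Read 'N: text' lines back and fail loudly if any fragment is missing."""
    translated = {}
    for line in reply.splitlines():
        number, separator, text = line.partition(":")
        if separator and number.strip().isdigit():
            translated[int(number)] = text.strip()
    missing = [n for n in range(1, expected + 1) if n not in translated]
    if missing:
        raise ValueError(f"reply is missing fragments: {missing}")
    return [translated[n] for n in range(1, expected + 1)]


def translate_fragments(summary: str, fragments: list[str],
                        target_language: str,
                        llm: Callable[[str], str]) -> list[str]:
    """`llm` is a hypothetical callable: prompt text in, reply text out."""
    prompt = build_translation_prompt(summary, fragments, target_language)
    return parse_numbered_reply(llm(prompt), len(fragments))
```

The validation step is deliberately strict. If the model drops or merges a line, it's better to retry or hand the cue to a person than to show a translation at the wrong moment.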

Let’s not forget accessibility. We want everyone to be able to change the thingamabob on their doohickeys. That click it makes when it’s properly tightened? Let’s be sure someone who is hearing-impaired can also use the video. LLMs can be used to extract noises with timestamps. 
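
Once you have those noises and their timestamps, they still need to land in the caption track. The Python sketch below assumes the sound events have already been extracted as start time, end time and label (the extraction itself is not shown), and the `Cue` class and function names are hypothetical. It brackets each noise, slots it between the spoken cues by start time, and writes the result in the SRT layout.

```python
from dataclasses import dataclass


@dataclass
class Cue:
    start_ms: int
    end_ms: int
    text: str


def format_timestamp(ms: int) -> str:
    """Render milliseconds as an SRT timestamp, e.g. 00:00:03,100."""
    hours, rest = divmod(ms, 3_600_000)
    minutes, rest = divmod(rest, 60_000)
    seconds, millis = divmod(rest, 1000)
    return f"{hours:02d}:{minutes:02d}:{seconds:02d},{millis:03d}"


def merge_sound_events(speech: list[Cue], sounds: list[Cue]) -> list[Cue]:
    """Bracket each sound label and interleave it with speech by start time."""
    bracketed = [Cue(s.start_ms, s.end_ms, f"[{s.text}]") for s in sounds]
    return sorted(speech + bracketed, key=lambda cue: (cue.start_ms, cue.end_ms))


def to_srt(cues: list[Cue]) -> str:
    """Number the cues from 1 and write them in SRT layout."""
    blocks = [
        f"{number}\n{format_timestamp(cue.start_ms)} --> "
        f"{format_timestamp(cue.end_ms)}\n{cue.text}"
        for number, cue in enumerate(cues, start=1)
    ]
    return "\n\n".join(blocks) + "\n"
```

This sketch doesn't handle a noise that overlaps speech; in that case you would probably want to fold the bracketed label into the spoken cue rather than emit two cues at once.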

I know it might sound too good to be trusted, but if you’d like to see how we proved this can work in a real-world scenario, I invite you to watch any of our recorded sessions from this year’s ELEVATE and choose from any of the language options available in the captions setting:  

Trados for Corporations

Queen said it best, didn’t they? I want it all, and I want it now! There is a way to get all of this, integrated and available, without the need for content creators to cobble it all together themselves. That’s where we come in. We can guide you on the best approach to take advantage of these amazing new possibilities. 

Create... and translate everything, even videos!

Shirley Coady
Author

Director of Product Management

Shirley Coady has over 20 years of experience in language technology. Starting her career as a software developer, she moved to a start-up company, where she took on additional roles including technical support and professional services.


As the company grew, building products remained her passion and she moved into the Product Management role. Through various mergers and acquisitions, she has managed both greenfield and existing products in small, medium and large companies. 


In 2022, she took over the leadership of the Language Technology Product Management team at RWS, where, alongside a very talented team, she’s responsible for the portfolio of traditional as well as cutting-edge technologies for the largest technology company in the industry. 