AI Transcription, Translation And Dubbing: How To Think About Accuracy
We get a number of questions around AI accuracy, and depending on the context, the question can refer to a number of things.
I wanted to put down a note describing the various things accuracy in Artificial Intelligence can refer to, and where it is going from my point of view.
- Is the AI accurate?
- Is AI accurate when it comes to Transcription?
- Is AI accurate when it comes to Translation?
- If I use AI for Text-To-Speech Dubbing, will it sound natural?
Let's look at these ideas one at a time.
Is The Artificial Intelligence Accurate? When Is Artificial Intelligence Not Accurate?
Answer: It depends.
Let's talk about where Artificial Intelligence / Machine Learning came from. Both of these technologies were originally known as Big Compute and/or Big Data. While the field is no longer referred to this way, it's super useful to think about it in these terms.
Using vast amounts of computing power on large databases set the stage for the advent of AI. The way to think about AI is pattern discovery at scale (@petertboyer). The pattern discovery and/or matching is essentially statistics, and this is what data scientists spend their time doing.
Failure in AI, that is, when the AI gets stuff wrong, can be due to a number of issues. The simplest kind of error to diagnose is probably the over-fitting problem, where the statistical model behind the AI is tuned so tightly to its training data that there is no room for small deviations.
The more general mode of AI failure is that the AI does not have a large enough data set to be trained properly. We call this the more general failure mode because, if the event we are trying to pick up has not happened before, there is little chance the AI will pick it up.
There are several other failure modes for AI, but here we will focus on the more general failure mode described above.
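To make the two failure modes above concrete, here is a toy sketch. Everything in it is an assumption for illustration: the numbers are made up, and `memoriser` and `fit_line` are hypothetical stand-ins, not how any real transcription or translation model works. The over-fitted model simply replays the training data, while the simpler model learns the overall pattern.

```python
# Toy illustration only: the numbers are made up, roughly y = 2x plus noise.
train = [(1, 2.1), (2, 3.9), (3, 6.2), (4, 7.8)]
unseen = [(5, 10.1), (6, 11.9)]  # inputs neither model saw during training


def memoriser(x):
    """Over-fitted model: replay the y value of the closest training x."""
    nearest_x, nearest_y = min(train, key=lambda point: abs(point[0] - x))
    return nearest_y


def fit_line(points):
    """Simple model: least-squares line through the origin, y = slope * x."""
    slope = sum(x * y for x, y in points) / sum(x * x for x, _ in points)
    return lambda x: slope * x


def mean_abs_error(model, points):
    """Average absolute gap between predictions and the true values."""
    return sum(abs(model(x) - y) for x, y in points) / len(points)


line = fit_line(train)

for name, model in [("memoriser", memoriser), ("line", line)]:
    print(name,
          "train error:", round(mean_abs_error(model, train), 3),
          "unseen error:", round(mean_abs_error(model, unseen), 3))
```

The memoriser is perfect on the data it was trained on and much worse on inputs it has not seen, which is the "has not happened before" problem in miniature.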
Is AI Accurate When It Comes To Transcription?
Answer: It depends. Pro-tip: pretty much all the answers here are going to be some variant of "it depends" :)
Until fairly recently, dialects were less studied by linguists. This is because the tools to truly understand dialects did not exist.
This has largely changed thanks to Google and the other big technology players. Now, due to the low cost of data storage, we are able to store and analyse large amounts of audio/video content.
Within the Video Translator application, we have a one-to-many language-to-dialect mapping: each language maps to one or more dialects.
In the case of Spanish, for example, we have a number of dialects. Using the correct dialect is vital for AI accuracy. Similarly, we have a large number of English dialects too. Our latest is English (Singapore).
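A one-to-many language-to-dialect mapping can be sketched as a simple lookup table. This is a minimal sketch under stated assumptions: the `DIALECTS` table, the `pick_dialect` helper and the BCP 47 style tags are hypothetical and are not the actual list or API of Video Translator.

```python
# Hypothetical one-to-many mapping from a language to its dialect tags.
# Tags follow the BCP 47 style (language-REGION); the entries are
# illustrative, not the real list supported by Video Translator.
DIALECTS = {
    "Spanish": ["es-ES", "es-MX", "es-US"],
    "English": ["en-US", "en-GB", "en-AU", "en-IN", "en-SG"],
}


def pick_dialect(language, region):
    """Return the dialect tag matching a language and region code.

    Falls back to the first listed dialect when the region is unknown,
    so the caller always gets a usable tag.
    """
    options = DIALECTS[language]
    for tag in options:
        if tag.split("-")[1] == region.upper():
            return tag
    return options[0]


print(pick_dialect("English", "SG"))   # speaker from Singapore
print(pick_dialect("Spanish", "mx"))   # region codes are case-insensitive
print(pick_dialect("Spanish", "AR"))   # unknown region, falls back
```

The fallback is a design choice for the sketch. In practice you may prefer to stop and ask a human to pick the dialect, since the wrong dialect hurts accuracy.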
Use the correct AI for the correct speaker. What if your content has multiple speakers? Right now, read this to understand how to compensate. It is absolutely a problem we are working on, and we think some very exciting things are possible in the future.
Is AI Accurate When It Comes To Translation?
Answer: It can be very accurate. Let's be clear: AI used for translation on its own is normally a bit meh. Here is how we recommend you play it.
Assuming you did an AI transcription first (that is, you used an AI to do a Speech-To-Text conversion), you need to clean up the grammar. This process is known as
Humans are smart, so in a conversation we know when a comma or full stop would be appropriate. In addition, a large amount (possibly even a majority) of communication is not related to words. It can come from tone, hand gestures or even facial expressions.
None of this communication is visible to an AI. Hence, fix up the grammar and check your transcription.
Remember: Use the AI for the heavy lifting, and use your own human translators for the high-value task of cleaning up the AI's work. Practically, this means thinking deeply about the intended audience and what concepts they are familiar with, and filling any gaps.
If I Use The AI For Text-To-Speech Dubbing, Will It Sound Natural?
Answer: Yes - but this depends on what you are trying to achieve. Two examples might be useful here:
For our healthcare clients, the video is often aimed at stakeholders who may have low literacy or limited exposure. Practically, this means we may take some AI-generated dubbed speech and slow it down to as little as 85% of the original speed. Note, this is specifically because we are trying to reach people who may not be literate.
For elderly clients, we sometimes change the dialect. For example, an elderly man of Indian descent may speak English perfectly, but have trouble understanding Australian English (or vice versa). In this case we might slow the AI voice down by 5%, but it is the change of dialect that works the magic. For a visual guide read this.
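The arithmetic behind the two examples above can be sketched as follows. This assumes "85% of the original" means playing the speech at 0.85 times its normal speed, and "5%" means 0.95 times. The helper names are hypothetical. The SSML `prosody` element and its `rate` attribute come from the W3C SSML standard, but how a given text-to-speech engine interprets a percentage varies, so check your engine before relying on it.

```python
from xml.sax.saxutils import escape


def slowed_duration(original_seconds, rate):
    """Length of a clip after playing it at `rate` times normal speed."""
    if not 0 < rate <= 1:
        raise ValueError("rate must be in (0, 1] when slowing speech down")
    return original_seconds / rate


def ssml_with_rate(text, rate):
    """Wrap text in an SSML prosody tag that requests the given rate.

    Assumption: the engine reads rate="85%" as 85% of normal speed.
    """
    percent = round(rate * 100)
    return f'<speak><prosody rate="{percent}%">{escape(text)}</prosody></speak>'


healthcare_rate = 0.85    # assumed meaning of "85% of the original"
elderly_rate = 1 - 0.05   # assumed meaning of "slow down by 5%"

# A one-minute clip gets longer as the speech is slowed down.
print(round(slowed_duration(60, healthcare_rate), 1), "seconds")
print(round(slowed_duration(60, elderly_rate), 1), "seconds")
print(ssml_with_rate("Take one tablet & rest.", healthcare_rate))
```

Slower speech means a longer audio track, so the dubbed audio may no longer line up with the original video timing. That is worth checking before publishing.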
Remember: Think deeply about your target audience. There is no such thing as translation quality, there is only one question: did my target audience understand? It is perfectly valid to lose information content if context is preserved. Indeed, people are very good at working out what someone else is saying, as long as both are on the same page.
In this blog post we looked at some of the different ways in which AI accuracy can be thought of. While we do not have all the answers, we hope the above is useful for you.
The way to think of Video Translator: strategy and discretion across your translation life cycle. Use our app if you think it makes sense! Send us some feedback if you do.