What A Hindi Speaking AI Sounds Like
A few posts ago we transcribed an Indian English video using a Speech-To-Text Artificial Intelligence (AI) built for Indian English. Today we are going to take the same idea forward and translate the video into Hindi.
To recap, we used a sample clip from Rajya Sabha TV which was (a) transcribed using a speech-to-text AI, and (b) cleaned up by a human. We will now clean it up a little bit more, and then translate the content into Hindi. Finally, we will use a Text-To-Speech AI to speak out the Hindi content.
It's going to be awesome. We hope this post gives you some ideas about how to use our technology. Let’s begin!
What Are We Going To Be Working With? Our Source Video Is Already Transcribed
We will use the video we produced at the end of the earlier transcription post. It is posted below.
First though, a bit of theory! Just like school!
The Look-Ahead And Forward Algorithms - The Theory
One of the challenges in working with AI is understanding how the Look-Ahead and Forward algorithms are used. The majority of translation AIs use the Forward algorithm to translate text content.
From our trusty source of all knowledge (Wikipedia), Look-Ahead (backtracking) is
'In backtracking algorithms, look ahead is the generic term for a sub-procedure that attempts to foresee the effects of choosing a branching variable to evaluate one of its values. The two main aims of look-ahead are to choose a variable to evaluate next and the order of values to assign to it.'
Also, and slightly more technically, a Forward Algorithm is
'The forward algorithm, in the context of a hidden Markov model (HMM), is used to calculate a 'belief state': the probability of a state at a certain time, given the history of evidence. The process is also known as filtering. The forward algorithm is closely related to, but distinct from, the Viterbi algorithm.'
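To make the quoted definition a little more concrete, here is a minimal sketch of the forward algorithm used for filtering. It is a generic textbook illustration, not the code inside any particular translation AI. The toy two-state model and all of the numbers in it are assumptions made up for the example.

```python
def forward_filter(initial, transition, emission, observations):
    """Return the belief state P(state at t | evidence up to t) for each t.

    initial[i]       : prior probability of state i at the first step
    transition[i][j] : probability of moving from state i to state j
    emission[j][o]   : probability of seeing observation o in state j
    """
    n_states = len(initial)
    beliefs = []
    prev = None
    for t, obs in enumerate(observations):
        if t == 0:
            # First step: start from the prior.
            predicted = list(initial)
        else:
            # Predict: push the previous belief through the transition model.
            predicted = [
                sum(prev[i] * transition[i][j] for i in range(n_states))
                for j in range(n_states)
            ]
        # Update: weight each state by how well it explains the observation.
        unnormalised = [predicted[j] * emission[j][obs] for j in range(n_states)]
        total = sum(unnormalised)  # assumed non-zero for this sketch
        prev = [p / total for p in unnormalised]
        beliefs.append(prev)
    return beliefs


# Toy model (made-up numbers): two hidden states, two possible observations.
initial = [0.5, 0.5]
transition = [[0.7, 0.3], [0.3, 0.7]]
emission = [[0.9, 0.1], [0.2, 0.8]]
beliefs = forward_filter(initial, transition, emission, [0, 0, 1])
```

Each entry in `beliefs` is a probability distribution over the hidden states, updated one observation at a time. That is the 'history of evidence' idea in the quote.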
The Look-Ahead And Forward Algorithms - What Do I Need To Know?
Okay - what does the above mean? Basically,
the translation AI looks for a full stop. Once it finds the full stop, it translates the sentence.
No really - what? You need to add punctuation, primarily full stops and commas, so that the text-to-text translation AI works properly.
To do this,
make sure you check your transcript for grammatical correctness. This is shown below. Practically this means:
Add lots of full stops everywhere! Humans are very good at understanding each other, so we don’t need to be told where one sentence ends and where another sentence starts. AIs - not so much.
If you watch the video embedded below, the sentences have all been converted into shorter sentences which are in third person. An AI can translate the shorter sentences way better, but take care not to lose meaning.
It is expected that a human subject matter expert translator will do post-editing after the AI has done its first pass. However, simplifying means that you can translate into many languages quickly, and the post-editing can be reduced.
Use your judgement!
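To see why the full stops matter, here is a minimal sketch of sentence-by-sentence translation. It is an illustration only and not how our translation AI is implemented. The `translate` argument is a hypothetical stand-in for whatever translation function you have available.

```python
import re


def split_sentences(text):
    """Split text after a full stop, question mark or exclamation mark."""
    parts = re.split(r"(?<=[.?!])\s+", text.strip())
    return [p for p in parts if p]


def translate_by_sentence(text, translate):
    """Translate one sentence at a time.

    `translate` is a hypothetical callable (str -> str) supplied by the caller.
    """
    return [translate(sentence) for sentence in split_sentences(text)]


punctuated = "The minister spoke. The house listened."
unpunctuated = "the minister spoke the house listened"

print(split_sentences(punctuated))    # two short sentences
print(split_sentences(unpunctuated))  # one long, harder-to-translate chunk
```

Without punctuation the whole transcript arrives as one chunk, which is much harder to translate well. Note that a simple split like this will also break on abbreviations such as "Dr.", so treat it as a sketch rather than a finished tool.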
The grammar was changed - shown below. If you watch through the video below, you will notice the open captions do not match what is being said.
Note that this step is being taken primarily in preparation for the translation step.
After the process was complete we
added an Auto-Overlay so you can see the full outcome. Note that the words are a little bit different: the tense has been changed to past tense throughout, and the sentences are shorter.
Very cool. Now we can begin the translation process.
Translating The Content To Hindi
So the first thing we need to do is translate the content. Click
Action -> Translate and translate this video into Hindi.
It’ll take a few seconds and then
you will see the translated file in your Root pane. It is highlighted in yellow in the image below.
Open it up, and in the Captions tab you will be able to see the translation. In the below image,
note that the colour of the text has been set to black so it is easier to see for your post-editing.
When you do your own video, please ensure you do post-editing! Remember, use the AI for the heavy lifting, but use human subject matter experts for the best results.
Below, we use the Auto-Overlay so you can see the output. We are using black text with a yellow highlight at a transparency of 80%.
The little spinner indicates that the app is applying the Auto-Overlay to the video asset. This is what creates the open captions in the video.
This is a direct AI translation so
some errors are expected. Also, note that the timings are a little bit off. What is happening in the background is our code is trying to do a best fit of the sentences, but
manual edits are often required. Please do this as part of the post-editing process.
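To give a feel for why the timings drift, here is one simple way a "best fit" could work. This is an assumed approach for illustration and not a description of our actual code. It shares the time of an original caption block between the translated sentences in proportion to their length in characters.

```python
def fit_sentences(sentences, start, end):
    """Spread sentences across the window [start, end] (in seconds).

    Each sentence gets a share of the window proportional to its character
    count. Returns a list of (cue_start, cue_end, sentence) tuples.
    """
    total_chars = sum(len(s) for s in sentences)
    if total_chars == 0:
        return []
    cues = []
    cursor = start
    for sentence in sentences:
        duration = (end - start) * len(sentence) / total_chars
        cues.append((round(cursor, 3), round(cursor + duration, 3), sentence))
        cursor += duration
    return cues


# Two translated sentences sharing a 6-second caption block.
for cue in fit_sentences(["aaaa", "bb"], 0.0, 6.0):
    print(cue)
```

A translated sentence is rarely the same length as the original, and character count is only a rough guide to how long something takes to say. That is why a human pass over the timings is still needed.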
Well done! Your video is now translated. But you thought it was going to speak Hindi?
AI Dubbing In Hindi - Now The Magic Starts!
This is the simple version -
we can get much more sophisticated in our outcome, including voices that sound like young or older people, men or women, and changing the volume and speed of the speech.
It is also possible to
break the conversation up into different blocks of speech, properly matching the English content spoken.
Copy the captions from the Video component into the Audio component.
We will use a Text-To-Speech AI to 'speak-out' the content in Hindi. Use the
Action -> Transcribe to access the Text-To-Speech AI as shown below. This will give us the AI dubbing.
Next Steps - How To Implement Your Own AI Dubbing
There are many options you have once your AI dubbing is complete. Several clients simply
prefer to download the asset and work on it in their preferred audio/video editor. However, much of the same functionality is also present in the app.
Mute the original audio soundtrack using the ‘minus’ button as shown below.
Generally it is recommended you
slow down the AI dubbing. Because it is an AI, there is limited scope for tone. People expect changes in tone when talking to each other. An AI is unable to do this, so when there is less tone it helps to slow down the speech to about 90%.
This helps people to understand the words.
When you insert the AI speech into the video as an Audio-Overlay,
use the 'plus' button to add the AI dubbing *.mp3 file.
Above, the highlights show (a) the
plus button at the bottom to add additional audio overlays, (b) the
original soundtrack muted and new soundtrack added, (c) the
new soundtrack has its speed reduced to 80%, and (d)
start and end time can be managed on a per conversation block basis.
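Slowing the dubbing down makes it longer, so it is worth checking that it still fits the part of the video it belongs to. Here is a small sketch of that arithmetic. It assumes the speed percentage behaves like a playback rate, so that 90% speed means the clip takes 1 / 0.9 times as long. The function names are made up for this example.

```python
def dubbed_duration(original_seconds, speed):
    """Length of an audio clip after changing its playback speed.

    speed is a fraction: 0.9 means 90% speed (assumed to be a playback rate).
    """
    if speed <= 0:
        raise ValueError("speed must be positive")
    return original_seconds / speed


def fits_slot(original_seconds, speed, slot_seconds):
    """True if the slowed-down clip still fits in the available time."""
    return dubbed_duration(original_seconds, speed) <= slot_seconds


# A 90-second dub slowed to 90% and to 80% speed.
print(dubbed_duration(90, 0.9))
print(dubbed_duration(90, 0.8))
print(fits_slot(90, 0.8, 100))
```

If the slowed-down dub no longer fits, you can shorten the translated text or split it into smaller conversation blocks with their own start and end times.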
The final outcome is below. This has several errors, and needs more work.
To hear the error, skip forward to 1:57, where the sound stops. This is because we used just one mp3 Voice Overlay, for simplicity, to explain the workflow.
Congratulations! We hope you found this post useful. This is quite a simplified workflow - let us know if you have any questions.
How To Think About This
Generally, you would
break up the transcript into multiple conversation blocks. This helps space out the AI speaking nicely.
Select the appropriate gender of the AI if there is a conversation happening between a man and a woman.
If two men or two women are speaking in a conversation,
use multiple AI voices. This helps your viewers understand what is happening.
Remember, the key win is SEO. When
deploying the asset to a social media channel like LinkedIn or YouTube, make sure to translate the relevant metadata. This helps search engines index your content, making it reachable in other languages.
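The advice above about conversation blocks and voices can be sketched as a small planning step. This is an illustration only. The block fields and the voice names (`male_voice_1` and so on) are hypothetical and do not refer to real voices in the app.

```python
def assign_voices(blocks, voice_pool):
    """Give every speaker one consistent voice that matches their gender.

    blocks     : list of dicts with "speaker", "gender", "start" and "text"
    voice_pool : dict mapping a gender to a list of available voice names
    Speakers of the same gender get different voices while the pool lasts.
    """
    assigned = {}
    used = {gender: 0 for gender in voice_pool}
    result = []
    for block in blocks:
        speaker = block["speaker"]
        if speaker not in assigned:
            gender = block["gender"]
            voices = voice_pool[gender]
            # Cycle through the pool if there are more speakers than voices.
            assigned[speaker] = voices[used[gender] % len(voices)]
            used[gender] += 1
        result.append({**block, "voice": assigned[speaker]})
    return result


# Hypothetical conversation between two men, split into three blocks.
blocks = [
    {"speaker": "A", "gender": "male", "start": 0.0, "text": "First block."},
    {"speaker": "B", "gender": "male", "start": 8.5, "text": "Second block."},
    {"speaker": "A", "gender": "male", "start": 15.0, "text": "Third block."},
]
pool = {"male": ["male_voice_1", "male_voice_2"], "female": ["female_voice_1"]}
plan = assign_voices(blocks, pool)
```

Each block keeps its own start time, so every piece of speech can be placed against the matching part of the original video.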
We are very grateful for your support!