Text-To-Speech: AI Dubbing And What You Can Do With It
In this guide, we are going to look at AI dubbing and its various use cases. If you are wondering whether your content is suitable for AI dubbing or AI localisation, this guide answers the following questions in simple terms:
- What is dubbing or dub localisation?
- Should I use a human, or a machine for my dubbing?
- How do I use a video translator to dub my video?
- What rules of thumb, or heuristics, achieve a superior outcome with AI dubbing?
On reviewing this visual guide, you will be able to (1) upload your own video content, (2) transcribe your content, and (3) dub your content into a different dialect.
Note: We are going to take a video in Australian English and switch it into American English. The usual use case is to translate the content into another language first, and then do the dubbing. We skip the translation step so that this guide is easier to follow.
From Wikipedia, ‘Dubbing, mixing, or re-recording is a post-production process used in film making and video production in which additional or supplementary recordings are “mixed” with original production sound to create the finished soundtrack.’
In everyday usage, however, the term 'dubbing' commonly refers to the replacement of the actors' voices with those of different performers speaking another language, which is called 're-voicing', or a 'voice-over', in the film industry.
Outcomes - Before And After
To make it obvious what we are talking about, the 'after' version is provided.
Steps - Transcribe Your Content Using The Provided AI
We are working with an English video today, so select the English template and create a new item. Note that the account we are using to show this functionality has Chinese, French and Vietnamese as translation options. We will not be using them today.
This is the original content.
We know the speaker is Australian, so we select English (Australia) as the preferred dialect to transcribe with. Note that picking the correct dialect makes the AI much more accurate.
Once transcription is complete, use the Picture-In-Picture (PIP) to clean up the transcript. Note that the text shown in the PIP updates in real time, so you can edit both the timestamps and the text of the transcript and see the changes directly.
You can see we removed text fragment 9 and merged it into text fragment 8 as an example. This makes no difference to the rendering of the captions.
Use the Auto-Index button to fix up the fragment numbering if you require your *.srt to work properly.
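For readers who post-process a downloaded *.srt by hand, merging one fragment into the previous one and then renumbering can be sketched in Python. This is a minimal sketch, not how the app's Auto-Index button is implemented. The `Cue` class and the `parse_srt`, `merge_into_previous`, `reindex` and `to_srt` helpers are hypothetical names. It assumes a plain SubRip file where each cue is an index line, a `start --> end` timing line, and one or more text lines, with cues separated by blank lines.

```python
import re
from dataclasses import dataclass
from typing import List


@dataclass
class Cue:
    index: int
    start: str  # timestamp text such as "00:00:02,000", kept as-is
    end: str
    text: str


def parse_srt(srt_text: str) -> List[Cue]:
    """Parse SubRip text into cues (assumes well-formed blocks)."""
    cues = []
    normalised = srt_text.replace("\r\n", "\n").strip()
    # Cues are separated by one or more blank lines.
    for block in re.split(r"\n\s*\n", normalised):
        lines = block.splitlines()
        if len(lines) < 2:
            continue  # skip empty or incomplete blocks
        start, end = [part.strip() for part in lines[1].split("-->")]
        cues.append(Cue(int(lines[0]), start, end, "\n".join(lines[2:])))
    return cues


def merge_into_previous(cues: List[Cue], position: int) -> List[Cue]:
    """Merge the cue at list position `position` into the cue before it."""
    prev, cur = cues[position - 1], cues[position]
    merged = Cue(prev.index, prev.start, cur.end,
                 f"{prev.text} {cur.text}".strip())
    return cues[:position - 1] + [merged] + cues[position + 1:]


def reindex(cues: List[Cue]) -> List[Cue]:
    """Renumber cues consecutively from 1, keeping their order."""
    return [Cue(i, c.start, c.end, c.text)
            for i, c in enumerate(cues, start=1)]


def to_srt(cues: List[Cue]) -> str:
    """Serialise cues back to SubRip text."""
    blocks = [f"{c.index}\n{c.start} --> {c.end}\n{c.text}" for c in cues]
    return "\n\n".join(blocks) + "\n"
```

Calling `merge_into_previous(cues, 8)` folds fragment 9 into fragment 8 and leaves the numbering jumping from 8 to 10. Passing the result through `reindex` restores a consecutive sequence, which some players and subtitle tools expect.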
Save your changes and your transcription is complete. If you need the captions burnt into the asset, use the Auto-Overlay. Nicely done!
Steps - Dubbing/Adding A Voice Over With AI
To dub the content using a different voice, our first step is to copy the captions from our video component into our audio component. Click into your editor, and hit Ctrl-A, then Ctrl-C.
Go to the Audio component, click add captions and paste your content.
Use the highlighted button to add captions without adding an audio file.
We will transcribe again, this time using the Text-To-Speech Transcription AI. In the below image, we use the Text-To-Speech tab. It is also possible to break up your captions into fragments, as shown below. We will only use the first fragment, but you can break up your content to produce different *.mp3 files.
A few quick notes. The options shown are the different voices, with (M) for Male and (F) for Female. The accent is also shown, and we are using English (US) (M), which is American English Male. The S/P indicates Standard or Professional. We recommend Professional; Standard is simply an older technology that sounds more robotic. Once done, it looks like below.
The audio content is below.
Have a listen! Sweet - dubbing is complete.
Steps - Overlaying Our Audio On Our Video
We are now going to add our generated audio to our original video. Go back to your video component, and click on Audio Overlays.
Click the + and your audio fragment will be shown.
Notice that we have not set any times yet. Add the start and end times, in seconds. We added 0.5 and 21 as start and end times, and then clicked the + button to bake in the audio overlay. That is the entire process. The final output is shown below.
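If you prefer to do this overlay step outside the app, the same start and end times can be applied with a tool such as ffmpeg. The Python sketch below only builds the command line. It is an assumption about one way to do it, not a description of what the app does internally. The function name and the file names are hypothetical, the command replaces the original audio track rather than mixing with it, and the `all=1` option of the `adelay` filter needs a reasonably recent ffmpeg build.

```python
from typing import List


def build_overlay_command(video_path: str, audio_path: str, output_path: str,
                          start_s: float, end_s: float) -> List[str]:
    """Build an ffmpeg command that places the dubbed audio on the video.

    The dub starts at `start_s` seconds and is cut so it ends at `end_s`.
    """
    if start_s < 0 or end_s <= start_s:
        raise ValueError("expected 0 <= start_s < end_s")

    delay_ms = round(start_s * 1000)   # adelay takes milliseconds
    duration_s = end_s - start_s       # length of dub to keep

    # Trim the dub to the window length, then shift it to the start time.
    audio_filter = (
        f"[1:a]atrim=duration={duration_s:g},"
        f"adelay={delay_ms}:all=1[dub]"
    )
    return [
        "ffmpeg",
        "-i", video_path,              # input 0: original video
        "-i", audio_path,              # input 1: generated speech
        "-filter_complex", audio_filter,
        "-map", "0:v",                 # keep the original picture
        "-map", "[dub]",               # use the shifted dub as the audio
        "-c:v", "copy",                # do not re-encode the video stream
        output_path,
    ]
```

With the values used in this guide, `build_overlay_command("in.mp4", "dub.mp3", "out.mp4", 0.5, 21)` delays the dub by 500 ms and keeps 20.5 seconds of it. The returned list can be passed to `subprocess.run` once ffmpeg is installed.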
In this visual guide, we used our video translator to dub, or overlay a different voice on, our video. It should be noted that most clients prefer to download the assets and do the last step, the audio overlay, themselves. But you can use our app if you think it makes sense! Send us some feedback if you do.
We are very grateful for your support!