How To Transcribe Lots of Different Accents Accurately
Today, we will look at how to transcribe many different accents correctly. First off, what does this question even mean in terms of AI?
The ICH’s mission statement reads: “The ICH’s mission is to build an international, collaborative research and training hub in healthcare communication to improve patient safety and quality of healthcare practice around the world…”
This goal aligns with what Video Translator is doing. More importantly for this post, a number of different English accents are spoken in the ICH video, which makes it an ideal test case.
The following process was followed to get the output:
- Use the Speech-To-Text AI Transcription to transcribe the video.
- Fix up the results of the transcription.
- Use the English to Chinese translation to translate the video.
We will not be fixing up the Chinese translation after the AI translation step, as this post covers the process itself. For client-facing output, please work with a language-aware subject matter specialist to ensure a high-quality artefact at the end of your process.
First, we will look at the process by which this video translation was completed. Then we will discuss the challenges and options around the video translation. Finally, we will cover some extra options and how they can be used to meet your client’s requirements.
The original video was sourced from the ICH website.
Please direct your browser to videotranslator.ai. Select myTemplate, or your preferred template, and create a new item. Note that to follow this visual guide, your template needs a video component only.
Once in the new item, please upload your video. In this example, we upload the ICH video, and your screen should look like the image below.
Next, click on Actions -> Transcribe, and you should see something like the image below. We have selected English (United Kingdom) here. Why exactly will be addressed in the discussion below.
After you trigger this action, the platform closes the item and locks it. Once transcription is complete, the item unlocks automatically.
Open up the item and have a look at the captions. The main task here is to fix up the transcription. The heuristics section below covers some of the issues around this specific video.
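Since this video has several speakers, one way to prioritise the fix-up pass is to look first at the captions that sit close to a change of speaker. The sketch below is only an illustration and is not part of the Video Translator platform: the `Caption` class, the `captions_near_transitions` helper, the two-second window and the sample timings are all assumptions made for this example. It also assumes you have noted the speaker-change times by hand while watching the video.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Caption:
    """One caption line; times are in seconds from the start of the video."""
    start: float
    end: float
    text: str


def captions_near_transitions(
    captions: List[Caption],
    transitions: List[float],
    window: float = 2.0,
) -> List[Caption]:
    """Return captions whose time span, padded by `window` seconds on each
    side, contains at least one speaker-change time."""
    flagged = []
    for caption in captions:
        for t in transitions:
            if caption.start - window <= t <= caption.end + window:
                flagged.append(caption)
                break  # one nearby transition is enough to flag the caption
    return flagged


# Hypothetical captions and a hand-noted speaker change at 18 seconds.
sample = [
    Caption(0.0, 5.0, "first caption"),
    Caption(15.0, 19.0, "caption across the cut"),
    Caption(30.0, 35.0, "later caption"),
]
for caption in captions_near_transitions(sample, transitions=[18.0]):
    print(f"review {caption.start:.0f}s to {caption.end:.0f}s: {caption.text}")
```

The flagged captions are simply a suggested reading order for the human editor. Every caption still deserves a check, and the window size would need tuning for your own material.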
After fixing up, the end result is below.
Sweet! Now that we have our transcription sorted, we can translate. Click Action -> Translation to trigger translation of the video captions. This is a pretty simple process, and can be seen in the image below.
Once the translation process is complete, the result is available in the application, as shown below. Also, please note the Origin toggle, which lets a user flick back and forth between the original and translated captions - this functionality is for use when a human translator is checking the work of the AI.
The final version of the video is shown below. In a real workflow, please ask your subject matter expert to eyeball the translation and verify suitability for your stakeholders.
How do you work with many different dialects and, specifically, which AI should we use? In this video, we see the following features:
- There are five people speaking, with accents that include English as spoken in Hong Kong and English as spoken by a person of European descent.
- This is the crux of the problem: which AI should we use? While experimenting, we tried two English dialect settings, one of them English (UK). The results were different, but it seemed like English (UK) worked a little bit better - this is probably because the first speaker, Professor Diana Slade, does not have a classic Australian accent, but more of a mix of Australian English and British English.
- For each of the speakers, the AI had trouble during the transition from one speaker to another. The first instance of this is at 0:18, which is a simple (and nifty) cut scene, but it totally threw the AI. In essence, the AI likes a monotone delivery and the same person speaking throughout.
- The transcription of Dr Elizabeth Rider, speaking from 1:11, worked fairly well. This is because the underlying AI has been trained on a significant volume of American content. The exact opposite was true for the voice-over by Dr Angela Chan. Additionally, switching between the HK English dialect and the European English dialect was ugly.
- All in all, using UK English worked better in terms of the number of post-AI corrections required. Our takeaway: if there is a mix of accents, going with base English, for lack of a better descriptor, is probably the way to go.
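The "number of changes required" comparison above can be made a little more concrete by counting word-level edits between each raw AI transcript and the human-corrected version. What follows is a minimal sketch and not the method used by the platform: the function names, the labels `setting_a` and `setting_b`, and the sample sentences are all invented for illustration. It uses a standard word-level Levenshtein distance, which counts insertions, deletions and substitutions.

```python
from typing import Dict, List, Tuple


def word_edit_distance(ai_text: str, corrected_text: str) -> int:
    """Word-level Levenshtein distance between two transcripts
    (case-insensitive, split on whitespace)."""
    a = ai_text.lower().split()
    b = corrected_text.lower().split()
    # prev[j] holds the distance between the words of `a` seen so far and b[:j].
    prev = list(range(len(b) + 1))
    for i, word_a in enumerate(a, start=1):
        curr = [i]
        for j, word_b in enumerate(b, start=1):
            cost = 0 if word_a == word_b else 1
            curr.append(min(
                prev[j] + 1,         # delete word_a
                curr[j - 1] + 1,     # insert word_b
                prev[j - 1] + cost,  # keep or substitute
            ))
        prev = curr
    return prev[-1]


def rank_dialects(
    ai_transcripts: Dict[str, str], corrected_text: str
) -> List[Tuple[str, int]]:
    """Sort dialect settings by how many word edits the editor had to make."""
    scores = {
        name: word_edit_distance(text, corrected_text)
        for name, text in ai_transcripts.items()
    }
    return sorted(scores.items(), key=lambda item: item[1])


# Invented sample data: two hypothetical dialect settings, one corrected line.
corrected = "patient safety is key"
raw = {
    "setting_a": "patient safety is key",
    "setting_b": "patient safely his key",
}
print(rank_dialects(raw, corrected))
# prints [('setting_a', 0), ('setting_b', 2)]
```

A lower count means less fix-up work for that setting on this particular video. Punctuation is treated as part of each word here, so you may want to strip it before comparing.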
In this post, we looked at some of the trade-offs between using different dialect AIs for transcription. This is a valid concern for English, Arabic and Spanish, because these languages are spoken in a large number of dialects.
The platform is currently in closed beta, while we work with early users to test/iron out issues. If you are interested in trying out our technology, please drop us an email at firstname.lastname@example.org.
We are very grateful for your support!