Language Translation Experiment:
Create a Generative AI program that translates speech between languages in near real time, so that participants in remote meetings can each speak their own language.
Business need
A global pharmaceutical company holds quarterly teleconference meetings with employees from different countries who speak different languages. Before Generative AI, English was the company's only meeting language, which limited participation and collaboration between teams. Employees' English skills vary, which leaves room for miscommunication. Simultaneous translation by human interpreters is prohibitively expensive.
Issues encountered and solved
Long pauses caused by continuous speech: When speakers talked without pausing, translated sentences arrived after long gaps. A custom algorithm was developed to extract complete sentences from the speech-to-text output while conversion was still in progress and send each one to translation immediately. This reduced the pause from 20–30 seconds to 5–6 seconds, improving the overall experience.
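The write-up does not describe the sentence-extraction algorithm itself. The Python sketch below shows one way such a step could work, assuming (hypothetically) that the speech-to-text service delivers a series of growing interim transcripts for the current utterance, followed by one final transcript. The class `IncrementalSentenceSplitter` and its `feed` method are illustrative names, not the project's code. A sentence is released as soon as it ends in `.`, `!` or `?` and more speech follows, instead of waiting for the speaker to pause.

```python
import re
from typing import List

# A sentence ends at ., ! or ? followed by whitespace. Punctuation at the very
# end of an interim transcript is not trusted yet, because the recognizer may
# still extend or revise that sentence.
_BOUNDARY = re.compile(r"[.!?]+(?=\s)")


class IncrementalSentenceSplitter:
    """Emits complete sentences from a growing speech-to-text transcript.

    Hypothetical assumption: each interim update is a longer version of the
    previous one, and text already emitted is not revised afterwards.
    """

    def __init__(self) -> None:
        self._emitted = 0  # characters of the current utterance already sent on

    def feed(self, transcript: str, is_final: bool) -> List[str]:
        """Returns the sentences that became complete with this update."""
        sentences: List[str] = []
        pending = transcript[self._emitted:]
        start = 0
        for match in _BOUNDARY.finditer(pending):
            sentence = pending[start:match.end()].strip()
            if sentence:
                sentences.append(sentence)
            start = match.end()
        self._emitted += start
        if is_final:
            # End of utterance: flush whatever is left, even without punctuation.
            rest = pending[start:].strip()
            if rest:
                sentences.append(rest)
            self._emitted = 0  # the next utterance starts from scratch
        return sentences


if __name__ == "__main__":
    splitter = IncrementalSentenceSplitter()
    updates = [
        ("Hello there", False),
        ("Hello there. How are", False),
        ("Hello there. How are you? I am", False),
        ("Hello there. How are you? I am fine", True),
    ]
    for text, is_final in updates:
        for sentence in splitter.feed(text, is_final):
            print(sentence)  # here each sentence would be sent to translation
```

The boundary rule is deliberately simple: an abbreviation such as "Dr." would end a sentence early, and decimals such as "3.5" are left intact only because no whitespace follows the period.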
Technology used: OBS was used to interface with the conferencing platform. Python applications were developed to integrate OBS, the Google Translation and speech synthesis APIs, and customized avatar/lip-sync Generative AI applications.
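The source names the components but not how the Python applications chain them together. A minimal sketch, assuming each service call can be wrapped as a function that takes one item and returns one: every step runs in its own thread and hands results to the next step through a queue, so one sentence can be translated while the previous one is still being synthesized. `run_pipeline`, `translate` and `synthesize` are hypothetical names, and the two step functions are placeholders, not calls to Google's APIs or to the avatar application.

```python
import queue
import threading
from typing import Any, Callable, Iterable, List

_STOP = object()  # sentinel that tells a stage to shut down


def _run_stage(work: Callable[[Any], Any], inbox: queue.Queue, outbox: queue.Queue) -> None:
    """Applies one pipeline step to each item until the stop sentinel arrives."""
    while True:
        item = inbox.get()
        if item is _STOP:
            outbox.put(_STOP)  # pass the shutdown signal to the next stage
            return
        outbox.put(work(item))


def run_pipeline(items: Iterable[Any], steps: List[Callable[[Any], Any]]) -> List[Any]:
    """Pushes items through the steps in order, one thread per step.

    Different sentences can be in different stages at the same time. Output
    order matches input order because every stage is a single FIFO worker.
    """
    queues: List[queue.Queue] = [queue.Queue() for _ in range(len(steps) + 1)]
    threads = [
        threading.Thread(target=_run_stage, args=(step, queues[i], queues[i + 1]), daemon=True)
        for i, step in enumerate(steps)
    ]
    for thread in threads:
        thread.start()
    for item in items:
        queues[0].put(item)
    queues[0].put(_STOP)

    results: List[Any] = []
    while (item := queues[-1].get()) is not _STOP:
        results.append(item)
    for thread in threads:
        thread.join()
    return results


# Placeholder steps: a real system would call translation and speech
# synthesis services here and hand the audio to the avatar/OBS stage.
def translate(sentence: str) -> str:
    return f"[es] {sentence}"


def synthesize(sentence: str) -> str:
    return f"<audio: {sentence}>"


if __name__ == "__main__":
    print(run_pipeline(["Hello there.", "How are you?"], [translate, synthesize]))
```

Error handling is omitted: if a step raises, its thread stops and `run_pipeline` would wait forever, so a production version would need timeouts or error propagation between stages.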
Avatar lip-sync issues: A custom video alteration application was developed that takes the synthesized audio as input and generates the avatar video with correct lip sync.
End-to-end latency: The Generative AI applications were run on GPU servers, and performance optimization techniques helped resolve the latency problems.
Incorrect translations: A custom dictionary was created to look up words that the Google Translation API did not translate correctly, which improved accuracy.
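The document does not say at which point the dictionary was applied. One plausible approach, sketched below, is a post-translation correction table per target language that replaces known mistranslations with the preferred domain term. The `GLOSSARY` entry and the `apply_glossary` name are invented for illustration; the entry is not a documented error of the Google Translation API.

```python
import re
from typing import Dict

# Hypothetical glossary: for each target language, output phrases the
# translation service gets wrong in this domain, mapped to the preferred term.
GLOSSARY: Dict[str, Dict[str, str]] = {
    "es": {"juicio clínico": "ensayo clínico"},  # illustrative entry only
}


def apply_glossary(translated: str, target_lang: str,
                   glossary: Dict[str, Dict[str, str]] = GLOSSARY) -> str:
    """Replaces known mistranslations with the preferred domain terms.

    Matches whole words only, case-insensitively, and tries longer entries
    first so a multi-word entry wins over a shorter entry it contains.
    """
    entries = glossary.get(target_lang, {})
    if not entries:
        return translated
    terms = sorted(entries, key=len, reverse=True)
    pattern = re.compile(r"\b(" + "|".join(map(re.escape, terms)) + r")\b", re.IGNORECASE)
    lookup = {term.lower(): replacement for term, replacement in entries.items()}

    def _replace(match: "re.Match[str]") -> str:
        word = match.group(0)
        replacement = lookup[word.lower()]
        # Keep a leading capital, e.g. at the start of a sentence.
        if word[:1].isupper():
            replacement = replacement[:1].upper() + replacement[1:]
        return replacement

    return pattern.sub(_replace, translated)


if __name__ == "__main__":
    print(apply_glossary("El juicio clínico comenzó ayer.", "es"))
```

Plain substitution cannot fix grammatical agreement around the replaced term, so entries work best as whole phrases whose correct and incorrect forms share gender and number.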
Results: The meetings were held and the translation software worked well. The translation delay was shorter than it would have been with human simultaneous interpreters, and accuracy was better than if participants had been required to speak a common language other than their native tongue.