Google Launches Real-Time Translation Model Supporting Over 70 Languages

Google has unveiled an artificial intelligence (AI) translation model capable of real-time interpretation in more than 70 languages. Unlike conventional methods, which wait for the speaker to finish before translating, the new model aims to cut that delay while also preserving the speaker's tone and style.

On June 9, Google introduced its new real-time voice translation model, "Gemini 3.5 Live Translate." This model will be gradually integrated into the Google Translate app, Google Meet video conferencing service, and the Gemini Live API for developers.

The new model detects the language being spoken automatically, so users do not have to choose a language in advance. It can recognize speech in more than 70 languages and render it as spoken audio in another language, even in conversations that mix multiple languages.

The most significant improvement is in translation speed. Traditional voice translation often delivers the translated audio only after the speaker has finished talking. In contrast, the new model delivers translated audio while the speaker is still talking. Google said the lag between the original speech and the translated audio is typically just a few seconds.
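The difference between the two approaches can be illustrated with a small timing sketch. It is a toy model, not Google's implementation: the `Chunk` type, the function names, and the fixed 0.5-second processing time are all assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass
class Chunk:
    start: float  # second at which the speaker begins this chunk
    end: float    # second at which the speaker finishes this chunk


def batch_delays(chunks, processing=0.5):
    """Wait-until-finished: nothing is translated until the last chunk ends.

    Returns, for each chunk, how long the listener waits after it was spoken.
    `processing` is an assumed fixed translation time in seconds.
    """
    ready = chunks[-1].end + processing
    return [ready - c.end for c in chunks]


def streaming_delays(chunks, processing=0.5):
    """Streaming: each chunk is translated as soon as it has been spoken."""
    return [(c.end + processing) - c.end for c in chunks]


if __name__ == "__main__":
    # Hypothetical six-second utterance split into three two-second chunks.
    speech = [Chunk(0.0, 2.0), Chunk(2.0, 4.0), Chunk(4.0, 6.0)]
    print("batch:    ", batch_delays(speech))
    print("streaming:", streaming_delays(speech))
```

In this toy model the wait for the first chunk grows with the length of the utterance under the wait-until-finished approach, while under streaming it stays at the assumed processing time regardless of how long the speaker talks.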

Audio quality has also been improved. The model not only conveys the meaning of sentences but also tries to reflect the original speaker's tone, style, speed, and pitch as closely as possible. The goal is a translation that sounds like natural conversation rather than a mechanical reading.

This new model will be available on both Android and iOS versions of the Google Translate app. Users can connect earphones to listen to real-time voice translations. On Android devices, a "listening mode" allows users to hold their smartphones to their ears, similar to a phone call, to hear the translated audio.

With this feature, travelers can use their smartphones to get near real-time voice translation when talking with locals who speak a different language.

In Google Meet, the new functionality will initially be available to select corporate clients. Google plans to test the voice translation feature with Google Workspace customers this month and to expand availability for multilingual meetings later this year.

For developers, the Gemini Live API and Google AI Studio will provide early access to the feature, allowing them to build real-time voice translation services. Real-time media platforms such as Agora and LiveKit will also support integrating it.

Ride-hailing service Grab is testing the model for multilingual communication between drivers and passengers. This would allow in-app calls between users who speak different languages to be translated in real time.

Google has stated that all generated voices will carry a SynthID watermark to help identify AI-generated audio. The technology embeds an imperceptible identifier in AI-generated voices.

* This article has been translated by AI.