Gradio

Speech Translation Demo with Automatic TTS and Restart Option

This demo performs the following:

Records up to 15 seconds of audio from the microphone.
Uses OpenAI’s Whisper model to transcribe the speech.
Splits the transcription into segments and translates each segment on the fly using Facebook’s M2M100 model.
Streams the cumulative translation output to the user.
Automatically converts the final translated text to speech using gTTS.
Provides a "Restart Recording" button (located just below the recording section) to reset the audio input, translated text, and TTS output.
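
The segment-by-segment flow above can be sketched in plain Python. This is only an illustration under stated assumptions, not the demo's actual code: the helper names (truncate_audio, split_segments, stream_translation, restart) are hypothetical, splitting on sentence punctuation is an assumed segmentation strategy, and the translate argument stands in for whatever function wraps the M2M100 model. No model is loaded here, so the sketch runs with the standard library alone.

```python
import re
from typing import Callable, Iterator, List, Sequence, Tuple

MAX_SECONDS = 15  # recording limit stated in the demo description


def truncate_audio(sample_rate: int, samples: Sequence,
                   max_seconds: int = MAX_SECONDS) -> Sequence:
    """Keep only the first max_seconds of audio.

    Works on any sliceable sequence of samples (a list or a NumPy array).
    """
    return samples[: sample_rate * max_seconds]


def split_segments(text: str) -> List[str]:
    """Split a transcription into sentence-like segments.

    Assumption: a segment ends at '.', '!' or '?' followed by whitespace.
    """
    parts = re.split(r"(?<=[.!?])\s+", text.strip())
    return [p for p in parts if p]


def stream_translation(text: str,
                       translate: Callable[[str], str]) -> Iterator[str]:
    """Translate one segment at a time, yielding the cumulative result.

    `translate` is a placeholder for the real translation call.
    """
    translated: List[str] = []
    for segment in split_segments(text):
        translated.append(translate(segment))
        yield " ".join(translated)


def restart() -> Tuple[None, str, None]:
    """Cleared values for the audio input, translated text, and TTS output."""
    return None, "", None
```

Because stream_translation is a generator, a Gradio event handler that yields its values would update the output textbox after every segment rather than once at the end. The last yielded string is the full translation, which is the text that would then be handed to the text-to-speech step.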

Note: True real-time translation (i.e., while the user is still speaking) requires a continuous streaming solution, which the standard browser microphone input does not provide.