Translatotron 2 | Smilegate.AI
https://smilegate.ai/en/2021/09/02/translatotron-2 · 02.09.2021 · The overall structure of Translatotron 2 resembles a combined ASR and TTS model. It takes source-language (L1) speech as a mel-spectrogram and predicts target-language (L2) phonemes with a decoder (the ASR-like part). At the same time, a synthesizer predicts the L2 mel-spectrogram from the combination of the decoder output and the attention output, both taken before the L2 phoneme is computed (the TTS-like part).
[2107.08661] Translatotron 2: Robust direct speech-to-speech ...
arxiv.org › abs › 2107 · Jul 19, 2021 · We present Translatotron 2, a neural direct speech-to-speech translation model that can be trained end-to-end. Translatotron 2 consists of a speech encoder, a phoneme decoder, a mel-spectrogram synthesizer, and an attention module that connects all the previous three components. Experimental results suggest that Translatotron 2 outperforms the original Translatotron by a large margin in terms ...
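
The data flow described in the snippets above (speech encoder, phoneme decoder, attention, and a synthesizer fed by the decoder output combined with the attention output) can be sketched roughly as follows. This is a minimal shape-level illustration in NumPy, not the paper's implementation. The layer sizes, the random weights, the toy recurrence, the dot-product attention, and the name `translate_sketch` are all assumptions made for the example. It also emits one mel frame per decoder step, which is a simplification.

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative sizes only (assumptions, not values from the paper).
N_MELS, D_MODEL, N_PHONEMES = 80, 32, 40


def dense(d_in, d_out):
    """Random weight matrix standing in for a trained layer."""
    return rng.normal(scale=0.1, size=(d_in, d_out))


W_enc = dense(N_MELS, D_MODEL)           # speech encoder
W_dec = dense(D_MODEL, D_MODEL)          # decoder recurrence
W_phon = dense(2 * D_MODEL, N_PHONEMES)  # phoneme projection
W_syn = dense(2 * D_MODEL, N_MELS)       # mel-spectrogram synthesizer


def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()


def translate_sketch(src_mel, n_target_steps):
    """Map a source mel-spectrogram to (phoneme logits, target mel frames)."""
    # Speech encoder: source mel frames -> hidden states, shape (T_src, D).
    enc = np.tanh(src_mel @ W_enc)

    state = np.zeros(D_MODEL)
    context = enc.mean(axis=0)
    phoneme_logits, synth_inputs = [], []
    for _ in range(n_target_steps):
        # Decoder step (toy recurrence, an assumption of this sketch).
        state = np.tanh((state + context) @ W_dec)
        # Attention: the decoder state queries the encoder states.
        weights = softmax(enc @ state)   # (T_src,)
        context = weights @ enc          # (D,)
        # Decoder output and attention output are combined BEFORE the
        # phoneme projection; both heads read the same combined vector.
        combined = np.concatenate([state, context])
        phoneme_logits.append(combined @ W_phon)
        synth_inputs.append(combined)

    # Synthesizer: combined vectors -> target mel frames.
    tgt_mel = np.stack(synth_inputs) @ W_syn
    return np.stack(phoneme_logits), tgt_mel


src = rng.normal(size=(50, N_MELS))      # 50 fake source frames
logits, mel = translate_sketch(src, n_target_steps=7)
print(logits.shape, mel.shape)           # (7, 40) (7, 80)
```

The point of the sketch is the wiring: the phoneme head and the synthesizer share one combined input, so the spectrogram prediction is conditioned on the same decoder and attention signal that produces the phonemes.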