Direct speech-to-speech translation with discrete units
https://arxiv.org/abs/2107.05604 · 12.07.2021 · Direct speech-to-speech translation with discrete units. We present a direct speech-to-speech translation (S2ST) model that translates speech from one language to speech in another language without relying on intermediate text generation. Previous work addresses the problem by training an attention-based sequence-to-sequence model that maps ...
Direct speech-to-speech translation with discrete units
arxiv.org › abs › 2107 · Jul 12, 2021 · Abstract: We present a direct speech-to-speech translation (S2ST) model that translates speech from one language to speech in another language without relying on intermediate text generation. Previous work addresses the problem by training an attention-based sequence-to-sequence model that maps source speech spectrograms into target spectrograms.
Direct speech-to-speech translation with discrete units ...
www.arxiv-vanity.com › papers › 2107 · Direct speech-to-speech translation with discrete units. Abstract. We present a direct speech-to-speech translation (S2ST) model that translates speech from one language to speech in another language without relying on intermediate text generation. Previous work [9] addresses the problem by training an attention-based sequence-to-sequence model that maps source speech spectrograms into target spectrograms.
Direct speech-to-speech translation with discrete units
https://arxiv.org/abs/2107.05604v1 · 12.07.2021 · We present a direct speech-to-speech translation (S2ST) model that translates speech from one language to speech in another language without relying on intermediate text generation. Previous work addresses the problem by training an attention-based sequence-to-sequence model that maps source speech spectrograms into target spectrograms. To tackle …