S2S ASR Advanced issues
• Tight coupling
  ◦ ASR should output N-best
  ◦ Translate them all (lattice)
  ◦ Choose best translation (MT as a LM for ASR)
• Remove disfluencies/hesitations
• Add more relevant data
  ◦ Automatically convert past tense/third person data to present tense/first+second person …
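The tight-coupling idea above (ASR emits an N-best list, every hypothesis is translated, and the MT score helps pick the winner) can be sketched roughly as follows. This is only an illustrative sketch: the function names, the weights, and the toy translation table are hypothetical, and a real system would rescore a lattice with actual ASR and MT model scores.

```python
# Minimal sketch of N-best rescoring with MT acting as a language model for ASR.
# All names, weights, and scores here are hypothetical, for illustration only.

def pick_best_translation(nbest, translate, asr_weight=1.0, mt_weight=1.0):
    """nbest: list of (hypothesis, asr_log_prob).
    translate: function mapping a hypothesis to (translation, mt_log_prob).
    Returns (combined_score, hypothesis, translation) with the highest score."""
    best = None
    for hypothesis, asr_log_prob in nbest:
        translation, mt_log_prob = translate(hypothesis)
        # Combine the two log scores; a low MT score penalizes
        # hypotheses that the translator finds implausible.
        score = asr_weight * asr_log_prob + mt_weight * mt_log_prob
        if best is None or score > best[0]:
            best = (score, hypothesis, translation)
    return best


# Toy stand-in for an MT system: a lookup table with made-up log probabilities.
TOY_MT = {
    "recognize speech": ("reconocer el habla", -0.5),
    "wreck a nice beach": ("destrozar una bonita playa", -3.0),
}

def toy_translate(hypothesis):
    return TOY_MT[hypothesis]


# The ASR slightly prefers the wrong hypothesis, but the MT score corrects it.
nbest = [("wreck a nice beach", -0.8), ("recognize speech", -1.0)]
score, hypothesis, translation = pick_best_translation(nbest, toy_translate)
```

In this toy case the ASR alone would pick the first hypothesis, while the combined score prefers the second, which is the effect the slide describes.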
S2S TTS Advanced issues
• MT output isn't grammatical
  ◦ TTS doesn't care and just says it
  ◦ TTS should try to say MT output with more breaks
• TTS (unit selection)
  ◦ As a LM on MT output
  ◦ Choose the best translation based on what is said best
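One way to picture "TTS as a LM on MT output" is to score each candidate translation by how well the synthesizer's recorded database covers it, then keep the candidate that is cheapest to say. The sketch below is a deliberate simplification under stated assumptions: real unit selection works on phone- or diphone-sized units with target and join costs, whereas here a hypothetical set of recorded adjacent word pairs stands in for the unit inventory.

```python
# Minimal sketch: pick the translation the synthesizer can say best.
# The word-pair "inventory" and the cost function are hypothetical stand-ins
# for a real unit-selection database and its join/target costs.

def synthesis_cost(sentence, recorded_pairs):
    """Count adjacent word pairs that do not occur in the recorded database."""
    words = sentence.lower().split()
    pairs = zip(words, words[1:])
    return sum(1 for pair in pairs if pair not in recorded_pairs)

def choose_most_sayable(candidates, recorded_pairs):
    """Return the candidate with the lowest synthesis cost (first one on ties)."""
    return min(candidates, key=lambda s: synthesis_cost(s, recorded_pairs))


# Toy data: word pairs assumed to exist in the voice's recordings.
recorded_pairs = {("where", "is"), ("is", "the"), ("the", "station")}
candidates = ["where the station is", "where is the station"]
best = choose_most_sayable(candidates, recorded_pairs)
```

In practice this cost would be combined with the MT score rather than used alone, so that a fluent-sounding but wrong translation is not preferred.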
Speech Processing 15-492/18-492
Voice Conversion
Voice Conversion
• Live (or offline)
  ◦ Convert an existing voice to another
  ◦ Use only a small amount of target speech
• Uses:
  ◦ Synthesis without collecting lots of data
  ◦ Disguising voices
  ◦ Emotional voices without full synthesis support