Word (depends on the language; usually more than 100,000)
Syllable
Diphone & Triphone
Phoneme (between 10 and 100)
Phone Units (Cont’d)
Diphone: we model the transitions between two phonemes.
[Figure: phoneme sequence p1 p2 p3 p4 p5 ... with the phoneme and diphone units marked]
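To illustrate how diphones relate to a phoneme sequence, here is a minimal sketch. It assumes a diphone can be represented simply as a pair of adjacent phoneme labels; the function name `diphones` and the labels are illustrative only.

```python
def diphones(phonemes):
    """Return the diphone units for a phoneme sequence.

    Each diphone pairs one phoneme with its successor, so it covers
    the transition between the two.
    """
    return list(zip(phonemes, phonemes[1:]))


sequence = ["p1", "p2", "p3", "p4", "p5"]
print(diphones(sequence))
# [('p1', 'p2'), ('p2', 'p3'), ('p3', 'p4'), ('p4', 'p5')]
```

A sequence of n phonemes gives n - 1 diphones under this representation.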
Phone Units (Cont’d)
In Farsi there are 30 phonemes, so theoretically there are 30*30 = 900 diphones.
In practice, the only diphone that does not occur in Farsi is /zho/.
Theoretically there are 30*30*30 = 27,000 triphones, but in practice Farsi has about 15,000.
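The theoretical inventory sizes can be checked by enumerating every ordered combination of phonemes. This is a small sketch; the labels `ph0` ... `ph29` are placeholders and not real Farsi phoneme symbols.

```python
from itertools import product

# Placeholder labels standing in for the 30 Farsi phonemes (hypothetical names).
phonemes = [f"ph{i}" for i in range(30)]

# Every ordered pair is a theoretical diphone, every ordered triple a triphone.
diphone_count = sum(1 for _ in product(phonemes, repeat=2))
triphone_count = sum(1 for _ in product(phonemes, repeat=3))

print(diphone_count)   # 900
print(triphone_count)  # 27000
```

The practical counts are lower because some combinations never occur in the language.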
A syllable is a set of phonemes that contains exactly one vowel.
Syllables in Farsi: CV, CVC, CVCC
Farsi has about 4,000 syllables.
Syllables in English: V, CV, CVC, CCVC, CCVCC, CCCVC, CCCVCC, ...
The number of syllables in English is very large.
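The syllable definition and the C/V patterns can be sketched as a small check. The vowel set below is a hypothetical stand-in used only for illustration, not an actual Farsi or English phoneme inventory, and the function names are assumptions.

```python
# Hypothetical vowel symbols for illustration; not a real phoneme inventory.
VOWELS = {"a", "e", "i", "o", "u"}
FARSI_PATTERNS = {"CV", "CVC", "CVCC"}


def cv_pattern(phonemes):
    """Map each phoneme to 'V' (vowel) or 'C' (consonant)."""
    return "".join("V" if p in VOWELS else "C" for p in phonemes)


def is_syllable(phonemes):
    """A syllable contains exactly one vowel."""
    return cv_pattern(phonemes).count("V") == 1


def is_farsi_syllable(phonemes):
    """Check the pattern against the Farsi syllable shapes CV, CVC, CVCC."""
    return cv_pattern(phonemes) in FARSI_PATTERNS


print(cv_pattern(["b", "a", "r"]))                 # CVC
print(is_farsi_syllable(["b", "a", "r"]))          # True
print(is_farsi_syllable(["s", "t", "o", "p"]))     # False (pattern is CCVC)
```

The last call shows a unit that is a valid syllable by the one-vowel definition but does not fit any of the three Farsi shapes.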
Phone Sequence To Speech
Concatenative approaches: a trade-off between naturalness, memory usage, and the variety of desired functions.
Rule-based approaches: the most important rule-based approach is the Klatt method.
Phone Sequence To Speech (Cont’d)
[Diagram: Text -> Text to Phone Sequence -> Phone Sequence to Primitive Utterance -> Primitive Utterance to Natural Speech -> Speech. Stage labels: NLP, Speech Processing.]
Speech Naturalness
Removing undesirable noise, distortion, and dissociation from the speech.
Intonation and stress have a strong effect on speech naturalness.
Intonation: variation of the pitch frequency over the course of speaking.
Stress: raising the pitch frequency at a specific time.
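The two definitions above can be pictured as a pitch contour over time. The following is only a rough sketch under simplifying assumptions: intonation is modeled as a linear pitch change, and stress as a fixed boost inside a short time window. The function name, the window width, and the boost value are all hypothetical.

```python
def pitch_contour(duration_s, start_hz, end_hz, stress_at_s=None,
                  stress_width_s=0.1, stress_boost_hz=30.0, step_s=0.01):
    """Return (time, pitch) pairs for a simple synthetic contour."""
    n_steps = int(round(duration_s / step_s))
    contour = []
    for i in range(n_steps + 1):
        t = i * step_s
        # Intonation: pitch moves linearly from start_hz to end_hz.
        pitch = start_hz + (end_hz - start_hz) * (t / duration_s)
        # Stress: raise the pitch inside a short window around stress_at_s.
        if stress_at_s is not None and abs(t - stress_at_s) <= stress_width_s / 2:
            pitch += stress_boost_hz
        contour.append((t, pitch))
    return contour


# A one-second utterance falling from 120 Hz to 100 Hz, stressed at 0.5 s.
contour = pitch_contour(1.0, 120.0, 100.0, stress_at_s=0.5)
```

Plotting the pairs in `contour` would show a gently falling line with a raised segment around the stressed instant.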
Concatenative Approaches
In these approaches we store units of natural speech and use them to reconstruct the desired speech.
We can select the appropriate phone unit for speech synthesis.
We can store compressed parameters instead of the original waveform.
Benefits of storing compressed parameters instead of the original waveform:
– Less memory use
– General state instead of a specific stored
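The idea of storing units and selecting them for synthesis can be sketched as a lookup followed by concatenation. This sketch assumes diphones as the stored unit and uses a tiny made-up inventory of sample values; a real inventory would hold recorded natural speech or compressed parameters, and the names `UNIT_INVENTORY` and `synthesize` are hypothetical.

```python
# Hypothetical unit inventory: each diphone maps to a short list of samples.
UNIT_INVENTORY = {
    ("s", "a"): [0.1, 0.3, 0.2],
    ("a", "m"): [0.2, 0.0, -0.1],
}


def synthesize(phonemes, inventory):
    """Concatenate the stored unit for each adjacent phoneme pair."""
    waveform = []
    for pair in zip(phonemes, phonemes[1:]):
        if pair not in inventory:
            raise KeyError(f"no stored unit for diphone {pair}")
        waveform.extend(inventory[pair])
    return waveform


print(synthesize(["s", "a", "m"], UNIT_INVENTORY))
# [0.1, 0.3, 0.2, 0.2, 0.0, -0.1]
```

The sketch joins units directly with no smoothing at the boundaries, so it shows only the selection and concatenation steps.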