Transcript
1. Concatenative Bangla Speech Synthesizer Model. Author: Md. Abdullah-al-mamun
2. Outline: What is speech synthesis?; Concatenative synthesis;
Concatenative synthesizer model; Bangla keyword set; Classification of
keywords; Independent keyword; Dependent keyword; Database modeling;
Speech synthesis process; Synthesizer complexity; Performance;
Conclusions; References
3. Abstract: Speech is the primary means by which people exchange
information. In the modern era of information technology, speech
technology lets us expect to carry out spoken dialogue with computers.
Text-to-speech synthesis creates a synthetic voice from text,
integrating language and speech for human-computer interaction.
4. What is Speech Synthesis? Speech synthesis is the artificial
production of human speech. A speech synthesizer is a device that
translates text characters into sounds that approximate the sound of
human speech. A speech synthesizer is also known as a Text-to-Speech
(TTS) system.
5. Synthesizer Technology: HMM-based synthesis; formant synthesis;
concatenation synthesis; diphone synthesis; sinewave synthesis; and
so on.
6. Concatenative Synthesis: Concatenative synthesis is based on the
concatenation of segments of recorded speech. A concatenative
synthesizer is built by concatenating a number of recorded voice
segments that are stored in a database as audio files. To produce the
synthesized speech, a keyword is taken as input and searched for in
the database, and the matching recording is returned as speech output.
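The lookup-and-join step described on this slide can be sketched roughly as follows. This is only an illustration under stated assumptions, not the author's implementation: the database is modeled as a plain Python dict mapping each keyword to WAV bytes, the recordings are assumed to share one sample format, and the names `make_silent_wav` and `concatenate` are hypothetical.

```python
import io
import wave


def make_silent_wav(n_frames, sample_rate=16000):
    """Build a silent mono 16-bit WAV in memory (stand-in for a recorded keyword)."""
    buf = io.BytesIO()
    with wave.open(buf, "wb") as w:
        w.setnchannels(1)
        w.setsampwidth(2)
        w.setframerate(sample_rate)
        w.writeframes(b"\x00\x00" * n_frames)
    return buf.getvalue()


def concatenate(keywords, database):
    """Look up each keyword's recording and join the raw frames into one WAV."""
    fmt = None      # (channels, sample width, frame rate) of the first recording
    frames = []
    for kw in keywords:
        if kw not in database:
            raise KeyError(f"no recording for keyword {kw!r}")
        with wave.open(io.BytesIO(database[kw]), "rb") as w:
            this_fmt = (w.getnchannels(), w.getsampwidth(), w.getframerate())
            if fmt is None:
                fmt = this_fmt
            elif this_fmt != fmt:
                raise ValueError(f"recording for {kw!r} has a different format")
            frames.append(w.readframes(w.getnframes()))
    if fmt is None:
        raise ValueError("no keywords given")
    out = io.BytesIO()
    with wave.open(out, "wb") as w:
        w.setnchannels(fmt[0])
        w.setsampwidth(fmt[1])
        w.setframerate(fmt[2])
        w.writeframes(b"".join(frames))
    return out.getvalue()


# Hypothetical three-keyword database; real entries would be recorded speech.
database = {"ক": make_silent_wav(100), "ল": make_silent_wav(50), "ম": make_silent_wav(80)}
speech = concatenate(["ক", "ল", "ম"], database)
```

Plain frame joining like this does nothing about discontinuities at the segment borders; it only shows the database lookup and concatenation order.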
7. Concatenative Synthesizer Model. Figure 1: Functional scheme of a
speech synthesizer system. Blocks: Input Text, Keyword Generator,
Speech Synthesizer, Databases; signal labels: X, Y, W*.
8. Keyword: A keyword is a unit of organization for a sequence of
speech sounds. For example, the word […] consists of two keywords:
one is […] and the other is […].
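A keyword generator of this kind could be sketched as below. The splitting rule is an assumption drawn from the later slides (a keyword is one letter, or consonants joined with a kar or fola): every independent vowel or consonant starts a new keyword unless it follows a hasanta (U+09CD), and all other signs attach to the current keyword. The function names are hypothetical, and the code point ranges come from the Unicode Bengali block.

```python
HASANTA = "\u09CD"  # joins consonants into a compound


def is_base_letter(ch):
    """True for Bangla independent vowels and consonants (Unicode Bengali block)."""
    cp = ord(ch)
    return (
        0x0985 <= cp <= 0x0994                      # independent vowels
        or 0x0995 <= cp <= 0x09B9                   # consonants
        or cp in (0x09CE, 0x09DC, 0x09DD, 0x09DF)   # khanda ta, rra, rha, yya
    )


def split_keywords(word):
    """Split a Bangla word into keywords (orthographic clusters)."""
    keywords = []
    for ch in word:
        joined = bool(keywords) and keywords[-1].endswith(HASANTA)
        if not keywords or (is_base_letter(ch) and not joined):
            keywords.append(ch)      # a base letter opens a new keyword
        else:
            keywords[-1] += ch       # kar, hasanta, and joined consonants attach
    return keywords


# Example: a word of two keywords, each a consonant with a kar.
print(split_keywords("বাবা"))
```

This orthographic split is only an approximation of spoken units; it does not model cases where the written and spoken forms diverge.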
9. Bangla keyword set: The Bangla keywords were collected from the
following Bangla literary books: 1) Riktar Badon, written by Kazi
Nazrul Islam; 2) Durgashnandini, written by Bumkimchandro
Chittopadhai; 3) Shas Prasno, written by Shratchandro Chittopadhai;
4) Magnabod Kabbo, written by Maikal Modhosudon Dotto.
10. Classification of Keywords (diagram): a Bangla word is divided
into Independent keywords (Vowel, Consonant) and Dependent keywords
(Modifier Character, Compound Character).
11. Independent Keyword: A keyword that is constructed from only one
letter. a) Vowel: a speech sound that is produced by a comparatively
open configuration of the vocal tract, like […] and so on. b)
Consonant: a basic speech sound in which the breath is at least
partly obstructed and which is combined with a vowel to form a
syllable, like […] and so on.
12. Dependent Keyword: A keyword that is constructed from one or more
consonants combined with a kar (the dependent sign form of a vowel,
i.e. […] and the like) or a fola. a) Modifier Character: a keyword
that is constructed from one consonant with a kar, for example […]
and the like. b) Compound Character: a keyword that is the
combination of two or more consonants, for example […] and the like.
c) Complex Character: a keyword that is the combination of both a
modifier character and a compound character, for example […] and so
on.
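The classification on the last three slides can be expressed as a small rule set. The sketch below is an assumption about how the categories map to Unicode, not the author's code: a kar is taken to be a Bengali vowel sign (U+09BE to U+09CC, or U+09D7), and a compound is taken to be two or more consonants joined by hasanta (U+09CD). The function name `classify` and the returned labels are hypothetical.

```python
HASANTA = 0x09CD


def _is_vowel(cp):
    return 0x0985 <= cp <= 0x0994


def _is_consonant(cp):
    return 0x0995 <= cp <= 0x09B9 or cp in (0x09CE, 0x09DC, 0x09DD, 0x09DF)


def _is_kar(cp):
    # Dependent vowel signs; the range skips hasanta (U+09CD).
    return 0x09BE <= cp <= 0x09CC or cp == 0x09D7


def classify(keyword):
    """Return (group, kind) for one keyword, following the slide's categories."""
    cps = [ord(ch) for ch in keyword]
    if len(cps) == 1:
        if _is_vowel(cps[0]):
            return ("independent", "vowel")
        if _is_consonant(cps[0]):
            return ("independent", "consonant")
    consonants = sum(1 for cp in cps if _is_consonant(cp))
    has_kar = any(_is_kar(cp) for cp in cps)
    has_hasanta = HASANTA in cps
    if consonants >= 2 and has_hasanta:
        # Two or more joined consonants; adding a kar makes it complex.
        return ("dependent", "complex" if has_kar else "compound")
    if consonants == 1 and has_kar:
        return ("dependent", "modifier")
    return ("unknown", "unknown")


for kw in ["অ", "ক", "কি", "ক্ষ", "স্ত্রী"]:
    print(kw, classify(kw))
```

Keywords that carry other signs only (for example anusvara or chandrabindu) fall through to "unknown" here, since the slides do not say how they are classified.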
13. Speech Database: A speech database is a collection of recorded
speech accessible on a computer and supported with the necessary
transcriptions. In this speech synthesizer model, I have used about
1200 keywords.
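One possible layout for such a database is sketched below. The slides do not show the actual schema, so everything here is an assumption for illustration: a single SQLite table, the table name `keyword_audio`, its columns, and the file paths are all hypothetical.

```python
import sqlite3


def build_database(entries):
    """Create an in-memory table mapping each keyword to its recording."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE keyword_audio ("
        "keyword TEXT PRIMARY KEY, "   # the Bangla keyword (transcription)
        "kind TEXT NOT NULL, "         # e.g. vowel, consonant, modifier
        "audio_path TEXT NOT NULL)"    # where the recorded audio file lives
    )
    conn.executemany("INSERT INTO keyword_audio VALUES (?, ?, ?)", entries)
    conn.commit()
    return conn


def find_audio(conn, keyword):
    """Return the audio path for a keyword, or None if it was never recorded."""
    row = conn.execute(
        "SELECT audio_path FROM keyword_audio WHERE keyword = ?", (keyword,)
    ).fetchone()
    return row[0] if row else None


# Hypothetical entries; the real database holds about 1200 keywords.
conn = build_database([
    ("ক", "consonant", "audio/ka.wav"),
    ("কি", "modifier", "audio/ki.wav"),
])
print(find_audio(conn, "কি"))
```

Keying the table on the keyword text keeps the lookup a single indexed query per keyword.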
14. Database Modeling
15. Speech Synthesis Process (diagram; surviving labels:
Normalization, Database)
16. Synthesizer Complexity: Ratio problem in segmenting recorded
voice.
Table 1: Time variation for different keywords before and after
segmentation. L = total audio file length (ms); LA = 2:1 ratio length
for the letter (ms); LO = original length for the letter (ms);
Error (%) = |LA - LO| / L.

Word    L (ms)    LA (ms)    LO (ms)    Error (%)
[…]     1120      746.67     810        5.65
[…]     960       640        710        7.29
[…]     1050      700        670        2.85
[…]     940       626.67     690        6.73

Figure 6: Error rate for different keywords before and after
segmentation.
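The error column can be reproduced from the other columns. The relation LA = (2/3) L is not stated on the slide; it is inferred from the "2:1 ratio" heading and is consistent with every row. A worked example for the first row:

```latex
\[
L_A = \tfrac{2}{3}\,L, \qquad
\text{Error} = \frac{\lvert L_A - L_O \rvert}{L} \times 100\%
\]
\[
L = 1120:\quad
L_A = \tfrac{2}{3}\cdot 1120 \approx 746.67, \qquad
\text{Error} = \frac{\lvert 746.67 - 810 \rvert}{1120} \times 100\% \approx 5.65\%
\]
```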
17. Synthesizer Complexity (cont.): Speaker variation problem.
Table 1: Time variation for different speakers. S = audio file length
for the speaker (ms); SA = 2:1 ratio length for the letter (ms); SO =
original length for the letter (ms); Error (%) = |SA - SO| / S.

Speaker      S (ms)    SA (ms)    SO (ms)    Error (%)
Speaker_1    1150      766.67     820        4.63
Speaker_2    980       653.34     690        3.74
Speaker_3    1070      713.34     780        6.22
Speaker_4    1020      680        750        6.86

Figure 6: Error rate for different speakers for the same word.
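The speaker table can be recomputed with a few lines of code. This is a sketch under one assumption that the slide does not state outright: the estimated letter length SA is two thirds of the file length S, which matches the tabulated SA values. The function name `segmentation_error` is hypothetical.

```python
def segmentation_error(total_ms, original_ms, ratio=2 / 3):
    """Return (estimated letter length, error in percent of the total length)."""
    estimated_ms = total_ms * ratio                      # 2:1 split of the file
    error_pct = abs(estimated_ms - original_ms) / total_ms * 100
    return estimated_ms, error_pct


# (S, SO) pairs copied from the speaker table.
speakers = {
    "Speaker_1": (1150, 820),
    "Speaker_2": (980, 690),
    "Speaker_3": (1070, 780),
    "Speaker_4": (1020, 750),
}

for name, (total, original) in speakers.items():
    estimated, error = segmentation_error(total, original)
    print(f"{name}: SA = {estimated:.2f} ms, error = {error:.2f}%")
```

The printed values can differ from the table in the last digit, because the table's figures look truncated or rounded up by hand rather than rounded to nearest.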
18. Synthesizer Complexity (cont.): Confusing letter problem. The
utterances of some Bangla keywords cannot be distinguished properly,
like […], […], […]. For example, the words […] and […] are uttered
the same way, so the keywords […] and […] have no audible difference.
19. Performance: When a Bangla sentence was given as text input to
our proposed Concatenative Bangla Speech Synthesizer Model, listeners
could not fully identify all of the synthesized words. It was found
that the listeners identified about 85% of the words from the text
correctly.
20. Conclusions: With the concatenative speech synthesis algorithm,
phonetic contexts have to be used to reduce the audio waveform
discontinuities and the phantom mismatches at the borders. Our goal
is to develop a Bangla TTS application that can produce real-time
speech from input text for human-computer interaction. In future we
will try to improve the accuracy of this Bangla speech synthesizer
model.