Concatenative Bangla Speech Synthesizer Model

1. Concatenative Bangla Speech Synthesizer Model
Author's Name: Md. Abdullah-al-mamun

2. Outline
- What is Speech Synthesis?
- Concatenative Synthesis
- Concatenative Synthesizer Model
- Bangla Keyword Set
- Classification of Keywords
- Independent Keyword
- Dependent Keyword
- Database Modeling
- Speech Synthesis Process
- Synthesizer Complexity
- Performance
- Conclusions
- References

3. Abstract
Speech is the primary means by which people exchange information. In the modern era of information technology, speech technology lets us expect to carry out spoken dialogue with computers. Text-to-speech synthesis creates a synthetic voice from text, integrating language and speech for human-computer interaction.

4. What is Speech Synthesis?
- Speech synthesis is the artificial production of human speech.
- A speech synthesizer is a device that translates text characters into sounds that approximate the sound of human speech.
- A speech synthesizer is also known as a Text-to-Speech (TTS) system.

5. Synthesizer Technology
- HMM-based synthesis
- Formant synthesis
- Concatenation synthesis
- Diphone synthesis
- Sinewave synthesis
- And so on...

6. Concatenative Synthesis
- Concatenative synthesis is based on the concatenation of segments of recorded speech.
- A concatenative synthesizer is built by concatenating a number of recorded voice segments that are stored in a database as audio files.
- To produce the synthesized speech, a keyword is taken as input and searched for in the database, and the matching recording is returned as the speech output.
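
The lookup-and-concatenate idea described on this slide can be sketched roughly as follows. This is only an illustration, not the author's implementation: the in-memory `DATABASE`, the Latin keyword names, the helper names, and the 16 kHz 16-bit mono format are all assumptions made for the example.

```python
import io
import struct
import wave


def make_frames(n_samples, value):
    """Stand-in for a recorded keyword: n_samples 16-bit mono PCM samples."""
    return struct.pack("<" + "h" * n_samples, *([value] * n_samples))


# Hypothetical database: keyword -> raw PCM frames of its recording.
DATABASE = {
    "ka": make_frames(4, 100),
    "la": make_frames(6, -100),
}


def synthesize(keywords, database):
    """Look up every keyword and join the recordings end to end."""
    missing = [k for k in keywords if k not in database]
    if missing:
        raise KeyError(f"no recording for keywords: {missing}")
    return b"".join(database[k] for k in keywords)


def to_wav_bytes(frames, sample_rate=16000):
    """Wrap raw PCM frames in a WAV container (assumed 16-bit mono)."""
    buffer = io.BytesIO()
    with wave.open(buffer, "wb") as wav:
        wav.setnchannels(1)
        wav.setsampwidth(2)
        wav.setframerate(sample_rate)
        wav.writeframes(frames)
    return buffer.getvalue()


speech = synthesize(["ka", "la"], DATABASE)
wav_bytes = to_wav_bytes(speech)
```

Plain joining like this does nothing to smooth the boundaries between segments, which is the discontinuity issue the Conclusions slide mentions.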

7. Concatenative Synthesizer Model
[Figure 1: Functional scheme of a speech synthesizer system. Blocks: Input Text, Keyword Generator, Speech Synthesizer, Databases. Signal labels: X, Y, W*.]

8. Keyword
A keyword is a unit of organization for a sequence of speech sounds. For example, the word [...] consists of two keywords: one is [...] and the other is [...].
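
The slides do not spell out how a word is split into keywords, so the following is only one plausible sketch. It assumes each keyword starts at an independent vowel or a consonant and absorbs any following kar, other signs, and hasanta-joined consonants. The function names and the Unicode ranges used (Bengali block: independent vowels U+0985 to U+0994, consonants U+0995 to U+09B9, hasanta U+09CD) are choices made for this example.

```python
HASANTA = "\u09CD"  # joins consonants into a conjunct


def is_base_letter(ch):
    """True for Bangla independent vowels and consonants."""
    return (
        "\u0985" <= ch <= "\u0994"          # independent vowels
        or "\u0995" <= ch <= "\u09B9"       # consonants
        or ch in "\u09CE\u09DC\u09DD\u09DF"  # khanda ta, rra, rha, yya
    )


def split_keywords(word):
    """Group each base letter with the signs and conjunct parts after it."""
    keywords = []
    for ch in word:
        joined = bool(keywords) and keywords[-1].endswith(HASANTA)
        if not keywords or (is_base_letter(ch) and not joined):
            keywords.append(ch)   # start a new keyword
        else:
            keywords[-1] += ch    # kar, sign, or conjunct consonant
    return keywords


# "ami" (U+0986 U+09AE U+09BF) splits into two keywords.
parts = split_keywords("\u0986\u09AE\u09BF")
```

Under this rule a conjunct such as the one in "shakti" stays in a single keyword together with its kar, which matches the compound and complex categories defined on the later slides.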

9. Bangla Keyword Set
The Bangla keywords were collected from the following Bangla literary books:
1) Riktar Badon, written by Kazi Nazrul Islam;
2) Durgashnandini, written by Bumkimchandro Chittopadhai;
3) Shas Prasno, written by Shratchandro Chittopadhai;
4) Magnabod Kabbo, written by Maikal Modhosudon Dotto.

10. Classification of Keywords
Bangla Word
  Independent
    Vowel
    Consonant
  Dependent
    Modifier Character
    Compound Character

11. Independent Keyword
A keyword that is constructed from only one letter.
a) Vowel: a speech sound that is produced by a comparatively open configuration of the vocal tract, like [...], and so on.
b) Consonant: a basic speech sound in which the breath is at least partly obstructed and which is combined with a vowel to form a syllable, like [...], and so on.

12. Dependent Keyword
A keyword that is constructed from one or more consonants combined with a kar (the shortened, dependent form of a vowel) or a fola.
a) Modifier Character: a keyword that is constructed from one consonant with a kar, for example [...], and so on.
b) Compound Character: a keyword that is the combination of two or more consonants, for example [...], and so on.
c) Complex Character: a keyword that is the combination of both a modifier character and a compound character, for example [...], and so on.
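
The keyword classes defined on the last two slides can be approximated with a small classifier. This is a sketch under assumptions, not the author's method: it treats a kar as any Bengali vowel sign (U+09BE to U+09CC) and detects a compound by the presence of the hasanta (U+09CD), which is how conjuncts and fola forms are encoded in Unicode. The function and label names are invented for the example.

```python
HASANTA = "\u09CD"


def is_vowel(ch):
    return "\u0985" <= ch <= "\u0994"


def is_consonant(ch):
    return "\u0995" <= ch <= "\u09B9" or ch in "\u09CE\u09DC\u09DD\u09DF"


def is_kar(ch):
    return "\u09BE" <= ch <= "\u09CC"


def classify_keyword(keyword):
    """Return the slide's category name for one keyword."""
    if len(keyword) == 1:
        if is_vowel(keyword):
            return "vowel"        # independent keyword
        if is_consonant(keyword):
            return "consonant"    # independent keyword
        return "unknown"
    has_kar = any(is_kar(c) for c in keyword)
    has_conjunct = HASANTA in keyword
    if has_kar and has_conjunct:
        return "complex"          # compound plus kar
    if has_conjunct:
        return "compound"         # two or more consonants
    if has_kar:
        return "modifier"         # one consonant plus kar
    return "unknown"


label = classify_keyword("\u0995\u09BE")  # ka + aa-kar
```

A category label like this could be stored next to each recording, so that the database can be filled and checked one class at a time.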

13. Speech Database
- A speech database is a collection of recorded speech accessible on a computer and supported with the necessary transcriptions.
- In this speech synthesizer model, I have used about 1200 keywords.
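
One way such a keyword database might be laid out is sketched below with SQLite. The schema is not given in the slides, so the table name, the column names, and the file paths are purely hypothetical.

```python
import sqlite3


def build_database(entries):
    """Create an in-memory table mapping each keyword to its recording."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE keyword_audio ("
        " keyword TEXT PRIMARY KEY,"
        " category TEXT NOT NULL,"
        " audio_path TEXT NOT NULL)"
    )
    conn.executemany("INSERT INTO keyword_audio VALUES (?, ?, ?)", entries)
    conn.commit()
    return conn


def lookup(conn, keyword):
    """Return the audio path for a keyword, or None if it is not recorded."""
    row = conn.execute(
        "SELECT audio_path FROM keyword_audio WHERE keyword = ?", (keyword,)
    ).fetchone()
    return row[0] if row else None


# Hypothetical entries: (keyword, category, path to the recorded file).
conn = build_database([
    ("\u0986", "vowel", "audio/vowel_aa.wav"),
    ("\u0995\u09BE", "modifier", "audio/ka_aa.wav"),
])
path = lookup(conn, "\u0995\u09BE")
```

Using the keyword itself as the primary key makes the lookup a single indexed query, which matters little at about 1200 entries but keeps duplicates out.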

14. Database Modeling

15. Speech Synthesis Process
[Figure: speech synthesis process. Recoverable block labels: Normalization, Database.]

16. Synthesizer Complexity
Ratio problem in segmenting recorded voice

Table 1: Time variation for different keywords before and after segmentation
L = total audio file length; L_A = 2:1 ratio length for the letter; L_O = original length for the letter; $\text{Error}(\%) = \frac{|L_A - L_O|}{L} \times 100$

Word    L (ms)    L_A (ms)    L_O (ms)    Error (%)
[...]   1120      746.67      810         5.65
[...]   960       640         710         7.29
[...]   1050      700         670         2.85
[...]   940       626.67      690         6.73

Figure 6: Error rate for different keywords before and after segmentation.
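
The error figures in the table above can be reproduced with a few lines of arithmetic. The sketch assumes that the "2:1 ratio length" means the letter is given two thirds of the total audio length, which is what the tabulated values imply (for example 1120 x 2/3 = 746.67); the function names are made up for the example.

```python
def ratio_length(total_ms):
    """Estimated letter length under the assumed 2:1 split (two thirds)."""
    return total_ms * 2 / 3


def segmentation_error_percent(total_ms, original_ms):
    """|estimate - original| relative to the total length, in percent."""
    return abs(ratio_length(total_ms) - original_ms) / total_ms * 100


# (total length, original letter length) pairs taken from the table.
rows = [(1120, 810), (960, 710), (1050, 670), (940, 690)]
for total, original in rows:
    estimate = round(ratio_length(total), 2)
    error = round(segmentation_error_percent(total, original), 2)
    print(total, estimate, original, error)
```

The recomputed percentages agree with the table to within 0.01; the last digit can differ because the table appears to truncate rather than round.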

17. Synthesizer Complexity (Cont.)
Speaker variation problem

Table 1: Time variation for different speakers
S = audio file length for the speaker; S_A = 2:1 ratio length for the letter; S_O = original length for the letter; $\text{Error}(\%) = \frac{|S_A - S_O|}{S} \times 100$

Speaker      S (ms)    S_A (ms)    S_O (ms)    Error (%)
Speaker_1    1150      766.67      820         4.63
Speaker_2    980       653.34      690         3.74
Speaker_3    1070      713.34      780         6.22
Speaker_4    1020      680         750         6.86

Figure 6: Error rate for different speakers for the same word.

18. Synthesizer Complexity (Cont.)
Confusing letter problem
Some Bangla keyword utterances cannot be distinguished properly, such as [...], [...], and [...]. For example, the words [...] and [...] are uttered in the same way, so the keywords [...] and [...] show no difference.

19. Performance
When a Bangla sentence was given as input text to the proposed Concatenative Bangla Speech Synthesizer Model, listeners could not fully identify all of the synthesized words. It was found that the listeners identified about 85% of the words correctly.
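
A word-identification rate of this kind could be computed along the following lines. The slides do not describe the scoring procedure, so this sketch assumes a simple position-by-position comparison between the input sentence and what a listener wrote down; the function name and the sample sentences are invented for illustration.

```python
def word_accuracy(reference, heard):
    """Percentage of reference words the listener reported in the same position."""
    ref_words = reference.split()
    heard_words = heard.split()
    if not ref_words:
        return 0.0
    correct = sum(1 for r, h in zip(ref_words, heard_words) if r == h)
    return correct / len(ref_words) * 100


# Invented example: the listener misses one word out of four.
score = word_accuracy("ami bangla bhasha bhalobashi", "ami bangla asha bhalobashi")
```

Position-wise matching penalizes a listener who drops or inserts a word early in the sentence; an alignment-based score would be more forgiving, at the cost of more code.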

20. Conclusions
- When using the concatenative speech synthesis algorithm, phonetic contexts have to be handled so as to reduce the audio waveform discontinuities and the phantom mismatches at the borders.
- Our goal is to develop a Bangla TTS application that can produce real-time speech from the input text for human-computer interaction.
- So, in future we will try to improve the accuracy of this Bangla speech synthesizer model.

21. Thank You
