Progress on Bangla Text-To-Speech System Presented By: Dr. M. Shahidur Rahman Professor, Dept. of Computer Science & Engg. Shahjalal University of Science & Technology [email protected]

Aug 21, 2015


Shuvo Habib

Progress on Bangla Text-To-Speech System

Presented By: Dr. M. Shahidur Rahman

Professor, Dept. of Computer Science & Engg.
Shahjalal University of Science & Technology

[email protected]

Outline

• Introduction to TTS
• How TTS works
• Present Bangla TTS systems
• Problems of the present Bangla TTS
• Directions to improve the performance of Bangla TTS
• Discussion…

What is a TTS?

• The goal of text-to-speech (TTS) synthesis is to convert arbitrary input text into intelligible, natural-sounding speech
  – TTS is not a “cut-and-paste” approach that strings together isolated words
  – Instead, TTS employs linguistic analysis to infer correct pronunciation and prosody (i.e., NLP) and acoustic representations of speech to generate waveforms (i.e., DSP)

TTS Applications

Applications:
• Services for the visually impaired community
• Services for illiterate people and people with difficulties in reading
• Enable use of computers and IT services
  – Reading email aloud
  – Using a word processor
  – Using the Internet

Existing TTS systems: Festival (an open-source research system), Bell Labs TTS

How TTS Works

Different TTS Systems

Phoneme-Based TTS System
• Phonemes are:
  – The minimal distinctive phonetic units
  – Relatively small in number (39 phonemes in English)
• Disadvantage
  – Phonemes ignore the transitional sounds between neighbouring units

Different TTS Systems (cont’d)

Diphone-Based TTS System
• Diphones:
  – Span 2 adjacent phonemes, from the middle of one to the middle of the next
  – Incorporate the transitional sound
  – Produce better sounding speech
  – Ex. কক = ক + কঅ + অক + ক
• Disadvantage
  – Over 1500 diphones in the English language

Text Pre-Processing

• Convert raw text, which may include numbers, abbreviations, etc., into the equivalent of written-out words
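The pre-processing step can be pictured with a small sketch. It is only an illustration under stated assumptions: the abbreviation table, the 0 to 99 number range, and the use of English words are all hypothetical stand-ins, and a Bangla system would need Bangla tables and many more rules.

```python
# Hypothetical abbreviation table; a real system would need a much larger one.
ABBREVIATIONS = {"Dr.": "Doctor", "Dept.": "Department", "Engg.": "Engineering"}

ONES = ["zero", "one", "two", "three", "four", "five", "six", "seven", "eight",
        "nine", "ten", "eleven", "twelve", "thirteen", "fourteen", "fifteen",
        "sixteen", "seventeen", "eighteen", "nineteen"]
TENS = ["", "", "twenty", "thirty", "forty", "fifty", "sixty", "seventy",
        "eighty", "ninety"]


def number_to_words(n):
    """Spell out an integer in the range 0..99."""
    if n < 20:
        return ONES[n]
    tens, ones = divmod(n, 10)
    return TENS[tens] if ones == 0 else f"{TENS[tens]}-{ONES[ones]}"


def normalize(text):
    """Replace known abbreviations and small numbers with written-out words."""
    words = []
    for token in text.split():
        if token in ABBREVIATIONS:
            words.append(ABBREVIATIONS[token])
        elif token.isdecimal() and int(token) < 100:
            words.append(number_to_words(int(token)))
        else:
            words.append(token)  # leave everything else untouched
    return " ".join(words)


print(normalize("Dr. Rahman showed 21 slides"))
```

Tokens outside the two tables pass through unchanged, so the sketch never guesses at an expansion it does not know.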

Word to Diphone Converter (Phonetization)

Purpose
– Translate words to their diphone representations (Ex. রাজা -> Diphones: { র + রআ + আজ + জআ })
– Mark the text into prosodic units such as phrases, clauses and sentences

Resource
– Dictionary of words and their diphones
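A dictionary-driven converter of this kind might look roughly like the sketch below. The lexicon entry, the romanized phoneme symbols, and the "_" silence marker are assumptions made for illustration; the boundary-padded naming (`_-r`, `r-a`, ...) is one common convention and not necessarily the one used in the slides.

```python
# Hypothetical pronunciation lexicon: word -> phoneme sequence (romanized).
LEXICON = {"raja": ["r", "a", "j", "a"]}


def to_diphones(phonemes, boundary="_"):
    """Pair each phoneme with its successor, padding both ends with silence."""
    padded = [boundary, *phonemes, boundary]
    return [f"{a}-{b}" for a, b in zip(padded, padded[1:])]


def word_to_diphones(word):
    """Look the word up in the lexicon and return its diphone names."""
    return to_diphones(LEXICON[word])


print(word_to_diphones("raja"))
```

A sequence of n phonemes yields n + 1 diphone names under this padding, which is why the size of the diphone inventory grows roughly with the square of the phoneme count.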

Prosody

[Figure: block diagram of the prosody stage with the components Diphone Retrieval, Acoustic Manipulation, Concatenation, Diphone Database, and Prosody Parameters]

Properties of Speech

[Figure: waveform of an example recording (cat.wav) with its periodic and non-periodic regions labeled]

Altering Pitch/Duration/Amplitude

• For smooth concatenation, altering pitch, duration and amplitude at the concatenation point is very important.

Altering Pitch

[Figure: a pitch period is extracted from the original diphone and multiplied by a Hanning window]

Extracted pitch period × Hanning window = Hanned pitch period
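The windowing step can be sketched in a few lines. This is an illustration only: it assumes the pitch mark (`center`) and the local pitch period in samples are already known, and it cuts a segment two periods long around the mark, which is a common choice but not something the slides state.

```python
import math


def hann_window(n):
    """Symmetric Hann (Hanning) window of length n, zero at both ends."""
    if n < 2:
        return [1.0] * n
    return [0.5 - 0.5 * math.cos(2 * math.pi * i / (n - 1)) for i in range(n)]


def hanned_pitch_period(signal, center, period):
    """Cut a segment around a pitch mark and taper it with a Hann window.

    `center` is the sample index of the pitch mark and `period` the local
    pitch period in samples; both are assumed to come from a pitch tracker.
    """
    start, stop = center - period, center + period
    if start < 0 or stop > len(signal):
        raise ValueError("segment falls outside the signal")
    segment = signal[start:stop]
    return [s * w for s, w in zip(segment, hann_window(len(segment)))]


frame = hanned_pitch_period([1.0] * 400, center=200, period=80)
print(len(frame))
```

Because the window falls to zero at both edges, frames cut this way can be shifted and summed without audible clicks at their boundaries.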

PSOLA – Pitch Synchronous Overlap and Add

Hanned pitch periods are recombined by 50% Overlap + Add

• Pitch up: overlap > 50%
• Pitch down: overlap < 50%
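The overlap-and-add step can be sketched as follows. It is a simplified illustration that assumes all frames have the same length and a constant spacing (`hop`); real PSOLA places frames at time-varying pitch marks. The helper names are hypothetical.

```python
def hop_for_overlap(frame_len, overlap):
    """Spacing between frame starts for a given overlap fraction (0..1)."""
    return max(1, round(frame_len * (1.0 - overlap)))


def overlap_add(frames, hop):
    """Sum equal-length frames whose starts are `hop` samples apart."""
    frame_len = len(frames[0])
    out = [0.0] * (hop * (len(frames) - 1) + frame_len)
    for i, frame in enumerate(frames):
        for j, sample in enumerate(frame):
            out[i * hop + j] += sample
    return out


frames = [[1.0] * 100 for _ in range(4)]
same_pitch = overlap_add(frames, hop_for_overlap(100, 0.5))    # 50% overlap
higher_pitch = overlap_add(frames, hop_for_overlap(100, 0.6))  # > 50% overlap
print(len(same_pitch), len(higher_pitch))
```

A larger overlap moves the pitch periods closer together, so they repeat more often per second and the perceived pitch rises; a smaller overlap does the opposite.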

Altering Duration

• Increase number of PSOLA iterations (overlaps) to increase duration

• Decrease number of PSOLA iterations (overlaps) to decrease duration
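One way to picture this is as repeating or dropping pitch-period frames before they are overlapped and added. The sketch below assumes the signal has already been split into pitch-synchronous frames; the function name and the nearest-frame selection rule are illustrative choices, not taken from the slides.

```python
def retime_frames(frames, factor):
    """Repeat or drop frames so that their count scales by `factor`.

    factor > 1 lengthens the sound (frames are repeated);
    factor < 1 shortens it (frames are skipped).
    """
    n_out = max(1, round(len(frames) * factor))
    last = len(frames) - 1
    return [frames[min(last, int(i / factor))] for i in range(n_out)]


frames = ["p1", "p2", "p3", "p4"]  # stand-ins for windowed pitch periods
print(retime_frames(frames, 2.0))
print(retime_frames(frames, 0.5))
```

Since whole pitch periods are repeated or removed, the spacing between periods, and therefore the pitch, stays the same while the duration changes.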

Altering Amplitude

• Multiply the signal by a constant
  – If constant > 1, amplitude increases
  – If constant < 1, amplitude decreases
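As a minimal sketch, amplitude scaling is a per-sample multiplication. The clipping to the range [-1, 1] is an added assumption (samples normalized to that range), included only so that a large constant cannot push values outside what a sound device can represent.

```python
def scale_amplitude(signal, gain):
    """Multiply every sample by `gain`, clipping to the assumed [-1, 1] range."""
    return [max(-1.0, min(1.0, sample * gain)) for sample in signal]


print(scale_amplitude([0.25, -0.5], 2.0))  # louder
print(scale_amplitude([0.5], 0.5))         # quieter
```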

Concatenation

Diphones → Word
• Using PSOLA at the joining ends
• Ensures smooth transition

Words → Sentence
• Straight joining at the end points due to the presence of pauses
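The two joining strategies can be contrasted in a short sketch. Note the substitution: a plain linear crossfade stands in for PSOLA at the diphone joins, because it is much shorter to show; it smooths the join but does not align pitch periods the way PSOLA does. The function names and the pause length are hypothetical.

```python
def crossfade_join(left, right, overlap):
    """Join two units by linearly crossfading `overlap` samples (not PSOLA)."""
    if not 0 < overlap <= min(len(left), len(right)):
        raise ValueError("overlap must be between 1 and the shorter unit length")
    mixed = []
    for i in range(overlap):
        w = (i + 1) / (overlap + 1)  # weight of the right-hand unit
        mixed.append(left[len(left) - overlap + i] * (1.0 - w) + right[i] * w)
    return left[:-overlap] + mixed + right[overlap:]


def join_words(words, pause_samples):
    """Join words end to end, inserting silence between them."""
    out = []
    for i, word in enumerate(words):
        if i > 0:
            out.extend([0.0] * pause_samples)
        out.extend(word)
    return out


word = crossfade_join([1.0] * 10, [1.0] * 10, overlap=4)
sentence = join_words([word, word], pause_samples=3)
print(len(word), len(sentence))
```

Inside a word the units overlap, so the result is shorter than the sum of its parts; between words the silence hides any mismatch, so no smoothing is needed.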

Putting All Together

TTS System

[Figure: pipeline of the TTS system: Text Pre-processing → Prosody → Concatenation, with "words" labeled as an intermediate result]

Types of Concatenative speech synthesis

• Concatenative synthesis with a fixed inventory
  – Contains one sample for each unit, and performs prosodic modification to match the required prosody
• Unit-selection-based synthesis
  – Stores several instances of each unit, thus improving the chances of finding a well-matched unit
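The idea behind unit selection can be sketched as picking, among several stored instances of a unit, the one with the lowest combined cost. Everything here is a simplifying assumption: the cost uses pitch only, the field names are hypothetical, and the choice is greedy for a single unit, whereas full systems usually search over the whole utterance.

```python
def select_unit(candidates, target_pitch, prev_end_pitch):
    """Pick the stored instance that best fits the target and the previous unit."""
    def cost(unit):
        target_cost = abs(unit["pitch"] - target_pitch)          # fit to desired prosody
        join_cost = abs(unit["start_pitch"] - prev_end_pitch)    # smoothness of the join
        return target_cost + join_cost
    return min(candidates, key=cost)


# Three hypothetical recordings of the same diphone (pitch values in Hz).
candidates = [
    {"id": "a", "pitch": 100, "start_pitch": 95},
    {"id": "b", "pitch": 120, "start_pitch": 118},
    {"id": "c", "pitch": 140, "start_pitch": 150},
]
best = select_unit(candidates, target_pitch=125, prev_end_pitch=115)
print(best["id"])
```

With a fixed inventory there would be only one candidate, and the mismatch would have to be removed afterwards by prosodic modification instead.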

Progress of Bangla TTS

• KATHA
  – Developed at BRAC University
  – Unit-based system using the Festival framework
  – 4355 diphones
  – Takes 2 sec to generate a 10 sec utterance
• BANGLA VAANI
  – Syllable-based synthesis system
  – Developed in Kolkata
• SUBACHAN
  – Developed at SUST
  – Diphone-based synthesis system
  – 527 diphones
  – Takes 45 ms to generate a 10 sec utterance

Speech Signal From Katha and Subachan

• (Voice of Katha) তিনি প্রধানত কবি হলেও বেশ কিছু প্রবন্ধ-নিবন্ধ রচনা ও প্রকাশ করেছেন ("Although primarily a poet, he also wrote and published quite a few essays and articles")

• (Voice of Subachan) তিনি প্রধানত কবি হলেও বেশ কিছু প্রবন্ধ-নিবন্ধ রচনা ও প্রকাশ করেছেন

• (Voice of Katha) জীবনানন্দ দাশ বিংশ শতাব্দীর অন্যতম প্রধান আধুনিক বাংলা কবি ("Jibanananda Das is one of the foremost modern Bangla poets of the twentieth century")

• (Voice of Subachan) জীবনানন্দ দাশ বিংশ শতাব্দীর অন্যতম প্রধান আধুনিক বাংলা কবি

Problems: Homograph Ambiguity

• Homographs are words that share the same spelling but differ in meaning and pronunciation

Solution: Homograph Disambiguation

• Collect all possible homograph words
• Determine the POS tag of the homograph words
  – Ex. ছেলেরা মাঠে বল (bol) খেলছে। ("The boys are playing ball in the field"; noun)
  – তুমি যাবে কি না বল (bolo)। ("Say whether or not you will go"; verb)
• Bayes' theorem can also be applied to determine the likelihood of a word.
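Both ideas can be sketched briefly. The POS table, the tiny training set, and the English context words are hypothetical stand-ins chosen for readability; the second function is a naive Bayes classifier with add-one smoothing, which is one straightforward way to apply Bayes' theorem here and not necessarily the method the presenter has in mind.

```python
import math
from collections import Counter

# Hypothetical table: (spelling, POS tag) -> pronunciation.
PRONUNCIATION = {("বল", "NOUN"): "bol", ("বল", "VERB"): "bolo"}

# Hypothetical training data: context words seen around each reading.
TRAINING = {
    "bol": [["boys", "field", "play"], ["play", "kick", "field"]],
    "bolo": [["you", "tell", "whether"], ["tell", "me", "you"]],
}


def pronounce_by_pos(word, pos_tag):
    """Look up the reading once a POS tagger has labeled the word."""
    return PRONUNCIATION[(word, pos_tag)]


def most_likely_reading(context, training=TRAINING):
    """Naive Bayes: pick the reading maximizing P(reading) * prod P(word | reading)."""
    vocab = {w for examples in training.values() for ex in examples for w in ex}
    total = sum(len(examples) for examples in training.values())
    best, best_score = None, -math.inf
    for reading, examples in training.items():
        counts = Counter(w for ex in examples for w in ex)
        n_words = sum(counts.values())
        score = math.log(len(examples) / total)  # log prior
        for w in context:
            # add-one smoothing keeps unseen words from zeroing the product
            score += math.log((counts[w] + 1) / (n_words + len(vocab)))
        if score > best_score:
            best, best_score = reading, score
    return best


print(pronounce_by_pos("বল", "NOUN"))
print(most_likely_reading(["you", "tell"]))
```

The POS lookup needs a tagger but no training text for the homograph itself; the Bayes variant needs labeled examples but can fall back on surrounding words when tagging is unreliable.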

Problems: Improper Concatenation

Not concatenated properly

Signal from the utterance of রা�শে�দ

Solution: Improper Concatenation

• PSOLA
• Reducing the number of concatenation points
  – Ex 1. Sentence -> কামাল ভাল ছেলে। ("Kamal is a good boy.")
    Diphones -> কা + আমা + আল, ভা + আলো, ছে + এলে
    instead of ক + কআ + আম + মআ + আল + …
  – Ex 2. ফ��( পৃ*তি+�� -> পৃ* + ইতি+ + ই��
• Vowel sounds are periodic, and thus suitable for smooth concatenation
• Use the 1000 most frequently spoken words
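The idea of cutting only inside vowels can be sketched as follows. The romanized phoneme symbols and the vowel set are assumptions made for illustration; each unit runs from one vowel to the next, so every join falls inside a periodic vowel sound.

```python
VOWELS = {"a", "e", "i", "o", "u"}  # assumed romanized vowel set


def vowel_to_vowel_units(phonemes, vowels=VOWELS):
    """Split a phoneme sequence into units that begin and end at vowels."""
    units, current = [], []
    for p in phonemes:
        current.append(p)
        if p in vowels and len(current) > 1:
            units.append(current)
            current = [p]  # the next unit starts inside the same vowel
    if current and (len(current) > 1 or not units):
        units.append(current)  # trailing consonants, or a one-phoneme word
    return units


units = vowel_to_vowel_units(list("kamal"))
print(["".join(u) for u in units])
print("joins:", len(units) - 1)
```

For the five phonemes of "kamal" this gives three units and two joins, compared with a join at nearly every phoneme when the word is built from individual diphones.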

Duration Modeling


Thank you all!

Suggestions??

Sound Synthesized by Katha

• Katha

Sound Synthesized by Subachan

• Subachan