Speech Synthesis for Linguists:
An Introduction to MBROLA

Dafydd Gibbon
Universität Bielefeld, May 2007

U Bielefeld, May 2007 Dafydd Gibbon: Speech Synthesis for Linguists 2

OVERVIEW

Speech Synthesis

● Definition:
  – Speech synthesis is communication using software that implements an artificial voice.
● Types:
  – Text-To-Speech synthesis (TTS)
  – Concept-To-Speech synthesis (CTS)
  – Close Copy Speech synthesis (CCS)
● Inverse:
  – Automatic Speech Recognition (ASR)

Speech Synthesis

● Uses:
  – Reading software for the blind
  – Computer output in visually difficult situations
  – Readback in dictation software
  – Linguistic research and language teaching
● Illustration:
  – Microsoft Sam
  – ...
● Background information:
  – Speech Synthesis - Wikipedia
  – The MBROLA project

SPEECH COMMUNICATION

Natural speech cycle

[Figure: articulatory organs → acoustic channel → auditory organs; speech signal (waveform, time in s) for "A tiger and a mouse were walking in a field..."]

Artificial speech cycle

[Figure: speech technology software: speech synthesis → acoustic channel → automatic speech recognition (ASR); speech signal (waveform, time in s) for "It would be a considerable invention..."]

SPEECH PROCESSING

Natural speech processing

Artificial speech processing

SPEECH SYNTHESISER (diagram, from input to output):
  – Information input (e.g. text)
  – LINGUISTIC BACK END: Natural Language Processing (NLP) component
  – Representation of sounds: NLP-DSP interface
  – SYNTHESIS FRONT END: Digital Signal Processing (DSP) component
  – Acoustic output

Text-To-Speech NLP Component

● Preprocessing
  – Abbreviations
  – Numbers
● Lexical analysis, tokenisation
  – Identification of words (word boundaries, lexicon)
● Parsing
  – Identification of parts of speech
  – Identification of phrases (grammar)
  – Identification of stress, focus and emphasis positions
● Phonetisation
  – Grapheme-phoneme conversion
  – Prosodic analysis
    ● Pitch assignment (accentuation, intonation)
    ● Duration assignment (tempo, phrasing, rhythm)
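The preprocessing and tokenisation steps listed above can be illustrated with a toy Python sketch. The abbreviation table and the digit-by-digit reading of numbers are hypothetical simplifications chosen for brevity; real TTS preprocessors also handle dates, ordinals, currency and much larger abbreviation lists.

```python
import re

# Hypothetical abbreviation table; a real preprocessor needs a far larger one.
ABBREVIATIONS = {"Dr.": "Doctor", "e.g.": "for example", "etc.": "et cetera"}

# Toy number expansion: digits are read out one by one.
DIGITS = ["zero", "one", "two", "three", "four",
          "five", "six", "seven", "eight", "nine"]


def expand_number(match: re.Match) -> str:
    """Spell out a digit string digit by digit, e.g. '42' -> 'four two'."""
    return " ".join(DIGITS[int(d)] for d in match.group(0))


def preprocess(text: str) -> str:
    """Replace abbreviations and numbers with ordinary words."""
    for abbreviation, expansion in ABBREVIATIONS.items():
        text = text.replace(abbreviation, expansion)
    return re.sub(r"\d+", expand_number, text)


def tokenise(text: str) -> list[str]:
    """Identify word tokens: lower-case letter sequences, punctuation dropped."""
    return re.findall(r"[a-z']+", text.lower())
```

For example, `tokenise(preprocess("Dr. Euler wrote this in 1761."))` yields `['doctor', 'euler', 'wrote', 'this', 'in', 'one', 'seven', 'six', 'one']`; a real system would read 1761 as a year, which is exactly the kind of decision the preprocessing stage has to make.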

TTS NLP Components

Diagram, from TEXT INPUT to the NLP-DSP INTERFACE TO SYNTHESIS ENGINE:
  – PREPROCESSOR
  – LEXICAL ANALYSER / TOKENISER
  – PARSER
  – GRAPHEME-PHONEME CONVERTER (PHONETISER)
  – PROSODY COMPONENT: LOUDNESS, PITCH, DURATION
  – LEXICON
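The chain of NLP components can be sketched as a list of functions, each consuming the previous one's output. This is a minimal illustration under stated assumptions: the two-word `LEXICON` is hypothetical, `str.lower` and `str.split` stand in for the preprocessor and tokeniser, the parser is omitted, and the prosody component simply assigns a constant duration and pitch to every phoneme.

```python
# Hypothetical mini-lexicon: words mapped to SAMPA phoneme strings.
LEXICON = {
    "invention": "I n v e n S @ n",
    "indeed": "I n d i: d",
}


def phonetise(words):
    """Grapheme-phoneme conversion by lexicon lookup only (no letter-to-sound rules)."""
    phonemes = []
    for word in words:
        if word not in LEXICON:
            raise KeyError(f"no pronunciation for {word!r}")
        phonemes.extend(LEXICON[word].split())
    return phonemes


def assign_prosody(phonemes, duration_ms=80, pitch_hz=120):
    """Placeholder prosody component: the same duration and pitch for every phoneme."""
    return [(p, duration_ms, pitch_hz) for p in phonemes]


# Stages applied in order; str.lower and str.split stand in for the
# preprocessor and the tokeniser.
STAGES = [str.lower, str.split, phonetise, assign_prosody]


def run_pipeline(text, stages=STAGES):
    """Feed the output of each stage into the next one."""
    data = text
    for stage in stages:
        data = stage(data)
    return data
```

`run_pipeline("Invention indeed")` returns one (phoneme, duration, pitch) triple per phoneme, which is the kind of sequence handed over at the NLP-DSP interface. Keeping each component a separate function means any one of them can be replaced by a serious implementation without touching the others.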

NLP-DSP INTERFACE

PHONEME | LOUDNESS | PITCH | DURATION
(one such record for each phoneme in the sequence)

SIMPLIFIED NLP-DSP INTERFACE

PHONEME | PITCH | DURATION
(one such record for each phoneme in the sequence)
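The full and simplified interfaces can be written down as record types. In this sketch the field names and the loudness unit are assumptions; the point is only that the simplified interface is the full one with the loudness field dropped. A single pitch value per phoneme is itself a simplification: MBROLA's .pho format allows several pitch points within one phoneme.

```python
from dataclasses import dataclass


@dataclass
class PhonemeSpec:
    """One record of the full NLP-DSP interface."""
    phoneme: str        # SAMPA symbol
    loudness: float     # assumed unit: relative amplitude, 1.0 = normal
    pitch_hz: float     # one pitch value per phoneme (a simplification)
    duration_ms: float


@dataclass
class SimplePhonemeSpec:
    """One record of the simplified interface: loudness is left out."""
    phoneme: str
    pitch_hz: float
    duration_ms: float


def simplify(specs: list[PhonemeSpec]) -> list[SimplePhonemeSpec]:
    """Project full interface records onto the simplified interface."""
    return [SimplePhonemeSpec(s.phoneme, s.pitch_hz, s.duration_ms) for s in specs]
```

Dropping loudness in this way fits MBROLA, whose voices use normalised intensity.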

MBROLA

General Information about MBROLA

● The MBROLA Project
● DSP front end only
  – you have to find or make the NLP back end:
    ● TTS, CCS, ...
● Voice:
  – diphone database
  – MBROLA timing and pitch normalisation algorithm
  – normalised intensity
● Development
  – there are many MBROLA voices for many languages
  – anyone can make an MBROLA voice:
    ● recording, annotation, diphone splitting, conversion
    ● conversion by the MBROLA team
    ● the voice is then in the public domain
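Since an MBROLA voice is a diphone database, an utterance can only be synthesised if the voice contains every phoneme-to-phoneme transition it needs. The following sketch lists those transitions and checks them against a voice inventory. The inventory is a hypothetical set of pairs, since the on-disk database format is not described here; `_` is used for silence, as in MBROLA voices.

```python
def diphones(phonemes, silence="_"):
    """Return the diphones (pairs of adjacent phonemes) an utterance needs,
    with silence added at both ends."""
    padded = [silence, *phonemes, silence]
    return list(zip(padded, padded[1:]))


def missing_diphones(phonemes, inventory):
    """Diphones the utterance needs that the (hypothetical) inventory lacks."""
    return sorted(set(diphones(phonemes)) - set(inventory))
```

For the SAMPA sequence of "invention", `I n v e n S @ n`, this gives nine diphones, from `("_", "I")` to `("n", "_")`. Checking coverage this way is one reason voice recordings are designed so that every transition of the language occurs at least once.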

Example

euler.wav

Example

“It would be a considerable invention indeed, that of a machine able to mimic our speech, with its sounds and articulations. I think it is not impossible.”

Leonhard Euler, 1761

euler.txt

euler.wav


Phoneme   Duration   Pitch place/value pairs
(SAMPA)   (ms)       (% of phoneme duration, Hz)
I         60         75 109
n         80
v         40
e         90         0 109  50 153  75 177
n         30         25 195
S         80
@         70
n         30
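Each line of a .pho file gives a SAMPA phoneme, its duration in milliseconds, and optionally pairs of (position in % of the phoneme's duration, pitch in Hz). A minimal Python sketch for reading and writing such lines follows; it assumes that lines beginning with `;` are comments and otherwise handles only the three kinds of field shown in the table.

```python
from dataclasses import dataclass, field

# The .pho lines from the table (the word "invention" in SAMPA).
EULER_PHO = """\
I 60 75 109
n 80
v 40
e 90 0 109 50 153 75 177
n 30 25 195
S 80
@ 70
n 30
"""


@dataclass
class PhoLine:
    phoneme: str
    duration_ms: float
    # (position in % of the phoneme's duration, pitch in Hz)
    pitch_points: list[tuple[float, float]] = field(default_factory=list)


def parse_pho(text):
    """Parse .pho text into PhoLine records."""
    result = []
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith(";"):  # skip blank lines and comments
            continue
        phoneme, duration, *rest = line.split()
        if len(rest) % 2:
            raise ValueError(f"unpaired pitch value in: {line!r}")
        values = [float(v) for v in rest]
        points = list(zip(values[0::2], values[1::2]))
        result.append(PhoLine(phoneme, float(duration), points))
    return result


def format_pho(lines):
    """Write PhoLine records back out as .pho text."""
    out = []
    for pl in lines:
        fields = [pl.phoneme, f"{pl.duration_ms:g}"]
        for position, hz in pl.pitch_points:
            fields += [f"{position:g}", f"{hz:g}"]
        out.append(" ".join(fields))
    return "\n".join(out) + "\n"
```

Parsing `EULER_PHO` gives eight records with a total duration of 480 ms, and `format_pho` writes the same text back. Phonemes without pitch points take their pitch from MBROLA's interpolation between neighbouring points.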


Files: euler.txt (the text), euler.pho (phonemes, durations and pitch), euler.wav (the MBROLA output)


Output

  – euler.pho, euler.wav
  – euler_short.pho, euler_short.TextGrid, euler_short.wav

[Figure: Praat display of the output: waveform (amplitude), spectrogram and pitch track (frequency & intensity) against time]
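The slides do not say how euler_short.TextGrid was produced. One hypothetical way is to derive a phoneme tier directly from the .pho durations, which MBROLA realises in its output, and write it in Praat's long TextGrid text format, as in this sketch.

```python
def pho_to_textgrid(segments, tier_name="phonemes"):
    """Build a Praat TextGrid (long text format) with one interval tier.

    segments: (phoneme, duration_ms) pairs, e.g. the first two fields of
    each .pho line, in temporal order starting at time 0.
    """
    intervals = []
    start_ms = 0  # sum in ms, convert to s only for output
    for label, duration_ms in segments:
        intervals.append((start_ms / 1000, (start_ms + duration_ms) / 1000, label))
        start_ms += duration_ms
    total_s = start_ms / 1000

    lines = [
        'File type = "ooTextFile"',
        'Object class = "TextGrid"',
        "",
        "xmin = 0",
        f"xmax = {total_s}",
        "tiers? <exists>",
        "size = 1",
        "item []:",
        "    item [1]:",
        '        class = "IntervalTier"',
        f'        name = "{tier_name}"',
        "        xmin = 0",
        f"        xmax = {total_s}",
        f"        intervals: size = {len(intervals)}",
    ]
    for i, (xmin, xmax, label) in enumerate(intervals, start=1):
        text = label.replace('"', '""')  # Praat escapes quotes by doubling them
        lines += [
            f"        intervals [{i}]:",
            f"            xmin = {xmin}",
            f"            xmax = {xmax}",
            f'            text = "{text}"',
        ]
    return "\n".join(lines) + "\n"
```

Written to a file with the extension .TextGrid, the result can be opened in Praat together with the synthesised .wav file to inspect the alignment of phonemes, waveform and pitch track.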

PRACTICAL WORK WITH MBROLA

Starting up with MBROLA

● Go to the MBROLA website
  – http://tcts.fpms.ac.be/synthesis/mbrola.html
● Download
  – the MBROLA binary file for your operating system
  – an MBROLA voice
● Install the MBROLA binary and the voice
  – follow the instructions
● Find a .pho file
  – double-click it to open the MBROLI user interface
  – go ahead...
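Besides the MBROLI interface, MBROLA can be run from the command line in the form `mbrola <voice database> <input .pho> <output .wav>`. The sketch below wraps that call in Python; the voice path in the usage comment is a hypothetical example and depends on where the voice was installed.

```python
import shutil
import subprocess


def mbrola_command(voice, pho_file, wav_file):
    """Argument list for: mbrola <voice database> <input .pho> <output .wav>."""
    return ["mbrola", str(voice), str(pho_file), str(wav_file)]


def synthesise(voice, pho_file, wav_file):
    """Run MBROLA on a .pho file; raises if the binary is missing or fails."""
    if shutil.which("mbrola") is None:
        raise FileNotFoundError("mbrola binary not found on PATH")
    subprocess.run(mbrola_command(voice, pho_file, wav_file), check=True)


# Hypothetical usage; the voice path depends on the local installation:
# synthesise("voices/en1/en1", "euler.pho", "euler.wav")
```

Scripting the call like this is convenient when many .pho variants (for example with different pitch contours) are to be synthesised for an experiment.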

MAKING A VOICE: FIRST STEPS

Recorded speech data

[Figure: waveform (amplitude), spectrogram and pitch track (frequency & intensity) against time for the recording "A tiger and a mouse were walking in a field..."]

Annotated speech data

[Figure: oscillogram, pitch track and orthography tier for "A tiger and a mouse were walking in a field..."]

Creating diphones

[Figure: oscillogram, pitch track, orthography tier, phoneme tier and diphone tier for "A tiger and a mouse were walking in a field..."]
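A diphone runs from inside one phoneme to inside the next, so a diphone tier can be derived from the phoneme tier. The sketch below assumes the simplest convention of cutting exactly at phoneme midpoints, with contiguous (label, start, end) intervals in seconds; the actual cutting conventions for a voice are those given in the MBROLA voice-creation instructions.

```python
def diphone_intervals(phones):
    """Derive a diphone tier from a phoneme tier.

    phones: contiguous (label, start_s, end_s) intervals in temporal order.
    Each diphone runs from the midpoint of one phoneme to the midpoint of
    the next (an assumed, simplified cutting convention).
    """
    diphones = []
    for (left, s1, e1), (right, s2, e2) in zip(phones, phones[1:]):
        diphones.append((f"{left}-{right}", (s1 + e1) / 2, (s2 + e2) / 2))
    return diphones
```

For a phoneme tier `_` (0 to 0.5 s), `@` (0.5 to 1.0 s), `t` (1.0 to 1.25 s), this yields `_-@` from 0.25 to 0.75 s and `@-t` from 0.75 to 1.125 s. Cutting inside phonemes rather than at their boundaries keeps the difficult transitions intact inside each diphone.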

Creating the diphone database

● Instructions for creating voices are to be found on the MBROLA project page:
  – cut diphones out of the annotation file(s)
  – organise them as a diphone database according to the instructions
  – submit the database to the MBROLA team
● The voice (normalised diphone database) will be returned to you.
● Test the voice in the manner illustrated in these slides.
● Create an NLP back end
  – CCS
  – or TTS, which is an entirely different story...
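Close Copy Speech synthesis copies durations and pitch from an annotated natural recording into a .pho file, so that the MBROLA voice re-synthesises the original prosody. A minimal sketch, assuming the phoneme tier is available as (label, start, end) intervals in seconds and the pitch track as (time, Hz) samples, for example exported from Praat:

```python
def close_copy(phones, pitch_track):
    """Build .pho text that copies natural durations and pitch.

    phones: (label, start_s, end_s) intervals from the phoneme tier.
    pitch_track: (time_s, hz) samples; unvoiced stretches simply have none.
    """
    lines = []
    for label, start, end in phones:
        duration = end - start
        fields = [label, f"{duration * 1000:.0f}"]  # duration in ms
        for time, hz in pitch_track:
            if start <= time < end:
                position = 100 * (time - start) / duration  # % of phoneme duration
                fields += [f"{position:.0f}", f"{hz:.0f}"]
        lines.append(" ".join(fields))
    return "\n".join(lines) + "\n"
```

With phonemes `I` (0 to 0.5 s) and `n` (0.5 to 1.0 s) and pitch samples at 0.25 s (110 Hz) and 0.75 s (150 Hz), the result is `I 500 50 110` and `n 500 50 150`. A real pitch track sampled every few milliseconds yields many points per phoneme; thinning them out may give a more readable .pho file.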

And then have your picture put on the MBROLA project site...