1 Spoken Arabic Dialect ID Speech & Audio Processing & Recognition Fadi Biadsy March 13, 2008

Spoken Arabic Dialect ID - Columbia Universitydpwe/e6820/proposals/fadi.pdf · 13 Corpora – Levantine Arabic CTS Levantine Fisher Training Data Set 1,2,3 Speech 762 Dialogues 1524

Apr 26, 2020


Spoken Arabic Dialect ID

Speech & Audio Processing & Recognition

Fadi Biadsy
March 13, 2008


Background

Modern Standard Arabic (MSA): standard language throughout the Arab world (Literary Arabic)

A native Language of Nobody

Colloquial Arabic: collective term for all dialects of Arabic


Maghrebi, Egyptian, Sudanese, Levantine, Iraqi, Arabian


Dialect ID

Given a speech segment, as short as possible, identify the speaker's dialect


Why Study Dialect ID

Interesting problem

Phonetic cues?

Prosodic cues? (e.g., intonational contours, phrase accents, durational features...)

*Lexical and syntactic features?


Why Study Dialect ID

ASR fails when an Arabic speaker code-switches to her regional dialect

Identifying dialects prior to recognition enables the ASR to adapt its:

  Pronunciation model
  Acoustic models
  Morphological model
  Language model

Speaker Annotation


Dialect ID – Our Approach

Phonotactic modeling

Hypothesis: every Arabic dialect has its own phonetic distribution

This approach was successfully used in language ID


Dialect ID - TRAIN

First, train an MSA Arabic “phone” recognizer

Now, given K dialects

For dialect i, run the recognizer on its speech to obtain phone sequences:

dh uw z hh ih n d uw w ay ey d y aw ao uh jh y eh k oh aa k v hh aw ao n

f uw v ow z l iy g s m p l k dh n eh g f ey m p l ay ae

dh iy jh sh p eh ae ey d p sh ua r m ey f ay n z

Train an n-gram model λi
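A minimal sketch of this training step, under stated assumptions: the slides do not give the n-gram order or the smoothing method, so the block below uses a bigram model (n = 2) with add-one smoothing purely for illustration. The class name `PhoneBigramModel`, the `<s>`/`</s>` boundary symbols, and the `vocab` argument are hypothetical and not taken from the talk.

```python
import math
from collections import Counter


class PhoneBigramModel:
    """Add-one smoothed bigram model over phone symbols (illustrative only)."""

    def __init__(self, sequences, vocab):
        # Symbols that can be predicted: every phone plus the end marker.
        self.vocab = set(vocab) | {"</s>"}
        self.bigrams = Counter()   # counts of (previous phone, current phone)
        self.contexts = Counter()  # counts of the previous phone alone
        for seq in sequences:
            padded = ["<s>"] + list(seq) + ["</s>"]
            for prev, cur in zip(padded, padded[1:]):
                self.bigrams[(prev, cur)] += 1
                self.contexts[prev] += 1

    def log_prob(self, seq):
        """Natural-log probability of one phone sequence under this model."""
        padded = ["<s>"] + list(seq) + ["</s>"]
        v = len(self.vocab)
        total = 0.0
        for prev, cur in zip(padded, padded[1:]):
            # Add-one smoothing keeps unseen bigrams from scoring log(0).
            num = self.bigrams[(prev, cur)] + 1
            den = self.contexts[prev] + v
            total += math.log(num / den)
        return total
```

One such model would be trained per dialect, each on the phone sequences decoded from that dialect's training speech. With a shared phone inventory passed as `vocab`, the scores of different dialect models on the same sequence are directly comparable.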


Dialect ID - TEST

Given a speech segment S from an unknown dialect, the phone recognizer maps S to a phone sequence PS:

uw hh ih n d uw w ay ey uh jh y eh k oh v hh aw ao n hh aa m
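The scoring step of this slide did not survive extraction. One natural reading, assuming the standard phonotactic recipe from language ID that the talk builds on, is to score the decoded phone sequence under each of the K dialect models and pick the highest-scoring dialect. The symbols \hat{d}, p_t and T are notation assumed here, not taken from the slides:

```latex
\hat{d} \;=\; \operatorname*{arg\,max}_{1 \le i \le K} \; \log P\!\left(P_S \mid \lambda_i\right),
\qquad
\log P\!\left(P_S \mid \lambda_i\right) \;=\; \sum_{t=1}^{T} \log P\!\left(p_t \mid p_{t-n+1}, \dots, p_{t-1};\, \lambda_i\right)
```

Here P_S = p_1 ... p_T is the phone sequence decoded from S, and λi is the n-gram model trained for dialect i.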


Experiment

Train an MSA “phone” recognizer on ~37 hours of speech from TDT4 Broadcast News


Corpora – Levantine

Arabic CTS Levantine Fisher Training Data Set 1, 2, 3 Speech

762 dialogues, 1524 speakers

Each dialogue is 10 minutes: 127 hours of speech

Annotated: LEB=547, JOR=393, PAL=187, SYR=72

Silence based segmentation + remove every segment < 0.5s
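The slides do not say how silence was detected, so the following is only a sketch of one common approach: a frame-level energy threshold, followed by the 0.5 s minimum-duration filter that the slide does state. The function name `segment_on_silence`, the 10 ms frame size, and the energy threshold are all assumptions made for illustration.

```python
def segment_on_silence(samples, sample_rate, frame_ms=10,
                       energy_threshold=1e-4, min_segment_s=0.5):
    """Return (start_s, end_s) spans of speech, dropping spans < min_segment_s.

    samples: sequence of floats (mono audio, roughly in [-1, 1]).
    A frame counts as speech when its mean squared amplitude exceeds
    energy_threshold; consecutive speech frames form one segment.
    """
    frame_len = max(1, int(sample_rate * frame_ms / 1000))
    n_frames = len(samples) // frame_len
    spans = []    # (first_frame, one_past_last_frame)
    start = None  # first frame of the segment currently being built
    for f in range(n_frames):
        frame = samples[f * frame_len:(f + 1) * frame_len]
        energy = sum(x * x for x in frame) / frame_len
        is_speech = energy > energy_threshold
        if is_speech and start is None:
            start = f
        elif not is_speech and start is not None:
            spans.append((start, f))
            start = None
    if start is not None:
        spans.append((start, n_frames))

    frame_s = frame_len / sample_rate
    # Keep only segments at least min_segment_s long (0.5 s on the slide).
    return [(s * frame_s, e * frame_s)
            for s, e in spans
            if (e - s) * frame_s >= min_segment_s]
```

In practice a real voice-activity detector would also smooth over very short pauses; this sketch treats any sub-threshold frame as a segment boundary.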


Corpora – Egyptian

CALLHOME Egyptian Arabic Speech

120 dialogues, 240 speakers

Each dialogue is 30 minutes: 60 hours of speech

Silence based segmentation + remove every segment < 0.5s


Experiment

Egyptian corpus: held out 20/240 speakers

  Run the Arabic phone recognizer on 220 files: ~18.3 million phones

Levantine corpus: held out 757/1524 speakers

  Run the Arabic phone recognizer on 220 files: ~19.4 million phones


Results on the held out Data

Levantine: 98.3% (744/757 were correctly classified as Levantine)

Egyptian: 95% (19/20 were correctly classified as Egyptian)
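As a quick check of the arithmetic, the percentages can be recomputed from the reported counts. The pooled figure over both dialects is not stated in the slides; it is derived here only from the two reported counts, and the names `results` and `accuracy` are hypothetical.

```python
# Correct / total held-out speakers per dialect, as reported on the slide.
results = {"Levantine": (744, 757), "Egyptian": (19, 20)}


def accuracy(correct, total):
    """Accuracy in percent."""
    return 100 * correct / total


per_dialect = {name: accuracy(c, t) for name, (c, t) in results.items()}

# Pooled accuracy over all held-out speakers (derived, not from the slides).
pooled = accuracy(sum(c for c, _ in results.values()),
                  sum(t for _, t in results.values()))
```

Because the Egyptian held-out set has only 20 speakers, a single error moves its accuracy by 5 percentage points, so the two per-dialect figures are not equally precise.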


Results on a different corpus

Babylon Levantine corpus

Microphone recordings

164 speakers

~60 hours of speech

Accuracy: 96.3% of speakers


TODO

Test on a different corpus for Egyptian

Try to identify “sub” dialects (from the same corpus)

Identify Gulf and Iraqi Arabic

Incorporate English phone recognizer


Important issue (TODO)

We use all the speech of a speaker

  avg: ~5 minutes for Lev.
  avg: ~15 minutes for Egy.

Will this approach work if we use less than 30s of speech?


Thank you!