Proceedings of the EACL 2009 Workshop on Computational
Approaches to Semitic Languages, pages 53–61,Athens, Greece, 31
March, 2009. c©2009 Association for Computational Linguistics
Spoken Arabic Dialect Identification Using Phonotactic
Fadi Biadsy and Julia HirschbergDepartment of Computer
Columbia University, New York,
Nizar HabashCenter for Computational Learning Systems
Columbia University, New York, USAhabash@ccls.columbia.edu
AbstractThe Arabic language is a collection ofmultiple variants,
among which ModernStandard Arabic (MSA) has a special sta-tus as
the formal written standard languageof the media, culture and
education acrossthe Arab world. The other variants are in-formal
spoken dialects that are the mediaof communication for daily life.
Arabic di-alects differ substantially from MSA andeach other in
terms of phonology, mor-phology, lexical choice and syntax. In
thispaper, we describe a system that automat-ically identifies the
Arabic dialect (Gulf,Iraqi, Levantine, Egyptian and MSA) of
aspeaker given a sample of his/her speech.The phonotactic approach
we use provesto be effective in identifying these di-alects with
considerable overall accuracy— 81.60% using 30s test
For the past three decades, there has been a greatdeal of work
on the automatic identification (ID)of languages from the speech
signal alone. Re-cently, accent and dialect identification have
be-gun to receive attention from the speech scienceand technology
communities. The task of dialectidentification is the recognition
of a speaker’s re-gional dialect, within a predetermined
language,given a sample of his/her speech. The
dialect-identification problem has been viewed as morechallenging
than that of language ID due to thegreater similarity between
dialects of the same lan-guage. Our goal in this paper is to
analyze the ef-fectiveness of a phonotactic approach, i.e.
makinguse primarily of the rules that govern phonemesand their
sequences in a language — a techniqueswhich has often been employed
by the languageID community — for the identification of
The Arabic language has multiple variants, in-cluding Modern
Standard Arabic (MSA), the for-
mal written standard language of the media, cul-ture and
education, and the informal spoken di-alects that are the preferred
method of communi-cation in daily life. While there are
commerciallyavailable Automatic Speech Recognition (ASR)systems for
recognizing MSA with low error rates(typically trained on Broadcast
News), these rec-ognizers fail when a native Arabic speaker
speaksin his/her regional dialect. Even in news broad-casts,
speakers often code switch between MSAand dialect, especially in
conversational speech,such as that found in interviews and talk
shows.Being able to identify dialect vs. MSA as well as toidentify
which dialect is spoken during the recog-nition process will enable
ASR engines to adapttheir acoustic, pronunciation, morphological,
andlanguage models appropriately and thus improverecognition
Identifying the regional dialect of a speaker willalso provide
important benefits for speech tech-nology beyond improving speech
recognition. Itwill allow us to infer the speaker’s regional
originand ethnicity and to adapt features used in
speakeridentification to regional original. It should alsoprove
useful in adapting the output of text-to-speech synthesis to
produce regional speech aswell as MSA – important for spoken
dialogue sys-tems’ development.
In Section 2, we describe related work. In Sec-tion 3, we
discuss some linguistic aspects of Ara-bic dialects which are
important to dialect iden-tification. In Section 4, we describe the
Arabicdialect corpora employed in our experiments. InSection 5, we
explain our approach to the identifi-cation of Arabic dialects. We
present our experi-mental results in Section 6. Finally, we
concludein Section 7 and identify directions for future
2 Related Work
A variety of cues by which humans and machinesdistinguish one
language from another have beenexplored in previous research on
fication. Examples of such cues include phone in-ventory and
phonotactics, prosody, lexicon, mor-phology, and syntax. Some of
the most suc-cessful approaches to language ID have madeuse of
phonotactic variation. For example, thePhone Recognition followed
by Language Model-ing (PRLM) approach uses phonotactic informa-tion
to identify languages from the acoustic sig-nal alone (Zissman,
1996). In this approach, aphone recognizer (not necessarily trained
on a re-lated language) is used to tokenize training data foreach
language to be classified. Phonotactic lan-guage models generated
from this tokenized train-ing speech are used during testing to
compute lan-guage ID likelihoods for unknown utterances.
Similar cues have successfully been used forthe identification
of regional dialects. Zisssmanet al. (1996) show that the PRLM
approach yieldsgood results classifying Cuban and Peruvian
di-alects of Spanish, using an English phone recog-nizer trained on
TIMIT (Garofolo et al., 1993).The recognition accuracy of this
system on thesetwo dialects is 84%, using up to 3 minutes of
testutterances. Torres-Carrasquillo et al. (2004) devel-oped an
alternate system that identifies these twoSpanish dialects using
Gaussian Mixture Models(GMM) with shifted-delta-cepstral features.
Thissystem performs less accurately (accuracy of 70%)than that of
(Zissman et al., 1996). Alorfi (2008)uses an ergodic HMM to model
phonetic dif-ferences between two Arabic dialects (Gulf andEgyptian
Arabic) employing standard MFCC (MelFrequency Cepstral
Coefficients) and delta fea-tures. With the best parameter
settings, this systemachieves high accuracy of 96.67% on these
twodialects. Ma et al. (2006) use multi-dimensionalpitch flux
features and MFCC features to distin-guish three Chinese dialects.
In this system thepitch flux features reduce the error rate by
morethan 30% when added to a GMM based MFCCsystem. Given 15s of
test-utterances, the systemachieves an accuracy of 90% on the three
Intonational cues have been shown to be goodindicators to human
subjects identifying regionaldialects. Peters et al. (2002) show
that human sub-jects rely on intonational cues to identify two
Ger-man dialects (Hamburg urban dialects vs. North-ern Standard
German). Similarly, Barakat etal. (1999) show that subjects
distinguish betweenWestern vs. Eastern Arabic dialects
significantlyabove chance based on intonation alone.
Hamdi et al. (2004) show that rhythmic dif-
ferences exist between Western and Eastern Ara-bic. The analysis
of these differences is done bycomparing percentages of vocalic
intervals (%V)and the standard deviation of intervocalic inter-vals
(∆C) across the two groups. These featureshave been shown to
capture the complexity of thesyllabic structure of a
language/dialect in additionto the existence of vowel reduction.
The com-plexity of syllabic structure of a language/dialectand the
existence of vowel reduction in a languageare good correlates with
the rhythmic structure ofthe language/dialect, hence the importance
of sucha cue for language/dialect identification (Ramus,2002).
As far as we could determine, there is noprevious work that
analyzes the effectiveness ofa phonotactic approach, particularly
the parallelPRLM, for identifying Arabic dialects. In this pa-per,
we build a system based on this approach andevaluate its
performance on five Arabic dialects(four regional dialects and
MSA). In addition, weexperiment with six phone recognizers trained
onsix languages as well as three MSA phone recog-nizers and analyze
their contribution to this classi-fication task. Moreover, we make
use of a discrim-inative classifier that takes all the perplexities
ofthe language models on the phone sequences andoutputs the
hypothesized dialect. This classifierturns out to be an important
component, althoughit has not been a standard component in
3 Linguistic Aspects of Arabic Dialects
3.1 Arabic and its Dialects
MSA is the official language of the Arab world.It is the primary
language of the media and cul-ture. MSA is syntactically,
morphologically andphonologically based on Classical Arabic, the
lan-guage of the Qur’an (Islam’s Holy Book). Lexi-cally, however,
it is much more modern. It is nota native language of any Arabs but
is the languageof education across the Arab world. MSA is
pri-marily written not spoken.
The Arabic dialects, in contrast, are the true na-tive language
forms. They are generally restrictedin use to informal daily
communication. Theyare not taught in schools or even standardized,
al-though there is a rich popular dialect culture offolktales,
songs, movies, and TV shows. Dialectsare primarily spoken, not
written. However, thisis changing as more Arabs gain access to
tronic media such as emails and newsgroups. Ara-bic dialects are
loosely related to Classical Ara-bic. They are the result of the
interaction betweendifferent ancient dialects of Classical Arabic
andother languages that existed in, neighbored and/orcolonized what
is today the Arab world. For ex-ample, Algerian Arabic has many
influences fromBerber as well as French.
Arabic dialects vary on many dimensions –primarily, geography
and social class. Geo-linguistically, the Arab world can be divided
inmany different ways. The following is only oneof many that covers
the main Arabic dialects:
• Gulf Arabic (GLF) includes the dialects ofKuwait, Saudi
Arabia, Bahrain, Qatar, UnitedArab Emirates, and Oman.
• Iraqi Arabic (IRQ) is the dialect of Iraq. Insome dialect
classifications, Iraqi Arabic isconsidered a sub-dialect of Gulf
• Levantine Arabic (LEV) includes the di-alects of Lebanon,
Syria, Jordan, Palestineand Israel.
• Egyptian Arabic (EGY) covers the dialectsof the Nile valley:
Egypt and Sudan.
• Maghrebi Arabic covers the dialects ofMorocco, Algeria,
Tunisia and Mauritania.Libya is sometimes included.
Yemenite Arabic is often considered its ownclass. Maltese Arabic
is not always consid-ered an Arabic dialect. It is the only
Arabicvariant that is considered a separate languageand is written
with Latin script.
Socially, it is common to distinguish three sub-dialects within
each dialect region: city dwellers,peasants/farmers and Bedouins.
The three degreesare often associated with a class hierarchy
fromrich, settled city-dwellers down to Bedouins. Dif-ferent social
associations exist as is common inmany other languages around the
The relationship between MSA and the dialectin a specific region
is complex. Arabs do not thinkof these two as separate languages.
This particularperception leads to a special kind of
coexistencebetween the two forms of language that serve dif-ferent
purposes. This kind of situation is what lin-guists term diglossia.
Although the two variantshave clear domains of prevalence: formal
written(MSA) versus informal spoken (dialect), there is
a large gray area in between and it is often filledwith a mixing
of the two forms.
In this paper, we focus on classifying the di-alect of audio
recordings into one of five varieties:MSA, GLF, IRQ, LEV, and EGY.
We do not ad-dress other dialects or diglossia.
3.2 Phonological Variations among ArabicDialects
Although Arabic dialects and MSA vary on manydifferent levels —
phonology, orthography, mor-phology, lexical choice and syntax — we
willfocus on phonological difference in this paper.1
MSA’s phonological profile includes 28 conso-nants, three short
vowels, three long vowels andtwo diphthongs (/ay/ and /aw/). Arabic
dialectsvary phonologically from standard Arabic andeach other.
Some of the common variations in-clude the following (Holes, 2004;
The MSA consonant (/q/) is realized as a glot-tal stop /’/ in
EGY and LEV and as /g/ in GLF andIRQ. For example, the MSA word
appears as /t¯ari:’/ (EGY and LEV) and /t
and IRQ). Other variants also are found in sub di-alects such as
/k/ in rural Palestinian (LEV) and/dj/ in some GLF dialects. These
changes do notapply to modern and religious borrowings fromMSA. For
instance, the word for ‘Qur’an’ is neverpronounced as anything but
The MSA alveolar affricate (/dj/) is realized as/g/ in EGY, as
/j/ in LEV and as /y/ in GLF. IRQpreserves the MSA pronunciation.
For example,the word for ‘handsome’ is /djami:l/ (MSA,
IRQ),/gami:l/ (EGY), /jami:l/ (LEV) and /yami:l/ (GLF).
The MSA consonant (/k/) is generally realizedas /k/ in Arabic
dialects with the exception of GLF,IRQ and the Palestinian rural
sub-dialect of LEV,which allow a /č/ pronunciation in certain
con-texts. For example, the word for ‘fish’ is /samak/in MSA, EGY
and most of LEV but /simač/ in IRQand GLF.
The MSA consonant /θ/ is pronounced as /t/ inLEV and EGY (or /s/
in more recent borrowingsfrom MSA), e.g., the MSA word /θala:θa/
‘three’is pronounced /tala:ta/ in EGY and /tla:te/ in LEV.IRQ and
GLF generally preserve the MSA pronun-ciation.
1It is important to point out that since Arabic dialects arenot
standardized, their orthography may not always be con-sistent.
However, this is not a relevant point to this papersince we are
interested in dialect identification using audiorecordings and
without using the dialectal transcripts at all.
The MSA consonant /δ/ is pronounced as /d/in LEV and EGY (or /z/
in more recent borrow-ings from MSA), e.g., the word for ‘this’ is
pro-nounced /ha:δa/ in MSA versus /ha:da/ (LEV) and/da/ EGY. IRQ
and GLF generally preserve theMSA pronunciation.
The MSA consonants /d¯/ (emphatic/velarized
d) and /δ¯/ (emphatic /δ/) are both normalized to
/d¯/ in EGY and LEV and to /δ
¯/ in GLF and IRQ.
For example, the MSA sentence /δ¯alla yad
‘he continued to hit’ is pronounced /d¯all yud
(LEV) and /δ¯all yuδ
¯rub/ (GLF). In modern bor-
rowings from MSA, /δ¯/ is pronounced as /z
phatic z) in EGY and LEV. For instance, the wordfor ‘police
officer’ is /δ
¯/ in MSA but /z
in EGY and LEV.In some dialects, a loss of the emphatic
of some MSA consonants occurs, e.g., the MSAword /lat
¯i:f/ ‘pleasant’ is pronounced as /lati:f/ in
the Lebanese city sub-dialect of LEV. Empha-sis typically
spreads to neighboring vowels: if avowel is preceded or succeeded
directly by an em-phatic consonant (/d
¯/) then the vowel
becomes an emphatic vowel. As a result, the lossof the emphatic
feature does not affect the conso-nants only, but also their
Other vocalic differences among MSA and thedialects include the
following: First, short vow-els change or are completely dropped,
e.g., theMSA word /yaktubu/ ‘he writes’ is pronounced/yiktib/ (EGY
and IRQ) or /yoktob/ (LEV). Sec-ond, final and unstressed long
vowels are short-ened, e.g., the word /mat
¯a:ra:t/ ‘airports’ in MSA
becomes /mat¯ara:t/ in many dialects. Third, the
MSA diphthongs /aw/ and /ay/ have mostly be-come /o:/ and /e:/,
respectively. These vocalicchanges, particularly vowel drop lead to
differentsyllabic structures. MSA syllables are primarilylight (CV,
CV:, CVC) but can also be (CV:C andCVCC) in utterance-final
positions. EGY sylla-bles are the same as MSA’s although without
theutterance-final restriction. LEV, IRQ and GLF al-low heavier
syllables including word initial clus-ters such as CCV:C and
When training a system intended to classify lan-guages or
dialects, it is of course important to usetraining and testing
corpora recorded under simi-lar acoustic conditions. We are able to
obtain cor-pora from the Linguistic Data Consortium (LDC)with
similar recording conditions for four Arabic
dialects: Gulf Arabic, Iraqi Arabic, Egyptian Ara-bic, and
Levantine Arabic. These are corpora ofspontaneous telephone
conversations produced bynative speakers of the dialects, speaking
with fam-ily members, friends, and unrelated individuals,sometimes
about predetermined topics. Although,the data have been annotated
phonetically and/ororthographically by LDC, in this paper, we do
notmake use of any of annotations.
We use the speech files of 965 speakers (about41.02 hours of
speech) from the Gulf Arabicconversational telephone Speech
database for ourGulf Arabic data (Appen Pty Ltd, 2006a).2 Fromthese
speakers we hold out 150 speakers for test-ing (about 6.06 hours of
speech).3 We use the IraqiArabic Conversational Telephone Speech
database(Appen Pty Ltd, 2006b) for the Iraqi dialect, se-lecting
475 Iraqi Arabic speakers with a total du-ration of about 25.73
hours of speech. Fromthese speakers we hold out 150 speakers4 for
test-ing (about 7.33 hours of speech). Our Levan-tine data consists
of 1258 speakers from the Ara-bic CTS Levantine Fisher Training
Data Set 1-3(Maamouri, 2006). This set contains about 78.79hours of
speech in total. We hold out 150 speakersfor testing (about 10
hours of speech) from Set 1.5
For our Egyptian data, we use CallHome Egyp-tian and its
Supplement (Canavan et al., 1997)and CallFriend Egyptian (Canavan
and Zipperlen,1996). We use 398 speakers from these corpora(75.7
hours of speech), holding out 150 speakersfor testing.6 (about 28.7
hours of speech.)
Unfortunately, as far as we can determine, thereis no data with
similar recording conditions forMSA. Therefore, we obtain our MSA
training datafrom TDT4 Arabic broadcast news. We use about47.6
hours of speech. The acoustic signal was pro-cessed using
forced-alignment with the transcriptto remove non-speech data, such
as music. Fortesting we again use 150 speakers, this time
iden-tified automatically from the GALE Year 2 Dis-tillation
evaluation corpus (about 12.06 hours ofspeech). Non-speech data
(e.g., music) in the test
2We excluded very short speech files from the corpora.3The 24
speakers in devtest folder and the last 63 files,
after sorting by file name, in train2c folder (126 speakers).The
sorting is done to make our experiments reproducible byother
4Similar to the Gulf corpus, the 24 speakers in devtestfolder
and the last 63 files (after sorting by filename) intrain2c folder
5We use the last 75 files in Set 1, after sorting by name.6The
test speakers were from evaltest and devtest folders
in CallHome and CallFriend.
corpus was removed manually. It should be notedthat the data
includes read speech by anchors andreporters as well as spontaneous
speech spoken ininterviews in studios and though the phone.
5 Our Dialect ID Approach
Since, as described in Section 3, Arabic dialectsdiffer in many
respects, such as phonology, lex-icon, and morphology, it is highly
likely thatthey differ in terms of phone-sequence distribu-tion and
phonotactic constraints. Thus, we adoptthe phonotactic approach to
distinguishing amongArabic dialects.
5.1 PRLM for dialect ID
As mentioned in Section 2, the PRLM approach tolanguage
identification (Zissman, 1996) has hadconsiderable success. Recall
that, in the PRLMapproach, the phones of the training utterances
ofa dialect are first identified using a single phonerecognizer.7
Then an n-gram language model istrained on the resulting phone
sequences for thisdialect. This process results in an n-gram
lan-guage model for each dialect to model the dialectdistribution
of phone sequence occurrences. Dur-ing recognition, given a test
speech segment, werun the phone recognizer to obtain the phone
se-quence for this segment and then compute the per-plexity of each
dialect n-gram model on the se-quence. The dialect with the n-gram
model thatminimizes the perplexity is hypothesized to be thedialect
from which the segment comes.
Parallel PRLM is an extension to the PRLM ap-proach, in which
multiple (k) parallel phone rec-ognizers, each trained on a
different language, areused instead of a single phone recognizer
(Ziss-man, 1996). For training, we run all phone recog-nizers in
parallel on the set of training utterancesof each dialect. An
n-gram model on the outputs ofeach phone recognizer is trained for
each dialect.Thus if we have m dialects, k x m n-gram modelsare
trained. During testing, given a test utterance,we run all phone
recognizers on this utterance andcompute the perplexity of each
n-gram model onthe corresponding output phone sequence. Finally,the
perplexities are fed to a combiner to determinethe hypothesized
dialect. In our implementation,
7The phone recognizer is typically trained on one of
thelanguages being identified. Nonetheless, a phone
recognizetrained on any language might be a good
approximation,since languages typically share many phones in their
we employ a logistic regression classifier as ourback-end
combiner. We have experimented withdifferent classifiers such as
SVM, and neural net-works, but logistic regression classifier was
supe-rior. The system is illustrated in Figure 1.
We hypothesize that using multiple phone rec-ognizers as opposed
to only one allows the systemto capture subtle phonetic differences
that mightbe crucial to distinguish dialects. Particularly,since
the phone recognizers are trained on differ-ent languages, they may
be able to model differentvocalic and consonantal systems, hence a
differentphonetic inventory. For example, an MSA phonerecognizer
typically does not model the phoneme/g/; however, an English phone
recognizer does.As described in Section 3, this phoneme is
animportant cue to distinguishing Egyptian Arabicfrom other Arabic
dialects. Moreover, phone rec-ognizers are prone to many errors;
relying uponmultiple phone streams rather than one may leadto a
more robust model overall.
5.2 Phone RecognizersIn our experiments, we have used phone
recogniz-ers for English, German, Japanese, Hindi, Man-darin, and
Spanish, from a toolkit developed byBrno University of Technology.8
These phone rec-ognizers were trained on the OGI
multilanguagedatabase (Muthusamy et al., 1992) using a
hybridapproach based on Neural Networks and Viterbidecoding without
language models (open-loop)(Matejka et al., 2005).
Since Arabic dialect identification is our goal,we hypothesize
that an Arabic phone recognizerwould also be useful, particularly
since otherphone recognizers do not cover all Arabic con-sonants,
such as pharyngeals and emphatic alveo-lars. Therefore, we have
built our own MSA phonerecognizer using the HMM toolkit (HTK)
(Younget al., 2006). The monophone acoustic modelsare built using
3-state continuous HMMs withoutstate-skipping, with a mixture of 12
Gaussians perstate. We extract standard Mel Frequency
CepstralCoefficients (MFCC) features from 25 ms frames,with a frame
shift of 10 ms. Each feature vec-tor is 39D: 13 features (12
cepstral features plusenergy), 13 deltas, and 13 double-deltas. The
fea-tures are normalized using cepstral mean normal-ization. We use
the Broadcast News TDT4 corpus(Arabic Set 1; 47.61 hours of speech;
downsam-pled to 8Khz) to train our acoustic models. The
seconds accuracy Gulf Iraqi Levantine Egyptian
5 60.833 49.2 52.7 58.1 83
15 72.83 60.8 61.2 77.6 91.9
30 78.5 68.7 67.3 84 94
45 81.5 72.6 72.4 86.9 93.7
60 83.33 75.1 75.7 87.9 94.6
120 84 75.1 75.4 89.5 96
"# )"# *$# !"# %$# )+$#
seconds accuracy Gulf Iraqi Levantine Egyptian
5 68.6667 54.5 50.7 60 77.9
15 76.6667 57.3 62.6 73.8 90.7
30 81.6 68.3 71.7 79.4 90.2
45 84.8 69.9 73.6 86.2 94.9
60 86.933 76.8 76.5 85.4 96.3
120 87.86 79.1 77.4 90.1 93.6
"# )"# *$# !"# %$# )+$#
ReferencesF. S. Alorfi. 2008. PhD Dissertation: Automatic
tion Of Arabic Dialects Using Hidden Markov Models. InUniversity
Appen Pty Ltd. 2006a. Gulf Arabic Conversational Tele-phone
Speech Linguistic Data Consortium, Philadelphia.
Appen Pty Ltd. 2006b. Iraqi Arabic Conversational Tele-phone
Speech Linguistic Data Consortium, Philadelphia.
M. Barkat, J. Ohala, and F. Pellegrino. 1999. Prosody as
aDistinctive Feature for the Discrimination of Arabic Di-alects. In
Proceedings of Eurospeech’99.
F. Biadsy, N. Habash, and J. Hirschberg. 2009. Improv-ing the
Arabic Pronunciation Dictionary for Phone andWord Recognition with
Linguistically-Based Pronuncia-tion Rules. In Proceedings of
NAACL/HLT 2009, Col-orado, USA.
A. Canavan and G. Zipperlen. 1996. CALLFRIEND Egyp-tian Arabic
Speech Linguistic Data Consortium, Philadel-phia.
A. Canavan, G. Zipperlen, and D. Graff. 1997. CALL-HOME Egyptian
Arabic Speech Linguistic Data Consor-tium, Philadelphia.
J. S. Garofolo et al. 1993. TIMIT Acoustic-PhoneticContinuous
Speech Corpus Linguistic Data Consortium,Philadelphia.
N. Habash. 2006. On Arabic and its Dialects.
R. Hamdi, M. Barkat-Defradas, E. Ferragne, and F. Pelle-grino.
2004. Speech Timing and Rhythmic Structure inArabic Dialects: A
Comparison of Two Approaches. InProceedings of Interspeech’04.
C. Holes. 2004. Modern Arabic: Structures, Functions,
andVarieties. Georgetown University Press. Revised Edition.
B. Ma, D. Zhu, and R. Tong. 2006. Chinese Dialect
Iden-tification Using Tone Features Based On Pitch Flux.
InProceedings of ICASP’06.
M. Maamouri. 2006. Levantine Arabic QT Training DataSet 5,
Speech Linguistic Data Consortium, Philadelphia.
P. Matejka, P. Schwarz, J. Cernocky, and P. Chytil.
2005.Phonotactic Language Identification using High QualityPhoneme
Recognition. In Proceedings of Eurospeech’05.
Y. K. Muthusamy, R.A. Cole, and B.T. Oshika. 1992. TheOGI
Multi-Language Telephone Speech Corpus. In Pro-ceedings of
J. Peters, P. Gilles, P. Auer, and M. Selting. 2002.
Iden-tification of Regional Varieties by Intonational Cues.
AnExperimental Study on Hamburg and Berlin
F. Ramus. 2002. Acoustic Correlates of Linguistic
Rhythm:Perspectives. In Speech Prosody.
A. Stolcke. 2002. SRILM - an Extensible Language Model-ing
Toolkit. In ICASP’02, pages 901–904.
P. Torres-Carrasquillo, T. P. Gleason, and D. A. Reynolds.2004.
Dialect identification using Gaussian Mixture Mod-els. In
Proceedings of the Speaker and Language Recog-nition Workshop,
S. Young, G. Evermann, M. Gales, D. Kershaw, G. Moore,J. Odell,
D. Ollason, D. Povey, V. Valtchev, and P. Wood-land. 2006. The HTK
Book, version 3.4.
M. A. Zissman, T. Gleason, D. Rekart, and B. Losiewicz.1996.
Automatic Dialect Identification of Extempora-neous Conversational,
Latin American Spanish Speech.In Proceedings of the IEEE
International Conference onAcoustics, Speech, and Signal
Processing, Atlanta, USA.
M. A. Zissman. 1996. Comparison of Four Approaches toAutomatic
Language Identification of Telephone Speech.IEEE Transactions of
Speech and Audio Processing, 4(1).