Can speech technology be useful for people with dysarthria?
Speech technology & pathology
Helmer Strik
Language & Speech, Dept. of Linguistics
Radboud University Nijmegen
02-09-2005, Antwerpen
SPACE symposium
Outline
  Speech technology & pathology
    Applications: existing, possible
    In practice
    Target groups
  Speech technology & dysarthria
    Introduction
    Speech recognition for dysarthric speech
  Conclusions
Applications
  AAC (Augmentative & Alternative Communication): improve communication
  Interactive tools: training, reading, listening
  Assessment: diagnosis, monitoring
  Therapy
AAC
  Speaking problems
    Speech generation
    Speech manipulation
    Speech recognition (of handicapped) + output (text, speech, talking head, etc.)
  Hearing problems
    Hearing aids, cochlear implants, etc.
    Speech recognition (of others) + output (text, sign language, talking head, etc.)
ASR & output channel
  [Diagram: ASR → text → speech synthesis]
Interactive tools
  Speech generation
    Reading tools: screen readers, reading pen, text processors, etc.
    Writing tools: word prediction, TTS, (dedicated) spell checking
  Analysis, manipulation, training
    Delayed Auditory Feedback (DAF) and Frequency Altered Feedback (FAF), for stutterers
    CAFET: Computer-Aided Fluency Establishment Training
    CAPT: Computer Assisted Pronunciation Training
Delayed Auditory Feedback (DAF)
Frequency Altered Feedback (FAF)
Assessment, therapy
  Assessment: diagnosis, monitoring
  Therapy
  Clinical setting, with expert
    Speech analysis + visualization, categorization, etc.
    IBM speech viewer …
  Research
Applications
  The number of applications differs (from most to fewest):
    speech generation
    speech analysis, manipulation, etc.
    speech recognition
In practice
  Many existing applications
  Many more are possible
  However, relatively little use
  Why?
  Needed: tailor-made, flexible applications
    Tailor-made: taking into account the capabilities & desires of the user + environment
    Flexible: the capabilities & desires often change
  More user tests & adequacy evaluation, instead of technology improvement & performance evaluation
Target groups
  International Classification of Functioning, Disability and Health (ICF):
    Mental functions: aphasia, dyslexia, mental disabilities
    Sensory functions: blindness, deafness, both
    Voice & speech functions: dysarthria, anarthria, mutism, stuttering
    Motor functions: dyspraxia, apraxia, RSI / UEMSD (Upper Extremity Musculoskeletal Disorders)
Speech technology & dysarthria
  Dysarthria: a speech disorder caused by dysfunction of the nerves and muscles
  Many different kinds of dysarthria
Can speech technology be useful for people with dysarthria?
  Yes!
  AAC
  Interactive tools
  Assessment
  Therapy
Can speech technology be useful for people with dysarthria?
  Speech generation
    Prefer voice similar to their (old) voice
    Preferably: own voice
  AAC
  Manipulation
  Speech recognition + output channel
  Pronunciation training: speech recognition, analysis, feedback, etc.
Speech technology & dysarthria: ASR for dysarthric speech
  Questions:
    How well can dysarthric speech be recognized by a standard (“non-dysarthric”) speech recognizer?
    Will the recognition results improve if we train the recognizer on speech of dysarthric speakers?
Experimental setup: Speakers
  Dysarthric: 2 Dutch males, DYS1 & DYS2
  Reference: 2 Dutch males, REF1 & REF2
  Total duration of the speech material (minutes):
    DYS1   DYS2   REF1   REF2
     8.5   12.8    9.1    7.9
  DYS2 speaks more slowly
Experimental setup: Speech tasks
  All four speakers read the same list of items, consisting of four different tasks:
    1. NUM: numbers 0-12 spoken in isolation
    2. PFU: the 50 most frequent utterances from Polyphone
    3. PMS: 130 Plomp-Mimpen sentences (semantically unpredictable)
    4. PRS: 10 phonetically rich sentences
Experimental setup: Speech tasks
  Number of utterances & words per task:
              NUM   PFU   PMS   PRS
    # utt.     39    50   130    30
    # words    39    91   809   336
  The NUM and PRS tasks were both read three times.
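The remark that DYS2 speaks more slowly can be made concrete by combining the word counts above with the total durations reported for each speaker. The following is a rough sketch only: it assumes every speaker read all listed words and that the reported duration covers exactly that material (pauses included), so the figures are approximate rates, not measurements from the study.

```python
# Word counts per task, copied from the table above.
WORDS_PER_TASK = {"NUM": 39, "PFU": 91, "PMS": 809, "PRS": 336}

# Total duration of the speech material in minutes, copied from the speakers slide.
DURATION_MIN = {"DYS1": 8.5, "DYS2": 12.8, "REF1": 9.1, "REF2": 7.9}


def words_per_minute(total_words, minutes):
    """Approximate speaking rate, pauses included."""
    return total_words / minutes


total_words = sum(WORDS_PER_TASK.values())
rates = {spk: words_per_minute(total_words, m) for spk, m in DURATION_MIN.items()}

for spk, rate in rates.items():
    print(f"{spk}: about {rate:.0f} words per minute")
```

Under these assumptions DYS2 comes out clearly slowest of the four speakers, while DYS1 falls within the range of the two reference speakers.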
Experimental setup: Speech recognizer
  General specifications
    Standard phone-based recognizer
    37 context-independent phones
    3-state HMMs
    14 cepstral coefficients + deltas, from a mel filter bank
    Frequency range: 350-3400 Hz
    16 ms Hamming window, 10 ms step
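As an illustration of the last specification, the sketch below cuts a signal into 16 ms Hamming-windowed frames with a 10 ms step. The 8 kHz sampling rate is an assumption (typical for telephone speech, but not stated on the slide), and the function names are hypothetical. The cepstral analysis itself is not shown.

```python
import math


def hamming(n):
    """Hamming window of length n."""
    return [0.54 - 0.46 * math.cos(2 * math.pi * i / (n - 1)) for i in range(n)]


def frame_signal(samples, sample_rate=8000, win_ms=16, step_ms=10):
    """Split samples into overlapping, Hamming-windowed frames.

    sample_rate is an assumed value; win_ms and step_ms follow the slide.
    """
    win = int(sample_rate * win_ms / 1000)    # 128 samples at 8 kHz
    step = int(sample_rate * step_ms / 1000)  # 80 samples at 8 kHz
    window = hamming(win)
    frames = []
    for start in range(0, len(samples) - win + 1, step):
        chunk = samples[start:start + win]
        frames.append([s * w for s, w in zip(chunk, window)])
    return frames


# One second of a synthetic 440 Hz tone stands in for real speech.
one_second = [math.sin(2 * math.pi * 440 * t / 8000) for t in range(8000)]
frames = frame_signal(one_second)
print(len(frames), "frames of", len(frames[0]), "samples")
```

Because the step is shorter than the window, consecutive frames overlap by 6 ms, so each feature vector is computed from a slightly different but overlapping stretch of speech.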
Experimental setup: Experiments
  Lexicon & language model (uni- and bigram)
    Based on all words in the 4 tasks
    Task-specific & same for all speakers
  Perplexity per task:
    NUM   PFU   PMS   PRS
     13    15     8     2
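Perplexity can be read as the average number of equally likely words the recognizer has to choose between at each position. A minimal sketch of the standard definition follows; the uniform model over 13 number words is an illustrative assumption, not the language model used in the experiments, although it is consistent with the NUM value of 13 for the numbers 0-12.

```python
import math


def perplexity(word_probs):
    """Perplexity of a word sequence, given the model probability of each word.

    Defined as 2 ** (average negative log2 probability per word).
    """
    if not word_probs:
        raise ValueError("need at least one word probability")
    log_sum = sum(math.log2(p) for p in word_probs)
    return 2 ** (-log_sum / len(word_probs))


# Assumed uniform model: each of the 13 numbers (0-12) is equally likely.
uniform_numbers = [1 / 13] * 39  # 39 isolated number words, as in the NUM task
print(f"uniform over 13 words: {perplexity(uniform_numbers):.1f}")

# A fully predictable sequence has perplexity 1.
print(f"deterministic: {perplexity([1.0] * 5):.1f}")
```

A low perplexity, such as the value of 2 for PRS, means the language model leaves the recognizer very few alternatives per word, which makes recognition easier regardless of acoustic quality.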
Experimental setup: Speaker Independent & Dependent
  SI: Speaker Independent training material
    Polyphone (5000+ speaker Dutch telephone database)
      4022 connected digit strings
      3702 Polyphone most frequent items
      20,110 phonetically rich sentences
  SD: Speaker Dependent training material
    Speaker's own speech
Speaker Independent (SI): Results
  Word Error Rates (WERs) for SI recognition (bar chart, values per task and speaker):
           NUM    PFU    PMS    PRS
    DYS1  15.4   19.8   30.3    7.4
    DYS2  41.0   22.0   15.2    4.5
    REF1   0.0    1.1    2.1    1.2
    REF2   0.0    1.1    1.7    0.0
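The slides do not spell out how WER is computed. The sketch below assumes the conventional definition: the minimum number of word substitutions, deletions and insertions needed to turn the recognizer output into the reference, divided by the number of reference words and expressed as a percentage. The function name and the Dutch example words are illustrative only.

```python
def word_error_rate(reference, hypothesis):
    """WER (%) between two whitespace-tokenized strings, via word-level edit distance."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference words seen so far
    # and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (0 if ref_word == hyp_word else 1)
            deletion = prev[j] + 1
            insertion = curr[j - 1] + 1
            curr.append(min(substitution, deletion, insertion))
        prev = curr
    return 100.0 * prev[-1] / len(ref)


# One deleted word ("twee") and one inserted word ("vijf"): 2 errors on 4 words.
print(word_error_rate("een twee drie vier", "een drie vier vijf"))
```

Because insertions count as errors, WER can exceed 100% when the recognizer outputs many more words than were spoken.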
Speaker Independent (SI): Conclusions
  REF better than DYS
  DYS1 better than DYS2 in short utterances, because of speaking rate (table 1)
  Results for DYS quite reasonable (especially for sentences), because of tight language model
Speaker Dependent (SD)
  Models (also) trained on speech of the speakers
  Jackknife procedure:
    semi-randomly selected test set
    rest = training set
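The jackknife procedure can be sketched as a rotation in which each part of a speaker's material serves once as the test set while the rest is used for training. The number of parts, the plain random shuffle (standing in for the unspecified "semi-random" selection) and all names below are assumptions for illustration, not details from the study.

```python
import random


def jackknife_splits(items, n_folds, seed=0):
    """Yield (train, test) pairs so that every item is tested exactly once."""
    shuffled = list(items)
    random.Random(seed).shuffle(shuffled)
    # Deal the shuffled items into n_folds roughly equal parts.
    folds = [shuffled[i::n_folds] for i in range(n_folds)]
    for k, test in enumerate(folds):
        train = [x for j, fold in enumerate(folds) if j != k for x in fold]
        yield train, test


# 249 utterances per speaker (39 + 50 + 130 + 30); the IDs are hypothetical.
utterances = [f"utt{i:03d}" for i in range(249)]
for train, test in jackknife_splits(utterances, n_folds=5):
    print(f"train on {len(train)} utterances, test on {len(test)}")
```

Rotating the test set this way lets all of a speaker's material contribute to the reported error rate without ever testing on utterances that were used to train the models.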
Speaker Dependent (SD): Results
  Word Error Rates (WERs) for the whole test set, for different numbers of Gaussians (2N):
    2N       0      2      4      8     16     32     64
    DYS1  14.3   12.0    9.5    9.7   10.3   11.7   15.1
    DYS2   7.5    4.1    2.9    2.4    3.0    3.8    5.3
    REF1   3.4    2.2    1.8    2.6    3.5    4.0    4.2
    REF2   3.6    2.4    2.8    3.0    3.3    3.9    4.4
Speaker Dependent (SD): Results
  Word Error Rates (WERs) for SD recognition (bar chart, values per task and speaker):
           NUM    PFU    PMS    PRS
    DYS1   2.6    9.9   12.2    3.6
    DYS2   0.0    5.5    3.3    1.5
    REF1   0.0    1.1    2.2    1.2
    REF2   0.0    2.2    3.6    1.2
Speaker Dependent (SD): Results
  Word Error Rates (WERs) for SD / SI recognition:
           DYS1          DYS2          REF1         REF2
    NUM    2.6 / 15.4    0.0 / 41.0    0.0 / 0.0    0.0 / 0.0
    PFU    9.9 / 19.8    5.5 / 22.0    1.1 / 1.1    2.2 / 1.1
    PMS   12.2 / 30.3    3.3 / 15.2    2.2 / 2.1    3.6 / 1.7
    PRS    3.6 / 7.4     1.5 / 4.5     1.2 / 1.2    1.2 / 0.0
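To make the SD versus SI comparison easier to read, the table values can be turned into relative WER reductions. This is a small sketch using only the numbers above and assuming they are percentages; the relative-reduction measure is added here for illustration and does not appear on the slides.

```python
# WERs copied from the SD / SI table above: task -> speaker -> (SD, SI).
WER = {
    "NUM": {"DYS1": (2.6, 15.4), "DYS2": (0.0, 41.0), "REF1": (0.0, 0.0), "REF2": (0.0, 0.0)},
    "PFU": {"DYS1": (9.9, 19.8), "DYS2": (5.5, 22.0), "REF1": (1.1, 1.1), "REF2": (2.2, 1.1)},
    "PMS": {"DYS1": (12.2, 30.3), "DYS2": (3.3, 15.2), "REF1": (2.2, 2.1), "REF2": (3.6, 1.7)},
    "PRS": {"DYS1": (3.6, 7.4), "DYS2": (1.5, 4.5), "REF1": (1.2, 1.2), "REF2": (1.2, 0.0)},
}


def relative_reduction(si, sd):
    """Relative WER reduction (%) when moving from SI to SD models.

    Returns None when the SI WER is 0, since the ratio is then undefined.
    Negative values mean the SD models performed worse.
    """
    if si == 0:
        return None
    return 100.0 * (si - sd) / si


for task, row in WER.items():
    for speaker, (sd, si) in row.items():
        reduction = relative_reduction(si, sd)
        label = "n/a" if reduction is None else f"{reduction:+.1f}%"
        print(f"{task} {speaker}: SI {si:4.1f} -> SD {sd:4.1f} ({label})")
```

Positive values indicate that speaker dependent models helped, and negative values indicate that they hurt.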
Speaker Dependent (SD): Conclusions
  For REF, results for SD are equal to or worse than for SI (trade-off: the speaker's own models, but less training material)
  For DYS, results for SD are much better than for SI
  DYS2 better than DYS1, almost as good as REF
Conclusions: ASR for dysarthric speech
  Results for DYS2 are remarkable
    SI: high WERs, esp. for NUM & PFU
    SD: sometimes better than REF
    Low speaking rate!
  Automatic recognition of dysarthric speech is possible.
  Better results:
    Lower speaking rate
    Speaker dependent models
  Even better: also speaker dependent lexicon
Conclusions: ST & pathology
  Applications:
    Many already exist
    Many more are possible
  Needed:
    Tailor-made, flexible applications
    User tests, adequacy evaluation
References
  http://lands.let.ru.nl/TSpublic/strik/pres/p97-SPACE.ppt
  E. Sanders, M. Ruiter, L. Beijer, H. Strik (2002) Automatic recognition of Dutch dysarthric speech: A pilot study. ICSLP-2002, Denver, USA, pp. 661-664.
  T. Rietveld & I. Stolte (2005) Taal- en spraaktechnologie en communicatieve beperkingen