Can speech technology be useful for people with dysarthria? Speech technology & pathology


Helmer Strik

Language & Speech, Dept. of Linguistics, Radboud University Nijmegen

02-09-2005, Antwerpen

SPACE symposium 2

Outline

Speech technology & pathology:

Applications: existing, possible

In practice

Target groups

Speech technology & dysarthria:

Introduction

Speech recognition for dysarthric speech

Conclusions

Applications

AAC (Augmentative & Alternative Communication): improve communication

Interactive tools: training, reading, listening

Assessment: diagnosis, monitoring

Therapy

AAC

Speaking problems:

Speech generation

Speech manipulation

Speech recognition (of handicapped) + output (text, speech, talking head, etc.)

Hearing problems:

Hearing aids, cochlear implants, etc.

Speech recognition (of others) + output (text, sign language, talking head, etc.)


ASR & output channel

speech synthesis

Interactive tools

Speech generation:

Reading tools: screen readers, reading pen, text processors, etc.

Writing tools: word prediction, TTS, (dedicated) spell checking

Analysis, manipulation, training:

Delayed Auditory Feedback (DAF) and Frequency Altered Feedback (FAF), for stutterers

CAFET: Computer-Aided Fluency Establishment Training

CAPT: Computer Assisted Pronunciation Training

Delayed Auditory Feedback (DAF)

Frequency Altered Feedback (FAF)
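As a rough illustration of the DAF idea, the sketch below shifts a block of audio samples later in time by a fixed number of milliseconds, which is what the speaker would hear over headphones. The function name `delayed_feedback` and the sample rate and delay in the usage line are illustrative assumptions, not values from the presentation; a real DAF device works on a live audio stream rather than on a finished list of samples.

```python
def delayed_feedback(samples, sample_rate_hz, delay_ms):
    """Return the samples shifted later by delay_ms, padded with silence.

    Hypothetical helper for illustration: the output has the same length
    as the input, so the last `delay` samples are cut off.
    """
    n = len(samples)
    # Convert the delay from milliseconds to a whole number of samples.
    delay = min(n, round(sample_rate_hz * delay_ms / 1000))
    return [0.0] * delay + list(samples[:n - delay])


# Usage: a 2 ms delay at 1000 Hz shifts the signal by two samples.
heard = delayed_feedback([0.1, 0.2, 0.3, 0.4], sample_rate_hz=1000, delay_ms=2)
```

FAF is not covered by this sketch; it would need a pitch-shifting step instead of a pure delay.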

Assessment, therapy

Assessment: diagnosis, monitoring

Therapy

Clinical setting, with expert

Speech analysis + visualization, categorization, etc.

IBM speech viewer …

Research

Applications

The number of applications differs (from most to fewest):

1. speech generation

2. speech analysis, manipulation, etc.

3. speech recognition

In practice

Many existing applications

Many more are possible

However, relatively little use. Why?

Needed: tailor-made, flexible applications

Tailor-made: taking into account the capabilities & desires of the user + environment

Flexible: the capabilities & desires often change

More user tests & adequacy evaluation, instead of technology improvement & performance evaluation

Target groups

International Classification of Functioning, Disability and Health (ICF):

Mental functions: aphasia, dyslexia, mental disabilities

Sensory functions: blindness, deafness, both

Voice & speech functions: dysarthria, anarthria, mutism, stuttering

Motor functions: dyspraxia, apraxia, RSI / UEMSD (Upper Extremity Musculoskeletal Disorders)

Speech technology & dysarthria

Dysarthria: a speech disorder caused by dysfunction of nerves and muscles

Many different kinds of dysarthria

Can speech technology be useful for people with dysarthria?

AAC

Interactive tools

Assessment

Therapy

Can speech technology be useful for people with dysarthria?

Speech generation

Prefer voice similar to their (old) voice

Preferably: own voice

AAC

Manipulation

Speech recognition + output channel

Pronunciation training: speech recognition, analysis, feedback, etc.

Speech technology & dysarthria: ASR for dysarthric speech

Questions:

How well can dysarthric speech be recognized by a standard (“non-dysarthric”) speech recognizer?

Will the recognition results improve if we train the recognizer on speech of dysarthric speakers?

Experimental setup: Speakers

Dysarthric: 2 Dutch males, DYS1 & DYS2

Reference: 2 Dutch males, REF1 & REF2

Total duration of the speech material (minutes):

DYS1   DYS2   REF1   REF2
8.5    12.8   9.1    7.9

DYS2 speaks more slowly.

Experimental setup: Speech tasks

All four speakers read the same list of items, consisting of four different tasks:

1. NUM: numbers 0-12 spoken in isolation

2. PFU: the 50 most Frequent Utterances from Polyphone

3. PMS: 130 Plomp-Mimpen Sentences (semantically unpredictable)

4. PRS: 10 Phonetically Rich Sentences

Experimental setup: Speech tasks

Number of utterances & words per task:

          NUM   PFU   PMS   PRS
# utt.     39    50   130    30
# words    39    91   809   336

The NUM and PRS tasks were both read three times.
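The word counts above and the total durations per speaker allow a rough estimate of speaking rate. The sketch below assumes that the durations cover exactly the words counted in this table, pauses included; that is an assumption, not something the slides state, so the figures are only indicative.

```python
# Words per speaker, summed over the NUM, PFU, PMS and PRS tasks.
TOTAL_WORDS = 39 + 91 + 809 + 336

# Total duration of the speech material in minutes, per speaker.
DURATION_MIN = {"DYS1": 8.5, "DYS2": 12.8, "REF1": 9.1, "REF2": 7.9}


def words_per_minute(total_words, duration_min):
    """Average speaking rate over the whole recording, pauses included."""
    return total_words / duration_min


rates = {spk: words_per_minute(TOTAL_WORDS, d) for spk, d in DURATION_MIN.items()}
for spk, rate in rates.items():
    print(f"{spk}: {rate:.0f} words/min")
```

Under that assumption DYS2 comes out at roughly 100 words per minute, against 140 to 160 for the other three speakers.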

Experimental setup: Speech recognizer

General specifications:

Standard phone-based recognizer

37 context-independent phones

3-state HMMs

14 cepstral coefficients + deltas from a mel filter bank, frequency range 350-3400 Hz

16 ms Hamming window, 10 ms step
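To make the window and step sizes concrete, the sketch below cuts a signal into 16 ms frames every 10 ms and applies a Hamming window to each frame. The 8 kHz sample rate is an assumption (typical for telephone speech, not stated on the slide), and the function names are hypothetical; the mel filter bank and cepstral steps are left out.

```python
import math


def hamming(n):
    """Hamming window of length n."""
    if n == 1:
        return [1.0]
    return [0.54 - 0.46 * math.cos(2 * math.pi * i / (n - 1)) for i in range(n)]


def frame_signal(samples, sample_rate_hz=8000, window_ms=16, step_ms=10):
    """Split samples into overlapping, Hamming-windowed frames.

    sample_rate_hz=8000 is an assumed value; window_ms and step_ms
    follow the slide. Trailing samples that do not fill a frame are dropped.
    """
    win_len = sample_rate_hz * window_ms // 1000   # 128 samples at 8 kHz
    step = sample_rate_hz * step_ms // 1000        # 80 samples at 8 kHz
    window = hamming(win_len)
    frames = []
    for start in range(0, len(samples) - win_len + 1, step):
        chunk = samples[start:start + win_len]
        frames.append([s * w for s, w in zip(chunk, window)])
    return frames


# Usage: one second of audio at the assumed sample rate.
frames = frame_signal([1.0] * 8000)
```

With these settings consecutive frames overlap by 6 ms, so each stretch of speech contributes to more than one feature vector.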

Experimental setup: Experiments

Lexicon & language model (uni- and bigram)

Based on all words in 4 tasks

Task specific & same for all speakers

Perplexity per task:

NUM   PFU   PMS   PRS
13    15    8     2
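Perplexity can be read as the average number of words the recognizer has to choose between at each point. The sketch below computes it from the probability the language model assigns to each word of a test sequence. The uniform model over the 13 numbers of the NUM task is an assumption made here for illustration; the slides do not describe how the NUM model was estimated.

```python
import math


def perplexity(word_probs):
    """Perplexity of a word sequence, given the model probability of each word."""
    log_sum = sum(math.log2(p) for p in word_probs)
    return 2 ** (-log_sum / len(word_probs))


# Assumed uniform unigram model: each of the 13 numbers (0-12) is equally
# likely, evaluated on the 39 isolated words of the NUM task.
num_task = [1 / 13] * 39
print(round(perplexity(num_task), 1))
```

A uniform model over 13 words gives a perplexity of 13, which is consistent with the NUM value in the table.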

Experimental setup: Speaker Independent & Dependent

SI: Speaker Independent training material

Polyphone (5000+ speaker Dutch telephone database):

4022 connected digit strings

3702 Polyphone most frequent items

20,110 phonetically rich sentences

SD: Speaker Dependent training material

Speaker's own speech

Speaker Independent (SI): Results

Word Error Rates (WERs) for SI recognition

[Figure: WERs for SI recognition per task (NUM, PFU, PMS, PRS)]
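For reference, word error rate is commonly computed as the number of substitutions, deletions and insertions needed to turn the reference transcription into the recognizer output, divided by the number of reference words. The sketch below does this with a standard edit-distance table and reports a percentage, which is an assumption about the unit used in the results; the function name and the example strings are illustrative.

```python
def word_error_rate(reference, hypothesis):
    """WER in percent between two whitespace-tokenized strings."""
    ref, hyp = reference.split(), hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")
    # dist[i][j] = minimum number of edits turning ref[:i] into hyp[:j]
    dist = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        dist[i][0] = i          # delete all i reference words
    for j in range(len(hyp) + 1):
        dist[0][j] = j          # insert all j hypothesis words
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            dist[i][j] = min(
                dist[i - 1][j] + 1,         # deletion
                dist[i][j - 1] + 1,         # insertion
                dist[i - 1][j - 1] + cost,  # match or substitution
            )
    return 100.0 * dist[-1][-1] / len(ref)


# Usage: one deletion ("two") and one substitution ("four" -> "for").
wer = word_error_rate("one two three four", "one three for")
```

Because insertions count as errors, a WER above 100 is possible when the recognizer outputs many more words than were spoken.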

Speaker Independent (SI): Conclusions

REF better than DYS

DYS1 better than DYS2 in short utterances, because of speaking rate (table 1)

Results for DYS quite reasonable (especially for sentences), because of tight language model

Speaker Dependent (SD)

Models (also) trained on speech of the speakers themselves

Jackknife procedure: a semi-randomly selected part is the test set, the rest is the training set
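A jackknife procedure of this kind can be sketched as follows: the material is divided into parts, and each part serves once as test set while the remaining parts form the training set. The number of parts, the fixed random seed and the plain shuffle are assumptions for illustration; the slides only say that the test sets were selected semi-randomly.

```python
import random


def jackknife_splits(utterances, n_parts, seed=0):
    """Yield (training_set, test_set) pairs, one per held-out part.

    n_parts and seed are hypothetical parameters; every utterance ends up
    in exactly one test set.
    """
    shuffled = list(utterances)
    random.Random(seed).shuffle(shuffled)
    # Deal the shuffled utterances round-robin into n_parts parts.
    parts = [shuffled[i::n_parts] for i in range(n_parts)]
    for held_out, test_set in enumerate(parts):
        training_set = [
            u for k, part in enumerate(parts) if k != held_out for u in part
        ]
        yield training_set, test_set


# Usage: ten utterance ids divided into five parts.
splits = list(jackknife_splits(range(10), n_parts=5))
```

Rotating the test set this way lets all of a speaker's utterances be used for testing, even though there is little material per speaker.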

Speaker Dependent (SD): Results

Word Error Rates (WERs) for the whole test set, for different numbers of Gaussians (2N):

2N      0      2      4      8      16     32     64
DYS1    14.3   12.0   9.5    9.7    10.3   11.7   15.1
DYS2    7.5    4.1    2.9    2.4    3.0    3.8    5.3
REF1    3.4    2.2    1.8    2.6    3.5    4.0    4.2
REF2    3.6    2.4    2.8    3.0    3.3    3.9    4.4
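The table can be read per speaker to find the setting with the lowest WER. The sketch below does that lookup; the column labels are copied exactly as they appear in the table, and the names `GAUSSIANS`, `WER_BY_SPEAKER` and `best_setting` are illustrative.

```python
# Column labels exactly as they appear in the table above.
GAUSSIANS = [0, 2, 4, 8, 16, 32, 64]

WER_BY_SPEAKER = {
    "DYS1": [14.3, 12.0, 9.5, 9.7, 10.3, 11.7, 15.1],
    "DYS2": [7.5, 4.1, 2.9, 2.4, 3.0, 3.8, 5.3],
    "REF1": [3.4, 2.2, 1.8, 2.6, 3.5, 4.0, 4.2],
    "REF2": [3.6, 2.4, 2.8, 3.0, 3.3, 3.9, 4.4],
}


def best_setting(wers):
    """Return (column label, WER) for the lowest WER in one table row."""
    best = min(range(len(wers)), key=wers.__getitem__)
    return GAUSSIANS[best], wers[best]


best = {spk: best_setting(wers) for spk, wers in WER_BY_SPEAKER.items()}
```

For every speaker the lowest WER lies at one of the smaller settings and the WER rises again towards the largest ones, which fits the limited amount of training material per speaker.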

Word Error Rates (WERs) for SD recognition

[Figure: WERs for SD recognition per task (NUM, PFU, PMS, PRS)]

Word Error Rates (WERs) for SD / SI recognition:

       DYS1          DYS2          REF1        REF2
NUM    2.6 / 15.4    0.0 / 41.0    0.0 / 0.0   0.0 / 0.0
PFU    9.9 / 19.8    5.5 / 22.0    1.1 / 1.1   2.2 / 1.1
PMS    12.2 / 30.3   3.3 / 15.2    2.2 / 2.1   3.6 / 1.7
PRS    3.6 / 7.4     1.5 / 4.5     1.2 / 1.2   1.2 / 0.0
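One way to summarize the SD / SI pairs is the relative WER reduction obtained by going from SI to SD models. The sketch below computes it for the two dysarthric speakers using the values from the table; the helper name and the choice to return `None` when the SI WER is zero are illustrative assumptions.

```python
# (SD, SI) WER pairs for the dysarthric speakers, copied from the table.
WER_SD_SI = {
    "DYS1": {"NUM": (2.6, 15.4), "PFU": (9.9, 19.8),
             "PMS": (12.2, 30.3), "PRS": (3.6, 7.4)},
    "DYS2": {"NUM": (0.0, 41.0), "PFU": (5.5, 22.0),
             "PMS": (3.3, 15.2), "PRS": (1.5, 4.5)},
}


def relative_reduction(sd, si):
    """Relative WER reduction in percent from SI to SD; None if SI is 0."""
    if si == 0:
        return None  # no errors to reduce, the ratio is undefined
    return 100.0 * (si - sd) / si


for speaker, tasks in WER_SD_SI.items():
    for task, (sd, si) in tasks.items():
        print(f"{speaker} {task}: {relative_reduction(sd, si):.0f}% fewer errors")
```

A negative value would mean that the SD models did worse than the SI models, as happens for some REF entries in the table.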

Speaker Dependent (SD): Conclusions

For REF, results for SD are equal to or worse than for SI (trade-off: own models, but less training material)

For DYS, results for SD are much better than for SI

DYS2 better than DYS1, almost as good as REF

Conclusions: ASR for dysarthric speech

Results for DYS2 are remarkable:

SI: high WERs, esp. for NUM & PFU

SD: sometimes better than REF

Low speaking rate!

Automatic recognition of dysarthric speech is possible.

Better results with:

Lower speaking rate

Speaker-dependent models

Even better: also a speaker-dependent lexicon

Conclusions: ST & pathology

Applications:

Many already exist

Many more are possible

Needed:

Tailor-made, flexible applications

User tests, adequacy evaluation

References

http://lands.let.ru.nl/TSpublic/strik/pres/p97-SPACE.ppt

E. Sanders, M. Ruiter, L. Beijer, H. Strik (2002) Automatic recognition of Dutch dysarthric speech: A pilot study. ICSLP-2002, Denver, USA, pp. 661-664.

T. Rietveld & I. Stolte (2005) Taal- en spraaktechnologie en communicatieve beperkingen.
