Adaptation of orofacial clones to the morphology and control strategies of target speakers for speech articulation. Julián Andrés VALDÉS VARGAS. Jury: Michel DESVIGNES (President), Yves LAPRIE (Reviewer), Rudolph SOCK (Reviewer), Thierry LEGOU (Examiner), Pierre BADIN (Thesis Director).
Jan 03, 2016

Page 1: Adaptation of orofacial clones  to the morphology and control strategies

Adaptation of orofacial clones to the morphology and control strategies of target speakers for speech articulation

Julián Andrés VALDÉS VARGAS

Jury:

Michel DESVIGNES (President)

Yves LAPRIE (Reviewer)

Rudolph SOCK (Reviewer)

Thierry LEGOU (Examiner)

Pierre BADIN (Thesis Director)


Page 2: Adaptation of orofacial clones  to the morphology and control strategies

Summary

• Context of visual articulatory feedback
• Articulatory data
• Individual models and characterisation
• Multi-speaker models
• Conclusions and perspectives

Page 4: Adaptation of orofacial clones  to the morphology and control strategies

Context

• Mastery of the articulators for speech production
• Skill maintained and improved by the perception-action loop (Matthies et al., 1996)
• Feedback in speech
  – Auditory
  – Proprioceptive

Page 5: Adaptation of orofacial clones  to the morphology and control strategies

Vision of articulators

• Augmented speech: visual feedback
  – Display of articulators
• Vision of lips and face
  – Improves speech intelligibility (Sumby and Pollack, 1954)
  – Speech imitation is faster (Fowler et al., 2003)
• Vision of hidden articulators
  – Increases intelligibility (Badin et al., 2010)

Page 6: Adaptation of orofacial clones  to the morphology and control strategies

Visual articulatory feedback system

• System of visual articulatory feedback (Ben Youssef et al., 2011)
• Applications
  – Speech rehabilitation
  – Computer Aided Pronunciation Training (CAPT)

[Diagram: speech sound signal of a given speaker → visual articulatory feedback system → clone's animation]

Page 7: Adaptation of orofacial clones  to the morphology and control strategies

Problem of articulatory adaptation

• Animation of the clone is based on a single speaker
• Adaptation to several speakers

[Diagram labels: speech sound of speaker 1, speaker 2, ..., speaker n; visual articulatory feedback system; acoustic adaptation (Atef BEN YOUSSEF); animation based on the reference speaker; mismatch between the clone's animation and the real speakers; articulatory adaptation; animation based on the entry speaker]

Page 8: Adaptation of orofacial clones  to the morphology and control strategies

Inter-speaker variability

• Morphology
  – Different vocal tracts
    • Size, vertical / horizontal length ratios
    • Shape (e.g. concave / flat palates)
• Articulatory control strategies
  – To cope with their morphology, speakers use different articulatory strategies to achieve sounds considered equivalent for speech communication purposes

Page 9: Adaptation of orofacial clones  to the morphology and control strategies

Illustration of speaker differences

[Figure: articulations /a/, /i/ and /u/ for speakers PB, AA and YL]

Page 10: Adaptation of orofacial clones  to the morphology and control strategies

Objectives

• Articulatory adaptation (initial objective)
  • Normalization: extraction of common components (patterns) to control the articulators of several speakers
• To acquire knowledge about inter-speaker variability

Page 11: Adaptation of orofacial clones  to the morphology and control strategies

Summary

• Context of visual articulatory feedback
• Articulatory data
• Individual models and characterisation
• Multi-speaker models
• Conclusions and perspectives

Page 12: Adaptation of orofacial clones  to the morphology and control strategies

Articulatory data

• Type of data: articulatory data → building articulatory models
• Inter-speaker variability:
  • 11 French speakers (6 males and 5 females)
• Articulatory phonetic coverage:
  • 13 vowels
  • 10 consonants in 5 vocalic contexts (vowel-consonant-vowel)
  • 63 articulations in total (13 + 10 x 5)

Page 13: Adaptation of orofacial clones  to the morphology and control strategies

Recording methods

• Several recording methods considered:
  • X-ray (Meyer, 1907; Mosher, 1927)
    • Difficult to accurately identify the contours
  • Electro-Magnetic Articulography (EMA)
    • No recording of the whole vocal tract
  • Magnetic Resonance Imaging (MRI) (Rokkaku et al., 1986)
    • Tomographic (imaging by sections)
    • Maintained vocal tract positions
    • Speakers in supine position → gravitational effect is moderate (Engwall, 2003; 2006)

Page 14: Adaptation of orofacial clones  to the morphology and control strategies

Decision to use MRI

• Whole vocal tract information (unlike EMA)
• Contours easier to identify than with X-ray
• No health hazard, unlike X-ray
• Recording parameters:
  • Midsagittal image of the vocal tract
  • Slice thickness: 4 mm
  • Spatial resolution: 1 mm / pixel
  • Acquisition time: 8-16 seconds

Page 15: Adaptation of orofacial clones  to the morphology and control strategies

MRI recording

• The speaker is asked to go through several stages:
  • Speakers lie in supine position
  • Bed shifted into the MRI machine
  • Setting up of alignment recording properties
  • Maintained pronunciation of articulations for 8-16 seconds
  • Speakers are asked not to move their heads

Page 16: Adaptation of orofacial clones  to the morphology and control strategies

Processing of MRI

• Midsagittal contours manually edited
• Rigid contours are drawn once for a given speaker
• Positioning of the palate using skull bones as reference
  • Rotation and translation
• Positioning of the jaw by means of rototranslations
• Editing of deformable contours: lips, tongue, velum, etc.
• Palates of all articulations are aligned
  • Avoids the noise introduced by head movement

[Figure: processed contours for /a/, /i/ and /u/]
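The rototranslation used to position rigid contours can be illustrated with a least-squares rigid fit between two sets of 2D landmarks. This is only a sketch of the general technique (a 2D Kabsch fit): the slides describe manual editing, and the function names and landmark arrays below are hypothetical.

```python
import numpy as np

def rigid_align_2d(source, target):
    """Least-squares rotation R and translation t with source @ R.T + t ~ target.

    source, target: (n_points, 2) arrays of matching landmarks (hypothetical input).
    """
    src_mean = source.mean(axis=0)
    tgt_mean = target.mean(axis=0)
    # Cross-covariance of the centred landmark sets
    H = (source - src_mean).T @ (target - tgt_mean)
    U, _, Vt = np.linalg.svd(H)
    # Guard against reflections so that R is a proper rotation
    d = np.sign(np.linalg.det(Vt.T @ U.T))
    R = Vt.T @ np.diag([1.0, d]) @ U.T
    t = tgt_mean - R @ src_mean
    return R, t

def apply_rigid(points, R, t):
    """Apply the rotation and translation to an (n_points, 2) contour."""
    return points @ R.T + t
```

Once R and t are estimated from rigid landmarks, the same transform can be applied to every contour of that image, so that all articulations share one reference frame.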

Page 17: Adaptation of orofacial clones  to the morphology and control strategies

Contours modelled

• Upper tongue: 150 (x,y) points
• Lips: 100 (x,y) points
• Velum: 150 (x,y) points
• Static data → articulatory study/models

Page 18: Adaptation of orofacial clones  to the morphology and control strategies

Summary

• Context of visual articulatory feedback
• Articulatory data
• Individual models and characterisation
• Multi-speaker models
• Conclusions and perspectives

Page 19: Adaptation of orofacial clones  to the morphology and control strategies

Universal control parameters

• Extraction of a common set of patterns (components)
• Goals:
  – Building individual-speaker articulatory models
  – Controlling all individual articulatory models from a universal set of components

[Diagram, universal model: a universal set of components, driven by control parameters CP/a/, CP/i/, CP/u/ and combined with speaker-specific weights, yields the articulator contours (/a/, /i/, /u/) of Speaker 1 and Speaker 2]

[Diagram, individual articulatory models: separate models M_speaker1 and M_speaker2, each with its own components and control parameters CP/a/, CP/i/, CP/u/, yield the articulator contours (/a/, /i/, /u/) of the individual speakers]

Page 20: Adaptation of orofacial clones  to the morphology and control strategies

Method for individual models of speakers

• Principal component analysis (PCA)
  • Dimensionality reduction → extraction of orthogonal components
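As a minimal illustration of PCA used for dimensionality reduction, the sketch below extracts orthogonal components from a matrix of contour coordinates with an SVD. The data are synthetic and the array sizes (63 articulations, 300 coordinates) are only assumed from the figures quoted elsewhere in the slides; this is not the author's implementation.

```python
import numpy as np

def fit_pca(X, n_components):
    """X: (n_articulations, n_coordinates). Returns mean, components, scores."""
    mean = X.mean(axis=0)
    Xc = X - mean
    # Rows of Vt are orthonormal directions sorted by explained variance
    _, _, Vt = np.linalg.svd(Xc, full_matrices=False)
    components = Vt[:n_components]
    scores = Xc @ components.T          # control parameters of each articulation
    return mean, components, scores

def reconstruct(mean, components, scores):
    """Rebuild contours from control parameters."""
    return mean + scores @ components

# Synthetic (hypothetical) data: 63 articulations x 300 coordinates
rng = np.random.default_rng(0)
latent = rng.normal(size=(63, 4))
basis = rng.normal(size=(4, 300))
X = latent @ basis + 0.01 * rng.normal(size=(63, 300))

mean, components, scores = fit_pca(X, 4)
X_hat = reconstruct(mean, components, scores)
```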

Page 21: Adaptation of orofacial clones  to the morphology and control strategies

Assessment of models

• Evaluation of the model for an individual speaker X
  • Variance explanation
  • Root Mean Square Error (RMSE)

Page 22: Adaptation of orofacial clones  to the morphology and control strategies

Generalization properties of models

• Performance of models when reconstructing data that was not used for training
• Leave-one-out cross-validation procedure (a.k.a. jackknife)
• One observation is left out → the left-out observation is reconstructed by inverting the model
  → Validation of generalization properties
  → Valuable predictors retained
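The leave-one-out procedure can be sketched as follows for a plain PCA model: each observation is removed in turn, the model is fitted on the rest, and the left-out observation is projected onto the components and rebuilt. The function name and the use of plain (rather than guided) PCA are assumptions made for illustration.

```python
import numpy as np

def loocv_rmse(X, n_components):
    """Leave-one-out RMSE of a PCA model. X: (n_observations, n_coordinates)."""
    n = X.shape[0]
    sq_err = 0.0
    for i in range(n):
        train = np.delete(X, i, axis=0)
        mean = train.mean(axis=0)
        _, _, Vt = np.linalg.svd(train - mean, full_matrices=False)
        comps = Vt[:n_components]
        # "Inverting the model": recover the control parameters of the
        # left-out observation, then rebuild it from them
        params = (X[i] - mean) @ comps.T
        x_hat = mean + params @ comps
        sq_err += np.sum((X[i] - x_hat) ** 2)
    return np.sqrt(sq_err / X.size)
```

Comparing this value with the training RMSE for an increasing number of components gives one way to see where a model starts to overfit.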

Page 23: Adaptation of orofacial clones  to the morphology and control strategies

Individual tongue models

• Guided PCA model (Badin & Serrurier, 2006)
• 4 components extracted
• First component extracted by linear regression
  • Jaw Height (JH) is the predictor
  • The jaw has three degrees of freedom: x and y translation, and rotation (Edwards & Harris, 1990)
  • JH is the normalized value of the y-coordinate of the lower incisor (Badin & Serrurier, 2006)
  • Corr(Y, θ) ≈ 0.92

[Figure: lower incisor position (X, Y)]

Page 24: Adaptation of orofacial clones  to the morphology and control strategies

Individual tongue models

• The other 3 components are extracted by PCA from the residue:
  • Tongue Body (TB)
  • Tongue Dorsum (TD)
  • Tongue Tip (TT)
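The two-stage extraction (a regression on jaw height, then PCA on the residue) can be sketched as below. This is an illustrative reading of guided PCA under simple assumptions (one predictor, ordinary least squares); variable names are hypothetical and the code is not taken from the cited work.

```python
import numpy as np

def guided_pca(X, jh, n_pca=3):
    """X: (n_obs, n_coords) tongue coordinates; jh: (n_obs,) jaw height values."""
    mean = X.mean(axis=0)
    Xc = X - mean
    jh_z = (jh - jh.mean()) / jh.std()            # normalised predictor
    # Linear regression of every coordinate on JH (single predictor, no
    # intercept needed because both sides are centred)
    jh_loadings = (jh_z @ Xc) / (jh_z @ jh_z)     # shape (n_coords,)
    residue = Xc - np.outer(jh_z, jh_loadings)
    # Remaining components (e.g. TB, TD, TT) come from PCA of the residue
    _, _, Vt = np.linalg.svd(residue, full_matrices=False)
    pca_loadings = Vt[:n_pca]
    pca_scores = residue @ pca_loadings.T
    return mean, jh_z, jh_loadings, pca_scores, pca_loadings
```

By construction the PCA scores are uncorrelated with the jaw height predictor, which is what allows the first component to be read as the jaw's contribution alone.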

Page 27: Adaptation of orofacial clones  to the morphology and control strategies

Comparison between components

[Figure: percentage of variance explained by the JH component for speakers PB, YL, LH, RL, LD, BR, HL, AA, MG, AK and MGO]

• JH component:
  • Max. variance: LD
  • Min. variance: RL, MG, AK
  • Compensation strategy of MG
  • Y-Tongue = Coefficients_LR * JH

[Figure: comparison of slopes. Coefficients of the linear regression (JH vs. tongue vertex) along the tongue contour, from the back to the tongue tip, for speakers RL, LD, MG and AK]

• TB component:
  • Represents more variance than the other components
  • Horizontal/diagonal back-front movement
• TD component:
  • Vertical/diagonal arching movement
• TT component:
  • Used in different proportions according to the speaker
• Nomograms: graphical representation of components
  • Variation between -3 and 3

Variance explained by each component, as labelled on the nomograms:

Subject   JH       TB       TD       TT
LD        25.28%   31.47%   20.07%   10.79%
RL         4.31%   41.41%   22.36%   13.70%
AK         6.21%   35.92%   22.73%   16.65%

The labels of the first nomogram panel differ slightly for two speakers: AK (JH 4.23%, TB 36.68%, TD 23.39%, TT 16.92%) and LD (JH 26.11%, TB 29.40%, TD 20.02%, TT 12.06%).
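A nomogram of the kind mentioned on this slide can be produced by sweeping one control parameter over a range while the others stay at zero. The sketch below assumes that the range -3 to 3 is expressed in units of the parameter's standard deviation, which the slide does not state; names are hypothetical.

```python
import numpy as np

def nomogram(mean, loading, score_std=1.0, values=None):
    """Contours obtained by varying a single control parameter.

    mean:      (n_coords,) mean contour
    loading:   (n_coords,) loading vector of the component
    score_std: standard deviation of the control parameter (assumed unit)
    values:    parameter values to sweep, default -3 ... 3 in 7 steps
    """
    if values is None:
        values = np.linspace(-3.0, 3.0, 7)
    return np.array([mean + v * score_std * loading for v in values])
```

Each row of the result is one contour; plotting all rows on top of each other shows the movement that the component encodes.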

Page 28: Adaptation of orofacial clones  to the morphology and control strategies

Individual lips models

• 3 components extracted by guided PCA model (Badin et al., 2012)
• Jaw Height
  • More influence on the lower lip (LL) than on the upper lip (UL)
  • Little influence on UL for speaker RL
• Protrusion
  • ULP > LLP for speaker LD
  • LLP > ULP for speaker RL
• Lip height
  • ULH > LLH for all speakers except speaker LD

Variance explained on each lip, as labelled on the nomograms of speakers RL and LD:

Speaker   Component    Upper lip   Lower lip
RL        JH            1.68%      30.90%
RL        Protrusion   21.93%      34.85%
RL        Lip height   55.03%      20.51%
LD        JH           25.19%      44.59%
LD        Protrusion   52.74%      28.64%
LD        Lip height   12.75%      15.36%

Page 29: Adaptation of orofacial clones  to the morphology and control strategies

Individual velum models

• 2 components extracted by PCA (Serrurier & Badin, 2008):
  • Velum levator (oblique movement) - VL
  • Superior pharyngeal constrictor (horizontal movement) - VS

[Figure: velum nomograms for speaker AA, panels labelled VL and VS. PCA-1: 77.67 % of variance; PCA-2: 17.32 % of variance]

Page 30: Adaptation of orofacial clones  to the morphology and control strategies

Individual velum models: consonant /ʁ/

[Figure: scatter plots of PCA-1 vs PCA-2 (labelled VL and VS) for speakers AA and HL, one point per articulation of the corpus, with /ʁa/ highlighted]

Page 31: Adaptation of orofacial clones  to the morphology and control strategies

Conclusions: individual models

• Tongue PCA models: 4 components (JH, TB, TD, TT)
  • Variance explained: 93%, RMSE: 0.13 cm
• Lip models: 3 components (JH, protrusion, height)
  • Variance explained: 94%, RMSE: 0.04 cm
• Velum models: 2 components (VL, VS)
  • Variance explained: 90%, RMSE: 0.08 cm

Page 32: Adaptation of orofacial clones  to the morphology and control strategies

Summary

• Context of visual articulatory feedback
• Articulatory data
• Individual models and characterisation
• Multi-speaker models
• Conclusions and perspectives

Page 33: Adaptation of orofacial clones  to the morphology and control strategies

Literature on multi-speaker models

• PARAFAC models: 2 components extracted
• Studies based on EMA (Hoole, 1998; Geng, 2000; Hu, 2006)
  • 6-7 speakers, 10-15 vowels, 3-4 sensors on the tongue, 80%-96% of variance explained
• Study based on X-ray (Harshman, 1977)
  • 5 speakers, 10 vowels, 13 points, 92.7% of variance explained
• Studies based on MRI (Hoole, 2000; Zheng, 2003; Ananth, 2010)
  • 3-9 speakers, 7-13 vowels, 13-150 points, 71%-87% of variance explained

Page 34: Adaptation of orofacial clones  to the morphology and control strategies

Multi-speaker decomposition methods

• Extraction of a common set of components
• PARAFAC (Harshman, 1970): three-way factor analysis, diagonal speaker adaptation matrix
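For reference, the PARAFAC model is commonly written slice by slice as below. The notation is an assumption introduced here, not taken from the slides: X_k is the articulations-by-coordinates data matrix of speaker k, A holds the component scores shared by all speakers, B holds the shared loadings, and D_k is the diagonal matrix of speaker-specific weights (the "diagonal speaker adaptation matrix").

```latex
X_k = A \, D_k \, B^{\top} + E_k, \qquad k = 1, \dots, K
```

Because D_k is diagonal, each speaker can only rescale the common components; no speaker-specific mixing between components is allowed.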

Page 35: Adaptation of orofacial clones  to the morphology and control strategies

Multi-speaker decomposition methods

• TUCKER 3
  • Extension of PARAFAC
  • Decomposition in all modes of variation
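In its usual element-wise form, the Tucker3 model reads as follows. The symbols are assumptions introduced here: a, b and c are the loadings of the three modes (for example articulations, coordinates and speakers) and g is the core array that lets components of different modes interact.

```latex
x_{ijk} = \sum_{p=1}^{P} \sum_{q=1}^{Q} \sum_{r=1}^{R} g_{pqr} \, a_{ip} \, b_{jq} \, c_{kr} + e_{ijk}
```

PARAFAC is recovered as the special case P = Q = R with a core that is zero everywhere except on its superdiagonal.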

Page 36: Adaptation of orofacial clones  to the morphology and control strategies

Multi-speaker decomposition methods

• Joint PCA: two-way analysis adapted to multi-speaker models (Ananthakrishnan et al., 2010, KTH, Sweden)
  • The articulatory measurements of all speakers for one phoneme are considered as one set of data
  • This forces common components
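One way to read the joint PCA idea is sketched below: for each articulation, the coordinates of all speakers are placed side by side in a single row, and an ordinary PCA is run on the resulting matrix, so that every speaker shares the same scores. The array layout and function name are assumptions made for illustration, not the published implementation.

```python
import numpy as np

def joint_pca(data, n_components):
    """data: (n_speakers, n_articulations, n_coords) array of contours."""
    n_spk, n_art, n_coord = data.shape
    # One row per articulation, all speakers' coordinates side by side
    stacked = np.transpose(data, (1, 0, 2)).reshape(n_art, n_spk * n_coord)
    mean = stacked.mean(axis=0)
    _, _, Vt = np.linalg.svd(stacked - mean, full_matrices=False)
    loadings = Vt[:n_components]               # speaker-specific blocks of loadings
    scores = (stacked - mean) @ loadings.T     # control parameters shared by all speakers
    recon = (mean + scores @ loadings).reshape(n_art, n_spk, n_coord)
    return scores, loadings, np.transpose(recon, (1, 0, 2))
```

The scores have one row per articulation and no speaker dimension, which is the sense in which the decomposition forces common components.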

Page 37: Adaptation of orofacial clones  to the morphology and control strategies

Comparison of performance between methods

• RMSE and Variance Explained (VarEx)
  • Multi-speaker models (red, green, black) vs.
  • Average of individual speakers' models (blue)

[Figure, VarEx panel: average variance explained vs. number of components (2 to 20) for Average PCA, Joint PCA, PARAFAC and TUCKER]

[Figure, RMSE panel: average RMS error in cm vs. number of components (2 to 20) for Average PCA, Joint PCA, PARAFAC and TUCKER]

Page 38: Adaptation of orofacial clones  to the morphology and control strategies

Multi-speaker tongue models

• Reference PCA model with 4 components
  • Total number of components: 11 x 4 = 44
• Student's t-test on the RMSE at the 5% significance level, to determine whether the RMSEs of the models are significantly different from each other
  • Joint PCA: 14-21 components (TUCKER)
  • PARAFAC: 21 components

[Figure, VarEx panel: average variance explained vs. number of components (2 to 20) for Average PCA, Joint PCA, PARAFAC and TUCKER]

[Figure, RMSE panel: average RMS error in cm vs. number of components (2 to 20) for Average PCA, Joint PCA, PARAFAC and TUCKER]

Page 39: Adaptation of orofacial clones  to the morphology and control strategies

Multi-speaker tongue models

• Individual models:
  • Reference PCA model with 44 (11 x 4) components
    • VarEx: 93.23 %
    • RMSE: 0.13 cm
• Multi-speaker models:
  • Joint PCA with 4 components
    • VarEx: 72.16 %
    • RMSE: 0.27 cm
    • Interpretation of components: JH, TB, TD and TT
  • Equivalent solution: Joint PCA, 21 components
    • VarEx: 94.88 %
    • RMSE: 0.12 cm
    • Lack of interpretation from the 5th component onwards

Comparison with the literature:

• Literature: 2 components, VarEx 71%-96%, corpus of 7-15 vowels, 3-9 speakers
• Present study: corpus of 63 articulations (vowels and consonants), 11 speakers

Page 40: Adaptation of orofacial clones  to the morphology and control strategies

Multi-speaker models: lips and velum

• Lips and velum models are comparable with the tongue models
• Lips
  • Individual models: 33 components (3 x 11)
  • Multi-speaker joint PCA models: equivalent with 21 components
  • Reduced no. of components: 3 interpretable components (JH, protrusion, lip height)
• Velum
  • Individual models: 22 components (2 x 11)
  • Multi-speaker joint PCA models: equivalent with 14 components
  • Reduced no. of components: 2 components (oblique, horizontal)

Page 41: Adaptation of orofacial clones  to the morphology and control strategies

Summary

• Context of visual articulatory feedback
• Articulatory data
• Individual models and characterisation
• Multi-speaker models
• Conclusions and perspectives

Page 42: Adaptation of orofacial clones  to the morphology and control strategies

Conclusions

• Data
  • Unique set of articulatory data for French
  • MRI of the whole vocal tract for 11 French speakers
  • Contours
  • Vowels and consonants
  • More speakers compared to the literature
• Characterisation of different speakers' strategies
  • Tongue
  • Upper and lower lip
  • Velum
• Multi-speaker models (normalisation) of tongue, lips and velum contours
  • No work in the literature on lips and velum

Page 43: Adaptation of orofacial clones  to the morphology and control strategies

Perspectives

• More speakers
• Relation between articulatory strategies and acoustics
• Cross-speaker velum variability
  • Influence of the tongue movement
  • Nasality
• New modelling solutions
  • Non-linear methods:
    • Kernel PCA
    • Artificial Neural Networks (ANN)
    • Support Vector Machines (SVM)

Page 44: Adaptation of orofacial clones  to the morphology and control strategies

Acknowledgments

• Laurent Lamalle (IRMaGe, Grenoble)
• Speakers
• ARTIS project (GIPSA-lab, LORIA)

Page 45: Adaptation of orofacial clones  to the morphology and control strategies

Thank you for your attention

Questions?

Page 46: Adaptation of orofacial clones  to the morphology and control strategies

Grid system

• Maeda S. (1979): fixed grid
• Busset J. (2013): adaptive grid system
  • Euclidean coordinates (intersections)
  • Distances and extreme angles
  • Polar coordinates (distances and angles for each grid line)
• Beautemps et al. (2001): adapted to each articulation
  • Euclidean coordinates
  • Distances and TngAdv + TngBot

Page 47: Adaptation of orofacial clones  to the morphology and control strategies

Corr(Y-jaw, Angle_rotation)

Speaker   Correlation
PB        0.6611
YL        0.7385
LH        0.7174
RL        0.3946
LD        0.8423
BR        0.7764
HL        0.7913
AA        0.4952
MG        0.4151
AK        0.8317
MGO       0.9228

[Figure: lower incisor position (X, Y)]

Page 48: Adaptation of orofacial clones  to the morphology and control strategies

Acoustic simulation

• Grid system
• Midsagittal function → vocal tract area function (series of areas and lengths of each sagittal section)
  • α, β models (Beautemps et al., 1995; Heinz & Stevens, 1965)
  • A = area of a given grid section, d = midsagittal distance
  • α, β: coefficients depending on subject and vocal tract location
  • α, β taken from the reference speaker: PB
• Vocal tract acoustic transfer function (Fant, 1960; Badin & Fant, 1984)
• Formants
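The α, β models cited here are usually written as a power law, A = α · d^β, applied section by section. The sketch below shows that conversion; the coefficient values are invented for illustration and are not the ones estimated for speaker PB.

```python
import numpy as np

def sagittal_to_area(d, alpha, beta):
    """Power-law conversion A = alpha * d**beta, applied elementwise.

    d:     midsagittal distances of the grid sections (cm)
    alpha: per-section coefficient (hypothetical values below)
    beta:  per-section exponent (hypothetical values below)
    """
    d = np.asarray(d, dtype=float)
    return alpha * d ** beta

# Hypothetical example with three grid sections
d = np.array([0.5, 1.0, 2.0])
alpha = np.array([1.8, 1.5, 1.6])
beta = np.array([1.2, 1.4, 1.5])
areas = sagittal_to_area(d, alpha, beta)   # areas in cm^2
```

The resulting series of areas, together with the section lengths, forms the area function from which the acoustic transfer function and formants are computed.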

Page 49: Adaptation of orofacial clones  to the morphology and control strategies

No. of coefficients by method

[Figure: number of coefficients (axis scaled by 10^5) vs. number of components (2 to 20) for PCA, Joint PCA, PARAFAC and TUCKER]

Page 50: Adaptation of orofacial clones  to the morphology and control strategies

"Essentially, all models are wrong, but some are useful"

George Edward Pelham Box

Page 52: Adaptation of orofacial clones  to the morphology and control strategies

Generalisation

Page 53: Adaptation of orofacial clones  to the morphology and control strategies

Articulatory data

• Estimation of non-visible landmarks (tongue tip and jaw attachment)
  • Computed as the average position over the articulations in which the landmark is distinguishable

[Figure: examples of a non-distinguishable tongue tip and a non-distinguishable jaw attachment]

Page 54: Adaptation of orofacial clones  to the morphology and control strategies

State of the art on articulatory normalisation

• Articulatory normalisation based on linear decomposition methods
  • PARAFAC tongue models, 2 components extracted
  • Data: 7-15 vowels, 3-9 speakers
  • Performance: 71%-96% of variance explained
• Geometric normalisation
  • Scaling transformations → do not normalise the articulatory control strategies employed by different speakers
• Challenge
  • Modelling of other contours such as lips and velum
  • Extension to consonants

Page 55: Adaptation of orofacial clones  to the morphology and control strategies

Linear regression between pairs of speakers

• Prediction of the PCA control parameters of a target speaker (π_TS) from the PCA control parameters of a source speaker (π_SS)
  • Multi-linear regression: π_TS is predicted from a weighted sum of the source parameters π_SS,i, for i = 1, ..., n_cmp
• Overfitted from the 10th component onwards (LOOCV)
• With 10 components: 64.32 % of variance explained, RMSE of 0.37 cm

[Figure, VarEx panel: variance explained of the prediction of speaker PB vs. number of components (2 to 20), for the PCA model of PB and for predictions from YL, LH, RL, LD, BR, HL, AA, MG, AK and MGO]

[Figure, RMSE panel: RMS error in cm of the prediction of speaker PB vs. number of components (2 to 20), same curves]
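The regression between speakers can be sketched as an ordinary least-squares fit from the source speaker's control parameters to the target speaker's. The intercept term and the function names are assumptions added for illustration; the slide does not give these details.

```python
import numpy as np

def fit_speaker_mapping(src_params, tgt_params):
    """Least-squares map from source to target control parameters.

    src_params: (n_articulations, n_cmp) control parameters of the source speaker
    tgt_params: (n_articulations, n_tgt) control parameters of the target speaker
    Returns a (n_cmp + 1, n_tgt) coefficient matrix; the last row is the intercept.
    """
    n = src_params.shape[0]
    design = np.hstack([src_params, np.ones((n, 1))])
    coeffs, *_ = np.linalg.lstsq(design, tgt_params, rcond=None)
    return coeffs

def predict_target(src_params, coeffs):
    """Predict target control parameters from source control parameters."""
    n = src_params.shape[0]
    return np.hstack([src_params, np.ones((n, 1))]) @ coeffs
```

With only 63 articulations per speaker, the number of fitted coefficients grows quickly with the number of components, which is consistent with the overfitting reported beyond 10 components.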

Page 56: Adaptation of orofacial clones  to the morphology and control strategies

Individual tongue models

[Figure: percentage of variance explained by each component (JH, TB, TD, TT) for speakers PB, YL, LH, RL, LD, BR, HL, AA, MG, AK and MGO]

Variance explained by each component, as labelled on the nomograms:

Subject   JH       TB       TD       TT
LD        25.28%   31.47%   20.07%   10.79%
RL         4.21%   22.84%   41.23%   13.50%
AK         6.21%   35.92%   22.73%   16.65%

Individual tongue models: synergy jaw-tongue

• Max; Min ≈ speakers RL, MG, AK

[Figure: comparison of slopes. Coefficients of the linear regression (JH vs. tongue vertex) against the y-coordinate of the tongue contour, from the back to the tongue tip, for speakers RL, LD, MG and AK]

Page 57: Adaptation of orofacial clones  to the morphology and control strategies

Assessment of models

• Evaluation of the model for an individual speaker X
  • Variance explanation:

    \mathrm{VARIANCE}(X) = \frac{\sum_{1}^{n} \sum_{1}^{m} (X_i - \bar{X}_i)^2}{n \cdot m}

    \mathrm{VARIANCE\_EXPLAINED}(X, X_p) = \frac{\mathrm{VARIANCE}(X_p)}{\mathrm{VARIANCE}(X)}

  • Root Mean Square Error (RMSE):

    \mathrm{RMSE} = \sqrt{\frac{\sum_{1}^{n} \sum_{1}^{m} (X_i - X_{i\_predicted})^2}{n \cdot m}}

X_p = predicted speaker data, n = number of observations, m = number of articulator measurements
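The two criteria can be computed directly from the data and the model's predictions. The sketch below follows the definitions on this slide, with one assumption stated explicitly: the mean subtracted in the variance is taken per measurement (per column), which the garbled original does not make clear.

```python
import numpy as np

def variance(X):
    """Mean squared deviation over all n x m entries.

    Assumption: the mean is taken per articulator measurement (column).
    """
    return np.mean((X - X.mean(axis=0)) ** 2)

def variance_explained(X, X_pred):
    """Ratio of the variance of the predicted data to that of the original data."""
    return variance(X_pred) / variance(X)

def rmse(X, X_pred):
    """Root mean square error over all n x m entries (same unit as X, e.g. cm)."""
    return np.sqrt(np.mean((X - X_pred) ** 2))
```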

Page 58: Adaptation of orofacial clones  to the morphology and control strategies

Multi-speaker models: lips and velum

• Lips and velum models are comparable with the tongue models
• Lips
  • Individual models: 33 components (3 x 11)
  • Multi-speaker joint PCA models: 21 components
  • Reduced no. of components: 3 interpretable components
• Velum
  • Individual models: 22 components (2 x 11)
  • Multi-speaker joint PCA models: 14 components
  • Reduced no. of components: 2 components

Contour        Average PCA                        Joint PCA (Student's t-test)     Joint PCA (reduced no. of components)
               No. comp.     VarEx    RMSE        No. comp.   VarEx    RMSE        No. comp.   VarEx    RMSE
Upper tongue   44 (4 x 11)   93.23%   0.13 cm     21          94.88%   0.12 cm     4           72.16%   0.27 cm
Upper lip      33 (3 x 11)   94.89%   0.03 cm     21          96.67%   0.03 cm     3           74.28%   0.08 cm
Lower lip      33 (3 x 11)   94.50%   0.05 cm     21          96.85%   0.04 cm     3           69.26%   0.15 cm
Velum          22 (2 x 11)   90%      0.08 cm     14          94.20%   0.07 cm     2           76.01%   0.14 cm