1 VIU Seminar 14. - 17. April 2009 Alcoholized Speech: F0 and Rhythm Florian Schiel Bavarian Archive for Speech Signals Institute of Phonetics and Speech Processing Ludwig-Maximilians-Universität München, Germany Special Thanks to: Chr. Heinrich, S. Barfüßer, I. Dhillon, Prof. Th. Gilg, RaOLG Tourneur


Alcoholized Speech: F0 and Rhythm

Florian Schiel
Bavarian Archive for Speech Signals
Institute of Phonetics and Speech Processing
Ludwig-Maximilians-Universität München, Germany

Special Thanks to: Chr. Heinrich, S. Barfüßer, I. Dhillon, Prof. Th. Gilg, RaOLG Tourneur


Overview

- Motivation, Goals and Earlier Work
- ALC Corpus
- F0 Analysis
- Rhythm Analysis
- Discussion: Prosodic Features for ALC


Motivation

Why is alcoholized speech interesting?

Phonetic forensics:
- Speaker identification from alcoholized speech samples
- Determine alcoholization from air or marine radio traffic recordings (for example, the Exxon Valdez grounding in 1989)
- Traffic accidents: determine alcoholization from in-car recordings if blood samples are not available


Motivation

Why is alcoholized speech interesting?

Speech production:
- How does intoxication influence planning and motor control?

Speech perception:
- Can listeners judge alcoholization from a speech sample?
- Which features do listeners use for their judgement?


Motivation

Why is alcoholized speech interesting?

Traffic security:
- Can a voice-controlled car judge the alcoholization of its driver (and then take measures)?

OnFocus / OffFocus


Motivation

What has been done already?
- Forensic studies (2)
- Perception studies (3)
- Phonetic features (10)
- Recognition (2)

Common problems:
- mostly male speakers
- number of speakers is low (<40), statistics not valid
- intoxication measured by breath alcohol concentration (BRAC)
- lab speech ('The North Wind and the Sun' etc.)
- results partly contradictory


Motivation

What features have been investigated?
- F0 parameters
- formant parameters
- RMS / loudness
- spectral tilt of signal or source signal
- speech rate parameters
- pause length, number
- mispronunciations: deletions, insertions, repairs, stutter
- errors in phonetic gestures
  - incomplete gestures (measurement?)
  - lateralisation /r/ -> /l/ (measurement?)
  - shift of place /s/ -> /S/ or /s/ -> /T/
  - nasalisation, de-nasalisation (?)


Motivation

... and what has not been investigated?
- dysfluencies
- centralisation of vowels
- rhythm
- prosodic contours
- female speech
- 'outside the lab' speech
- command & control speech
- dialogue speech
- statistically valid data (>100 speakers, >2 million phonemes)

... so let's do it! (Yes, we can!)


Motivation

Our goals:
- verify/falsify reported findings on a larger database
- check for rhythm parameters
- check for prosodic contours (with Uwe's help?)
- check for centralization of vowels
- check for 'linguistic irregularities'
- check for gender / age / speech type influences
- check on sober control group
- perception experiments: what features are important?

Help wanted!


ALC Corpus

- alcoholization experiments at the Institute of Legal Medicine
- blood alcohol concentration: 0.05 – 0.2%
- breath sample (BRAC) and blood sample test (BAC)
- 15 minutes recording in two cars, SpeechRecorder, 2 mics
- read, monologue, dialogue, command & control (with engine)
- annotation: SpeechDat extended by Verbmobil tags
- export into BAS Partitur Format, canonical pronunciation by BALLOON, MAUS segmentation
- import into Emu hierarchy, F0, formants, RMS
- analysis using R


ALC Corpus

Examples


ALC Corpus

Time line and estimates

Nov 2007 | 2008 | 2009

- First recordings
- 14 speakers recorded
- LREC 2008
- First contact with Legal Medicine
- 61 speakers recorded
- 82 speakers recorded
- First F0 analysis
- Rhythm features analysis
- Analysis of irregularities
- First perception tests
- 150 speakers recorded
- DFG application

75 female + 75 male speakers, age 22 – 75, BAC 0.00 - 0.20%


ALC Corpus

Problems
- 2nd sober recording: loss rate of 20%
- MAUS segmentation of dialogues unreliable
  solution: pre-segmentation into speaker and non-speaker parts, then MAUS on each speaker part
- gender balance: we need more male speakers
- age balance: very few speakers above 50


Analysis

RM-ANOVA requires one measurement per speaker and within-factor combination.

between-factors: sex, age, (drinking habits)
within-factors: alc, speech type, (content, car noise)

Definition: utterance group (UG): all utterances of one speaker and one within-factor combination

Example:

UG(speaker=006, alc=a, type=spont) = 3 monologues, 2 dialogues and 5 spontaneous commands
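The UG definition above can be sketched as a simple grouping step. This is only an illustration: the record fields, the item names and the label "na" for the sober condition are assumptions of the sketch, not conventions of the ALC corpus.

```python
from collections import defaultdict

# Hypothetical utterance records; field names, item names and the "na"
# (sober) label are assumptions of this sketch.
utterances = [
    {"speaker": "006", "alc": "a", "type": "spont", "item": "monologue_1"},
    {"speaker": "006", "alc": "a", "type": "spont", "item": "dialogue_1"},
    {"speaker": "006", "alc": "a", "type": "read", "item": "sentence_1"},
    {"speaker": "006", "alc": "na", "type": "spont", "item": "monologue_1"},
]

def utterance_groups(records, within=("alc", "type")):
    """Map (speaker, within-factor levels...) to the utterances of that UG."""
    groups = defaultdict(list)
    for rec in records:
        key = (rec["speaker"],) + tuple(rec[factor] for factor in within)
        groups[key].append(rec)
    return dict(groups)

ugs = utterance_groups(utterances)
for key, members in sorted(ugs.items()):
    print(key, len(members))
```

Each key then yields exactly one measurement, which is the shape RM-ANOVA expects.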


F0 Analysis

F0 from Schäfer-Vincent pitch period detector (Emu)

1. F0 Median Fm over utterance group (UG)
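A minimal sketch of the median over an utterance group, assuming the F0 tracks are lists of frame values in Hz and that unvoiced frames are coded as 0 (both assumptions; the function name is hypothetical):

```python
from statistics import median

def f0_median(ug_tracks):
    """Median F0 (Hz) over all voiced frames of one utterance group.

    ug_tracks: one F0 track per utterance. Frames with value 0 are treated
    as unvoiced (an assumption of this sketch) and skipped.
    """
    voiced = [f for track in ug_tracks for f in track if f > 0]
    if not voiced:
        raise ValueError("utterance group contains no voiced frames")
    return median(voiced)

# Two made-up tracks belonging to the same UG.
fm = f0_median([[0.0, 210.0, 215.0, 0.0], [220.0, 0.0, 205.0, 225.0]])
print(fm)
```

Pooling all frames before taking the median weights long utterances more heavily than short ones; a median of per-utterance medians would be an alternative design.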


F0 Analysis

2. F0 quarter-quantile distances Fqq over UG
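A possible reading of Fqq, sketched under the assumption that "quarter-quantile distance" means the distance between the first and third quartile of the voiced F0 values (the slides do not spell out the definition, and the function name is hypothetical):

```python
from statistics import quantiles

def f0_quartile_distance(voiced_f0):
    """Distance between the 3rd and 1st quartile of voiced F0 values (Hz).

    Interpreting Fqq as the interquartile range is an assumption.
    """
    q1, _q2, q3 = quantiles(voiced_f0, n=4, method="inclusive")
    return q3 - q1

fqq = f0_quartile_distance([140.0, 100.0, 120.0, 110.0, 130.0])
print(fqq)
```

Like the median, this spread measure is robust against octave errors of the pitch detector, which may be why quantile-based values were preferred over mean and standard deviation.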


F0 Analysis

3. F0 in lexically accented vowels /a: e: E: i: u: o:/ in same context in read speech

22 female / 24 male speakers

Results:

Median and quarter-quantile distance of F0 behave like global values, with the following exceptions:
- no significant increase of Fm for male speakers in back vowels /o:/ and /u:/
- no significant increase of Fqq in /a:/, /o:/ and /u:/


F0 Analysis

4. F0 change per speaker

read speech: Fm(alc) – Fm(non-alc)

45 female, 37 male
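The per-speaker difference can be sketched as follows. Speaker IDs and Hz values are invented for illustration, and skipping speakers who lack one of the two recordings is an assumption of the sketch:

```python
def f0_change_per_speaker(fm_alc, fm_sober):
    """Return Fm(alc) - Fm(non-alc) in Hz for speakers present in both dicts."""
    return {spk: fm_alc[spk] - fm_sober[spk] for spk in fm_alc if spk in fm_sober}

# Made-up median F0 values per speaker; "023" has no sober recording.
fm_alc = {"006": 221.0, "017": 118.5, "023": 190.0}
fm_sober = {"006": 215.0, "017": 120.0}

delta = f0_change_per_speaker(fm_alc, fm_sober)
print(delta)
```

A positive value means the speaker's median F0 was higher when intoxicated.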


F0 Analysis

5. Hypothesis: F0 + energy contours differ

Example:

- simple declarative sentences with a single phrase
- calculate F0 by Schäfer-Vincent
- linearly interpolate F0 gaps
- calculate 2nd (tilt) and 3rd (curvature) coefficients of Discrete Cosine Transform (DCT)
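The interpolation and DCT steps can be sketched in plain Python. Two things are assumptions here: unvoiced frames are coded as 0, and the DCT-II is scaled so that coefficient 0 equals the mean of the contour. The scaling used for the slides is not stated, so absolute values from this sketch need not match the numbers shown in the talk.

```python
import math

def interpolate_gaps(f0):
    """Fill unvoiced frames (coded as 0, an assumption) by linear interpolation.

    Gaps at the edges are filled with the nearest voiced value.
    """
    voiced = [i for i, v in enumerate(f0) if v > 0]
    if not voiced:
        raise ValueError("no voiced frames")
    out = list(f0)
    for i in range(len(f0)):
        if f0[i] > 0:
            continue
        left = max((j for j in voiced if j < i), default=None)
        right = min((j for j in voiced if j > i), default=None)
        if left is None:
            out[i] = f0[right]
        elif right is None:
            out[i] = f0[left]
        else:
            w = (i - left) / (right - left)
            out[i] = f0[left] + w * (f0[right] - f0[left])
    return out

def dct_coefficients(x, n_coef=3):
    """First n_coef DCT-II coefficients, scaled so that coefficient 0 is the mean."""
    n = len(x)
    coefs = []
    for k in range(n_coef):
        s = sum(v * math.cos(math.pi * k * (i + 0.5) / n) for i, v in enumerate(x))
        coefs.append(s / n if k == 0 else 2.0 * s / n)
    return coefs

# Made-up falling contour with unvoiced stretches.
track = [0.0, 240.0, 0.0, 0.0, 210.0, 200.0, 0.0]
filled = interpolate_gaps(track)
bias, tilt, curvature = dct_coefficients(filled)
print(filled, bias, tilt, curvature)
```

With this sign convention a falling contour gives a positive second coefficient, and a flat contour gives zero tilt and zero curvature.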


F0 Analysis

blue: raw F0
red: linear interpolation
green: DCT coefficients 0-2

DCT-0 = 313.62 (bias), DCT-1 = 35.73 (tilt), DCT-2 = -0.93 (curvature)

DCT-0 = 338.17 (bias), DCT-1 = 31.01 (tilt), DCT-2 = -3.92 (curvature)


F0 Analysis

2-dim. plot of DCT-1 (tilt) vs. DCT-2 (curvature)

-> centroids identical, variation increases for alcoholized speech


Rhythm

Rhythm in this context: the segmental structure of V, C and P clusters

syllable nuclei = middle of V cluster
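A sketch of how clusters and nuclei might be derived from a segmentation. The tuple layout (class, start, end in seconds) and the reading of P as pause are assumptions; the segment times are invented.

```python
from itertools import groupby

def clusters(segments):
    """Merge adjacent segments of the same class into (cls, start, end) clusters."""
    out = []
    for cls, run in groupby(segments, key=lambda seg: seg[0]):
        run = list(run)
        out.append((cls, run[0][1], run[-1][2]))
    return out

def syllable_nuclei(cluster_list):
    """Time points of syllable nuclei: the middle of every V cluster."""
    return [(start + end) / 2 for cls, start, end in cluster_list if cls == "V"]

# Made-up segmentation: (class, start, end) with class in {"V", "C", "P"}.
segments = [
    ("P", 0.0, 0.2), ("C", 0.2, 0.3), ("V", 0.3, 0.5), ("C", 0.5, 0.6),
    ("C", 0.6, 0.7), ("V", 0.7, 0.8), ("V", 0.8, 1.0), ("P", 1.0, 1.5),
]
cl = clusters(segments)
nuclei = syllable_nuclei(cl)
print(cl)
print(nuclei)
```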


Rhythm

Rhythm features

Two basic types of measurements:
- counts (normalized over time or by number of syllables) or proportions, calculated across the UG -> one measurement per UG: <feature>
- multiple measurements (e.g. per syllable) averaged across the UG, usually expressed as mean (.m) and standard deviation (.sd) -> two values per UG: <feature>.m, <feature>.sd

Usually the initial and final silence intervals of a recording are disregarded.
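The second measurement type and the silence trimming can be sketched like this. The cluster tuples (class, start, end) are a hypothetical representation, and using the sample standard deviation rather than the population one is an assumption:

```python
from statistics import mean, stdev

def trim_edge_silence(cluster_list):
    """Drop a pause cluster at the very start and/or end of one recording."""
    trimmed = list(cluster_list)
    if trimmed and trimmed[0][0] == "P":
        trimmed = trimmed[1:]
    if trimmed and trimmed[-1][0] == "P":
        trimmed = trimmed[:-1]
    return trimmed

def mean_and_sd(values):
    """Return (<feature>.m, <feature>.sd) for measurements pooled over a UG."""
    return mean(values), stdev(values)

# Made-up clusters: (class, start, end) in seconds.
recording = [
    ("P", 0.0, 0.4), ("V", 0.4, 0.6), ("P", 0.6, 0.9),
    ("V", 0.9, 1.2), ("P", 1.2, 2.0),
]
inner = trim_edge_silence(recording)
feature_m, feature_sd = mean_and_sd([0.2, 0.3, 0.4])
print(inner, feature_m, feature_sd)
```

Only the edge pauses are removed; the pause between the two V clusters stays and still counts as a silence interval.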


Rhythm

Rhythm feature overview

Voicing
- %V: proportion (time) of voiced signal

Speech rate
- sylrate: number of syllables (nuclei) per sec

Silence intervals
- ps-persyl: number of short pauses (<1 sec) per syllable
- ps-persec: number of short pauses per sec
- pl-persyl: number of long pauses (>1 sec) per syllable
- pl-persec: number of long pauses per sec
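A sketch of the count-type rhythm features (%V, sylrate and the pause counts) for a single recording. Assumptions: clusters are (class, start, end) tuples with edge silence already removed, every V cluster counts as one syllable, and a pause of exactly 1 sec is counted as long, since the slides leave that case open.

```python
def rhythm_counts(cluster_list, long_pause=1.0):
    """Count-type rhythm features for one recording."""
    total = cluster_list[-1][2] - cluster_list[0][1]
    n_syl = sum(1 for cls, _, _ in cluster_list if cls == "V")
    if total <= 0 or n_syl == 0:
        raise ValueError("need a non-empty recording with at least one V cluster")
    voiced = sum(end - start for cls, start, end in cluster_list if cls == "V")
    pauses = [end - start for cls, start, end in cluster_list if cls == "P"]
    n_short = sum(1 for d in pauses if d < long_pause)
    n_long = len(pauses) - n_short  # pauses of exactly long_pause count as long
    return {
        "%V": voiced / total,
        "sylrate": n_syl / total,
        "ps-persyl": n_short / n_syl,
        "ps-persec": n_short / total,
        "pl-persyl": n_long / n_syl,
        "pl-persec": n_long / total,
    }

# Made-up clusters: (class, start, end) in seconds.
example = [
    ("C", 0.0, 0.1), ("V", 0.1, 0.4), ("P", 0.4, 0.9), ("C", 0.9, 1.0),
    ("V", 1.0, 1.5), ("P", 1.5, 3.0), ("V", 3.0, 4.0),
]
features = rhythm_counts(example)
print(features)
```

For a whole utterance group, one option is to sum the numerators and denominators over all recordings before dividing, so that each UG still yields a single value per feature.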


Rhythm

Rhythm feature overview (cont.)

Silence dimensions
- durs: length of short pauses (<1 sec)

Cluster dimensions
- deltaV, deltaC (Ramus et al. 1999): voiced and unvoiced cluster lengths
- deltaSN: nuclei distances

Cluster structure
- nPVI-V, nPVI-C (Grabe & Low 2004): length difference of consecutive clusters, normalized to the average length of both clusters
- nPVI-SN: distance difference of consecutive syllable nuclei, normalized to the average length of both distances
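The pairwise normalization behind the nPVI features can be sketched as below. The factor 100 follows the usual nPVI definition; whether the features in this talk were scaled the same way is an assumption, and the function names and durations are invented.

```python
from statistics import mean

def pairwise_normalized_differences(durations):
    """|d_k - d_(k+1)| divided by the mean of both, for consecutive intervals."""
    return [abs(a - b) / ((a + b) / 2) for a, b in zip(durations, durations[1:])]

def npvi(durations):
    """Normalized pairwise variability index over a sequence of durations."""
    diffs = pairwise_normalized_differences(durations)
    if not diffs:
        raise ValueError("need at least two intervals")
    return 100.0 * mean(diffs)

def nuclei_distances(nuclei):
    """Distances between consecutive syllable nuclei (input for nPVI-SN)."""
    return [b - a for a, b in zip(nuclei, nuclei[1:])]

# Made-up V cluster durations (sec) and nucleus time points (sec).
print(npvi([0.1, 0.3, 0.1]))
print(npvi(nuclei_distances([0.4, 0.85, 1.5])))
```

Perfectly regular intervals give 0; strongly alternating long and short intervals give large values, independent of overall speech rate.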


Rhythm

Some results (45 female + 37 male, read + command speech)

RM-ANOVA:             p = 0.0014 | p > 0.05 | p < 0.001    | p > 0.05 | p = 0.049
Post hoc speech type: no         | -        | only command | -        | only read

Post hoc gender: no interaction with gender in any feature


Conclusion

Work in progress, therefore: no conclusions!

But ...


Prosodic Features for ALC?

---

Thank you!

Discussion