Vocalic Markers of Deception and Cognitive Dissonance for Automated Emotion Detection Systems

Dr. Aaron C. Elkins, The University of Arizona

Emotional Voice

Can computers perceive vocal emotion?

Yes… but:
• The science of the emotional voice is young
• Communication is complex and dynamic
• Moods and emotions contextually switch
• Emotion is computationally ill-defined
• Measuring emotion may inform theory

Emotional Dimensions

DISGUST?

Four Components of Speech

Voiced vs. Unvoiced sounds [v] vs. [f]

Airstream through mouth or nose [m] vs. [o]

Speech Sounds

• (1) pitch, (2) loudness, and (3) quality
• Sound is small variations in air pressure that occur in rapid succession
• The vocal folds superimpose vibrations on the outgoing air of voiced sounds
• The vocal folds vibrate to create a periodic vibration (100–250 Hz)
• We measure these features digitally

Recording Father – Digital Audio

Waveform measures pulses of the vocal folds

Based on air pressure disturbance (dB)

Voiced vs. Unvoiced (low pressure)

Each peak occurs every 100th of a second (100 Hz)
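The relation between peak spacing and pitch can be sketched in a few lines. This is a minimal illustration, not the signal processing used in the studies: it synthesizes a 100 Hz periodic signal and recovers the fundamental frequency from the strongest autocorrelation peak. The sample rate, the search range of 75 to 250 Hz, and the function name `estimate_f0` are assumptions made for the example.

```python
import math

def estimate_f0(samples, sample_rate, f_min=75.0, f_max=250.0):
    """Estimate fundamental frequency (Hz) from the strongest autocorrelation peak."""
    min_lag = int(sample_rate / f_max)   # shortest period considered
    max_lag = int(sample_rate / f_min)   # longest period considered
    best_lag, best_score = min_lag, float("-inf")
    for lag in range(min_lag, max_lag + 1):
        n_terms = len(samples) - lag
        # Average product of the signal with a copy of itself shifted by `lag`.
        score = sum(samples[i] * samples[i + lag] for i in range(n_terms)) / n_terms
        if score > best_score:
            best_lag, best_score = lag, score
    return sample_rate / best_lag

# Assumed test signal: a 100 Hz fundamental plus one harmonic, 0.1 s at 8 kHz.
SAMPLE_RATE = 8000
signal = [
    math.sin(2 * math.pi * 100 * n / SAMPLE_RATE)
    + 0.5 * math.sin(2 * math.pi * 200 * n / SAMPLE_RATE)
    for n in range(800)
]
f0 = estimate_f0(signal, SAMPLE_RATE)
period_seconds = 1.0 / f0   # spacing between successive waveform peaks
```

A peak every 100th of a second corresponds to a lag of 80 samples at 8 kHz, so the estimate lands at about 100 Hz.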

Vowel Articulation

Source-Filter Theory (Müller, 1848)

Vocal Folds vibrate at same speed (pitch)

Resonance changes in vocal tract to filter frequencies (formants)
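The source-filter idea can be sketched with a pulse train (the source) passed through a single two-pole resonator (the filter). This is a toy model under stated assumptions: one resonator stands in for one formant, and the 8 kHz sample rate, 100 Hz bandwidth, and the 700 Hz and 300 Hz resonance settings are illustrative values, not figures from the slides.

```python
import cmath
import math

SAMPLE_RATE = 8000  # Hz, assumed for illustration

def pulse_train(f0_hz, n_samples):
    """Source: one unit impulse per vocal fold cycle (pitch period)."""
    period = round(SAMPLE_RATE / f0_hz)
    return [1.0 if n % period == 0 else 0.0 for n in range(n_samples)]

def resonator(signal, center_hz, bandwidth_hz=100.0):
    """Filter: two-pole resonator standing in for a single formant."""
    r = math.exp(-math.pi * bandwidth_hz / SAMPLE_RATE)
    theta = 2 * math.pi * center_hz / SAMPLE_RATE
    a1, a2 = 2 * r * math.cos(theta), -r * r
    out = []
    for n, x in enumerate(signal):
        y1 = out[n - 1] if n >= 1 else 0.0
        y2 = out[n - 2] if n >= 2 else 0.0
        out.append(x + a1 * y1 + a2 * y2)
    return out

def strongest_harmonic(signal, f0_hz, n_harmonics=20):
    """Frequency (Hz) of the harmonic of f0 with the largest DFT magnitude."""
    def magnitude(freq):
        return abs(sum(x * cmath.exp(-2j * math.pi * freq * n / SAMPLE_RATE)
                       for n, x in enumerate(signal)))
    return max((k * f0_hz for k in range(1, n_harmonics + 1)), key=magnitude)

source = pulse_train(100, 800)                 # same pitch for both settings
formant_700 = resonator(source, 700)[400:]     # drop the start-up transient
formant_300 = resonator(source, 300)[400:]
peak_700 = strongest_harmonic(formant_700, 100)
peak_300 = strongest_harmonic(formant_300, 100)
```

Both outputs repeat every 80 samples, so the pitch is unchanged, while the strongest harmonic follows the resonance setting.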

Vocalics

• Vocalic analysis examines how it was said
  • Amplitude
  • Pitch (frequency)
  • Response latency
  • Tempo
• Linguistics examines what was said

Sound Production is Complex

• When we tense our muscles, such as during stress, our larynx tenses, producing a higher pitch
• The process is complex
  • Emotions affect its normal operation
  • Deception takes away cognitive resources and is stressful
  • More mistakes, lower quality, and increased average and variation in pitch
• Sympathetic nervous system response
  • Increased auditory acuity
  • Heightened arousal

Standard Vocal Measures Calculated with Praat and Custom Signal Processing Software

Nemesysco LVA 6.50: Commercial Vocalic Software Evaluated

Five Vocalic Studies Summarized

• Study One (Deception Experiment)
• Study Two (Cognitive Dissonance)
• Study Three (Embodied Conversational Agent and Trust)
• Study Four (Embodied Conversational Agent Security Screening - Bomber)
• Study Five (Embodied Conversational Agent Security Screening - Imposter)

Vocal Deception (Study 1) – Experimental Design

• N = 96
• $10 reward for appearing credible to a professional interviewer
• Two sequences:
  • First sequence: DT DDTT TD TTDD T
  • Second sequence: DT TTDD TD DDTT T
• 13 short-answer questions
• Only 8 had variation both within and between subjects
• Two types of questions: charged and neutral
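The count of 8 questions can be checked directly against the two sequences. The sketch below assumes that D and T mark deceptive and truthful responses, and reads "variation between subjects" as the question positions where the two sequences assign different conditions; both readings are assumptions, since the slide does not spell them out.

```python
FIRST_SEQUENCE = "DT DDTT TD TTDD T"
SECOND_SEQUENCE = "DT TTDD TD DDTT T"

def question_conditions(sequence):
    """Drop the grouping spaces so each character is one question."""
    return sequence.replace(" ", "")

def varying_positions(seq_a, seq_b):
    """1-indexed question positions where the two sequences differ."""
    a, b = question_conditions(seq_a), question_conditions(seq_b)
    return [i + 1 for i, (x, y) in enumerate(zip(a, b)) if x != y]

n_questions = len(question_conditions(FIRST_SEQUENCE))
differing = varying_positions(FIRST_SEQUENCE, SECOND_SEQUENCE)
```

Under this reading, questions 3 to 6 and 9 to 12 differ between the sequences, while the remaining five questions receive the same condition from every participant.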

Results

• Built-in classification performed at chance level
• Vocal measures independent of the system discriminated deception: FMain, AVJ, and SOS
• Possible latent variables measuring conflicting thoughts, cognitive effort, and emotional fear
• Logistic regression performed best on charged questions
• Higher pitch, cognitive effort, and hesitations are predictive of deception in more stressful interactions
• The claim that the vocal analysis software measures stress, cognitive effort, or emotion cannot be completely dismissed
• Deception and stress can be predicted by acoustic measures of voice quality and pitch when controlling for speaker characteristics

Vocal Dissonance (Study 2) – Experimental Design

• Modified induced-compliance paradigm
• Participants (N = 52) made two vocal counter-attitudinal arguments for cutting funding for services for the disabled
• Choice is manipulated, High vs. Low (IV): High N = 24, Low N = 28
• Participants report attitude towards argument issue (DV)

Arousal (Vocal Pitch)

• High choice had a 10 Hz higher pitch: F(1,50) = 4.43, p = .04
• All participants reduced their pitch over time: F(1,50) = 4.90, p = .03
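The reported p-values can be reproduced from the F statistics. An F test with one numerator degree of freedom is equivalent to a two-tailed t test with t = sqrt(F), so the sketch below integrates the Student t density numerically with Simpson's rule. The function names are hypothetical, and a statistics package would normally be used instead of hand-rolled integration.

```python
import math

def t_density(t, df):
    """Student t probability density with df degrees of freedom."""
    log_norm = (math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
                - 0.5 * math.log(df * math.pi))
    return math.exp(log_norm - (df + 1) / 2 * math.log1p(t * t / df))

def p_value_f1(f_stat, df_error, steps=2000):
    """Two-tailed p-value for F(1, df_error), using t = sqrt(F)."""
    upper = math.sqrt(f_stat)
    h = upper / steps                      # steps must be even for Simpson's rule
    total = t_density(0.0, df_error) + t_density(upper, df_error)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_density(i * h, df_error)
    central_area = total * h / 3           # P(0 < T < sqrt(F))
    return 1.0 - 2.0 * central_area

p_choice = p_value_f1(4.43, 50)   # pitch difference between choice conditions
p_time = p_value_f1(4.90, 50)     # pitch reduction over time
```

Both values round to the figures on the slide (.04 and .03).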

Cognitive Difficulty

• High choice had nearly 2x the response latency on argument two: F(1,50) = 4.53, p = .04
• Arousal moderation

Cognitive Difficulty

• Participants spoke with 33% more nonfluencies on the second argument: F(1,50) = 4.03, p = .05

The Importance of Language (Imagery as Abstract Language)

Vocal Dissonance Model

• χ²(1, N = 51), p = .49
• SRMR = .02
• R²: Attitude Change = .17, Imagery = .11

From the lab to the AVATAR

First Kiosk

Kiosk from Last Year

Third-Generation Kiosk

Gender and Demeanor

Vocal Trust (Study 3) – Experimental Design

• Participants completed pre-survey

• Packed bag before ECA screening interview

• Completed security screening

• All responses to ECA recorded for vocal analysis

ECA Demeanor and Gender

Question Block 1

Question Block 2

Question Block 3

Question Block 4

• Repeated measures Latin square design
• All participants interacted with all demeanor and gender ECA combinations
• 4 questions per block, 16 total questions

N = 88 Participants (53 Males, 35 Females)
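A Latin square for four ECA conditions across four question blocks can be sketched as below. The condition labels (neutral and smiling demeanor, male and female ECA) and the cyclic construction are assumptions for illustration; the slides do not give the actual ordering used in the study.

```python
from itertools import product

DEMEANORS = ["neutral", "smiling"]   # assumed labels
GENDERS = ["male", "female"]         # assumed labels
CONDITIONS = list(product(DEMEANORS, GENDERS))   # 4 demeanor x gender combinations
QUESTIONS_PER_BLOCK = 4

def latin_square(items):
    """Cyclic Latin square: row r is the item list rotated left by r places."""
    n = len(items)
    return [[items[(row + col) % n] for col in range(n)] for row in range(n)]

# Each row is one presentation order; each column is one question block.
orders = latin_square(CONDITIONS)
total_questions = QUESTIONS_PER_BLOCK * len(CONDITIONS)
```

Every order contains each condition once, and every question block sees each condition once across the four orders, which balances condition against block position.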

Trust and Time

• Main effects
  • Initial trust = 4.09
  • Trust rate of change: .04 per second increase, p < .01
• Duration
  • .05 decrease in trust for every second spent answering the ECA over the 7.6 second average, p < .001

Multilevel Growth Model Specified with Trust as the DV (N = 218) with Subject as random effect (N=60)

Vocal Pitch, Time, and Trust

• Main effect of pitch
  • For every 1 Hz increase in pitch over 156 Hz, trust drops by .01, p = .03
• Interaction of pitch and time
  • Pitch x Time b = 9.3e-05, p = .03
  • Over time, pitch predicts trust less and less
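The way the interaction weakens the pitch effect can be worked through with the two reported coefficients. This is an arithmetic sketch only: it assumes time is measured in seconds and that the pitch slope at a given time is the main effect plus the interaction coefficient multiplied by time. The other terms of the growth model are left out.

```python
PITCH_REFERENCE_HZ = 156.0   # pitch is expressed relative to this value
PITCH_SLOPE = -0.01          # trust change per Hz above the reference at time 0
PITCH_X_TIME = 9.3e-05       # change in the pitch slope per unit of time

def pitch_slope_at(time):
    """Trust change per Hz of pitch at a given time (time unit assumed: seconds)."""
    return PITCH_SLOPE + PITCH_X_TIME * time

def pitch_contribution(pitch_hz, time):
    """Contribution of pitch to predicted trust, other model terms omitted."""
    return pitch_slope_at(time) * (pitch_hz - PITCH_REFERENCE_HZ)

# Time at which the pitch slope reaches zero under these assumptions.
crossover_time = -PITCH_SLOPE / PITCH_X_TIME
early = pitch_contribution(176.0, 0.0)     # 20 Hz above reference at the start
later = pitch_contribution(176.0, 60.0)    # same pitch after 60 time units
```

Under these assumptions a speaker 20 Hz above the reference loses 0.2 trust points at the start but only about 0.09 after 60 seconds, and the slope reaches zero after roughly 108 seconds.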

Results

• Human perceptions of trust transfer to the ECA
• Time plays an important role in the interaction
• All participants trusted the ECA more over time, particularly when it smiled
  • 48 increase in trust when the ECA smiles
• Vocal measures of pitch predicted trust, but only early on
  • For every 1 Hz increase in pitch over 156 Hz, trust drops by .01
  • Over time, pitch predicts trust less and less

Vocalics of a Bomber (Study 4) – Experimental Design

• 29 EU border guards were randomly assigned to build a bomb (N = 16) or to a control condition (N = 13), then pack a bag

• Identical to Study 3, but no breaks in the interview

• Only male neutral demeanor ECA interviewed participants

• Bomb Makers were instructed to successfully smuggle the bomb past the ECA

Vocal Analysis

• Recorded responses to the question: “Has anyone given you a prohibited substance to transport through this checkpoint?”
• Average response: 2.68 sec (SD = 1.66)
• Responses such as “No” or “of course not”
• Vocal measures of pitch and pitch variation

Results of Vocal Pitch

• Voice quality, gender, and intensity included as covariates
• No difference in mean vocal pitch: F(1,22) = 0.38, p = .54
• Main effect of pitch variation: bomb makers had 25.34% more variation, F(1,22) = 4.79, p = .04
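One simple way to quantify pitch variation is the standard deviation of the pitch contour over voiced frames. The slides do not say which measure was used, so the definition below, the convention that 0 marks an unvoiced frame, and the example contour are all assumptions for illustration.

```python
import statistics

def pitch_variation(f0_contour_hz):
    """Standard deviation of pitch over voiced frames (0 marks an unvoiced frame)."""
    voiced = [f for f in f0_contour_hz if f > 0]
    return statistics.stdev(voiced)

def percent_more(value, baseline):
    """How much larger `value` is than `baseline`, in percent."""
    return 100.0 * (value - baseline) / baseline

# Hypothetical pitch contour for a short "No" response, one value per frame.
example_contour = [0, 100, 110, 120, 0]
example_variation = pitch_variation(example_contour)

# Hypothetical group-level variation values, chosen only to show the arithmetic.
difference = percent_more(12.5, 10.0)
```

The same mean pitch can hide very different contours, which is why a variation measure can separate groups when mean pitch does not.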

Pitch Contours

Eye Gaze: Guilty

Eye Gaze: Innocent

Vocalics of an Imposter (Study 5) – Experimental Design

• 38 EU border guards
• All required to present visa and passport through multiphase screening
  • E-gate
  • Manual processing
  • AVATAR screening interview
• Four randomly assigned imposters carried false documents through screening with hostile intentions

AVATAR Interaction Example

iPad Output for Screener

Voice Quality Change from Baseline Question (What is your full name?)

Vocalic Classification Model

Vocalic Resulting Classification

• 7 innocents falsely classified as terrorists
• 27 correctly classified as innocent
• All “guilty” referred to secondary
• Overall accuracy = 81%
• TPR = 100%, TNR = 79%, FPR = 20%, FNR = 0%
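The rates follow from the confusion-matrix counts. The sketch below assumes 4 true positives and 0 false negatives, which is what TPR = 100% and FNR = 0% imply for the four imposters in Study 5. Recomputed to one decimal place the values are 81.6%, 79.4%, and 20.6%, so the slide's whole-number percentages appear to be truncated rather than rounded.

```python
def classification_rates(tp, fn, fp, tn):
    """Standard rates from confusion-matrix counts."""
    total = tp + fn + fp + tn
    return {
        "accuracy": (tp + tn) / total,
        "tpr": tp / (tp + fn),   # true positive rate (sensitivity)
        "tnr": tn / (tn + fp),   # true negative rate (specificity)
        "fpr": fp / (tn + fp),   # false positive rate = 1 - TNR
        "fnr": fn / (tp + fn),   # false negative rate = 1 - TPR
    }

# Counts assumed from the slide: 4 imposters flagged, 7 innocents flagged, 27 cleared.
rates = classification_rates(tp=4, fn=0, fp=7, tn=27)
percentages = {name: round(100 * value, 1) for name, value in rates.items()}
```

TNR and FPR share the same denominator (the 34 innocents), so they always sum to 100%.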

Eye Fixations on Visa

Date of Birth Results – Correct?

Final Decision Model

Vocalic Resulting Classification

• 3 innocents falsely classified as terrorists
  • One of these three was actually lying (actually a true positive)
• 31 correctly classified as innocent
• All “guilty” referred to secondary
• Overall accuracy = 94.47%
• TPR = 100%, TNR = 88.24%, FPR = 5.8% (reduced by 3/4), FNR = 0%

Questions?

Isn’t the voice amazing?
