Vocalic Markers of Deception and Cognitive Dissonance for Automated
Emotion Detection Systems
Dr. Aaron C. Elkins
The University of Arizona
Emotional Voice
Can computers perceive vocal emotion?
Yes… but:
• The science of the emotional voice is young
• Communication is complex and dynamic
• Moods and emotions contextually switch
• Emotion is computationally ill-defined
• Measuring emotion may inform theory
Emotional Dimensions
DISGUST?
Four Components of Speech
Voiced vs. Unvoiced sounds [v] vs. [f]
Airstream through mouth or nose [m] vs. [o]
Speech Sounds
• (1) pitch, (2) loudness, and (3) quality
• Sound is small variations in air pressure that occur rapidly in succession
• Vocal folds superimpose outgoing air of voiced sounds
• The vocal folds vibrate to create a periodic vibration (100 – 250 Hz)
• We measure these features digitally
Recording Father – Digital Audio
• Waveform measures pulses of vocal folds
• Based on air pressure disturbance (dB)
• Voiced vs. Unvoiced (low pressure)
• Each peak occurs every 100th of a second (100 Hz)
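The link between peak spacing and pitch can be sketched in code. The example below is a minimal illustration, not the processing used in the studies: it builds a synthetic 100 Hz signal and recovers the fundamental frequency by picking the lag with the strongest autocorrelation. The function name `estimate_f0`, the 8 kHz sample rate, and the 75 to 500 Hz search range are assumptions chosen for the example.

```python
import math


def estimate_f0(samples, sample_rate, f_min=75.0, f_max=500.0):
    """Estimate fundamental frequency (Hz) from the strongest autocorrelation lag."""
    min_lag = int(sample_rate / f_max)  # shortest period considered
    max_lag = int(sample_rate / f_min)  # longest period considered
    best_lag, best_score = min_lag, float("-inf")
    for lag in range(min_lag, max_lag + 1):
        # Correlate the signal with a copy of itself shifted by `lag` samples.
        score = sum(samples[i] * samples[i + lag] for i in range(len(samples) - lag))
        if score > best_score:
            best_lag, best_score = lag, score
    return sample_rate / best_lag


# Synthetic "voiced" signal: a 100 Hz fundamental plus one weaker harmonic.
SAMPLE_RATE = 8000  # assumed for illustration
TRUE_F0 = 100.0
signal = [
    math.sin(2 * math.pi * TRUE_F0 * n / SAMPLE_RATE)
    + 0.5 * math.sin(2 * math.pi * 2 * TRUE_F0 * n / SAMPLE_RATE)
    for n in range(800)  # 0.1 seconds of audio
]

estimated_f0 = estimate_f0(signal, SAMPLE_RATE)
print(f"Estimated pitch: {estimated_f0:.1f} Hz")
```

A peak every 100th of a second corresponds to a lag of 80 samples at this sample rate, which is the lag the search should favor. Real speech is noisier, and dedicated tools apply windowing and voicing decisions that this sketch omits.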
Vowel Articulation
Source-Filter Theory (Müller, 1848)
Vocal Folds vibrate at same speed (pitch)
Resonance changes in vocal tract to filter frequencies (formants)
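A rough sketch of the source-filter idea follows, assuming a very simplified model: the source is an impulse train at a fixed pitch and each formant is a two-pole resonator. The formant frequencies and bandwidths are approximate textbook-style values chosen for illustration and are not taken from the slides. All function names are hypothetical.

```python
import math


def pulse_train(f0, sample_rate, n_samples):
    """Source: one impulse per pitch period, standing in for vocal fold pulses."""
    period = round(sample_rate / f0)
    return [1.0 if n % period == 0 else 0.0 for n in range(n_samples)]


def resonator(signal, center_hz, bandwidth_hz, sample_rate):
    """Filter: a two-pole resonance standing in for one formant."""
    r = math.exp(-math.pi * bandwidth_hz / sample_rate)
    theta = 2 * math.pi * center_hz / sample_rate
    a1, a2 = 2 * r * math.cos(theta), -r * r
    out, y1, y2 = [], 0.0, 0.0
    for x in signal:
        y = x + a1 * y1 + a2 * y2
        out.append(y)
        y2, y1 = y1, y
    return out


def synthesize_vowel(f0, formants, sample_rate=8000, n_samples=1600):
    """Pass the same source through a cascade of formant resonators."""
    signal = pulse_train(f0, sample_rate, n_samples)
    for center_hz, bandwidth_hz in formants:
        signal = resonator(signal, center_hz, bandwidth_hz, sample_rate)
    return signal


# Illustrative (center Hz, bandwidth Hz) pairs; assumed, not measured.
FORMANTS_AH = [(750.0, 100.0), (1200.0, 100.0)]
FORMANTS_EE = [(300.0, 100.0), (2300.0, 100.0)]

PITCH_HZ = 100.0
vowel_ah = synthesize_vowel(PITCH_HZ, FORMANTS_AH)
vowel_ee = synthesize_vowel(PITCH_HZ, FORMANTS_EE)
pitch_period = round(8000 / PITCH_HZ)  # samples per pitch period
```

Both outputs repeat every `pitch_period` samples once the filters settle, so the pitch is shared, while the waveform shape within each period differs because the resonances differ.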
Vocalics
Vocalic Analysis: examines how it was said
• Amplitude
• Pitch (frequency)
• Response latency
• Tempo
Linguistics: examines what was said
Sound Production is Complex
• When we tense our muscles, such as during stress, our larynx tenses
  • Higher pitch
• The process is complex
  • Emotions affect the normal operation
• Deception takes away cognitive resources and is stressful
  • More mistakes, lower quality, increased average and variation in pitch
• Sympathetic nervous system response
  • Increased auditory acuity
  • Heightened arousal
Standard Vocal Measures Calculated with Praat and Custom Signal Processing Software
Nemesysco LVA 6.50
Commercial Vocalic Software Evaluated
Five Vocalic Studies Summarized
• Study One (Deception Experiment)
• Study Two (Cognitive Dissonance)
• Study Three (Embodied Conversational Agent and Trust)
• Study Four (Embodied Conversational Agent Security Screening - Bomber)
• Study Five (Embodied Conversational Agent Security Screening - Imposter)
Vocal Deception (Study 1) – Experimental Design
• N = 96
• $10 reward for appearing credible to professional interviewer
• Two Sequences:
  • First Sequence: DT DDTT TD TTDD T
  • Second Sequence: DT TTDD TD DDTT T
• 13 Short-Answer Questions
  • Only 8 had variation both within and between subjects
• Two types of questions: Charged and Neutral
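The count of 8 usable questions can be checked against the two sequences. The sketch below assumes that D marks a deceptive response and T a truthful one, and that a question varies between subjects only where the two sequences assign different conditions; neither assumption is stated on the slide.

```python
FIRST_SEQUENCE = "DT DDTT TD TTDD T"
SECOND_SEQUENCE = "DT TTDD TD DDTT T"


def conditions(sequence):
    """Return the per-question condition letters, ignoring the spacing."""
    return [letter for letter in sequence if letter in "DT"]


first = conditions(FIRST_SEQUENCE)
second = conditions(SECOND_SEQUENCE)

# Question numbers (1-indexed) where the two sequences disagree.
varying_questions = [
    number
    for number, (a, b) in enumerate(zip(first, second), start=1)
    if a != b
]

print(len(first), len(varying_questions), varying_questions)
# prints: 13 8 [3, 4, 5, 6, 9, 10, 11, 12]
```

Under these assumptions, questions 1, 2, 7, 8, and 13 carry the same condition in both sequences, which leaves the 8 questions that vary both within and between subjects.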
Results
• Built-in classification performed at chance level
• Vocal measures independent of system discriminated deception: FMain, AVJ, and SOS
• Possible latent variables measuring Conflicting Thoughts, Cognitive Effort, and Emotional Fear
• Logistic regression performed best on charged questions
• Higher pitch, cognitive effort, and hesitations are predictive of deception in more stressful interactions
• The claim that the vocal analysis software measures stress, cognitive effort, or emotion cannot be completely dismissed
• Deception and stress can be predicted by acoustic measures of voice quality and pitch when controlling for speaker characteristics
Vocal Dissonance (Study 2) – Experimental Design
• Modified Induced-Compliance Paradigm
• Participants (N = 52) made two vocal counter-attitudinal arguments for cutting funding for services for the disabled
• Choice is manipulated High vs. Low (IV)
  • High N = 24, Low N = 28
• Participants report attitude towards argument issue (DV)
Arousal (Vocal Pitch)
• High choice had a 10 Hz higher pitch: F(1,50) = 4.43, p = .04
• All participants reduced their pitch over time: F(1,50) = 4.90, p = .03
Cognitive Difficulty
• High Choice had nearly 2x the response latency on argument two: F(1,50) = 4.53, p = .04
• Arousal moderation
Cognitive Difficulty
• Participants spoke with 33% more nonfluencies on the second argument: F(1,50) = 4.03, p = .05
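The p-values reported for the Study 2 F tests can be reproduced from the F statistics alone. The sketch below uses the identity that an F statistic with 1 numerator degree of freedom equals the square of a t statistic, then integrates the t density numerically with Simpson's rule. It uses only the standard library; the function names are hypothetical and the integration step count is an arbitrary choice.

```python
import math


def t_density(x, df):
    """Probability density of Student's t distribution with `df` degrees of freedom."""
    log_norm = (
        math.lgamma((df + 1) / 2)
        - math.lgamma(df / 2)
        - 0.5 * math.log(df * math.pi)
    )
    return math.exp(log_norm - (df + 1) / 2 * math.log1p(x * x / df))


def p_value_f1(f_stat, df_error, steps=2000):
    """Two-tailed p-value for F(1, df_error), using F(1, v) = t(v) squared."""
    t = math.sqrt(f_stat)
    h = t / steps  # `steps` must be even for Simpson's rule
    total = t_density(0.0, df_error) + t_density(t, df_error)
    for k in range(1, steps):
        total += (4 if k % 2 else 2) * t_density(k * h, df_error)
    central_mass = 2 * (h / 3) * total  # probability that |T| <= t
    return 1.0 - central_mass


# F statistics as reported on the Study 2 slides, all with F(1, 50).
reported = {
    "pitch, high vs. low choice": 4.43,
    "pitch over time": 4.90,
    "response latency": 4.53,
    "nonfluencies": 4.03,
}
for label, f_stat in reported.items():
    print(f"{label}: F(1,50) = {f_stat:.2f}, p = {p_value_f1(f_stat, 50):.3f}")
```

The computed values should round to the p-values shown on the slides (.04, .03, .04, and .05).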
The Importance of Language (Imagery as Abstract Language)
Vocal Dissonance Model
• χ²(1, N = 51), p = .49
• SRMR = .02
• R²: Attitude Change = .17, Imagery = .11
From the lab to the AVATAR
First Kiosk
Kiosk from Last Year
Third-Generation Kiosk
Gender and Demeanor
Vocal Trust (Study 3) – Experimental Design
• Participants completed pre-survey
• Packed bag before ECA screening interview
• Completed security screening
• All responses to ECA recorded for vocal analysis
ECA Demeanor and Gender
Question Block 1
Question Block 2
Question Block 3
Question Block 4
• Repeated Measures Latin Square Design
• All participants interacted with all demeanor and gender ECA combinations
• 4 questions per block, 16 total questions
N = 88 Participants (53 Males, 35 Females)
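One way to counterbalance four ECA conditions across four question blocks is a cyclic Latin square, sketched below. The slides do not give the actual orderings used, so this construction is only an example. The condition labels ("neutral", "smiling", "male", "female") are assumed from the demeanor and gender wording elsewhere in the deck.

```python
from itertools import product

DEMEANORS = ["neutral", "smiling"]  # assumed labels
GENDERS = ["male", "female"]
ECA_CONDITIONS = list(product(DEMEANORS, GENDERS))  # 4 demeanor x gender combinations


def cyclic_latin_square(items):
    """Build an n x n square in which each item appears once per row and column."""
    n = len(items)
    return [[items[(row + col) % n] for col in range(n)] for row in range(n)]


# Rows are presentation orders; columns are Question Blocks 1 to 4.
orders = cyclic_latin_square(ECA_CONDITIONS)
for group, order in enumerate(orders, start=1):
    print(f"Order {group}:", ", ".join(f"{d} {g}" for d, g in order))
```

Each participant assigned to a row sees every combination exactly once, and each combination appears in every block position exactly once across the rows.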
Trust and Time
• Main effects
  • Initial Trust = 4.09
• Trust Rate of Change
  • .04 per second increase, p < .01
• Duration
  • .05 decrease in trust for every second spent answering the ECA over the 7.6 second average, p < .001
Multilevel Growth Model Specified with Trust as the DV (N = 218) with Subject as random effect (N=60)
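The reported fixed effects can be read as a simple prediction rule. The sketch below assumes the effects combine additively and linearly, that time is measured in seconds, and that answer duration is centered at the 7.6 second average. It ignores the per-subject random effect, and the function name is hypothetical.

```python
def predicted_trust(
    seconds_elapsed,
    answer_duration,
    initial_trust=4.09,     # reported initial trust
    trust_per_second=0.04,  # reported rate of change
    duration_penalty=0.05,  # reported decrease per extra second of answering
    mean_duration=7.6,      # reported average answer duration (seconds)
):
    """Fixed-effects-only trust prediction under the stated assumptions."""
    return (
        initial_trust
        + trust_per_second * seconds_elapsed
        - duration_penalty * (answer_duration - mean_duration)
    )


# An average-length answer at the start, then a longer answer 10 seconds in.
print(predicted_trust(seconds_elapsed=0, answer_duration=7.6))
print(predicted_trust(seconds_elapsed=10, answer_duration=9.6))
```

Under these assumptions, 10 seconds of interaction adds 0.4 to predicted trust, while answering 2 seconds longer than average subtracts 0.1.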
Vocal Pitch, Time, and Trust
• Main Effect of Pitch
  • For every 1 Hz increase in pitch over 156 Hz, trust drops by .01, p = .03
• Interaction of Pitch and Time
  • Pitch x Time b = 9.3e-05, p = .03
  • Over time pitch predicts trust less and less
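The fading pitch effect follows directly from the two coefficients. The sketch below assumes that time in the interaction term is measured in seconds (the slide does not state the unit) and that the rounded coefficients are exact. It computes the slope of trust on pitch at a given time and the point at which that slope reaches zero.

```python
PITCH_SLOPE_AT_START = -0.01  # reported change in trust per 1 Hz above 156 Hz
PITCH_BY_TIME = 9.3e-05       # reported Pitch x Time coefficient


def pitch_slope(seconds_elapsed):
    """Change in predicted trust per 1 Hz of pitch at a given time."""
    return PITCH_SLOPE_AT_START + PITCH_BY_TIME * seconds_elapsed


def seconds_until_pitch_effect_vanishes():
    """Time at which the pitch slope crosses zero."""
    return -PITCH_SLOPE_AT_START / PITCH_BY_TIME


for t in (0, 30, 60, 90):
    print(f"t = {t:>2} s: slope = {pitch_slope(t):+.4f} per Hz")
print(f"Slope reaches zero after about {seconds_until_pitch_effect_vanishes():.0f} s")
```

With these values the negative slope shrinks steadily and reaches zero a little under two minutes into the interaction, which matches the observation that pitch predicts trust only early on.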
Results
• Human perceptions of trust transfer to ECA
• Time plays an important role in the interaction
• All participants trusted the ECA more over time, particularly when it smiled
  • 48 increase in trust when ECA smiles
• Vocal measures of pitch predicted trust, but only early on
  • For every 1 Hz increase in pitch over 156 Hz, trust drops by .01
  • Over time pitch predicts trust less and less
Vocalics of a Bomber (Study 4) – Experimental Design
• 29 EU border guards were randomly assigned to build a bomb (N = 16) or to a control condition (N = 13), then packed a bag
• Identical to Study 3, but no breaks in the interview
• Only male neutral demeanor ECA interviewed participants
• Bomb Makers were instructed to successfully smuggle the bomb past the ECA
Vocal Analysis
• Recorded responses to question: “Has anyone given you a prohibited substance to transport through this checkpoint?”
• Average response 2.68 sec (SD = 1.66)
• Responses such as “No” or “of course not”
• Vocal measures of Pitch and Pitch Variation
Results of Vocal Pitch
• Voice Quality, Gender, and Intensity included as covariates
• No difference in mean vocal pitch: F(1,22) = 0.38, p = .54
• Main effect of pitch variation: Bomb Makers had 25.34% more variation, F(1,22) = 4.79, p = .04
Pitch Contours
Eye Gaze: Guilty
Eye Gaze: Innocent
Vocalics of an Imposter (Study 5) – Experimental Design
• 38 EU Border Guards
• All required to present visa and passport through multiphase screening
  • E-gate
  • Manual Processing
  • AVATAR Screening Interview
• Four randomly assigned imposters carrying false documents with hostile intentions through screening
AVATAR Interaction Example
iPad Output for Screener
Voice Quality Change from Baseline Question (What is your full name?)
Vocalic Classification Model
Vocalic Resulting Classification
• 7 innocents falsely classified as terrorists
• 27 correctly classified as innocent
• All “guilty” referred to secondary
• Overall accuracy = 81%
• TPR = 100%, TNR = 79%, FPR = 20%, FNR = 0%
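The rates for the vocalic model follow from its confusion counts. The sketch below assumes 4 true positives (the four imposters, all referred to secondary), 0 false negatives, 7 false positives, and 27 true negatives, which together account for the 38 participants. The function name is hypothetical.

```python
def screening_rates(tp, fn, fp, tn):
    """Compute standard classification rates from confusion-matrix counts."""
    positives = tp + fn  # actual imposters
    negatives = fp + tn  # actual innocents
    return {
        "accuracy": (tp + tn) / (positives + negatives),
        "tpr": tp / positives,
        "tnr": tn / negatives,
        "fpr": fp / negatives,
        "fnr": fn / positives,
    }


vocal_only = screening_rates(tp=4, fn=0, fp=7, tn=27)
for name, value in vocal_only.items():
    print(f"{name}: {value:.1%}")
```

Computed this way, accuracy is 31/38, TNR is 27/34, and FPR is 7/34; the percentages on the slide appear to be these values truncated to whole numbers.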
Eye Fixations on Visa
Date of Birth Results – Correct?
Final Decision Model
Vocalic Resulting Classification
• 3 innocents falsely classified as terrorists
  • One of these three was actually lying (actually a True Positive)
• 31 correctly classified as innocent
• All “guilty” referred to secondary
• Overall accuracy = 94.47%
• TPR = 100%, TNR = 88.24%, FPR = 5.8% (reduced by 3/4), FNR = 0%
Questions?
Isn’t the voice amazing?