Turn-taking
Discourse and Dialogue
CS 359
November 6, 2001
Agenda
• Motivation
– Silence in human-computer dialogue
• Turn-taking in human-human dialogue
– Turn-change signals
– Back-channel acknowledgments
– Maintaining contact
• Exploiting these cues to improve HCC
– Automatic identification of disfluencies, jump-in points, and jump-ins
Turn-taking in HCI
• Human turn end:
– Detected by 250 ms of silence
• System turn end:
– Signaled by end of speech
– Indicated by any human sound
• Barge-in
• Continued attention:
– No signal
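The silence-based endpointing above can be sketched as a frame loop. This is a minimal illustration, not any particular system's endpointer: it assumes 10 ms frames and a per-frame voice-activity flag (both hypothetical inputs), ends the human turn after 250 ms of continuous silence, and treats any human sound during system speech as barge-in.

```python
FRAME_MS = 10          # assumed frame length (hypothetical)
END_SILENCE_MS = 250   # silence that ends a human turn, per the slide


def detect_user_turn_end(voiced_frames, frame_ms=FRAME_MS, end_silence_ms=END_SILENCE_MS):
    """Return the frame index at which the user's turn is declared over, or None.

    voiced_frames: sequence of booleans, True when a (hypothetical) voice
    activity detector hears speech in that frame.
    """
    needed = end_silence_ms // frame_ms  # consecutive silent frames required
    heard_speech = False
    silent_run = 0
    for i, voiced in enumerate(voiced_frames):
        if voiced:
            heard_speech = True
            silent_run = 0
        elif heard_speech:
            silent_run += 1
            if silent_run >= needed:
                return i
    return None


def barge_in(system_speaking, voiced):
    """Any human sound while the system is talking counts as a barge-in."""
    return system_speaking and voiced
```

A rule like this cannot tell a hesitation pause from a finished turn, and it has no way to signal continued attention, which is the gap the rest of this lecture addresses.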
Missed turn example
Gesture, Gaze & Voice
• Range of gestural signals:
– Head (nod, shake), shoulder, hand, leg, and foot movements; facial expressions; postures; artifacts
– Align with syllables
• Units: phonemic clause + change
• Study with recorded exchanges
Yielding the Floor
• Turn-change signal
– Offers the floor to the auditor/hearer
• Cues: pitch fall, final-syllable lengthening, “but uh”, end of gesture, amplitude drop + ‘uh’, end of clause
• Likelihood of change increases with more cues
• Negated by any ongoing speaker gesticulation
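The cue-counting idea can be sketched as follows. The cue labels are hypothetical names for the cues on this slide; the function only counts cues and applies the gesticulation veto, and it assigns no probabilities, since none are given here.

```python
# Hypothetical labels for the turn-yielding cues listed on the slide.
YIELD_CUES = {
    "pitch_fall",
    "final_lengthening",
    "but_uh",
    "gesture_end",
    "amplitude_drop_uh",
    "clause_end",
}


def turn_yield_cue_count(observed, gesticulating):
    """Number of turn-yielding cues displayed at a boundary.

    Returns 0 while the speaker is still gesticulating, because ongoing
    gesticulation negates the turn-change signal. Unknown labels are ignored.
    """
    if gesticulating:
        return 0
    return len(YIELD_CUES & set(observed))
```

Following the slide's observation, a larger count would correspond to a higher likelihood of a turn change.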
Taking the Floor
• Speaker-state signal
– Indicates becoming the speaker
• Occurs at beginning of turns
• Cues:
– Shift in head direction, AND/OR
– Start of gesture
Retaining the Floor
• Within-turn signal
– Still the speaker: looks at the hearer at the end of a clause
• Continuation signal
– Still the speaker: looks away after a within-turn signal or back-channel
• Back-channel:
– ‘mm-hmm’, ‘okay’, etc.; nods; sentence completions; clarification requests; restatements
– NOT a turn: signals attention, agreement, or confusion
Segmenting Turns
• Speaker alone:
– Within-turn signal -> end of one unit
– Continuation signal -> beginning of next unit
• Joint signal:
– Speaker turn signal (end); auditor -> speaker; speaker -> auditor
– Within-turn + back-channel + continuation
• Back-channels signal understanding
– Early back-channel + continuation
Regaining Attention
• Gaze & disfluency
– Disfluency: a “perturbation” in speech (silent pause, filled pause, restart)
– Gaze: conversants don’t stare at each other constantly; however, the speaker expects to meet the hearer’s gaze to confirm the hearer’s attention
• Disfluency occurs when the speaker realizes the hearer is NOT attending
– The speaker pauses until the hearer begins gazing, or uses the disfluency to request attention
Improving Human-Computer Turn-taking
• Identifying cues to turn change and turn start
• Meeting conversations:
– Recorded, natural research meetings
– Multi-party
– Overlapping speech
– Units = “spurts” bounded by 500 ms silences
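Segmenting a speaker's channel into spurts can be sketched as splitting time-aligned words at long pauses. The (word, start, end) tuples in seconds, and treating a gap of exactly 500 ms as a boundary, are assumptions for illustration.

```python
from typing import List, Optional, Tuple

Word = Tuple[str, float, float]  # (token, start time in s, end time in s)


def split_into_spurts(words: List[Word], min_pause_s: float = 0.5) -> List[List[Word]]:
    """Group one speaker's time-ordered words into spurts separated by pauses >= min_pause_s."""
    spurts: List[List[Word]] = []
    current: List[Word] = []
    prev_end: Optional[float] = None
    for w in words:
        _, start, end = w
        # A long enough silence since the previous word closes the current spurt.
        if prev_end is not None and start - prev_end >= min_pause_s:
            spurts.append(current)
            current = []
        current.append(w)
        prev_end = end
    if current:
        spurts.append(current)
    return spurts
```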
Text + Prosody
• Text sequence:
– Modeled as an n-gram language model
– Implemented as an HMM
• Prosody:
– Duration, pitch, pause, energy
– Decision trees: classification + probability
• Integrate LM + DT
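One simple way to integrate the two models, sketched here as an assumption rather than the method used in this work, is to linearly interpolate the per-boundary posteriors from the LM and the decision tree. The event labels and the weight lam are hypothetical.

```python
EVENTS = ("disfluency", "sentence_end", "none")  # hypothetical event labels


def combine_posteriors(p_lm, p_dt, lam=0.5):
    """Interpolate LM and decision-tree posteriors for one inter-word boundary.

    p_lm, p_dt: dicts mapping each event to a probability.
    lam: weight on the LM (hypothetical; would be tuned on held-out data).
    """
    combined = {e: lam * p_lm[e] + (1 - lam) * p_dt[e] for e in EVENTS}
    total = sum(combined.values())
    return {e: p / total for e, p in combined.items()}  # renormalize


def classify_boundary(p_lm, p_dt, lam=0.5):
    """Pick the event with the highest combined posterior."""
    combined = combine_posteriors(p_lm, p_dt, lam)
    return max(combined, key=combined.get)
```

An HMM-based integration, in line with the slide's text model, is another option; interpolation is shown only because it is the shortest to write down.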
Decision Trees
A (root): split on X
  X = t -> B: split on Y
    Y > 1 -> D: Disfluency
    Y <= 1 -> E: Sentence End
  X = f -> C: split on Y
    Y > 2 -> F: Sentence End
    Y <= 2 -> G: None
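The schematic tree reads as nested tests. X and Y are the slide's placeholder features, not specific prosodic measurements, and each leaf returns only its label; a tree used for "classification + probability" would also store a class distribution at each leaf.

```python
def toy_tree(x: bool, y: float) -> str:
    """Walk the schematic tree: split on X at node A, then on Y at node B or C."""
    if x:  # A: X = t -> B
        return "Disfluency" if y > 1 else "Sentence End"  # leaves D / E
    # A: X = f -> C
    return "Sentence End" if y > 2 else "None"  # leaves F / G
```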
Interpreting Breaks
• For each inter-word position:
– Is it a disfluency, a sentence end, or a continuation?
• Key features:
– Pause duration, vowel duration
• 62% accuracy vs. 50% chance baseline
– ~90% overall
• Best performance: combined LM & DT
Jump-in Points
• Possible turn changes (those actually used):
– Points WITHIN a spurt where a new speaker starts
• Key features:
– Pause duration, low energy, pitch fall
• Accuracy: 65% vs. 50% baseline
• Performance depends only on preceding prosodic features
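Since only preceding prosodic features are used, they can be computed causally from frames before each candidate point. This sketch assumes per-frame energy, F0, and voicing arrays and a hypothetical 20-frame window; it approximates the three key features (pause length, energy, pitch slope, where a negative slope indicates a fall) and is not the original feature extraction.

```python
from typing import Dict, Sequence


def preceding_features(energy: Sequence[float], f0: Sequence[float],
                       voiced: Sequence[bool], t: int, window: int = 20) -> Dict[str, float]:
    """Prosodic features for candidate point t, using only frames before t."""
    lo = max(0, t - window)

    # Pause length: run of unvoiced frames ending just before t (within the window).
    pause = 0
    i = t - 1
    while i >= lo and not voiced[i]:
        pause += 1
        i -= 1

    # Mean energy over the preceding window.
    seg = energy[lo:t]
    mean_energy = sum(seg) / len(seg) if len(seg) else 0.0

    # Pitch slope per frame: least-squares line through voiced frames in the window.
    pts = [(j, f0[j]) for j in range(lo, t) if voiced[j]]
    slope = 0.0
    if len(pts) >= 2:
        mx = sum(j for j, _ in pts) / len(pts)
        my = sum(v for _, v in pts) / len(pts)
        num = sum((j - mx) * (v - my) for j, v in pts)
        den = sum((j - mx) ** 2 for j, _ in pts)
        slope = num / den

    return {"pause_frames": pause, "mean_energy": mean_energy, "pitch_slope": slope}
```

Because nothing at or after t is read, the features are available at the moment a listener would have to decide whether to jump in.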
Jump-in Features
• Do people speak differently when they jump in?
– Do jump-ins differ from regular turn starts?
• Examine only the first words of turns
– No LM
• Key features:
– Raised pitch, raised amplitude
• Accuracy: 77% vs. 50% baseline
– Prosody only
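A rough stand-in for the first-word classifier flags a turn start whose pitch and amplitude are both raised relative to the speaker's usual values. The per-speaker z-score normalization and the 1.0 threshold are hypothetical simplifications, swapped in for the trained prosodic model behind the reported accuracy.

```python
from statistics import mean, stdev


def zscore(value, history):
    """Standardize value against a speaker's history (history needs >= 2 values)."""
    mu, sd = mean(history), stdev(history)
    return (value - mu) / sd if sd > 0 else 0.0


def looks_like_jump_in(first_word_f0, first_word_energy,
                       speaker_f0s, speaker_energies, threshold=1.0):
    """Flag a turn start as a jump-in when both pitch and amplitude are raised.

    Uses only the first word's prosody (no language model).
    threshold is in speaker standard deviations (hypothetical choice).
    """
    return (zscore(first_word_f0, speaker_f0s) > threshold
            and zscore(first_word_energy, speaker_energies) > threshold)
```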
Summary
• Prosodic features signal conversational moves
– Pause and vowel duration distinguish sentence end, disfluency, or fluent continuation
– Jump-ins occur at locations that sound like sentence ends
– Speakers raise their voices when jumping in