Generative Modeling and Classification of Dialogs by Low-Level Features
Marco Cristani, Anna Pesarin, Alessandro Tavano, Carlo Drioli, Alessandro Perina, Vittorio Murino
[Figure: two coupled state chains, S1 and S2, over successive time steps; speech balloons labelled "A. Markov" and "A. Pentland"]
Summary
• Goal
• Introduction
• Our approach
• Experiments
• Conclusions
Goal
• To model and to classify dyadic conversational audio situations
• The situations are characterized by:
– the kind of subjects involved (adults, children)
– a predominant mood (flat or arguing discussion)
• Examples: 1, 2, 3
Goal (2)
• Our guidelines for the modeling are:
– to exploit the conversational turn-taking
– not to model the content of the conversations (too difficult)
• Our contribution
– A novel kind of feature (the Steady Conversational Periods, SCP) plus a very simple generative framework
• In practice…
– We are able to finely characterize the turn-taking, also encoding the timing of the turns
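The slide does not spell out how an SCP is computed. As a minimal sketch, assuming an SCP is a maximal stretch of time during which neither speaker changes state, SCPs can be read off two frame-level activity sequences. The function name `extract_scps`, the reading of 'S'/'T' as silence/talk, and the frame length are all hypothetical choices made for illustration.

```python
from itertools import groupby

def extract_scps(o1, o2, frame_sec=0.01):
    """Split a dialog into periods where the joint state of both speakers is constant.

    o1, o2: equal-length sequences of per-frame symbols, e.g. 'S' or 'T'
            (assumed here to mean silence / talk).
    Returns a list of (state_1, state_2, duration_in_seconds).
    """
    if len(o1) != len(o2):
        raise ValueError("both speakers need the same number of frames")
    scps = []
    # groupby merges consecutive frames that share the same joint state.
    for (s1, s2), run in groupby(zip(o1, o2)):
        n_frames = sum(1 for _ in run)
        scps.append((s1, s2, n_frames * frame_sec))
    return scps

# Toy dialog: speaker 1 talks, speaker 2 overlaps briefly and takes the turn.
speaker_1 = "TTTTSSSSSS"
speaker_2 = "SSSTTTTSSS"
for s1, s2, duration in extract_scps(speaker_1, speaker_2, frame_sec=0.5):
    print(s1, s2, duration)
```

Each output tuple keeps both the joint state and how long it lasted, which is what lets the timing of the turns enter the model rather than only their order.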
Introduction – Social signalling
• Our aim can be cast as a social signalling problem
• Social signals [Vinciarelli et al. 2008]
– the expression of one’s attitude towards social situation and interplay
– manifested through a multiplicity of non-verbal behavioural cues (facial expressions, gestures, and vocal outbursts)
• Social signalling
– recent formalization
[Figure labels: Social Psychology, Pattern Recognition, Social Signalling]
Introduction (2) – social signals
• Bricks for social signals [Vinciarelli et al. 2008]
[Figure: one element is marked "OUR FOCUS"]
Introduction (3) - Definitions
• A taxonomy for the social signals
– behavioural/social cues (or thin slices of behaviour)
  • a set of temporal changes in neuromuscular and physiological activity that last for short intervals of time (milliseconds to minutes)
– social signals (or social behaviours)
  • multiple behavioural cues
  • attitudes towards others or specific social situations that can last minutes to hours
• Markov chaining for multiple agents: connections
• The core of the model is the transition probability (c,d=1,2)
[Figure: two coupled state chains, S1 and S2, over successive time steps]
• Problem: computational burden
– for C processes, the joint states give transition matrices of O(N^C × N^C), where N is the number of states of each single process
• each single process chooses the next state independently of the other process(es)
– reasonable!
– O(N^C × N) space complexity, still hard to deal with
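To make the growth rates above concrete, the sketch below counts table entries for the two cases named on the slide. It also counts a third, pairwise-mixture factorization in the spirit of the mixed memory and influence models listed in the references; that third count is an assumption of this sketch, not a figure from the slides. All function names are hypothetical.

```python
def joint_chain_entries(n_states, n_procs):
    """Full joint chain: one (N^C) x (N^C) transition matrix."""
    joint = n_states ** n_procs
    return joint * joint

def independent_next_state_entries(n_states, n_procs):
    """Each process picks its next state given the joint previous state:
    one (N^C) x N table per process, so C such tables in total."""
    return n_procs * (n_states ** n_procs) * n_states

def pairwise_mixture_entries(n_states, n_procs):
    """Assumed pairwise factorization: C*C tables of size N x N,
    plus a C x C matrix of mixing weights."""
    return n_procs * n_procs * n_states * n_states + n_procs * n_procs

# N = 4 states per process, for a growing number of processes C.
for c in (2, 4, 8):
    print(
        c,
        joint_chain_entries(4, c),
        independent_next_state_entries(4, c),
        pairwise_mixture_entries(4, c),
    )
```

For two speakers the three counts are of similar size; the gap only opens up as C grows, which is why the exponential terms are the ones singled out as a burden.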
• Once a model Ψ={ϴ,λ} and a test dialog I (an ordered pair of arrays O1 and O2 composed of {S,T} symbols) are provided, we want the likelihood P(I|Ψ) = P(O1, O2|Ψ)
1. SCPs are extracted
2. SCP Gaussian labels are estimated from ϴ,
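The remaining steps are not visible in this transcript. As a rough sketch only, the likelihood of two synchronized state sequences can be scored with a pairwise mixture of transition tables, and a dialog assigned to the class whose model scores highest. The names `trans`, `mix`, `log_likelihood`, and `classify` are hypothetical, the toy numbers are invented, and the sketch makes no claim about how these quantities map onto ϴ and λ. States are plain integers here; in practice they might combine an S/T symbol with a duration label.

```python
import math

def log_likelihood(seqs, trans, mix):
    """Log-likelihood of C synchronized state sequences under a pairwise mixture.

    seqs:  list of C equal-length lists of integer states.
    trans: trans[d][c][i][j] = P(process c moves to state j | process d was in state i).
    mix:   mix[c][d] = weight of process d's influence on process c (rows sum to 1).
    Initial states are treated as given (no prior term), an assumption of this sketch.
    """
    n_procs = len(seqs)
    total = 0.0
    for t in range(1, len(seqs[0])):
        for c in range(n_procs):
            p = sum(
                mix[c][d] * trans[d][c][seqs[d][t - 1]][seqs[c][t]]
                for d in range(n_procs)
            )
            if p == 0.0:
                return -math.inf  # the model rules this transition out
            total += math.log(p)
    return total

def classify(seqs, models):
    """Pick the class whose model gives the highest log-likelihood."""
    return max(models, key=lambda name: log_likelihood(seqs, *models[name]))

# Invented toy models: states 0 and 1; "flat" favours staying, "dispute" favours switching.
sticky = [[0.9, 0.1], [0.1, 0.9]]
jumpy = [[0.3, 0.7], [0.7, 0.3]]
mix = [[0.8, 0.2], [0.2, 0.8]]
models = {
    "flat": ([[sticky, sticky], [sticky, sticky]], mix),
    "dispute": ([[jumpy, jumpy], [jumpy, jumpy]], mix),
}

long_turns = [[1, 1, 1, 1, 0, 0, 0, 0], [0, 0, 0, 0, 1, 1, 1, 1]]
rapid_turns = [[1, 0, 1, 0, 1, 0], [0, 1, 0, 1, 0, 1]]
print(classify(long_turns, models), classify(rapid_turns, models))
```

Working in log space keeps long dialogs from underflowing, and comparing per-class scores is the usual way a generative model is turned into a classifier.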
• Restricted extended dataset:
– We add conversations
  • 5 flat non-structured conversations
  • 9 disputes between adults (an operator pushed for a fight; the other subject reacted naturally)
– We gather three categories of dialogs
  1. Flat dialog between adults (18 samples)
  2. Flat dialog between a child and an adult (14 samples)
  3. Dispute (9 samples, only between adults)
– We instantiate 4 classification tasks
  (A) flat vs dispute - (cat:1 vs cat:3);
  (B) flat vs dispute, general - ((cat:1 ∪ cat:2) vs cat:3);
  (C) with vs without child - (cat:2 vs cat:1);
  (D) all vs all;
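The class sizes behind each task follow directly from the category counts listed on this slide. The short sketch below tabulates them and adds a majority-class baseline (always predicting the largest class). That baseline is not reported in the slides; it is included here only as a reference point for reading accuracy figures on such unbalanced tasks. The names `counts`, `tasks`, `class_sizes`, and `majority_baseline` are hypothetical.

```python
# Samples per category, as listed on the slide.
counts = {1: 18, 2: 14, 3: 9}

# Each task is a list of classes; each class is a set of categories.
tasks = {
    "A": [{1}, {3}],
    "B": [{1, 2}, {3}],
    "C": [{2}, {1}],
    "D": [{1}, {2}, {3}],
}

def class_sizes(task):
    """Number of samples in each class of the given task."""
    return [sum(counts[cat] for cat in group) for group in tasks[task]]

def majority_baseline(task):
    """Accuracy obtained by always predicting the largest class."""
    sizes = class_sizes(task)
    return max(sizes) / sum(sizes)

for name in tasks:
    print(name, class_sizes(name), round(majority_baseline(name), 3))
```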
• Comparative strategies
– SCP histograms (SCP)
  • normalized histogram of the SCPs (silence, speech) as signature
  • Bhattacharyya distance for the classification
– Turn-taking influence model (TTIM)
  • In practice, it is as if we had “SCPs” all with the same duration [Basu et al. 01]
– Mixture of Gaussians classifier on a set of acoustic cues (MOG) [Shriberg 98] [Fernandez et al. 02]:
  • pitch range measure (for the intonation)
  • “enrate” speech rate (articulation velocity)
  • spectral flatness measure (SFM)
  • drop-off of spectral energy above 1000 Hz (DO1000) for the emotion modelling
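The SCP-histogram baseline can be sketched as follows, under stated assumptions: a single histogram of SCP durations with hand-picked bin edges, and a nearest-neighbour rule on top of the Bhattacharyya distance. The slide names only the histogram signature and the distance, so the binning, the nearest-neighbour rule, and every function name here are hypothetical.

```python
import math
from collections import Counter

def scp_histogram(durations, bin_edges):
    """Normalized histogram of SCP durations (fractions sum to 1).

    bin_edges are upper bounds in increasing order; the last bin is open-ended.
    Assumes at least one duration is given.
    """
    counts = Counter()
    for d in durations:
        # Index of the first edge that exceeds d, or the open-ended last bin.
        b = next((i for i, edge in enumerate(bin_edges) if d < edge), len(bin_edges))
        counts[b] += 1
    n = len(durations)
    return [counts[b] / n for b in range(len(bin_edges) + 1)]

def bhattacharyya_distance(p, q):
    """D_B = -ln(sum_i sqrt(p_i * q_i)); zero for identical histograms."""
    bc = sum(math.sqrt(a * b) for a, b in zip(p, q))
    return math.inf if bc == 0 else -math.log(bc)

def nearest_signature(query, labelled):
    """Return the label of the training signature closest to the query."""
    return min(labelled, key=lambda item: bhattacharyya_distance(query, item[1]))[0]

# Invented toy data: disputes as mostly short periods, flat dialogs as longer ones.
edges = [0.5, 1.0, 2.0]
training = [
    ("dispute", scp_histogram([0.2, 0.3, 0.4, 0.6, 0.3], edges)),
    ("flat", scp_histogram([1.5, 2.5, 3.0, 1.2, 0.8], edges)),
]
query = scp_histogram([0.1, 0.4, 0.7, 0.2], edges)
print(nearest_signature(query, training))
```

A histogram discards the order of the periods, which is the main difference from the generative model: it sees how long the periods are but not who follows whom.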
• lower accuracy in task A
– some flat conversations are misclassified
– sometimes the timing of flat conversations is set by subjects who utter very short sentences, similar to a dispute
– this behaviour is captured by our model and disregarded by TTIM
– SOLUTION: augment the features, not only SCPs!
Conclusions
• A novel way to model dialogs has been proposed
• The main contributions are
– Steady Conversational Periods (SCP), as a way to synchronize a dialog, making a first-order Markov treatment feasible
– The embedding of SCP in an Observed Influence Model, resulting in a detailed way to describe the turn-taking of a conversation
• Future improvements
– From a methodological point of view
  • Inserting uncertainty in the SCP states, i.e., moving to a full Influence Model
  • Enriching the model with different prosodic features
– From a practical point of view
  • Enlarging the data set
  • Trying novel situations
Publications
• A. Pesarin, M. Cristani, V. Murino, C. Drioli and A. Perina, A statistical signature for automatic dialogue classification. In Proceedings of the International Conference on Pattern Recognition (ICPR 2008), Tampa, Florida.
• M. Cristani, A. Pesarin, C. Drioli, A. Tavano, A. Perina, V. Murino, Auditory Dialog Analysis and Understanding by Generative Modelling of Interactional Dynamics. In Proceedings of the Second IEEE Workshop on CVPR 2009 for Human Communicative Behavior Analysis.
• M. Cristani, A. Tavano, A. Pesarin, C. Drioli, A. Perina, V. Murino, Generative Modeling and Classification of Dialogs by Low-Level Features, submitted to Systems, Man, and Cybernetics: Part B (under review).
References
• [Vinciarelli et al. 2008] A. Vinciarelli, M. Pantic, H. Bourlard, and A. Pentland, “Social signal processing: state-of-the-art and future perspectives of an emerging domain,” in Proc. of the 16th ACM International Conference on Multimedia (MM '08), 2008.
• [Choudhury et al. 2004] T. Choudhury and S. Basu, “Modeling conversational dynamics as a mixed memory Markov process,” in Proc. NIPS, 2004.
• [Meyn 2005] S. P. Meyn and R. L. Tweedie, Markov Chains and Stochastic Stability, 2005. Second edition to appear, Cambridge University Press, 2008.
• [Asavathiratham 2000] C. Asavathiratham, “A tractable representation for the dynamics of networked Markov chains,” Ph.D. dissertation, Dept. of ECS, MIT, 2000.
• [Saul et al. 99] L. Saul and M. Jordan, “Mixed memory Markov models: Decomposing complex stochastic processes as mixtures of simpler ones,” Machine Learning, vol. 37, no. 1, pp. 75–87, 1999.
• [Basu et al. 01] S. Basu, T. Choudhury, B. Clarkson, and A. Pentland, “Learning human interaction with the influence model,” MIT MediaLab, Tech. Rep. 539, 2001.
• [Shriberg 98] E. Shriberg, “Can prosody aid the automatic classification of dialog acts in conversational speech?” Language and Speech, vol. 41, no. 4, pp. 439–487, 1998.
• [Fernandez et al. 02] R. Fernandez and R. Picard, “Dialog act classification from prosodic features using support vector machines,” in Proc. of Speech Prosody, 2002.