Affective Storytelling: Automatic Measurement of Story Effectiveness from Emotional Responses Collected over the Internet

Daniel McDuff
PhD Proposal in Media Arts & Sciences
Affective Computing Group, MIT Media Lab
djmcduff@media.mit.edu

June 6, 2012
Executive Summary

Emotion is key to the effectiveness of narratives and storytelling, whether in influencing memory, likability or persuasion. Stories, even if fictional, have the ability to induce a genuine emotional response. However, understanding of the role of emotions in storytelling and advertising effectiveness has been limited by the difficulty of measuring emotions in real-life contexts. Video advertising is a ubiquitous form of short story, usually 30-60 seconds long, designed to influence, persuade and engage, in which emotional content is frequently used; it will be one of the focuses of this thesis. The lack of understanding of the effects of emotion in advertising results in large amounts of wasted time, money and other resources.

Facial expressions, head gestures, heart rate, respiration rate and heart rate variability can inform us about the emotional valence, arousal and attention of a person. In this thesis I propose to demonstrate how automatically detected naturalistic and spontaneous facial and physiological responses can be used to predict the effectiveness of stories.

I propose a framework for automatically measuring facial and physiological responses, in addition to self-report and behavioral measures, to content (e.g. video advertisements) over the Internet in order to understand the role of emotions in story effectiveness. Specifically, I will present analysis of the first large-scale dataset of facial, physiological, behavioral and self-report responses to video content collected in-the-wild using the cloud. I will develop models for evaluating the effectiveness of stories (e.g. likability, persuasion and memory) based on the automatically extracted features. This work will be evaluated on its success in predicting measures of story effectiveness that are useful in the creation of content, whether in copy-testing or content development.
Facial expressions, head gestures, heart rate, respiration rate
and heart rate variability can in-form us about the emotional
valence, arousal and attention of a person. In this thesis I
propose todemonstrate how automatically detected naturalistic and
spontaneous facial responses and physio-logical responses can be
used to predict the effectiveness of stories.
I propose a framework for automatically measuring facial and
physiological responses in addi-tion to self-report and behavioral
measures to content (e.g. video advertisements) over the Internetin
order to understand the role of emotions in story effectiveness.
Specifically, I will present anal-ysis of the first large scale
data of facial, physiological, behavioral and self-report responses
tovideo content collected in-the-wild using the cloud. I will
develop models for evaluating the ef-fectiveness of stories (e.g.
likability, persuasion and memory) based on the automatically
extractedfeatures. This work will be evaluated on the success in
predicting measures of story effectivenessthat are useful in
creation of content whether that be in copy-testing or content
development.
Affective Storytelling: Automatic Measurement of Story Effectiveness from Emotional Responses Collected over the Internet

Daniel McDuff
PhD Proposal in Media Arts & Sciences
Affective Computing Group, MIT Media Lab

Thesis Committee

Rosalind Picard
Professor of Media Arts and Sciences, MIT Media Lab
Thesis Supervisor

Jeffrey Cohn
Professor of Psychology, University of Pittsburgh

Ashish Kapoor
Senior Research Scientist, Microsoft Research, Redmond

Thales Teixeira
Assistant Professor of Business Administration, Harvard Business School
Abstract

Emotion is key to the effectiveness of narratives and storytelling, whether in influencing memory, likability or persuasion. Stories, even if fictional, have the ability to induce a genuine emotional response. However, understanding of the role of emotions in storytelling and advertising effectiveness has been limited by the difficulty of measuring emotions in real-life contexts. Video advertising is a ubiquitous form of short story, usually 30-60 seconds long, designed to influence, persuade and engage, in which emotional content is frequently used; it will be one of the focuses of this thesis.

Facial expressions, head gestures, heart rate, respiration rate and heart rate variability can inform us about emotional valence, arousal and attention. In this thesis I propose to demonstrate how automatically detected naturalistic and spontaneous facial and physiological responses can be used to predict the effectiveness of stories. The results will be used to inform the creation and evaluation of new content.

I propose a framework for automatically measuring facial and physiological responses, in addition to self-report and behavioral measures, to content (e.g. video advertisements) over the Internet in order to understand the role of emotions in story effectiveness. Specifically, I will present analysis of the first large-scale dataset of facial, physiological, behavioral and self-report responses to video content collected in-the-wild using the cloud. I will develop models for evaluating the effectiveness of stories (e.g. likability, persuasion and memory) based on the automatically extracted features.
1 Introduction

There remains truth in Ray and Batra's [28] statement: "an inadequate understanding of the role of affect in advertising has probably been the cause of more wasted advertising money than any other single reason." This statement applies beyond advertising to many other forms of media, and is due in part to the lack of understanding about how to measure emotion. This thesis proposal deals with evaluating the effectiveness of emotional content in storytelling and advertising beyond the laboratory environment, using remotely measured facial and physiological responses. I will analyze challenging, ecologically valid data collected over the Internet in the same contexts in which the media would normally be consumed, and build a framework and set of models for automatic evaluation of effectiveness based on affective responses.
The face is one of the richest channels for communicating affective and cognitive information [11]. In addition, physiological reactions, such as changes in heart rate and other vital signs, are partially controlled by the autonomic nervous system and as such are manifestations of emotional processes [36]. Recent work has demonstrated that both facial behavior and physiological information can be measured directly from videos of the human face, and as such emotional valence and arousal can be measured remotely.
Previous work has shown that many people are willing to engage and share visual images from their webcam over the Internet, and these images and videos can be used for training automatic learning algorithms [32, 34, 22]. Moreover, webcams are now ubiquitous and have become a standard component of many media devices, laptops and tablets. In 2010, the number of camera phones in use totaled 1.8 billion, which accounted for a third of all mobile phones (1). In addition, about half of the videos shared on Facebook every day are personal videos recorded from a desktop or phone camera (2).

(1) http://www.economist.com/node/15865270
(2) http://gigaom.com/video/facebook-40-of-videos-are-webcam-uploads/
Traditionally, consumer testing of video advertising, whether by self-report, facial response or physiology, has been conducted in laboratory settings. Lab-based studies, while controlled, are subject to bias from the presence of an experimenter and from other factors unrelated to the advertising (e.g. comfort with the context) that may impact the participants' emotional experience [35]. Conducting experiments outside a lab-based context can help avoid such problems.
Self-report is the current standard measure of affect: people are typically interviewed, asked to rate their feelings on a Likert scale, or asked to turn a dial to quantify their state (affect dial approaches). While convenient and inexpensive, self-report is problematic because it is also subject to biasing from the context, increased cognitive load and other factors of little relevance to the stimulus being tested [30]. Self-report has a number of drawbacks, including the difficulty people have in accessing information about their emotional experiences, and their willingness to report feelings even if they didn't have them [8]. For many, the act of introspection is challenging to perform in conjunction with another task and may in itself alter the state being reported [21]. Although affect dial approaches provide a higher-resolution report of a subject's response compared to a post-hoc survey, subjects are often required to view the stimuli twice in order to help them introspect on their emotional state.
Unlike self-report, facial expressions and physiological responses are implicit and non-intrusive, and do not interrupt a person's experience. In addition, as with affect dial ratings, facial and physiological responses allow for a continuous and dynamic representation of how affect changes over time. This represents much richer data than can be obtained via a post-hoc survey. A small number of marketing studies consider the measurement of emotions via physiological [6], facial [18] or brain responses [3]. However, these are invariably performed in laboratory settings and are restricted to a limited demographic.
Advertising and online media are global: movie trailers, advertisements and other content can now be viewed the world over via the Internet, not just on selected television networks. It is important that marketers understand the nuances in responses across a diverse demographic and a broad set of geographic locations. For instance, advertising that works in certain cultural contexts may not be effective in others. A majority of studies of emotion in advertising have considered only a homogeneous subject pool, such as university undergraduates or a group from one location. There is evidence to suggest that emotions can be universally expressed on the face [10], and our framework allows for the evaluation of advertising effectiveness across a large and diverse demographic much more efficiently than is possible via lab-based experiments.
The aim of the proposed research is to utilize a framework for measuring facial, physiological, self-report and behavioral responses to commercials over the Internet in order to understand the role of emotions in advertising effectiveness (e.g. likability, persuasion and sales), and to design an automated system for predicting success based on these signals. This incorporates first-in-the-world studies of the measurement of these parameters via the cloud, and allows the robust exploration of phenomena across a diverse demographic and a broad set of geographic locations.
2 Contributions

The main contributions of this thesis are described below:

1. To use a custom cloud-based framework for collecting a large corpus of response videos to online media content (advertisements, movie trailers, etc.) with ground-truth measures of success (sharing, likability, persuasion and sales), and to collect data from a diverse population responding to a broad range of content.

2. To automatically analyze facial responses, gestures and physiological reactions using computer vision algorithms.

3. To design, train and evaluate a set of models for predicting key measures of story/advertisement effectiveness based on facial responses, gestures and physiological features automatically extracted from the videos.

4. To propose generalizable emotional profiles that describe an effective story/advertisement, in order to practically inform the development of new content.

5. To implement a system (demo) that incorporates the findings into a fully automated classification of a response to a story/advertisement. The predicted label will be the effect of the story in changing likability/persuasion.
3 Background and Related Work

3.1 Storytelling, Marketing and Emotion

Emotion is key to the effectiveness of narratives and storytelling [15]. Stories, even if fictional, have the ability to induce a genuine emotional response [14]. However, there are nuances in the emotional response to narrative representations compared to everyday social dialogue [25], and therefore context-specific models need to be designed.
Marketing, and more specifically advertising, makes much use of narratives and stories. The role of emotion in marketing and advertising has been considered extensively since early work by Zajonc [37], which argued that emotions function independently of cognition and can indeed override it. It is widely held that emotions play a significant part in purchasing decisions, and advertising is often seen as an effective means of enhancing these emotional associations [24]. In advertising, the states of amusement, surprise and confusion are of particular interest, and measurement of valence and arousal should be useful in distinguishing between these states.
In a study of TV commercials, Hazlett and Hazlett [18] found that facial responses, measured using facial electromyography (EMG), were a stronger discriminator between commercials and were more strongly related to recall than self-report information. Lang [20] found that phasic changes in heart rate could act as an indication of attention and tonic changes could act as an indication of arousal. The combination of physiology and facial responses is likely to improve recognition of emotions further still.
Sales are arguably the key measure of advertising success, and predicting behavioral measures of success from responses will be our main focus. However, the success of an advertisement varies from person to person, and sales figures at this level are often not available; therefore I will also consider other measures of success, in particular liking, memory (recall and recognition) and persuasion. Ad liking was found to be the best predictor of sales success in the Advertising Research Foundation Copy Validation Research Project [17]. Biel [5] and Gordon [13] state that likability is the best predictor of sales effectiveness. Explicit memory of advertising (recall and recognition) is one of the most frequently used metrics for measuring advertising success, and independent studies have demonstrated the sales validity of recall [17, 24]. Indeed, recall was found to be the second-best predictor of advertising effectiveness (after ad liking), as measured by increased sales, in the Advertising Research Foundation Copy Validation Research Project [17].
Behavioral methods, such as measuring ad zapping or banner click-through rates, are frequently used to gauge success. Teixeira et al. [33] show that inducing affect is important in engaging viewers in online video adverts and in reducing the frequency of zapping (skipping the advertisement). They demonstrated that joy was one of the states that stimulated viewer retention in the commercial. With our web-based framework I can test behavioral measures (such as sharing or click-through) outside the laboratory, in natural consumption contexts.
3.2 Facial Actions, Physiology, and Emotions

Charles Darwin was one of the first to demonstrate universality in facial expressions in his book The Expression of the Emotions in Man and Animals [9]. Since then a number of other studies have demonstrated that facial actions communicate underlying emotional information and that some of these expressions are consistent across cultures [10].
There are two main approaches to coding facial displays: sign judgment and message judgment. Sign judgment involves labeling facial muscle movements or actions, such as those defined in the FACS taxonomy [12]; message judgments are labels of a human's perceptual judgment of the underlying state. In this proposal I focus on sign judgments, specifically action unit intensities, as they are objective and not open to contextual variation.
The Facial Action Coding System (FACS) [12] is the most comprehensive labeling system. FACS 2002 defines 27 action units (AUs), 9 upper-face and 18 lower-face, plus 14 head positions and movements, 9 eye positions and movements, and 28 other descriptors, behaviors and visibility codes [7]. The action units can be further qualified using five intensity ratings, from A (minimum) to E (maximum). More than 7,000 AU combinations have been observed [29]; a minimal programmatic representation of such codes is sketched below.
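Sign-judgment labels of this kind are simple to represent in software. The following is a hypothetical minimal representation (the class and its validation are my own illustration, not part of FACS itself): an action unit number paired with its A-E intensity.

```python
from dataclasses import dataclass

# FACS 2002 intensity ratings, from A (minimum) to E (maximum).
INTENSITY_LEVELS = ("A", "B", "C", "D", "E")

@dataclass(frozen=True)
class ActionUnitCode:
    """One sign-judgment label: an action unit at a given intensity."""
    au: int         # e.g. 12 for AU12 (lip corner puller, the smile action)
    intensity: str  # one of "A".."E"

    def __post_init__(self):
        if self.intensity not in INTENSITY_LEVELS:
            raise ValueError(f"intensity must be one of {INTENSITY_LEVELS}")

# A frame coded as a brow raise (AU1+2) at intensity C and a smile (AU12) at B:
frame_codes = [ActionUnitCode(1, "C"), ActionUnitCode(2, "C"), ActionUnitCode(12, "B")]
```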
Physiological changes, such as in heart rate (HR), respiration rate (RR) and heart rate variability (HRV), are partially controlled by the autonomic nervous system; they are important in describing emotional responses in the real world [16]. Physiological changes can contain information about both the emotional arousal and the valence of a person.
By measuring facial responses, gestures, HR, RR and HRV we are able to capture elements of both the valence and arousal dimensions of emotion. In addition, we can capture levels of viewer attention. These three dimensions are likely to be important in predicting effectiveness from responses.
3.3 Remote Measurement of Facial Actions and Physiology

The first example of automated facial expression recognition was presented by Suwa et al. [31]. Over the past 20 years there have been significant advances in the state of the art in action unit recognition [38]. Our preliminary work has shown that certain actions, such as smiles, can be accurately detected in low-resolution, unconstrained videos collected via the Internet [23].
We have shown that heart rate (HR), respiration rate (RR) and heart rate variability (HRV) can be measured remotely using camera-based technology [26, 27]. This method has been validated on webcam videos with a resolution of 640x480 pixels and a frame rate of 15 fps (correlation with contact sensor measurements for HR: r = 1.00; for RR: r = 0.94; for HRV HF and LF: r = 0.94; all correlations statistically significant).
sharing and likability. Achieving this aim will involve the identification of generalizable facial action and physiological features, and of models that are adaptable across contexts. This work is the first large-scale study to consider physiological and facial responses measured in-the-wild via the cloud in order to understand the impact of emotional content in storytelling and advertising, and how to use it to maximum effect. Figure 4 summarizes the proposed framework, which is based on Barrett et al.'s dual-process model of emotion [4]. The valence, arousal and attention of the viewer may be represented by latent variables within the trained models, rather than being predicted explicitly.
4.2 Methodology

I will use a web-based framework for collecting responses over the Internet. The first iteration of this framework was presented in [22] and is shown in Figure 1. This framework allows the efficient collection of thousands of naturalistic and spontaneous responses to online videos. Figure 2(a) shows example frames from data collected via this framework. Recruitment of participants has initially been performed by creating a social interface that allows people to share a graph of their automatically analyzed smile response with others, but recruitment can also be performed via Mechanical Turk, or another crowd marketplace, with financial incentives. The latter will be used for more in-depth studies in which voluntary participation is difficult to obtain.
The facial response videos, an example of which is shown in Figure 2(b), will be analyzed using automated facial action unit detection algorithms developed by Affectiva or MIT. As an example, Affectiva's AU12 algorithm is based on Local Binary Pattern (LBP) features, with the resulting features classified using decision tree classifiers. This outputs a frame-by-frame measurement of smile probability; an example of this output is also shown in Figure 2(b). Although the algorithms will be trained with binary examples (e.g. AU12 vs. non-AU12), the probability outputs tend to be positively correlated with the intensity of the action, as shown in Figure 2(b); however, we must acknowledge that this interpretation will not always be accurate. Classifiers for AU1+2 (Frontalis/eyebrow raise), AU4 (Corrugator/brow furrow) and AU12 (Zygomatic Major/smile) will be used, in addition to any others available by the time the analysis is performed. AU1+2, AU4 and AU12 should capture the main components of surprise, confusion and amusement responses. Head turning, tilting and general motion will be calculated through the use of a head pose detector and facial feature tracker; the intention is to capture information about the attention of the viewers.
Heart rate, respiration rate and heart rate variability features will be calculated using the non-contact method described in [26, 27]. Figure 3 shows graphically how our algorithm extracts the blood volume pulse (BVP), and subsequently HR, RR and HRV information, from the RGB channels of a video containing a face. Specifically, the facial region within each video frame is segmented automatically and a spatial average of the RGB color values is calculated for the region of interest (ROI). For a given time window (typically 20-30 s), the raw RGB signals are normalized and detrended. A blind source separation technique, Independent Component Analysis (ICA), is then used to recover a set of source signals. The source signal with the strongest BVP component is filtered and used to calculate the HR, RR and HRV. This method has been validated against contact sensors and shown to be accurate. A minimal sketch of the pipeline follows.
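In the sketch below, the window length, filter band and the spectral-peak heuristic used to pick the BVP source are my assumptions for illustration; the full method in [26, 27] differs in detail and also derives RR and HRV from the recovered pulse, which is omitted here for brevity.

```python
import numpy as np
from scipy.signal import butter, detrend, filtfilt, find_peaks
from sklearn.decomposition import FastICA

def heart_rate_from_rgb(rgb_means, fs=15.0):
    """Estimate heart rate (bpm) from spatially averaged face-ROI RGB traces.

    rgb_means: (n_frames, 3) array of mean R, G, B values of the facial
    region in each frame, over a 20-30 s window at frame rate fs.
    """
    # Normalize and detrend each color channel.
    x = (rgb_means - rgb_means.mean(axis=0)) / rgb_means.std(axis=0)
    x = detrend(x, axis=0)

    # Blind source separation (ICA) into three candidate source signals.
    sources = FastICA(n_components=3, random_state=0).fit_transform(x)

    # Heuristic: take the source with the strongest spectral peak in the
    # cardiac band, 0.75-4 Hz (45-240 bpm), as the BVP.
    freqs = np.fft.rfftfreq(sources.shape[0], d=1.0 / fs)
    power = np.abs(np.fft.rfft(sources, axis=0)) ** 2
    band = (freqs >= 0.75) & (freqs <= 4.0)
    bvp = sources[:, power[band].max(axis=0).argmax()]

    # Band-pass filter the BVP and count peaks to estimate heart rate.
    b, a = butter(3, [0.75, 4.0], btype="band", fs=fs)
    bvp = filtfilt(b, a, bvp)
    peaks, _ = find_peaks(bvp, distance=int(fs / 4))
    return 60.0 * len(peaks) / (len(bvp) / fs)

# Example on a synthetic 75 bpm pulse mixed into the three channels.
t = np.arange(0, 30, 1 / 15.0)
pulse = np.sin(2 * np.pi * 1.25 * t)  # 1.25 Hz, i.e. 75 bpm
rng = np.random.default_rng(0)
rgb = np.outer(pulse, [0.3, 1.0, 0.5]) + 0.3 * rng.standard_normal((t.size, 3))
print(f"estimated HR: {heart_rate_from_rgb(rgb):.0f} bpm")  # approximately 75
```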
[Figure 1: flow of the user experience. 1. Homepage/introduction: the participant visits the site and is introduced to the study. 2. Consent: the participant is asked whether they will allow access to their webcam stream. 3.-5. The media clip is played while the Flash client captures the webcam footage and sends frames to the server. 6. The stored webcam video is processed to calculate the facial and physiological response. 7. The user can answer self-report questions. Behavioral measures (sharing/click-through) are recorded.]

Figure 1: Overview of the user experience and the web-based framework used to crowdsource the facial videos. The video from the webcam is streamed in real time to a server, where automated facial expression analysis is performed. All of the video processing can be performed on the server side.

There will be limitations involved in collecting data over the Internet; the uncontrolled nature of this research presents several challenges. Firstly, clean data is not always available: motion
and context of the users will vary considerably, resulting in greater noise within our measurements than if the data were collected in a laboratory. In addition, the video recordings are likely to have a lower frame rate and resolution than those that could be collected in a laboratory, in which case some of the subtler and faster micro-expressions may be missed and the physiological measurements will be noisier. Secondly, detailed and reliable profiles of the participants may be difficult to ensure in all cases. In order to address these weaknesses, we will compare the results obtained against those from analyses of datasets collected within controlled laboratory settings. The computer vision methods for extracting facial and physiological response features will be validated in controlled studies with ground-truth measures and against videos of differing qualities, in order to ensure reliability on data collected over the Internet. Specifically, I intend to recruit a number of subjects (10-20) and record video matching that collected over the Internet, together with ground-truth measures of physiology; the accuracy of the system can be characterized under these conditions. The AU detection algorithms will be tested against hand-labeled examples of frames collected over the Internet, as shown in [23].
By performing analysis online we can collect data from large populations, with considerable representation from diverse subgroups (gender/age/cultural background). We will recruit 150 participants for the second study proposed below and a similar number for the subsequent studies. In these cases recruitment will be possible through existing market research participant pools. However, recruitment can also occur through a variety of other mechanisms (such as voluntary means and paid crowd marketplaces), with participants profiled using self-report measures of age, gender and cultural background.
The extracted features will be collected alongside self-report responses, as these are the current standard, and behavioral metrics. In order to minimize effects due to primacy and recency, the order in which advertisements are presented will be randomized. I plan to collaborate with MIT Media Lab member companies in order to obtain sales data related to the advertisements.
[Figure 2: (a) example frames; (b) smile probability (0 to 1) versus time (s) for one 30-second ad response.]

Figure 2: (a) Example frames of data collected using a web-based framework similar to that described in Figure 1. (b) A series of frames from one particular video showing an AU12 (smile/amusement) response. The smile track demonstrates how greater smile intensity is positively correlated with the probability output of the classifier.
[Figure 3: (a) automated face tracking; (b) channel separation into red, green and blue signals; (c) raw traces; (d) signal components recovered by signal separation; (e) analysis of the BVP, yielding heart rate, respiration rate and heart rate variability (HF/LF).]

Figure 3: Graphical illustration of our algorithm for extracting heart rate, respiration rate and heart rate variability from video images of a human face, as described in [27].
[Figure 4: schematic. Stimuli (story/narrative) → measured response (facial behavior and head gestures; physiology: HR, RR, HRV) → emotion (valence, arousal, attention) and controlled processing → effect (likability, memory, persuasion, purchase, sharing).]

Figure 4: Schematic of the proposed research model, inspired by Barrett et al.'s dual-process view of emotion [4]. The measured responses will capture information about the valence, arousal and attention of the viewer and will be used to predict the effects of the story/narrative.
4.3 Studies

I propose to carry out a series of studies in this research. A preliminary study has already been performed; it was the first-in-the-world attempt to collect facial responses to videos on a large scale over the Internet. This involved testing three commercials that aired during the 2011 Super Bowl. The website was live for over a year and can be found at [1]. Visitors to the website were asked to opt in to watch short videos and have their facial expressions recorded and analyzed. Immediately following each video, visitors completed a short self-report questionnaire. The videos from the webcam were streamed in real time, at 15 frames per second and a resolution of 320x240, to a server where automated facial expression analysis was performed. Approximately 7,000 videos were collected in this study. This data will be used to build models for predicting ad liking purely from automatically measured behavior. In addition, I will investigate whether ad liking can be predicted effectively from only a subset of the response (e.g. the first 25% or 50%), as illustrated in the sketch below.
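The sketch below truncates each smile track to a fraction of its length before extracting summary features and cross-validating a liking classifier. The features, the classifier and the data are stand-ins of my own, not the thesis's final models; on real data, the comparison of accuracy across fractions is the quantity of interest.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

def summary_features(tracks, fraction):
    """Simple summary statistics over the first `fraction` of each track."""
    feats = []
    for track in tracks:
        head = track[: max(1, int(len(track) * fraction))]
        feats.append([head.mean(), head.max(), head[-1] - head[0]])
    return np.array(feats)

# Synthetic stand-ins: smile-probability tracks (~30 s at 15 fps) and
# binary "liked the ad" labels for 200 viewers.
rng = np.random.default_rng(0)
tracks = [rng.random(450) for _ in range(200)]
liked = np.tile([0, 1], 100)

for fraction in (0.25, 0.5, 1.0):
    X = summary_features(tracks, fraction)
    acc = cross_val_score(LogisticRegression(), X, liked, cv=5).mean()
    print(f"first {fraction:.0%} of response: CV accuracy = {acc:.2f}")
```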
The second study will extend the framework and methodology used in the first study to a much greater number of commercials, and I will extend the self-report questioning to cover more in-depth questions. Specifically, I will collect and analyze data for 150 viewers and 16 commercials (with each viewer watching a subset of the commercials). Video recordings of the participants' responses to the content will be collected and analyzed as described in the Methodology section. Self-report measures of persuasion, likability and familiarity will be recorded (post-viewing Likert scale reports). Pre- and post-launch sales data for the products will be available. The videos collected in this study will be of a similar quality to those above (resolution: 320x240, frame rate: 15 fps). This dataset will allow me to extend the modeling carried out in the preliminary study, to build and evaluate models for predicting likability, persuasion and sales.
In the third study I propose to collect and analyze data for a set of advertisement concepts around different product ranges. This will involve approximately 100 viewers watching multiple (2 or 3) advertisement concepts. Self-report measures of persuasion, likability and familiarity will be recorded. This study will compare similar but distinct advertising concepts for the same product; I will investigate the ability of measured emotional responses to distinguish between the efficacy of subtly different concepts for the same product.
The structure of the latter two studies will allow richer data to be collected under a more controlled experimental design, whilst still allowing us to collect naturalistic and spontaneous data in-the-wild. I will investigate the role of facial behavior and head gestures, HR, RR and HRV in predicting persuasion, likability and sales. The dimensions of valence, arousal and attention will be modeled as latent variables within the model.
As described above, I will carry out small-scale lab-based studies to evaluate the accuracy of the physiological measurement under a greater range of conditions. These will involve a smaller number of participants (10-20) viewing content on a computer or laptop whilst a video of their face is recorded. The method will be evaluated by its correlation with, and accuracy when compared to, measurements from contact sensors, along the lines of the sketch below. Data for 16 participants has already been collected; if necessary, further data collection can be performed. For these experiments, recruitment can be from the local community.
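The comparison itself is straightforward to compute. A sketch with made-up numbers follows; reporting RMSE alongside the correlation is my own addition, not a measure named in the proposal.

```python
import numpy as np
from scipy.stats import pearsonr

# Hypothetical paired window-by-window estimates for one participant:
# remote (webcam) heart rate vs. a contact sensor's reading, in bpm.
hr_remote = np.array([62.1, 64.0, 70.5, 68.2, 66.9, 72.3, 75.8, 71.0])
hr_contact = np.array([61.8, 64.4, 70.1, 68.9, 66.5, 72.0, 75.1, 71.6])

r, p = pearsonr(hr_remote, hr_contact)
rmse = np.sqrt(np.mean((hr_remote - hr_contact) ** 2))
print(f"r = {r:.3f} (p = {p:.2g}), RMSE = {rmse:.2f} bpm")
```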
4.4 Plan for Completion of the Research

Table 1 shows my tentative plan for completion of the research described in this proposal.

  Timeline                   Work                                      Progress
  January-March 2011         Analysis of data from preliminary study   completed
  April-June 2012            Design of studies                         ongoing
  September-November 2012    Implementation of studies                 planned
  November 2012-March 2013   Analysis of data collected                planned
  March 2013                 First thesis outline                      planned
  April-June 2013            Complete analysis of study data           planned
  July 2013                  Second thesis outline                     planned
  August-December 2013       Thesis writing                            planned
  January-February 2014      Thesis defense                            planned

Table 1: Plan for completion of my doctoral thesis research.
4.5 Human Subjects Approval

The protocol for all studies will be approved by the Massachusetts Institute of Technology Committee on the Use of Humans as Experimental Subjects (COUHES).
4.6 Collaborations

I will be collaborating with Thales Teixeira at Harvard Business School on the modeling of effectiveness based on emotional responses. I will be working at Affectiva for one semester in order to complete parts of the data collection described, building on their data collection framework and using their facial action unit detection algorithms.
5 Biography

Daniel McDuff is a PhD candidate in the Affective Computing group at the MIT Media Lab. McDuff received his bachelor's degree, with first-class honors, and his master's degree in engineering from Cambridge University. Prior to joining the Media Lab, he worked for the Defence Science and Technology Laboratory (DSTL) in the UK. He is interested in using computer vision and machine learning to enable the automated recognition of affect, particularly in the domain of storytelling and advertising.

Email: djmcduff@media.mit.edu
Web: media.mit.edu/djmcduff
References

[1] Web address of data collection site: http://www.forbes.com/2011/02/28/detect-smile-webcam-affectiva-mit-media-lab.html
[2] Z. Ambadar, J.F. Cohn, and L.I. Reed. All smiles are not created equal: Morphology and timing of smiles perceived as amused, polite, and embarrassed/nervous. Journal of Nonverbal Behavior, 33(1):17-34, 2009.
[3] T. Ambler, A. Ioannides, and S. Rose. Brands on the brain: Neuro-images of advertising. Business Strategy Review, 11(3):17-30, 2000.
[4] L.F. Barrett, K.N. Ochsner, and J.J. Gross. On the automaticity of emotion. In Social Psychology and the Unconscious: The Automaticity of Higher Mental Processes, pages 173-217, 2007.
[5] A.L. Biel. Love the ad. Buy the product? Admap, September 1990.
[6] P.D. Bolls, A. Lang, and R.F. Potter. The effects of message valence and listener arousal on attention, memory, and facial muscular responses to radio advertisements. Communication Research, 28(5):627-651, 2001.
[7] J.F. Cohn, Z. Ambadar, and P. Ekman. Observer-based measurement of facial expression with the Facial Action Coding System. Oxford University Press, New York, 2005.
[8] R.R. Cornelius. The Science of Emotion: Research and Tradition in the Psychology of Emotions. Prentice-Hall, 1996.
[9] C. Darwin, P. Ekman, and P. Prodger. The Expression of the Emotions in Man and Animals. Oxford University Press, USA, 2002.
[10] P. Ekman. Facial expression and emotion. American Psychologist, 48(4):384, 1993.
[11] P. Ekman, W.V. Friesen, and S. Ancoli. Facial signs of emotional experience. Journal of Personality and Social Psychology, 39(6):1125, 1980.
[12] P. Ekman and W.V. Friesen. Facial Action Coding System. 1977.
[13] W. Gordon. What do consumers do emotionally with advertising? Journal of Advertising Research, 46(1), 2006.
[14] M.C. Green. Transportation into narrative worlds: The role of prior knowledge and perceived realism. Discourse Processes, 38(2):247-266, 2004.
[15] M.C. Green, J.J. Strange, and T.C. Brock. Narrative Impact: Social and Cognitive Foundations. Lawrence Erlbaum, 2002.
[16] H. Gunes, M. Piccardi, and M. Pantic. From the lab to the real world: Affect recognition using multiple cues and modalities. In Affective Computing: Focus on Emotion Expression, Synthesis, and Recognition, pages 185-218, 2008.
[17] R.I. Haley. The ARF copy research validity project: Final report. In Transcript Proceedings of the Seventh Annual ARF Copy Research Workshop, 1990.
[18] R.L. Hazlett and S.Y. Hazlett. Emotional response to television commercials: Facial EMG vs. self-report. Journal of Advertising Research, 39:7-24, 1999.
[19] M.E. Hoque and R.W. Picard. Acted vs. natural frustration and delight: Many people smile in natural frustration. In Proceedings of the IEEE International Conference on Automatic Face & Gesture Recognition and Workshops (FG 2011). IEEE, 2011.
[20] A. Lang. Involuntary attention and physiological arousal evoked by structural features and emotional content in TV commercials. Communication Research, 17(3):275-299, 1990.
[21] M.D. Lieberman, N.I. Eisenberger, M.J. Crockett, S.M. Tom, J.H. Pfeifer, and B.M. Way. Putting feelings into words. Psychological Science, 18(5):421, 2007.
[22] D. McDuff, R. El Kaliouby, and R. Picard. Crowdsourced data collection of facial responses. In Proceedings of the 13th International Conference on Multimodal Interaction. ACM, 2011.
[23] D.J. McDuff, R. El Kaliouby, and R.W. Picard. Crowdsourcing facial responses to online videos. IEEE Transactions on Affective Computing, 2012.
[24] A. Mehta and S.C. Purvis. Reconsidering recall and emotion in advertising. Journal of Advertising Research, 46(1):49, 2006.
[25] B. Parkinson and A.S.R. Manstead. Making sense of emotion in stories and social life. Cognition & Emotion, 7(3-4):295-323, 1993.
[26] M.Z. Poh, D.J. McDuff, and R.W. Picard. Non-contact, automated cardiac pulse measurements using video imaging and blind source separation. Optics Express, 18(10):10762-10774, 2010.
[27] M.Z. Poh, D.J. McDuff, and R.W. Picard. Advancements in noncontact, multiparameter physiological measurements using a webcam. IEEE Transactions on Biomedical Engineering, 58(1):7-11, 2011.
[28] M.L. Ray and R. Batra. Emotion and persuasion in advertising: What we do and don't know about affect. Graduate School of Business, Stanford University, 1982.
[29] K.R. Scherer and P. Ekman. Methodological issues in studying nonverbal behavior. In Handbook of Methods in Nonverbal Behavior Research, pages 1-44, 1982.
[30] N. Schwarz and F. Strack. Reports of subjective well-being: Judgmental processes and their methodological implications. In Well-Being: The Foundations of Hedonic Psychology, pages 61-84, 1999.
[31] M. Suwa, N. Sugie, and K. Fujimora. A preliminary note on pattern recognition of human emotional expression. In International Joint Conference on Pattern Recognition, pages 408-410, 1978.
[32] G.W. Taylor, I. Spiro, C. Bregler, and R. Fergus. Learning invariance through imitation. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2011.
[33] T. Teixeira, M. Wedel, and R. Pieters. Emotion-induced engagement in internet video ads. Journal of Marketing Research, (ja):1-51, 2010.
[34] J. Whitehill, G. Littlewort, I. Fasel, M. Bartlett, and J. Movellan. Toward practical smile detection. IEEE Transactions on Pattern Analysis and Machine Intelligence, 31(11):2106-2111, 2009.
[35] F.H. Wilhelm and P. Grossman. Emotions beyond the laboratory: Theoretical fundaments, study design, and analytic strategies for advanced ambulatory assessment. Biological Psychology, 84(3):552-569, 2010.
[36] P. Winkielman, G.G. Berntson, and J.T. Cacioppo. The psychophysiological perspective on the social mind. In Blackwell Handbook of Social Psychology: Intraindividual Processes, pages 89-108, 2001.
[37] R.B. Zajonc. Feeling and thinking: Preferences need no inferences. American Psychologist, 35(2):151, 1980.
[38] Z. Zeng, M. Pantic, G.I. Roisman, and T.S. Huang. A survey of affect recognition methods: Audio, visual, and spontaneous expressions. IEEE Transactions on Pattern Analysis and Machine Intelligence, 31(1):39-58, 2009.
Committee Biographies

Jeffrey Cohn
Professor of Psychology, University of Pittsburgh

Jeffrey Cohn is Professor of Psychology at the University of Pittsburgh and Adjunct Faculty at the Robotics Institute at Carnegie Mellon University. He has led interdisciplinary and inter-institutional efforts to develop advanced methods of automatic analysis of facial expression and prosody, and has applied those tools to research in human emotion, social development, non-verbal communication, psychopathology, and biomedicine. He co-chaired the 2008 IEEE International Conference on Automatic Face and Gesture Recognition (FG2008) and the 2009 International Conference on Affective Computing and Intelligent Interaction (ACII2009). He has co-edited two recent special issues of the journal Image and Vision Computing. His research has been supported by grants from the National Institutes of Health, the National Science Foundation, the Autism Foundation, the Office of Naval Research, the Defense Advanced Research Projects Agency, and the Technical Support Working Group.
Ashish Kapoor
Senior Research Scientist, Microsoft Research, Redmond

Ashish Kapoor is a researcher in the Adaptive Systems and Interaction Group at Microsoft Research, Redmond. He focuses on machine learning and computer vision, with applications in user modeling, affective computing and human-computer interaction scenarios. Ashish received his PhD from the MIT Media Lab; his doctoral thesis looked at building discriminative models for pattern recognition with incomplete information (semi-supervised learning, imputation, noisy data, etc.). Most of his earlier work focused on building new machine learning models for affect recognition; a significant part of that work involved automatic analysis of non-verbal behavior and physiological responses.
Thales Teixeira
Assistant Professor of Business Administration, Harvard Business School

Thales Teixeira is Assistant Professor in the Marketing Department of Harvard Business School. His research focuses on the economics of attention. He explores the rules of the (implicit) transaction of attention in a marketplace in which consumer attention is a scarce resource, arguably even scarcer than money or time. His work has appeared in Marketing Science. He received his PhD in Business from the University of Michigan and holds a Master of Arts in Statistics (University of Sao Paulo, Brazil) and a Bachelor of Arts in Administration (University of Sao Paulo, Brazil). Before entering academia, he consulted for companies such as Microsoft and Hewlett-Packard. At Harvard, he teaches an MBA course in Marketing.