Detecting and Responding to Student Emotion
Post on 07-Apr-2017
Detecting and Responding to Student Emotion within an
Online Tutor
Beverly Park WOOLF
College of Information and Computer Sciences,
University of Massachusetts Amherst
Supported by:
• National Science Foundation
• U.S. Dept. of Education
• Bill & Melinda Gates Foundation
General Motivation
• What to do in the moment when students are frustrated, bored, etc.?
– Increase challenge? Decrease challenge?
– Provide extra scaffolds?
– Provide “affective” scaffolds? What are those?
– Encourage students to stop and think about what is going on?
• How to measure changes in student affect, capturing micro-changes in student affective states?
People and Computers
Many people:
• relate to computers in the same way they relate to humans (Nass, 2010);
• continue to engage in frustrating tasks significantly longer after an empathic digital response (Picard);
• have lowered stress levels after receiving an empathetic message from a digital character (Arroyo et al., 2009);
• recall more information when interacting with an artist agent compared to a scientist agent;
• report reduced frustration and more general interest when working with gender-matched characters.
In Summary
• Empathetic characters help decrease students’ anxiety and boredom.
• Simple 2D characters instead of 3D characters work well with students.
• Tutoring systems that are not based on natural language processing work well.
• Learning companions that show empathy help with students’ negative affective states.
• Growth mindset messages provide a boost in student math learning.
• Empathic messages yield higher math performance and learning.
• Simple success/failure comments are harmful to students, in comparison to other conditions.
Negative Emotion and Learning
• Confusion is associated with learning under certain conditions
  D'Mello, S. K., Lehman, B., Pekrun, R. & Graesser, A. C. (2014). Confusion Can Be Beneficial for Learning. Learning & Instruction.
• Boredom reduces task performance
  Pekrun, R., Goetz, T., Daniels, L., Stupnisky, R. & Perry, R. (2010). Boredom in Achievement Settings: Exploring Control–Value Antecedents and Performance Outcomes of a Neglected Emotion. Journal of Educational Psychology. 102(3), 531-549.
• Boredom increases ineffective behaviors such as gaming
  Baker, R. S. J. d., D'Mello, S. K., Rodrigo, M. M. T. & Graesser, A. C. (2010). Better to Be Frustrated than Bored: The Incidence, Persistence, and Impact of Learners' Cognitive-Affective States during Interactions with Three Different Computer-Based Learning Environments. International Journal of Human-Computer Studies. 68(4), 223-241.
Negative Valence Emotions
Affective learning companions congratulate students on effort exerted and talk to them about their effort and learning.
Affective Learning Companions
[Figure panels: Incorrect response; Student effort shown / correct response; Student effort shown / incorrect response]
Agent Emotion
Agents support frustrated students by acting helpful, bored, or confused.
Arroyo et al., AIED2009
Agent Emotion
[Figure panels: Effort Attribution; Shrug; High Interest]
Students believe agents are part of the learning experience, mentors . . . who are together with students against the computer, . . . who are more knowledgeable (most of the time) cognitively and emotionally.
Arroyo et al., AIED2009
Methodology
Measure students’ cognitive and affective attributes (skills, motivation, engagement) in real time.
Offer appropriate and timely interventions.
Measure the impact of each intervention.
Reduced Frustration
[Chart: frustration level, on a scale from Less Frustrated through Neutral to More Frustrated]
Increased Interest
• Less boredom for math at posttest time in LC condition (+ F(94,1)=3.4, p=.07).
[Chart: interest level, on a scale from More Bored through Neutral to More Interested]
Improved Confidence
AGENDA
Detect Student Emotion
– Sensors
– Mathematical models
– Student self-reports of emotion
Remediate Student Emotion
– Teacher-based
– Peer-based
– Game-based
Use Sensors
The Students
• Rural-area high school in MA (35 students): Geometry and Algebra classes
• UMASS 114 (29 students): Math for Elementary School Teachers
Detect Emotions with Each Sensor
Conclusion: a camera can detect most emotion.
The MathSpring System
Can Models Reproduce Sensors?
[Diagram: four emotions (Confidence, Interest, Frustration, Excitement) alongside measures taken before the tutor (Math Pretest, Math Self-concept, Math Value, Learning Orientation) and after the tutor (Math Posttest, Math Self-concept, Math Value, Learning Orientation, Perception: Learned?, Perception: Liked?)]
Measuring cognition, meta-cognition and affect.
Pre- and post-tutor assessments of:
• Self-concept in Mathematics (3 items)
• Math Value/Liking (3 items)
• Mastery Learning (Learning Orientation) (2 items)
• Mathematics Test (15 items)
• Math Emotions Baseline (4 items)
– Frustration/Ease, Confidence/Anxiety, Interest/Boredom, Excitement/Lack of Excitement
Post-tutor assessments only:
• Perception of learning
• Liking the tutoring software
Models of Emotions
From Tutor-Context Variables Only
Linear models to predict emotions; variables entered in stepwise regression.
Emotions modeled: Confidence, Interest, Frustration, Excitement
Tutor-context variables (for the last problem):
• # hints seen
• Solved on 1st attempt?
• # incorrect attempts
• Gender of pedagogical agent
• Seconds to 1st attempt
• Time in tutor
• Seconds to solve
Model fit: R=0.53, R=0.43, R=0.37, R=0.49
Models of Emotions with Sensors
From Tutor-Context Variables and Sensors
Linear models to predict emotions; variables entered in stepwise regression.
Emotions modeled: Confidence, Interest, Frustration, Excitement
Sensor variables (mean, min, max, stdev for the last problem):
• Camera facial detection software: “Concentrating” max. probability
• Seat sensor: SitForward mean, SitForward stdev
Tutor-context variables (for the last problem):
• # hints seen
• Solved on 1st attempt?
• # incorrect attempts
• Gender of pedagogical agent
• Seconds to 1st attempt
• Time in tutor
• Seconds to solve
Model fit:
• Tutor only: R=0.53, R=0.43, R=0.37, R=0.49
• All sensors + tutor: R=0.72, R=0.70, R=NA, R=0.82
How are you feeling? Please rate your level of interest in this
Students Self-Report Emotions
Four bipolar emotional axes
Resulting Data
How can we understand student affect?
How can students’ actual words be used?
Research to Detect Affect
Two major approaches to examine affect:
• Categorical
– Each affect is considered a discrete category
– Approach usually used in ITS
• Dimensional models of emotion
– Valence (+/-), activation (high/low)
– Locus of control (internal/external)
– Previous results mixed
Self-report Methods
• Student was prompted for affect every 5 problems or 8 minutes (whichever came first)
– Asked about either frustration, confidence, excitement or interest each time
• Asked to rate their affect on a 5-point Likert scale and to answer “why”
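The prompting rule above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function and constant names are hypothetical, and choosing the emotion at random is an assumption, since the slides do not say how the emotion for each prompt was selected.

```python
import random

# The four emotions students could be asked about (from the slide).
EMOTIONS = ["frustration", "confidence", "excitement", "interest"]

PROBLEM_LIMIT = 5            # prompt after this many problems...
TIME_LIMIT_SECONDS = 8 * 60  # ...or after 8 minutes, whichever comes first


def should_prompt(problems_since_prompt, seconds_since_prompt):
    """Return True when either limit has been reached since the last prompt."""
    return (problems_since_prompt >= PROBLEM_LIMIT
            or seconds_since_prompt >= TIME_LIMIT_SECONDS)


def next_prompt(rng=random):
    """Build one self-report prompt.

    Assumption: the emotion is picked uniformly at random; the actual
    selection policy is not described in the source.
    """
    return {
        "emotion": rng.choice(EMOTIONS),
        "scale": (1, 5),   # 5-point Likert scale
        "ask_why": True,   # students also answer "why"
    }
```

For example, `should_prompt(2, 500)` is true because 8 minutes have passed even though only two problems were completed.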
Phase 1: Open Coding
• 450 random responses from 2011 given to 5 coders
• Each coder independently created ~10 categories and tagged the responses using these, i.e., students’ words were an example of emotion XX
– Coders covered at least 70% of all responses
– Could tag a response with multiple categories if appropriate
Phase 2: Axial Coding
• 3 coders used the resulting 5 schemes to create 10 final categories:
– IDK: student doesn’t know why they feel that way
– Boring: student says they are bored/material is boring
– Easy: student says material is easy
– Hard: student says material is hard
– Internal: student attributes feelings to self
– External: student attributes feelings to something outside self
– Positive: valence of comment is positive
– Negative: valence of comment is negative
– Supportive: student says tutor is helpful or supports them
– Unsupportive: student says tutor is not helpful or does not support them
Phase 3: Application and Validation of Tags
• Four coders each coded the 2015 data using the 10 agreed-upon categories
• Inter-rater reliability measured by Cohen’s Kappa
• Used the coded data from the coder who had the highest agreement with others overall
[Table: Highest Kappa between any 2 coders for each tag]
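As a reminder of what Cohen's Kappa measures, here is a minimal sketch for two coders applying one tag to the same responses. The function name and the example labels are hypothetical and are not taken from the study data.

```python
from collections import Counter


def cohens_kappa(labels_a, labels_b):
    """Cohen's kappa for two coders who labeled the same items.

    kappa = (p_observed - p_expected) / (1 - p_expected), where p_expected
    is the agreement expected by chance from each coder's label frequencies.
    """
    if len(labels_a) != len(labels_b) or not labels_a:
        raise ValueError("both coders must label the same non-empty item list")
    n = len(labels_a)
    observed = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    counts_a = Counter(labels_a)
    counts_b = Counter(labels_b)
    categories = counts_a.keys() | counts_b.keys()
    expected = sum(counts_a[c] * counts_b[c] for c in categories) / (n * n)
    if expected == 1.0:
        # Both coders used a single identical label everywhere.
        return 1.0
    return (observed - expected) / (1 - expected)


# Hypothetical example: 1 = response tagged "Boring", 0 = not tagged.
coder_1 = [1, 1, 0, 0, 1, 0, 1, 0, 1, 1]
coder_2 = [1, 1, 0, 0, 1, 0, 0, 1, 1, 1]
kappa = cohens_kappa(coder_1, coder_2)
```

With these made-up labels the coders agree on 8 of 10 items, chance agreement is 0.52, and kappa is about 0.58. Computing this per tag for every pair of coders gives the kind of pairwise comparison the slide refers to.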
Results & Analyses
Frequency of Each Code Out of a Total Sample Group (2015 N = 449; 2011 N = 464)
The tutor seems to improve in promoting positive student affect.
Results & Analyses
Percentage of reports containing each tag, broken down by affect
[Charts: 2011 and 2015]
• More positive, less negative affect.
• More internal, less external affect.
• Less boring material.
Results & Analyses
Frequent Combinations
Discussion
• Many students tend to externalize affect
– Especially when negative: “the problems are too difficult”
• Populations differed on when they reported the material was “hard”
• The tutor seems to be improving in promoting positive student affect.
Representations of Affect
Use External Coders
[Figure: three external coders, each labeling the same student: She looks “Angry”]
BROMP Observation Method Protocol
• Ocumpaugh, J., Baker, R., and Rodrigo, M.M.T. (2012). Baker-Rodrigo Observation Method Protocol (BROMP) 1.0. Training Manual version 1.0. Technical Report. New York, NY: EdLab. Manila, Philippines: Ateneo Laboratory for the Learning Sciences.
Inter-Rater Reliability
• Reliability is a decent “goodness” metric
• Reliability ≠ Validity
• Good face validity
• ≈ “Angry”
• ≈ “Constipated”?
Internal Representation ≈ Experience?
• Self Appraisal
• Self Report
• Relate an Experience to a Representation
•Can I understand how I feel?
Self-Report Requires Matching Experience to Representation
Let’s Address This
• Self-Report Method Reliability
• Establish Method to Measure Reliability
• Distinguish Relative Reliability of Different Methods
Participants
• Eighty One (81) Seventh Graders from two California Middle Schools
• Majority Latin American
• Close to California Median Income
Students Were Given
& the Following Stickers
Angry Anxious Bored Confident Confused
Enjoying Excited Frustrated Interested Relieved
Motivations For Stickers
• Words based on relevance in education and similarity to emotes (faces)
• Emoticons (faces) based on Broekens & Brinkman 2013 Affect Button
– Has extremes of Valence (Pleasure), Activation, & Dominance
– We chose each extreme for faces (2^3)
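The 2^3 count comes from taking the low and high extreme on each of the three dimensions. A small sketch makes the enumeration explicit; the -1/+1 encoding and the names used here are assumptions for illustration, not the Affect Button's own representation.

```python
from itertools import product

# The three dimensions named on the slide.
DIMENSIONS = ("valence", "activation", "dominance")


def extreme_corners():
    """Enumerate every combination of low (-1) and high (+1) extremes.

    Assumption: extremes are encoded as -1 and +1 purely for illustration.
    """
    return [dict(zip(DIMENSIONS, signs))
            for signs in product((-1, +1), repeat=len(DIMENSIONS))]


corners = extreme_corners()  # 2 * 2 * 2 = 8 candidate faces
```

Each of the eight dictionaries corresponds to one face, for example high valence with high activation and high dominance.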
Averaged Self-Report Values
Can progress reports from virtual teachers improve
student interest, excitement? Given that they encourage students to
stop and reflect... And give them a choice…
Do progress reports have the capacity
to improve student interest and excitement?
What do researchers know about showing progress to students?
• Cognitive side: basic progress charts every 6 problems in an intelligent tutor led to higher learning gains
Arroyo et al. (2007). Repairing disengagement with non-invasive interventions. AIED 2007.
• Meta-cognitive side: progress reports reduce student gaming behaviors (e.g., hint abuse)
Arroyo et al. (2007)
Variance of Student Placements
Discussion
• Two things to address:
– Assuming These Results are “Representative”, What Does It Mean for Self-Report?
– “Mistakes were made…”
What Does This Mean for Self-Report?
• One Student’s “High Boredom” ≠ Another’s
• Students don’t see affect as fitting Russell’s “Core Affect” wheel
• Should inform decisions of whether to use & how much to trust self-report
– Cheap & easy
– High variance, but other measures are more tightly controlled
Possible Sources of Error
• Experience ≠ Recall (Bieg et al)
• Culture & Representations
• Relative Ordering (us specific)
• Can I recall how I felt?
• Activating emotion experiments: frustration, anxiety; intervention: Characters
• Deactivating emotion experiments: boredom, unexcitement; intervention: Student Progress Page
• NSF Cyberlearning DIP Collaborative Research: Impact of Adaptive Interventions on Student Affect, Performance and Learning (2.5 more years)
The MathSpring System
The Student Progress Page
Dovan Rai, PhD, WPI
•Pekrun, R., Elliot, A. J. & Maier, M. A. (2009)
In Ed Psych, well accepted that motivation/emotions influence achievement, but…
Three Experimental Conditions
• No access to the Student Progress Page
• “My Progress” button was present (student choice)
• Prompted invitation to see “My Progress” when the student is bored (disinterested) or unexcited
• Student forced to see “My Progress” when the student is bored (disinterested) or unexcited
Initial Results about receiving SPP
• SPP access indeed increased across conditions
– No button: M = 1.3
– Button present: M = 3.1
– SPP offered upon low affect: M = 6.0
– SPP forced upon low affect: M = 8.8
• Confirmed that there were no differences between conditions in terms of baseline interest and excitement as measured by the pre-affect survey (ns).
• Our current gold standard/truth: self-reports
• During the experiment:
– Ask students about their “interest” level and their “excitement” level every 7 minutes, on average.
Gathering Student Affect “During”
[Scale: Low; Neutral / Middle; High]
Resulting Models
• Interest: Pearson's R = 0.464, Kappa = 0.281
• Interest Model =
  0.425 * INTERESTED PRE
  - 0.253 * numMistakes
  + 0.140 * FRUSTRATION PRE
  - 0.219 * ERRORS_INCOMP
  + 0.367 * SHINT Last3probs
  + 0.140 * EXCITED PRE
  + 0.535 * isSolved
  + 0.146 * LEARN_ORIENT PRE
  + 0.128 * AVOIDANCE PRE
  + 0.131 * OVERHELP PRE
  - 0.088
• Excitement: Pearson's R = 0.431, Kappa = 0.18
• Excitement Model =
  0.376 * LIKE MATH PRE
  + 0.260 * INTERESTED PRE
  - 0.284 * ERRORS_INCOMP PRE
  + 0.017 * AverageLogTimeToSolve
  + 0.140 * LEARN_ORIENT
  - 4.065 * GIVEUP SOFLast3
  + 0.162 * LEARNOWN PRE
  - 0.197 * LEARNGOAL PRE
  + 0.150 * FRUSTRATED PRE
  + 0.149 * EXCITED PRE
  + 0.303
Bold = features about student interaction with tutor
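To make the interest model listed above concrete, the following sketch evaluates it as a plain weighted sum. The coefficients and feature names are copied from the slide; the function name is hypothetical, and the units and scaling of each feature are not given in the source, so any input values are assumptions. No low/neutral/high cut-offs are shown because the slides do not state them.

```python
# Coefficients of the interest model, as listed on the slide.
INTEREST_COEFFICIENTS = {
    "INTERESTED PRE": 0.425,
    "numMistakes": -0.253,
    "FRUSTRATION PRE": 0.140,
    "ERRORS_INCOMP": -0.219,
    "SHINT Last3probs": 0.367,
    "EXCITED PRE": 0.140,
    "isSolved": 0.535,
    "LEARN_ORIENT PRE": 0.146,
    "AVOIDANCE PRE": 0.128,
    "OVERHELP PRE": 0.131,
}
INTEREST_INTERCEPT = -0.088


def predict_interest(features):
    """Return the linear interest score for one student-problem interaction.

    `features` maps every name in INTEREST_COEFFICIENTS to a number.
    Assumption: features are already on the scale the model was fit on.
    """
    missing = set(INTEREST_COEFFICIENTS) - set(features)
    if missing:
        raise KeyError(f"missing features: {sorted(missing)}")
    return INTEREST_INTERCEPT + sum(
        coef * features[name] for name, coef in INTEREST_COEFFICIENTS.items()
    )


# Hypothetical input: all features zero except a solved problem.
example = dict.fromkeys(INTEREST_COEFFICIENTS, 0.0)
example["isSolved"] = 1.0
score = predict_interest(example)
```

The excitement model could be evaluated the same way with its own coefficient table.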
Predictions of Interest/Excitement
[Figure: per-problem sequences of predicted labels; interest labels are DISINT, NEUTRAL, INTER and excitement labels are UNEXC, NEUTRAL, EXC]
Use models to classify all student-problem interactions as low/neutral/high interest and low/neutral/high excitement
How to analyze changes in student affect, from moment to moment?
MARKOV CHAIN MODELS
Student Interest
• Ten thousand data points, N~230.
[Markov chain diagrams, one per condition: SPP Absent; SPP Present; SPP Prompted; SPP Forced]
Student Excitement
• Ten thousand data points, N~230.
[Markov chain diagrams, one per condition: SPP Absent; SPP Present; SPP Prompted; SPP Forced]
How to compare Markov Chain Models, quantitatively?
• Probability of following a specific path
• What is the probability that a student will end up excited, after 3 transitions?
• Given that they started in a specific state?
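Both quantities can be computed directly from a transition matrix. The sketch below is illustrative only: the transition probabilities are made up (they are not the values estimated in the study), and the function names are hypothetical.

```python
STATES = ["unexcited", "neutral", "excited"]

# HYPOTHETICAL transition probabilities; row = current state, column = next state.
# Each row sums to 1. These are not values from the study.
TRANSITIONS = [
    [0.6, 0.3, 0.1],  # from unexcited
    [0.2, 0.6, 0.2],  # from neutral
    [0.1, 0.3, 0.6],  # from excited
]


def path_probability(path, transitions):
    """Probability of following one specific sequence of states."""
    prob = 1.0
    for current, nxt in zip(path, path[1:]):
        prob *= transitions[STATES.index(current)][STATES.index(nxt)]
    return prob


def distribution_after(start_state, transitions, n_steps):
    """State distribution after n_steps transitions from a known start state."""
    dist = [1.0 if s == start_state else 0.0 for s in STATES]
    for _ in range(n_steps):
        dist = [
            sum(dist[i] * transitions[i][j] for i in range(len(STATES)))
            for j in range(len(STATES))
        ]
    return dist


# Probability of ending up excited after 3 transitions, starting neutral.
after_three = distribution_after("neutral", TRANSITIONS, 3)
p_excited = after_three[STATES.index("excited")]
```

Running the same computation with one matrix per condition (SPP absent, present, prompted, forced) gives numbers that can be compared across conditions.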
Conclusions
• We have refined a methodology to analyze how specific interventions produce changes in [affective] states
– Randomized controlled trials → model creation/application → Markov chain models → path analysis
• Evidence that having the SPP present instead of absent can lead towards interest and excitement
– Not shooting for an ideal policy, just a policy that works to some extent and is able to compete against others
Remaining Questions
• What if students started unexcited/bored instead of neutral?
• Can we compute the probability of “improving” their affective state, regardless of where they started?
Future Work
• Look in more detail at:
– Internal vs. external tag
– Relationship between difficulty and affect
• Possible new affective constructions (e.g., persistence)
• Create better predictive models for specific reasons
• Use NLP to label the responses rather than have human coders do it.
Unresolved Questions
• No clear evidence that encouraging the SPP at moments of low affect is better than simply having the button available
• Choice in the Button-Present condition might give students a sense of agency, which in turn might make them “feel good”, engaged, etc.
• Might it be that intervening only based on the last report of affect is not good enough? Intervene after episodes of boredom/lack of excitement?
• Maybe the SPP at moments of boredom/lack of excitement is not that effective at repairing those states.
• Maybe our models are not accurate enough, and that affects both the affective paths and the results.
The Research Team!
• Ivon Arroyo
• Naomi Wixon
• Danielle Allessio
• Kasia Muldner
• Winslow Burleson
• Beverly Woolf
Detecting and Responding to Student Emotion within an
Online Tutor
Thank You.
Any Questions?
This research was funded by the National Science Foundation, #1324385, Cyberlearning DIP, Impact of Adaptive Interventions on Student Affect, Performance, and Learning; Burleson, Arroyo and Woolf (PIs). Any opinions, findings, conclusions or recommendations expressed in this paper are those of the authors and do not necessarily reflect the views of the funding agencies.