Using Computational Linguistics to Support Students and Teachers during Peer Review of Writing
Diane Litman
Professor, Computer Science Department
Senior Scientist, Learning Research & Development Center
Director, Intelligent Systems Program
University of Pittsburgh
Pittsburgh, PA 15217 USA
Joint work with Professors K. Ashley, A. Godley & C. Schunn
Peer Review Research is a Goldmine for Computational Linguistics
• New Educational Technology!
• Learning Science at Scale!
• Can we automate human coding?
Outline
• SWoRD (Computer-Supported Peer Review)
• Supporting Students with Review Scaffolding
• Keeping Teachers Well-informed
• Summary and Current Directions
SWoRD: A web-based peer review system [Cho & Schunn, 2007]
• Authors submit papers (or diagrams)
• Peers submit reviews
• Authors provide back-reviews to peers
Pros and Cons of Peer Review
Pros
• Quantity and diversity of review feedback
• Students learn by reviewing
• Useful for MOOCs
Cons
• Reviews are often not stated in effective ways
• Reviews and papers do not focus on core aspects
• Information overload for students and teachers
Outline
• SWoRD (Computer-Supported Peer Review)
• Supporting Students with Review Scaffolding
• Keeping Teachers Well-informed
• Summary and Current Directions
The Problem
• Reviews are often not stated effectively
• Example: no localization
– Justification is sufficient but unclear in some parts.
• Our Approach: detect and scaffold
– Justification is sufficient but unclear in the section on African Americans.
Detecting Key Properties of Text Reviews
• Computational Linguistics to extract attributes from text, e.g.
– Regular expressions (e.g. “the section about”)
– Domain lexicons (e.g. “federal”, “American”)
– Syntax (e.g. demonstrative determiners)
– Overlapping lexical windows (quotation identification)
• Machine Learning to predict whether reviews contain properties correlating with feedback implementation
– Localization
– Solutions
– Thesis statements
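A minimal sketch of how attributes of the kinds listed above might be pulled from one review comment. Everything named here is an illustrative assumption rather than the published feature set: the pattern `LOCATION_PATTERN`, the contents of `DOMAIN_LEXICON`, and the functions `tokenize`, `longest_shared_window` and `extract_features`. The demonstrative count is a crude word-list proxy (no syntactic parse), and the window feature is simply the longest run of consecutive words shared by the comment and the paper.

```python
import re

# Hypothetical location pattern; the slides give only "the section about" as an example.
LOCATION_PATTERN = re.compile(
    r"\bthe (?:section|paragraph|page|part) (?:about|on)\b", re.IGNORECASE
)
# Hypothetical domain lexicon; a real one would be built from the assignment's topic.
DOMAIN_LEXICON = {"federal", "american", "americans", "african"}
# Word-list stand-in for demonstrative determiners (no parser is used here).
DEMONSTRATIVES = {"this", "that", "these", "those"}


def tokenize(text):
    """Lowercase the text and keep alphabetic tokens."""
    return re.findall(r"[a-z']+", text.lower())


def longest_shared_window(comment_tokens, paper_tokens):
    """Length of the longest run of consecutive words found in both token lists."""
    best = 0
    prev = [0] * (len(paper_tokens) + 1)
    for c in comment_tokens:
        cur = [0] * (len(paper_tokens) + 1)
        for j, p in enumerate(paper_tokens, start=1):
            if c == p:
                cur[j] = prev[j - 1] + 1
                best = max(best, cur[j])
        prev = cur
    return best


def extract_features(comment, paper):
    """Return a small feature dictionary for one review comment."""
    tokens = tokenize(comment)
    return {
        "has_location_pattern": bool(LOCATION_PATTERN.search(comment)),
        "domain_word_count": sum(t in DOMAIN_LEXICON for t in tokens),
        "demonstrative_count": sum(t in DEMONSTRATIVES for t in tokens),
        "window_size": longest_shared_window(tokens, tokenize(paper)),
    }
```

A feature dictionary like this would then be handed to whatever classifier predicts localization; the choice of classifier is outside this sketch.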
Paper Review Localization Model [Xiong, Litman & Schunn, 2010]
Localization in Diagram Reviews
Although the text is minimal, what is written is fairly clear.
Study 17 doesn’t have a connection to anything, which makes it unclear about it’s purpose.
• Pattern-based detection algorithm
– Numbered ontology type, e.g. citation 15
– Textual component content, e.g. time of day hypothesis
– Unique component, e.g. the con-argument
– Connected component, e.g. support of 2nd hypothesis
– Numerical regular expression, e.g. H1, #10
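A rough sketch of three of the pattern types listed above (numbered ontology type, numerical regular expression, and textual component content). The node-type names in `NODE_TYPES`, the two regular expressions, and the function `is_localized` are assumptions made for illustration; the unique-component and connected-component patterns are not covered.

```python
import re

# Hypothetical node types for an argument diagram; the slides mention
# citations, studies and hypotheses.
NODE_TYPES = r"(?:citation|study|hypothesis|claim)"
# Numbered ontology type, e.g. "citation 15" or "Study 17".
NUMBERED_TYPE = re.compile(rf"\b{NODE_TYPES}\s+#?\d+\b", re.IGNORECASE)
# Numerical references, e.g. "H1" or "#10".
NUMERIC_REF = re.compile(r"(?:\bH\d+\b|#\d+\b)")


def is_localized(comment, component_texts=()):
    """Return True if the comment points at a specific diagram component.

    component_texts holds the text written inside diagram nodes,
    e.g. "time of day hypothesis".
    """
    if NUMBERED_TYPE.search(comment) or NUMERIC_REF.search(comment):
        return True
    lowered = comment.lower()
    return any(text.lower() in lowered for text in component_texts)
```

On the two example comments from this slide, the sketch flags the "Study 17" comment as localized and the "fairly clear" comment as not localized.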
Learned Localization Model
[Figure: decision tree predicting “Localized?” (yes / no). Split tests shown: Pattern Algorithm = yes / no; #domainWord > 2 / ≤ 2 and > 0 / ≤ 0; windowSize > 16 / ≤ 16 and > 12 / ≤ 12.]
Localization Scaffolding
[Figure labels: Localization model applied; System scaffolds (if needed); Reviewer makes decision]
A First Classroom Evaluation [Nguyen, Xiong & Litman, 2014]
• Computational linguistics extracts attributes in real time
• Prediction models use attributes to detect localization
• Scaffolding if < 50% of comments predicted as localized
• Deployment in an undergraduate Research Methods course
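The trigger rule in the third bullet can be sketched as below. The function name `needs_scaffolding` and the handling of an empty review (no scaffolding) are assumptions; the 50% threshold is the one stated on the slide.

```python
def needs_scaffolding(predictions, threshold=0.5):
    """Decide whether a review should receive localization scaffolding.

    predictions: one boolean per comment, True if the model predicted
    the comment to be localized.
    """
    if not predictions:
        # Assumption: a review with no comments is not scaffolded.
        return False
    localized_share = sum(predictions) / len(predictions)
    return localized_share < threshold
```

Because the comparison is strict, a review with exactly half of its comments predicted as localized is not scaffolded.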
Results: Can we Automate?
• Comment Level
                     Diagram review                   Paper review
                     Accuracy                Kappa    Accuracy             Kappa
Majority baseline    61.5% (not localized)   0        50.8% (localized)    0
Our models           81.7%                   0.62     72.8%                0.46
• Review Level
                         Diagram review   Paper review
Total scaffoldings       173              51
Incorrectly triggered    1                0
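For readers unfamiliar with the Kappa column: Cohen's kappa corrects observed agreement for the agreement expected by chance, which is why a majority-class baseline scores 0 regardless of its accuracy. A small sketch with toy labels (not the study data); the function name `cohen_kappa` and the choice to return 0.0 when chance agreement is already 1 are assumptions.

```python
from collections import Counter


def cohen_kappa(gold, predicted):
    """Cohen's kappa between gold labels and predicted labels."""
    n = len(gold)
    observed = sum(g == p for g, p in zip(gold, predicted)) / n
    gold_counts = Counter(gold)
    pred_counts = Counter(predicted)
    # Chance agreement: sum over labels of P(gold = label) * P(pred = label).
    expected = sum(gold_counts[label] * pred_counts[label] for label in gold_counts) / (n * n)
    if expected == 1.0:
        # Assumption: kappa is undefined here, so report 0.0.
        return 0.0
    return (observed - expected) / (1 - expected)


# Toy data: 6 non-localized and 4 localized comments.
gold = ["no"] * 6 + ["yes"] * 4
majority = ["no"] * 10  # always predict the majority class
print(cohen_kappa(gold, majority))
```

The majority predictor above is 60% accurate on the toy labels, yet its chance agreement is also 60%, so nothing remains after the correction.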
Results: New Educational Technology
• Response to Scaffolding
Reviewer response    REVISE      DISAGREE
Diagram review       54 (48%)    59 (52%)
Paper review         13 (30%)    30 (70%)
• Why are reviewers disagreeing?
• No correlation with true localization ratio (diagrams)
A Deeper Look: Revision Performance
# and % of comments (diagram reviews)
NOT Localized → Localized          26    30.2%
Localized → Localized              26    30.2%
NOT Localized → NOT Localized      33    38.4%
Localized → NOT Localized           1     1.2%
• Comment localization almost always improves or stays the same after scaffolding (1 exception in 86 comments)
• Open questions
– Are reviewers improving localization quality?
– Interface issues, or rubric non-applicability?
Other Results: Non-Scaffolded Revision
Number (pct.) of comments of diagram reviews
                        Scope=In       Scope=Out      Scope=No
NOT Loc. → Loc.         26   30.2%     7   87.5%      3   12.5%
Loc. → Loc.             26   30.2%     1   12.5%      16  66.7%
NOT Loc. → NOT Loc.     33   38.4%     0   0%         5   20.8%
Loc. → NOT Loc.         1    1.2%      0   0%         0   0%
• Localization continues after scaffolding is removed
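The percentages in the table are column-wise shares: each count divided by the total for its Scope column. A short check using the counts from the slide; the names `COUNTS` and `column_percentages` are illustrative.

```python
# Counts per Scope column, in row order:
# NOT Loc. -> Loc., Loc. -> Loc., NOT Loc. -> NOT Loc., Loc. -> NOT Loc.
COUNTS = {
    "In": [26, 26, 33, 1],
    "Out": [7, 1, 0, 0],
    "No": [3, 16, 5, 0],
}


def column_percentages(counts):
    """Share of each transition within one Scope column, in percent."""
    total = sum(counts)
    return [round(100 * c / total, 1) for c in counts]


for scope, counts in COUNTS.items():
    print(scope, sum(counts), column_percentages(counts))
```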
Outline
• SWoRD (Computer-Supported Peer Review)
• Supporting Students with Review Scaffolding
• Keeping Teachers Well-informed
• Summary and Current Directions
Observation: Teachers rarely read peer reviews
• Challenges faced by teachers
– Reading all reviews (scalability issues)
– Simultaneously remembering reviews across students to compare and contrast (cognitive overload)
– Knowing where to start (cold start)
Solution: RevExplore
• SWoRD
• RevExplore: An Interactive Analytic Tool for Peer-Review Exploration for Teachers [Xiong, Litman, Wang & Schunn, 2012]
[Figure label: Peer-review content]
RevExplore Example
Writing assignment:
“Whether the United States became more democratic, stayed the same, or became less democratic between 1865 and 1924.”
Reviewing dimensions:
– Flow, logic, insight
• Goal
– Discover differences between student groups in writing issues
• User study (extrinsic evaluation)
– 1405 free-text reviews of 24 history papers
– 46 recruited subjects
• Research questions
– Are topic words useful for peer-review analytics?
– Does the topic-word extraction method matter?
– Do results interact with analytic goal, grading rubric, and user demographics?
Topic Signatures in RevExplore
• Domain word masking via topic signatures [Lin & Hovy, 2000; Nenkova & Louis, 2008]
– Target corpus: Student papers
– Background corpus: English Gigaword
– Topic words: Words likely to be in target corpus (chi-square)
• Comparison-oriented topic signatures
– User reviews are divided into groups
  • High versus low writers (SWoRD paper ratings)
  • High versus low reviewers (SWoRD helpfulness ratings)
– Target corpus: Reviews of user group
– Background corpus: Reviews of all users
Comparing Student Reviewers
Method Reviews by helpful students Reviews by less helpful studentsTopic Signatures Arguments, immigrants, paper,
Frequency Paper, arguments, evidence, make, also, could, argument paragraph
Page, think, argument, essay
Experimental Results
• Topic words are effective for peer-review analytics
– Objective metrics (e.g. correct identification of high vs. low student groups)
– Subjective ratings (e.g. “how often did you refer to the original reviews?”)
• Topic signature method outperforms frequency
• Interactions with:
– Analytic goal (i.e. reviewing vs. writing groupings)
– Reviewing dimensions (i.e. grading rubric)
– User demographics (e.g. prior teaching experience)
Outline
• SWoRD (Computer-Supported Peer Review)
• Supporting Students with Review Scaffolding
• Keeping Teachers Well-informed
• Summary and Current Directions
Summary
Computational linguistics for peer review to improve both student reviewing and writing
• Scaffolding useful feedback properties
– reviews are often not stated in effective ways
• Incorporation of argument diagramming
– reviews and papers do not focus on core aspects
• Topic-word analytics for teachers
– teacher information overload
• Deployments in university and high school classes
Current Directions
• Additional measures of review quality
– Solutions to problems [Nguyen & Litman, 2014]
– Argumentation [Falakmasir et al., 2014; Ong et al., 2014]
– Impact on paper revision [Zhang & Litman, 2014]
• New scaffolding interventions
• Teacher dashboard
– Review and paper revision quality
– Topic-word analytics
– Helpfulness guided review summarization
• Talk at 2pm at Oxford tomorrow [Xiong & Litman, submitted]
Thank You!
• Questions?
• Further Information
– http://www.cs.pitt.edu/~litman
– http://www.pantherlearning.com