Natural Language Processing to Improve Student Engagement featuring Dr. Rebecca Passonneau
Transcript
Page 1: Natural Language Processing to Improve Student Engagement featuring Dr. Rebecca Passonneau

Natural Language Processing to Improve Student Engagement

Becky Passonneau

November 30, 2017

Page 2: Natural Language Processing to Improve Student Engagement featuring Dr. Rebecca Passonneau

Collaborators

Pyramid content evaluation: Ani Nenkova (Columbia, U Penn; 2004)

Automated scoring by unigram overlap: Ani Nenkova, Aaron Harnly, Owen Rambow (Columbia; 2005)

Automated scoring by distributional semantics: Emily Chen, Ananya Poddar, Gaurav Gite (Columbia; 2013-2016)

Comparison to educational rubric (main ideas): Dolores Perin (Teachers College; 2013-2016)

Automated pyramid and scoring by triple extraction and similarity graphs based on WordNet: Qian Yang (Tsinghua; PSU; 2016), Alisa Krivokapic (Columbia; 2016)

Automated pyramid and scoring by parsing, distributional semantics, and novel bin packing algorithm: Yanjun Gao (Penn State; 2017)

Page 3: Natural Language Processing to Improve Student Engagement featuring Dr. Rebecca Passonneau

Asking students to summarize the main ideas of a text helps their reading and writing

Page 4: Natural Language Processing to Improve Student Engagement featuring Dr. Rebecca Passonneau

Psychologists have posited three cognitive processes involved in summarization:

● selection of important ideas
● generalization to omit detail
● inference of implicit connections

Page 5: Natural Language Processing to Improve Student Engagement featuring Dr. Rebecca Passonneau

Summaries that are equally good will have some ideas in common, and some differences

Very much like a Venn diagram

Page 6: Natural Language Processing to Improve Student Engagement featuring Dr. Rebecca Passonneau

Summaries that are equally good will have some ideas in common, and some content differences

[Venn diagram: Ideas 1-12 distributed across overlapping summaries, some shared and some unique]

Very much like a Venn diagram

Page 7: Natural Language Processing to Improve Student Engagement featuring Dr. Rebecca Passonneau

Designing a reliable rubric to measure how many important ideas each summary contains is labor intensive and potentially subjective

Page 8: Natural Language Processing to Improve Student Engagement featuring Dr. Rebecca Passonneau

Summaries are concise

● Each idea is expressed once
○ Selection of important ideas
○ Omission of unnecessary detail
● Content evaluation task has two steps
○ Define a standard from expert summaries -- the distinct ideas weighted by importance
○ Compare the summaries to the standard -- quantify the proportion of important ideas

Page 9: Natural Language Processing to Improve Student Engagement featuring Dr. Rebecca Passonneau

Pyramid summary content annotation builds a content model of distinct ideas from summaries written by a wise crowd (size N)

[Venn diagram: Content Units (CUs) 1-12 distributed across overlapping summaries]

What is pyramid content analysis?

Importance of ideas (content units, or CUs)

● Emerges from the wise crowd
● Distinguishes quality of ideas by quantity of occurrence
● Simple but effective

Page 10: Natural Language Processing to Improve Student Engagement featuring Dr. Rebecca Passonneau

Pyramid summary content annotation builds a content model of distinct ideas from N reference summaries written by a wise crowd (size N)

[Venn diagram: CUs 1-12 distributed across overlapping summaries]

A list of all the distinct ideas or Content Units (CUs), and their weights, i.e., how many summaries each occurs in

TEXT: WHAT IS MATTER

CU 1: Matter is classified by physical and chemical properties (W=3)
CU 3: All matter has energy (W=2)
. . .
CU 12: Matter can be a solid, liquid or gas (W=1)

What is pyramid content analysis?

Page 11: Natural Language Processing to Improve Student Engagement featuring Dr. Rebecca Passonneau

[Pyramid diagram: five layers, W=5 at the top down to W=1 at the bottom]

A Pyramid of CUs from a wise crowd of 5

What is wise crowd content analysis?

Page 12: Natural Language Processing to Improve Student Engagement featuring Dr. Rebecca Passonneau

Application of Pyramid Content Model

● In a new summary, find all the phrases that mention a model CU
● Sum the weights of the mentioned CUs
● Normalize the sum

Example: a new summary mentions three CUs with weights 5, 4, and 2; raw sum = 5 + 4 + 2 = 11

What is wise crowd content analysis?

Page 13: Natural Language Processing to Improve Student Engagement featuring Dr. Rebecca Passonneau

Normalization

● A summary can express each CU once at most
● Sum the weights of the identified CUs
● Normalize the sum in one of two ways (sketched in code below):
○ QUALITY: The maximum sum of weights for the same number of CUs
Did the summary mention mostly important ideas?
○ COVERAGE: The maximum sum of weights for the average number of CUs in the reference summaries
Did the summary mention most of the important ideas?

What is wise crowd content analysis?
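To make the two normalizations concrete, here is a minimal Python sketch. The function name, data layout, and the eight-CU pyramid in the example are illustrative assumptions, not the actual scoring code:

```python
def pyramid_scores(matched_weights, pyramid_weights, avg_ref_cu_count):
    """Normalize a raw pyramid score the two ways described above.

    matched_weights: weights of the CUs the summary expressed (each CU at most once)
    pyramid_weights: weights of every CU in the content model
    avg_ref_cu_count: average number of CUs per reference summary (an int)
    """
    raw = sum(matched_weights)
    ranked = sorted(pyramid_weights, reverse=True)
    # QUALITY: divide by the best possible sum for the same number of CUs
    max_same_n = sum(ranked[:len(matched_weights)])
    # COVERAGE: divide by the best possible sum for the average reference size
    max_avg_n = sum(ranked[:avg_ref_cu_count])
    quality = raw / max_same_n if max_same_n else 0.0
    coverage = raw / max_avg_n if max_avg_n else 0.0
    return quality, coverage

# The slide's example: matched CUs of weight 5, 4, and 2 give a raw sum of 11.
# The pyramid here is made up for illustration.
print(pyramid_scores([5, 4, 2], [5, 4, 4, 3, 2, 2, 1, 1], avg_ref_cu_count=4))
```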

Page 14: Natural Language Processing to Improve Student Engagement featuring Dr. Rebecca Passonneau

● 9 reference summaries
● All content models with m summaries, for m ∈ [1,9]
● All pairs of summaries A, B where score(A) > score(B) using 9 reference summaries
● Result
○ The variance around the scores for A and B diverges given 4 to 5 references
● Conclusion
○ No misranking with 5 references

How reliable is wise crowd content analysis? Can misranking errors occur?

Page 15: Natural Language Processing to Improve Student Engagement featuring Dr. Rebecca Passonneau

Five additional reliability tests

● Number of reference summaries for probability of misranking to be ≤ 0.1: 5
● Number of reference summaries for scores to correlate with gold standard scores: 5
● Interannotator agreement on content model (10 different pairs of models): 0.71 to 0.89
● Interannotator agreement on application of content model to new summaries (5 models): 0.77 to 0.81
● Correlation of scores of 16 systems using different content models: 0.71 to 0.96

How reliable is wise crowd content analysis?

Page 16: Natural Language Processing to Improve Student Engagement featuring Dr. Rebecca Passonneau

Key differences between manual and automated methods:

Humans
● Consider a few alternative segmentations
● Sameness of meaning is a binary (yes-no) judgement

Automated methods
● Consider many possible segmentations
○ Simpler decisions
○ Many more of them
● Metric for similarity of meaning is graded from 0 to 1
● Must select the optimal segmentations and meaning similarities

How did we automate wise crowd content analysis?

Page 17: Natural Language Processing to Improve Student Engagement featuring Dr. Rebecca Passonneau

Human segmentation into "ideas" and similarity

Sentence: Matter can be measured because it contains volume and mass

CU106: Matter has volume and mass (W=4)
Ref Sum 1: because it contains both volume and mass
Ref Sum 2: it takes up space defined as volume and contains . . . mass
Ref Sum 3: Matter is anything that has mass and takes up space (volume)
Ref Sum 4: Matter contains volume and mass

How did we automate wise crowd content analysis?

Page 18: Natural Language Processing to Improve Student Engagement featuring Dr. Rebecca Passonneau

Three Automated Methods

● No large scale machine learning required
● All components are pre-trained
● Requires only 5 wise-crowd summaries on the same summarization task

How did we automate pyramid content analysis?

Page 19: Natural Language Processing to Improve Student Engagement featuring Dr. Rebecca Passonneau

Three Automated Methods

● PyrScore: Requires existing manual content model
○ Brute force segmentation -- considers all possibilities
○ Distributional (statistical) semantics
● PEAK:
○ Open Information Extraction tools extract subj-pred-obj triples
○ Symbolic semantics (WordNet)
● PyrEval:
○ Sentence decomposition into clauses
○ Distributional (statistical) semantics

How did we automate pyramid content analysis?

Page 20: Natural Language Processing to Improve Student Engagement featuring Dr. Rebecca Passonneau

PyrScore Segmentation: Brute Force

● Calculates all ngram segmentations of each sentence in a new summary (see the sketch below)

All | matter | has | energy | volume | and | mass -- 7 unigrams
All matter | has | energy | volume | and | mass -- 5 unigrams + 1 bigram
All | matter has | energy | volume | and | mass -- 5 unigrams + 1 bigram
. . .
All matter has | energy | volume | and | mass -- 4 unigrams + 1 trigram
. . .
All matter has energy volume and mass -- 1 7-gram

How did we automate wise crowd content analysis?
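The enumeration illustrated above can be written compactly. This is a toy sketch of brute-force segmentation, not PyrScore's implementation:

```python
def segmentations(tokens):
    """Enumerate every way to split a token list into contiguous n-gram segments.

    A sentence of n tokens has 2**(n-1) segmentations: one binary choice
    (split or don't split) at each of the n-1 gaps between tokens.
    """
    if len(tokens) <= 1:
        yield [tokens]
        return
    head, rest = tokens[0], tokens[1:]
    for seg in segmentations(rest):
        # either start a new segment after the first token ...
        yield [[head]] + seg
        # ... or extend the first segment of the remainder
        yield [[head] + seg[0]] + seg[1:]

sent = "All matter has energy volume and mass".split()
print(sum(1 for _ in segmentations(sent)))  # 64 = 2**6 segmentations
```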

Page 21: Natural Language Processing to Improve Student Engagement featuring Dr. Rebecca Passonneau

PyrScore Semantics

● Generates a latent vector representation of each phrase in a CU

CU106: Matter has volume and mass (W=4)
because it contains both volume and mass
it takes up space defined as volume and contains a certain amount of material defined as mass
Matter is anything that has mass and takes up space (volume)
Matter contains volume and mass

● Latent semantics:
○ Weighted Text Matrix Factorization (WTMF; Guo and Diab, 2012)
○ Assigns small weight to unseen words
○ Word vectors trained offline

How did we automate wise crowd content analysis?

Page 22: Natural Language Processing to Improve Student Engagement featuring Dr. Rebecca Passonneau

PyrScore Scoring

● Generates a WTMF vector representation of each CU phrase
● Generates a WTMF vector representation of each segment in a new summary
● Similarity to a CU is the average cosine similarity to all phrases in the CU (see the sketch below)
● Optimal assignment of candidate ngrams to CUs
○ A maximum weighted independent set problem
○ Applies a greedy algorithm (WMIN; Sakai et al., 2003)

How did we automate wise crowd content analysis?
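A sketch of the similarity computation, assuming phrase and segment vectors (e.g., from WTMF) are already available; the names and example vectors are illustrative:

```python
import numpy as np

def cu_similarity(segment_vec, cu_phrase_vecs):
    """Similarity of a candidate segment to a CU: the average cosine
    similarity between the segment vector and each CU phrase vector."""
    def cosine(a, b):
        return float(a @ b) / (np.linalg.norm(a) * np.linalg.norm(b))
    return sum(cosine(segment_vec, v) for v in cu_phrase_vecs) / len(cu_phrase_vecs)

# Toy usage with made-up 3-d vectors standing in for real embeddings
seg = np.array([0.1, 0.9, 0.2])
phrases = [np.array([0.2, 0.8, 0.1]), np.array([0.0, 1.0, 0.3])]
print(cu_similarity(seg, phrases))
```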

Page 23: Natural Language Processing to Improve Student Engagement featuring Dr. Rebecca Passonneau

PEAK (Pyramid Evaluation by Automated Knowledge Extraction)

● Segmentation: Applies Open Information Extraction tools to extract Subj-Pred-Obj (SPO) triples from sentences

Matter can be detected and measured because it contains volume and mass
Subj(Matter) Pred(detected and measured) Obj(because it contains volume and mass)
Subj(Matter) Pred(contains) Obj(volume and mass)
. . .

● Semantics: Uses explicit representation of meaning (random walks over WordNet)

How did we automate wise crowd content analysis?

Page 24: Natural Language Processing to Improve Student Engagement featuring Dr. Rebecca Passonneau

PEAK Aligns SPO Triples

● From different reference summaries to construct the model
● Uses a hypergraph
○ Triples are hyperedges of SPO nodes
○ Edges between nodes are semantic similarity
● Each CU is a weighted triple

How did we automate wise crowd content analysis?

Page 25: Natural Language Processing to Improve Student Engagement featuring Dr. Rebecca Passonneau

PEAK Aligns SPO Triples

● Each CU is a weighted triple
● New summary is a list of triples
● Edges in a bipartite graph added from CUs to SPOs if semantic similarity ≥ 0.50
● Uses the Munkres-Kuhn algorithm with CU weights as edge costs (see the sketch below)

How did we automate wise crowd content analysis?
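The matching step can be approximated with SciPy's Hungarian-algorithm implementation. This sketch assumes a precomputed similarity matrix; PEAK's exact cost construction may differ:

```python
import numpy as np
from scipy.optimize import linear_sum_assignment

def match_cus_to_triples(similarity, cu_weights, threshold=0.50):
    """One-to-one assignment of summary SPO triples to model CUs.

    similarity: (n_cus, n_triples) matrix of semantic similarities
    cu_weights: weight of each CU; matched CU weights sum to the raw score
    Pairs below the threshold are zeroed out, mirroring the >= 0.50 cutoff.
    """
    sim = np.asarray(similarity, dtype=float)
    w = np.asarray(cu_weights, dtype=float)
    # Maximize total weighted similarity by minimizing its negation
    profit = sim * w[:, None]
    profit[sim < threshold] = 0.0
    rows, cols = linear_sum_assignment(-profit)
    matched = [(r, c) for r, c in zip(rows, cols) if sim[r, c] >= threshold]
    raw_score = sum(w[r] for r, _ in matched)
    return matched, raw_score
```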

Page 26: Natural Language Processing to Improve Student Engagement featuring Dr. Rebecca Passonneau

PyrEval extends PyrScore

● Builds full pyramid using new weighted independent set algorithm
● Decomposes sentences into syntactically meaningful units (roughly clauses)
● Uses the same distributional semantics
○ WTMF performs better than Word2Vec
○ WTMF performs better than GloVe
● Uses the same scoring algorithm

Page 27: Natural Language Processing to Improve Student Engagement featuring Dr. Rebecca Passonneau

PyrEval constructs a pyramid by a novel set allocation method

● Nested sets
○ Every sentence has a set of segmentations, only one of which can be selected
○ Every CU is a set of segments, each from a different summary
○ Every pyramid layer is a set of CUs

Page 28: Natural Language Processing to Improve Student Engagement featuring Dr. Rebecca Passonneau

EDUA: Emergent discovery of units of attraction

● Constructs a graph (see the sketch below)
○ Nodes are segments
○ Edges weighted by force of "attraction" (e.g., semantic similarity)
● Edge types
○ Dashed edges: attraction(nᵢ, nⱼ) > α
○ Solid edges: connect segments into CUs
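A toy sketch of the graph-construction step, with an assumed attraction function (e.g., cosine similarity over WTMF vectors); this illustrates the candidate ("dashed") edges, not EDUA's full algorithm:

```python
import itertools

def attraction_graph(segment_vecs, alpha, attraction):
    """Build the candidate-edge graph: nodes are segments, and a pair is
    connected when its attraction exceeds the threshold alpha (the slide's α)."""
    edges = {}
    for i, j in itertools.combinations(range(len(segment_vecs)), 2):
        a = attraction(segment_vecs[i], segment_vecs[j])
        if a > alpha:
            edges[(i, j)] = a  # edge weight is the attraction itself
    return edges
```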

Page 29: Natural Language Processing to Improve Student Engagement featuring Dr. Rebecca Passonneau

Assignment of segments to a CU obeys constraints

● Maximize the average weighted similarity within each pyramid layer
● Capacity of each layer y given segments x
● Relative size of each layer
● No empty layers
● One segmentation per sentence; at most one CU per segment

Page 30: Natural Language Processing to Improve Student Engagement featuring Dr. Rebecca Passonneau

PyrEval and humans construct similar pyramids

● CUs: 69 (PyrEval) versus 60 (Annotator 1) or 46 (Annotator 2)
● Similar distribution
○ PyrEval: 1 w5, 2 w4, 7 w3, 22 w2, 37 w1
○ Annotator 1: 3 w5, 7 w4, 13 w3, 15 w2, 22 w1
● Example, same weight
○ PyrEval (w5): Physical props can occur without changing the identity or nature of the matter
○ Annotator 1 (w5): Physical props can be observed without changing the identity of the matter
● Example, different weight
○ PyrEval (w4): Unlike physical change, chemical change occurs when the chemical properties of the matter have changed and a new substance is produced
○ Manual (w3): The difference between a physical change and a chemical change is that a chemical change creates a new substance

Page 31: Natural Language Processing to Improve Student Engagement featuring Dr. Rebecca Passonneau

A Rubric for Contextualized Curricular Support

● From a study of 16 community college classrooms
● 120 students wrote summaries of a middle school text, What is Matter?
○ Read the passage
○ Answered main ideas questions
○ Wrote the summary
● Researchers identified 14 main ideas
● Main ideas score of a summary: % of main ideas
○ Included partial credit
○ Interrater reliability: Pearson correlation: 0.92

What assessment rubric did we compare it to?

Page 32: Natural Language Processing to Improve Student Engagement featuring Dr. Rebecca Passonneau

Pearson correlations of automated and manual methods

● PyrScore: 0.95
● PEAK: 0.82
● PyrEval: 0.87

What were the results?

Page 33: Natural Language Processing to Improve Student Engagement featuring Dr. Rebecca Passonneau

Pearson correlations of 120 Main Ideas scores and automated methods

Correlation with manual Main Ideas scores (120 summaries):
● PyrScore: 0.83
● PEAK: 0.70

What were the results?

Page 34: Natural Language Processing to Improve Student Engagement featuring Dr. Rebecca Passonneau

Content scores are transparent, can support feedback

● Does the summary have enough important ideas, given its length? (Quality score)
● Does the summary have enough important ideas, given the set of possible important ideas? (Coverage score)
● Does the summary have a good balance of both? (Comprehensive score)
● Which important ideas were expressed?
● Which important ideas were missed?

Page 35: Natural Language Processing to Improve Student Engagement featuring Dr. Rebecca Passonneau

Conclusion

● Wise Crowd Content Analysis
○ Works well to identify important ideas
○ Importance emerges from the wise crowd
○ Correlates with an independently developed main ideas rubric
○ Requires only 5 reference summaries
● Fully automated methods: PyrEval and PEAK
○ Pretrained methods, and parameter tuning on small development set
○ Perform less well if sentences are very complex (e.g., automatic summarizers on newswire)
○ Potential to inform revision

Page 36: Natural Language Processing to Improve Student Engagement featuring Dr. Rebecca Passonneau

What's Next? Content assessment of essays

● Same ideas are referenced multiple times in the same essay, through multiple means
○ Paraphrase, definite descriptions ("the evidence shown here"), deictic pronouns ("This indicates . . .")
○ Will require more complex methods to detect "the same" idea
● Discourse structure and function
○ Interrelations among ideas within the text
○ Discursive versus argumentative