
Predicting Text Quality for Scientific Articles Annie Louis University of Pennsylvania Advisor: Ani Nenkova.

Dec 19, 2015

Page 1

Predicting Text Quality for Scientific Articles

Annie Louis, University of Pennsylvania

Advisor: Ani Nenkova

Page 2

Text quality: the quality of the content and of the writing in a text, i.e. how well-written it is

Knowing text quality is useful in many settings

Example: search returns lots of relevant results, which could be further ranked by content and writing quality

Page 3

Problem definition: “Define text quality factors that are

1. generic (applicable to most texts) and
2. domain-specific (unique to writing about science),

and develop automatic methods to quantify them.”

Two types of science writing:

1. Conference and journal publications
2. Science journalism

Page 4

Application settings

Generic text quality:
1. Evaluation of system-generated summaries

Domain-specific text quality:
2. Writing feedback
3. Science news recommendation

Page 5

Previous work in text quality prediction

Focus on generic indicators of text quality: word familiarity, sentence length, syntax, discourse

1. Machine-produced text
• Summarization, machine translation

2. Human-written text
• Predicting the grade level of an article
• Automatic essay scoring

Page 6

Thesis contributions

1. Make a distinction between generic and domain-specific text quality aspects

2. Define new domain-specific aspects in the genre of writing about science

3. Demonstrate the use of these measures in representative applications

Page 7

Overview

1. Generic text quality factors and summary evaluation

2. Predicting quality for science articles and applications

Page 8

I. Generic text quality: applied to automatic summary evaluation

Page 9

Automatic summary evaluation facilitates system development

Lots of summaries with human ratings are available from large-scale summarization evaluations

Goal: find automatic metrics that correlate with human judgements of quality

1. Content quality: what is said in the summary?

2. Linguistic quality: how is it conveyed?

Page 10

1. Content evaluation of summaries [Louis, Nenkova, 2009]

Input-summary similarity ~ summary content quality

Best-performing similarity measure: Jensen-Shannon divergence (JSD)

JSD: how much two probability distributions differ

Word distributions: ‘input’ I, ‘summary’ S


$$JS(I \| S) = \frac{1}{2} KL(I \| A) + \frac{1}{2} KL(S \| A), \qquad A = \frac{I + S}{2}$$

where KL is the Kullback-Leibler divergence
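The JS divergence between input and summary word distributions can be sketched in a few lines. This is a minimal illustration, assuming lowercased whitespace tokenization, base-2 logarithms and no smoothing or stemming; the helper names `word_distribution`, `kl_divergence` and `js_divergence` are hypothetical, and the original method may preprocess text differently.

```python
import math
from collections import Counter


def word_distribution(text):
    """Relative frequency of each lowercased, whitespace-separated token."""
    counts = Counter(text.lower().split())
    total = sum(counts.values())
    return {w: c / total for w, c in counts.items()}


def kl_divergence(p, q):
    """KL(p || q) in bits; requires q[w] > 0 wherever p[w] > 0."""
    return sum(pw * math.log2(pw / q[w]) for w, pw in p.items() if pw > 0)


def js_divergence(i_dist, s_dist):
    """JS(I || S) = 1/2 KL(I || A) + 1/2 KL(S || A), with A = (I + S) / 2."""
    vocab = set(i_dist) | set(s_dist)
    i_full = {w: i_dist.get(w, 0.0) for w in vocab}
    s_full = {w: s_dist.get(w, 0.0) for w in vocab}
    # A is nonzero wherever I or S is, so both KL terms are well defined.
    a = {w: (i_full[w] + s_full[w]) / 2 for w in vocab}
    return 0.5 * kl_divergence(i_full, a) + 0.5 * kl_divergence(s_full, a)


input_text = "the storm hit the coast and the storm caused damage"
summary = "the storm caused damage"
print(js_divergence(word_distribution(input_text), word_distribution(summary)))
```

With base-2 logs the divergence lies between 0 (identical distributions) and 1 (no shared words); a lower value means the summary is closer to the input and, under the hypothesis above, has better content.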

Page 11

Performance of the automatic content evaluation method

When systems are ranked by JS divergence scores, the ranking correlates highly with the human-assigned ranking: 0.88

Among the best methods for evaluating news summaries
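The slide reports a correlation of 0.88 between the two system rankings but does not name the coefficient. As an assumption, the sketch below uses Spearman's rank correlation (no ties) to show how such a comparison works; the scores are toy numbers, not results from the thesis.

```python
def ranks(scores):
    """Rank positions, 1 = highest score. Assumes no tied scores."""
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    result = [0] * len(scores)
    for position, i in enumerate(order, start=1):
        result[i] = position
    return result


def spearman_rho(x, y):
    """Spearman's rank correlation via 1 - 6 * sum(d^2) / (n (n^2 - 1))."""
    n = len(x)
    d2 = sum((a - b) ** 2 for a, b in zip(ranks(x), ranks(y)))
    return 1 - 6 * d2 / (n * (n * n - 1))


# Toy numbers: lower JS divergence = better content,
# so systems are ranked by the negated divergence.
js_scores = [0.30, 0.25, 0.40, 0.20]
human_scores = [3.1, 3.4, 2.5, 3.9]
print(spearman_rho([-s for s in js_scores], human_scores))  # 1.0
```

Negating the divergence matters: without it, a perfect metric would appear perfectly anti-correlated with human judgements.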

Page 12

2. Linguistic quality evaluation for summaries [Pitler, Louis, Nenkova, 2010]

Consider numerous aspects, e.g. syntax, referring expressions, discourse connectives

1. Language models: familiarity of words. A huge table of words and their probabilities in a large corpus of general text; these probabilities are used to predict the familiarity of new texts
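A word-familiarity score of this kind can be sketched as the average log probability of a text's words under a unigram model. This is a minimal sketch, assuming a unigram model with add-one smoothing and a tiny stand-in corpus; `train_unigram` and `familiarity` are hypothetical names, and the thesis may use a different model order or smoothing.

```python
import math
from collections import Counter


def train_unigram(corpus_tokens):
    """Word counts from a (here tiny) corpus of general text."""
    counts = Counter(corpus_tokens)
    return counts, sum(counts.values())


def familiarity(tokens, counts, total):
    """Average log2 probability per token under an add-one smoothed unigram model.

    Higher (less negative) values mean the words are more familiar.
    """
    vocab_size = len(counts) + 1  # +1 reserves probability mass for unseen words
    log_prob = sum(math.log2((counts[w] + 1) / (total + vocab_size)) for w in tokens)
    return log_prob / len(tokens)


counts, total = train_unigram("the cat sat on the mat the dog sat".split())
print(familiarity(["the", "cat", "sat"], counts, total))
print(familiarity(["quixotic", "zephyr"], counts, total))
```

Averaging per token keeps scores comparable across texts of different lengths.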

2. Syntax: sentence complexity, measured by parse tree depth and length of phrases
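Parse tree depth can be read directly off a bracketed parse. The sketch below covers only the depth feature and assumes the parse comes from an external parser in Penn Treebank bracket format with literal parentheses escaped; `tree_depth` is a hypothetical helper, and part-of-speech nodes count toward the depth.

```python
def tree_depth(bracketed):
    """Maximum nesting depth of a Penn Treebank-style bracketed parse.

    Assumes literal parentheses in the text are escaped (e.g. -LRB-/-RRB-),
    so every "(" opens a constituent.
    """
    depth = max_depth = 0
    for ch in bracketed:
        if ch == "(":
            depth += 1
            max_depth = max(max_depth, depth)
        elif ch == ")":
            depth -= 1
    return max_depth


print(tree_depth("(S (NP (DT The) (NN cat)) (VP (VBD sat)))"))  # 3
```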

Page 13

3. Word coherence: flow between sentences

Learn conditional probabilities P(w2 | w1) from a large corpus, where w1 and w2 occur in consecutive sentences

Use them to compute the likelihood of a new sentence sequence
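The word-coherence idea can be sketched as a cross-sentence word-pair model: count pairs (w1, w2) with w1 in one sentence and w2 in the next, then score a new sequence by summing log P(w2 | w1). This is a minimal sketch, assuming add-one smoothing, a caller-supplied `vocab_size`, and toy training data; all function names are hypothetical.

```python
import math
from collections import Counter
from itertools import product


def train_cross_sentence(documents):
    """Count word pairs (w1, w2) with w1 in a sentence and w2 in the next one.

    documents: list of documents, each a list of tokenized sentences.
    """
    pair_counts, first_counts = Counter(), Counter()
    for sentences in documents:
        for prev, nxt in zip(sentences, sentences[1:]):
            for w1, w2 in product(set(prev), set(nxt)):
                pair_counts[(w1, w2)] += 1
                first_counts[w1] += 1  # normalizer for P(w2 | w1)
    return pair_counts, first_counts


def sequence_log_likelihood(sentences, pair_counts, first_counts, vocab_size):
    """Sum of log2 P(w2 | w1) over all cross-sentence pairs, add-one smoothed."""
    total = 0.0
    for prev, nxt in zip(sentences, sentences[1:]):
        for w1, w2 in product(set(prev), set(nxt)):
            p = (pair_counts[(w1, w2)] + 1) / (first_counts[w1] + vocab_size)
            total += math.log2(p)
    return total


docs = [[["storm", "hit"], ["damage", "was", "severe"]],
        [["storm", "came"], ["damage", "followed"]]]
pairs, firsts = train_cross_sentence(docs)
ordered = [["storm", "struck"], ["damage", "spread"]]
swapped = [["damage", "spread"], ["storm", "struck"]]
print(sequence_log_likelihood(ordered, pairs, firsts, vocab_size=10))
print(sequence_log_likelihood(swapped, pairs, firsts, vocab_size=10))
```

The order that matches the training data ("storm" before "damage") receives the higher likelihood. Because the score is a sum, it is best compared across orderings of the same sentences rather than across texts of different lengths.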

Performance of the evaluation method: 80% accurate for ranking systems, evaluated on news summaries

Page 14

Why domain-specific factors?

Generic factors matter for most texts and give us useful applications

What are the other, domain-specific factors?

They might aid in developing other interesting applications

Page 15

II. Predicting quality of science articles: publications and science news

Page 16

Science writing has distinctive characteristics: its function is different from that of informational texts

Academic writing in several genres involves properly motivating the problem and approach

Science journalists should create interest in a research study among lay readers

Page 17

Academic and Science News writing


… We identified 43 features … from the text and that could help determine the semantic similarity of two short text units. [Hatzivassiloglou et al., 2001]

A computer is fed pairs of text samples that it is told are equivalent -- two translations of the same sentence from Madame Bovary, say. The computer then derives its own set of rules for recognizing matches. [Technology Review, 2005]

Page 18

My hypotheses

Academic writing
1. Subjectivity: opinion, evaluation, argumentation
2. Rhetorical zones: role of a sentence in the article

Science journalism
1. Visual nature: aids in explaining difficult concepts
2. Surprisal: presenting the unexpected creates interest

Page 19

First challenge: defining text quality

Academic writing
Citations vs. annotations: annotations are not highly correlated with citations

Science journalism
New York Times articles from the Best American Science Writing books
Negative examples are sampled from the NYT corpus, on similar topics and from the same time period

Page 20

Annotation for academic writing


Sections: abstract, introduction, related work, conclusion. Annotations are focused using a set of questions

Questions for the introduction:
• Why is this problem important?
• Has the problem been addressed before?
• Is the proposed solution motivated and explained?

Pairwise comparison (Article A vs. Article B): more reliable than ratings on a scale (1-5)
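The slides do not say how pairwise judgments are turned into per-article quality labels. One simple option, shown here purely as an assumption, is each article's win rate over its comparisons; a Bradley-Terry model would be a more principled alternative. The `win_rates` helper and the judgments are hypothetical.

```python
from collections import defaultdict


def win_rates(judgments):
    """Fraction of its pairwise comparisons that each article won.

    judgments: list of (article_a, article_b, winner) tuples.
    """
    wins, comparisons = defaultdict(int), defaultdict(int)
    for a, b, winner in judgments:
        comparisons[a] += 1
        comparisons[b] += 1
        wins[winner] += 1
    return {article: wins[article] / n for article, n in comparisons.items()}


judgments = [("A", "B", "A"), ("A", "C", "A"), ("B", "C", "C")]
print(win_rates(judgments))  # {'A': 1.0, 'B': 0.0, 'C': 0.5}
```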

Page 21

Text quality factors for writing about science


Academic writing
• Subjectivity
• Rhetorical zones

Science news
• Surprisal
• Visual quality

Page 22

Subjectivity: academic writing

Opinions make an article interesting!

Example: “Conventional methods to solve this problem are complex and time-consuming.”

Page 23

Automatic identification of subjective expressions

1. Annotate subjective expressions at the clause level

2. Create a dictionary of positive/negative words in academic writing using unsupervised methods

3. Classify a clause as subjective or not, based on polar words and other features

E.g. context: subjective expressions often occur near causal relations and near statements that describe the technique/approach
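Steps 2 and 3 can be illustrated with polar-word and context features feeding a linear score. This is a minimal sketch under loud assumptions: the lexicons, cue lists and `WEIGHTS` are hypothetical placeholders (the slide describes an unsupervised lexicon and a trained classifier), and "near" is approximated as "in the clause or its neighboring clauses".

```python
import re

# Hypothetical seed lexicon; the slide describes inducing it without supervision.
POSITIVE = {"efficient", "robust", "novel", "simple", "accurate"}
NEGATIVE = {"complex", "time-consuming", "costly", "limited", "inaccurate"}
CAUSAL_CUES = {"because", "since", "therefore", "thus", "hence"}
APPROACH_CUES = {"method", "methods", "approach", "approaches", "technique", "techniques"}
# Hypothetical hand-set weights standing in for a trained classifier.
WEIGHTS = {"polar": 1.0, "near_causal": 0.5, "near_approach": 0.5}


def tokenize(text):
    """Lowercased words, keeping hyphenated words like 'time-consuming' whole."""
    return re.findall(r"[a-z]+(?:-[a-z]+)*", text.lower())


def clause_features(clause, neighbors=()):
    tokens = tokenize(clause)
    context = set(tokens)
    for neighbor in neighbors:
        context.update(tokenize(neighbor))
    return {
        "polar": sum(t in POSITIVE or t in NEGATIVE for t in tokens),
        "near_causal": int(bool(context & CAUSAL_CUES)),
        "near_approach": int(bool(context & APPROACH_CUES)),
    }


def is_subjective(clause, neighbors=(), threshold=1.0):
    features = clause_features(clause, neighbors)
    return sum(WEIGHTS[name] * value for name, value in features.items()) >= threshold


print(is_subjective("Conventional methods to solve this problem are complex and time-consuming."))
print(is_subjective("The dataset contains 500 articles."))
```

In a real system the weights would be learned from the clause-level annotations of step 1 rather than set by hand.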

Page 24

Rhetorical zones: academic writing

Defined for each sentence: the function of the sentence in the article

Previous work in this area has devised annotation schemes and shown good performance on automatic zone prediction; zones have been used for information extraction and summarization

Example zones: Aim … Background … Own work … Comparison

Page 25

Rhetorical zones and text quality

Hypothesis: good and poorly written articles have different distributions and sequences of rhetorical zones

Approach: identify zones, then compute features related to the sizes of zones and to the likelihood of the zone sequence under a transition model of good articles


[Figure: transition model over the zones aim, motivation, example, prior work and comparison, with transition probabilities on the edges. Caption: Sequences in good articles]
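The two feature types in the approach can be sketched as follows, assuming each sentence already carries a zone label from a zone classifier. The transition model here is a first-order Markov chain with add-one smoothing, trained on zone sequences from good articles; the zone labels come from the figure, the training sequences are toy data, and all function names are hypothetical.

```python
import math
from collections import Counter

# Zone labels taken from the slide's figure; real schemes may use others.
ZONES = ["aim", "motivation", "example", "prior work", "comparison"]


def train_transitions(sequences):
    """Count zone-to-zone transitions in good articles ("<s>" = article start)."""
    pair_counts, outgoing = Counter(), Counter()
    for seq in sequences:
        for a, b in zip(["<s>"] + seq, seq):
            pair_counts[(a, b)] += 1
            outgoing[a] += 1
    return pair_counts, outgoing


def zone_log_likelihood(seq, pair_counts, outgoing, n_labels=len(ZONES)):
    """log2 probability of a zone sequence under an add-one smoothed Markov model."""
    return sum(math.log2((pair_counts[(a, b)] + 1) / (outgoing[a] + n_labels))
               for a, b in zip(["<s>"] + seq, seq))


def zone_proportions(seq):
    """Size of each zone as a fraction of the article's sentences."""
    counts = Counter(seq)
    return {z: counts[z] / len(seq) for z in ZONES}


good_articles = [["aim", "motivation", "prior work", "comparison"],
                 ["aim", "motivation", "example", "comparison"]]
pairs, outgoing = train_transitions(good_articles)
print(zone_log_likelihood(["aim", "motivation", "comparison"], pairs, outgoing))
print(zone_log_likelihood(["comparison", "aim", "example"], pairs, outgoing))
print(zone_proportions(["aim", "aim", "comparison", "example"]))
```

The likelihood is not length-normalized here; dividing by the number of transitions would make articles of different lengths comparable.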

Page 26

A simple authoring tool for academic writing: highlighting-based feedback

Mark zone transitions that are less preferable

Mark low levels of subjectivity

Page 27

Surprisal: science news

Example: “Sara Lewis is fluent in firefly.”

Syntactic, lexical and topic correlates of surprise:
• Surprisal under a language model
• Parse probability
• Verb-argument compatibility
• Order of verbs
• Rare topics in news
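Of the correlates listed above, surprisal under a language model is the simplest to illustrate: a word's surprisal is -log2 P(w_i | w_{i-1}). This sketch assumes a bigram model with add-one smoothing, a caller-supplied `vocab_size` and a toy corpus; the other correlates (parse probability, verb-argument compatibility, etc.) are not covered, and the function names are hypothetical.

```python
import math
from collections import Counter


def train_bigram(sentences):
    """Count histories and bigrams; "<s>" marks the sentence start."""
    histories, bigrams = Counter(), Counter()
    for tokens in sentences:
        padded = ["<s>"] + tokens
        histories.update(padded[:-1])  # every token that is followed by another
        bigrams.update(zip(padded, padded[1:]))
    return histories, bigrams


def surprisal(tokens, histories, bigrams, vocab_size):
    """Per-word surprisal in bits: -log2 P(w_i | w_{i-1}), add-one smoothed."""
    padded = ["<s>"] + tokens
    return [-math.log2((bigrams[(h, w)] + 1) / (histories[h] + vocab_size))
            for h, w in zip(padded, padded[1:])]


corpus = [["she", "is", "fluent", "in", "english"],
          ["he", "is", "fluent", "in", "french"]]
hist, bi = train_bigram(corpus)
expected = surprisal(["is", "fluent", "in", "english"], hist, bi, vocab_size=10)
unexpected = surprisal(["is", "fluent", "in", "firefly"], hist, bi, vocab_size=10)
print(expected[-1], unexpected[-1])
```

As in the slide's example sentence, "firefly" after "fluent in" is more surprising than a language name.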

Page 28

Visual quality: science news

Large corpus of tags associated with images, e.g. visual words: lake, mountain, tree, clouds …

Visual words and article quality:
• Concentration of visual words
• Position in the article (lead, beginning of paragraphs)
• Variety in visual topics (tags from different pictures)
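The concentration and position features can be sketched as densities of visual words over different parts of an article. This sketch assumes a small hypothetical `VISUAL_WORDS` set in place of the image-tag corpus, treats the first paragraph as the lead, and uses the first `start_len` tokens of each paragraph as its "beginning"; the variety feature is not covered.

```python
import re

# Hypothetical visual vocabulary; the slide derives it from a large corpus of image tags.
VISUAL_WORDS = {"lake", "mountain", "tree", "clouds", "firefly", "glow"}


def tokenize(text):
    return re.findall(r"[a-z]+", text.lower())


def density(tokens, visual_words):
    """Fraction of tokens that are visual words (0.0 for an empty list)."""
    return sum(t in visual_words for t in tokens) / len(tokens) if tokens else 0.0


def visual_features(paragraphs, visual_words=VISUAL_WORDS, start_len=10):
    """Concentration of visual words overall, in the lead, and at paragraph starts."""
    tokenized = [tokenize(p) for p in paragraphs]
    all_tokens = [t for toks in tokenized for t in toks]
    starts = [t for toks in tokenized for t in toks[:start_len]]
    return {
        "concentration": density(all_tokens, visual_words),
        "lead_concentration": density(tokenized[0] if tokenized else [], visual_words),
        "start_concentration": density(starts, visual_words),
    }


article = ["Clouds drift over the lake.", "The study measured results."]
print(visual_features(article))
```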

Page 29

Article recommendation for science news

Target users: people who like reading science news

Ask for a preferred topic and show matching articles
• Ranking 1: based on relevance to the keyword
• Ranking 2: combines visual and surprisal scores with relevance

Evaluate how often ranking 2 is preferred
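The two rankings can be sketched side by side. The slide does not say how the scores are combined; this sketch assumes a linear combination with hypothetical weights and scores already scaled to [0, 1]. The field names and `combined_score` helper are illustrative only.

```python
def combined_score(article, weights=(1.0, 0.5, 0.5)):
    """Hypothetical linear combination of relevance, visual and surprisal scores."""
    w_rel, w_vis, w_sur = weights
    return (w_rel * article["relevance"] + w_vis * article["visual"]
            + w_sur * article["surprisal"])


def rankings(articles):
    """Return (ranking 1: relevance only, ranking 2: relevance + quality scores)."""
    by_relevance = sorted(articles, key=lambda a: a["relevance"], reverse=True)
    by_combined = sorted(articles, key=combined_score, reverse=True)
    return [a["id"] for a in by_relevance], [a["id"] for a in by_combined]


articles = [
    {"id": "A", "relevance": 0.9, "visual": 0.1, "surprisal": 0.1},
    {"id": "B", "relevance": 0.8, "visual": 0.9, "surprisal": 0.7},
]
print(rankings(articles))  # (['A', 'B'], ['B', 'A'])
```

In the proposed evaluation, readers would then be shown both rankings, and the fraction of cases where they prefer ranking 2 would be measured.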

Page 30

Summary

In this thesis, I develop text quality metrics that are
• Generic: summary evaluation
• Domain-specific: focused on scientific writing
• Evaluated in relevant application settings

Challenges
• Defining text quality
• Technical approach
• Designing feedback in the authoring support tool

Page 31

Thank you!
