Lexical Simplification
A subtask of text simplification Replacing words or short phrases by simpler variants in a context aware fashion Motivation To reach out to wider.

Dec 17, 2015

Eric Carr
Transcript
Page 1

Lexical Simplification

Page 2

Lexical Simplification

A subtask of text simplification
Replacing words or short phrases by simpler variants in a context-aware fashion

Motivation
To reach out to a wider range of readers having limited vocabulary
▪ Children
▪ People with low literacy level or cognitive disability
▪ Second language learners

Page 3

Involved Processes

Identification of complex words or phrases

Substitute lookup
▪ Synonyms from thesaurus
▪ Distributional similarity

Context-based ranking
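
The three processes above can be sketched as a toy pipeline. This is only an illustration under stated assumptions: the frequency table, thesaurus, co-occurrence counts, threshold, and all function names are hypothetical stand-ins, not resources or methods named on the slides.

```python
from collections import Counter

# Illustrative toy resources (assumed values, not real corpus statistics).
WORD_FREQ = Counter({"the": 1000, "during": 300, "committed": 80,
                     "crimes": 60, "terrible": 40, "cruelties": 12,
                     "atrocities": 3})
THESAURUS = {"atrocities": ["cruelties", "crimes"]}
COOCCUR = {("terrible", "cruelties"): 5, ("terrible", "crimes"): 2}


def identify_complex(tokens, threshold=10):
    """Step 1: flag positions whose word frequency is below a threshold."""
    return [i for i, t in enumerate(tokens) if WORD_FREQ[t.lower()] < threshold]


def lookup_substitutes(word):
    """Step 2: candidate substitutes from the (toy) thesaurus."""
    return THESAURUS.get(word.lower(), [])


def rank_in_context(candidates, tokens, i, window=2):
    """Step 3: prefer candidates that co-occur with nearby words,
    breaking ties by overall frequency."""
    context = tokens[max(0, i - window):i] + tokens[i + 1:i + 1 + window]

    def score(candidate):
        co = sum(COOCCUR.get((w.lower(), candidate), 0) for w in context)
        return (co, WORD_FREQ[candidate])

    return sorted(candidates, key=score, reverse=True)


def simplify(sentence):
    tokens = sentence.split()
    out = list(tokens)
    for i in identify_complex(tokens):
        ranked = rank_in_context(lookup_substitutes(tokens[i]), tokens, i)
        if ranked:  # words without known substitutes are left unchanged
            out[i] = ranked[0]
    return " ".join(out)


print(simplify("Hitler committed terrible atrocities during the second World War"))
# prints: Hitler committed terrible cruelties during the second World War
```

Rare words with no thesaurus entry (names, for instance) are flagged in step 1 but left untouched, which keeps the sketch from substituting blindly.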

Page 4

Examples

Technical Medical Language
▪ Hypertension risk factors include obesity,...
▪ High blood pressure risk factors include excessive weight,...

Legal Language
▪ The Products transacted through the Service are...
▪ The Products managed through the Service are...

Low Literacy Readers
▪ Hitler committed terrible atrocities during the second World War
▪ Hitler committed terrible cruelties during the second World War

Page 5

Related Approaches

Knowledge-based approach
▪ Using thesaurus, WordNet
▪ Hard to capture all simplification contexts

Lexical simplification as paraphrasing
▪ Paraphrasing does not deal with complexity reduction specifically

Lexical simplification as machine translation
▪ Requires complex-simple parallel corpora
▪ Wikipedia-Simple Wikipedia corpora
  ▪ Not comparable

Page 6

Wikipedia: Resource for Lexical Simplification

Simple English Wikipedia (SEW)
▪ Edition of normal or Complex English Wikipedia (CEW) written in simpler constructs with restricted vocabulary
▪ Wikipedia for children, low literacy readers, second language readers etc.
▪ 121,095 content pages
▪ Semi-parallel to its complex counterpart

Resource: For the sake of simplicity: Unsupervised extraction of lexical simplifications from Wikipedia, Yatskar et al.

Page 7

Wikipedia: Evolution of an Article

Figure: Version 1 → Edits → Version 2 → ... → Edits → Version n

Page 8

Edit Model

A Wikipedia article evolves from one version to another through different types of edits
▪ fix edits: correction of grammar or factual content
▪ simplify: simplification of lexical items or phrases
▪ no-op: no edit
▪ spam: removal of spam

Page 9

Edit Model

Edits in SEW versions are a mix of different types of edits

The task
▪ Separate out only the simplification edits from other edits

Page 10

Edit Model

Definitions
▪ An article in Wikipedia corresponds to a title
▪ Each article has a sequence of article versions caused by successive edits
▪ A word or phrase is in the article if there is a version in that sequence that contains it
▪ Lexical edit instances
  ▪ A word or phrase in one version was changed to another in the next
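
A minimal sketch of how lexical edit instances might be collected from a version sequence. It assumes whitespace tokenization and uses Python's standard `difflib` alignment; the function names and the `max_len` cutoff are hypothetical choices for illustration, not the extraction procedure used in the cited work.

```python
from difflib import SequenceMatcher


def lexical_edits(old_version, new_version, max_len=2):
    """Return (old phrase, new phrase) pairs for short replaced spans."""
    old, new = old_version.split(), new_version.split()
    matcher = SequenceMatcher(a=old, b=new, autojunk=False)
    edits = []
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        # Keep only substitutions of short spans (words or short phrases).
        if tag == "replace" and (i2 - i1) <= max_len and (j2 - j1) <= max_len:
            edits.append((" ".join(old[i1:i2]), " ".join(new[j1:j2])))
    return edits


def edit_instances(versions):
    """Collect lexical edits between each version and the next one."""
    instances = []
    for prev, nxt in zip(versions, versions[1:]):
        instances.extend(lexical_edits(prev, nxt))
    return instances


versions = [
    "Hitler committed terrible atrocities during the war",
    "Hitler committed terrible cruelties during the war",
    "Hitler committed terrible cruelties during the war",
]
print(edit_instances(versions))
# prints: [('atrocities', 'cruelties')]
```

Insertions and deletions are ignored here because only a phrase changed into another phrase counts as a lexical edit instance in the definition above.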

Page 11

Edit Model

Probability that an edit operation is applied to a phrase
Probability of the phrase being modified to another phrase under that operation
Probability that a phrase is edited to another phrase

Our interest
▪ Probability of the replacement under the simplification edit operation
▪ Estimate
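
The three probabilities named on this slide fit together through the law of total probability. The notation below is an assumption, since the slide's own symbols did not survive extraction: $A$ is the original word or phrase, $a$ the phrase it is changed to, and $o$ an edit operation.

```latex
P(a \mid A) \;=\; \sum_{o \,\in\, \{\text{fix},\ \text{simplify},\ \text{no-op},\ \text{spam}\}} P(o \mid A)\, P(a \mid A, o)
```

In this notation the quantity of interest is $P(a \mid A, \text{simplify})$: how likely $A$ is to be rewritten as $a$ when the edit is a simplification.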

Page 12

Edit Model: Simplifying Assumptions

For the sake of simplicity, discard spam edits

For no-op edit ()
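
A worked reading of these two assumptions, offered as a sketch rather than the slide's lost formula. The symbols are assumed: $A$ is the original phrase, $a$ its replacement, $o$ an edit operation, and the probability of editing $A$ to $a$ is taken to be a sum over operations of $P(o \mid A)\,P(a \mid A, o)$. Because a no-op leaves the phrase unchanged by definition, its term vanishes whenever $a \neq A$, and dropping the spam term leaves only two operations:

```latex
P(a \mid A, \text{no-op}) =
\begin{cases}
1 & \text{if } a = A \\
0 & \text{if } a \neq A
\end{cases}

% For a \neq A, with spam edits discarded:
P(a \mid A) \;\approx\; P(\text{fix} \mid A)\, P(a \mid A, \text{fix})
          \;+\; P(\text{simplify} \mid A)\, P(a \mid A, \text{simplify})
```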

Page 13

Edit Model: Probability Estimates

Assumption
▪ Occurrences of simplification in ComplexEW are negligible in comparison to fixes
▪ Only fix edits occur in ComplexEW

fraction of in containing modifications in

Probability estimation of fix edit
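
One possible reading of this estimate, sketched under assumptions: if only fix edits occur in ComplexEW, then the fraction of version pairs containing a word in which that word gets replaced can stand in for its fix probability. The function name, single-token matching, and the toy histories are hypothetical; the slide's exact formula was not recoverable.

```python
from difflib import SequenceMatcher


def modified_fraction(histories, word):
    """Among consecutive version pairs whose earlier version contains `word`,
    return the fraction in which `word` is replaced by something else."""
    containing = modified = 0
    for versions in histories:
        for prev, nxt in zip(versions, versions[1:]):
            old, new = prev.split(), nxt.split()
            if word not in old:
                continue
            containing += 1
            ops = SequenceMatcher(a=old, b=new, autojunk=False).get_opcodes()
            # Count the pair once if any replaced span covers the word.
            if any(tag == "replace" and word in old[i1:i2]
                   for tag, i1, i2, _, _ in ops):
                modified += 1
    return modified / containing if containing else 0.0


# Toy ComplexEW-style histories (illustrative data only).
histories = [
    ["the cat recieved food", "the cat received food",
     "the cat received good food"],
    ["he recieved mail", "he recieved mail today"],
]
print(modified_fraction(histories, "recieved"))
# prints: 0.5
```

The same counting applied to SimpleEW histories would mix fix and simplification edits, which is why the ComplexEW figure is useful as a fix-only baseline.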

Page 14

Edit Model: Probability Estimates

fraction of in containing modifications in

Assumption: probability of any particular fix operation being applied in SimpleEW is proportional to that in ComplexEW
▪ SimpleEW fix rate might be dampened because already-edited ComplexEW articles are copied over

fix + simple edit

Page 15

Edit Model: Probability Estimates

probability that A is changed to a different word in SimpleEW

Estimate of

Estimate of

Page 16

Edit Model: Probability Estimates

Estimate of
▪ Fix operations are estimated from ComplexEW

Page 17

Edit Model: Probability Estimates

Estimate of
▪ Both fix and simplify edits are considered to occur in SimpleEW

Page 18

Edit Model: Probability Estimates

Page 19

Lexical Simplification is Contextual

Resource: Putting it Simply: a Context-Aware Approach to Lexical Simplification, Biran et al.

Self Study