Enriching the Web with Readability Metadata Kevyn Collins-Thompson Context, Learning, and User Experience for Search Group Microsoft Research PITR 2012 : NAACL HLT 2012 Workshop Predicting and improving text readability for target reader populations June 7, 2012 - Montréal
Enriching the Web with Readability Metadata
Kevyn Collins-Thompson
Context, Learning, and User Experience for Search Group, Microsoft Research
PITR 2012: NAACL HLT 2012 Workshop
Predicting and improving text readability for target reader populations
June 7, 2012 - Montréal
Acknowledgements
Joint work with my collaborators:
Paul Bennett, Ryen White, Sue Dumais (MSR)
Jin Young Kim (U. Mass.)
Sebastian de la Chica (Microsoft)
Paul Kidwell (LLNL)
Guy Lebanon (GaTech)
David Sontag (NYU)
Bringing together readability and the Web… sometimes in unexpected ways
[Figure: sample text passage (an English grammar lesson on comparative and superlative forms) annotated with readability factors: Syntax, Vocabulary, Coherence, Visual Cues, Topic Interest]
Reading level prediction, Topic prediction
Text Readability Modeling and Prediction
Search Engines
Billions of pages, millions of sites, billions of users
Readability of content
Reading proficiency and expertise of users
The Web
How Web interactions can be enriched with reading level metadata
• Prelude: Predicting reading level of Web pages
• Web applications:
– Personalization [Collins-Thompson et al., CIKM 2011]
– Search snippet quality
– Modeling user & site expertise [Kim et al., WSDM 2012]
– Searcher motivation
• Challenges and opportunities for readability modeling and prediction
It’s not relevant …if you can’t understand it.
A search result should be at the reading level the user wants for that query.
Search engines try to maximize relevance but have traditionally ignored text difficulty
(at least, not immediately)
[Diagram labels: Intent Models; Matching; Content Models]
Web pages occur at a wide range of reading difficulty levels
Query [insect diet]: Lower difficulty
Query [insect diet]: Medium difficulty
Query [insect diet]: Higher difficulty
Users also exhibit a wide range of proficiency and expertise
• Students at different grade levels
• Non-native speakers
• General population
– Large variation in language proficiency
– Special needs, language deficits
– Familiarity or expertise in specific topic areas
• Even for a single user there can be broad variation in intent across search queries
Default results for [insect diet]
Relevance as seen by an elementary school student (e.g. age 10)
[Figure: results list with seven results marked with an X: four labeled “Technical” and three labeled “Relevance”]
Blending in lower difficulty results would improve relevance for this user
[Figure: results list with four results marked with an X: two labeled “Technical” and two labeled “Relevance”]
Reading difficulty has many factors
• Factors include:
– Semantics, e.g. vocabulary
– Syntax, e.g. sentence structure, complexity
– Discourse-level structure
– Reader background and interest in topic
– Text legibility
– Supporting illustrations and layout
• Different from parental control, UI issues
Traditional readability measures don’t work for Web content
• Flesch-Kincaid (Microsoft Word)
• Problems include:
– They assume the content has well-formed sentences
– They are sensitive to noise
– Input must be at least 100 words long
• Web content is often short, noisy, less structured
– Page body, titles, snippets, queries, captions, …
• Billions of pages → computational constraints on approaches
• We focus on vocabulary-based prediction models that learn fine-grained models of word usage from labeled texts
• Documents can contain high-difficulty words but still be at a lower grade level
– e.g. when teaching new concepts
• We introduce a statistical model of (r, s) readability
– r: familiarity threshold for any word. A word w is familiar at a grade if it is known by at least r percent of the population at that grade.
– s: coverage requirement for documents. A document d is readable at level t if s percent of the words in d are familiar at grade t.
• Estimate a word acquisition age Gaussian (μ_w, σ_w) for each word w from labeled documents via maximum likelihood
• (r, s) parameters can be learned automatically or specified to tune the model for different scenarios
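The (r, s) rule above can be sketched in a few lines of Python. This is only an illustration, assuming each word's acquisition age is a Gaussian whose parameters have already been estimated. The function names, the example parameters, the use of fractions instead of percentages for r and s, and the choice to skip words without a model are all assumptions made for the sketch, not details from the talk.

```python
import math
from statistics import NormalDist


def word_level_quantile(mu, sigma, r):
    """Grade at which a fraction r of the population knows the word,
    under a Gaussian acquisition-age model with mean mu and std sigma."""
    return NormalDist(mu, sigma).inv_cdf(r)


def predict_grade(tokens, word_params, r, s):
    """Lowest grade t at which at least a fraction s of the tokens are familiar.

    word_params maps a word to its (mu, sigma); tokens with no entry are
    skipped (an assumption made for this sketch).
    """
    levels = sorted(
        word_level_quantile(*word_params[w], r) for w in tokens if w in word_params
    )
    if not levels:
        raise ValueError("no token has an acquisition-age model")
    # Number of tokens that must be familiar to reach coverage s
    # (the small epsilon guards against floating-point overshoot).
    needed = max(1, math.ceil(s * len(levels) - 1e-9))
    return levels[needed - 1]


# Hypothetical parameters, chosen only to make the example easy to follow.
params = {"red": (2.0, 1.0), "perimeter": (7.0, 1.5)}
doc = ["red", "perimeter"]
low = predict_grade(doc, params, r=0.5, s=0.5)   # only half the tokens must be familiar
high = predict_grade(doc, params, r=0.5, s=0.7)  # both tokens must be familiar
```

Raising s forces the prediction up to the level of the harder word, while raising r shifts every word's quantile to a higher grade.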
The r parameter controls the familiarity threshold for words
[Figure: curves for “red” and “perimeter”; x-axis: Grade Level, 0 to 14; y-axis: 0 to 1]
Level quantile for word w: q_w(r)
q_RED(0.80) = 3.5, q_PERIMETER(0.80) = 8.2
The s parameter controls required document coverage
[Figure: CDF curves for “red” and “perimeter”; x-axis: Grade Level, 0 to 14; y-axis: CDF, 0 to 1]
Suppose: p(“red” | d) = p(“perimeter” | d) = 0.5
Predicted grade with s = 0.70: 8.8
Predicted grade with s = 0.50: 3.5
Multiple-word example
[Figure: CDF curves for “the”, “red”, “ants”, “explored”, and “perimeter”; x-axis: Grade Level, 0 to 14; y-axis: CDF, 0 to 1]
“The red ants explored the perimeter.”
Predicted grade with s = 0.70: 5.3
New metadata based on reading level
• Documents:
– Posterior distribution over levels
– Distribution statistics:
  • Expected reading difficulty
  • Entropy of level prediction
– Temporal / positional series
– Vocabulary models
  • Key technical terms
  • Regions needing augmentation (text, images, links to sources)
• Web sites:
– Topic, reading level expectation and entropy across pages
• User profiles:
– Aggregated statistics of documents and sites based on short- or long-term search/browse behavior
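The two document-level distribution statistics listed above can be computed directly from a posterior over levels. The sketch below assumes the posterior is available as a mapping from grade level to probability; that representation, the function names, and the example values are hypothetical.

```python
import math


def expected_level(posterior):
    """Mean grade level under a {level: probability} posterior."""
    return sum(level * p for level, p in posterior.items())


def entropy_bits(posterior):
    """Shannon entropy in bits; zero-probability levels contribute nothing."""
    return -sum(p * math.log2(p) for p in posterior.values() if p > 0)


# Hypothetical posterior over grade levels for one document.
posterior = {3: 0.25, 4: 0.5, 5: 0.25}
mean_level = expected_level(posterior)
spread = entropy_bits(posterior)
```

A low entropy indicates a confident level prediction; a high entropy indicates that the predicted level is spread over many grades.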
Local readability within a document
[Figure: plot with axis ticks 1 to 12 and 0 to 0.3]
Health article: Bronchitis, efficacy …
Movie dialogue in “The Matrix: Reloaded”: Architect’s speech; Keanu Reeves enters; Merovingian scene (French)
[Kidwell, Lebanon, Collins-Thompson. J. Am. Stats. 2011]
Application: Personalizing Search Results by Reading Level
Personalization by modeling users and content
[Diagram labels: Desired reading level; Content reading level (scale 0 to 1); Re-ranker; User and Intent: User profile (long-term), Session (short-term, this talk)]
How could a Web search engine personalize results by reading level?
1. Model a user’s likely search intent:
– Get explicit preferences or instructions from a user
– Learn a user’s interests and expertise over time
2. Extract reading-level and topical features:
– Queries and sessions (query text, results clicked, …)
– User profile (explicit, or implicit from history)
– Page reading level, result snippet level
3. Use these features for personalized re-ranking
A simple session model combines the reading levels of previous satisfied clicks
Example session queries: [insect diet], [grasshoppers], [insect habits]
Session reading level distribution
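One simple way to combine the reading levels of a session's satisfied clicks is to average the per-page level distributions. The transcript does not state the exact combination rule, so the uniform averaging below is an assumption, as are the function name and the example values.

```python
from collections import defaultdict


def session_level_distribution(clicked_page_posteriors):
    """Average the level posteriors of a session's satisfied clicks."""
    if not clicked_page_posteriors:
        raise ValueError("session has no satisfied clicks")
    totals = defaultdict(float)
    for posterior in clicked_page_posteriors:
        for level, p in posterior.items():
            totals[level] += p
    n = len(clicked_page_posteriors)
    return {level: total / n for level, total in sorted(totals.items())}


# Hypothetical posteriors for pages clicked after three queries in one session.
session = session_level_distribution([
    {3: 0.5, 4: 0.5},
    {4: 1.0},
    {4: 0.5, 5: 0.5},
])
```

The resulting distribution can then be compared against the reading level of each candidate result for the next query in the session.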
Typical features used for reading level personalization
What types of queries are helped most by reading level personalization?
• Gain for all queries, and most query subsets (205,623 sessions)
– Size of gain varied with query subset
– Science queries benefited most in our experiment
• Beating the default production baseline is very hard: Gain ≥ 1.0 is notable
• Net +1.6% of all queries improved at least one rank position in satisfied click
– Large rank changes (> 5 positions) more than 70% likely to result in a win
Point-Gain in Mean Reciprocal Rank of Last-SAT click
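For reference, mean reciprocal rank (MRR) of the last satisfied click averages 1/rank over queries, where rank is the position of that result in the list. The sketch below computes the raw MRR difference between two rankings; how the chart scales this difference into "points" is not stated in the transcript, and the rank values are hypothetical.

```python
def mean_reciprocal_rank(ranks):
    """Mean of 1/rank over queries; ranks are 1-based positions of the
    last satisfied click in a ranking."""
    if not ranks:
        raise ValueError("no queries")
    return sum(1.0 / r for r in ranks) / len(ranks)


# Hypothetical positions of the last-SAT result, before and after re-ranking.
baseline_ranks = [1, 2, 4]
reranked_ranks = [1, 1, 2]
gain = mean_reciprocal_rank(reranked_ranks) - mean_reciprocal_rank(baseline_ranks)
```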
What features were most important for reading level personalization?
[Figure: relative feature importance, measured as the average reduction in residual squared error over all trees and over all splits, relative to the most informative feature (x-axis: 0 to 1). Features shown, in extraction order: Session user model confidence; Session prev query count; Page level; Snippet level; Snippet-page diff confidence; Query length (words); Query vs snippet; Dale snippet difficulty; Snippet vs page; Session level vs page; Query length (chars.); Relative snippet difficulty; Reciprocal rank]
Application: Improving snippet quality
Users can be misled by a mismatch between snippet readability and page readability
Page Difficulty: High
Snippet Difficulty: Medium
Click!
Retreat!!
Users abandon pages faster when actual page is more difficult than the search result snippet suggested
[Figure: comparison of two cases: page harder than its result snippet vs. page easier than its result snippet]
Future goal: Expected snippet difficulty should match the underlying document difficulty
[Collins-Thompson et al., CIKM 2011]
Application: Modeling expertise on the Web using reading level + topic metadata
Topic drift can occur when the specified reading level changes
Results suggest that there are both expert (high RL) and novice (low RL) users for computer topics
User Reading Level against P(Topic)
Using reading level and topic together to model user and site expertise
Four features that aggregate metadata over pages:
Reading level:
1. Expected reading level E(R) over site/user pages
2. Entropy H(R) of reading level over site/user pages
Topic:
3. Top-K ODP category predictions over site/user pages
4. Entropy H(T) of ODP category distribution for site/user pages
Sites with low topic entropy (focused) tend to be expert-oriented
Website                 H(T|S)  T1          P1    T2                P2    T3                 P3
www.prosportsdaily.com  0.83    Sports      0.74  Sports/Football   0.26
www.organize.com        0.91    Shopping    0.67  Shop/Home&Garden  0.33
www.trulia.com          0.92    Business    0.78  Society           0.18  Bus./Construction  0.04
www.fandango.com        0.95    Arts        0.63  Arts/Movies       0.36
www.hobbytron.com       0.96    Recreation  0.62  Shopping          0.38
Sites with focused topical content: Low Entropy, H(T|S) < 1
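The H(T|S) values in the table above appear consistent with Shannon entropy in bits over the listed category probabilities; that reading is inferred from the numbers rather than stated in the slides. A short check for the first row, with hypothetical function names:

```python
import math


def topic_entropy_bits(category_probs):
    """Shannon entropy (bits) of a site's topic category distribution."""
    return -sum(p * math.log2(p) for p in category_probs if p > 0)


def top_k_categories(category_probs, k):
    """The k most probable categories, as (name, probability) pairs."""
    return sorted(category_probs.items(), key=lambda kv: kv[1], reverse=True)[:k]


# Category probabilities for www.prosportsdaily.com, taken from the table above.
site = {"Sports": 0.74, "Sports/Football": 0.26}
h = topic_entropy_bits(site.values())
top = top_k_categories(site, k=1)
```

Under this reading, a site whose pages fall almost entirely into one category has entropy near 0, while a site spread evenly over 16 categories would reach 4 bits.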
Sites with high topic entropy (breadth) tend to be for general audiences
Website            H(T|S)  T1          P1    T2            P2    T3               P3
ezinearticles.com  4.27    Business    0.12  Health        0.09  Home             0.08
www.dummies.com    4.28    Computers   0.17  Computers/HW  0.09  Business         0.08
en.allexperts.com  4.38    Recreation  0.12  Home          0.09  Recreation/Pets  0.07
phoenix.about.com  4.38    Recreation  0.12  Society       0.09  Arts             0.07
www.wisegeek.com   4.40    Health      0.12  Business      0.10  Science          0.09
Sites with very broad topical content: High Entropy, H(T|S) > 4
Reading level entropy measures breadth of a site’s content difficulty
Which features were most correlated with site expertise?
Baseline (predict most likely class): 65.8%
Classifier accuracy: 82.2%

Feature      Correl. with Expertness  Description
DivRLT(U,s)  -0.56                    Distance of visitors’ RLT profile from site's
DivT(U,s)    -0.55                    Distance of visitors’ Topic profile from site's
DivRT(U)     -0.45                    Average distance among visitors’ RLT profile
E[R|s]       +0.23                    Expectation of Site's RL
E[R|Qs]      +0.34                    Expectation of Surfacing Query's RL
E[R|Us]      +0.44                    Expectation of Visitor's RL
Application: Searcher motivation
Readability metadata may also help predict when searchers are highly motivated
• Sites that are popular but also have large difference from average reading level
Website             Type of site
socialsecurity.gov  Government retirement/disability
collegeboard.com    Entrance exam preparation, college application help
softwarepatch.com   Find software patches
fileinfo.com        Find programs to open file types
msdn.microsoft.com  Technical reference
‘Stretch’ tasks: what are people searching for when they deviate from their typical reading level profile?
Capturing stretch behaviors:
– Estimate a user’s typical reading level profile over time, from historical search data
– Collect search sessions where E[R|Session] - E[R|User] > 4 grade levels
– Build language models from titles of clicked pages
– Compare word probability in clicked vs. all titles
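The steps above can be sketched as follows. The threshold test follows the stated rule; the unigram title models, the add-one smoothing, the natural logarithm, and the example titles are assumptions made for illustration, since the transcript does not give those details.

```python
import math
from collections import Counter


def is_stretch(session_level, user_level, threshold=4.0):
    """True when the session's expected level exceeds the user's typical
    expected level by more than `threshold` grade levels."""
    return session_level - user_level > threshold


def title_log_ratios(stretch_titles, all_titles, smoothing=1.0):
    """log P(word | stretch titles) - log P(word | all titles), using
    add-`smoothing` unigram estimates (the smoothing is an assumption)."""
    stretch = Counter(w for t in stretch_titles for w in t.lower().split())
    overall = Counter(w for t in all_titles for w in t.lower().split())
    vocab = set(overall) | set(stretch)
    s_total = sum(stretch.values()) + smoothing * len(vocab)
    o_total = sum(overall.values()) + smoothing * len(vocab)
    return {
        w: math.log((stretch[w] + smoothing) / s_total)
        - math.log((overall[w] + smoothing) / o_total)
        for w in vocab
    }


# Hypothetical clicked-page titles.
stretch_titles = ["sample medical tests", "financial aid forms"]
all_titles = stretch_titles + ["football news", "music store sale", "best new games"]
ratios = title_log_ratios(stretch_titles, all_titles)
```

Words with a positive log ratio are over-represented in titles clicked during stretch sessions; words with a negative log ratio are under-represented.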
Highest association with stretch reading    Lowest association with stretch reading
Title word  Log ratio                       Title word  Log ratio
tests       2.22                            best        -0.42
test        1.99                            football    -0.45
sample      1.94                            store       -0.46
digital     1.88                            great       -0.47
options     1.87                            items       -0.52
aid         1.87                            new         -0.53
effects     1.84                            sale        -0.61
education   1.77                            games       -0.65
forms       1.76                            sports      -0.78
plan        1.74                            food        -0.81
pay         1.71                            news        -0.82
medical     1.69                            music       -1.02
learning    1.62                            all         -1.35

Highest-association topics: Medical tests; College entrance; Gov’t forms; Job search; Financial aid
Lowest-association topics: Shopping!; Exploration; Leisure

Future work:
1. Identify & predict stretch tasks
2. Decide how and when to provide support
3. Determine helpful background or alternatives

[Kim et al., WSDM 2012] Based on 2-month user profiles from Bing search log data
Three key innovation directions for readability modeling and prediction
[Figure: a sample text passage annotated with Syntax, Vocabulary, Coherence, Visual, and Topic Interest, shown together with The Web]
• Data-driven
• User-centric
• Knowledge-based
Some key challenges and opportunities for readability research
[Chart axis labels: Basic Advancement of Knowledge; Relevance for applications]
• Deep content understanding
– Identifying gaps and assumptions
– Concepts and their dependencies
• Deep user understanding
– Your expertise & changes over time
– Learning plans tailored for you
– Cognitive models of learning
• Web-scale speed and reliability
• Exploiting new content forms
– Blogs, wiki structure & edits
• Adapting to different tasks and populations
• Human computation/crowdsource
• Predicting quality/authority
• Data-driven, personalized readability measures
• Adapting content to users
– Enrich, augment, rewrite
• Adapting users to content
• Influencing search presentation and interaction
• Analyzing movie scripts with Keanu Reeves dialogue