Research © 2008 Yahoo! Generating Succinct Titles for Web URLs Kunal Punera joint work with Deepayan Chakrabarti and Ravi Kumar Yahoo! Research

Dec 13, 2015

Page 1:

Research © 2008 Yahoo!

Generating Succinct Titles for Web URLs

Kunal Punera

joint work with Deepayan Chakrabarti and Ravi Kumar, Yahoo! Research

Page 2:

Agenda

• Motivation

• Our Approach

• Comparison with Previous Work

• Experimental Results

Page 3:

Titles on Search Results Page

• HTML Titles

– Too long

– Can be missing

– Non-HTML results (pictures, video, and audio clips)

• Other Apps

– Site-map generation

Page 4:

Titles for “Quicklinks”

• Strict length restrictions

• Links displayed in context of home page

Quicklink Titles

Homepage Context

Page 5:

Agenda

• Motivation

• Our Approach

• Comparison with Previous Work

• Experimental Results

Page 6:

“Sources” of Information about URLs (URL: http://www.barackobama.com/issues/)

• URL tokens (URL-Tokens)

– “barack obama issues”

• Web page content (HTMLTitle, KeyPhrases)

– “Barack Obama | Change We Can Believe In | Issues”

– “Issues”, “Civil Rights”, “Defense”, “Economy”

• Anchor text on incoming links (IntrasiteAT, IntersiteAT, HomepageAT)

– “Issues”, “Economic Issues”

– “Barack Obama’s Plan for America”

• Search engine queries (QueryView, QueryClick, QueryClickPos1)

– “obama issues”, “obama platform”, “obama campaign issues”, “barack obama platform”

• User-generated tags (DeliciousTags)

– “obama campaign platform”, “cool”, “nice webpage”

Source Instances

Page 7:

Central Idea

Words from title and context (if applicable) are preferentially used by sources in constructing instances.

Degree of these preferences is source dependent.

Page 8:

Generation of Instances (URL: http://www.barackobama.com/issues/)

QuicklinkTitle

HomepageAbstract (Context)

GeneralVocabulary

QueryClick Source: “obama issues”, “obama campaign issues”, “barack obama platform”, “platform for obama campaign”, …

IntrasiteAT Source: “Issues”, “Foreign Policy”, “Economic Issues”, “Yes We Can”

HTMLTitle Source: “Barack Obama | Change We Can Believe In | Issues”

…

Generation parameters for the three sources shown: 0.5/0.4/0.1, 0.8/0.1/0.1, 0.2/0.6/0.2
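One way to read this slide is that a source picks each word of an instance from one of three word distributions (the quicklink title, the homepage context, or a general vocabulary), choosing among them with source-specific probabilities such as 0.5/0.4/0.1. The sketch below scores an instance under that reading. The smoothed unigram distributions, the smoothing constant, the vocabulary size, and the function names are all assumptions made for illustration; the slides do not specify them.

```python
import math

def unigram(words, vocab_size, smoothing=1e-4):
    """Smoothed unigram distribution concentrated on `words` (an assumed form)."""
    counts = {}
    for w in words:
        counts[w] = counts.get(w, 0) + 1
    total = len(words) + smoothing * vocab_size
    return lambda w: (counts.get(w, 0) + smoothing) / total

def instance_log_prob(instance, title, context, mix, vocab_size=10_000):
    """Log-probability of one instance given a title, a context, and a source's
    mixing proportions `mix` = (title, context, general vocabulary)."""
    p_title = unigram(title.lower().split(), vocab_size)
    p_context = unigram(context.lower().split(), vocab_size)
    p_general = 1.0 / vocab_size  # uniform general vocabulary (assumption)
    m_title, m_context, m_general = mix
    log_prob = 0.0
    for w in instance.lower().split():
        log_prob += math.log(
            m_title * p_title(w) + m_context * p_context(w) + m_general * p_general
        )
    return log_prob
```

Under this sketch, an instance such as “obama issues” scores higher for the title “Issues” than for an unrelated title, because more of its words can be drawn from the title distribution.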

Page 9:

Learning Source Generation Parameters (URL: http://www.barackobama.com/issues/)

GIVEN

QuicklinkTitle

HomepageAbstract (Context)

GeneralVocabulary

QueryClick Source: “obama issues”, “obama platform”, “obama campaign issues”, “barack obama platform”

IntrasiteAT Source: “Issues”, “Foreign Policy”, “Economic Issues”, “Yes We Can”

HTMLTitle Source: “Barack Obama | Change We Can Believe In | Issues”

…

UNKNOWN: --/--/-- --/--/-- --/--/--

Learn parameter values that maximize probability of generation of instances
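The slide says only that the parameters are chosen to maximize the probability of generating the observed instances. Assuming each parameter triple is a set of mixing proportions over three fixed word distributions, expectation-maximization (EM) is one standard way to find such a maximum. The sketch below shows that generic technique and is not necessarily the authors' procedure; the per-word component probabilities it takes as input are hypothetical.

```python
import math

def em_mixing_weights(word_probs, iterations=50):
    """Estimate mixing proportions for a mixture with fixed components.

    word_probs: one tuple per observed word, giving that word's probability
    under each component, e.g. (p_title, p_context, p_general).
    """
    k = len(word_probs[0])
    mix = [1.0 / k] * k  # start from uniform proportions
    for _ in range(iterations):
        resp_sums = [0.0] * k
        for probs in word_probs:
            # E-step: posterior responsibility of each component for this word
            joint = [m * p for m, p in zip(mix, probs)]
            z = sum(joint)
            for j in range(k):
                resp_sums[j] += joint[j] / z
        # M-step: new proportions are the average responsibilities
        mix = [r / len(word_probs) for r in resp_sums]
    return mix

def log_likelihood(word_probs, mix):
    """Log-likelihood of the observed words under the given proportions."""
    return sum(
        math.log(sum(m * p for m, p in zip(mix, probs))) for probs in word_probs
    )
```

In this setup each source would be fitted separately on the words of its own instances, giving one triple per source.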

Page 10:

Finding Best Quicklink Title (URL: http://www.barackobama.com/issues/)

QuicklinkTitle: UNKNOWN

HomepageAbstract (Context): GIVEN

GeneralVocabulary: GIVEN

QueryClick Source: “obama issues”, “obama platform”, “obama campaign issues”, “barack obama platform”

IntrasiteAT Source: “Issues”, “Foreign Policy”, “Economic Issues”, “Yes We Can”

HTMLTitle Source: “Barack Obama | Change We Can Believe In | Issues”

…

LEARNT: 0.5/0.4/0.1 0.8/0.1/0.1 0.2/0.6/0.2

Select title for which probability of generation of instances is maximum

Page 11:

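Objective Function

$$\sum_{s \in \text{sources}} \; \sum_{w \in \text{instances}(s)} \frac{1}{\text{len}_w} \, \log P(w \mid \text{title}, \text{context})$$

Source-specific normalization:

$$\sum_{s \in \text{sources}} \frac{1}{\#\,\text{instances in } s} \sum_{w \in \text{instances}(s)} \frac{1}{\text{len}_w} \, \log P(w \mid \text{title}, \text{context})$$

Source-specific weights ($\alpha_s$ denotes the weight of source $s$):

$$\sum_{s \in \text{sources}} \frac{\alpha_s}{\#\,\text{instances in } s} \sum_{w \in \text{instances}(s)} \frac{1}{\text{len}_w} \, \log P(w \mid \text{title}, \text{context})$$

• Sources have different numbers of instances

– QueryClick vs. HTMLTitle

• Sources are associated with the target web object to different degrees

– QueryClick vs. QueryView

– Comments on YouTube, etc.

• Can account for dependent sources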
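The objective can be sketched in code as a weighted sum over sources of the length-normalized log-probabilities of their instances, averaged per source. The version below makes several assumptions for illustration: the length of an instance is its word count, `toy_log_prob` is a crude stand-in for the real generation probability, and the function names and weight values are hypothetical.

```python
import math

def toy_log_prob(instance, title, context):
    """Stand-in score for log P(instance | title, context): words shared with
    the title or context are treated as likely, all others as unlikely."""
    preferred = set(title.lower().split()) | set(context.lower().split())
    return sum(
        math.log(0.5 if w in preferred else 0.01) for w in instance.lower().split()
    )

def objective(title, context, instances_by_source, weights, log_prob=toy_log_prob):
    """Weighted, per-source-normalized sum of length-normalized log-probabilities."""
    total = 0.0
    for source, instances in instances_by_source.items():
        if not instances:
            continue
        per_source = sum(
            log_prob(w, title, context) / max(len(w.split()), 1) for w in instances
        )
        # Divide by the number of instances so large sources do not dominate.
        total += weights.get(source, 1.0) * per_source / len(instances)
    return total

def best_title(candidates, context, instances_by_source, weights):
    """Return the candidate title with the highest objective value."""
    return max(
        candidates,
        key=lambda t: objective(t, context, instances_by_source, weights),
    )
```

For example, with QueryClick instances “obama issues” and “obama campaign issues”, IntrasiteAT instances “Issues” and “Economic Issues”, and the context “barack obama”, the candidate “Issues” outscores “Contact” and “Donate”.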

Page 12:

Learning Source Weights

• With known source generation parameters we have a linear function in source weights

• We learn weights that rank the various candidate titles correctly

– We use the linear ranking SVM described in

Joachims, “Optimizing search engines using clickthrough data”, KDD 2002
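Because the objective is linear in the source weights, each candidate title can be represented by a vector of per-source scores, and ranking reduces to finding weights under which the better title of each pair scores higher. The sketch below minimizes a pairwise hinge loss by plain subgradient descent. It is a simplified stand-in for the ranking SVM cited above, not that implementation; the learning rate, regularization constant, and feature values are illustrative assumptions.

```python
def learn_source_weights(pairs, n_sources, epochs=200, lr=0.1, reg=0.01):
    """Learn one weight per source from preference pairs.

    pairs: list of (better_features, worse_features), where each feature
    vector holds one per-source score for a candidate title.
    """
    w = [0.0] * n_sources
    for _ in range(epochs):
        for better, worse in pairs:
            diff = [b - c for b, c in zip(better, worse)]
            margin = sum(wi * di for wi, di in zip(w, diff))
            # L2 regularization step (shrinks the weights slightly)
            w = [wi * (1.0 - lr * reg) for wi in w]
            # Hinge-loss step: only pairs ranked with margin < 1 push the weights
            if margin < 1.0:
                w = [wi + lr * di for wi, di in zip(w, diff)]
    return w

def rank(candidates, w):
    """candidates: dict mapping title -> feature vector; best title first."""
    return sorted(
        candidates,
        key=lambda t: -sum(wi * xi for wi, xi in zip(w, candidates[t])),
    )
```

The difference vector `better - worse` is the same pairwise transformation that ranking SVMs use; only the optimizer here is simplified.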

Page 13:

Where do Title Candidates come from?

• Instances of some sources of information

• Not all sources used

– Ungrammatical (URL-Tokens)

– Misspellings (QueryView)

– Sometimes irrelevant (DeliciousTags)

• We clean some instances to obtain more candidates

– Removing website name
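As an illustration of this cleaning step, the sketch below splits an instance on common title separators and drops any segment that contains the website name. The separator set, the substring test, and the function name are assumptions; the slides do not describe how the cleaning is actually done.

```python
import re

def clean_candidates(instance, site_name):
    """Split an instance on ' | ', ' : ', ' - ' or ' – ' and drop segments
    that mention the website name (hypothetical cleaning rule)."""
    segments = [s.strip() for s in re.split(r"\s+[|:\-–]\s+", instance)]
    site = site_name.lower()
    return [s for s in segments if s and site not in s.lower()]
```

Applied to “Barack Obama | Change We Can Believe In | Issues” with the site name “Barack Obama”, this yields the candidates “Change We Can Believe In” and “Issues”.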

Page 14:

Agenda

• Motivation

• Our Approach

• Comparison with Previous Work

• Experimental Results

Page 15:

Comparisons with Previous Work

• Our title generation is an “extractive” approach

– Avoid modeling grammatical correctness of titles

• Only learn parameters at the source level

– Less training data needed

• Combine information from external sources

– Can obtain titles for objects with no text content

• Respect constraints placed by context of title use

Page 16:

BMW: Banko et al., Headline Generation based on Statistical Translation, ACL 2000

• Rank headline candidates using 3 factors

– Likelihood of seeing candidate words in a title

– Likelihood of most likely sequence of the words in candidate

– Likelihood of length of candidate

• Lots of parameters

– to model word being in title

– to model bi-grams

– to combine the above 3 factors

Page 17:

Agenda

• Motivation

• Our Approach

• Comparison with Previous Work

• Experimental Results

Page 18:

Empirical Evaluation

• Two Tasks

– Generating Quicklink titles (manually judged data)

– Generating Web Page Titles

• Metrics

– F-measure, Jaccard, Exact Match, Longest Common Subsequence

• Baselines

– Sources of information our system uses

– BMW: Banko et al., ACL 2000
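The four metrics can be computed as in the sketch below. The slides do not say whether the metrics operate on words or characters, so the word-level definitions, the lowercasing, and the function names here are assumptions.

```python
def tokens(text):
    """Lowercased whitespace tokenization (an assumed preprocessing step)."""
    return text.lower().split()

def f_measure(candidate, reference):
    """Harmonic mean of word-level precision and recall."""
    c, r = set(tokens(candidate)), set(tokens(reference))
    overlap = len(c & r)
    if overlap == 0:
        return 0.0
    precision = overlap / len(c)
    recall = overlap / len(r)
    return 2 * precision * recall / (precision + recall)

def jaccard(candidate, reference):
    """Size of the word intersection divided by the size of the word union."""
    c, r = set(tokens(candidate)), set(tokens(reference))
    return len(c & r) / len(c | r) if c | r else 1.0

def exact_match(candidate, reference):
    """1.0 if the token sequences are identical, else 0.0."""
    return float(tokens(candidate) == tokens(reference))

def lcs_length(candidate, reference):
    """Length in words of the longest common subsequence (dynamic programming)."""
    c, r = tokens(candidate), tokens(reference)
    prev = [0] * (len(r) + 1)
    for cw in c:
        cur = [0]
        for j, rw in enumerate(r, start=1):
            cur.append(prev[j - 1] + 1 if cw == rw else max(prev[j], cur[j - 1]))
        prev = cur
    return prev[-1]
```

For the candidate “Economic Issues” against the reference “Issues”, precision is 0.5 and recall is 1.0, so the F-measure is 2/3 and the Jaccard score is 0.5.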

Page 19:

Quicklinks Title Task

Approach        F-measure   Jaccard   Exact Match
Our Approach    0.81        0.75      0.63
HomepageAT      0.70        0.66      0.58
IntrasiteAT     0.43        0.41      0.35
IntersiteAT     0.36        0.32      0.25
HTMLTitle       0.37        0.27      0.05
KeyPhrases      0.25        0.19      0.07

• HomepageAT is a very competitive baseline

• IntrasiteAT better than IntersiteAT

• Our system’s performance approaches inter-judge agreement values

Page 20:

Quicklinks Title Task: Learning Rates

• Very few data points needed

– Learning parameters at source level helps

Page 21:

Quicklinks Title Task: Source Weights

• Having Source weights and normalization helps

Page 22:

Web Page Title Task

Approach        F-measure   Jaccard   LCS
Our Approach    0.53        0.41      3.44
HomepageAT      0.45        0.34      2.7
KeyPhrases      0.41        0.31      2.54
QueryClick      0.31        0.23      2.1
IntersiteAT     0.29        0.21      1.8
BMW             0.12        0.10      --

• Our approach beats competition

– BMW not suited to this task

– Often page text doesn’t describe page well

• HomepageAT surprisingly effective

Page 23:

Conclusions

• Our approach combines various sources of information to select titles

• It selects titles that respect constraints of length and context

• We empirically showed the effectiveness of our approach

• Future Work

– Deeper language features in selecting titles

– Uniform quicklinks titles across websites

– Contexts of different types

Page 24:

Questions

Thank you.

Page 25:

Copyright Yahoo! 2008. No publication or distribution allowed without written permission.