Estimating the ImpressionRank of Web Pages
Ziv Bar-Yossef (Google and Technion), Maxim Gurevich (Technion)

Feb 25, 2016

Page 1: Estimating the ImpressionRank of Web Pages

Estimating the ImpressionRank of Web Pages

Ziv Bar-Yossef (Google and Technion), Maxim Gurevich (Technion)

Page 2: Estimating the ImpressionRank of Web Pages

Impressions and ImpressionRank

Impression of page/site x on a keyword w:
• A user sends w to a search engine
• The search engine returns x as one of the results
• The user sees the result x

ImpressionRank of x: # of impressions of x
• Within a certain time frame
• Measure of page/site visibility in a search engine

Each result has an impression on the keyword “www 2009”:
• www.2009.org
• www2009.org/calls.html
• www.loginconference.com
• ...

Page 3: Estimating the ImpressionRank of Web Pages

Popular Keyword Extraction

The Popular Keyword Extraction problem:
• Input: web page x, int k
• Output: k keywords on which x has the most impressions among all keywords

Example: x = www.johnmccain.com
• sarah palin
• john mccain
• cindy mccain

Page 4: Estimating the ImpressionRank of Web Pages

Motivation

• Popularity rating of pages and sites
• Site analytics
  • Enable site owners to determine their visibility in different search engines
  • Combine with traffic data to derive click-through rates
  • Compare to other sites
• Keyword suggestions for online advertising
• Social analysis
• Search engine evaluation
• Finding similar pages

Page 5: Estimating the ImpressionRank of Web Pages

Internal Measurements of ImpressionRank and Popular Keyword Extraction

• Search engines can compute both ImpressionRank and popular keywords based on their query logs
• Query logs are not publicly released due to privacy concerns

Caveats:
• Only search engines can do this
• Non-transparent
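
As a rough illustration of the internal computation, the sketch below counts impressions in a toy query log. The log format (one record per query, holding the keyword and the result URLs shown) and all function names are assumptions made for this example; real query logs are not public and will look different.

```python
from collections import Counter

# Hypothetical log format: one record per query, holding the keyword
# and the result URLs the user was shown.
toy_log = [
    ("www 2009", ["www2009.org", "www2009.org/calls.html"]),
    ("www 2009", ["www2009.org"]),
    ("web conference", ["www2009.org", "example.org"]),
]


def impressions_by_keyword(log, page):
    """Count, per keyword, how many times `page` was shown as a result."""
    counts = Counter()
    for keyword, results in log:
        if page in results:
            counts[keyword] += 1
    return counts


def impression_rank(log, page):
    """ImpressionRank of `page`: total impressions over all keywords."""
    return sum(impressions_by_keyword(log, page).values())


def popular_keywords(log, page, k):
    """The k keywords on which `page` has the most impressions."""
    return [kw for kw, _ in impressions_by_keyword(log, page).most_common(k)]
```

An external estimator has no access to such a log, which is why the rest of the talk measures cost in requests to the search engine and the suggestion server.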

Page 6: Estimating the ImpressionRank of Web Pages

External Measurements of ImpressionRank and Popular Keyword Extraction

[Diagram: Target page URL → ImpressionRank estimator / Popular keyword extractor → ImpressionRank / Popular Keywords]

Main cost measure: # of requests to the search engine and to the suggestion server

Page 7: Estimating the ImpressionRank of Web Pages

Our Contributions

• Reduce ImpressionRank Estimation to Popular Keyword Extraction
• First external algorithm for popular keyword extraction
  • Accurate
  • Uses relatively few search engine requests
• Applies to:
  • Single web pages (www.cnn.com)
  • Web sites (www.cnn.com/*)
  • Domains (*.cnn.com/*)

Page 8: Estimating the ImpressionRank of Web Pages

Related Work

• Keyword extraction [Frank et al 99, Turney 00, …]
• Keyword suggestions (for online advertising) [Yih et al 06, Fuxman et al 08]
• Query by Document [Yang et al 09]
• Commercial traffic reporting [GoogleTrends, comScore, Nielsen, Compete]

Page 9: Estimating the ImpressionRank of Web Pages

Roadmap

• The naïve popular keyword extraction algorithm
• The improved popular keyword extraction algorithm
  • Best-First Search
• Experimental results

Page 10: Estimating the ImpressionRank of Web Pages

Popular Keyword Extraction: The Naïve Algorithm

[Diagram components: Target Page, Term Extractor, Term Pool, Candidate keyword generator, Candidate keyword TRIE, Candidate Verifier (queries the Search Engine and the Suggestion Server), Popular Keywords. Example TRIE terms: mp3, song, tag, weather; example candidate: "mp3 tag"]

Verification procedure for keyword w:
• Submit w to the search engine and the suggestion server
• Verify that w returns the target page
• Verify that the popularity of w > 0 [BG08]

Recall problem:
• Target page may have impressions on keywords that do not occur in its text

Efficiency problem:
• 10^3 terms → 10^9 3-term candidates
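
To make the efficiency problem concrete, the sketch below counts and enumerates candidates built from a term pool. The function names, and the choice to treat candidates as ordered term sequences with repetition allowed, are assumptions for illustration only.

```python
from itertools import product


def count_candidates(pool_size, length):
    """Number of ordered `length`-term sequences over a pool (repeats allowed)."""
    return pool_size ** length


def naive_candidates(term_pool, max_terms):
    """Yield every candidate keyword of 1..max_terms terms, with no pruning."""
    for n in range(1, max_terms + 1):
        for terms in product(term_pool, repeat=n):
            yield " ".join(terms)


# A pool of 10**3 terms yields 10**9 three-term candidates.
three_term_count = count_candidates(1000, 3)

# A tiny pool is still easy to enumerate: 3 one-term + 9 two-term candidates.
tiny = list(naive_candidates(["mp3", "song", "tag"], 2))
```

Since the naïve algorithm sends each candidate to the verifier, the request count grows with the number of candidates.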

Page 11: Estimating the ImpressionRank of Web Pages

Popular Keyword Extraction: The Improved Algorithm

[Diagram components: Target Page, Similar Pages, and Anchor Text (term sources), Term Extractor, Term Pool, Candidate keyword generator, Candidate keyword TRIE, Best-First Search, Candidate Verifier, Search Engine, Suggestion Server, Popular Keywords]

Page 12: Estimating the ImpressionRank of Web Pages

Best-First Search

[Diagram components: Candidate keyword TRIE (example terms: mp3, song, tag, weather; example scores: 3, 5, 8), Best-First Search, Candidate Verifier, Search Engine, Suggestion Server]

Goals:
• Prune as many candidates as possible
• Verify the most promising candidates first

• Start with single term candidates
• Score candidates
• While the search engine request budget is not exceeded
  • w = top scoring candidate
  • Send w to the verifier
  • Decide whether to prune w
  • If w is not pruned
    • Expand w: generate and score the children of w
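
The loop above can be sketched with a priority queue. This is a minimal illustration, not the authors' implementation: the `score`, `verify`, and `should_prune` callables are hypothetical stand-ins, children are formed by appending one pool term, and the budget is simplified to one unit per verified candidate.

```python
import heapq


def best_first_search(terms, score, verify, should_prune, budget):
    """Best-first search over candidate keywords.

    Hypothetical stand-ins supplied by the caller:
      score(w)        -> number, higher means more promising
      verify(w)       -> True if w is a popular keyword for the target page
      should_prune(w) -> True if w and all its descendants can be dropped
      budget          -> maximum number of candidates sent to the verifier
    """
    # heapq is a min-heap, so scores are negated to pop the best candidate.
    frontier = [(-score(t), t) for t in terms]
    heapq.heapify(frontier)
    found = []
    used = 0
    while frontier and used < budget:
        _, w = heapq.heappop(frontier)
        used += 1  # simplification: one unit of budget per verified candidate
        if verify(w):
            found.append(w)
        if should_prune(w):
            continue  # the children of w are never generated
        # Expand w: append one more term from the pool to form each child.
        for t in terms:
            child = f"{w} {t}"
            heapq.heappush(frontier, (-score(child), child))
    return found


# Toy run with hand-made scores; every value here is illustrative.
toy_scores = {"mp3": 8, "weather": 5, "song": 3, "mp3 song": 7}
found = best_first_search(
    terms=["mp3", "weather", "song"],
    score=lambda w: toy_scores.get(w, 0),
    verify=lambda w: w in {"mp3", "mp3 song"},
    should_prune=lambda w: w != "mp3",
    budget=3,
)
```

In the toy run, "mp3" is verified and expanded first, its child "mp3 song" outranks the remaining single terms, and the budget runs out after "weather".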

Page 13: Estimating the ImpressionRank of Web Pages

Pruning

Pruning decision for keyword w:
• Submit query inurl:<target url> w
  • If no results, prune w and all its descendants
• Retrieve suggestions for w
  • If no results, prune w and all its descendants

• Pruning eliminates the vast majority of candidates
• A single search/suggestion request may eliminate thousands of candidates
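
The pruning decision can be sketched as a small predicate. The `search` and `suggest` callables are hypothetical stand-ins for real search engine and suggestion server requests, and the dictionaries below are fake data used only to exercise the function.

```python
def should_prune(w, target_url, search, suggest):
    """Return True if w and all its descendants can be discarded.

    `search(query)` and `suggest(prefix)` are hypothetical callables that
    return lists of results.
    """
    # Rule 1: the restricted query "inurl:<target url> w" has no results.
    if not search(f"inurl:{target_url} {w}"):
        return True
    # Rule 2: the suggestion server returns nothing for w.
    if not suggest(w):
        return True
    return False


# Fake back ends for illustration only.
fake_index = {"inurl:example.com mp3": ["example.com/mp3"]}
fake_suggestions = {"mp3": ["mp3 song", "mp3 tag"]}


def fake_search(query):
    return fake_index.get(query, [])


def fake_suggest(prefix):
    return fake_suggestions.get(prefix, [])
```

Because a pruned keyword takes its whole subtree with it, one negative answer can remove every longer candidate that starts with w.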

Page 14: Estimating the ImpressionRank of Web Pages

Scoring

• The Best-First search algorithm considers only the top scoring candidates given the budget
• Want to predict
  • Whether the search engine returns the target page on w
  • Whether w is a popular keyword

score(w) = tf(w)^α · idf(w)^β · popularity_score(w)^γ

• tf(w) and idf(w): predict whether the search engine returns the target page on w
• popularity_score(w): predicts the popularity of w
• α, β, and γ: relative weights of the scoring components
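
The slide combines tf, idf, and popularity_score using three relative weights. The sketch below assumes the weights act as exponents in a product; that form, the parameter names, and the default values are assumptions, not something the transcript confirms.

```python
def score(tf, idf, popularity, alpha=1.0, beta=1.0, gamma=1.0):
    """Weighted product of the three scoring components.

    Assumption: the relative weights are exponents. Setting a weight to 0
    removes that component's influence on the ranking.
    """
    return (tf ** alpha) * (idf ** beta) * (popularity ** gamma)


# Illustrative values only.
equal_weights = score(2.0, 3.0, 4.0)
damped_popularity = score(2.0, 3.0, 4.0, gamma=0.5)
```

Only the ordering of candidates matters for Best-First Search, so any monotone rescaling of this score would select the same candidates.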

Page 15: Estimating the ImpressionRank of Web Pages

How to Compute Candidate Scores

• Every time the algorithm expands a keyword, it needs to compute scores for all its children
  • There could be thousands of such children
• TF Score
  • Straightforward. No search requests needed.
• IDF Score
  • Approximated based on an offline corpus. No search requests needed.
• Popularity Score
  • [BarYossefGurevich 08]: Algorithm for estimating keyword popularity using the query suggestion service
    • Too costly: may use dozens of suggestion requests per estimate
  • We present a new algorithm that estimates popularity for all the children in bulk
    • Uses hundreds of suggestion requests to estimate the popularity of all the children
    • Estimates are less accurate

Page 16: Estimating the ImpressionRank of Web Pages

Cheap Popularity Estimation

Input: a keyword w
Goal: Estimate popularity of all w’s children
• Bucket children according to their first character
• Estimate relative popularity of each bucket
• Estimate the relative popularity within each bucket

Example: w = “mp3”; children: “mp3 song”, “mp3 tag”, “mp3 table”, …

[Diagram: the prefix "mp3_" branches by next character (a … s t); the BG08 Popularity Estimator gives an estimate of popularity_score(prefix) for bucket prefixes such as "mp3 s" and "mp3 t"; bucket "mp3 s" holds "mp3 song", bucket "mp3 t" holds "mp3 tag" and "mp3 table"]
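
The bucketing steps can be sketched as follows. The `prefix_popularity` callable is a hypothetical stand-in for a suggestion-based prefix popularity estimator, and the even split within each bucket is a placeholder assumption, since the slides do not say how the within-bucket step is carried out.

```python
from collections import defaultdict


def bulk_popularity(w, children, prefix_popularity):
    """Estimate a popularity score for every child of w.

    Assumes each child has the form "<w> <more text>".
    """
    # Step 1: bucket children by the first character after "w ".
    buckets = defaultdict(list)
    for child in children:
        first_char = child[len(w) + 1]
        buckets[first_char].append(child)

    estimates = {}
    for first_char, members in buckets.items():
        # Step 2: one estimate per bucket, e.g. for the prefix "mp3 s".
        bucket_pop = prefix_popularity(f"{w} {first_char}")
        # Step 3 (placeholder assumption): split the bucket's popularity
        # evenly among its members.
        for child in members:
            estimates[child] = bucket_pop / len(members)
    return estimates


# Fake prefix popularities for illustration only.
fake_prefix_pop = {"mp3 s": 6.0, "mp3 t": 4.0}
estimates = bulk_popularity(
    "mp3",
    ["mp3 song", "mp3 tag", "mp3 table"],
    lambda prefix: fake_prefix_pop.get(prefix, 0.0),
)
```

The point of the bucketing is that the estimator is called once per bucket rather than once per child.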

Page 17: Estimating the ImpressionRank of Web Pages

Popular Keyword Extraction Algorithm: Quality Analysis

• Precision: 100%
  • All extracted keywords return the target page
• Recall: do we miss some popular keywords?
  • More difficult to measure: no ground truth to compare to
  • Estimate lower bound on the recall
    • Google: recall > 90%
    • Yahoo!: recall = 70% - 80%

Page 18: Estimating the ImpressionRank of Web Pages

Resource Usage

[Plot: weighted fraction of popular keywords found (0% to 100%) vs. search requests used (0 to 1000); series: Google, Yahoo!]

• ~10000 suggestion server requests per page
• ~1000 search engine requests per page
• 85% (Google), 75% (Yahoo!) after 25% of resources spent

Page 19: Estimating the ImpressionRank of Web Pages

ImpressionRank of News Sites (March 2009)

[Bar chart: relative ImpressionRank of cnn.com, nytimes.com, and washingtonpost.com on Google, Yahoo!, and Compete. Keyword labels shown on the chart: weather, cnn, video, obama, bristol palin, news, amazon, movies, barack obama, stimulus package, new york times]

Page 20: Estimating the ImpressionRank of Web Pages

ImpressionRank of Social Sites (March 2009)

[Bar chart: relative ImpressionRank of en.wikipedia.org, www.youtube.com, www.facebook.com, and www.myspace.com on Google, Yahoo!, and Compete]

Page 21: Estimating the ImpressionRank of Web Pages

Conclusions

• First external algorithms for
  • ImpressionRank estimation
  • Popular keyword extraction
• Future work
  • Improve efficiency
  • Improve recall

Page 22: Estimating the ImpressionRank of Web Pages

Thank You