Link Analysis on the Web An Example: Broad-topic Queries Xin Xin
Jan 13, 2016
Problem
• Specific queries: “Does Netscape support the JDK 1.1 code-signing API?”
• Broad-topic queries: “Find information about the Java programming language.”
• Authority is important in broad-topic queries
Query: “java”
1. http://java.sun.com
2. http://sunsite.unc.edu/javafaq/javafaq.html
3. …
Why use link analysis instead of content information?
Query: Harvard
• Harvard homepage: “Harvard” occurs 4 times
• Another page introducing Harvard: “Harvard” occurs 8 times
Query: Search engines
• Yahoo! homepage: “search engines” occurs 0 times
• Another page introducing search engines: “search engines” occurs 4 times
In both cases, ranking by term occurrences would place the authoritative homepage below the other page.
Graph Representation
G=(V,E)
V: pages
E: hyperlinks; each link is an out-link of the page it starts from and an in-link of the page it points to
Adjacency matrix
[Figure: an example graph on pages 1-4 and its 4×4 adjacency matrix over p1-p4; a 1 marks a link, and the example has five links]
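A minimal Python sketch of building an adjacency matrix from a list of links. The edge list is hypothetical: it is the unique graph without self-loops that matches the link counts used in Simple Example 2 (1→2, 2→1, 2→4, 3→1, 4→1). The convention A[i][j] = 1 when page i+1 links to page j+1 is also an assumption.

```python
# Hypothetical example graph on pages 1..4 (assumed, see lead-in).
EDGES = [(1, 2), (2, 1), (2, 4), (3, 1), (4, 1)]
N = 4

def adjacency_matrix(edges, n):
    """Return an n x n 0/1 matrix with A[i][j] = 1 iff page i+1 links to page j+1."""
    a = [[0] * n for _ in range(n)]
    for src, dst in edges:
        a[src - 1][dst - 1] = 1
    return a

A = adjacency_matrix(EDGES, N)
for row in A:
    print(row)
```

Row sums of A give each page's out-degree and column sums give its in-degree.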
Given a query, how can we use this link information to find the most authoritative pages?
Overview
[Figure: Web → query sub-graph → ranked results for Query: “java” (1. http://java.sun.com, 2. http://sunsite.unc.edu/javafaq/javafaq.html, 3. …)]
1. Sub-graph construction
2. Hubs and authorities computation
Step 1: Sub-graph Construction
• Challenge: the sub-graph should
– be small in size
– be rich in relevant pages
– contain most of the strongest authorities
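The slide lists only the requirements. One way to meet them, following the root-set/base-set construction in Kleinberg's HITS paper, is sketched below: take the top t results of a text-based search as a root set, then add every page they link to and at most d pages that link to each of them. The input dictionaries and page names are hypothetical, and the defaults t=200 and d=50 follow the values reported in that paper.

```python
def base_set(ranked_results, out_links, in_links, t=200, d=50):
    """Expand the top-t search results into a query-focused base set.

    ranked_results: page ids from a text-based search, best first (assumed input)
    out_links / in_links: dicts mapping a page id to the pages it links to /
    the pages that link to it (assumed input)
    """
    root = ranked_results[:t]  # small, and rich in relevant pages
    base = set(root)
    for page in root:
        # Every page the root page points to.
        base.update(out_links.get(page, []))
        # At most d pages pointing to it, so very popular pages do not blow up the set.
        base.update(in_links.get(page, [])[:d])
    return base

results = ["a", "b"]
out_links = {"a": ["c"], "b": ["a", "d"]}
in_links = {"a": ["b", "e", "f"], "b": []}
print(sorted(base_set(results, out_links, in_links, t=2, d=1)))  # ['a', 'b', 'c', 'd']
```

The cap d keeps the sub-graph small, while following links out of and into the root set pulls in strong authorities that may not contain the query term.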
Step 2: Hubs and Authorities
• Basic idea: rank pages by in-degree (the number of pages that link to them)
• Problem:
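The basic in-degree idea can be sketched as counting the in-links of each page and sorting by that count. The four-page edge list is hypothetical (1→2, 2→1, 2→4, 3→1, 4→1).

```python
from collections import Counter

# Hypothetical link graph on pages 1..4 (assumed for illustration).
EDGES = [(1, 2), (2, 1), (2, 4), (3, 1), (4, 1)]

def rank_by_in_degree(edges, pages):
    """Return pages sorted by number of in-links, highest first (ties keep input order)."""
    in_degree = Counter(dst for _, dst in edges)
    return sorted(pages, key=lambda p: in_degree[p], reverse=True)

print(rank_by_in_degree(EDGES, [1, 2, 3, 4]))  # [1, 2, 4, 3]
```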
An iterative algorithm:
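In the notation of the examples that follow (x_p = hub score, y_p = authority score), and with an adjacency matrix A where A_{pq} = 1 when page p links to page q, the standard HITS updates can be written as below. Normalizing to unit length follows Kleinberg's paper and is an assumption here.

```latex
y_p \leftarrow \sum_{q:\, q \to p} x_q, \qquad
x_p \leftarrow \sum_{q:\, p \to q} y_q
\\[4pt]
\mathbf{y} \leftarrow A^{\top}\mathbf{x}, \qquad
\mathbf{x} \leftarrow A\,\mathbf{y}, \qquad
\mathbf{x} \leftarrow \frac{\mathbf{x}}{\lVert \mathbf{x} \rVert_2}, \qquad
\mathbf{y} \leftarrow \frac{\mathbf{y}}{\lVert \mathbf{y} \rVert_2}
```

Repeating these steps, under mild conditions the hub vector converges to the principal eigenvector of AAᵀ and the authority vector to that of AᵀA.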
Simple Example 1
[Figure: example graph on pages 1-4]
Each page is labeled (x, y):
x = hub score
y = authority score
Every page starts at (1/4, 1/4).
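A minimal power-iteration sketch of the algorithm, starting from the uniform (1/4, 1/4) scores of Simple Example 1. The graph is hypothetical (1→2, 2→1, 2→4, 3→1, 4→1); the iteration count and unit-length normalization are assumptions. Both scores are updated from the previous round's values.

```python
import math

# Hypothetical link graph on pages 1..4 (assumed for illustration).
EDGES = [(1, 2), (2, 1), (2, 4), (3, 1), (4, 1)]
PAGES = [1, 2, 3, 4]

def normalize(scores):
    """Scale a score dict to unit Euclidean length (left unchanged if all zero)."""
    norm = math.sqrt(sum(v * v for v in scores.values()))
    return {p: v / norm for p, v in scores.items()} if norm > 0 else scores

def hits(edges, pages, iterations=50):
    hub = {p: 1 / len(pages) for p in pages}   # x: hub scores
    auth = {p: 1 / len(pages) for p in pages}  # y: authority scores
    for _ in range(iterations):
        new_hub = {p: 0.0 for p in pages}
        new_auth = {p: 0.0 for p in pages}
        for src, dst in edges:
            new_auth[dst] += hub[src]  # authority: sum of hub scores of pages linking in
            new_hub[src] += auth[dst]  # hub: sum of authority scores of pages linked to
        hub, auth = normalize(new_hub), normalize(new_auth)
    return hub, auth

hub, auth = hits(EDGES, PAGES)
print(max(auth, key=auth.get), max(hub, key=hub.get))  # 1 2
```

On this graph page 1 ends up as the top authority and page 2, which links to both pages that receive links from good hubs, as the top hub.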
Simple Example 2
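[Figure: the four-page example graph; every page starts at (1/4, 1/4)]
One update, computed from the initial scores:
Hub (sum of the authority scores of the pages it links to):
1: 1/4
2: 1/4 + 1/4 = 1/2
3: 1/4
4: 1/4
Authority (sum of the hub scores of the pages linking to it):
1: 1/4 + 1/4 + 1/4 = 3/4
2: 1/4
3: 0
4: 1/4
After normalizing each vector to unit length (sum of squares = 1), the squared scores are:
Hub: 1: 1/7, 2: 4/7, 3: 1/7, 4: 1/7
Authority: 1: 9/11, 2: 1/11, 3: 0, 4: 1/11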
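The arithmetic of Simple Example 2 can be checked with exact fractions. The edge list is an assumption: it is the unique graph without self-loops that yields the per-page sums of that example (1→2, 2→1, 2→4, 3→1, 4→1). The sketch performs one update from the uniform 1/4 scores and then computes the squared unit-length scores.

```python
from fractions import Fraction

# Assumed link graph, reconstructed from the per-page sums in Simple Example 2.
EDGES = [(1, 2), (2, 1), (2, 4), (3, 1), (4, 1)]
PAGES = [1, 2, 3, 4]

start = {p: Fraction(1, 4) for p in PAGES}  # initial (x, y) = (1/4, 1/4)

# One update, with both scores computed from the initial values.
hub = {p: Fraction(0) for p in PAGES}
auth = {p: Fraction(0) for p in PAGES}
for src, dst in EDGES:
    hub[src] += start[dst]   # hub: sum of authority scores of pages it links to
    auth[dst] += start[src]  # authority: sum of hub scores of pages linking to it

def squared_unit(scores):
    """Exact squares of the scores after scaling the vector to unit length."""
    total = sum(v * v for v in scores.values())
    return {p: v * v / total for p, v in scores.items()}

print({p: str(v) for p, v in hub.items()})                # {1: '1/4', 2: '1/2', 3: '1/4', 4: '1/4'}
print({p: str(v) for p, v in squared_unit(hub).items()})  # {1: '1/7', 2: '4/7', 3: '1/7', 4: '1/7'}
print({p: str(v) for p, v in squared_unit(auth).items()}) # {1: '9/11', 2: '1/11', 3: '0', 4: '1/11'}
```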
PageRank