Computer Science Department, University of Toronto 1 Seminar Series Social Information Systems Toronto, Spring, 2007 Manos Papagelis Department of Computer Science, University of Toronto [email protected]

Dec 18, 2015


Computer Science Department, University of Toronto

Seminar Series
Social Information Systems

Toronto, Spring, 2007

Manos Papagelis
Department of Computer Science, University of Toronto

[email protected]


Presentation Outline

Part I: Exploiting Social Networks for Internet Search
Part II: An Experimental Study of the Coloring Problem on Human Subject Networks


Exploiting Social Networks for Internet Search Alan Mislove, Krishna Gummadi, and Peter Druschel, HotNets 2006

Part I


Introduction

Social Networking (SN)
• A new form of publishing and locating information

Objective
• To understand whether these social links can be exploited by search engines to provide better results

Contributions
• Comparison of the mechanisms in the Web and online SN for
  Publishing: mechanisms to make information available to users
  Locating: mechanisms to find information
• Results from an experiment in social network-based Web search
• Challenges and opportunities in using social networks for Internet search


Web vs. SN (1/2)

Web
Publishing: By placing documents on a Web server (and then seeking incoming links)
Locating: Via search engines (exploiting the link graph)

Pros
• Very effective (incoming links are good indicators of importance)

Limitations
• No fresh data
• No personalized results
• Unlinked pages are not indexed


Web vs. SN (2/2)

Social Networks
Publishing: No explicit links between content (photos, videos, blogs), but implicit links between content through explicit links between users
Locating:
• Navigation through the social network and browsing users’ content
• Keyword-based search for textual or tagged content
• Through "Top-10" lists

Pros
• Helps a user find timely, relevant information by browsing adjacent regions of the network of users with similar interests
• Content is rated rapidly (by comments and feedback of a community)


Integration of Web Search and SN

• Web and SN information is disjoint
• No unified search tool that locates information across different systems


PeerSpective: SN-based Web Search

Technology:
• Lucene text search engine and FreePastry P2P overlay
• Lightweight HTTP proxy transparently indexes all visited URLs of the user


Searching Process

• A query is submitted by a user to Google
• The proxy transparently forwards the query to both Google and the proxies of the other users in the network
• Each proxy executes the query on its local index
• Results are then collated and presented alongside the Google results

PeerSpective ranking:
Lucene score + PageRank + scores from users who previously viewed the result
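
The ranking line above names three ingredients but not how they are weighted or normalized. A minimal sketch, assuming a plain unweighted sum; the function names, the dictionary fields, and the sample values are hypothetical and are not taken from PeerSpective itself:

```python
def peerspective_score(lucene_score, pagerank, viewer_scores):
    """Combine the three ingredients named on the slide.

    Assumption: a plain unweighted sum. The slide does not say how the
    terms are scaled or weighted.
    """
    return lucene_score + pagerank + sum(viewer_scores)


def rank_results(results):
    """Sort result dicts from highest to lowest combined score."""
    return sorted(
        results,
        key=lambda r: peerspective_score(
            r["lucene"], r["pagerank"], r["viewer_scores"]
        ),
        reverse=True,
    )


# Hypothetical results: "b" has a weaker text match but was viewed by two peers.
results = [
    {"url": "a", "lucene": 0.9, "pagerank": 0.3, "viewer_scores": []},
    {"url": "b", "lucene": 0.6, "pagerank": 0.2, "viewer_scores": [0.5, 0.4]},
]
ranked = rank_results(results)
```

In this made-up example the peer views lift "b" above "a", which is the kind of bias toward community interests that the later slides attribute to SN-based ranking.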


Search Results Example


Experiments

• 10 graduate students share downloaded or viewed Web content
• One-month-long experiment
• 200,000 distinct URLs
• 25% were of type text/html or application/pdf (so they can be indexed)

Reports on:
• Limits of hyperlink-based search
• Benefits of SN-based search


Limits of hyperlink-based search

Report on the fraction of visited URLs that are not indexed by Google
• Page too new (blogs)
• Deep Web
• Dark Web (no links)

Results
• About 1/3 of requests cannot be retrieved by Google
• PeerSpective’s indices cover 30% of the requested URLs
• 13.3% of URLs were contained in PeerSpective's index but not in Google's index


Random samples of URLs not in Google and Potential Reason


Benefits of SN-based Search

Experiments on clicks on first-page results
• 1730 queries (1079 resulted in clicks)

Results
• 86.5% of the clicked results were returned only by Google
• 5.7% of the clicked results were returned by both
• 7.7% of the clicked results were returned only by PeerSpective

Conclusions
• This 7.7% is considered to be the gold standard of web search engineering
• Inherent advantage of using social links in web search


Reasons for Clicks on Peerspective

Disambiguation

Communities tend to share definitions or interpretations of popular terms (e.g. "bus")

Ranking

SN information can bias the ranking algorithms toward the interests of users (e.g. CoolStreaming)

Serendipity

Ample opportunity for finding interesting things without searching


Example of URLs found in Peerspective


Opportunities and Challenges

Privacy
• Willingness of users to disclose information
• Need for mechanisms to control information flow and anonymity

Membership and clustering of SN
• Users may participate in many networks
• Need for searching with respect to the different clusters

Content rating and ranking
• New approaches to ranking search results
• System architecture: centralized or distributed?


An Experimental Study of the Coloring Problem on Human Subject Networks

Michael Kearns, Siddharth Suri, Nick Montfort, SCIENCE, (313), Aug 2006

Part II


Experimental Study on Human Subject Networks

Theoretical work suggests that structural properties of naturally occurring networks are important in shaping behavior and dynamics
• E.g. hubs in networks are important in routing information

Empirical structural properties established by many disciplines
• Small diameter (the “six degrees of separation”)
• Local clustering of connectivity
• Heavy-tailed distribution of connectivity (power-law distributions)

Empirical studies of networks
• Limitation: networks are fixed and given (no alternatives)
• Other approach: controlled laboratory study


Experiment

Experimental scenario
• Distributed problem-solving from local information

Experimental setting
• 38 human subjects (network vertices)
• Each subject controls the color of a vertex in a network
• Networks: simple and more complex
• Goal: select a color different from that of all neighbors
• Problem: coloring problem
• Information available: variable (Low, Medium, High)


Graph Coloring Problem

Graph coloring
An assignment of "colors" to certain objects in a graph such that no two adjacent objects are assigned the same color

Graph coloring problem
Find the minimum number of colors for an arbitrary graph (NP-hard)

Chromatic number
The least number of colors needed to color the graph

Example: vertex coloring
A 3-coloring suits this graph, but fewer colors would result in adjacent vertices of the same color
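
The definitions above can be checked on small graphs by exhaustive search. A minimal sketch: the function names and the two example graphs are illustrative assumptions (the graph shown on the slide is not recoverable from the transcript), and the brute-force search is only practical for a handful of vertices, in line with the problem being NP-hard.

```python
from itertools import product


def is_proper(edges, colors):
    """True if no edge joins two vertices that have the same color."""
    return all(colors[u] != colors[v] for u, v in edges)


def chromatic_number(num_vertices, edges):
    """Smallest k for which some assignment of k colors is proper.

    Tries every assignment for k = 1, 2, ... so the running time grows
    exponentially with the number of vertices.
    """
    for k in range(1, num_vertices + 1):
        for assignment in product(range(k), repeat=num_vertices):
            if is_proper(edges, assignment):
                return k
    return num_vertices  # only reached for the empty graph


# Illustrative graphs: an odd cycle needs 3 colors, an even cycle needs 2.
five_cycle = [(0, 1), (1, 2), (2, 3), (3, 4), (4, 0)]
four_cycle = [(0, 1), (1, 2), (2, 3), (3, 0)]
k_odd = chromatic_number(5, five_cycle)
k_even = chromatic_number(4, four_cycle)
```

The even cycle is the relevant case for the experiments: the cycle-based networks in the study are listed as requiring 2 colors.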


Network Topologies

[Figure: the six network topologies. Top row: Leader Cycle, Pref. Att. v=2, Pref. Att. v=3. Bottom row: Simple Cycle, 5-Chord Cycle, 20-Chord Cycle]


Information View

[Figure: the three information views, each marking the subject's own vertex ("YOU") and showing an "Overall Progress" indicator]

• Low (color of each neighbor)
• Medium (# of links of each neighbor)
• All (all network)


Graph Properties and Experimental Results

                 Graph Properties                   Experimental Results
Graph            Colors    Min    Max    Avg.       Avg. Exp.       # Exp.  No. of
                 Required  Links  Links  Distance   Duration (sec)  Solved  Changes
Simple Cycle     2         2      2      9.76       144.17          5/6     378
5-Chord Cycle    2         2      4      5.63       121.14          7/7     687
20-Chord Cycle   2         2      7      3.34       65.67           6/6     8265
Leader Cycle     2         3      19     2.31       40.86           7/7     8797
Pref. Att. V=2   3         2      13     2.63       219.67          2/6     1744
Pref. Att. V=3   4         3      22     2.08       154.83          4/6     4703


1: Collective Performance

Subjects could indeed solve the coloring problem across a wide range of networks
• 31/38 experiments ended in a solution in less than 300 seconds
• 82 sec mean completion time

Collective performance is affected by network structure
• Preferential attachment networks are harder than cycle-based networks

Cycle-based networks:
• Monotonic relationship between solution time and average network distance (smaller distance leading to shorter solution times)
• Addition of random chords systematically reduces solution time
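
To see why chords shorten average network distance, the cycle-based topologies can be rebuilt and measured with breadth-first search. A minimal sketch under stated assumptions: chords are placed uniformly at random between non-adjacent vertices (the transcript does not say how the original chords were chosen), and the function names, seed, and default values are illustrative.

```python
import random
from collections import deque


def cycle_with_chords(n, num_chords, seed=0):
    """Ring of n vertices plus num_chords extra random edges.

    Assumption: chords join uniformly random non-adjacent pairs.
    Needs n >= 4 whenever num_chords > 0, so that such pairs exist.
    """
    rng = random.Random(seed)
    adj = {i: {(i - 1) % n, (i + 1) % n} for i in range(n)}
    added = 0
    while added < num_chords:
        u, v = rng.sample(range(n), 2)
        if v not in adj[u]:
            adj[u].add(v)
            adj[v].add(u)
            added += 1
    return adj


def average_distance(adj):
    """Mean shortest-path length over all ordered pairs of distinct vertices."""
    total = pairs = 0
    for source in adj:
        dist = {source: 0}
        queue = deque([source])
        while queue:
            u = queue.popleft()
            for w in adj[u]:
                if w not in dist:
                    dist[w] = dist[u] + 1
                    queue.append(w)
        total += sum(dist.values())
        pairs += len(dist) - 1
    return total / pairs


plain = average_distance(cycle_with_chords(38, 0))
chorded = average_distance(cycle_with_chords(38, 20))
```

For the plain 38-vertex cycle the result rounds to 9.76, which agrees with the Avg. Distance reported for the Simple Cycle. The chorded value depends on where the random chords land, so it will not necessarily reproduce the reported 20-Chord Cycle figure, but it is always smaller than the plain-cycle value.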


2: Human Performance VS Artificial Distributed Heuristics

Heuristic considered: a vertex is randomly selected
• If there are unused colors in the neighborhood of this vertex, then a color is selected randomly from the available ones
• If there are no unused colors, then a color is selected randomly

Comparison measure
• Number of vertex color changes

Findings:
• Results exactly reversed: lower average distance increases the difficulty for the heuristic
• Preferential attachment networks are easier for the heuristic
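
The heuristic above can be simulated in a few lines. A minimal sketch, not the authors' implementation: it assumes that only vertices currently in conflict are candidates for selection (the slide says only that a vertex is "randomly selected"), that the starting coloring is random, and that a step cap stops runs that fail to converge. All names and defaults are hypothetical.

```python
import random


def conflicted(adj, colors):
    """Vertices that share a color with at least one neighbor."""
    return [v for v in adj if any(colors[v] == colors[u] for u in adj[v])]


def run_heuristic(adj, num_colors, max_steps=100_000, seed=0):
    """Return (coloring, number_of_color_changes); coloring is None on failure."""
    rng = random.Random(seed)
    palette = list(range(num_colors))
    colors = {v: rng.choice(palette) for v in adj}  # assumption: random start
    changes = 0
    for _ in range(max_steps):
        bad = conflicted(adj, colors)
        if not bad:
            return colors, changes
        v = rng.choice(bad)  # assumption: pick among conflicted vertices only
        used = {colors[u] for u in adj[v]}
        free = [c for c in palette if c not in used]
        # Prefer a color unused by the neighbors, else fall back to any color.
        new_color = rng.choice(free) if free else rng.choice(palette)
        if new_color != colors[v]:
            colors[v] = new_color
            changes += 1
    return None, changes


def cycle(n):
    """Adjacency lists of a simple cycle on n vertices."""
    return {i: [(i - 1) % n, (i + 1) % n] for i in range(n)}


colors, changes = run_heuristic(cycle(38), num_colors=2)
```

The returned `changes` count corresponds to the comparison measure named above. Averaging it over many seeds and topologies would be needed before comparing against the reported numbers.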


3: Effects on Varying the Locality of Information View

Variable locality of information provided to subjects
• Low: their own and neighboring colors are visible
• Medium: their own and neighboring colors are visible, plus information on the connectivity of neighbors
• High: global coloring state at all times

Findings: increased amount of information
• Reduces solution times for cycle-based networks
• Increases solution times for preferential attachment networks
• Leads to rapid convergence to one of the two solutions in cycle-based networks


Information View Effect 1: Pref. Att. VS Cycle-based Networks

[Chart: Avg. Experiment Duration. Y-axis: time in seconds (0 to 350). X-axis: information view (Low, Medium, High). Series: Cycles, Pref. Att.]


Information View Effect 2: Cycle-based Solution Convergence

Low information view: population oscillates between approaches to the two solutions

High information view: rapid convergence to one of the two possible solutions


Individual Strategies

• Choosing colors that result in the fewest local conflicts
• Attempting to avoid conflicts with highly connected subjects
• Signaling behavior of subjects
• Introducing conflicts to avoid local minima
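
The first strategy, choosing the color with the fewest local conflicts, can be written as a one-line rule. A minimal sketch: the function name and the tie-breaking choice (the first color in the palette wins) are assumptions, not behavior reported in the study.

```python
def least_conflict_color(neighbor_colors, palette):
    """Pick the palette color used by the fewest neighbors.

    Assumption: ties go to whichever color comes first in the palette.
    """
    return min(palette, key=lambda color: neighbor_colors.count(color))


# A subject with two red neighbors and one blue neighbor.
choice_two = least_conflict_color(["red", "red", "blue"], ["red", "blue"])
choice_three = least_conflict_color(["red", "red", "blue"], ["red", "blue", "green"])
```

With only two colors the subject settles for blue, accepting one conflict; with a third color available the subject can avoid conflicts entirely by choosing green.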


Questions?


Thanks!