SNFS: The design and implementation of a Social Network File System Ch. Kaidos, A. Pasiopoulos N. Ntarmos, P. Triantafillou University of Patras

Mar 27, 2015

SNFS: The design and implementation of a Social Network File System

Ch. Kaidos, A. Pasiopoulos, N. Ntarmos, P. Triantafillou

University of Patras

Shameless plug...

If interested, please check out

eXO: Decentralized Autonomous Scalable Social Networking, 5th Conference on Innovative Data Systems Research (CIDR 2011), 2011.

Social Networks

Our Take:

1. Search for
• People (friends, experts, …)
• Content (books, photos, videos, blogs, websites, …)

2. Form entities (collections)
• Friends-lists, content-libs

3. Search for
• Entities
• Using previously-formed collections…

4. SNFS currently provides the foundation for these…

Tagging

Tag 1, Tag 2, Tag 3, Tag 4, Tag 5

Profiles: sets of tags describing entities.

“Search for”:
• Based on profiles
• Ranked retrieval (top-k)

Current State

5,000,000,000 photos, 3,000 photos/min (as of September 2010)

2,000,000,000 videos served up each day (May 2010)

600,000,000 monthly active users (January 2011)

15,000,000 books (October 2010), 130,000,000 by the end of the decade

Current State

Need to access published content

22,750,000,000 queries in search engines

4,000,000,000 queries in YouTube

351,000,000 queries in Facebook

416,000,000 queries in MySpace

(U.S. market figures, December 2009)

Current State

How do I find stuff I want?

How do I provide interesting objects to my users?

Proposal

A content-aware file system for Social Network Systems

Useful to users... and to service providers too!

Previous Work on File Indexing

1991 – Semantic File Systems by Gifford

1996 – BeFS by Giampaolo and Meurillon, part of the BeOS

BeOS never had commercial success...

1998 – Indexing Service on Windows NT, not needed at the time
Remnant of the Object File System from the Cairo project, which never materialized

Typically
• No ranked retrieval
• No users’ input (tags)
• No user relationships

Desktop Searches

2004 – Windows Desktop Search, widely popular

2005... – Mac OS X's Spotlight, Google Desktop, Beagle, Strigi, Tracker...

Typically
• No ranked retrieval?
• No user relationships
• No use of relationships for searching

Problems

Power tools for power users... But for average users...

Boolean operators??? SQL-like queries???

Previous Work on Ranked Retrieval

1968 – SMART system by Salton, introduced weights in retrieval, instead of classical Boolean retrieval

1975 – Vectors and cosine similarity by Salton

1988 – Other functions for similarity tested and evaluated by Salton and Buckley

2003 – Fagin proposes and compares several efficient algorithms for top-k retrieval

Design

Design – SNFS

Tags are extracted from each object and stemmed, and their frequencies are counted

Weights for each tag and document are calculated

Each object is associated with a unique ID in a tree

A tf-idf weighting scheme was chosen
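
The weighting step can be sketched as follows. The slides do not say which tf-idf variant SNFS uses, so this sketch assumes one common form: tf is the tag count divided by the number of tags in the object, and idf is log(N / df). The function name and the dict-based input are hypothetical, chosen only for illustration.

```python
import math
from collections import Counter

def tf_idf_weights(objects):
    """Compute a tf-idf weight for every (object, tag) pair.

    objects: dict mapping object id -> list of already-stemmed tags.
    Assumed variant: tf = count / tags in object, idf = log(N / df).
    Returns a dict mapping object id -> {tag: weight}.
    """
    n = len(objects)
    # Document frequency: in how many objects does each tag occur?
    df = Counter()
    for tags in objects.values():
        df.update(set(tags))
    weights = {}
    for oid, tags in objects.items():
        counts = Counter(tags)
        total = len(tags)
        weights[oid] = {
            tag: (c / total) * math.log(n / df[tag])
            for tag, c in counts.items()
        }
    return weights
```

With this variant, a tag that occurs in every object gets weight 0, so it cannot influence the ranking.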

Design – SNFS

Term Weight and Object ID are stored in an inverted index

Each posting list of the index is a B+Tree stored in secondary memory

The position of the root of the B+Tree in the index is stored in a Red Black Tree
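
A minimal in-memory sketch of this layout, under loud assumptions: a Python dict stands in for the red-black tree that maps a term to its posting list, and a sorted Python list stands in for the on-disk B+Tree. The class and method names are hypothetical. Entries are kept in decreasing weight order, which is the order the threshold algorithms expect as input.

```python
import bisect

class InvertedIndex:
    """Toy inverted index: term -> posting list of (object id, weight)."""

    def __init__(self):
        # Stand-in for the red-black tree that locates each term's B+Tree.
        self._postings = {}

    def add(self, term, object_id, weight):
        # Stand-in for a B+Tree insert. Storing (-weight, object_id)
        # keeps the list sorted by decreasing weight.
        postings = self._postings.setdefault(term, [])
        bisect.insort(postings, (-weight, object_id))

    def postings(self, term):
        # Return (object id, weight) pairs, heaviest first.
        return [(oid, -neg_w) for neg_w, oid in self._postings.get(term, [])]
```

Usage: after `idx.add("cat", 1, 0.2)` and `idx.add("cat", 2, 0.9)`, `idx.postings("cat")` lists object 2 before object 1.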

Design – Search and retrieval

The query is split into terms and stemmed

The score of each document is calculated using a threshold algorithm and a tf-idf function
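
The query preparation step might look like the sketch below. The slides do not name the stemmer SNFS uses, so `toy_stem` is a deliberately crude, hypothetical stand-in that only strips a few English suffixes; a real system would plug in a proper stemming algorithm through the `stem` parameter.

```python
def toy_stem(term):
    """Crude suffix stripping, for illustration only (not a real stemmer)."""
    for suffix in ("ing", "es", "s"):
        # Only strip when at least three characters remain.
        if term.endswith(suffix) and len(term) - len(suffix) >= 3:
            return term[: -len(suffix)]
    return term

def parse_query(query, stem=toy_stem):
    """Split a query on whitespace, lowercase it, and stem each term."""
    return [stem(t) for t in query.lower().split()]
```

The resulting terms select the posting lists that are then handed to the threshold algorithm.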

Threshold Algorithms

Input: Posting lists sorted on weight (decreasing)

NRA (No Random Access) Algorithm

[Figure: posting lists for terms t1, t2, t3 scanned to depths 1, 2, 3; a Doc ID / Score table for d1 to d5 accumulating scores s1 to s9; threshold per depth: s1+s2+s3, s4+s5+s6, s7+s8+s9.]

The algorithm halts when no object below the top-k can still improve its score enough to exceed the threshold
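
A compact sketch of NRA in the standard textbook formulation, not the SNFS code itself. Assumptions: scores are sums of non-negative weights, an object missing from a list contributes 0, and each object appears at most once per list. The halting test compares the best possible score of every object outside the top-k, and of any still-unseen object, against the k-th best guaranteed score. The function name and the list-of-pairs input format are hypothetical.

```python
import heapq

def nra_top_k(posting_lists, k):
    """Return the ids of the top-k objects using sorted accesses only.

    posting_lists: one list per query term of (doc_id, weight) pairs,
    sorted by decreasing weight.
    """
    if k <= 0:
        return []
    seen = {}  # doc_id -> {list index: weight read so far}
    max_depth = max((len(pl) for pl in posting_lists), default=0)
    for depth in range(max_depth):
        # One sorted access per list; high[i] bounds any unread weight in list i.
        high = []
        for i, pl in enumerate(posting_lists):
            if depth < len(pl):
                doc, w = pl[depth]
                seen.setdefault(doc, {})[i] = w
                high.append(w)
            else:
                high.append(0.0)
        # Worst-case (guaranteed) score of every object seen so far.
        worst = {d: sum(ws.values()) for d, ws in seen.items()}
        top = heapq.nlargest(k, worst, key=worst.get)
        if len(top) < k:
            continue
        kth = worst[top[-1]]
        in_top = set(top)

        def best(d):
            # Best-case score: fill the unread lists with their current bounds.
            return worst[d] + sum(h for i, h in enumerate(high) if i not in seen[d])

        # Halt when neither an unseen object nor any seen object outside
        # the top-k can still beat the k-th guaranteed score.
        if sum(high) <= kth and all(best(d) <= kth for d in seen if d not in in_top):
            return top
    # Lists exhausted: all scores are now exact.
    worst = {d: sum(ws.values()) for d, ws in seen.items()}
    return heapq.nlargest(k, worst, key=worst.get)
```

The sketch recomputes all bounds at every depth for clarity, which illustrates the state keeping and computation cost that NRA trades for avoiding random accesses.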

Threshold Algorithms

Input: Posting lists sorted on weight (decreasing)

TA (Threshold Algorithm with random accesses)

[Figure: posting lists for terms t1, t2, t3 scanned to depths 1, 2, 3; a Doc ID / Score table for d1 to d5 accumulating scores s1 to s10; threshold per depth: s1+s2+s3, s4+s5+s6, s7+s8+s9.]

When the threshold falls to or below the score of the last (k-th) object in the top-k, the algorithm halts
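
A sketch of TA in its standard formulation, again not the SNFS code. Assumptions: scores are sums of non-negative weights and an object missing from a list contributes 0. The per-list dicts built at the start are a hypothetical stand-in for random accesses into the on-disk index; the function name and input format are likewise illustrative.

```python
import heapq

def ta_top_k(posting_lists, k):
    """Return the top-k (doc_id, score) pairs using sorted and random accesses.

    posting_lists: one list per query term of (doc_id, weight) pairs,
    sorted by decreasing weight.
    """
    if k <= 0:
        return []
    # Stand-in for random access: look up a doc's weight in any list.
    lookup = [dict(pl) for pl in posting_lists]
    scores = {}  # doc_id -> exact total score
    max_depth = max((len(pl) for pl in posting_lists), default=0)
    for depth in range(max_depth):
        threshold = 0.0
        for pl in posting_lists:
            if depth < len(pl):
                doc, w = pl[depth]
                # The threshold is the sum of the weights read at this depth.
                threshold += w
                if doc not in scores:
                    # Random accesses fetch the doc's weight in every list.
                    scores[doc] = sum(lk.get(doc, 0.0) for lk in lookup)
        top = heapq.nlargest(k, scores.items(), key=lambda item: item[1])
        # Halt once k objects score at least as high as the threshold.
        if len(top) == k and top[-1][1] >= threshold:
            return top
    return heapq.nlargest(k, scores.items(), key=lambda item: item[1])
```

Unlike NRA, every score held here is exact, at the price of one random access per list for each newly seen object.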

Qualitative Comparison

NRA vs. TA, compared on:
• Disk accesses
• State keeping and computation
• System calls

We expect TA to perform many more slow disk accesses

Can NRA's large state-keeping and computation needs outweigh TA's disk accesses?

We implement both, on hard disk and on RAM-disk, to find out...

Implementation with FUSE

Testing

- 4 real-world test sets

- Files containing tags from online objects

- Index is normally on secondary memory

- RAM-disk used to evaluate the effect of disk accesses

Results demanded vs. time, disk-based index

[Plot: NRA and TA]

Results demanded vs. time, RAM-based index

[Plot: NRA and TA]

Query terms vs. time, disk-based index

[Plot: NRA and TA]

Query terms vs. time, RAM-based index

[Plot: NRA and TA]

Beagle vs NRA

Terms vs time

Results vs time

Conclusions

SNFS:

- Indexing, storage, and ranked retrieval of entities in a SN.

- Study of the efficiency of algorithms, using real-world data and various implementations.

- Competitive performance (e.g., against Beagle).

- Many ways of further expansion.

Future Work

- Expansion for distributed systems and clouds

- Distributed file systems (HDFS)

- Distributed data structures

- Tagging, Indexing, and searching for entity-collections – straightforward, as our ‘object’ implementation/abstraction captures this.

- Establishing entities that consist of relationships between entities, using advanced tagging, and searching for these…