Open Problems in Data- Sharing Peer-to-Peer Systems Neil Daswani, Hector Garcia-Molina, Beverly Yang

Dec 19, 2015


Open Problems in Data-Sharing Peer-to-Peer Systems

Neil Daswani, Hector Garcia-Molina, Beverly Yang


Peer-To-Peer Systems

- Autonomous, large-scale, decentralized systems
- A large pool of resources
  - Files, compute cycles
- Open performance and security challenges


Research problems

- Search
  - Efficiency
  - Expressiveness
  - Quality of Service
- Security
  - Availability
  - Authenticity
  - Anonymity
  - Access Control


Search Mechanism

- Submit queries and receive results
  - Keywords, SQL statements
- Defines the behavior of peers
  - Topology: how peers are connected to each other
  - Data placement: how data is distributed across the peers
  - Message routing: how messages are propagated


System Requirements

- Expressiveness
  - Query language should provide detailed description
  - Key lookups not expressive enough
- Comprehensiveness
  - Single result not sufficient for some systems
  - All results required in some cases
- Autonomy
  - Nodes should control their organization


Goals of Search Mechanism

- Maximize efficiency
  - Light overhead, higher throughput
- Maximize Quality of Service
  - Number of results
  - Response time
- Robustness
  - Stability in presence of failures


Expressiveness (1/2)

- Key lookup
- Keyword queries
  - Partial search
  - Efficient for certain types of files, e.g. music
- Ranked keyword
  - Rank the results of keyword queries
  - Global statistics required
  - Collection and maintenance challenging
  - “top k” results
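To illustrate why ranked keyword search needs global statistics, here is a minimal sketch. It assumes, unrealistically, that one place can see every peer's documents, and it uses a TF-IDF-style score as a stand-in ranking function. The names `rank_top_k` and `peer_docs` are hypothetical and do not come from the slides.

```python
import heapq
import math
from collections import Counter


def rank_top_k(query_terms, peer_docs, k):
    """Return the k best (score, doc_id) pairs across all peers.

    peer_docs maps a peer name to {doc_id: text}. Gathering every
    document in one place is an assumption made for illustration only.
    """
    all_docs = {doc_id: text.lower().split()
                for docs in peer_docs.values()
                for doc_id, text in docs.items()}
    n_docs = len(all_docs)
    # Global statistic: the number of documents containing each term.
    doc_freq = Counter(t for words in all_docs.values() for t in set(words))

    def score(words):
        tf = Counter(words)
        return sum(tf[t] * math.log(1 + n_docs / doc_freq[t])
                   for t in query_terms if doc_freq[t])

    scored = ((score(words), doc_id) for doc_id, words in all_docs.items())
    return heapq.nlargest(k, (pair for pair in scored if pair[0] > 0))
```

The `doc_freq` table is the hard part in a real P2P system: no single peer holds it, so it has to be collected and kept up to date across the network.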


Expressiveness (2/2)

- Aggregates
  - SUM, COUNT, MAX and MEDIAN
  - E.g. COUNT nodes belonging to forth.gr domain
- SQL
  - The most difficult query language
  - Performance “hotspots” (PIER system)
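The COUNT example can be sketched as partial counts that are computed locally and then summed. This is only an illustration under assumed names (`count_in_domain`, `aggregate_count`), not how any particular system implements aggregates.

```python
def count_in_domain(hostnames, domain):
    """Count hostnames that equal the domain or are subdomains of it."""
    suffix = "." + domain
    return sum(1 for h in hostnames if h == domain or h.endswith(suffix))


def aggregate_count(subtrees, domain):
    """Combine per-subtree partial counts into one total.

    subtrees is a list of hostname lists, one per hypothetical subtree
    of the overlay; each subtree reports only its partial count.
    """
    partials = [count_in_domain(hosts, domain) for hosts in subtrees]
    return sum(partials)
```

SUM, COUNT and MAX can be combined from partial results like this. MEDIAN cannot be derived from per-subtree medians alone, which is one reason it is harder to support.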

spiros antonatos

Autonomy / Efficiency / Robustness

- Correlation between autonomy and efficiency
  - Locate data with bounded cost (Chord)
  - Small sets of nodes guaranteed to hold the answer
  - Increased chance of finding results on random node
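The idea that a small, well-defined set of nodes is guaranteed to hold the answer can be sketched with a Chord-style identifier ring. The sketch below is a simplification: it looks up the successor in a sorted list rather than routing through finger tables, and `RING_BITS`, `ring_id` and `successor` are illustrative names.

```python
import bisect
import hashlib

RING_BITS = 16  # assumed small identifier space, for readability


def ring_id(name):
    """Hash a node name or key onto the identifier ring."""
    digest = hashlib.sha1(name.encode("utf-8")).digest()
    return int.from_bytes(digest, "big") % (1 << RING_BITS)


def successor(node_ids, key_id):
    """Return the first node id >= key_id, wrapping around the ring.

    node_ids must be a sorted, non-empty list of node identifiers.
    """
    i = bisect.bisect_left(node_ids, key_id)
    return node_ids[i % len(node_ids)]
```

Because the key alone decides which node is responsible, peers give up control over what they store; that loss of autonomy is the price of the bounded lookup cost.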


Tuning the autonomy / efficiency tradeoff

- Varying needs
  - E.g. sensitive files should remain on the intranet
- Different systems for different purposes not always desirable
- SkipNet
  - Specify a range of peers on which a document can be stored
  - Single peer range: high autonomy
  - All peers range: traditional P2P system


Autonomy and Robustness

- Viceroy network construction
  - Low level of autonomy
  - Reduced cost of maintaining structure => increased robustness and efficiency
- Distributed hash tables
  - Logarithmic maintenance cost
- Super-peer redundancy
  - Stricter topology => decreased autonomy => greater robustness


Quality of Service

- Number of results
  - Tradeoff between number of results and cost
- BFS technique
  - Send messages to “productive” nodes
  - Depends on ad-hoc topology
- Concept-clustering
  - Communicate according to “interest”
- “Satisfaction”
  - True when a threshold of results found
  - Important to partial-search systems
  - Cost can be drastically reduced
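The effect of a satisfaction threshold on cost can be sketched with a breadth-first search that stops as soon as enough results are found. This is a centralized simulation under assumed names (`search`, `graph`, `files`); in a real network, messages already in flight cannot be recalled, so the savings would be smaller.

```python
from collections import deque


def search(graph, files, start, keyword, satisfy_at, max_hops):
    """Breadth-first search that stops once satisfy_at results are found.

    graph maps a node to its neighbors; files maps a node to file names.
    Returns (results, number of query messages sent).
    """
    results, messages = [], 0
    seen = {start}
    queue = deque([(start, 0)])
    while queue:
        node, hops = queue.popleft()
        results.extend(f for f in files.get(node, []) if keyword in f)
        if len(results) >= satisfy_at:
            break  # the query is satisfied: stop propagating
        if hops == max_hops:
            continue  # hop limit reached on this branch
        for neighbor in graph.get(node, []):
            if neighbor not in seen:
                seen.add(neighbor)
                messages += 1
                queue.append((neighbor, hops + 1))
    return results, messages
```

Running the same query with a low and a high `satisfy_at` value on the same graph shows the tradeoff between number of results and message cost.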


Security

- Availability
  - Bandwidth, CPU and file availability
- File Authenticity
  - Which responses are authentic?
- Anonymity
  - How can we hide our identity?
- Access Control
  - Restrict accessibility


Availability

- Nodes should be always up
- DoS attacks
  - Flooding a node with messages
  - Malicious super-nodes in Gnutella
    - Claims that the victim has all files requested
- Attack CPU availability
  - Sending complex queries
- Attack file storage
  - Submit bogus documents
- Attack quality-of-service
  - Serve a file slowly
  - Send a different file


Countermeasures

- Careful design of P2P protocols
  - Gnutella is loosely constrained
  - Back-door communication channels are prohibited
- Techniques for detecting failures
  - High message overhead, complexity
  - Assume pairwise connectivity
- Allocate storage proportionally to what a node contributes
- Hash trees to ensure a node is sending the correct data and at a reasonable rate
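The hash-tree countermeasure can be sketched as a Merkle tree over file blocks: a receiver that knows the root hash can check each block as it arrives. The sketch covers only the correctness check, not the rate check. It assumes SHA-256, a non-empty block list, and duplication of the last hash on odd levels; these choices and all function names are illustrative.

```python
import hashlib


def _h(data):
    return hashlib.sha256(data).digest()


def _next_level(level):
    """Pair up hashes, duplicating the last one if the level is odd."""
    if len(level) % 2:
        level = level + [level[-1]]
    return [_h(level[i] + level[i + 1]) for i in range(0, len(level), 2)]


def merkle_root(blocks):
    level = [_h(b) for b in blocks]
    while len(level) > 1:
        level = _next_level(level)
    return level[0]


def merkle_proof(blocks, index):
    """Return [(sibling_hash, sibling_is_left), ...] from leaf to root."""
    level, proof = [_h(b) for b in blocks], []
    while len(level) > 1:
        if len(level) % 2:
            level = level + [level[-1]]
        sibling = index ^ 1
        proof.append((level[sibling], sibling < index))
        level = _next_level(level)
        index //= 2
    return proof


def verify_block(block, proof, root):
    digest = _h(block)
    for sibling, sibling_is_left in proof:
        digest = _h(sibling + digest) if sibling_is_left else _h(digest + sibling)
    return digest == root
```

A sender that substitutes a different block cannot produce a proof that matches the known root, so the receiver can detect the substitution after a single block rather than after the whole file.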



File Authenticity

- Different than file integrity
  - CRC, hashing, MACs, digital signatures
- Given a query, the authentic response has to be distinguished
- What does “authentic” mean?


Definition of “authentic”

- Oldest document
  - The oldest submission is considered authentic
  - Timestamping systems
- Expert-based
  - Authoritative nodes keep track of signatures
  - Susceptible to failures
  - Offline digital signature schemes
- Voting-based
  - Votes of many experts
  - Experts may be humans
  - Spoofing of votes, nodes and files
- Reputation-based
  - Weight votes, some experts more trustworthy
  - Maintenance, update and propagation of weights
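The difference between voting-based and reputation-based authenticity can be sketched as a weighted tally. The function `pick_authentic`, the vote format and the reputation values are hypothetical; how weights are maintained and propagated, the hard part named above, is not modeled.

```python
from collections import defaultdict


def pick_authentic(votes, reputation, default_weight=0.0):
    """Return the file hash with the highest reputation-weighted vote total.

    votes is a list of (voter, file_hash) pairs; reputation maps a voter
    to a weight. Unknown voters get default_weight.
    """
    totals = defaultdict(float)
    for voter, file_hash in votes:
        totals[file_hash] += reputation.get(voter, default_weight)
    if not totals:
        return None
    return max(totals, key=totals.get)
```

Passing an empty reputation table with `default_weight=1.0` reduces this to plain majority voting, which makes it easy to see how a group of low-reputation spoofed voters can win one scheme and lose the other.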



Anonymity (1/2)

- Illegal trade of files vs. censorship resistance, freedom of speech, privacy protection
- Types of anonymity
  - Author: which users created which documents
  - Server: which nodes store a given document
  - Reader: which users access which documents
  - Document: which documents are stored at a given node
- Anonymity vs. efficiency
  - Free Haven provides server anonymity, Freenet provides author anonymity


Anonymity (2/2)

- Achieve server anonymity through intermediate nodes
  - Forwarding proxies
  - Servers identified by nicknames
- Degradation of anonymity protocols under attacks
  - Problem of collusion
- Free Haven and Crowds use forwarding proxies
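Forwarding through intermediate nodes can be sketched in the style of Crowds: the initiator hands the request to a random member, and each member then either forwards it to another random member or delivers it. The function `build_path` and its parameters are illustrative, and the sketch ignores encryption and the return path.

```python
import random


def build_path(members, initiator, forward_prob, rng):
    """Return the list of members a request passes through.

    The initiator always forwards to one randomly chosen member; after
    that, each hop forwards again with probability forward_prob and
    otherwise submits the request to the destination.
    """
    path = [initiator, rng.choice(members)]
    while rng.random() < forward_prob:
        path.append(rng.choice(members))
    return path


# Usage: a seeded generator keeps the example reproducible.
example_path = build_path(["a", "b", "c", "d"], "a", 0.75, random.Random(1))
```

A member that receives a request cannot tell whether its predecessor originated it or merely forwarded it. Colluding members can pool their observations, which is the degradation under attack noted above; longer paths also cost more, which is the anonymity vs. efficiency tradeoff.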



Access Control

- Restrict accessibility to documents
- P2P systems cannot enforce copyright laws
  - Violation of copyright laws by users
  - Lawsuits against companies that build P2P systems
- Limited utilization vs. free distribution