Network Computing Laboratory
Scalable File Sharing System Using Distributed Hash Table
Idea Proposal
April 14, 2005
Presentation by Jaesun Han
Network Computing Laboratory | 2
Korea Advanced Institute of Science and Technology
Contents
One-line comment
Motivation & problems
My idea
  Key idea
  Distributed hash table
  P2P file sharing system using DHT
Technical challenges
Conclusion
One-line comment
Achieve a fully decentralized P2P file sharing system by distributing the file index structure as a distributed hash table (DHT)
Scalability in file sharing is a key practical issue!
Even worse, a burst of requests for a hot (infamous) file can look like a network attack such as DDoS.
[Figure: Internet explosion. Korean users and US users reach a hot file in the file sharing infrastructure through the Internet.]
Solution Approach
Scalable solution for file sharing
  Investigate existing file sharing solutions
    Currently, P2P-based file sharing seems the most appropriate
  Investigate methods to provide scalability to P2P-based approaches
Fully decentralized architecture for P2P-based file sharing
Key Idea
Decentralized indexing
  Existing schemes are either centralized or self-indexing
    E.g., …
  Self-indexing is not really an indexing scheme: such systems have no index at all
  The missing index is compensated for by a flooding-based search mechanism
    High search overhead
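[Figure: Search(k.mp3) among nodes 1-4, once with a central index table and once with a distributed index table]
Central index table
  node     file
  Node3    n.mp3
  Node2    s.mp3
  Node4    b.mp3
  Node1    k.mp3
Distributed index table, split across nodes 1-4 into the ranges a-e, f-m, n-r, s-z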
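As a rough illustration of the distributed index table in the figure, the sketch below splits the index by the first letter of the file name. The assignment of ranges to nodes 1-4 and the function name `index_node` are assumptions made for illustration; the slide shows only the four ranges.

```python
# Hypothetical assignment of file-name ranges to nodes 1-4. The slide shows
# only the four ranges, not which node holds which one.
RANGES = [("a", "e", 1), ("f", "m", 2), ("n", "r", 3), ("s", "z", 4)]


def index_node(filename):
    """Return the node that holds the index entry for `filename`."""
    first = filename[0].lower()
    for low, high, node in RANGES:
        if low <= first <= high:
            return node
    raise ValueError(f"no index range covers {filename!r}")


print(index_node("k.mp3"))
```

Under this assumed assignment, a search for k.mp3 goes straight to the node holding the f-m part of the index instead of being flooded to every node.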
Key Idea
Distributed indexing
  Split the index table and distribute one part to each node
Hash table for distributed indexing
  Enables fast lookup
  Input to the hash table: file name
  Output from the hash table: node address
Distributed hash table for distributed indexing
  Split the hash table and distribute it across the nodes
  Lookup through shortcut paths
P2P file sharing with DHT
DHT based File sharing: Technical Challenges
[Figure: Search(k.mp3) over nodes 1-4, which hold a distributed index table split into a-e, f-m, n-r, s-z]
Challenge: Routing?!
Challenge: Nodes often join and leave!
Related Works
Peer-to-peer file sharing systems
  Sharing files among personal computers
    E.g., Soribada, eDonkey, KaZaa, Gnutella
  33.4% of Internet traffic in a KT investigation (2004.2)
  Millions of simultaneous users
Key technical issue: file indexing in existing P2P file sharing systems
  Evolution of indexing schemes to improve scalability
    1st generation: centralized indexing
    2nd generation: fully decentralized self-indexing
    3rd generation: semi-centralized indexing
Related Works
First-generation file sharing system
  Centralized indexing (e.g., Soribada, Napster)
  Problems: not scalable, single point of failure
[Figure: nodes N1-N5 around a centralized directory server (napster.com)]
Directory table
  file     node
  …        …
  a.mp3    N5
  …        …
Messages: Search(a.mp3), N5 IP addr., Request(a.mp3), File(a.mp3)
Related Works
Second-generation file sharing system
  Fully decentralized self-indexing (e.g., Gnutella)
  Problems: flooding overhead, partial searching
[Figure: Search(a.mp3) flooded through nodes N1-N9]
Search result: N3, N5, N8
Selected node: N5
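The two problems named on this slide can be seen in a small sketch of flooding with a hop limit (TTL). The overlay topology in `NEIGHBORS`, the starting node N1 and the function name `flood_search` are hypothetical. Only the node names N1-N9 and the three holders of a.mp3 (N3, N5, N8) come from the slide.

```python
from collections import deque

# Hypothetical overlay links between the nine nodes (not given on the slide).
NEIGHBORS = {
    "N1": ["N2", "N3"], "N2": ["N1", "N4"], "N3": ["N1", "N5"],
    "N4": ["N2", "N6"], "N5": ["N3", "N7"], "N6": ["N4", "N8"],
    "N7": ["N5", "N9"], "N8": ["N6"], "N9": ["N7"],
}
HAS_FILE = {"N3", "N5", "N8"}  # nodes sharing a.mp3, per the search result


def flood_search(start, ttl):
    """Flood a query from `start`; return (nodes that answered, messages sent)."""
    seen = {start}
    queue = deque([(start, ttl)])
    hits, messages = set(), 0
    while queue:
        node, remaining = queue.popleft()
        if node in HAS_FILE:
            hits.add(node)
        if remaining == 0:
            continue  # TTL exhausted: the query is not forwarded further
        for neighbor in NEIGHBORS[node]:
            messages += 1  # every forwarded copy costs one message
            if neighbor not in seen:
                seen.add(neighbor)
                queue.append((neighbor, remaining - 1))
    return hits, messages


print(flood_search("N1", ttl=2))
print(flood_search("N1", ttl=4))
```

With the small TTL the query never reaches N8 in this assumed topology, which is the partial-search problem; raising the TTL finds it but sends more messages, which is the flooding overhead.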
Related Works
Third-generation file sharing system
  Semi-centralized indexing (e.g., eDonkey, KaZaa)
  Problems: partial searching, vulnerable to DoS attacks
[Figure: two supernodes, with Search(a.mp3) and File(a.mp3) messages]
Distributed Hash Table, Basic (1)
Distributed hash table
  File name → H(x) → File ID
  Node address → H(x) → Node ID
  Mapping File ID to Node ID
[Figure: file names (a.mp3, k.txt, x.mpg, g.doc) and node addresses (k, b, n, w) are both hashed by H(x) into the same ID space, 0-127; a hash table maps each hash key to a node]
Node ID    File ID
30         0-30
71         31-71
89         71-89
127        89-127
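The mapping from File ID to Node ID can be sketched as follows. The node IDs 30, 71, 89 and 127 are taken from the slide. The choice of SHA-1 for H(x), the 0-127 ID space size and the rule that a key equal to a node ID belongs to that node are assumptions, since the slide does not name a hash function and its ranges share their end points.

```python
import hashlib
from bisect import bisect_left

ID_SPACE = 128                # IDs 0..127, as in the slide's table
NODE_IDS = [30, 71, 89, 127]  # node IDs from the slide, sorted


def H(x):
    """Hash a file name or node address into the ID space (SHA-1 is an assumption)."""
    digest = hashlib.sha1(x.encode("utf-8")).digest()
    return int.from_bytes(digest, "big") % ID_SPACE


def responsible_node(file_id):
    """Return the first node ID that is >= file_id, wrapping around the ID space."""
    i = bisect_left(NODE_IDS, file_id)
    return NODE_IDS[i % len(NODE_IDS)]


for name in ["a.mp3", "k.txt", "x.mpg", "g.doc"]:
    print(name, H(name), responsible_node(H(name)))
```

Because file names and node addresses go through the same hash function, any node can compute where an index entry lives without asking a central server.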
Distributed Hash Table, Basic (2)
Keys and nodes are uniformly distributed and exist in the same ID space
Each node is responsible for the keys between its predecessor node and itself
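[Figure: 3-bit ID ring 000-111. H(g.txt), H(a.mp3), H(x.doc), H(s.mpg) and H(k.mp3) place the index entries g.txt(2,8), a.mp3(1), x.doc(4), s.mpg(1,4) and k.mp3(2) on the ring; each node stores the entries for the keys it is responsible for, and keys without a file are marked "-".]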
Distributed Hash Table, Routing (1)
Naïve approach
  Each node knows its successor node
  A lookup request is forwarded to the successor until (Node ID < File ID ≤ Successor Node ID)
  Worst-case performance: O(N)
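[Figure: 3-bit ID ring 000-111 with successor pointers (successor=010, successor=100, successor=111, successor=001); the request Lookup(H(k.mp3)), shown as Lookup(101), is forwarded from successor to successor until it reaches the node holding the index entries s.mpg(1,8) and k.mp3(2)]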
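A minimal sketch of the naïve approach, assuming the ring contains exactly the four nodes implied by the successor pointers in the figure (001, 010, 100, 111). The function names and the hop counting are illustrative, and the stop condition treats a key equal to the successor's ID as belonging to the successor.

```python
# Successor pointers read from the figure: each node knows only its successor.
SUCCESSOR = {0b001: 0b010, 0b010: 0b100, 0b100: 0b111, 0b111: 0b001}


def in_range(key, node, succ):
    """True if key lies in the ring interval (node, succ]."""
    if node < succ:
        return node < key <= succ
    return key > node or key <= succ  # the interval wraps past 111 to 000


def lookup(start, key):
    """Forward along successors; return (responsible node, forwarding steps)."""
    node, hops = start, 0
    while not in_range(key, node, SUCCESSOR[node]):
        node = SUCCESSOR[node]
        hops += 1
    return SUCCESSOR[node], hops


owner, hops = lookup(0b001, 0b101)
print(format(owner, "03b"), hops)
```

In the worst case the request visits almost every node on the ring before it stops, which is where the O(N) bound comes from.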
Distributed Hash Table, Routing (2)
Tree-based routing table
  Shortcuts to nodes whose node IDs differ in each bit position
  2^m ID space → m entries
  Lookup performance: O(log N)
[Figure: binary tree over the IDs 000-111 (branches labeled 0 and 1) with nodes a, b, c, d; Lookup(101) follows the shortcut tables]
Shortcut tables
  011 → c, 00x → a, 1xx → c
  101 → d, 11x → d, 0xx → a
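The tree-based routing can be sketched as below. The placement of the nodes (a=001, b=010, c=100, d=111) is inferred from the shortcut tables on the slide and should be read as an assumption, as should the rule used to fill each entry (the node responsible for the lowest ID in the corresponding subtree). For brevity the stop test uses global knowledge through `successor`, whereas a real node would compare the key with its own and its predecessor's ID.

```python
M = 3  # bits per ID, so the ID space has 2**M IDs and each table has M entries
NODES = {"a": 0b001, "b": 0b010, "c": 0b100, "d": 0b111}  # assumed placement


def successor(key):
    """Name of the node responsible for key: first node ID >= key, wrapping."""
    ordered = sorted(NODES.items(), key=lambda item: item[1])
    for name, node_id in ordered:
        if node_id >= key:
            return name
    return ordered[0][0]


def build_table(node_id):
    """One shortcut per bit position, keyed by the bit that differs."""
    table = {}
    for bit in range(M):
        # Flip `bit`, keep the higher bits, zero the lower (wildcard) bits.
        subtree_start = ((node_id ^ (1 << bit)) >> bit) << bit
        table[bit] = successor(subtree_start)
    return table


TABLES = {name: build_table(node_id) for name, node_id in NODES.items()}


def lookup(start, key):
    """Return the list of nodes visited while routing `key` from `start`."""
    path, current = [start], start
    while successor(key) != current:
        highest_diff = (NODES[current] ^ key).bit_length() - 1
        current = TABLES[current][highest_diff]
        path.append(current)
    return path


print(TABLES["b"], TABLES["c"])
print(lookup("b", 0b101))
```

Under these assumptions the computed tables for b and c agree with the two shortcut tables on the slide, and Lookup(101) from b reaches the responsible node in two hops. Each hop fixes at least the highest differing bit, so a lookup takes at most m hops.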
Distributed Hash Table, Routing (3)
Complete example of routing table & routing algorithm
Lookup from node 65a1fc with key d46a1c
[Figure: routing tables]
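The idea behind this example is that every hop moves to a node whose ID shares a longer prefix with the key. The sketch below illustrates that with hexadecimal IDs. The intermediate node IDs in `KNOWN_NODES` are hypothetical, and the single global node list is a simplification: in a real DHT each node consults only its own routing table.

```python
def shared_prefix_len(a, b):
    """Number of leading hex digits that a and b have in common."""
    count = 0
    for x, y in zip(a, b):
        if x != y:
            break
        count += 1
    return count


def route(start, key, known_nodes):
    """Hop to nodes sharing ever longer prefixes with key; return the path."""
    path, current = [start], start
    while True:
        have = shared_prefix_len(current, key)
        better = [n for n in known_nodes if shared_prefix_len(n, key) > have]
        if not better:
            return path
        # Take the smallest improvement, i.e. resolve about one digit per hop.
        current = min(better, key=lambda n: shared_prefix_len(n, key))
        path.append(current)


KNOWN_NODES = ["d13da3", "d4213f", "d462ba", "d467c4"]  # hypothetical node IDs
print(route("65a1fc", "d46a1c", KNOWN_NODES))
```

With these made-up IDs the path resolves one more digit of d46a1c per hop, which is why the number of hops grows with the number of digits rather than with the number of nodes.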
Distributed Hash Table, Join
Join process
  Look up one's own node ID as the lookup key
  Gather routing table entries while the lookup is being routed

[Figure: routing table creation during a lookup from node d46a1c with key d46a1c]
Routing table rows of node d46a1c
  0- 1- 2- 3- 4- 5- 6- 7- 8- 9- a- b- c- e- f-
  d0- d1- d2- d3- d5- ….. dc- dd- de- df-
  d40- d41- d42- d43- d44- d45- ….. d4f-
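The rows that the joining node has to fill follow a simple pattern, sketched below: row r lists every prefix that matches the node's own ID in the first r digits and differs in the next one. The function name `routing_table_rows` is illustrative, and the sketch only generates the prefixes; choosing an actual node for each prefix is what the join lookup is for.

```python
HEX_DIGITS = "0123456789abcdef"


def routing_table_rows(node_id, rows):
    """Prefixes covered by the first `rows` rows of node_id's routing table."""
    return [
        [node_id[:r] + digit for digit in HEX_DIGITS if digit != node_id[r]]
        for r in range(rows)
    ]


for row in routing_table_rows("d46a1c", 3):
    print(" ".join(prefix + "-" for prefix in row))
```

For d46a1c the first row skips d, the second skips d4 and the third skips d46, matching the pattern of the rows shown on the slide.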
P2P File Sharing with DHT
Storing the file index in the DHT
  Example: node a shares a new file g.txt, and node b looks up g.txt
[Figure: 3-bit ID ring 000-111 with nodes a, b, c, d and their shortcut tables]
Shortcut tables
  011 → c, 00x → a, 1xx → c
  101 → d, 11x → d, 0xx → a
  000 → a, 01x → c, 1xx → c
  110 → d, 10x → c, 0xx → a
Steps
  1. Hash g.txt: File ID = 101
  2. Insert file info with ID = 101
  3. Hash g.txt: File ID = 101
  4. Lookup with ID = 101, which returns addr(a)
  5. Download file g.txt
File index table
  ID     Filename    Node addr
  101    g.txt       a
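The five steps can be sketched end to end as follows. The node placement (a=001, b=010, c=100, d=111), the use of SHA-1 as the hash function and the names `publish` and `lookup` are assumptions. With SHA-1 the File ID of g.txt will generally not be the 101 used on the slide, and routing between nodes is left out so that only the index handling is shown.

```python
import hashlib
from bisect import bisect_left

M = 3  # 3-bit ID space, as on the slide
NODE_IDS = {0b001: "a", 0b010: "b", 0b100: "c", 0b111: "d"}  # assumed placement

# One file index table per node: File ID -> file name -> set of node addresses.
index = {name: {} for name in NODE_IDS.values()}


def file_id(filename):
    """Hash a file name into the ID space (SHA-1 is an assumption)."""
    digest = hashlib.sha1(filename.encode("utf-8")).digest()
    return int.from_bytes(digest, "big") % (2 ** M)


def responsible_node(key):
    """Node whose ID is the first one >= key, wrapping around the ring."""
    ids = sorted(NODE_IDS)
    return NODE_IDS[ids[bisect_left(ids, key) % len(ids)]]


def publish(filename, owner_addr):
    """Steps 1-2: hash the name and insert the file info at the responsible node."""
    key = file_id(filename)
    table = index[responsible_node(key)]
    table.setdefault(key, {}).setdefault(filename, set()).add(owner_addr)


def lookup(filename):
    """Steps 3-4: hash the name and ask the responsible node for the owners."""
    key = file_id(filename)
    return index[responsible_node(key)].get(key, {}).get(filename, set())


publish("g.txt", "addr(a)")  # node a shares g.txt
print(lookup("g.txt"))       # node b learns where to download it (step 5)
```

Only the index entry travels through the DHT; the file itself is downloaded directly from the node whose address the lookup returns.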
File Sharing with DHT: Technical Challenges
Frequent node join & leave
  → Index replication & fast routing table adaptation
Only exact-match search, because the whole file name is hashed
  → Keyword search scheme
Hotspot problem at the node that indexes a popular file
  → Load balancing mechanism
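To make the exact-match limitation concrete, the sketch below shows one possible keyword search scheme: each keyword of a file name is hashed and published separately, and a query intersects the results. This is an illustrative assumption, not the scheme proposed in the slides, and the dictionary `dht` merely stands in for a real distributed hash table.

```python
import hashlib
import re

dht = {}  # stand-in for the DHT: key -> set of file names


def key_of(word):
    """Hash a single keyword into a DHT key (SHA-1 is an assumption)."""
    return hashlib.sha1(word.encode("utf-8")).hexdigest()


def keywords(text):
    """Split a file name or query into lowercase alphanumeric keywords."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def publish(filename):
    """Store the file name under the key of each of its keywords."""
    for word in keywords(filename):
        dht.setdefault(key_of(word), set()).add(filename)


def search(query):
    """Return the files that contain every keyword of the query."""
    results = [dht.get(key_of(word), set()) for word in keywords(query)]
    return set.intersection(*results) if results else set()


publish("Summer Song.mp3")
publish("winter song.mp3")
print(search("song"))
print(search("summer song"))
```

The price of such a scheme is one index entry per keyword, and popular keywords concentrate load on a single node, which ties back to the hotspot challenge above.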
Conclusion
New approach for P2P file sharing systems
  Uses a new distributed data structure, the Distributed Hash Table (DHT)
  Fully decentralized indexing
  Guaranteed lookup performance of O(log N)
  Full search is possible
  Robust to node failure and to network attacks such as DoS