FreeNet: A Distributed Anonymous Information Storage and Retrieval System
Ian Clarke, Oskar Sandberg, Brandon Wiley and Theodore Hong
Jan 31, 2016
FreeNet
• P2P network for anonymous publishing and retrieval of data
– Decentralized
– Nodes collaborate in storage and routing
– Data centric routing
– Adapts to demands
– Addresses privacy & availability concerns
Motivation
• Problem - querying the network
– Source - requestor
– Destination - provider
• It’s a distributed search problem
– Approximating global knowledge with local knowledge
– Other systems – Chord, Tapestry, Pastry
• Privacy and availability
– Protect authorship, prevent denial attacks
Goals of Freenet
• Anonymity for producers and consumers
• Deniability for information storers
• Resistance to denial attacks
• Efficient storing and routing
• Does NOT provide
– Permanent file storage
– Load balancing
– Anonymity for general network usage
Architecture
• Each node – local data store + routing table
• Request file through location independent keys
• Routing - chain of proxy requests - decision is local
• Graph structure actively evolves over time
Request:
1. key
2. Hops to live
3. ID
4. Depth
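As a rough illustration of the four request fields listed above, here is a minimal Python sketch. The class name `Request`, the field types and the `forwarded` helper are assumptions made for this example; they are not taken from the Freenet protocol specification.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class Request:
    key: bytes          # location-independent file key being searched for
    hops_to_live: int   # remaining hop budget, reduced at every hop
    request_id: int     # identifier that lets nodes recognise a request they already saw
    depth: int          # hop counter, increased at every hop


def forwarded(req: Request) -> Request:
    """Return the copy of `req` that a node would pass on to the next hop."""
    return replace(req, hops_to_live=req.hops_to_live - 1, depth=req.depth + 1)
```

Keeping the message immutable means each hop works on its own copy, which mirrors the idea that every node only sees and rewrites the request locally.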
Key Based Searching
• Keyword signed key (KSK)
– Descriptive string ‘D’ – key generation yields key pair Pb + Pr
– File key: KSK = SHA(Pb)
– Stored file: E(FILE, D) + signature made with Pr
• Easy for retrieval – only need ‘D’
• Minimal protection against tampering
Keys and Searching…..
• Problems with KSK – flat namespace (collisions), key squatting, dictionary attacks
• Signed Subspace Key (SSK)
– Randomly generated key pair – namespace ID
– SSK = SHA(SHA(‘D’) ^ SHA(Pb))
– (-) Advertisement – subspace Pb + ‘D’
– (+) Owner can construct hierarchical space of arbitrary depth - using indirect files
– (+) Reduces collisions greatly
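The SSK derivation can be sketched in a few lines of Python. This is an illustration under stated assumptions: `public_key` is just a byte string standing in for a real public key Pb, the helper names are hypothetical, and the final hash over the XOR result follows my reading of the Freenet paper, which hashes the combined value once more.

```python
import hashlib


def sha1(data: bytes) -> bytes:
    return hashlib.sha1(data).digest()


def xor_bytes(a: bytes, b: bytes) -> bytes:
    if len(a) != len(b):
        raise ValueError("inputs must have equal length")
    return bytes(x ^ y for x, y in zip(a, b))


def ssk(description: str, public_key: bytes) -> bytes:
    """Hash 'D' and Pb separately, XOR the two digests, then hash the result."""
    combined = xor_bytes(sha1(description.encode("utf-8")), sha1(public_key))
    return sha1(combined)
```

Because the public key is mixed into the file key, two owners using the same descriptive string end up with different keys, which is why collisions become much less likely than with a KSK.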
Keys and Searching…
• Problems with SSK - updating, versioning
• Content Hash Keys (CHK)
– Encrypted by a random encryption key
– Publish CHK + decryption key
– CHK + SSK: easily updateable files
• 2 step process – publish file, publish pointer
• Results in pointers to newer version
• Older versions accessed through CHK
– Can be used for splitting files
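A small Python sketch of content hash keys and file splitting follows. It is a simplified illustration: encryption with the random key is left out, SHA-1 is assumed as the hash, and `split_file` and `reassemble` are hypothetical helper names.

```python
import hashlib
from typing import Dict, List, Tuple


def chk(content: bytes) -> str:
    """Content hash key: the hash of the (normally encrypted) file contents."""
    return hashlib.sha1(content).hexdigest()


def split_file(content: bytes, part_size: int) -> Tuple[List[str], Dict[str, bytes]]:
    """Split a file into parts, store each part under its own CHK,
    and return the ordered list of CHKs plus the resulting store."""
    store: Dict[str, bytes] = {}
    index: List[str] = []
    for start in range(0, len(content), part_size):
        part = content[start:start + part_size]
        key = chk(part)
        store[key] = part
        index.append(key)
    return index, store


def reassemble(index: List[str], store: Dict[str, bytes]) -> bytes:
    return b"".join(store[key] for key in index)
```

The ordered list of CHKs plays the role of the indirect file: publishing it under an SSK gives a pointer that can later be replaced by a pointer to a newer version, while the old parts stay reachable through their CHKs.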
Retrieving Files
• How do you locate the keys?
– Hypertext spider
– Indirect files – published with KSK of search words
– Publish bookmarks
• File retrieval
– Request forwarded to node in routing table (RT) with closest lexicographic match for the binary key
– Request routing follows steepest-ascent hill climbing: if the first choice fails, backtrack and try the second choice
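The routing rule above can be sketched as a depth-first search with backtracking. This is a toy model, not Freenet's implementation: keys are plain integers, closeness is the absolute numeric difference (standing in for lexicographic closeness of binary keys), each branch simply receives one hop less than its parent, and the `Node` class and `route` function are hypothetical names.

```python
from typing import Dict, List, Optional, Set


class Node:
    def __init__(self, name: str) -> None:
        self.name = name
        self.store: Set[int] = set()                 # keys of files held locally
        self.routing_table: Dict[int, "Node"] = {}   # key -> neighbour associated with that key


def route(node: Node, key: int, hops_to_live: int, visited: Set[str]) -> Optional[List[str]]:
    """Return the path of node names leading to a holder of `key`, or None on failure."""
    if hops_to_live <= 0 or node.name in visited:
        return None
    visited.add(node.name)
    if key in node.store:
        return [node.name]
    # Try neighbours in order of how close their associated key is to the
    # requested key; when one choice fails, fall back to the next one.
    for known_key in sorted(node.routing_table, key=lambda k: abs(k - key)):
        path = route(node.routing_table[known_key], key, hops_to_live - 1, visited)
        if path is not None:
            return [node.name] + path
    return None
```

For example, if node a knows b under key 10 and c under key 50, a request for key 12 is tried at b first; when b fails, the search backtracks to a and continues at c.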
Still Retrieving….
• Timers and hops-to-live curtail request threads
• Files cached all along the retrieval path
• Self-reinforcing cycle – results in key expertise
[Figure: request routing among nodes a, b, c, d, e and f]
Ring Topology
• 1000 nodes in ring topology
• Datastore = 50 items
• RT = 250 items
• Keys associated with links are hashes of destination IPs
Self Reinforced Routing
• Snapshots using 300 requests with hops-to-live = 500
• As the network converges, the median request path length drops to 6 - “six degrees of separation”
Retrieval Discussion
• No controlled replication, no persistence
• No correlation between keys and content
– (+) Documents related to a subject are scattered
• Geographical fault resilience
– (-) No spatial locality – search latencies can suffer
• Building indexes by other means
Publishing
• Similar to retrieval, but a 2 step process
– Detect collisions – ‘all clear’ if no collision
– Publish to node in RT with closest key match
• Are the collision detection (CD) and publish paths the same?
– Can result in collision during publish step
• Inserts allow new nodes to advertise themselves
• (+) Key-squatting is not effective
Data Management
• Finite data stores - nodes resort to LRU
• Routing table entries linger after data eviction
• Outdated (or unpopular) docs disappear automatically
• Bipartite eviction – short term policy
– New files replace most recent files
– Prevents established files being evicted by attacks
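A finite data store with least-recently-used eviction can be sketched with Python's `collections.OrderedDict`. This is a minimal illustration only: the class name `LRUDataStore` is hypothetical, and the sketch covers neither the bipartite short-term policy nor the routing table entries that linger after eviction.

```python
from collections import OrderedDict
from typing import Optional


class LRUDataStore:
    def __init__(self, capacity: int) -> None:
        self.capacity = capacity
        self._items: "OrderedDict[str, bytes]" = OrderedDict()

    def get(self, key: str) -> Optional[bytes]:
        if key not in self._items:
            return None
        self._items.move_to_end(key)          # mark as most recently used
        return self._items[key]

    def put(self, key: str, data: bytes) -> None:
        self._items[key] = data
        self._items.move_to_end(key)
        while len(self._items) > self.capacity:
            self._items.popitem(last=False)   # evict the least recently used item
```

Files that keep being requested stay near the recently used end, while files nobody asks for drift to the other end and eventually drop out, which is the "unpopular docs disappear automatically" behaviour.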
Network Growth
• New nodes have to know one or more existing nodes
• Problem: How to consistently decide on what key the new node specializes in?
– Needs to be a consensus decision – else denial attacks
• Advertisement: IP + H(random seed s0)
– Commitment - H(H(H(s0) ^ H(s1)) ^ H(s2)) …
– Key for new node = XOR of all seeds
• Each node adds a RT entry for the new node
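The commitment chain and the resulting node key can be sketched as follows. The sketch follows the formula as written on the slide, H(H(H(s0) ^ H(s1)) ^ H(s2)), and assumes SHA-1 as H and 20-byte seeds so that the XOR is well defined; the function names are hypothetical.

```python
import hashlib
from functools import reduce
from typing import List


def h(data: bytes) -> bytes:
    return hashlib.sha1(data).digest()


def xor_bytes(a: bytes, b: bytes) -> bytes:
    if len(a) != len(b):
        raise ValueError("inputs must have equal length")
    return bytes(x ^ y for x, y in zip(a, b))


def commitment_chain(seeds: List[bytes]) -> List[bytes]:
    """Commitment after each node: c0 = H(s0), c_i = H(c_(i-1) ^ H(s_i))."""
    if not seeds:
        raise ValueError("at least one seed is required")
    commitments = [h(seeds[0])]
    for seed in seeds[1:]:
        commitments.append(h(xor_bytes(commitments[-1], h(seed))))
    return commitments


def node_key(seeds: List[bytes]) -> bytes:
    """Key for the new node: XOR of all revealed seeds."""
    return reduce(xor_bytes, seeds)


def verify(seeds: List[bytes], final_commitment: bytes) -> bool:
    """Check that the revealed seeds reproduce the final commitment."""
    return commitment_chain(seeds)[-1] == final_commitment
```

Since every node commits to its seed before any seed is revealed, no single participant can steer the XOR of all seeds toward a key of its choosing.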
Network Growth
• Key assigned to new nodes = H(IP)
• Path length scales as log(n) until n ~ 40000
• At 40000, RTs are full
Protocol
• Nodes with frequently changing IPs use address-resolution keys (ARKs)
• Return address specified in requests – threat?
• Messages do not always terminate when hops-to-live reaches 1
• Depth is initialized by original requestor to an arbitrarily small value
• Request state maintained at each node – timers - LRU
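The hops-to-live and depth handling can be illustrated with a short Python sketch. The forwarding probability `continue_probability`, the upper bound `max_start` and both function names are assumptions for this example; the slides do not give concrete values.

```python
import random


def next_hops_to_live(hops_to_live: int, continue_probability: float,
                      rng: random.Random) -> int:
    """Return the hops-to-live value to forward with; 0 means the request stops here."""
    if hops_to_live > 1:
        return hops_to_live - 1
    # At 1 the message is not always dropped: with some probability it is
    # forwarded again with hops-to-live still 1.
    return 1 if rng.random() < continue_probability else 0


def initial_depth(rng: random.Random, max_start: int = 5) -> int:
    """The original requestor starts depth at a small random value instead of a fixed one."""
    return rng.randint(1, max_start)
```

Both tricks serve the same purpose: a node that sees hops-to-live 1, or a small depth, cannot conclude where the request chain really ends or begins.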
Fault Resilience
• Median path length < 20 at 30% node failures?
• Network becomes ineffective at 40% failures?
Small World
• Most nodes form local clusters
• A few highly connected nodes link the clusters
• Power law distribution provides high degree of fault tolerance
Security Concerns
• Pre-routing – message encrypted with public keys which determine the path of pre-routing
• Protecting data source – using random and probabilistic methods
Security
• File integrity - KSK vulnerable to dictionary attacks
• DoS attacks – Hash Cash to slow them down
• Attempts to displace valid files are constrained by the insert procedure
Conclusion
• Provides a network to anonymously store and request files
• Adaptive routing whose efficiency increases with experience
• Deals with privacy and data integrity in various scenarios
• Applications?
– Freedom of speech
– Unaccountable, decentralized Napster