FreeNet: A Distributed Anonymous Information Storage and Retrieval System

Ian Clarke, Oskar Sandberg, Brandon Wiley and Theodore Hong

FreeNet

• P2P network for anonymous publishing and retrieval of data

– Decentralized

– Nodes collaborate in storage and routing

– Data centric routing

– Adapts to demands

– Addresses privacy & availability concerns

Motivation

• Problem: querying the network
– Source: requestor
– Destination: provider

• It’s a distributed search problem
– Approximating global knowledge with local knowledge
– Other systems: Chord, Tapestry, Pastry

• Privacy and availability
– Protect authorship, prevent denial attacks

Goals of Freenet

• Anonymity for producers and consumers
• Deniability for information storers
• Resistance to denial attacks
• Efficient storing and routing
• Does NOT provide
– Permanent file storage
– Load balancing
– Anonymity for general network usage

Architecture

• Each node: local data store + routing table
• Files are requested through location-independent keys
• Routing: a chain of proxy requests; each forwarding decision is local
• Graph structure actively evolves over time

Request:

1. key

2. Hops-to-live (HTL)

3. ID

4. Depth
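The four request fields can be modelled as a small immutable record. This is a minimal sketch under assumptions: the names `Request`, `new_request` and `forward` are hypothetical, and it stops deterministically at hops-to-live 1, ignoring the probabilistic forwarding mentioned on the Protocol slide.

```python
import uuid
from dataclasses import dataclass, replace
from typing import Optional


@dataclass(frozen=True)
class Request:
    key: bytes         # binary key of the file being searched for
    hops_to_live: int  # hops remaining before the request is dropped
    request_id: str    # unique ID so nodes can spot loops and reject repeats
    depth: int         # hops travelled so far; lets a reply reach the requestor


def new_request(key: bytes, hops_to_live: int, initial_depth: int = 1) -> Request:
    """Create a fresh request with a random, unique request ID."""
    return Request(key, hops_to_live, uuid.uuid4().hex, initial_depth)


def forward(req: Request) -> Optional[Request]:
    """Copy sent to the next hop: HTL goes down, depth goes up, ID is kept.

    Simplification: the request stops deterministically at HTL 1.
    """
    if req.hops_to_live <= 1:
        return None
    return replace(req, hops_to_live=req.hops_to_live - 1, depth=req.depth + 1)
```

Keeping the same `request_id` across hops is what lets a node recognize a request it has already seen.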

Key Based Searching

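KSK construction for a FILE with descriptive string ‘D’:
– Key pair Pb + Pr generated deterministically from ‘D’
– KSK = SHA(Pb)
– FILE stored encrypted as E(FILE, D), together with a signature made with Pr

• Keyword-signed key (KSK)

• Easy for retrieval: only need ‘D’

• Minimal protection against tampering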
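The KSK steps above can be sketched with toy primitives. This is an illustration under loud assumptions: the group parameters (P = 2**127 - 1, G = 3), the SHA-256 keystream standing in for E(FILE, D), and the Schnorr-style signature are all stand-ins chosen for this sketch, not Freenet's actual cryptography, and are far too weak for real use. All function names are hypothetical.

```python
from __future__ import annotations

import hashlib

# Toy parameters (assumption): P is the Mersenne prime 2**127 - 1, G = 3.
P = 2**127 - 1
G = 3


def keypair_from_description(d: str) -> tuple[int, int]:
    """Deterministically derive (Pr, Pb) from the descriptive string D."""
    pr = int.from_bytes(hashlib.sha256(d.encode()).digest(), "big") % (P - 1)
    return pr, pow(G, pr, P)


def ksk(d: str) -> bytes:
    """File key: KSK = SHA(Pb)."""
    _, pb = keypair_from_description(d)
    return hashlib.sha1(pb.to_bytes(16, "big")).digest()


def encrypt(data: bytes, d: str) -> bytes:
    """Toy E(FILE, D): XOR with a SHA-256 keystream, so it is its own inverse."""
    stream = b""
    counter = 0
    while len(stream) < len(data):
        stream += hashlib.sha256(d.encode() + counter.to_bytes(4, "big")).digest()
        counter += 1
    return bytes(a ^ b for a, b in zip(data, stream))


def _challenge(r: int, data: bytes) -> int:
    digest = hashlib.sha1(r.to_bytes(16, "big") + data).digest()
    return int.from_bytes(digest, "big") % (P - 1)


def sign(data: bytes, d: str) -> tuple[int, int]:
    """Toy Schnorr signature with Pr, using a deterministic nonce k."""
    pr, _ = keypair_from_description(d)
    k = int.from_bytes(hashlib.sha256(pr.to_bytes(16, "big") + data).digest(), "big") % (P - 1)
    e = _challenge(pow(G, k, P), data)
    return e, (k - pr * e) % (P - 1)


def verify(data: bytes, sig: tuple[int, int], pb: int) -> bool:
    """G^s * Pb^e equals G^k for a valid signature."""
    e, s = sig
    r = (pow(G, s, P) * pow(pb, e, P)) % P
    return _challenge(r, data) == e
```

Because anyone who knows ‘D’ can rerun `keypair_from_description`, anyone can also produce a valid signature, which is why KSKs give only minimal protection against tampering.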

Keys and Searching…..

• Problems with KSK – flat namespace (collisions), key squatting, dictionary attacks

• Signed Subspace Key (SSK)
– A randomly generated key pair serves as the namespace ID
– SSK = SHA(SHA(‘D’) ^ SHA(Pb))
– (-) Advertisement: users need both the subspace public key Pb and ‘D’, so Pb must be published
– (+) Owner can construct a hierarchical space of arbitrary depth using indirect files
– (+) Greatly reduces collisions
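The SSK derivation can be sketched in a few lines. This follows the original paper's description (hash ‘D’ and Pb separately, XOR the digests, then hash the result); the function names and the placeholder public-key bytes are hypothetical.

```python
import hashlib


def sha1(data: bytes) -> bytes:
    return hashlib.sha1(data).digest()


def ssk(description: str, namespace_pubkey: bytes) -> bytes:
    """SSK = SHA(SHA(D) XOR SHA(Pb))."""
    hd = sha1(description.encode("utf-8"))
    hp = sha1(namespace_pubkey)
    return sha1(bytes(a ^ b for a, b in zip(hd, hp)))
```

The same ‘D’ published in two different subspaces yields two unrelated keys, so users can no longer collide with (or squat on) each other's names the way they can in the flat KSK namespace.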

Keys and Searching…

• Problems with SSK: updating, versioning

• Content Hash Keys (CHK)
– CHK = SHA(file contents)
– File encrypted with a random encryption key
– Publish CHK + decryption key
– CHK + SSK give easily updatable files

• 2-step process: publish the file under its CHK, then publish a pointer (indirect file) under the SSK
• Results in pointers to the newest version
• Older versions remain accessible through their CHKs

• CHKs can also be used for splitting large files into parts
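The two-step CHK + SSK update scheme can be sketched against a dictionary standing in for the network. Assumptions: `network`, the XOR-keystream cipher and all function names are hypothetical, and overwriting the pointer in place simplifies how Freenet actually propagates a newer indirect file.

```python
from __future__ import annotations

import hashlib
import os

network: dict[str, bytes] = {}  # hypothetical stand-in for the whole network


def keystream_xor(data: bytes, key: bytes) -> bytes:
    """Toy symmetric cipher (SHA-256 keystream); applying it twice decrypts."""
    stream = b""
    counter = 0
    while len(stream) < len(data):
        stream += hashlib.sha256(key + counter.to_bytes(4, "big")).digest()
        counter += 1
    return bytes(a ^ b for a, b in zip(data, stream))


def insert_chk(content: bytes) -> tuple[str, bytes]:
    """Store the encrypted file under CHK = SHA(contents); return (CHK, key)."""
    chk = hashlib.sha1(content).hexdigest()
    key = os.urandom(16)  # random encryption key
    network[chk] = keystream_xor(content, key)
    return chk, key


def fetch_chk(chk: str, key: bytes) -> bytes:
    content = keystream_xor(network[chk], key)
    if hashlib.sha1(content).hexdigest() != chk:  # integrity check
        raise ValueError("content does not match its CHK")
    return content


def publish_pointer(ssk: str, chk: str, key: bytes) -> None:
    """Indirect file under an SSK: the current CHK plus its decryption key."""
    network[ssk] = f"{chk}:{key.hex()}".encode()


def fetch_via_ssk(ssk: str) -> bytes:
    chk, key_hex = network[ssk].decode().split(":")
    return fetch_chk(chk, bytes.fromhex(key_hex))
```

Republishing the pointer moves readers to the new version, while anyone holding an old CHK and key can still fetch the old one.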

Retrieving Files

• How do you locate the keys?
– Hypertext spider
– Indirect files published under the KSKs of search words
– Published bookmarks

• File retrieval
– Request forwarded to the node in the routing table (RT) with the closest lexicographic match for the binary key
– Request routing follows steepest-ascent hill climbing: if the first choice fails, backtrack and try the second choice
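Steepest-ascent routing with backtracking amounts to a depth-first search that always tries the closest routing-table key first. This is a sketch under assumptions: keys are integers and "closest" means smallest absolute difference (one reading of the lexicographic match), hops-to-live bounds only the search depth, and `Node` and `route` are hypothetical names.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass(eq=False)
class Node:
    name: str
    store: dict[int, str] = field(default_factory=dict)     # key -> file
    routing: dict[int, Node] = field(default_factory=dict)  # key -> neighbour


def route(node: Node, key: int, htl: int, visited: set[str]) -> list[str] | None:
    """Closest-key-first depth-first search; returns the node path to the file."""
    visited.add(node.name)
    if key in node.store:
        return [node.name]
    if htl <= 0:
        return None
    # Steepest ascent: try the routing-table entry closest to the target first.
    for rt_key in sorted(node.routing, key=lambda k: abs(k - key)):
        nxt = node.routing[rt_key]
        if nxt.name in visited:
            continue  # loop: treat as a failure and try the next choice
        path = route(nxt, key, htl - 1, visited)
        if path is not None:
            return [node.name] + path
    return None  # every choice failed: backtrack to the caller
```

For example, if node a's closest entry leads to a dead end, the search returns to a and follows its second-closest entry instead.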

Still Retrieving….

• Timers and hops-to-live curtail request threads

• Files are cached all along the retrieval path

• Self-reinforcing cycle: nodes come to specialize in (become experts at) similar keys
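Caching along the return path, and the routing-table update that drives specialization, can be sketched as follows. Assumptions: `Node` and `return_along_path` are hypothetical, and every node records the true holder of the file as the data source (the Security slides note that real nodes may alter this field).

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass(eq=False)
class Node:
    name: str
    store: dict[int, bytes] = field(default_factory=dict)  # cached files
    routing: dict[int, str] = field(default_factory=dict)  # key -> node name


def return_along_path(path: list[Node], key: int) -> None:
    """Pass the file from the last node back toward the requester (path[0])."""
    source = path[-1]
    data = source.store[key]
    for node in reversed(path[:-1]):
        node.store[key] = data           # cache a copy on the way back
        node.routing[key] = source.name  # later requests for nearby keys go this way
```

Each new entry makes the node more likely to route similar keys the same way, and each cached copy makes nodes on that path more likely to answer such requests themselves.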

[Figure: example request sequence through nodes a, b, c, d, e, f]

Ring Topology

• 1000 nodes in a ring topology

• Data store = 50 items

• Routing table (RT) = 250 items

• Keys associated with links are hashes of the destination IPs
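The initial ring can be built as below. Assumptions: each node starts with links to k = 2 neighbours on each side (the slide does not say how many), the 10.x.y.z addresses are hypothetical, SHA-1 stands in for the hash, and the 50-item data store and 250-entry RT limits are not modelled.

```python
from __future__ import annotations

import hashlib


def node_ip(i: int) -> str:
    """Hypothetical private address for simulated node i."""
    return f"10.{i // 65536 % 256}.{i // 256 % 256}.{i % 256}"


def key_of(ip: str) -> int:
    """Link key = hash of the destination IP."""
    return int.from_bytes(hashlib.sha1(ip.encode()).digest(), "big")


def ring_topology(n: int = 1000, k: int = 2) -> dict[str, dict[int, str]]:
    """Map each node's IP to its routing table {H(destination IP): destination IP}."""
    ips = [node_ip(i) for i in range(n)]
    tables: dict[str, dict[int, str]] = {}
    for i, ip in enumerate(ips):
        rt: dict[int, str] = {}
        for offset in range(1, k + 1):
            for j in (i - offset, i + offset):
                dest = ips[j % n]  # wrap around to close the ring
                rt[key_of(dest)] = dest
        tables[ip] = rt
    return tables
```

Because link keys are hashes, neighbours on the ring hold unrelated keys, so the routing tables start with no useful key specialization.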

Self-Reinforced Routing

• Snapshots of request path length, using 300 requests with hops-to-live = 500

• As the network converges, the request path length drops to 6: “six degrees of separation”

Retrieval Discussion

• No controlled replication, so no guaranteed persistence

• No correlation between keys and content
– (+) Documents related to a subject are scattered across the network, giving geographical fault resilience
– (-) No spatial locality, so search latencies can suffer

• Indexes have to be built by other means

Publishing

• Similar to retrieval, but a 2-step process
– Detect collisions: ‘all clear’ if no collision is found
– Publish to the node in the RT with the closest key match

• Are the collision-detection (CD) and publish paths the same?
– If not, a collision can still occur during the publish step

• Inserts allow new nodes to advertise themselves

• (+) Key squatting is not effective
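The two-step insert can be sketched as a collision probe followed by publication along the probed path. Assumptions: the probe follows a greedy closest-key path without backtracking, a dead end is treated as ‘all clear’, the inserter is recorded as the data source, and `Node` and `insert` are hypothetical names.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass(eq=False)
class Node:
    name: str
    store: dict[int, bytes] = field(default_factory=dict)
    routing: dict[int, Node] = field(default_factory=dict)


def insert(origin: Node, key: int, data: bytes, htl: int) -> bytes | None:
    """Probe, then publish. Returns None, or the existing file on a collision."""
    path, visited, node = [origin], {origin.name}, origin
    for hop in range(htl + 1):
        if key in node.store:
            return node.store[key]  # collision: the existing file comes back
        if hop == htl:
            break  # hops-to-live exhausted with no collision: all clear
        options = [k for k, n in node.routing.items() if n.name not in visited]
        if not options:
            break  # dead end: treated as all clear in this sketch
        node = node.routing[min(options, key=lambda k: abs(k - key))]
        visited.add(node.name)
        path.append(node)
    # Step 2: publish the file along the probed path.
    for n in path:
        n.store[key] = data
        if n is not origin:
            n.routing.setdefault(key, origin)  # inserter advertised as the source
    return None
```

A squatter reusing an existing key gets the original file back instead of displacing it, which is why key squatting is not effective.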

Data Management

• Finite data stores: nodes evict the least recently used (LRU) files

• Routing table entries linger after data eviction

• Outdated (or unpopular) documents disappear automatically

• Bipartite eviction: a short-term policy
– New files replace the most recently added files, not established ones
– Prevents established files from being evicted by attacks
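Plain LRU eviction, with routing entries that outlive the evicted data, can be sketched with `collections.OrderedDict`. Assumptions: the class name `DataStore` and its methods are hypothetical, `capacity` corresponds to the simulation's 50-item data store, and bipartite eviction is not modelled.

```python
from __future__ import annotations

from collections import OrderedDict


class DataStore:
    """Finite LRU file store whose routing entries outlive evicted files."""

    def __init__(self, capacity: int) -> None:
        self.capacity = capacity
        self.files: OrderedDict[str, bytes] = OrderedDict()  # least recent first
        self.routing: dict[str, str] = {}  # key -> address of the data source

    def put(self, key: str, data: bytes, source: str) -> None:
        self.files[key] = data
        self.files.move_to_end(key)  # newest entry goes to the end
        self.routing[key] = source
        while len(self.files) > self.capacity:
            self.files.popitem(last=False)  # evict the least recently used file

    def get(self, key: str) -> bytes | None:
        if key not in self.files:
            return None
        self.files.move_to_end(key)  # a hit refreshes recency
        return self.files[key]
```

After eviction the node can no longer serve the file, but its lingering routing entry still points requests toward a node that may have it.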

Network Growth

• New nodes have to know one or more existing nodes

• Problem: how to consistently decide which keys the new node specializes in?
– Needs to be a consensus decision, otherwise denial attacks are possible

• Advertisement: IP + H(random seed s0)
– Commitment chain: H(H(H(s0) ^ s1) ^ s2) …
– Key for the new node = XOR of all seeds

• Each node in the chain adds an RT entry for the new node
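The seed-commitment scheme can be sketched as a single announcement chain. Following the original paper's description, each node XORs its raw seed with the hash it received and hashes the result. Assumptions: SHA-1 stands in for H, every node (including the last) commits, and `announce` and `verify_chain` are hypothetical names.

```python
from __future__ import annotations

import hashlib
import secrets


def H(data: bytes) -> bytes:
    return hashlib.sha1(data).digest()


def xor(a: bytes, b: bytes) -> bytes:
    return bytes(x ^ y for x, y in zip(a, b))


def announce(chain_length: int) -> tuple[list[bytes], list[bytes], bytes]:
    """Simulate one announcement; returns (seeds, commitments, new node key)."""
    # seeds[0] is s0 from the new node; seeds[1:] come from nodes along the chain.
    seeds = [secrets.token_bytes(20) for _ in range(chain_length)]
    commitments = [H(seeds[0])]  # sent in the announcement
    for s in seeds[1:]:
        commitments.append(H(xor(commitments[-1], s)))
    key = seeds[0]
    for s in seeds[1:]:
        key = xor(key, s)  # key = XOR of all seeds
    return seeds, commitments, key


def verify_chain(seeds: list[bytes], commitments: list[bytes]) -> bool:
    """Once seeds are revealed, recompute every commitment to catch cheating."""
    expected = H(seeds[0])
    if expected != commitments[0]:
        return False
    for s, c in zip(seeds[1:], commitments[1:]):
        expected = H(xor(expected, s))
        if expected != c:
            return False
    return True
```

Since every participant commits before any seed is revealed, no single node can steer the XOR toward a key of its choosing without breaking the chain.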

Network Growth

• Key assigned to new nodes = H(IP)

• Request path length scales as log(n) until n ≈ 40,000

• At around 40,000 nodes, routing tables are full

Protocol

• Nodes with frequently changing IPs use address-resolution keys (ARKs)
• Return address specified in requests: a threat?
• Messages do not always terminate when hops-to-live reaches 1
• Depth is initialized by the original requestor to an arbitrarily small value
• Request state maintained at each node: timers, LRU
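The hops-to-live and depth rules can be sketched as per-hop field updates. Assumptions: the continuation probability `p_continue`, the random starting-depth range, and setting a reply's HTL equal to the request's depth are illustrative choices; all function names are hypothetical.

```python
from __future__ import annotations

import random


def next_hop_fields(htl: int, depth: int, p_continue: float,
                    rng: random.Random) -> tuple[int, int] | None:
    """Return (hops_to_live, depth) for the forwarded copy, or None to stop."""
    if htl > 1:
        return htl - 1, depth + 1
    # At HTL 1 the message is still forwarded with a finite probability, so
    # the hops-to-live value reveals less about a message's position.
    if rng.random() < p_continue:
        return 1, depth + 1
    return None


def initial_depth(rng: random.Random, max_start: int = 5) -> int:
    """Requestor starts depth at a small random value instead of 0."""
    return rng.randint(1, max_start)


def reply_htl(depth: int) -> int:
    """A reply needs at most `depth` hops to get back to the requestor."""
    return depth
```

Starting depth above the true hop count means a reply's HTL overestimates the distance to the requestor, which obscures where the request originated.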

Fault Resilience

• Median path length stays below 20 with up to 30% of nodes failed

• Network becomes ineffective at around 40% node failure

Small World

• Most nodes form local clusters

• A few highly connected nodes link the clusters

• The power-law degree distribution provides a high degree of fault tolerance

Security Concerns

• Pre-routing: messages are encrypted with a succession of public keys, which determine the path of the pre-routing phase

• Protecting the data source: using random and probabilistic methods

Security

• File integrity: KSKs are vulnerable to dictionary attacks

• Denial-of-service (DoS) attacks: Hash Cash slows down inserts

• Attempts to displace valid files are constrained by the insert procedure
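Hash Cash works by making each insert carry a proof of work that is expensive to produce and cheap to check. This is a generic partial-preimage sketch in that style, not Adam Back's exact stamp format; SHA-256, the nonce encoding and the function names are assumptions.

```python
import hashlib
from itertools import count


def leading_zero_bits(digest: bytes) -> int:
    bits = 0
    for byte in digest:
        if byte:
            return bits + 8 - byte.bit_length()
        bits += 8
    return bits


def check(resource: bytes, nonce: int, difficulty: int) -> bool:
    """Verification is a single hash, so receiving nodes check cheaply."""
    digest = hashlib.sha256(resource + nonce.to_bytes(8, "big")).digest()
    return leading_zero_bits(digest) >= difficulty


def mint(resource: bytes, difficulty: int) -> int:
    """Search for a nonce; costs about 2**difficulty hashes on average."""
    for nonce in count():
        if check(resource, nonce, difficulty):
            return nonce
    raise AssertionError("unreachable")
```

Each extra bit of difficulty doubles the sender's expected work while leaving verification at one hash, which is what throttles an attacker flooding the network with inserts.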

Conclusion

• Provides a network to anonymously store and request files

• Adaptive routing whose efficiency increases with experience

• Deals with privacy and data integrity in various scenarios

• Applications?
– Freedom of speech
– Unaccountable, decentralized Napster
