2
Content
Grid vs. P2P
What’s the request
Why P2P architecture
Issues of P2P
P2P case study - Freenet
Design
3
Grid vs. P2P
Grid
Standards-based
Persistent
Addresses security issues
Resources are more powerful, more diverse and better connected
Data intensive
Faces problems of autonomic configuration and management
Not very scalable
4
Grid vs. P2P
P2P
Highly scalable
Fault tolerance
Self-configuration
Automatic problem determination
Highly variable behaviour
But lacks infrastructure
Security problems
Less concerned with qualities of service
5
What’s the request
A user requests the car service and keeps logs recording whether each request succeeded or failed.
The user may ask all other users for their request history records. From these statistics, we can estimate a particular service's ability to respond.
This can also help predict the outcome of future requests.
6
Why P2P
Not run-time information
Better fault tolerance
The pull model is efficient and generates less network traffic
7
Issues of P2P - Topology
8
Issues of P2P - Response Modes
9
Issues of P2P
… It turns out to be a problem of querying distributed data stores, which differs from querying a central database …
10
Issues of P2P - Query Processing
Recursively Partitionable Query
11
Issues of P2P - Abort Timeout (1)
Problems
- The user may no longer be interested in the query results
- Without a stop condition, the query roams the network forever
- The query should fade away after some time
- A static timeout remains unchanged across hops
Solution -> Dynamic Abort Timeout
- Nodes further away from the originator time out earlier than nodes closer to the originator
- Decrease the timeout at each hop
- Exponential decay with halving
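As a minimal sketch of the halving rule, assuming the originator is hop 0 and starts with a timeout it chooses itself (the function names and the value 8.0 are illustrative, not from the slides):

```python
def timeout_at_hop(initial_timeout: float, hop: int) -> float:
    """Timeout for a node `hop` hops away from the originator (hop 0).

    Exponential decay with halving: each hop gets half of the
    timeout of the node before it.
    """
    return initial_timeout / (2 ** hop)


def timeouts_along_path(initial_timeout: float, hops: int) -> list[float]:
    """Timeouts for the originator and each of the next `hops` nodes."""
    return [timeout_at_hop(initial_timeout, h) for h in range(hops + 1)]


# Hypothetical starting value of 8 seconds, followed over three hops.
print(timeouts_along_path(8.0, 3))
```

Because distant nodes give up first, their partial results can still travel back before the nodes closer to the originator time out.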
12
Issues of P2P - Abort Timeout (2)
13
Issues of P2P - Query Scope (1)
Problems
- No need to search the whole network
- The broadcast model floods the network
Solutions -> Select a neighbour subset
- Search only a specific domain, host or owner
- Randomly select half of the neighbours
- In a tree-like topology, select all children and ignore all parents
- Find only a single result
- Specify the maximum number of results (maxResults) and bytes (maxResultBytes) to be returned
14
Issues of P2P - Query Scope (2)
Neighbour Selection
- Each node maintains statistics about its neighbours
- Only select neighbours that meet minimum requirements in terms of latency, bandwidth or historic results (maxLatency, minBandwidth, minHistoricResult)
Radius
- The radius of a query is a measure of path length
- Set the maximum number of hops a query is allowed to travel
- The radius is decreased by one at each hop
- The roaming query and response fade away when the radius drops below zero
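The two scope controls above could be sketched as follows. The record layout, units and function names are assumptions made for illustration; only the threshold names (maxLatency, minBandwidth, minHistoricResult) come from the slides.

```python
from dataclasses import dataclass


@dataclass
class NeighbourStats:
    # Hypothetical per-neighbour statistics kept by a node.
    name: str
    latency_ms: float
    bandwidth_kbps: float
    historic_results: int


def select_neighbours(stats, max_latency, min_bandwidth, min_historic_result):
    """Return the names of neighbours that meet all minimum requirements."""
    return [
        s.name
        for s in stats
        if s.latency_ms <= max_latency
        and s.bandwidth_kbps >= min_bandwidth
        and s.historic_results >= min_historic_result
    ]


def next_radius(radius: int):
    """Decrease the radius by one for the next hop.

    Returns None when the radius drops below zero, meaning the
    query should fade away instead of being forwarded.
    """
    radius -= 1
    return radius if radius >= 0 else None
```

A node would first call `next_radius` on the incoming query and, only if the result is not `None`, forward it to the peers returned by `select_neighbours`.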
15
Issues of P2P - Routing
Random forwarding (random walk)
Learning: nodes record the requests answered by other nodes. A request is forwarded to the peer that answered similar requests previously, or to a random peer.
Best neighbour: nodes record the number of answers received from each peer. A request is forwarded to the peer that answered the largest number of requests.
Learning + best neighbour: identical to learning, except that when no relevant experience exists, the request is forwarded to the best neighbour.
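The strategies above might be sketched like this. The `Router` class is hypothetical, and "similar requests" is simplified to an exact match on a topic string, which is an assumption rather than something the slides specify.

```python
import random


class Router:
    """Illustrative forwarding strategies for a single node."""

    def __init__(self, peers, rng=None):
        self.peers = list(peers)
        self.answered_by = {}                        # topic -> peer that answered it
        self.answer_counts = {p: 0 for p in self.peers}
        self.rng = rng or random.Random()

    def record_answer(self, topic, peer):
        """Remember that `peer` answered a request about `topic`."""
        self.answered_by[topic] = peer
        self.answer_counts[peer] += 1

    def random_walk(self):
        """Random forwarding: pick any peer."""
        return self.rng.choice(self.peers)

    def best_neighbour(self):
        """Pick the peer that has answered the most requests so far."""
        return max(self.peers, key=lambda p: self.answer_counts[p])

    def learning_plus_best(self, topic):
        """Use past experience if any exists, else fall back to the best neighbour."""
        if topic in self.answered_by:
            return self.answered_by[topic]
        return self.best_neighbour()
```

Plain learning would differ only in the fallback: it would call `random_walk()` where this sketch calls `best_neighbour()`.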
16
P2P Case Study - Freenet
Freenet provides a file-storage service
The network is entirely decentralised
Information publishers and consumers are anonymous
Communications are encrypted
Files in the data store are encrypted
17
Adding New File
A user assigns the file a GUID key and sends an insert message containing the file identifier (GUID) and a time-to-live (TTL) value.
The GUID is a location-independent globally unique identifier, computed by hashing the contents of the file.
On receiving an insert, a node checks whether the key already exists. If not, it stores the key, creates a routing entry for it, looks up the closest key, and forwards the message to the corresponding node.
When the TTL expires, the final node returns an “all clear” message. The user then sends the data along the path.
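A minimal sketch of content-derived keys and closest-key lookup. The choice of SHA-1 and the use of numeric distance between hex keys are assumptions for illustration; the slides only say the GUID comes from hashing the file contents.

```python
import hashlib


def content_guid(data: bytes) -> str:
    """Derive a location-independent key from the file contents (SHA-1 assumed)."""
    return hashlib.sha1(data).hexdigest()


def closest_key(target: str, known_keys):
    """Return the known hex key numerically closest to `target`."""
    t = int(target, 16)
    return min(known_keys, key=lambda k: abs(int(k, 16) - t))


guid = content_guid(b"example file contents")
print(len(guid))  # a SHA-1 hex digest is 40 characters long
```

Identical contents always produce the same key, so the key does not depend on where the file is stored.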
18
Requesting File
Every node maintains a routing table, listing addresses of other nodes and GUID keys.
On receiving a query, it first checks its own store. If it finds the file, it announces itself as the holder. Otherwise, it forwards the query to the node with the closest key.
If the file is found, each node passes the file along the chain and creates a new entry in its routing table.
Each node might also cache a copy locally. The query maintains a TTL, decreased at each hop. If a node runs out of candidates, it reports failure back to its predecessor, which then tries its second choice.
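The request procedure can be sketched as a depth-first search with backtracking. The in-memory `network` dictionary, integer keys and the rule that every node on the return path caches the file are simplifying assumptions, not details from the slides.

```python
def request(network, node, key, ttl, visited=None):
    """Return the chain of nodes from `node` to the holder, or None on failure."""
    visited = visited if visited is not None else set()
    visited.add(node)
    if key in network[node]["store"]:
        return [node]                      # this node is the holder
    if ttl <= 0:
        return None                        # TTL exhausted
    routes = network[node]["routes"]       # known key -> neighbour address
    # Try neighbours in order of key closeness; on failure, try the next choice.
    for known_key in sorted(routes, key=lambda k: abs(k - key)):
        neighbour = routes[known_key]
        if neighbour in visited:
            continue
        path = request(network, neighbour, key, ttl - 1, visited)
        if path is not None:
            network[node]["store"].add(key)   # cache a copy locally
            routes[key] = path[-1]            # new routing entry for the holder
            return [node] + path
    return None                            # out of candidates: report failure


network = {
    "A": {"store": set(), "routes": {10: "B", 50: "C"}},
    "B": {"store": set(), "routes": {}},
    "C": {"store": {12}, "routes": {}},
}
print(request(network, "A", 12, ttl=3))
```

In this toy network, node A first tries B (its closest key is 10), B reports failure, and A then falls back to its second choice, C.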
19
Adding New Node
A new node sends an announcement, with a TTL, to an existing node.
The receiving node forwards the announcement to another node chosen randomly from its routing table.
The announcement continues to propagate until its TTL runs out.
20
Training Routes
Nodes that reliably answer queries will be added
to more routing tables.
Well-known nodes tend to see more requests and
become better connected.
Similar keys tend to cluster in the nodes along
the same path, because requests will be for
similar files which have similar keys.
21
Managing Storage
Given finite disk space, a node sometimes needs to decide which files to keep.
Freenet decides by the frequency of requests per file and keeps the more popular files.
Frequently requested files have more copies in the network. The tree grows in that direction.
Unrequested files are subject to deletion. The tree shrinks in that direction.
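A minimal sketch of the popularity rule described above, assuming each node keeps a request counter per file and has room for a fixed number of files (both the counter and the capacity model are illustrative assumptions):

```python
def evict_least_requested(request_counts, capacity):
    """Split files into (kept, evicted), keeping the `capacity` most requested."""
    ranked = sorted(request_counts, key=lambda f: request_counts[f], reverse=True)
    return ranked[:capacity], ranked[capacity:]


# Hypothetical request counts for three stored files.
kept, evicted = evict_least_requested({"a": 5, "b": 0, "c": 2}, capacity=2)
print(kept, evicted)
```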
22
Design
Tree Topology
Each node maintains a Log File
Each node also maintains a Local Data Store for storing query results.
23
Design
Adding New Node
- When a new node joins the network, it connects itself to only one existing node.
Adding Log Record
- When a user accesses a service, a log record is created
- Log records should provide the service name, the service access time and a success/fail flag
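A log record with the three fields listed above might look like the sketch below. The class and field names are hypothetical, and the success ratio is only one possible statistic; the slides do not fix the methodology.

```python
from dataclasses import dataclass
from datetime import datetime


@dataclass
class LogRecord:
    service_name: str
    access_time: datetime
    success: bool


def success_rate(records, service_name):
    """Fraction of successful requests for a service, or None if there are no records."""
    relevant = [r for r in records if r.service_name == service_name]
    if not relevant:
        return None
    return sum(r.success for r in relevant) / len(relevant)


log = [
    LogRecord("car", datetime(2004, 1, 1, 9, 0), True),
    LogRecord("car", datetime(2004, 1, 1, 9, 5), False),
    LogRecord("car", datetime(2004, 1, 2, 9, 0), True),
]
print(success_rate(log, "car"))
```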
24
Design - Query
Query
- When a node sets up a query, it first looks up its local data store to see if the same query exists.
- If it is a new query, the node multicasts a query message to all connected nodes. The query message contains Query Conditions, a Maximum Data Volume value and a Dynamic Abort Timeout (DAT) value.
- Query Conditions may contain the time period the user is concerned with, service names, etc.
25
Query
- On receiving the query message, a node first looks up its own local data store; if the same query is not there, it multicasts the query to all connected nodes.
- When the DAT expires, the final node begins to return data along the chain.
- Responses use the Routed Response mode
26
Design - Query
- To reduce network traffic, calculation is performed at each node, using a recursive query plan. The calculation result propagates up along the chain.
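One way to picture per-node calculation on a tree topology: each node combines its own partial result with those of its children and passes only the combined pair upward. The tree layout and the (successes, total) pair are assumptions chosen for illustration.

```python
def aggregate(tree, local_counts, node):
    """Return (successes, total) for the subtree rooted at `node`.

    `tree` maps a node to its children; `local_counts` maps a node to the
    (successes, total) pair computed from that node's own log file.
    """
    successes, total = local_counts[node]
    for child in tree.get(node, []):
        child_successes, child_total = aggregate(tree, local_counts, child)
        successes += child_successes
        total += child_total
    return successes, total


tree = {"root": ["a", "b"], "a": ["c"]}
local_counts = {"root": (1, 2), "a": (2, 2), "b": (0, 1), "c": (3, 5)}
print(aggregate(tree, local_counts, "root"))
```

Only two numbers travel over each link, instead of every raw log record, which is where the traffic saving comes from.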
27
Design - Query
- To avoid data flooding, only the necessary volume of data is calculated, as specified by Maximum Data Volume
- Each chain returns zero or one result
- The Dynamic Abort Timeout (DAT) uses the exponential decay with halving model. The DAT decreases at each hop.
28
Design
Calculation
- By a particular statistical methodology
Showing Result
- The final result is shown as a graph
- The query result is also saved in the Local Data Store
Deleting Log Records
- To save disk space, early log records should be deleted after a period of time
29
Grid vs. P2P
Thanks !