1 Grid vs. Peer-to-Peer Yin Chen [email protected] 25 June 2003.

1

Grid vs. Peer-to-Peer

Yin Chen [email protected]

25 June 2003

2

Content

Grid vs. P2P

What’s the request

Why P2P architecture

Issues of P2P

P2P case study - Freenet

Design

3

Grid vs. P2P

Grid

Standards-based

Persistent

Addresses security issues

Resources are more powerful, more diverse, and better connected

Data intensive

Faces problems of autonomic configuration and management

Not very scalable

4

Grid vs. P2P

P2P

Highly scalable

Fault tolerance

Self-configuration

Automatic problem determination

More variable behaviour

But lacks infrastructure

Security problems

Less concerned with qualities of service

5

What’s the request

A user requests the car service and keeps logs recording whether each request succeeded or failed.

The user may ask all other users for their historical request records. From these statistics, we can estimate how well a particular service responds.

This can also be used to predict the outcome of further requests.
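As a rough illustration of the idea on this slide, the sketch below pools log records and computes a success rate per service. The `LogRecord` fields, the `success_rate` helper and the `"hotel"` service are hypothetical names chosen for the example; the slides do not specify a record format or a particular statistic.

```python
from dataclasses import dataclass


@dataclass
class LogRecord:
    """One logged request (field names are illustrative assumptions)."""
    service: str
    success: bool


def success_rate(records, service):
    """Fraction of logged requests to `service` that succeeded.

    Returns None when there is no history for the service.
    """
    outcomes = [r.success for r in records if r.service == service]
    if not outcomes:
        return None
    return sum(outcomes) / len(outcomes)


# Records pooled from the user's own log and from other users' logs.
pooled = [
    LogRecord("car", True),
    LogRecord("car", True),
    LogRecord("car", False),
    LogRecord("car", True),
    LogRecord("hotel", False),
]

print(success_rate(pooled, "car"))  # 0.75
```

A simple success rate is only one possible statistic; it can serve as a naive estimate of the chance that the next request succeeds.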

6

Why P2P

Not run-time information

Better fault tolerance

The pull model is efficient and generates less network traffic

7

Issues of P2P - Topology

8

Issues of P2P - Response Modes

9

Issues of P2P

… It turns out to be a problem of querying distributed data stores, which is different from querying a central database …

10

Issues of P2P - Query Processing

Recursively Partitionable Query

11

Issues of P2P - Abort Timeout (1)

Problems
- The user is no longer interested in the query results
- The query will roam the network forever unless it is stopped
- The query should fade away after some time
- A static timeout remains unchanged across hops

Solution -> Dynamic Abort Timeout
- Nodes further away from the originator time out earlier than nodes closer to the originator
- Decrease the timeout at each hop
- Exponential decay with halving
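A minimal sketch of exponential decay with halving, assuming the originator chooses an initial timeout in seconds and each hop halves it. The function names and the 8-second starting value are illustrative, not taken from the slides.

```python
def timeout_at_hop(initial_timeout, hop):
    """Abort timeout for a node `hop` hops away from the originator.

    The timeout is halved at every hop, so distant nodes give up first.
    """
    return initial_timeout / (2 ** hop)


def timeouts_along_path(initial_timeout, hops):
    """Timeouts for the originator (hop 0) and each of the next `hops` nodes."""
    return [timeout_at_hop(initial_timeout, h) for h in range(hops + 1)]


print(timeouts_along_path(8.0, 3))  # [8.0, 4.0, 2.0, 1.0]
```

Because each node's timeout is shorter than its predecessor's, a node further from the originator times out before the node that forwarded the query to it.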

12

Issues of P2P - Abort Timeout (2)

13

Issues of P2P - Query Scope (1)

Problems
- It is not necessary to search the whole network
- The broadcast model floods the network

Solutions -> Select a neighbour subset

- Search only a specific domain, host or owner
- Randomly select half of the neighbours

- In a tree-like topology, select all children and ignore all parents

- Find only a single result

- Specify the maximum number of results (maxResults) and bytes (maxResultBytes) to be returned

14

Issues of P2P - Query Scope (2)

Neighbour Selection
- A node maintains statistics about its neighbours
- Only select neighbours that meet minimum requirements in terms of latency, bandwidth or historic results (maxLatency, minBandwidth, minHistoricResult)

Radius
- The radius of a query is a measure of path length
- Set the maximum number of hops a query is allowed to travel
- The radius is decreased by one at each hop
- The roaming query and response fade away when the radius drops below zero
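The two scope controls on this slide can be sketched as follows. The `NeighbourStats` fields, their units and the helper names are assumptions made for the example; only the threshold names maxLatency, minBandwidth and minHistoricResult come from the slide.

```python
from dataclasses import dataclass


@dataclass
class NeighbourStats:
    """Statistics a node keeps about one neighbour (illustrative fields)."""
    name: str
    latency_ms: float
    bandwidth_kbps: float
    historic_results: int


def select_neighbours(stats, max_latency, min_bandwidth, min_historic_result):
    """Names of the neighbours that meet all three minimum requirements."""
    return [
        s.name
        for s in stats
        if s.latency_ms <= max_latency
        and s.bandwidth_kbps >= min_bandwidth
        and s.historic_results >= min_historic_result
    ]


def next_radius(radius):
    """Radius to attach when forwarding, or None if the query should fade away."""
    radius -= 1
    return radius if radius >= 0 else None


stats = [
    NeighbourStats("n1", 20, 512, 5),
    NeighbourStats("n2", 300, 1024, 9),  # too slow
    NeighbourStats("n3", 40, 64, 7),     # too little bandwidth
    NeighbourStats("n4", 35, 256, 2),
]
print(select_neighbours(stats, max_latency=100, min_bandwidth=128,
                        min_historic_result=1))  # ['n1', 'n4']
print(next_radius(1), next_radius(0))            # 0 None
```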

15

Issues of P2P - Routing

Random forwarding (random walk)

Learning: nodes record the requests answered by other nodes. A request is forwarded to the peer that answered similar requests previously, or to a random peer if there is none.

Best neighbour: a node records the number of answers received from each peer. A request is forwarded to the peer that has answered the largest number of requests.

Learning + best neighbour: identical to learning, except that when no relevant experience exists, the request is forwarded to the best neighbour.
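The "learning + best neighbour" strategy could look roughly like the sketch below. The `Router` class is hypothetical, and "similar requests" is simplified here to an exact match on a request topic, which is an assumption of the example rather than something the slides define.

```python
import random
from collections import Counter


class Router:
    """Picks the next peer for a request (illustrative sketch)."""

    def __init__(self, peers, rng=None):
        self.peers = list(peers)
        self.answered_by = {}           # topic -> peer that last answered it
        self.answer_counts = Counter()  # peer -> number of answers received
        self.rng = rng or random.Random()

    def record_answer(self, topic, peer):
        """Remember that `peer` answered a request about `topic`."""
        self.answered_by[topic] = peer
        self.answer_counts[peer] += 1

    def next_peer(self, topic):
        # Learning: reuse the peer that answered this topic before.
        if topic in self.answered_by:
            return self.answered_by[topic]
        # Best neighbour: fall back to the peer with the most answers.
        if self.answer_counts:
            return self.answer_counts.most_common(1)[0][0]
        # No experience at all: random forwarding.
        return self.rng.choice(self.peers)


router = Router(["a", "b", "c"])
router.record_answer("car", "b")
router.record_answer("hotel", "c")
router.record_answer("train", "c")
print(router.next_peer("car"))     # b  (learned)
print(router.next_peer("flight"))  # c  (best neighbour)
```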

16

P2P Case Study - Freenet

Freenet provides a file-storage service

The network is entirely decentralised

Information publishers and consumers are anonymous

Communications are encrypted

Files in the data store are encrypted

17

Adding New File

A user assigns the file a GUID key and sends an insert message containing the file identifier (GUID) and a time-to-live (TTL) value.

A GUID is a location-independent, globally unique identifier, computed by hashing the contents of the file.

On receiving an insert, a node checks whether the key already exists. If not, it stores the key, creates a routing entry for it, looks up the closest key, and forwards the message to the corresponding node.

If the TTL expires without a key collision, the final node returns an “all clear” message. The user then sends the data along the path.
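A small sketch of the two building blocks used when inserting a file: deriving a key from the file contents and finding the closest known key. The use of SHA-1 and of numeric distance between keys are assumptions made for the example; the slide only says that the GUID is obtained by hashing the contents.

```python
import hashlib


def content_guid(data: bytes) -> str:
    """Location-independent key derived only from the file contents."""
    return hashlib.sha1(data).hexdigest()


def closest_key(target: str, known_keys):
    """Known key numerically closest to `target` (keys are hex strings)."""
    t = int(target, 16)
    return min(known_keys, key=lambda k: abs(int(k, 16) - t))


guid = content_guid(b"example file contents")
print(len(guid))                                # 40 hex characters
print(closest_key("10", ["01", "0f", "ff"]))    # 0f
```

Because the key depends only on the contents, two users inserting the same file obtain the same GUID, which is what lets a node detect that the key already exists.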

18

Requesting File

Every node maintains a routing table, listing addresses of other nodes and GUID keys.

On receiving a query, it first checks its own store. If it finds the file, it announces itself as the holder. Otherwise, it forwards the query to the node with the closest key.

If the file is found, each node passes the file along the chain and creates a new entry in its routing table.

Each node might also cache a copy locally. The query maintains a TTL, decreased at each hop. If a node runs out of candidates, it reports failure back to its predecessor, which then tries its second choice.
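The request procedure can be sketched as a depth-first search with backtracking. This is a simplified model, not Freenet's actual implementation: keys are plain integers, closeness is numeric distance, and the new routing entry points at the neighbour that supplied the file. The `Node` class and its fields are hypothetical.

```python
class Node:
    """A node with a local store and a routing table (illustrative sketch)."""

    def __init__(self, name):
        self.name = name
        self.store = {}   # key -> file data held locally
        self.routes = {}  # key -> neighbouring Node associated with that key

    def request(self, key, ttl, visited=None):
        """Return the file for `key`, or None if it cannot be found."""
        visited = visited if visited is not None else set()
        visited.add(self.name)
        if key in self.store:       # check the local store first
            return self.store[key]
        if ttl <= 0:                # TTL exhausted: report failure
            return None
        # Try neighbours in order of key closeness; on failure, backtrack
        # and try the next-best choice.
        candidates = sorted(self.routes.items(), key=lambda kv: abs(kv[0] - key))
        for _, neighbour in candidates:
            if neighbour.name in visited:
                continue
            data = neighbour.request(key, ttl - 1, visited)
            if data is not None:
                self.store[key] = data         # cache a copy locally
                self.routes[key] = neighbour   # new routing table entry
                return data
        return None                 # out of candidates: report failure back


a, b, c = Node("a"), Node("b"), Node("c")
a.routes = {10: b, 50: c}
c.store[12] = b"data"
print(a.request(12, ttl=3))  # b'data' (b is tried first and fails, then c)
```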

19

Adding New Node

A new node sends an announcement to an existing node, with a TTL.

The receiving node forwards the announcement to another node chosen randomly from its routing table.

The announcement continues to propagate until its TTL runs out.

20

Training Routes

Nodes that reliably answer queries will be added to more routing tables.

Well-known nodes tend to see more requests and become better connected.

Similar keys tend to cluster in the nodes along the same path, because requests will be for similar files which have similar keys.

21

Managing Storage

Given finite disk space, a node sometimes needs to decide which files to keep.

Freenet decides by the frequency of requests per file and keeps the more popular files.

Frequently requested files have more copies in the network. The tree grows in that direction.

Unrequested files are subject to deletion. The tree shrinks in that direction.
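A minimal sketch of popularity-based storage management as described on this slide, assuming capacity is counted in number of files (at least one) and that the least-requested file is evicted first. The `DataStore` class is hypothetical and is not meant to reproduce Freenet's real eviction policy.

```python
class DataStore:
    """Fixed-capacity store that evicts the least-requested file (sketch)."""

    def __init__(self, capacity):
        self.capacity = capacity  # maximum number of files, assumed >= 1
        self.files = {}           # key -> data
        self.requests = {}        # key -> number of requests served

    def get(self, key):
        """Return the file and count the request, or None if absent."""
        if key in self.files:
            self.requests[key] += 1
            return self.files[key]
        return None

    def put(self, key, data):
        """Store a file, evicting the least popular one if the store is full."""
        if key not in self.files and len(self.files) >= self.capacity:
            victim = min(self.files, key=lambda k: self.requests[k])
            del self.files[victim]
            del self.requests[victim]
        self.files[key] = data
        self.requests.setdefault(key, 0)


store = DataStore(capacity=2)
store.put("a", b"A")
store.put("b", b"B")
store.get("a")
store.get("a")
store.put("c", b"C")        # "b" was never requested, so it is evicted
print(sorted(store.files))  # ['a', 'c']
```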

22

Design

Tree Topology

Each node maintains a Log File

Each node also maintains a Local Data Store for storing query results.

23

Design

Adding New Node
- When a new node joins the network, it connects itself to only one existing node.

Adding Log Record
- When a user accesses a service, a log record is created
- Log records should provide the service name, the service access time, and a success/fail flag

24

Design - Query

Query
- When a node sets up a query, it first looks up its local data store to see if the same query exists.
- If it is a new query, the node multicasts a query message to all connected nodes. The query message contains Query Conditions, a Maximum Data Volume value and a Dynamic Abort Timeout (DAT) value.
- Query Conditions may contain the time period the user is concerned with, service names, etc.

25

Query

- On receiving the query message, a node first looks up its own local data store; if the same query is not there, it multicasts the query to all connected nodes.

- When the DAT expires, the final node begins to return data along the chain.

- Responses use the Routed Response mode

26

Design - Query

- To reduce network traffic, the calculation is performed at each node, using a recursively partitionable query plan. The calculation result propagates up along the chain.
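One way to picture per-node calculation in a tree topology: each node combines its own log with the partial results of its children and passes only a small summary upwards. The `TreeNode` class and the choice of a (successes, total) pair as the partial result are assumptions made for this sketch.

```python
class TreeNode:
    """A node in the tree with its own log of success/fail flags (sketch)."""

    def __init__(self, log, children=()):
        self.log = list(log)            # True = success, False = fail
        self.children = list(children)

    def partial(self):
        """(successes, total) for this node's subtree.

        Only this small pair travels up the chain, not the raw log records.
        """
        successes = sum(self.log)
        total = len(self.log)
        for child in self.children:
            s, t = child.partial()
            successes += s
            total += t
        return successes, total


leaf1 = TreeNode([True, False])
leaf2 = TreeNode([True, True, True])
root = TreeNode([False], [leaf1, leaf2])

successes, total = root.partial()
print(successes, total)  # 4 6
```

This works because counts can be added in any order, which is the property that makes a query recursively partitionable.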

27

Design - Query

- To avoid data flooding, only the necessary volume of data is calculated, as specified by the Maximum Data Volume

- Each chain returns zero or one result
- The Dynamic Abort Timeout (DAT) uses the exponential decay with halving model. The DAT decreases at each hop.

28

Design

Calculation
- By a particular statistical methodology

Showing Result
- The final result is shown as a graph
- The query result is also saved in the Local Data Store

Deleting log records
- To save disk space, old log records should be deleted after a period of time

29

Grid vs. P2P

Thanks!
