Overview: Challenges in P2P Systems
Artur Andrzejak, Zuse-Institute Berlin (ZIB), Computer Science Research
Comprehensiveness and guarantees
Many of today's systems do not guarantee that existing items will be found at all, or they do not find all items
Query expressiveness
Today: only key/keyword searches; range queries, aggregates and SQL-like queries desirable
Efficiency
A major problem: too many messages for searching; some systems even use flooding
From arbitrary (Gnutella) to rigid (Napster)
Rigid topology increases efficiency but decreases autonomy
Placement of Data/Metadata
Gnutella: only own data; Chord: data/metadata is carefully distributed in the whole network; superpeers: metadata for superpeers is centralized
Message Routing
Each query message is sent to a group of peers
From unstructured flooding (Gnutella) to sophisticated protocols (Chord, CAN etc.)
(From "Open Problems in Data Sharing Peer-To-Peer Systems" by Hector Garcia-Molina)
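To make the message cost of flooding concrete, here is a minimal sketch of a hop-limited flood over a small overlay graph. It is an illustration only, not Gnutella's actual protocol: the function name `flood_query`, the adjacency-dict representation and the `ttl` parameter are assumptions made for this example, and it simplifies by letting a peer send a copy back to the neighbour it received the query from.

```python
from collections import deque

def flood_query(graph, start, ttl):
    """Flood a query from `start` for at most `ttl` hops.

    `graph` maps each peer to a list of its neighbours (hypothetical overlay).
    Returns (set of peers reached, number of messages sent).
    """
    seen = {start}
    messages = 0
    frontier = deque([(start, ttl)])
    while frontier:
        peer, hops_left = frontier.popleft()
        if hops_left == 0:
            continue  # hop budget used up, do not forward further
        for neighbour in graph[peer]:
            messages += 1  # every forwarded copy costs one message
            if neighbour not in seen:
                seen.add(neighbour)
                frontier.append((neighbour, hops_left - 1))
    return seen, messages

# A four-peer overlay in which every peer has two neighbours.
overlay = {0: [1, 2], 1: [0, 3], 2: [0, 3], 3: [1, 2]}
reached, sent = flood_query(overlay, start=0, ttl=2)
```

In this toy overlay, reaching all four peers takes six messages, because duplicate copies are still sent to peers that have already seen the query. Structured protocols aim to avoid exactly this overhead.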
Each unique document or endpoint has a globally unique identifier (GUID)
Locating data can be seen as a routing problem:
clients construct messages addressed with GUIDs and let peers pass these messages until the object is located
Known as the Decentralized Object Location and Routing (DOLR) paradigm or Distributed Hash Table (DHT)
Advantages:
allows for routing messages to objects without knowing their location
data can be stored anywhere, amidst millions of peers (scalability)
provides locality: use of local resources instead of distant ones, if possible
Implemented in Chord, CAN, Pastry, Tapestry
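The idea of addressing messages by GUID can be sketched with a small identifier ring in which each object is assigned to the first peer whose identifier follows the object's GUID. This is a simplified illustration in the spirit of consistent hashing, not the routing algorithm of Chord, CAN, Pastry or Tapestry: real systems reach the responsible peer in several hops using only local routing tables, whereas this sketch assumes the full sorted peer list is known. The names `ring_id`, `responsible_peer` and the ring size `RING_BITS` are assumptions for this example.

```python
import hashlib
from bisect import bisect_left

RING_BITS = 16  # assumed size of the identifier space (2**16 positions)

def ring_id(name: str) -> int:
    """Hash a document or peer name to a position on the identifier ring."""
    digest = hashlib.sha1(name.encode("utf-8")).digest()
    return int.from_bytes(digest, "big") % (1 << RING_BITS)

def responsible_peer(guid: int, peer_ids: list) -> int:
    """Return the first peer id at or after `guid`, wrapping around the ring.

    `peer_ids` must be a non-empty list sorted in ascending order.
    """
    index = bisect_left(peer_ids, guid)
    return peer_ids[index % len(peer_ids)]

peers = sorted(ring_id(f"peer-{i}") for i in range(8))
owner = responsible_peer(ring_id("some-document.pdf"), peers)
```

The client never needs to know where the document lives. It only computes the GUID, and the ring structure determines which peer is responsible.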
Essence: Untrusted/Unreliable Components
Centralized systems have components which are professionally maintained and trusted to behave well
Components of a P2P-system may crash or fail at any time (unreliable components)
Also, the participants might be adversarial, attempting to damage the system (untrusted components)
Failure rate ~ system size: larger P2P-systems are guaranteed to have malfunctioning components
P2P-system builders must invoke new design principles to achieve guarantees
"only the aggregate behaviour of many peers can be trusted"
Techniques for untrusted components solve issues for unreliable ones (the converse is not true)
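Two of the points above can be illustrated with a short sketch. The first function shows why large systems almost certainly contain a faulty component, under the simplifying assumption that peers fail independently with the same probability. The second shows one simple way to trust only aggregate behaviour: accept an answer only if a strict majority of the queried peers agree on it. Both function names and the numbers in the usage lines are hypothetical, and majority voting is just one possible technique, not necessarily the one the cited systems use.

```python
from collections import Counter

def prob_any_failure(n: int, p: float) -> float:
    """Probability that at least one of n independent peers is faulty,
    if each peer is faulty with probability p (independence is an assumption)."""
    return 1.0 - (1.0 - p) ** n

def majority_answer(responses):
    """Return the value reported by a strict majority of peers, else None."""
    if not responses:
        return None
    value, count = Counter(responses).most_common(1)[0]
    return value if count > len(responses) / 2 else None

# With a 1% per-peer fault probability, 1000 peers almost surely include a faulty one.
risk = prob_any_failure(1000, 0.01)

# One deviating peer cannot change the accepted answer.
answer = majority_answer(["v1", "v1", "forged"])  # -> "v1"
```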
Replication
Redundancy helps to achieve fault tolerance by providing online replacements for faulty resources
Advanced P2P systems (Intermemory, OceanStore, FreeHaven) use so-called erasure coding
Each chunk of data is transformed into many fragments
Very low Fraction of Blocks Lost Per Year (FBLPY)
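The benefit of erasure coding over plain replication can be sketched with a binomial calculation. An erasure code turns a block into n fragments such that any m of them suffice to reconstruct it, and plain replication with r copies corresponds to n = r, m = 1. The sketch below assumes that fragments survive a repair interval independently with the same probability. The function name and the parameter values in the usage lines are hypothetical and are not the figures from the original slide.

```python
from math import comb

def survival_probability(n: int, m: int, p_fragment: float) -> float:
    """Probability that at least m of n fragments survive, assuming each
    fragment survives independently with probability p_fragment."""
    return sum(
        comb(n, k) * p_fragment**k * (1.0 - p_fragment) ** (n - k)
        for k in range(m, n + 1)
    )

# Same storage overhead (factor 2), hypothetical 90% per-fragment survival:
two_replicas = survival_probability(n=2, m=1, p_fragment=0.9)
coded_16_of_32 = survival_probability(n=32, m=16, p_fragment=0.9)
```

Under these assumptions the coded block survives with a much higher probability than the two-replica block, although both use twice the storage of the original data. This is the intuition behind the very low loss fractions.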
Losses per year for a 6-month repair interval:
"Thermodynamic" Systems Design
A new concept of John Kubiatowicz: "Stability through Statistics"
We can give guarantees on collective behaviour while individual nodes are not predictable
Over time, the latent order of a system is destroyed; this resembles the 2nd law of thermodynamics: "entropy of closed systems increases"
Therefore, self-organizing behaviour is necessary:
Servers must continuously collect, regenerate and redistribute fragments in a data storage system
They must adjust routing links in the DOLR to correct changes
They must recognize faults without global communication
Entropy reduction can also be achieved by introspection
System observes itself, applies analyses, then adapts accordingly
Research in the area of IBM's Autonomic Computing
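The observe, analyse, adapt cycle can be sketched for the fragment-repair case as a single maintenance step that a server might run periodically for each block. This is a hypothetical illustration, not the mechanism of OceanStore or of Autonomic Computing: the function name, the returned action labels and the `low_watermark` threshold are all assumptions made for this example.

```python
def maintenance_step(live: int, total: int, needed: int, low_watermark: int):
    """One observe/analyse/adapt round for a single stored block.

    live          -- fragments currently reachable (the observation)
    total         -- fragments the block was originally encoded into
    needed        -- fragments required to reconstruct the block
    low_watermark -- repair threshold (hypothetical tuning parameter)

    Returns (fragments after this round, action taken).
    """
    if live < needed:
        return live, "lost"       # too few fragments left to regenerate
    if live <= low_watermark:
        return total, "repaired"  # regenerate and redistribute missing fragments
    return live, "ok"             # enough redundancy left, no action needed

state = maintenance_step(live=18, total=32, needed=16, low_watermark=20)
# state == (32, "repaired")
```

Repairing only below a threshold, rather than after every single lost fragment, trades some redundancy for fewer repair messages. The right threshold depends on how quickly fragments are lost between maintenance rounds.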