Pregel
• A system for large-scale graph processing
• Sufficiently flexible to express arbitrary graph algorithms
• Easy to program
Pregel: Model Of Computation
• Vertex state
• Termination condition: all vertices are inactive and no messages are in transit
• Sequence of supersteps
• Invoke compute() for each active vertex
• Each vertex can
– Modify its state and its outgoing edges
– Receive messages sent in the previous superstep
– Send messages to other vertices
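The superstep loop can be sketched as a toy, single-process model in Python. This is not Pregel's actual API: `Vertex`, `run_supersteps`, and the `compute_fn` signature are hypothetical names chosen for illustration. The sketch assumes that messages sent in one superstep are delivered in the next, and that an incoming message reactivates a halted vertex.

```python
from collections import defaultdict


class Vertex:
    """Hypothetical vertex record: id, mutable value, outgoing edges."""

    def __init__(self, vertex_id, value, out_edges):
        self.id = vertex_id
        self.value = value
        self.out_edges = list(out_edges)
        self.active = True

    def vote_to_halt(self):
        self.active = False


def run_supersteps(vertices, compute_fn, max_supersteps=100):
    """Call compute_fn on each active vertex, superstep by superstep.

    compute_fn(vertex, messages, send, superstep) is user code;
    send(target_id, message) queues a message for the next superstep.
    Returns the number of supersteps executed.
    """
    inbox = {}
    for superstep in range(max_supersteps):
        outbox = defaultdict(list)

        def send(target_id, message):
            outbox[target_id].append(message)

        for v in vertices.values():
            messages = inbox.get(v.id, [])
            if messages:
                v.active = True  # an incoming message reactivates the vertex
            if v.active:
                compute_fn(v, messages, send, superstep)

        inbox = outbox
        # Stop once every vertex has halted and no messages are in transit.
        if not inbox and all(not v.active for v in vertices.values()):
            return superstep + 1
    return max_supersteps


def hop_count(vertex, messages, send, superstep):
    """Example compute(): hop distance from vertex "a" along directed edges."""
    best = min(messages, default=None)
    if superstep == 0 and vertex.id == "a":
        best = 0
    if best is not None and (vertex.value is None or best < vertex.value):
        vertex.value = best
        for target in vertex.out_edges:
            send(target, best + 1)
    vertex.vote_to_halt()


graph = {
    "a": Vertex("a", None, ["b"]),
    "b": Vertex("b", None, ["c"]),
    "c": Vertex("c", None, []),
}
steps = run_supersteps(graph, hop_count)
```

On this three-vertex chain the loop stops after three supersteps, because the last message (from "b" to "c") is consumed in the third one and nothing new is sent.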
Pregel API
• Combiners
• Aggregators
• Topology mutations
• Input and output
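To make the first two items concrete: a combiner merges all messages bound for the same vertex into one before delivery, and an aggregator reduces per-vertex contributions into a single global value. The Python sketch below illustrates both ideas only; `combine_messages` and `aggregate` are hypothetical helpers, not part of the Pregel or Giraph API, and the combiner is assumed to be commutative and associative.

```python
from collections import defaultdict
from functools import reduce


def combine_messages(outgoing, combiner):
    """Collapse all messages bound for the same target into one message.

    outgoing: iterable of (target_id, message) pairs.
    combiner: commutative, associative function of two messages.
    """
    by_target = defaultdict(list)
    for target_id, message in outgoing:
        by_target[target_id].append(message)
    return {t: reduce(combiner, msgs) for t, msgs in by_target.items()}


def aggregate(contributions, reducer, initial):
    """Reduce one contribution per vertex into a single global value."""
    return reduce(reducer, contributions, initial)


outgoing = [("v1", 3), ("v2", 7), ("v1", 9), ("v1", 4)]
# A max combiner: each target receives only the largest value sent to it.
combined = combine_messages(outgoing, max)
# A sum aggregator: e.g. total number of edges from per-vertex out-degrees.
edge_total = aggregate([2, 0, 5], lambda a, b: a + b, 0)
```

Here four messages shrink to two, one per target, which is the kind of traffic reduction a combiner is meant to provide when the receiving vertex only needs the combined value.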
Giraph
Why not implement Giraph with multiple MapReduce jobs?
• Too much disk I/O, no in-memory caching, and every superstep becomes a separate job!
Giraph is a single Map-only job in Hadoop
• Hadoop is purely a resource manager for Giraph; all communication is done through Netty-based IPC
Maximum vertex value implementation
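A minimal sketch of the maximum-value algorithm in the vertex-centric style, written in plain Python rather than against Giraph's Java API. Every vertex sends its value to its neighbours in superstep 0; afterwards a vertex forwards a value only when it has learned a larger one, and otherwise votes to halt. The function names, the dict-based vertex layout, and the sample graph are assumptions made for illustration.

```python
from collections import defaultdict


def max_value_compute(vertex, messages, superstep):
    """Return the (target, message) pairs this vertex sends.

    vertex is a dict with keys "value", "out_edges" and "active";
    it is updated in place.
    """
    largest = max(messages, default=vertex["value"])
    if superstep == 0 or largest > vertex["value"]:
        vertex["value"] = max(vertex["value"], largest)
        return [(target, vertex["value"]) for target in vertex["out_edges"]]
    vertex["active"] = False  # nothing new learned: vote to halt
    return []


def propagate_max(values, edges):
    """Run supersteps until all vertices are inactive and no messages remain."""
    graph = {
        vid: {"value": val, "out_edges": edges.get(vid, []), "active": True}
        for vid, val in values.items()
    }
    inbox, superstep = {}, 0
    while True:
        outbox = defaultdict(list)
        for vid, vertex in graph.items():
            messages = inbox.get(vid, [])
            if messages:
                vertex["active"] = True  # messages wake a halted vertex
            if vertex["active"]:
                for target, msg in max_value_compute(vertex, messages, superstep):
                    outbox[target].append(msg)
        if not outbox and not any(v["active"] for v in graph.values()):
            return {vid: v["value"] for vid, v in graph.items()}
        inbox, superstep = outbox, superstep + 1


# Sample directed graph (assumed): every vertex is reachable from "b".
values = {"a": 3, "b": 6, "c": 2, "d": 1}
edges = {"a": ["b"], "b": ["a", "d"], "c": ["b", "d"], "d": ["c"]}
result = propagate_max(values, edges)
```

Because every vertex in the sample graph is reachable from "b", all four vertices end with the value 6. In a graph that is not strongly connected, a vertex only learns the maximum among the vertices that can reach it.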
Giraph components
• Master
– One active master at a time
– Assigns partition owners to workers prior to each superstep
– Synchronizes supersteps
• Worker
– Loads the graph from input
– Does the computation/messaging of its assigned partitions
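The division of labour above can be illustrated with a small Python sketch: vertices map to partitions, the master maps partitions to workers, and each worker handles the vertices in the partitions it owns. The modulo partitioning and round-robin ownership used here are simplifying assumptions, and all function names are hypothetical; Giraph's actual partitioner and assignment logic are configurable and more involved.

```python
def partition_of(vertex_id, num_partitions):
    """Map a vertex id to a partition (assumes non-negative integer ids)."""
    return vertex_id % num_partitions


def assign_partition_owners(num_partitions, workers):
    """Master side: give each partition an owning worker, round-robin."""
    return {p: workers[p % len(workers)] for p in range(num_partitions)}


def vertices_per_worker(vertex_ids, num_partitions, workers):
    """Group vertex ids by the worker that owns their partition."""
    owners = assign_partition_owners(num_partitions, workers)
    assignment = {w: [] for w in workers}
    for vid in vertex_ids:
        assignment[owners[partition_of(vid, num_partitions)]].append(vid)
    return assignment


layout = vertices_per_worker(range(8), num_partitions=4, workers=["w0", "w1"])
```

Using more partitions than workers, as in this example, lets the master move whole partitions between workers without changing the vertex-to-partition mapping.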
Graph distribution