CS 744: GRAPHX Shivaram Venkataraman Fall 2021


Mar 16, 2022

Page 1: CS 744: GRAPHX

CS 744: GRAPHX

Shivaram Venkataraman, Fall 2021

Page 2

ADMINISTRIVIA

- Midterm grades today?
- Course project: check in by Nov 30th

Page 3

Scalable Storage Systems

Datacenter Architecture

Resource Management

Computational Engines

Machine Learning, SQL, Streaming, Graph

Applications

Page 4

POWERGRAPH

Programming model: Gather-Apply-Scatter

Better graph partitioning with vertex cuts

Distributed execution (Sync, Async)

How is this different from a dataflow system, e.g., Spark?

What are some shortcomings?
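To make the Gather-Apply-Scatter model concrete, here is a minimal single-machine sketch of one synchronous PageRank superstep, assuming a local edge list in place of PowerGraph's partitioned graph. The object name `GasSketch`, the method `pagerankStep`, and the `resetProb = 0.15` parameter are illustrative assumptions, not PowerGraph's (C++) vertex-program API. Gather sums contributions over in-edges, apply computes the new rank, and scatter is only described in a comment.

```scala
object GasSketch {
  type Id = Int
  final case class Edge(src: Id, dst: Id)

  // One synchronous Gather-Apply-Scatter superstep of PageRank on a local edge list.
  // Hypothetical single-machine sketch; not PowerGraph's vertex-program API.
  def pagerankStep(ranks: Map[Id, Double], edges: Seq[Edge], resetProb: Double = 0.15): Map[Id, Double] = {
    val outDeg: Map[Id, Int] = edges.groupBy(_.src).map { case (v, out) => (v, out.size) }
    // Gather: each vertex sums rank / outDegree over its in-edges (the accumulator is a sum).
    val acc: Map[Id, Double] = edges.groupBy(_.dst).map { case (v, in) =>
      (v, in.map(e => ranks(e.src) / outDeg(e.src)).sum)
    }
    // Apply: compute each vertex's new value from its gathered accumulator.
    // Scatter would then signal out-neighbors whose input changed; this sketch
    // skips it and treats every vertex as active in the next superstep.
    ranks.map { case (v, _) => (v, resetProb + (1 - resetProb) * acc.getOrElse(v, 0.0)) }
  }

  def main(args: Array[String]): Unit = {
    val edges = Seq(Edge(1, 2), Edge(1, 3), Edge(2, 3), Edge(3, 1))
    println(pagerankStep(Map(1 -> 1.0, 2 -> 1.0, 3 -> 1.0), edges))
  }
}
```

Because the gather accumulator here is a commutative, associative sum, partial sums could be computed wherever the in-edges live and combined afterwards, which is what makes vertex-cut partitioning workable.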

Page 5

THIS CLASS

GraphX: Can we efficiently map graph abstractions to dataflow engines?

Scalability! But at what COST? When should we distribute graph processing?

Page 6

MOTIVATION

Page 7

SYSTEM OVERVIEW

Advantages?

Page 8

PROGRAMMING MODEL

Constructor

Triplets
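The graph constructor and the triplets view can be illustrated with a minimal local sketch, assuming a graph is just a vertex table (`Map[Id, V]`) plus an edge table (`Seq[Edge[E]]`). The names `LocalGraph`, `Edge`, and `Triplet` are hypothetical stand-ins, not the GraphX API. The triplets view is then a join of the edge table with the vertex table twice, once on the source id and once on the destination id.

```scala
object TripletsSketch {
  type Id = Long

  final case class Edge[E](src: Id, dst: Id, attr: E)
  final case class Triplet[V, E](src: Id, dst: Id, srcAttr: V, edgeAttr: E, dstAttr: V)

  // Hypothetical local graph: a vertex collection plus an edge collection,
  // mirroring the "graph as two tables" view; not the GraphX API itself.
  final case class LocalGraph[V, E](vertices: Map[Id, V], edges: Seq[Edge[E]]) {
    // Triplets = edges joined with the vertex table on src and again on dst.
    // This sketch uses inner-join semantics: edges with a missing endpoint are dropped.
    def triplets: Seq[Triplet[V, E]] =
      for {
        e <- edges
        s <- vertices.get(e.src).toList
        d <- vertices.get(e.dst).toList
      } yield Triplet(e.src, e.dst, s, e.attr, d)
  }

  def main(args: Array[String]): Unit = {
    val g = LocalGraph(Map(1L -> "alice", 2L -> "bob"), Seq(Edge(1L, 2L, "follows")))
    g.triplets.foreach(println)
  }
}
```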

Page 9

MR TRIPLETS

mrTriplets(f: (Triplet) => M, sum: (M, M) => M): Collection[(Id, M)]
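The `mrTriplets` signature reads as a map-reduce over triplets. Below is a minimal sketch on a local `Seq` with the same shape. It assumes, as a simplification, that every message produced by `f` is addressed to the triplet's destination vertex, and that `sum` is associative and commutative. `MrTripletsSketch` and its `Triplet` class are hypothetical names.

```scala
object MrTripletsSketch {
  type Id = Long
  final case class Triplet[V, E](src: Id, dst: Id, srcAttr: V, edgeAttr: E, dstAttr: V)

  // Local sketch of mrTriplets: apply `f` to every triplet, address the result
  // to the triplet's destination vertex (an assumption of this sketch), then
  // combine all messages for the same vertex with the associative `sum`.
  def mrTriplets[V, E, M](triplets: Seq[Triplet[V, E]],
                          f: Triplet[V, E] => M,
                          sum: (M, M) => M): Map[Id, M] =
    triplets
      .map(t => (t.dst, f(t)))                                      // map phase
      .groupBy(_._1)                                                // group by vertex id
      .map { case (id, msgs) => (id, msgs.map(_._2).reduce(sum)) }  // reduce phase

  def main(args: Array[String]): Unit = {
    // Vertex attribute = age; count in-degree and find the oldest follower.
    val ts = Seq(Triplet(1L, 3L, 30, "follows", 40), Triplet(2L, 3L, 20, "follows", 40),
                 Triplet(1L, 2L, 30, "follows", 20))
    println(mrTriplets[Int, String, Int](ts, _ => 1, _ + _))
    println(mrTriplets[Int, String, Int](ts, _.srcAttr, (a, b) => math.max(a, b)))
  }
}
```

Vertices that receive no message (vertex 1 in the example) are simply absent from the result, which is why the Pregel loop pairs `mrTriplets` with a left join on the vertices.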

Page 10

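PREGEL USING GRAPHX

def Pregel(g: Graph[V, E],
           vprog: (Id, V, M) => V,
           sendMsg: (Triplet) => M,
           gather: (M, M) => M): Collection[V] = {
  // Mark every vertex as active
  var graph = g.mapV((id, v) => (v, halt = false))
  // Loop until every vertex has voted to halt
  while (graph.vertices.exists(v => !v.halt)) {
    // Restrict to edges whose source is active, then compute and combine messages
    val msgs: Collection[(Id, M)] =
      graph.subgraph(ePred = (s, d, sP, eP, dP) => !sP.halt)
           .mrTriplets(sendMsg, gather)
    // Receive messages and run the vertex program
    graph = graph.leftJoinV(msgs).mapV(vprog)
  }
  return graph.vertices
}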
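The Pregel loop above can be exercised without Spark. This is a minimal sketch, assuming plain Scala collections stand in for GraphX's distributed collections. It also deviates from the slide in two labeled ways: `sendMsg` returns `Option[M]` (so an edge may send nothing), and a vertex sets its halt flag when it receives no message in a superstep. `PregelSketch` and `pregel` are hypothetical names. The example computes connected components by min-label propagation.

```scala
object PregelSketch {
  type Id = Long
  final case class Triplet[V, E](src: Id, dst: Id, srcAttr: V, edgeAttr: E, dstAttr: V)

  // Single-machine sketch of the Pregel loop: plain Scala collections stand in
  // for GraphX's distributed vertex and edge collections. Each vertex carries
  // (value, halt); a vertex that receives no message in a superstep halts.
  def pregel[V, E, M](vertices: Map[Id, V],
                      edges: Seq[(Id, Id, E)],
                      vprog: (Id, V, M) => V,
                      sendMsg: Triplet[V, E] => Option[M],
                      gather: (M, M) => M): Map[Id, V] = {
    // Mark every vertex as active.
    var state: Map[Id, (V, Boolean)] = vertices.map { case (id, v) => (id, (v, false)) }
    while (state.values.exists { case (_, halt) => !halt }) {
      // subgraph + mrTriplets: only edges with an active source send messages,
      // and messages to the same destination are combined with `gather`.
      val msgs: Map[Id, M] = edges
        .filter { case (s, _, _) => !state(s)._2 }
        .flatMap { case (s, d, e) =>
          sendMsg(Triplet(s, d, state(s)._1, e, state(d)._1)).map(m => (d, m)).toList
        }
        .groupBy(_._1)
        .map { case (id, ms) => (id, ms.map(_._2).reduce(gather)) }
      // leftJoinV + mapV: vertices with a message run vprog, the others halt.
      state = state.map { case (id, (v, _)) =>
        msgs.get(id) match {
          case Some(m) => (id, (vprog(id, v, m), false))
          case None    => (id, (v, true))
        }
      }
    }
    state.map { case (id, (v, _)) => (id, v) }
  }

  def main(args: Array[String]): Unit = {
    // Connected components; each undirected edge is listed in both directions.
    val vs = Map(1L -> 1L, 2L -> 2L, 3L -> 3L, 4L -> 4L, 5L -> 5L)
    val es = Seq((1L, 2L, ()), (2L, 1L, ()), (2L, 3L, ()), (3L, 2L, ()), (4L, 5L, ()), (5L, 4L, ()))
    val cc = pregel[Long, Unit, Long](vs, es,
      (_, v, m) => math.min(v, m),
      t => if (t.srcAttr < t.dstAttr) Some(t.srcAttr) else None,
      (a, b) => math.min(a, b))
    println(cc)
  }
}
```

Messages are sent only when they would lower the destination's label, so every message strictly decreases some label and the loop is guaranteed to terminate.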

Page 11

IMPLEMENTING THE TRIPLETS VIEW

Join strategy

Send vertices to the edge site

Multicast join using a routing table
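A rough sketch of the multicast join: edges stay in their partitions, and each vertex attribute is shipped only to the edge partitions that reference that vertex, as recorded in a routing table. This assumes the edge-to-partition assignment (e.g., from a vertex cut) is already given. `RoutingTableSketch`, `routingTable`, and `multicast` are hypothetical names, not GraphX internals.

```scala
object RoutingTableSketch {
  type Id = Long
  type PartId = Int

  // Routing table: for each vertex, the set of edge partitions that contain
  // at least one edge touching it.
  def routingTable(edgePartitions: Map[PartId, Seq[(Id, Id)]]): Map[Id, Set[PartId]] =
    edgePartitions.toSeq
      .flatMap { case (p, es) => es.flatMap { case (s, d) => Seq(s -> p, d -> p) } }
      .groupBy(_._1)
      .map { case (v, pairs) => (v, pairs.map(_._2).toSet) }

  // Multicast join: ship each vertex attribute only to the partitions listed in
  // its routing entry. Returns, per edge partition, the vertex attributes it received.
  def multicast[V](vertices: Map[Id, V], table: Map[Id, Set[PartId]]): Map[PartId, Map[Id, V]] =
    vertices.toSeq
      .flatMap { case (v, attr) => table.getOrElse(v, Set.empty[PartId]).toSeq.map(p => (p, (v, attr))) }
      .groupBy(_._1)
      .map { case (p, msgs) => (p, msgs.map(_._2).toMap) }

  def main(args: Array[String]): Unit = {
    val parts = Map(0 -> Seq((1L, 2L), (1L, 3L)), 1 -> Seq((3L, 4L)))
    val table = routingTable(parts)
    println(table)
    println(multicast(Map(1L -> "a", 2L -> "b", 3L -> "c", 4L -> "d", 5L -> "e"), table))
  }
}
```

In the example, broadcasting all five vertices to both partitions would ship ten copies; the routing table reduces that to five (three to partition 0, two to partition 1), and vertex 5, which has no edges, is not shipped at all.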

Page 12

SCALABILITY VS. ABSOLUTE PERFORMANCE

GraphX: 3x speedup from 8 to 32 machines

PowerGraph: 2.6x speedup from 8 to 32 machines
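One way to read these numbers is as parallel efficiency, i.e., the speedup divided by the increase in machine count. This worked arithmetic uses only the two speedups quoted on the slide.

```latex
\text{scale-out factor} = \frac{32}{8} = 4, \qquad
E_{\text{GraphX}} = \frac{3}{4} = 0.75, \qquad
E_{\text{PowerGraph}} = \frac{2.6}{4} = 0.65
```

GraphX scales better relative to its own 8-machine baseline, but both ratios are self-relative: neither says which system actually finishes first, which is why scalability must be reported alongside absolute performance.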

Page 13

COST: CONFIGURATION THAT OUTPERFORMS A SINGLE THREAD
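The COST metric can be stated as a small computation: given a single-threaded runtime and a set of measured (cores, runtime) configurations for a scalable system, COST is the smallest configuration that beats the single thread, and it is unbounded if none does. The sketch below assumes configurations are identified by core count. `CostSketch` is a hypothetical name and the numbers in `main` are made up for illustration, not measurements from any paper.

```scala
object CostSketch {
  // COST: the smallest configuration (here, core count) whose runtime beats a
  // competent single-threaded implementation. None means "unbounded": no
  // measured configuration outperforms the single thread.
  def cost(singleThreadSecs: Double, runs: Seq[(Int, Double)]): Option[Int] = {
    val winners = runs.filter { case (_, secs) => secs < singleThreadSecs }.map(_._1)
    if (winners.isEmpty) None else Some(winners.min)
  }

  def main(args: Array[String]): Unit = {
    // Hypothetical runtimes in seconds, for illustration only.
    val runs = Seq((16, 600.0), (64, 250.0), (128, 200.0))
    println(cost(300.0, runs))
    println(cost(100.0, runs))
  }
}
```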

Page 14

DISCUSSION

https://forms.gle/u4TvMumnH7yBHd3b8

Page 15

What are some reasons why GraphX, GraphLab, or Naiad might be slower than a single-thread implementation of PageRank?
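For comparison, a single-threaded PageRank can be a tight loop over flat arrays: one sequential pass over the edges per iteration, with no partitioning, messaging, serialization, or scheduling. The sketch below is written in that spirit, assuming vertices are numbered `0 until numNodes`, edges are stored as parallel `src`/`dst` arrays, and the normalized formulation with `damping = 0.85` is used. `SingleThreadPageRank` is a hypothetical name, and for brevity the sketch does not redistribute rank from vertices with no out-edges.

```scala
object SingleThreadPageRank {
  // Single-threaded PageRank over an in-memory edge list stored as two parallel
  // arrays. Each iteration is one sequential scan of the edges. Rank held by
  // vertices with no out-edges is not redistributed (a simplification).
  def pagerank(numNodes: Int, src: Array[Int], dst: Array[Int],
               iters: Int, damping: Double = 0.85): Array[Double] = {
    val outDeg = new Array[Int](numNodes)
    src.foreach(s => outDeg(s) += 1)
    var rank = Array.fill(numNodes)(1.0 / numNodes)
    for (_ <- 0 until iters) {
      val next = Array.fill(numNodes)((1 - damping) / numNodes)
      var i = 0
      while (i < src.length) {
        next(dst(i)) += damping * rank(src(i)) / outDeg(src(i))
        i += 1
      }
      rank = next
    }
    rank
  }

  def main(args: Array[String]): Unit = {
    val r = pagerank(3, Array(0, 0, 1, 2), Array(1, 2, 2, 0), iters = 20)
    println(r.mkString(", "))
  }
}
```

Every operation here touches contiguous memory in a predictable order, which is the hardware efficiency that distributed frameworks give up in exchange for scale-out.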

Page 16

How would you expect a single-thread QR implementation to perform?

Page 17

SUMMARY

GraphX: Combine graph processing with the relational model

COST
- Configuration that outperforms a single thread
- Measure scalability AND absolute performance

- Computation model of scalable frameworks might be limited
- Hardware efficiency matters
- System/language overheads

Page 18

NEXT STEPS

Next class: Marius

Project check-ins by Nov 20th

Page 19

OPTIMIZING MR TRIPLETS

Filtered Index Scanning
- Store edges clustered on source vertex id
- Filter triplets using a user-defined predicate
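Because edges are clustered on source id, all edges of one source occupy a contiguous range, so an index from source id to that range lets a scan visit only the edges of the sources that pass the filter (for example, the still-active vertices in the Pregel `subgraph` step), rather than testing every edge. This is a minimal local sketch. `FilteredIndexScan`, `ClusteredEdges`, and `scan` are hypothetical names, and the filter is represented as a set of active source ids.

```scala
object FilteredIndexScan {
  type Id = Long

  // Edges stored clustered (sorted) by source id, plus an index from each
  // source id to its contiguous [start, end) range in the sorted edge vector.
  final class ClusteredEdges(edges: Seq[(Id, Id)]) {
    private val sorted: Vector[(Id, Id)] = edges.sortBy(_._1).toVector
    private val index: Map[Id, (Int, Int)] =
      sorted.indices.groupBy(i => sorted(i)._1).map { case (s, is) => (s, (is.min, is.max + 1)) }

    // Visit only the edge ranges of the given active sources; edges of
    // inactive sources are never touched.
    def scan(activeSources: Iterable[Id]): Seq[(Id, Id)] =
      activeSources.toSeq.sorted.flatMap { s =>
        index.get(s) match {
          case Some((start, end)) => sorted.slice(start, end)
          case None               => Seq.empty
        }
      }
  }

  def main(args: Array[String]): Unit = {
    val ce = new ClusteredEdges(Seq((3L, 1L), (1L, 2L), (2L, 3L), (1L, 3L)))
    println(ce.scan(Set(1L)))
  }
}
```

The index pays off when the active set is small, which is common in later Pregel iterations as vertices converge.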

Automatic Join Elimination
- Some UDFs don’t access source or destination properties
- Inspect JVM bytecode to avoid joins
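The effect of join elimination can be sketched locally: when a UDF reads neither (or only one) of the endpoint attributes, the corresponding vertex join is skipped. This sketch swaps the slide's bytecode inspection for an explicit, caller-supplied declaration (the hypothetical `Uses` case class). The GraphX programming guide notes that later GraphX releases likewise moved to an explicit `TripletFields` argument on `aggregateMessages` in place of bytecode inspection. `JoinEliminationSketch` and `triplets` are hypothetical names, and the lookup counter is there only to make the saved work visible.

```scala
object JoinEliminationSketch {
  type Id = Long
  final case class Triplet[V, E](src: Id, dst: Id, srcAttr: Option[V], edgeAttr: E, dstAttr: Option[V])

  // Which vertex attributes a UDF reads. Per the slide, GraphX inferred this by
  // inspecting JVM bytecode; this sketch assumes the caller declares it instead.
  final case class Uses(src: Boolean, dst: Boolean)

  // Build triplets, joining the vertex table only for the sides the UDF uses.
  // Returns the triplets and the number of vertex lookups (joins) performed.
  def triplets[V, E](vertices: Map[Id, V], edges: Seq[(Id, Id, E)], uses: Uses): (Seq[Triplet[V, E]], Int) = {
    var lookups = 0
    def lookup(id: Id): Option[V] = { lookups += 1; vertices.get(id) }
    val ts = edges.toVector.map { case (s, d, e) =>
      Triplet(s, d, if (uses.src) lookup(s) else None, e, if (uses.dst) lookup(d) else None)
    }
    (ts, lookups)
  }

  def main(args: Array[String]): Unit = {
    val vs = Map(1L -> "a", 2L -> "b")
    val es = Seq((1L, 2L, 10), (2L, 1L, 20))
    println(triplets(vs, es, Uses(src = true, dst = true))._2)   // joins both sides
    println(triplets(vs, es, Uses(src = false, dst = false))._2) // edge-only UDF: no joins
  }
}
```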