Multithreaded Graph Coloring Algorithms for Scientific Computing on Many‐core Architectures
Assefaw Gebremedhin [email protected]
Purdue University
ICCS Workshop on Manycore and Accelerator‐based High‐Performance Scientific Computing
Berkeley, January 28, 2011
CSCAPES
www.cscapes.org
Coloring and its applications
• Graph coloring is an abstraction for partitioning a set of binary‐related objects into a few “independent sets”
• Coloring contributed to the growth of much of Graph Theory
• Our work on coloring is motivated by its practical applications:
– Concurrency discovery in parallel (scientific) computing
• Low available concurrency
• Poor data locality
• Irregular memory access pattern
• Access pattern determined only at runtime
• High data access to computation ratio
Parallel Coloring Algorithms
• Independent‐set based (previous approaches)
– Find maximal independent set in parallel (Luby’s algorithm)
– Limited (or no) success
• Iteration and speculation
Iterative Algorithm (G=(V,E))
  Order V in parallel
  U = V
  while U is not empty
    1. Speculatively color vertices in U in parallel;
    2. Check consistency of colors in U in parallel, store conflicts in R;
    U = R;
• Dataflow
– Fine‐grain (edge‐level) synchronization; no iteration
– Feasible when there is HW support for FGS (like the Cray XMT)
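The speculate-then-check rounds of the Iterative Algorithm can be illustrated with a small sequential simulation. This is only a sketch under stated assumptions, not the implementation from the talk: the names `iterative_coloring` and `first_fit`, the adjacency-dict input, ordering by vertex id, and the rule that the higher-numbered endpoint of a conflicting edge is recolored are all choices made here for illustration. Each simulated worker sees the colors from the start of the round plus its own writes, which is how two workers can end up giving adjacent vertices the same color.

```python
def first_fit(forbidden):
    """Return the smallest non-negative color not in `forbidden`."""
    c = 0
    while c in forbidden:
        c += 1
    return c


def iterative_coloring(adj, num_workers=4):
    """Simulate speculative coloring rounds on an undirected graph.

    adj maps each vertex to the set of its neighbors.
    Returns (color, rounds).
    """
    color = {v: None for v in adj}
    U = sorted(adj)  # "Order V": assumed here to be plain vertex-id order
    rounds = 0
    while U:
        rounds += 1
        # Step 1: speculative coloring. Workers read a stale snapshot,
        # so their choices may clash with each other.
        snapshot = dict(color)
        for w in range(num_workers):
            local = dict(snapshot)
            for v in U[w::num_workers]:
                forbidden = {local[n] for n in adj[v] if local[n] is not None}
                local[v] = first_fit(forbidden)
                color[v] = local[v]
        # Step 2: conflict detection. Of two clashing neighbors, only the
        # higher-numbered one is queued for recoloring (an assumed rule).
        R = [v for v in U
             if any(color[n] == color[v] and n < v for n in adj[v])]
        U = R
    return color, rounds
```

With one worker the simulation degenerates to sequential greedy coloring and finishes in a single round; with more workers, later rounds only touch the vertices that were found in conflict.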
Enhancing the Iterative Algorithm
• Color choice
– First Fit
– Staggered First Fit
– Least Used
– Random
• Resolving a conflict
– Randomization
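The four color-choice strategies are only named on the slide, so the sketch below fills in plausible definitions that should be read as assumptions: First Fit takes the smallest permissible color, Staggered First Fit starts each worker's search at a different offset within an estimated color range, Least Used prefers the permissible color with the lowest usage count so far, and Random picks uniformly among permissible colors. All function names and parameters (`estimate`, `usage`, `rng`) are hypothetical.

```python
import random


def first_fit(forbidden, start=0):
    """Smallest color >= start that is not forbidden."""
    c = start
    while c in forbidden:
        c += 1
    return c


def staggered_first_fit(forbidden, worker, num_workers, estimate):
    """First Fit with a per-worker starting offset in [0, estimate).

    `estimate` is an assumed guess at the number of colors needed.
    """
    start = (worker * estimate) // num_workers
    for k in range(estimate):
        c = (start + k) % estimate  # wrap around within the estimate
        if c not in forbidden:
            return c
    return first_fit(forbidden, start=estimate)  # open a new color


def least_used(forbidden, usage):
    """Permissible color with the fewest uses; usage[c] counts color c."""
    candidates = [c for c in range(len(usage)) if c not in forbidden]
    if not candidates:
        return first_fit(forbidden, start=len(usage))
    return min(candidates, key=lambda c: (usage[c], c))


def random_color(forbidden, num_colors, rng):
    """Uniformly random permissible color among the first num_colors."""
    candidates = [c for c in range(num_colors) if c not in forbidden]
    if not candidates:
        return first_fit(forbidden, start=num_colors)
    return rng.choice(candidates)
```

The intuition behind the non-First-Fit variants is that workers which do not all reach for the same small colors are less likely to collide during speculation, possibly at the cost of using more colors.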
Ordering is inherently sequential
Remedy: approximation
Illustration: Smallest Last ordering
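To see why Smallest Last ordering resists parallelization, it helps to look at the exact sequential procedure: repeatedly remove a vertex of minimum degree in the remaining graph, then reverse the removal sequence. The sketch below is a simple quadratic-time version written for clarity. The function name, the adjacency-dict input, and the tie-break by vertex id are assumptions made here. Each removal changes the degrees seen by the next step, which is the sequential dependence the slide refers to.

```python
def smallest_last_ordering(adj):
    """Return vertices in smallest-last order.

    adj maps each vertex to the set of its neighbors (undirected graph).
    The vertex removed first ends up last in the returned ordering.
    """
    degree = {v: len(adj[v]) for v in adj}
    remaining = set(adj)
    removed = []
    while remaining:
        # Pick a minimum-degree vertex; ties broken by vertex id (assumed).
        v = min(remaining, key=lambda x: (degree[x], x))
        remaining.remove(v)
        removed.append(v)
        # Removing v lowers the degree of its remaining neighbors.
        for n in adj[v]:
            if n in remaining:
                degree[n] -= 1
    removed.reverse()
    return removed
```

Coloring the vertices greedily in the returned order tends to use few colors, because each vertex has relatively few already-colored neighbors when its turn comes.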
Experimental Results on Parallel Performance
Test platforms
• Cray XMT
– 128 processors, 128 hardware thread streams per processor
– cache‐less, globally accessible shared memory
– hardware support for fine‐grain synchronization
• Sun Niagara 2
– two 8‐core sockets, 8 hardware threads per core
– L1 cache on core, shared L2 cache
• Intel Nehalem
– two quad‐core chips, two hyperthreads per core
– private L1 and L2 cache, shared L3 cache
Test graphs
• sc : graphs from scientific computing apps
• er : R‐MAT (0.25, 0.25, 0.25, 0.25)
• g : R‐MAT (0.45, 0.15, 0.15, 0.25)
• b : R‐MAT (0.55, 0.15, 0.15, 0.15)
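The four numbers after each R‐MAT label are the quadrant probabilities of the recursive matrix generator. A minimal sketch of how such a generator samples edges follows, assuming the usual convention (a, b, c, d) = (top‐left, top‐right, bottom‐left, bottom‐right). The function names, the `scale` parameter, and the choice to keep duplicate edges and self-loops are assumptions for illustration, not details taken from the talk.

```python
import random


def rmat_edge(scale, a, b, c, rng):
    """Sample one (row, col) cell of a 2**scale x 2**scale adjacency matrix.

    At each of `scale` levels a quadrant is chosen with probabilities
    a, b, c and d = 1 - a - b - c, adding one bit to row and column.
    """
    row = col = 0
    for _ in range(scale):
        r = rng.random()
        row <<= 1
        col <<= 1
        if r < a:
            pass            # top-left quadrant
        elif r < a + b:
            col |= 1        # top-right quadrant
        elif r < a + b + c:
            row |= 1        # bottom-left quadrant
        else:
            row |= 1        # bottom-right quadrant
            col |= 1
    return row, col


def rmat_edges(scale, num_edges, probs, seed=0):
    """Sample num_edges edges; duplicates and self-loops are kept."""
    a, b, c, d = probs
    assert abs(a + b + c + d - 1.0) < 1e-9
    rng = random.Random(seed)
    return [rmat_edge(scale, a, b, c, rng) for _ in range(num_edges)]
```

With equal probabilities, as for the er graphs, every cell is equally likely, which gives an Erdős–Rényi-like graph. Skewing the probabilities, as for g and b, concentrates edges on a subset of vertices and produces a more uneven degree distribution.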
Distance‐2 coloring: # colors
Nehalem
Distance‐2 coloring: runtime
Nehalem
Distance‐1 coloring: # colors
Nehalem, Niagara 2, Cray XMT
Distance‐1 coloring: runtime
[Figure: four runtime plots. Panels: Iterative (Niagara 2), Iterative (Nehalem), Iterative (Cray XMT), Dataflow (Cray XMT)]
Small‐world graph with 2^24 = 16M vertices and 134M edges
Small‐world graphs with 2^24, …, 2^27 vertices and 134M, …, 1B edges
Iterative: looking inside
Nehalem, Niagara 2, Cray XMT
A “generic” parallelization technique?
• “Standard” Partitioning
– Break up the given problem into p independent subproblems of almost equal sizes
– Solve the p subproblems concurrently
• “Relaxed” Partitioning
– Break up the problem into p, not necessarily entirely independent, subproblems of almost equal sizes
– Solve the p subproblems concurrently
– Detect inconsistencies in the solutions concurrently
– Resolve any inconsistencies
Can potentially be used successfully if the resolution in the fourth step involves only local adjustments
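The four steps of “relaxed” partitioning can be written as a small generic skeleton. The sketch below is one possible reading, not a method given in the talk: `relaxed_partitioning`, `make_coloring_problem`, and the callback names are hypothetical, and the concurrent steps are run one after another for simplicity. Graph coloring serves as the example instance: each part is colored while ignoring edges that leave the part, and the fix-up recolors one endpoint of every clashing edge using only its neighbors' colors, which is the kind of local adjustment the slide mentions.

```python
def relaxed_partitioning(items, p, solve, detect, resolve):
    """Partition, solve parts independently, then detect and resolve."""
    parts = [items[i::p] for i in range(p)]   # p parts of near-equal size
    solution = {}
    for part in parts:                        # conceptually concurrent
        solution.update(solve(part))
    bad = detect(solution)
    while bad:
        resolve(solution, bad)
        bad = detect(solution)
    return solution


def make_coloring_problem(adj):
    """Build solve/detect/resolve callbacks for coloring graph `adj`."""
    def first_fit(forbidden):
        c = 0
        while c in forbidden:
            c += 1
        return c

    def solve(part):
        inside, local = set(part), {}
        for v in part:
            # Only neighbors in the same part are considered here.
            forbidden = {local[n] for n in adj[v] if n in inside and n in local}
            local[v] = first_fit(forbidden)
        return local

    def detect(color):
        # Higher-numbered endpoint of every monochromatic edge.
        return sorted({v for v in adj for n in adj[v]
                       if n < v and color[n] == color[v]})

    def resolve(color, bad):
        for v in bad:                         # local adjustment per vertex
            color[v] = first_fit({color[n] for n in adj[v]})

    return solve, detect, resolve
```

A call such as `relaxed_partitioning(sorted(adj), 2, *make_coloring_problem(adj))` returns a proper coloring. The detect and resolve loop is cheap in this instance because each fix only inspects one vertex's neighborhood.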
Thanks
• Erik Boman, Doruk Bozdag, Umit Catalyurek, John Feo, Mahantesh Halappanavar, Bruce Hendrickson, Paul Hovland, Fredrik Manne, Duc Nguyen, Mostafa Patwary, Alex Pothen, Arijit Tarafdar, Andrea Walther
• Financial Support: DOE, NSF
Some References
• Gebremedhin, Nguyen, Pothen and Patwary. ColPack: Graph Coloring Software for Derivative Computation and Beyond. ACM Trans. Math. Softw. Submitted, 2010.
• Gebremedhin, Manne and Pothen. What color is your Jacobian? Graph coloring for computing derivatives. SIAM Review 47(4):627–705, 2005.
• Gebremedhin, Tarafdar, Manne and Pothen. New acyclic and star coloring algorithms with applications to computing Hessians. SIAM J. Sci. Comput. 29:1042–1072, 2007.
• Gebremedhin, Pothen and Walther. Exploiting sparsity in Jacobian computation via coloring and automatic differentiation: a case study in a Simulated Moving Bed process. AD2008, LNCSE 64:339–349, 2008.
• Catalyurek, Feo, Gebremedhin, Halappanavar, Pothen. Multithreaded Algorithms for Graph Coloring. In submission, 2011.
• Bozdag, Catalyurek, Gebremedhin, Manne, Boman and Ozguner. Distributed‐memory parallel algorithms for distance‐2 coloring and related problems in derivative computation. SIAM J. Sci. Comput. 32(4):2418–2446, 2010.
• Bozdag, Gebremedhin, Manne, Boman and Catalyurek. A framework for scalable greedy coloring on distributed‐memory parallel computers. J. Parallel Distrib. Comput. 68(4):515–535, 2008.
• For more information: www.cs.purdue.edu/homes/agebreme