
Reinders SIGGRAPH 2017 as presented - … · – Not Fortran, C, C++, Java, C#, Perl, Python, Ruby ... Interesting “new” aspect since TBB 4.0 Release (2011) ... Reinders SIGGRAPH

Sep 01, 2018

Page 1

Multithreading for Visual Effects

James Reinders, Andy Lin, Joe Longson, Jeff Lait, Florian Zitzelsberger, Martin De Lasa

Consultant (retired Intel), Disney Animation, Disney Animation, SideFX, Pixar, Autodesk

2pm-5:15pm, Theater 411, July 31, 2017

Page 2

Agenda

2:00pm - 2:40pm: Multithreading Introduction and Overview
James Reinders

2:40pm - 3:25pm: Parallel Processing and Multi-Process Evaluation in a Host Application
Andy Lin and Joe Longson

3:25pm: Break

3:35pm - 4:05pm: Threading SOPs in Houdini
Jeff Lait

4:05pm - 4:35pm: Building a Scalable Evaluation Engine for Presto
Florian Zitzelsberger

4:35pm - 5:15pm: Parallel Evaluation of Animated Maya Characters
Martin De Lasa

Page 3

www.multithreadingandvfx.org

Multithreading for Visual Effects

Page 4

“Think Parallel”

Page 5

“Think Parallel”
– Parallelism is almost never effective to “stick in” at the last minute in a program
– Think about everything in terms of how to do in parallel as cleanly as possible

Page 6

Motivation

Page 7

© 2017, James Reinders, used with permission. http://lotsofcores.com

1973 1 MHz, 2003 1 GHz
2004 3 GHz, still today

power wall + ILP wall + memory wall
solve: parallel hardware + explicit parallel software + software memory optimization

Page 8

© 2017, James Reinders, used with permission. http://lotsofcores.com

Page 9

Core and Thread Counts
Single core, single thread, ruled for decades.
Multithread: grow die area small % for additional hardware thread(s) sharing resources.
Multicore/Many Core: 100% die area for additional hardware thread without sharing.

Width
Data parallelism: handling more data at once, multibyte, multiword, many words.

Page 10

Parallel hardware designs can be more efficient if they can assume that programs will use scaling for performance well.

Page 11

Task
• Key thing to know:
– Program in TASKS not THREADS.

http://tinyurl.com/threadsYUCK

Page 12

Task
• Key thing to know:
– Program in TASKS not THREADS.

This means:
– Programmer’s job: identify (lots of) tasks to do
– Runtime’s job: map tasks onto threads at runtime
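As a rough illustration of this division of labor (not taken from the slides), the sketch below uses only the C++ standard library. The caller describes many small tasks; a tiny stand-in "runtime", the hypothetical helper `run_tasks`, decides how many threads to use and which thread runs which task. Both function names are assumptions made for this example, and a real runtime such as TBB schedules far more cleverly.

```cpp
#include <atomic>
#include <cassert>
#include <cstddef>
#include <functional>
#include <thread>
#include <vector>

// Hypothetical mini-runtime: maps a list of tasks onto worker threads.
// The caller never names a thread; it only supplies tasks.
void run_tasks(const std::vector<std::function<void()>>& tasks,
               unsigned workers = std::thread::hardware_concurrency()) {
    if (workers == 0) workers = 1;      // hardware_concurrency() may return 0
    std::atomic<std::size_t> next{0};   // index of the next unclaimed task
    auto worker = [&] {
        // Each worker keeps claiming tasks until none are left.
        for (std::size_t i = next.fetch_add(1); i < tasks.size();
             i = next.fetch_add(1)) {
            tasks[i]();
        }
    };
    std::vector<std::thread> pool;
    for (unsigned w = 0; w < workers; ++w) pool.emplace_back(worker);
    for (std::thread& t : pool) t.join();
}

// Programmer's job: cut the work into tasks (one per chunk of the arrays).
std::vector<float> add_in_tasks(const std::vector<float>& b,
                                const std::vector<float>& c,
                                std::size_t chunk) {
    assert(b.size() == c.size() && chunk > 0);
    std::vector<float> a(b.size());
    std::vector<std::function<void()>> tasks;
    for (std::size_t lo = 0; lo < a.size(); lo += chunk) {
        std::size_t hi = lo + chunk < a.size() ? lo + chunk : a.size();
        tasks.push_back([&a, &b, &c, lo, hi] {
            for (std::size_t i = lo; i < hi; ++i) a[i] = b[i] + c[i];
        });
    }
    run_tasks(tasks);   // runtime's job: map the tasks onto threads
    return a;
}
```

Calling `add_in_tasks(b, c, 64)` on 1000-element inputs creates 16 tasks regardless of how many hardware threads exist, which is the point: the task count follows the problem, the thread count follows the machine.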

Page 13

James’ BIG 3 REASONS
to avoid programming to specific hardware mechanisms

1. Portability is impaired severely when coding “close to the hardware”
2. Nested parallelism is IMPORTANT (nearly impossible to manage well using “mandatory parallelism” methods such as threads)
3. Other parallelism mechanisms (e.g., vectorization) need to be considered and balanced.

Page 14

Parallel Programming
• No widely used (popular) programming language was designed for expressing parallel programming.
– Not Fortran, C, C++, Java, C#, Perl, Python, Ruby
• This creates many challenges
• Fundamental question of all programming languages: level of abstraction

Page 15

Programming Model Ideal Goals
• Performance
– achievable, scalable, predictable, tunable, portable
• Productivity
– expressive, composable, debuggable, maintainable
• Portability
– functionality & performance across operating systems, compilers, targets

Page 16

Level of Abstraction: Parallelism
• There is no “perfect” answer (one size fits all)
• Higher level programming (more abstract):
– Desired benefits: More portable, more performance portable, better investment preservation over time.
• Lower level programming (less abstract):
– Desired benefits: More control for the programmer.

Page 17

Level of Abstraction: Parallelism
• There is no “perfect” answer (one size fits all)
• Higher level programming (more abstract):
– Desired benefits: More portable, more performance portable, better investment preservation over time.
TBB, OpenMP*
• Lower level programming (less abstract):
– Desired benefits: More control for the programmer.
OpenCL*, CUDA*

* Third party marks may be claimed as the property of others.

Page 18

Advancing C and C++

Page 19

www.threadingbuildingblocks.org
• Most popular C++ abstraction
• Windows*
• Linux*
• Mac OS* X
• Xbox 360
• Solaris*
• FreeBSD*
• Intel processors
• AMD processors
• SPARC processors
• IBM processors
• open source
• standard committee submissions

The most used method to parallelize C++ programs.

* Other names and brands may be claimed as the property of others.

Page 20

Easier to maintain, scales better, easier to debug (get correct)

Page 21

Work Stealing

[Diagram: four threads, each with its own deque and mailbox]

Locality: 1. Take youngest task from my deque
Cache Affinity: 2. Steal task advertised in mailbox
Load balance: 3. Steal oldest task from random victim
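The three rules above can be sketched as a single lookup function. This is a serial toy model under stated assumptions: `Worker`, `Task`, and `next_task` are hypothetical names for illustration, not TBB's internal data structures, and the victim is passed in by the caller rather than chosen at random.

```cpp
#include <cassert>
#include <deque>
#include <optional>

using Task = int;  // stand-in for a real task object

// Toy per-thread state (hypothetical; real schedulers use concurrent deques).
struct Worker {
    std::deque<Task> deque;        // back = youngest task, front = oldest task
    std::optional<Task> mailbox;   // task advertised to this worker, if any
};

// Pick the next task for `me`, applying the three rules in order.
std::optional<Task> next_task(Worker& me, Worker& victim) {
    assert(&me != &victim);                // a worker does not steal from itself
    if (!me.deque.empty()) {               // 1. youngest task from my own deque
        Task t = me.deque.back();
        me.deque.pop_back();
        return t;
    }
    if (me.mailbox) {                      // 2. task advertised in my mailbox
        Task t = *me.mailbox;
        me.mailbox.reset();
        return t;
    }
    if (!victim.deque.empty()) {           // 3. oldest task from the victim
        Task t = victim.deque.front();
        victim.deque.pop_front();
        return t;
    }
    return std::nullopt;                   // nothing left to run
}
```

Taking from the back of the local deque favors recently touched data, while stealing from the front of a victim's deque takes the task that has been waiting longest.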

Page 22

Work Depth First; Steal Breadth First

[Diagram labels: L1, L2, victim thread]

Best choice for theft!
• big piece of work
• data far from victim’s hot data.

Second best choice.

Page 23

Scale Forward

Intel® Threading Building Blocks scales exceptionally well

Page 24

TBB Components

Generic Parallel Algorithms
Efficient scalable way to exploit the power of multi-core without having to start from scratch.

Concurrent Containers
Concurrent access, and a scalable alternative to serial containers with external locking

Task Scheduler
Sophisticated work scheduling engine that empowers parallel algorithms and the flow graph

Threads
OS API wrappers

Miscellaneous
Thread-safe timers and exception classes

Memory Allocation
Scalable memory manager and false-sharing free allocators

Synchronization Primitives
Atomic operations, a variety of mutexes with different properties, condition variables

Flow Graph
A set of classes to express parallelism as a graph of compute dependencies and/or data flow

Thread Local Storage
Unlimited number of thread-local variables

Groupings in the diagram:
Parallel algorithms and data structures
Threads and synchronization
Memory allocation and task scheduling

Page 25

without lambda, code has to go in a class

#include <tbb/parallel_for.h>
#include <tbb/blocked_range.h>
using namespace tbb;

class ApplyABC {
public:
    float *a;
    float *b;
    float *c;
    ApplyABC(float *a_, float *b_, float *c_) : a(a_), b(b_), c(c_) {}
    // Called by the scheduler once per sub-range.
    void operator()(const blocked_range<size_t>& r) const {
        for (size_t i = r.begin(); i != r.end(); ++i)
            a[i] = b[i] + c[i];
    }
};

void ParallelFoo(float *a, float *b, float *c, int n) {
    // Third argument of blocked_range is the grain size.
    parallel_for(blocked_range<size_t>(0, n, 10000), ApplyABC(a, b, c));
}

putting code in a class is annoying

Page 26

doing it with lambda support is more natural

// a, b and c are assumed to be float arrays visible in the enclosing scope
void ParallelApplyFoo(size_t n, int x) {
    parallel_for( blocked_range<size_t>(0,n,1000),
        [&]( const blocked_range<size_t>& r ) -> void {
            for( size_t i=r.begin(); i!=r.end(); ++i )
                a[i] = b[i] + c[i];
        }
    );
}

much easier to teach/read with lambdas
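To make the role of the third `blocked_range` argument (the grain size) more concrete, here is a serial sketch of recursive range splitting written with the standard library only. It is an illustration under assumptions, not TBB's implementation: `split_and_apply` and `add_by_chunks` are hypothetical names, and the two halves are processed one after the other where a real scheduler could hand them to different threads.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Serial model of recursive range splitting (illustration only).
// Ranges larger than `grain` are halved; smaller ones go to `body` whole.
template <typename Body>
void split_and_apply(std::size_t begin, std::size_t end, std::size_t grain,
                     const Body& body) {
    assert(grain > 0 && begin <= end);
    if (end - begin > grain) {
        std::size_t mid = begin + (end - begin) / 2;
        split_and_apply(begin, mid, grain, body);  // a scheduler could run
        split_and_apply(mid, end, grain, body);    // these halves in parallel
    } else if (begin < end) {
        body(begin, end);
    }
}

// Same element-wise add as on the slides, driven by the splitter above.
// chunks_out (optional) receives the number of sub-ranges that were processed.
std::vector<float> add_by_chunks(const std::vector<float>& b,
                                 const std::vector<float>& c,
                                 std::size_t grain, std::size_t* chunks_out) {
    assert(b.size() == c.size());
    std::vector<float> a(b.size());
    std::size_t chunks = 0;
    split_and_apply(0, a.size(), grain, [&](std::size_t lo, std::size_t hi) {
        ++chunks;
        for (std::size_t i = lo; i != hi; ++i) a[i] = b[i] + c[i];
    });
    if (chunks_out) *chunks_out = chunks;
    return a;
}
```

With 4000 elements and a grain of 1000, the range is halved twice and the body runs on four sub-ranges of 1000 elements each. A smaller grain yields more, finer tasks; a larger one yields fewer, coarser tasks.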

Page 27

Intel® TBB Class Graph: Components
Interesting “new” aspect since TBB 4.0 Release (2011)

• Graph object
– Contains a pointer to the root task
– Owns tasks created on behalf of the graph
– Users can wait for the completion of all tasks of the graph
• Graph nodes
– Implement sender and/or receiver interfaces
– Nodes manage messages and/or execute function objects
• Edges
– Connect predecessors to successors

[Diagram labels: Graph object == graph handle, Graph node, Edge]
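The node and edge roles described above can be modeled in a few lines of standard C++. This is a deliberately simplified, serial toy: `ToyNode`, `toy_make_edge`, and `run_toy_pipeline` are hypothetical names invented for this sketch and are not the TBB flow graph API, which also owns tasks and runs nodes concurrently.

```cpp
#include <cassert>
#include <functional>
#include <vector>

// Toy node: executes a function object on each incoming message and
// forwards the result to every successor.
struct ToyNode {
    std::function<int(int)> body;        // the node's function object
    std::vector<ToyNode*> successors;    // outgoing edges
    std::vector<int> outputs;            // results kept when there is no successor

    void put(int message) {
        assert(static_cast<bool>(body)); // a node needs something to execute
        int result = body(message);
        if (successors.empty()) {
            outputs.push_back(result);
            return;
        }
        for (ToyNode* s : successors) s->put(result);
    }
};

// An edge connects a predecessor to a successor.
void toy_make_edge(ToyNode& pred, ToyNode& succ) {
    pred.successors.push_back(&succ);
}

// Two-node graph: square each input, then add one.
std::vector<int> run_toy_pipeline(const std::vector<int>& inputs) {
    ToyNode square{[](int v) { return v * v; }, {}, {}};
    ToyNode plus_one{[](int v) { return v + 1; }, {}, {}};
    toy_make_edge(square, plus_one);
    for (int v : inputs) square.put(v);
    return plus_one.outputs;
}
```

In this model a message entering `square` flows along the edge to `plus_one`, so inputs 1, 2, 3 come out as 2, 5, 10.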

Page 28

TBB
• Reasons it might matter:
– Everywhere (ports, open source)
– Forever (commercial and open source)
– Performance Portable
• Problems:
– C++ not C

threadingbuildingblocks.org

Page 29

Cilk™ Plus
• Reasons it might matter:
– Space/time guarantees
– Performance Portable
– C++ and C
– Keywords bring compiler into the “know”
– “Parent stealing”
– Vectorization help too (array notations, elem. func, simd)
• Problems:
– Requires compiler changes
– Not feature rich
– Only in Intel and gcc compilers (and some in Clang/LLVM)
– Standards adoption still “future”

cilkplus.org

Page 30

OpenMP*
• Reasons it might matter:
– Everywhere (all major compilers)
– Solutions for Tasking, vectorization, offload
– Performance Portable
• Problems:
– C and Fortran, not so much C++
– Not composable
– Not always in-sync with language standards

* Third party marks may be claimed as the property of others.

openmp.org

Page 31

OpenCL*
• Reasons it might matter:
– Explicit heterogeneous controls
– Everywhere (ports)
– Non-proprietary
– Underpinning for tools and libraries
• Problems:
– Low level
– Not performance portable

* Third party marks may be claimed as the property of others.

khronos.org/opencl

Page 32

Choice is Good
• Our favorite programming languages were NOT DESIGNED for parallelism.

• They need HELP.

• Multiple approaches and options are NEEDED.

Page 33

Vectorization: Who, What, Why, When

Page 34

A moment of clarity

task parallelism doesn’t matter without data parallelism

(James’ observation about what Amdahl and Gustafson were ultimately pointing out)
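For readers who want the two laws in front of them, the sketch below states the standard textbook formulas as small functions. The function names and the parameterization (parallel fraction `p`, processor count `n`) are assumptions of this example; the slide itself gives no formulas, and the connection drawn afterwards is one common reading rather than a quotation.

```cpp
#include <cassert>

// Amdahl's law: fixed problem size; a fraction p of the work runs in
// parallel on n processors, the rest stays serial.
double amdahl_speedup(double p, double n) {
    assert(p >= 0.0 && p <= 1.0 && n >= 1.0);
    return 1.0 / ((1.0 - p) + p / n);
}

// Gustafson's law: the problem grows with n; p is the parallel fraction
// of the scaled run, so the parallel part contributes p * n.
double gustafson_speedup(double p, double n) {
    assert(p >= 0.0 && p <= 1.0 && n >= 1.0);
    return (1.0 - p) + p * n;
}
```

With `p = 0.9`, Amdahl's formula never exceeds 10 however large `n` becomes, while Gustafson's formula keeps growing with `n`. The difference is that in the second case there is more data to work on as the machine grows, which is where data parallelism enters.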

Page 35

Parallel first

Vectorize second

Page 36

Multithreading is more powerful than vectorization – by simple math:
16 way from vectorization
288-way from thread parallelism

Page 37

16 × 288 = 4608

Page 38

vector data operations: data operations done in parallel

void v_add (float *c, float *a, float *b)
{
    for (int i=0; i<= MAX; i++)
        c[i]=a[i]+b[i];
}

Page 39

vector data operations: data operations done in parallel

void v_add (float *c, float *a, float *b)
{
    for (int i=0; i<= MAX; i++)
        c[i]=a[i]+b[i];
}

Loop:
1. LOAD a[i] -> Ra
2. LOAD b[i] -> Rb
3. ADD Ra, Rb -> Rc
4. STORE Rc -> c[i]
5. ADD i + 1 -> i

Page 40

vector data operations: data operations done in parallel

void v_add (float *c, float *a, float *b)
{
    for (int i=0; i<= MAX; i++)
        c[i]=a[i]+b[i];
}

Loop:
1. LOAD a[i] -> Ra
2. LOAD b[i] -> Rb
3. ADD Ra, Rb -> Rc
4. STORE Rc -> c[i]
5. ADD i + 1 -> i

Loop:
1. LOADv4 a[i:i+3] -> Rva
2. LOADv4 b[i:i+3] -> Rvb
3. ADDv4 Rva, Rvb -> Rvc
4. STOREv4 Rvc -> c[i:i+3]
5. ADD i + 4 -> i

Page 41

vector data operations: data operations done in parallel

void v_add (float *c, float *a, float *b)
{
    for (int i=0; i<= MAX; i++)
        c[i]=a[i]+b[i];
}

Loop:
1. LOAD a[i] -> Ra
2. LOAD b[i] -> Rb
3. ADD Ra, Rb -> Rc
4. STORE Rc -> c[i]
5. ADD i + 1 -> i

Loop:
1. LOADv4 a[i:i+3] -> Rva
2. LOADv4 b[i:i+3] -> Rvb
3. ADDv4 Rva, Rvb -> Rvc
4. STOREv4 Rvc -> c[i:i+3]
5. ADD i + 4 -> i

We call this “vectorization”

Page 42

vector data operations: data operations done in parallel

void v_add (float *c, float *a, float *b)
{
    for (int i=0; i<= MAX; i++)
        c[i]=a[i]+b[i];
}

PROBLEM: This LOOP is NOT LEGAL to (automatically) VECTORIZE in C / C++ (without more information).
• Arrays not really in the language
• Pointers are, evil pointers!
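One way to see why the compiler needs more information: if `c` overlaps `a`, the element-at-a-time loop and a "load four, then store four" loop give different answers. The sketch below models the 4-wide version in plain C++ with no intrinsics. All function names are hypothetical, and an explicit length `n` stands in for the slides' `MAX` bound.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Scalar reference: one element at a time, in program order.
void add_scalar(float* c, const float* a, const float* b, std::size_t n) {
    for (std::size_t i = 0; i < n; ++i) c[i] = a[i] + b[i];
}

// Model of a 4-wide vector loop: load 4 values of a and b, then store 4 of c.
void add_blocked4(float* c, const float* a, const float* b, std::size_t n) {
    assert(n % 4 == 0);  // keep the sketch simple: no remainder loop
    for (std::size_t i = 0; i < n; i += 4) {
        float ra[4], rb[4];
        for (int k = 0; k < 4; ++k) { ra[k] = a[i + k]; rb[k] = b[i + k]; }
        for (int k = 0; k < 4; ++k) c[i + k] = ra[k] + rb[k];
    }
}

// Separate arrays: both loops agree.
bool disjoint_results_match() {
    std::vector<float> a(8, 1.0f), b(8, 2.0f), c1(8, 0.0f), c2(8, 0.0f);
    add_scalar(c1.data(), a.data(), b.data(), 8);
    add_blocked4(c2.data(), a.data(), b.data(), 8);
    return c1 == c2;
}

// c is the same buffer as a, shifted by one element: each store feeds the
// next iteration's load in the scalar loop, but not in the blocked loop.
bool overlapping_results_match() {
    std::vector<float> x(9, 1.0f), y(9, 1.0f), b(8, 1.0f);
    add_scalar(x.data() + 1, x.data(), b.data(), 8);
    add_blocked4(y.data() + 1, y.data(), b.data(), 8);
    return x == y;
}
```

Because a caller is allowed to pass overlapping pointers, a compiler that cannot rule this out has to keep the scalar order, or add a runtime overlap check, unless it is told the pointers do not alias.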

Page 43

vectorization solutions
1. auto-vectorization (let the compiler do it when legal)
– sequential languages and practices get in the way
2. give the compiler hints
– C99 “restrict” keyword (implied in FORTRAN since 1956)
– Traditional pragmas like “#pragma IVDEP”
– Cilk Plus #pragma simd
– OpenMP 4.0 #pragma omp simd
3. code differently
– simd instruction intrinsics
– Cilk Plus array notations
– Cilk Plus __declspec (vector)
– OpenMP 4.0 #pragma omp declare simd
– OpenCL / CUDA kernel functions

Page 44

vectorization solutions
1. auto-vectorization (let the compiler do it when legal)
– sequential languages and practices get in the way
2. give the compiler hints
– C99 “restrict” keyword (implied in FORTRAN since 1956)
– Traditional pragmas like “#pragma IVDEP”
– Cilk Plus #pragma simd
– OpenMP 4.0 #pragma omp simd
3. code differently
– simd instruction intrinsics
– Cilk Plus array notations
– Cilk Plus __declspec (vector)
– OpenMP 4.0 #pragma omp declare simd
– OpenCL / CUDA kernel functions

In all cases, studying “vectorization reports” can become a way of life.

Page 45

C99 “restrict” keyword

void v_add (float *restrict c,
            float *restrict a,
            float *restrict b)
{
    for (int i=0; i<= MAX; i++)
        c[i]=a[i]+b[i];
}

Page 46

IVDEP

void v_add (float *c,
            float *a,
            float *b)
{
#pragma IVDEP
    for (int i=0; i<= MAX; i++)
        c[i]=a[i]+b[i];
}

Page 47

simd

void v_add (float *c,
            float *a,
            float *b)
{
#pragma simd
    for (int i=0; i<= MAX; i++)
        c[i]=a[i]+b[i];
}

Page 48

omp simd

void v_add (float *c,
            float *a,
            float *b)
{
#pragma omp simd
    for (int i=0; i<= MAX; i++)
        c[i]=a[i]+b[i];
}

Page 49

Many choices… I recommend:

omp simd

void v_add (float *c,
            float *a,
            float *b)
{
#pragma omp simd
    for (int i=0; i<= MAX; i++)
        c[i]=a[i]+b[i];
}
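A compilable variant of the `omp simd` approach might look like the sketch below. Several details are assumptions of this example rather than part of the slides: the explicit length `n` replaces the `MAX` bound, `__restrict` is a widely supported compiler extension used here because C++ has no `restrict` keyword, and the helper `add_one_pair` exists only to show `#pragma omp declare simd`.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Element function: "declare simd" asks the compiler to also generate a
// vector variant that handles several elements per call.
#pragma omp declare simd
inline float add_one_pair(float x, float y) { return x + y; }

// Hint-based version of v_add: the pragma requests vectorization of the
// loop, and __restrict promises that the three arrays do not overlap.
void v_add(float* __restrict c, const float* __restrict a,
           const float* __restrict b, std::size_t n) {
    assert(n == 0 || (a && b && c));
    #pragma omp simd
    for (std::size_t i = 0; i < n; ++i)
        c[i] = add_one_pair(a[i], b[i]);
}
```

With GCC or Clang, the `-fopenmp-simd` option enables these pragmas without pulling in the OpenMP threading runtime. Without it the pragmas are ignored and the loop still computes the same result, just without the vectorization request.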

Page 50

Data Layout: AoS vs. SoA

Structure of arrays (SoA) can be easily aligned to cache boundaries and is vectorizable.

Array of structures (AoS) tends to cause cache alignment problems, and is hard to vectorize.
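The two layouts can be written down as follows. The type and function names (`ParticleAoS`, `ParticlesSoA`, `shift_x_aos`, `shift_x_soa`) are hypothetical, chosen only to show how the same update touches memory differently in each layout.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Array of structures: x, y, z of one particle sit next to each other.
struct ParticleAoS { float x, y, z; };

// Structure of arrays: each component is stored in its own contiguous array.
struct ParticlesSoA {
    std::vector<float> x, y, z;
};

// AoS: consecutive x values are sizeof(ParticleAoS) bytes apart (strided).
void shift_x_aos(std::vector<ParticleAoS>& p, float dx) {
    for (std::size_t i = 0; i < p.size(); ++i) p[i].x += dx;
}

// SoA: consecutive x values are adjacent in memory (unit stride).
void shift_x_soa(ParticlesSoA& p, float dx) {
    assert(p.x.size() == p.y.size() && p.y.size() == p.z.size());
    for (std::size_t i = 0; i < p.x.size(); ++i) p.x[i] += dx;
}
```

Both functions compute the same thing. The SoA loop reads and writes one dense array of floats, which maps directly onto vector loads and stores, whereas the AoS loop skips over `y` and `z` between every pair of `x` values.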

Page 51

Data Layout: Alignment

Array of Structures (AoS), padding at end.

Array of Structures (AoS), padding after each structure.

Structure of Arrays (SoA), padding at end.

Structure of Arrays (SoA), padding after each component.

Page 52

Think Parallel
– Parallelism does NOT work well if “added” at the last minute in a program
– think about everything: “how to do in parallel cleanly”

Page 53

http://www.multithreadingandvfx.org/ (downloads include material from today)

2014: Multithreading for Visual Effects
Expansion of material covered in this tutorial plus some extras.
English

2007: Intel Threading Building Blocks
Remains an excellent introduction to TBB; newer features are not covered
English, Chinese, Japanese, Korean
http://threadingbuildingblocks.com

2012: Structured Parallel Programming
Excellent text for essentials of parallel programming for C/C++
English, Japanese
http://parallelbook.com (website has free teaching materials)
http://lotsofcores.com

High Performance Parallelism
2015 Pearls Volume Two
2014 Pearls Volume One
English

2016: Intel® Xeon Phi™ Processor High Performance Programming, Knights Landing Edition
English

Page 54

www.multithreadingandvfx.org

Multithreading for Visual Effects

Thank you!

Page 55

Agenda

2:00pm - 2:40pm: Multithreading Introduction and Overview
James Reinders

2:40pm - 3:25pm: Parallel Processing and Multi-Process Evaluation in a Host Application
Andy Lin and Joe Longson

3:25pm: Break

3:35pm - 4:05pm: Threading SOPs in Houdini
Jeff Lait

4:05pm - 4:35pm: Building a Scalable Evaluation Engine for Presto
Florian Zitzelsberger

4:35pm - 5:15pm: Parallel Evaluation of Animated Maya Characters
Martin De Lasa