Exascale Programming Models Lecture Series 06/12/2014
What is OCR?
TG Team (presenter: Romain Cledat)
June 12, 2014
https://xstackwiki.modelado.org/Traleika_Glacier/
This research was, in part, funded by the U.S. Government, DOE and DARPA. The views and conclusions contained in this document are those of the authors and should not be interpreted as representing the official policies, either expressed or implied, of the U.S. Government.

OCR
• OCR
  – Open Community Runtime
  – Developed collaboratively with partners (mainly Rice University and Reservoir Labs)
• The term ‘OCR’ is used to refer to way too many concepts
  – A programming model
  – A user-level API
  – A runtime framework
  – One of a multitude of reference runtime implementations
Comment (Seshasayee, Bala): Might want to de-emphasize this. We want it to be just the programming model (which we'll cover along with *our* runtime framework). Any other usage is incorrect and should not be acknowledged.

TG X-Stack project goals
• Design a software stack to meet Exascale goals
  – Target a strawman architecture
  – Provide a programming model, API, reference implementation and tools
• Concerns
  – Extreme hardware parallelism
  – Data locality
  – Fine-grained resource management
  – Resiliency
  – Power and energy and not just performance
  – Platform independence

Dataflow programming model
[Figure: data-flow graph with nodes mainEdt, fibIterEdt (three instances), sumEdt and finishEdt; edge labels N, N-1, N-2; legend: EDT, Datablock, Create, Event.]
Runtime maps the constructed data-flow graph to architecture
[Figure: target architecture. Repeated blocks of eight PEs plus one Service Core sharing a 1MB L2; the blocks are joined by an interconnect to a shared LLC.]
Comment (Seshasayee, Bala): Emphasize the "separation of concerns" in comments, which will smoothly transition to the next slide...

OCR level of abstraction
// TBB-style parallel loop: the library splits the range into blocks
// and applies the Average body object to each block.
void ParallelAverage( float* output, const float* input, size_t n ) {
    Average avg;            // body object holding the loop state
    avg.input = input;
    avg.output = output;
    parallel_for( blocked_range<int>( 1, n ), avg );
}

OCR concepts
• Common
  – All objects globally and uniquely identifiable and relocatable
• Computation
  – Event Driven Task (EDT)
  – Does not perform synchronization
  – Distinct from the notion of thread or core
• Data
  – Data-block (DB)
  – Relocatable contiguous chunk of data
• Synchronization, links
  – Events
  – Runtime-visible
• Slots
  – Positional end-points for dependences

OCR concepts: building blocks
• EDT
  – N pre slots (N known at creation time)
  – Optional attached “completion event”
• Data
  – No pre slots
  – Post slot always “satisfied”
• Evt
  – N pre slots (N fixed by type of event, NOT determined by user)
  – Post slot initially “unsatisfied”
• Slot is:
  – Connected (attached to another slot) or unconnected
  – Satisfied (user-triggered or runtime-triggered) or unsatisfied
[Figure: Evt and EDT drawn with pre slots numbered 0 to N; labels: pre slots, post slots (multiple connections).]
Comment (Seshasayee, Bala): This slide needs to be redone slightly to reduce confusion. The next 2 slides need to be redone heavily as they're difficult to grasp.

OCR concepts: add dependence
[Figure: add dependence. Argument 1 is a data-block (Data) or an event (Evt); argument 2 is an EDT or an event (Evt), each drawn with pre slots 0 to N. The result is a connected pair. 1 of 4 possible combinations shown.]

OCR concepts: satisfy
[Figure: satisfy. Argument 1 is an EDT or an event (Evt), each drawn with pre slots 0 to N; argument 2 is a data-block (Data) or NULL. The result is a satisfied/triggered slot carrying the data. 1 of 4 possible combinations shown.]

OCR execution model for EDTs
• EDTs
  – 0..N in/out pre-slots
    • Slots are initially “unconnected” and “unsatisfied”
    • At creation time, the number of incoming slots must be known
  – An EDT executes after all pre slots are “satisfied”
    • Satisfaction of pre slots can happen in any order
  – An EDT can access memory:
    • Data-blocks:
      – passed in through one of its in/out slots (the EDT gets a C pointer)
      – created by the EDT
    • Stack and ephemeral heap (local)
    • NO global memory
  – An EDT, during its execution, can at any time:
    • Write to any accessible data-blocks
    • Manipulate the dependence graph for future (not yet runnable) EDTs by adding dependences, satisfying events, etc.
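
The firing rule above (a fixed number of pre slots, execution only after all of them are satisfied, in any order) can be sketched with a toy model. This is a minimal Python illustration, not the OCR API: the `Edt` class, its `satisfy` method and the `num_slots` parameter are hypothetical names, and the task body runs inline where a real runtime would only mark the task runnable and schedule it later.

```python
class Edt:
    """Toy event-driven task: its body runs once every pre slot is satisfied."""

    def __init__(self, func, num_slots):
        # The number of pre slots is fixed when the task is created.
        self.func = func
        self.payloads = [None] * num_slots
        self.satisfied = [False] * num_slots
        self.result = None
        self.done = False
        if num_slots == 0:
            self._run()  # nothing to wait for

    def satisfy(self, slot, payload=None):
        if self.satisfied[slot]:
            raise RuntimeError(f"slot {slot} already satisfied")
        self.satisfied[slot] = True
        self.payloads[slot] = payload
        if all(self.satisfied):
            self._run()

    def _run(self):
        # Hypothetical simplification: run inline instead of scheduling.
        self.result = self.func(*self.payloads)
        self.done = True


sum_edt = Edt(lambda a, b: a + b, num_slots=2)
sum_edt.satisfy(1, 8)        # slots may be satisfied in any order
ran_early = sum_edt.done     # slot 0 is still pending at this point
sum_edt.satisfy(0, 5)        # last slot satisfied: the body runs
```

The payloads reach the body by slot position, which mirrors the slide's description of slots as positional end-points.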

Example 1: Producer/Consumer
• Dynamic dependence construction
• Producer and consumer never know about each other
• Focus on minimum needed for placement and scheduling
[Figure: two panels. "Concept": Producer EDT, Data, Consumer EDT. "OCR": Producer EDT, Data, an event (Evt) and Consumer EDT; calls shown: (1) dbCreate, (2) edit Data, (3) satisfy, (*) addDep. Legend: who executes call, data dependence, control dependence.]
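
The producer/consumer pattern can be sketched as follows. This is a toy Python model under stated assumptions, not OCR code: the `Event` class and its methods are hypothetical, the data-block is a plain list, and the sketch assumes the producer performs steps (1) to (3) while some third party performs the addDep step (*). The point it illustrates is that the producer only sees the event and the consumer only sees the data.

```python
class Event:
    """Toy pass-through event: forwards its payload to every connected slot."""

    def __init__(self):
        self.listeners = []
        self.fired = False
        self.payload = None

    def add_dependence(self, listener):
        # Connecting after the event fired delivers the payload right away.
        if self.fired:
            listener(self.payload)
        else:
            self.listeners.append(listener)

    def satisfy(self, payload=None):
        self.fired = True
        self.payload = payload
        for listener in self.listeners:
            listener(payload)
        self.listeners.clear()


consumed = []


def consumer(data_block):
    # The consumer receives the data; it never references the producer.
    consumed.append(list(data_block))


def producer(out_event):
    data_block = [0] * 4                  # (1) dbCreate
    for i in range(len(data_block)):      # (2) edit Data
        data_block[i] = i * i
    out_event.satisfy(data_block)         # (3) satisfy


evt = Event()
evt.add_dependence(consumer)              # (*) addDep, done by neither party
producer(evt)
```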

Example 2: Simple synchronization
• Control dependence is no different than a data dependence
[Figure: two panels. "Concept": Step 1 EDT followed by Step 2-a EDT and Step 2-b EDT. "OCR": Step 1 EDT, an event (Evt), NULL, Step 2-a EDT and Step 2-b EDT; calls shown: (1) satisfy, (*) addDep.]

OCR execution model for events
• Events
  – 0..N pre slots
    • Slots are initially “unconnected” and “unsatisfied”
  – Events have a “trigger” rule that determines when their post slot transitions to “satisfied” and what gets connected to it
    • Simple event (pass-through)
      – 1 pre slot
      – When: satisfy post slot on incoming slot satisfaction
      – What: whatever is on incoming slot (pass GUID)
    • Latch event (multi-party synchronization)
      – 2 pre slots; “waiting-on” count and current count
      – When: satisfy outgoing slot when number of satisfies on both pre slots matches (similar to reference count in TBB)
      – What: NULL (incoming data-blocks are ignored)
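
The two trigger rules can be contrasted in a small sketch. This is a toy Python model, not the OCR API: the class names, the `WAITING_ON` and `CURRENT` slot constants and the listener callbacks are all hypothetical, chosen only to mirror the wording of the slide.

```python
class SimpleEvent:
    """Pass-through: the post slot carries whatever arrived on the pre slot."""

    def __init__(self):
        self.listeners = []

    def add_dependence(self, listener):
        self.listeners.append(listener)

    def satisfy(self, payload=None):
        for listener in self.listeners:
            listener(payload)


class LatchEvent:
    """Fires with None once both pre slots saw the same number of satisfies."""

    WAITING_ON, CURRENT = 0, 1   # hypothetical names for the two pre slots

    def __init__(self):
        self.counts = [0, 0]
        self.listeners = []
        self.fired = False

    def add_dependence(self, listener):
        self.listeners.append(listener)

    def satisfy(self, slot, payload=None):
        if self.fired:
            raise RuntimeError("latch already fired")
        self.counts[slot] += 1               # the payload itself is ignored
        if self.counts[0] == self.counts[1]:
            self.fired = True
            for listener in self.listeners:
                listener(None)


simple = SimpleEvent()
passed = []
simple.add_dependence(passed.append)
simple.satisfy("guid-of-some-data-block")

latch = LatchEvent()
fired = []
latch.add_dependence(fired.append)
for _ in range(3):                           # three parties are expected
    latch.satisfy(LatchEvent.WAITING_ON)
for _ in range(2):                           # two of them check in
    latch.satisfy(LatchEvent.CURRENT, payload="ignored")
still_waiting = not latch.fired
latch.satisfy(LatchEvent.CURRENT)            # third check-in: counts match
```

Under this rule the order of registrations and check-ins matters: if a party checks in before the others have registered, the counts can match early and the latch fires too soon.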

Example 3: In place parallel update
[Figure: two panels. "Concept": Setup EDT, Data, Parallel_1 EDT, Parallel_2 EDT, Wrapup EDT. "OCR": Setup EDT, Data, Parallel_1 EDT, Parallel_2 EDT, Finish EDT, Wrapup EDT; calls shown: (1) dbCreate, (1) edtCreate (twice), (2) addDep (twice), (3) edtCreate (twice), (4) addDep.]

Example 4: Single assignment update
[Figure: two panels. "Concept": Setup EDT, Data, Parallel_1 EDT, Parallel_2 EDT, Data1, Data2, Wrapup EDT. "OCR": the same EDTs and data-blocks plus events Evt1 and Evt2; calls shown: (1) dbCreate, (1) edtCreate (twice), (1) evtCreate, (2) addDep, (3) addDep, (4) dbCreate (twice), (5) satisfy (twice).]
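
The single assignment variant can be sketched as follows: each parallel task leaves the input untouched, creates a fresh output block, and hands it to the wrap-up task through its own slot. This is a toy Python model under stated assumptions, not OCR code: `make_join` and `parallel` are hypothetical helpers, tuples stand in for data-blocks, and the events Evt1 and Evt2 are collapsed into direct slot satisfaction.

```python
def make_join(num_slots, body):
    """Return satisfy(slot, payload); body runs once all slots are satisfied."""
    pending = set(range(num_slots))
    payloads = [None] * num_slots

    def satisfy(slot, payload):
        pending.remove(slot)       # raises KeyError if a slot is satisfied twice
        payloads[slot] = payload
        if not pending:
            body(*payloads)

    return satisfy


results = []
setup_data = (1, 2, 3, 4)          # input block created by the setup step

# Wrap-up task with two pre slots, one per parallel task.
wrapup = make_join(2, lambda d1, d2: results.append(d1 + d2))


def parallel(slot, chunk, scale):
    new_block = tuple(x * scale for x in chunk)   # fresh output block
    wrapup(slot, new_block)                       # satisfy the wrap-up slot


parallel(1, setup_data[2:], 10)    # completion order does not matter
parallel(0, setup_data[:2], 10)
```

Because every block is written once by its creator, the two parallel tasks never share writable data, at the cost of allocating two extra blocks compared with the in-place version.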

OCR ecosystem
[Figure: layered ecosystem diagram. Programming platforms: C, Array DSL, CnC, Hero Code, HC, HTA, PIL, with the CnC Translator, HC Compiler and R-Stream. Middle layer: OCR API + Tuning Annotations, Open Community Runtime. OCR implementations: OCR targeting TG, OCR targeting x86. Low-level compilers: LLVM, GCC. Platforms: FSim - TG Architecture, x86, Cluster. Also labelled: Evaluation platforms.]

Take-aways
• OCR API is at the “assembly” level; other tools are meant to sit between it and programmers
• Few simple concepts, multiple ways to use them
  – Interested in determining “best” use
• Dependence graph built on the fly:
  – Complicates the writing of the program
  – Scalable approach

Preliminary results
• On some code, OCR matches or bests OMP
• Simple scheduler, no data-blocks (very preliminary but promising)

Areas of investigation
• Development of a specification:
  – Memory model
• Tuning hints and annotations
• More expressive support for collectives

Backup