Design Patterns and Computer Architecture
Mark Murphy, Scott Beamer, Henry Cook, Andrew Waterman, Krste Asanovic, Kurt Keutzer
Dec 19, 2015
Design Patterns and Architecture
  Design patterns (so far) are good at exposing ||ism
    Only half of the battle / There is parallelism everywhere we look!
  We need to incorporate Architectural information
    But not too much: we don't want to drown in detail!
  Computer Architects need patterns too!
    Dwarfs were supposed to supplant benchmarks, remember?
    Dwarfs -> Computational Patterns: too vague for architects
  Do design pattern writers need architectural patterns?
    Standardize a vocabulary to discuss performance issues?
Work In Progress
  The point of this talk is not to present any results
  I want your input on the results of brainstorming sessions between myself and the Architecture research group
  There are 40 minutes for this -- ~20 of me presenting slides and the rest for discussion
Pattern Language Exposes ||ism
  [Figure: layered pattern language diagram, starting from Applications, with side labels Productivity Layer and Efficiency Layer]
  Structural Patterns: choose your high level structure
    Agent and repository, Layered systems, Arbitrary static task graph, Map reduce, Iterative refinement, Model view controller, Process control, Pipe-and-filter, Event based / implicit invocation, Puppeteer
  Computational Patterns: identify the key computations
    Dense linear algebra, Backtrack branch and bound, Monte carlo methods, Sparse linear algebra, Finite state machine, Dynamic programming, Unstructured grids, Graphical models, Graph algorithms, Structured grids, N-body methods, Circuits, Spectral methods
  Parallel Algorithm Strategy Patterns: refine the structure - what concurrent approach do I use? (Guided re-organization)
    Task Parallelism, Geometric Decomposition, Data Parallelism, Pipeline, Discrete Event, Recursive Splitting
  Implementation Strategy Patterns: utilize supporting structures - how do I implement my concurrency? (Guided mapping)
    Program Structure: Actors, SPMD, Master/Worker, Task queue, Strict data parallel, Loop parallelism, Graph partitioning, Fork/Join, BSP
    Data Structure: Shared queue, Distributed array, Shared data, Shared hash table, Memory parallelism
  Concurrent Execution Patterns: implementation methods - what are the building blocks of parallel programming? (Guided implementation)
    Advancing Program Counters: MIMD, Thread pool, Task graph, Speculation, SIMD, Data flow, Digital circuits
    Coordination: Message passing, Mutual exclusion, Collective communication, Transactional memory, Collective synchronization, P2P synchronization
Pattern Language Exposes ||ism
  Example from Machine Learning: Compute the gradient of a scalar function w.r.t. a matrix B
  Each entry of the gradient requires NxN BLAS2 matrix computations
Pattern Language Exposes ||ism
  Example from Quantum Chemistry: Need to compute a matrix of size <# basis functions> x <# electrons>
  Each entry of the matrix requires evaluating a number of functions, and summing the results
Pattern Language Exposes ||ism
  In both examples, we have (at least) two levels of ||ism
    Many entries in matrix (Task Parallel)
    Much work in computing each entry (Map/Reduce Data Parallel)
  The pattern language can pretty much tell us this
  However, the right parallel program for a GPU-like manycore processor looks different in the two cases
    for the Machine Learning problem, only parallelize the computation of each matrix element
    for the Chemistry problem, parallelize at both levels
  Knowing this requires understanding that GPU-like processors implement fine-grained data parallelism best
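The two levels of ||ism described for the matrix examples can be sketched in a few lines of Python. This is only an illustration under assumptions: `term`, `entry`, and `build_matrix` are hypothetical names, `term` is an arbitrary stand-in for the real per-entry functions, and threads are used only to express the structure, not to claim a speedup.

```python
from concurrent.futures import ThreadPoolExecutor
from functools import reduce
from itertools import product
import operator


def term(i, j, k):
    """Hypothetical stand-in for one of the functions evaluated per entry."""
    return (i + 1) * (j + 1) * k


def entry(i, j, n_terms):
    """Inner level (Map/Reduce): evaluate every term, then sum the results."""
    return reduce(operator.add, map(lambda k: term(i, j, k), range(n_terms)), 0)


def build_matrix(rows, cols, n_terms, parallel_entries=True):
    """Outer level (Task Parallel): every matrix entry is an independent task."""
    coords = list(product(range(rows), range(cols)))
    if parallel_entries:
        # One task per entry; the pool decides how tasks map to workers.
        with ThreadPoolExecutor() as pool:
            values = list(pool.map(lambda ij: entry(*ij, n_terms), coords))
    else:
        # Same computation with the outer level left serial.
        values = [entry(i, j, n_terms) for i, j in coords]
    return [values[r * cols:(r + 1) * cols] for r in range(rows)]


matrix = build_matrix(2, 3, 4)
```

The `parallel_entries` flag mirrors the choice discussed on the slide: the pattern language exposes both levels, but which of them to parallelize on a given machine is an architectural decision that the code structure alone does not settle.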
SW writers understand HW arch?
  There has been a sentiment that the pattern language should be architecture-agnostic
  Architectural savvy is required for decisions like these.
  Otherwise, the options are all unattractive:
    Implement every possible parallelization, choose best? ...
    Choose one parallelization, hope it works? ...
    Ask Bryan to parallelize your code?
  But clearly we can't write a pattern language around GTX200, just as we can't write it around LRB or Nehalem
Performance Models?
  Abstract, simplistic models to capture the essence of low-level performance issues.
  Extant example: LogP for distributed memory machines
    L -- Network latency for a message
    o -- CPU overhead of sending a message
    g -- gap = inverse of NIC bandwidth
    P -- number of processors
  [Figure: L-latency network]
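As a small worked example of how such a model is used, the sketch below estimates the time to deliver k back-to-back small messages between two processors. It follows the usual LogP accounting, with two assumptions not stated on the slide: the receiver pays the same overhead o as the sender, and consecutive sends are spaced by the larger of g and o. The class and function names are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class LogP:
    L: float  # network latency for one message
    o: float  # CPU overhead per message (assumed equal for send and receive)
    g: float  # gap: minimum interval between consecutive messages
    P: int    # number of processors


def point_to_point_time(model: LogP, k: int = 1) -> float:
    """Time until the last of k back-to-back small messages has been received."""
    if k < 1:
        raise ValueError("k must be at least 1")
    # The sender cannot inject messages faster than one per max(g, o).
    interval = max(model.g, model.o)
    # Last message leaves after (k - 1) intervals, then pays o + L + o.
    return (k - 1) * interval + model.o + model.L + model.o


machine = LogP(L=6.0, o=2.0, g=4.0, P=8)
single = point_to_point_time(machine)        # 2 + 6 + 2 = 10
burst = point_to_point_time(machine, k=3)    # 2 * 4 + 10 = 18
```

The parameter values are made up for illustration; the point is that four numbers are enough to reason about a communication pattern without knowing anything else about the machine.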
Performance Models?
  Could imagine a similar model for current manycores. How about this one?
  The BLIMP model:
    B(L) -- Bandwidth as function of load/store block size
    I -- # Instruction Fetch units
    M -- # Load/Store units
    P -- # Execution Pipelines
  [Figure: labels I = 4, P = 8]
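The slides do not say how the BLIMP parameters would be combined into a performance estimate. One hypothetical reading, sketched below, treats each parameter as an independent throughput limit and takes the slowest resource as a lower bound on cycles. Everything beyond the four parameter names is an assumption: the `Kernel` description, the units (bytes per cycle, one operation per unit per cycle), and the example bandwidth curve. Only I = 4 and P = 8 are taken from the figure labels.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass(frozen=True)
class Blimp:
    B: Callable[[int], float]  # bytes/cycle as a function of block size in bytes
    I: int                     # number of instruction fetch units
    M: int                     # number of load/store units
    P: int                     # number of execution pipelines


@dataclass(frozen=True)
class Kernel:
    """Hypothetical static description of a piece of code."""
    instructions: int
    mem_ops: int
    alu_ops: int
    bytes_moved: int
    block_size: int


def cycle_lower_bound(machine: Blimp, kernel: Kernel) -> float:
    """Assumed bound: the most heavily loaded resource limits the run time."""
    return max(
        kernel.instructions / machine.I,
        kernel.mem_ops / machine.M,
        kernel.alu_ops / machine.P,
        kernel.bytes_moved / machine.B(kernel.block_size),
    )


# Assumed bandwidth curve: small blocks get 4 bytes/cycle, blocks >= 64 B get 16.
machine = Blimp(B=lambda size: 16.0 if size >= 64 else 4.0, I=4, M=2, P=8)
wide = Kernel(instructions=800, mem_ops=200, alu_ops=600, bytes_moved=6400, block_size=64)
narrow = Kernel(instructions=800, mem_ops=200, alu_ops=600, bytes_moved=6400, block_size=16)
```

With these made-up numbers both kernels are bandwidth-bound, and shrinking the block size from 64 to 16 bytes raises the bound from 400 to 1600 cycles, which is the kind of effect B(L) is meant to expose.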
Performance Models?
  Problems are obvious
    Sure -- you can analyze the FFT algorithm and Matrix Multiply
    But what about my code?
  Can't handle data dependence in computational intensity
    Example: SIFT Feature Extraction
      Compute a "scale space"
      For each maximum in scale space:
        Do a whole bunch of work
      How many maxima are there?
  "Interesting" architectural features cannot be described
  Still .... better than nothing?
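The data-dependence problem can be made concrete with a toy stand-in for the SIFT loop. This is not SIFT: the sketch assumes a 1-D signal in place of a scale space, a strict local-maximum test, and made-up cost constants, all of which are hypothetical. It only shows that total work depends on the input values, not just the input size.

```python
def local_maxima(signal):
    """Indices whose value is strictly greater than both neighbours."""
    return [
        i for i in range(1, len(signal) - 1)
        if signal[i - 1] < signal[i] > signal[i + 1]
    ]


def total_work(signal, work_per_maximum=1000, scan_cost_per_sample=1):
    """Assumed cost: one cheap scan plus expensive work per maximum found."""
    scan = len(signal) * scan_cost_per_sample
    per_maximum = len(local_maxima(signal)) * work_per_maximum
    return scan + per_maximum


# Two inputs of identical size with very different amounts of work.
smooth = [0, 1, 2, 3, 4, 3, 2, 1, 0]   # one maximum
bumpy = [0, 1, 0, 1, 0, 1, 0, 1, 0]    # four maxima
```

Here `total_work(smooth)` is 1009 and `total_work(bumpy)` is 4009, so a model parameterized only by problem size cannot tell the two cases apart.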
Architects need patterns too!
  "Benchmark Addiction" was part of motivation for Dwarfs
    Reliance upon C-source code benchmarks pigeon-holed architectural innovation
    Dwarfs were supposed to be anti-benchmarks: provide a non-source code description of the computations that were important
  We (i.e. Tim) quickly discovered that Dwarfs were far too vague and high-level to serve this purpose
    A Computational Pattern (~Dwarf) doesn't even imply a particular problem to be solved, much less a particular algorithm
  Can the fleshed-out pattern language be the solution?
Anti-Benchmarks?
  Architecture-agnostic patterns-based analysis of a program enumerates space of implementations
  [Figure: labels Task Parallel, Map/Reduce]
  But architects still need their benchmark fix
    What does this actually tell them?
    They need to know:
      Is my cache big enough?
      Should I include my whiz-bang u-arch widget?
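To make "enumerates space of implementations" concrete, here is a minimal sketch for the two-level matrix examples from earlier in the talk. It assumes, purely for illustration, that each level of ||ism can be either left serial or parallelized; the level names and the `enumerate_implementations` helper are hypothetical.

```python
from itertools import product

# Each exposed level of ||ism with the choices assumed to be available for it.
LEVELS = {
    "entries (Task Parallel)": ("serial", "parallel"),
    "work per entry (Map/Reduce)": ("serial", "parallel"),
}


def enumerate_implementations(levels):
    """Return every combination of per-level choices as a dict."""
    names = list(levels)
    return [
        dict(zip(names, choice))
        for choice in product(*(levels[name] for name in names))
    ]


space = enumerate_implementations(LEVELS)

# The choices the talk describes for a GPU-like manycore processor.
machine_learning = {
    "entries (Task Parallel)": "serial",
    "work per entry (Map/Reduce)": "parallel",
}
chemistry = {
    "entries (Task Parallel)": "parallel",
    "work per entry (Map/Reduce)": "parallel",
}
```

Even this tiny space has four points, and the enumeration by itself says nothing about which point is best on a given machine, which is exactly the gap the architect's questions point at.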
Anti-Benchmarks
  Suppose that the pattern language somehow included the architectural savvy needed to make every possible implementation decision
  What happens when the architect changes the rules?
Multiple Levels of Description
  Level 0: A patterns-based description
  Level 1: An "Abstract Machine" model?
  Level 2: A performance model?
  Level 3: A cycle-accurate simulation?
  Level 4: A joule-accurate simulation?
Abstract Machines
  Alternate proposal for performance model (K. Asanovic)
  Given a microarchitectural widget, how does its presence/absence affect the performance of a program?
    Map the program to two different machines (one with, one without the widget). How are the programs different?
    Mapping process TBD. SEJITS?
  Examples:
    An "Infinite ILP" machine. The superscalar analogue of PRAM
    An Infinite Vector-width machine.
    An infinite thread machine
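One way to picture the "Infinite ILP" machine is as a comparison between two idealized mappings of the same dependency graph: a single-issue machine that executes one operation per cycle, and a machine that issues every ready operation at once. The sketch below assumes unit latency for every operation and represents the program as a dict from each operation to the set of operations it depends on; the function names are hypothetical and the mapping process itself is, as the slide says, TBD.

```python
from graphlib import TopologicalSorter


def schedule_depths(deps):
    """Earliest cycle (1-based) at which each op can finish with unlimited issue width."""
    depth = {}
    for op in TopologicalSorter(deps).static_order():
        # An op finishes one cycle after the latest of its predecessors.
        depth[op] = 1 + max((depth[d] for d in deps.get(op, ())), default=0)
    return depth


def cycles_single_issue(deps):
    """Machine without the widget: one operation per cycle."""
    return len(schedule_depths(deps))


def cycles_infinite_ilp(deps):
    """Machine with infinite ILP: run time is the critical path length."""
    return max(schedule_depths(deps).values(), default=0)


# Tree reduction of four loaded values: plenty of ILP.
tree_sum = {
    "a": set(), "b": set(), "c": set(), "d": set(),
    "s1": {"a", "b"}, "s2": {"c", "d"}, "s3": {"s1", "s2"},
}
# Serial dependency chain: no ILP to exploit.
chain = {"x1": set(), "x2": {"x1"}, "x3": {"x2"}, "x4": {"x3"}}
```

For `tree_sum` the two machines take 7 and 3 cycles, while for `chain` both take 4, so the gap between the two mappings is one crude measure of how much the widget could matter for a given program.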
Architectural Meta-Patterns
  Hopefully by now I've conveyed my concern about the lack of architectural / performance information in design patterns
  Also, hopefully it is clear that I don't know the answer
  Maybe someone can write me a pattern?
    How should I tell you what I know about architecture?