Advanced Parallel Primitives in SPM.Python for Inheriting Fault Tolerance
and Scalable Processing of Data and Graphs

Minesh B. Amin (mamin@mbasciences.com)
http://www.mbasciences.com

HPC Advisory Council / Stanford Workshop 2011
Stanford University, CA
Dec 7, 2011

© 2011 MBA Sciences, Inc. All rights reserved.
Problem Statement

... exploiting parallelism using parallel primitives
    (rather than frameworks or libraries alone)

Pclosures: Clone, CloneRepeat, PartitionAggregate (Decentralized),
PartitionAggregate (Centralized), PartitionList, PartitionDAG
  ● Single, self-contained parallel environment
  ● Patented technology ...

Partition/OpenMPI:
  Enable any OpenMPI application to inherit support for:
  ● fault tolerance
  ● timeout
  ● detection of deadlocks

Partition/HybridFlow:
  Suites of parallel primitives to process data and graphs in parallel
Terminology: "Exploiting Parallelism"

Parallelism: the management of a collection of serial tasks.

Management: the policies by which:
  ● tasks are scheduled,
  ● premature terminations are handled,
  ● preemptive support is provided,
  ● communication primitives are enabled/disabled, and
  ● resources are obtained and released.

Serial tasks: classified as either:
  ● coarse grain ... where tasks may not communicate prior to conclusion, or
  ● fine grain ... where tasks may communicate prior to conclusion.

Management policies codify how serial tasks are to be managed ...
independent of what they may be.
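To make the policy/task split concrete, here is a minimal sketch using only
Python's standard concurrent.futures, not SPM.Python's API; POOL_SIZE,
run_task, and manage are hypothetical names:

from concurrent.futures import ProcessPoolExecutor, TimeoutError as FutTimeout

POOL_SIZE = 4  # policy: how many resources are obtained

def run_task(n):
    # A coarse-grain task: it does not communicate prior to conclusion.
    return n * n

def manage(tasks, timeout):
    # Resources obtained on entry, released on exit (policy).
    with ProcessPoolExecutor(max_workers=POOL_SIZE) as pool:
        futures = [pool.submit(run_task, t) for t in tasks]  # policy: scheduling
        for f in futures:
            try:
                print(f.result(timeout=timeout))             # policy: timeout
            except FutTimeout:
                f.cancel()                      # policy: premature termination

if __name__ == "__main__":
    manage(range(8), timeout=10.0)

Note how the body of manage() never inspects what run_task computes; the
policies are independent of what the tasks may be.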
Terminology: "The Big Picture"

Question: What makes exploiting parallelism { easy | hard } ?

Supposition: the gap between the developer's intent and the API of the
PET (parallel enabling technology) ...
Terminology: "Parallel Enabling Technologies"

Means to the end:

● Bottom-up (OpenMPI, OpenMP, CUDA, OpenGL)
  ● Maximum flexibility
  ● Maximum headaches
  ● Must implement fault tolerance

● Top-down (Hadoop, GoldenOrb, GraphLab)
  ● Limited flexibility
  ● Fewer headaches
  ● Fault tolerance is inherited
  ⇒ N environments/installations for N frameworks

● Self-contained environment (SPM.Python)
  ● Maximum flexibility
  ● Fewest headaches
  ● Fault tolerance is inherited
  ⇒ One environment/installation, N suites of pclosures:

     >>> createVirtualCloud -async
     >>> cmdA          >>> cmdA -parallel
     >>> cmdB          >>> cmdB -parallel
     >>> cmdC          >>> cmdC -parallel
     >>> cmdD          >>> cmdD -parallel
SPM.Python: Typical Flow

Application domains: Visualization, Life Sciences, Finance, IT,
Software Development, EDA, Analytics ... all must bridge the gap between
intent and the API of parallel primitives.

● Architectural: scalable vocabulary
● Developer:
  ● correct-by-construction (fault tolerance, self-cleaning)
  ● construct-by-correction (rapid prototyping)
● IT: no certification (!)

Fundamental Prerequisite
Ability to express parallelism in terms of parallel primitives (pclosures)
Partition/OpenMPI: Prologue

GNU/Linux [] mpirun ... ./hello_world -prefix "api"

A typical OpenMPI application lacks support for:
  ● fault tolerance
  ● timeout
  ● detection of deadlocks

⇒ Prototyping is (deeply)^∞ frustrating
Partition/OpenMPI: Problem Statement

Prototyping should be frictionless:

Must use the original OpenMPI application:
  ● original source code
  ● original binary

The original OpenMPI application must inherit support for:
  ● fault tolerance
  ● timeout
  ● detection of deadlocks

GNU/Linux [] spm.python ... mpirun ... ./hello_world -prefix "api"
Partition/OpenMPI: Problem Statement (Cont'd)

GNU/Linux [] spm.python ... mpirun ... ./hello_world -prefix "api"

Exploiting two very different forms of parallelism:
  ● using the same resources
  ● at the same time

(A) Drop-in replacement for mpirun
(B) Multiple sessions of mpirun within a single session of spm.python

Can use the same resources for:
  ● checkpoint-based parallelism
  ● what-if analysis
  ● stress testing
Partition/OpenMPI: Anatomy - Timeline

GNU/Linux [] spm.python ... mpirun ./hello_world -prefix "api"
Partition/OpenMPI: Anatomy - Timeline (Cont'd)

[Sequence diagram: Hub, mpirun, Spoke, orted, wrapper, Application;
each process exit()s in turn during normal execution]

Hub:
  ● Launch: mpirun
  ● Monitor: mpirun, Spokes
Spoke:
  ● Launch: orted
  ● Monitor: orted, wrapper
Wrapper:
  ● Launch: Application
  ● Monitor/Timeout: Application

Establish a nervous system over the OpenMPI application.
Populate the nervous system with streams of time-series data.
Partition/OpenMPI: Anatomy - Breakdown

[Same sequence diagram as above]

Built-in package management system:
  ● selectively change the default OpenMPI env

Redirection of library calls:
  ● augment libmpi.so, libc.so, ... with libSPM.so

Second parallel capability:
  ● ~60-line Python script
  ● authored by the developer
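What the wrapper's "Launch + Monitor/Timeout" role amounts to can be sketched
with Python's standard subprocess module. This is an illustration of the
mechanism only, not SPM.Python's implementation; launch_and_monitor is a
hypothetical name, and hello_world is the deck's example binary:

import subprocess

def launch_and_monitor(argv, timeout):
    child = subprocess.Popen(argv)          # Launch: Application
    try:
        return child.wait(timeout=timeout)  # Monitor/Timeout: Application
    except subprocess.TimeoutExpired:
        child.kill()                        # premature termination
        child.wait()                        # reap; resources released
        return None                         # report a timeout, not a return code

if __name__ == "__main__":
    print(launch_and_monitor(["./hello_world", "-prefix", "api"], timeout=10.0))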
Partition/OpenMPI: Second Parallel Capability

@spm.util.dassert(predicateCb = spm.sys.sstat.amOffline)
@spm.util.dassert(predicateCb = spm.sys.pstat.amHub)
def __init():
    return spm.pclosure.macro.papply.template.openMPI.\
           policyA.defun(signature = 'signature::Hub',
                         stage1Cb  = __taskStat,);

__pc = __init();

Declaration + Definition of Pclosure
Partition/OpenMPI: Second Parallel Capability

@spm.util.dassert(predicateCb = spm.sys.sstat.amOffline)
@spm.util.dassert(predicateCb = spm.sys.pstat.amHub)
def main(pool,
         taskApiArgs,
         taskTimeout):
    # Initialize 'stage0'.
    __pc.stage0.init.main(typedef = ...);
    hdl = __pc.stage0.payload.tie();
    # Populate the template task.
    hdl.spm.meta.label   = '***';  # Not interested.
    hdl.spm.meta.apiArgs = taskApiArgs;
    hdl.spm.meta.timeout = taskTimeout;
    # Invoke the pmanager.
    __pc.stage0.event.manage(pool = pool,
                             nSpokesMin = ...,
                             nSpokesMax = ...,
                             timeoutWaitForSpokes = ...,
                             timeoutExecution = ...);
    return;

Population + Invocation of Pclosure
Partition/OpenMPI: Second Parallel Capability

r"""task<template> ::struct {
      # SPM component ...
      spm ::struct {
        meta ::struct {
          label   ::scalar<stringSnippet> = deferred;
          apiArgs ::dict<string,mixed>    = deferred;
          timeout ::scalar<timeout>       = deferred;
        };
        core ::struct {
          relaunchPre  ::scalar<bool> = None;
          relaunchPost ::scalar<bool> = None;
          nameHost     ::scalar<auto> = None;
          whoAmI       ::scalar<auto> = None;
        };
        stat ::struct {
          exception   ::scalar<auto>   = None;
          returnValue ::scalar<record> = None;
        };
      };
      # non-SPM component ...
    };"""

Typedef for Template Task
Partition/OpenMPI: Second Parallel Capability

@spm.util.dassert(predicateCb = spm.sys.sstat.amOnline)
@spm.util.dassert(predicateCb = spm.sys.pstat.amHub)
def __taskStat(pc):
    try:
        hdl = pc.stage1.payload.tie();
        returnValue = hdl.spm.stat.returnValue;
        if (returnValue.Has(attr = 'stdOut')):
            print("\tstdOut   : %s" % returnValue.stdOut);
        if (returnValue.Has(attr = 'stdErr')):
            print("\tstdErr   : %s" % returnValue.stdErr);
        if (returnValue.Has(attr = 'stdOutErr')):
            print("\tstdOutErr: %s" % returnValue.stdOutErr);
    except (SPMTaskDropped,
            SPMTaskLoad,
            SPMTaskEval,), (hdl,):
        pass;
    return (pc.stage1.event.done(), None,)[-1];

Callback for Status Reports
Partition/OpenMPI: SPM.Python Session

GNU/Linux [] spm.3.111116.trial.A.python (Trial Edition)
Spm.Python 3.111116 / Python 2.4.6
[GCC 4.4.3 (64 bit) on linux2]

NOTE
>>>> Trial period ends at              <<<<
>>>> 24:00 hrs (Pacific Standard Time) <<<<
>>>> December 29, 2011                 <<<<

Type "help", "copyright", "credits", "license" or "spm.Api()" for more information.
Type "spm.DemoExtract(dirname = ...)" to extract demo scripts.
Please visit www.mbasciences.com for the latest and growing
collection of scripts and technical briefs classified in terms of
parallel management patterns.

>>> import pool
>>> import demo
>>> import os;
>>> taskApiArgs = \
        dict(app        = os.getcwd() + '/hello_world',
             appOptions = "-prefix='app'",
             );
>>> taskTimeout = spm.util.timeout.after(seconds = 10);
>>> demo.main(pool        = pool.intraAll(),
              taskApiArgs = taskApiArgs,
              taskTimeout = taskTimeout)
#: MetaStatus (hub): Waiting - ForSpokes ...
#: MetaStatus (hub): Tasks - Eval
app => 0
app => 1
#: MetaStatus (hub): Tasks - EvalDone
>>> demo.main(pool        = pool.intraOnePerServer(),
              taskApiArgs = taskApiArgs,
              taskTimeout = taskTimeout)
#: MetaStatus (hub): Waiting - ForSpokes ...
#: MetaStatus (hub): Tasks - Eval
#: MetaStatus (hub): Tasks - EvalDone
>>> exit()
GNU/Linux []
Partition/HybridFlow: Basic Template

while (not done):
    try:
        for work in pc.generate(...):
            eval(work);                  # Local Python/C/C++/GPU computation
            pc.counter.async += 1;       # Update parallel data structure(s)
            if (some condition):
                raise pc.exception(...); # Parallel exception
            if (some condition):
                pc.emit(...);            # Emit work/report
        done = True;
    except (pc.exception,), (val,):
        if (some condition):
            continue;                    # Repeat with new consensus ('val')
        done = True;
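The control flow of the template can be made runnable as a single-process
analogue. Everything below is illustrative: Consensus, generate, and run are
hypothetical stand-ins for pc.exception, pc.generate, and a driver, not
SPM.Python's API:

class Consensus(Exception):
    # Stand-in for pc.exception(...): carries the value ('val') that the
    # next pass of the loop restarts with.
    def __init__(self, val):
        super().__init__(val)
        self.val = val

def generate(limit):
    # Stand-in for pc.generate(...): yields units of work.
    yield from range(limit)

def run(limit):
    counter, done = 0, False
    while not done:
        try:
            for work in generate(limit):
                counter += 1                # cf. pc.counter.async += 1
                if work > 5:                # "some condition"
                    raise Consensus(work)   # cf. raise pc.exception(...)
            done = True
        except Consensus as exc:
            if exc.val < 8:                 # "some condition"
                limit = exc.val             # repeat with new consensus ('val')
                continue
            done = True
    return counter

print(run(10))  # first pass stops at work == 6; second pass completes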
Partition/HybridFlow: Suite of Parallel Primitives

while (not done):
    try:
        for work in pc.generate(...):
            eval(work);
            pc.counter.async += 1;
            if (some condition):
                raise pc.exception(...);
            if (some condition):
                pc.emit(...);
        done = True;
    except (pc.exception,), (val,):
        if (some condition):
            continue;
        done = True;

The template is built from four parallel primitives:
  ● pc.generate(...);
  ● pc.emit(...);
  ● pc.exception(...);
  ● pc.counter.async;

One template, many execution models: BAP, Speculative BSP, BSP, DAG, ...
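One plausible reading of how those labels map onto the template (an
interpretation, not spelled out in the deck): in BSP, each pass of the while
loop is one superstep and the parallel exception plays the role of the
barrier, carrying the consensus value into the next pass. A single-process
analogue of that reading, with hypothetical names (Barrier, superstep), not
SPM.Python's API:

class Barrier(Exception):
    # cf. pc.exception(...): the barrier at the end of a superstep,
    # carrying the agreed-upon ("consensus") state.
    def __init__(self, state):
        super().__init__(state)
        self.state = state

def superstep(state):
    # Local compute for one superstep (cf. eval(work) over pc.generate(...)).
    return [x + 1 for x in state]

state, step, done = [0, 0, 0], 0, False
while not done:
    try:
        raise Barrier(superstep(state))  # end of superstep: synchronize
    except Barrier as b:
        step += 1
        if step < 3:                     # "some condition": more supersteps
            state = b.state              # repeat with new consensus
            continue
        done = True

print(state)  # [2, 2, 2] after two consensus updates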
Conclusion

... exploiting parallelism using parallel primitives
    (rather than frameworks or libraries alone)

Pclosures: Clone, CloneRepeat, PartitionAggregate (Decentralized),
PartitionAggregate (Centralized), PartitionList, PartitionDAG
  ● Single, self-contained parallel environment
  ● Patented technology ...

Partition/OpenMPI:
  Enable any OpenMPI application to inherit support for:
  ● fault tolerance
  ● timeout
  ● detection of deadlocks

Partition/HybridFlow:
  Suites of parallel primitives to process data and graphs in parallel
Conclusion (Cont'd)

http://www.mbasciences.com
  ● SPM.Python distribution
  ● Technical Briefs
  ● Parallel Management Patterns

Elementary Parallel Primitives (available):
  ● Clone (Once, Repeat)
  ● Partition (DAG, List)
  ● PartitionAggregate (Centralized, Decentralized)

HPC Parallel Primitives (in limited beta):
  ● Partition (Grid/OpenMPI)

Data / Graph Parallel Primitives (limited beta, Jan 24, 2012):
  ● Partition (Data Flow Graph)