Simulation of Streaming Applications on Multicore Systems Saurabh Gayen, Mark Franklin (PI), Eric J. Tyson, Roger D. Chamberlain Storage-Based Supercomputing Group Dept. of Computer Science and Engineering Washington University in St. Louis Supported by Nat’l Science Foundation grant CCF- 0427794
Simulation of Streaming Applications on Multicore Systems
Saurabh Gayen, Mark Franklin (PI), Eric J. Tyson, Roger D. Chamberlain
Storage-Based Supercomputing Group
Dept. of Computer Science and Engineering
Washington University in St. Louis
Supported by Nat’l Science Foundation grant CCF-0427794
Saurabh Gayen 6/3/2008 2
Problem domain
[Diagram: interconnected FPGA and Network Proc devices]
High-performance streaming applications
» Large streams of high-throughput data
– Networking and communications
– Scientific computing (offline AND online)
– Media creation and playback
– Data mining (e.g., bioinformatics, security)
Hard to develop applications on multicore systems
» Complex programming model (e.g., synchronization)
Other platforms can provide speedups (FPGA, DSP, NP)
Devices are becoming more interconnected
» Hard to simulate
» Hard to debug
» Hard to deploy
Overview
1. Auto-Pipe and the X Language
2. X-Sim: Federated System Simulator
3. Example applications
4. Status and future work
What is Auto-Pipe?
Auto-Pipe is…
» A set of tools used to create, test, build, deploy, and optimize distributed applications
Auto-Pipe is made for…
» Partitioned, parallel algorithms
» Complex heterogeneous systems
» Time and/or resource-constrained applications
[Diagram: CPU, FPGA, and NP devices connected over PCI]
The X Language
X language files are composed of:
• An algorithm description
  • Made of blocks and edges
• A processing architecture
  • Made of computation and interconnect resources
• A mapping of algorithm to architecture
[Diagram: algorithm blocks A, B, C, D, E mapped onto CPU and FPGA resources]
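The three parts of an X file can be pictured as a small data model. The sketch below is not X language syntax. It is a hypothetical Python container, and the class name `Application`, the resource names, the chain-shaped edge list, and the particular mapping are all assumptions made for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class Application:
    """Hypothetical container mirroring the three parts of an X file."""
    blocks: set = field(default_factory=set)      # algorithm nodes
    edges: list = field(default_factory=list)     # (source, destination) pairs
    resources: set = field(default_factory=set)   # computation resources
    links: list = field(default_factory=list)     # interconnects between resources
    mapping: dict = field(default_factory=dict)   # block -> resource

    def unmapped_blocks(self):
        """Blocks that are not assigned to a declared resource."""
        return sorted(b for b in self.blocks
                      if self.mapping.get(b) not in self.resources)

    def crossing_edges(self):
        """Edges whose endpoints are mapped to different resources."""
        return [(s, d) for s, d in self.edges
                if self.mapping[s] != self.mapping[d]]


# Assumed topology: a simple chain A -> B -> C -> D -> E.
app = Application(
    blocks={"A", "B", "C", "D", "E"},
    edges=[("A", "B"), ("B", "C"), ("C", "D"), ("D", "E")],
    resources={"cpu1", "cpu2", "fpga"},
    links=[("cpu1", "fpga"), ("fpga", "cpu2")],
    mapping={"A": "cpu1", "B": "fpga", "C": "fpga", "D": "cpu2", "E": "cpu2"},
)

print(app.unmapped_blocks())
print(app.crossing_edges())
```

Edges that cross resources are the ones that would need an interconnect, which is why the mapping is kept separate from both the algorithm and the architecture.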
X-Sim: Federated Simulation
Platform-Specific Simulators
Communication Link Models
[Diagram labels: blocks gen1, gen2, sum, half, out; platforms proc[1], proc[2], FPGA; links PCI, Sh. Mem.]
X-Sim Mechanism
[Diagram labels: blocks gen1, gen2, sum, half, store; platforms proc[1], proc[2], FPGA; links PCI, Sh. Mem.; ports in, out, avail, testpoint; times 0us, 1us]
Legend: D = Data file, T = Timestamp file
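The slide pairs data files (D) with timestamp files (T) at block ports. One way to picture how such a pair could cross a communication link is sketched below. This is an illustration under stated assumptions, not X-Sim's implementation: the function `apply_link_model`, its latency and per-item parameters, and the sample values are all hypothetical.

```python
def apply_link_model(timestamps_us, latency_us, per_item_us):
    """Shift send times to arrival times over a link that carries one item at a time."""
    arrivals = []
    link_free_at = 0.0
    for t in timestamps_us:
        start = max(t, link_free_at)        # wait if the link is still busy
        link_free_at = start + per_item_us  # link is occupied while the item is sent
        arrivals.append(link_free_at + latency_us)
    return arrivals


data = [10, 20, 30]          # contents of a hypothetical data file
sent_at = [0.0, 1.0, 1.5]    # matching hypothetical timestamp file, in microseconds

received_at = apply_link_model(sent_at, latency_us=2.0, per_item_us=1.0)
trace = list(zip(data, received_at))
print(trace)
```

Keeping the data and the timestamps separate means the link model only has to rewrite the timestamps, while the data passes through unchanged.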
Example Application: test1
[Diagram, shown twice: blocks gen1, gen2, sum, half, store; processors proc[1] through proc[4]; shared_mem]
[Figure: block cumulative time (s) for gen1, gen2, sum, half, store. Bar labels: 48.9, 48.9, 18.2, 12.9, 138.2]
[Figure: app times (s) for the 1-core and 2-core mappings; legend: sim, deployed. Bar labels: 267.5, 138.6, 267.5, 143.3. Speedup labels: 1.93x, 1.87x]
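The per-block times on this slide suggest why the 2-core mapping stops short of 2x. The sketch below searches every two-way split of the blocks for the one with the smallest load on the busier core. It assumes the bar labels pair with the block names in the order they appear, and it ignores communication cost and data dependencies, so the result is only a rough bound.

```python
from itertools import combinations

# Assumption: bar labels paired with block names in the order shown on the slide.
block_time_s = {"gen1": 48.9, "gen2": 48.9, "sum": 18.2, "half": 12.9, "store": 138.2}


def best_two_way_split(times):
    """Return (makespan, blocks_on_core_a) minimising the busier core's load."""
    names = sorted(times)
    total = sum(times.values())
    best = (total, tuple(names))
    for k in range(1, len(names)):
        for group in combinations(names, k):
            load_a = sum(times[n] for n in group)
            makespan = max(load_a, total - load_a)
            if makespan < best[0]:
                best = (makespan, group)
    return best


one_core_s = sum(block_time_s.values())
two_core_s, core_a = best_two_way_split(block_time_s)
speedup_bound = one_core_s / two_core_s
print(f"{one_core_s:.1f} s on one core, {two_core_s:.1f} s on two, bound {speedup_bound:.2f}x")
```

Under these assumptions the store block alone occupies one core, and the resulting bound of about 1.93x is in line with the speedup labels on the slide.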
Example Application: VERITAS
Astrophysics Gamma-ray event parameterization
» Active sources: galactic nuclei, pulsars
» Transient sources: hypernovae, ...
Lots of data: 20 TB/year
» Want to process as fast as possible
» Process whole DB for rare events
VERITAS algorithm
[Diagram: Front, Pipe[1] ... Pipe[6], Back; labels: Raw Data for 1 Pixel, Charge for 1 Pixel]
[Pie chart: Processing Time. Front (2.5%), Pipes (94.7%), Back (0.9%), other (1.9%)]
[Diagram: Pipe[i] contains FFT, LowPass, IFFT]
[Pie chart: FFT (47.4%), LowPass (13.7%), IFFT (38.8%)]
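A pipe built from FFT, LowPass, and IFFT stages amounts to frequency-domain filtering. The sketch below shows that general technique on a toy signal. It is not the VERITAS code: a naive DFT stands in for the FFT, and the function names, the cutoff parameter, and the test signal are assumptions.

```python
import cmath


def dft(samples, inverse=False):
    """Naive O(n^2) discrete Fourier transform, used here in place of an FFT."""
    n = len(samples)
    sign = 1 if inverse else -1
    out = []
    for k in range(n):
        acc = sum(x * cmath.exp(sign * 2j * cmath.pi * k * i / n)
                  for i, x in enumerate(samples))
        out.append(acc / n if inverse else acc)
    return out


def low_pass(spectrum, cutoff_bin):
    """Zero every bin above cutoff_bin, treating bin k and bin n-k as one frequency."""
    n = len(spectrum)
    return [c if min(k, n - k) <= cutoff_bin else 0
            for k, c in enumerate(spectrum)]


def pipe(samples, cutoff_bin):
    """Transform, filter, and transform back one channel of samples."""
    filtered = dft(low_pass(dft(samples), cutoff_bin), inverse=True)
    return [value.real for value in filtered]


# A constant level of 1.0 plus the fastest possible alternation.
noisy = [1.0 + (0.5 if i % 2 == 0 else -0.5) for i in range(8)]
smooth = pipe(noisy, cutoff_bin=2)
print([round(v, 6) for v in smooth])
```

Because each pipe works on its own channel, the stages can be assigned to processors either per pipe or per stage.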
2-Processor Mappings
map2a: Vertical Partition
map2b: Horizontal Partition
[Diagrams: Front, FFT, LowPass, IFFT, and Back blocks assigned to proc[1] and proc[2]]
3-Processor Mappings
map3a: Vertical Partition
map3b: Horizontal Partition
[Diagrams: Front, FFT, LowPass, IFFT, and Back blocks assigned to proc[1], proc[2], and proc[3]]
2 and 3 Processor Results
VERITAS Configured with 6 Pipes
[Figure: app times (s) for mappings map1, map2a, map2b, map3a, map3b; legend: sim, deployed. Bar labels in mapping order: 127.2, 69.3, 73.4, 46.4, 61.5 and 127.2, 70.2, 75.2, 48.5, 63.2. Speedup labels: 1x, 1.83x, 1.73x, 2.74x, 2.07x]
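The speedup labels can be checked against the bar labels by dividing the single-processor time by each mapping's time. In the sketch below the values are taken in mapping order. Which series is "sim" and which is "deployed" is not stated explicitly in the transcript, so the two lists are left with neutral names.

```python
mappings = ["map1", "map2a", "map2b", "map3a", "map3b"]
series_a_s = [127.2, 69.3, 73.4, 46.4, 61.5]   # series matching the speedup labels
series_b_s = [127.2, 70.2, 75.2, 48.5, 63.2]   # the other plotted series


def speedups(times_s):
    """Speedup of each mapping relative to the first (single-processor) entry."""
    baseline = times_s[0]
    return [baseline / t for t in times_s]


for name, a, b in zip(mappings, speedups(series_a_s), speedups(series_b_s)):
    print(f"{name}: {a:.2f}x vs {b:.2f}x")
```

The first series reproduces the labelled speedups to within rounding, and the second series runs slightly slower for every multi-processor mapping.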
SMP Performance Scaling
VERITAS Configured with 16 Pipes
[Figure: app time (s) vs. number of processors (1, 2, 3, 4, 8, 16); legend: sim, dep. Bar labels: 134, 69.1, 47.7, 35.4, 19.6, 11.4 and 134, 72.2, 48.3, 38.8. Speedup labels: 1x, 1.94x, 2.81x, 3.79x, 6.84x, 11.75x]
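Dividing each speedup by its processor count gives parallel efficiency, which makes the scaling trend easier to read. The sketch below uses the six bar labels that reproduce the labelled speedups. Pairing them with processor counts in ascending order is an assumption based on that match.

```python
processors = [1, 2, 3, 4, 8, 16]
time_s = [134, 69.1, 47.7, 35.4, 19.6, 11.4]  # series matching the speedup labels


def efficiency(times, procs):
    """Parallel efficiency: speedup over the first entry, divided by processor count."""
    base = times[0]
    return [base / t / p for t, p in zip(times, procs)]


eff = efficiency(time_s, processors)
for p, e in zip(processors, eff):
    print(f"{p:2d} processors: {e:.0%}")
```

Under that pairing, efficiency stays above 90% up to 4 processors and falls to roughly 73% at 16.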
…
Status and Future Work
Currently
» X-Sim is operational
What’s next
» Develop library of validated communication models
Future directions
» Develop X-Opt, an automated performance optimization tool
Acknowledgements
• Storage-Based Supercomputing Group
Michela Becchi Justin Brown Jim Buckley
Jeremy Buhler Roger Chamberlain Patrick Crowley
Mark Franklin (PI) Narayan Ganesan Gregory Galloway
Saurabh Gayen Eric Tyson
• Gamma Ray application: Jim Buckley / VERITAS collab.