Apr 01, 2015
E71Chang
Mark L. Chang
Northwestern University
Evanston, IL
Adaptive Computing in NASA Multi-Spectral Image Processing
Scott A. Hauck
University of Washington
Seattle, WA
E72Chang
Background
• (1991) Initiative by NASA to study Earth as an environmental system—Earth Science Enterprise (ESE)
• (1999) Launch of the first Earth Observing System (EOS) satellite, Terra
E73Chang
The Data Flow
• EOS divides telemetry processing into five levels with the following flow:
[Figure: data-flow diagram with Instruments 1 through n, a Receiver, and processing Levels 0, L1, L2, L3, and L4]
E74Chang
The Processing Problem
• I/O intensive
• Terra satellite generates ~918 Gbytes of data per day
• Current NASA-supported data holdings total ~125,000 Gbytes
• MODIS instrument accounts for over half the daily data and processing load
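To put these figures in perspective, the arithmetic below converts the daily volume into a sustained data rate and compares it with the total holdings. It is a back-of-the-envelope sketch that assumes 1 Gbyte = 10^9 bytes and a 24-hour day; the variable names are illustrative only.

```python
# Back-of-the-envelope arithmetic from the figures on this slide.
# Assumption (not from the slide): 1 Gbyte = 1e9 bytes.
TERRA_GBYTES_PER_DAY = 918
HOLDINGS_GBYTES = 125_000
SECONDS_PER_DAY = 24 * 60 * 60

# Sustained rate needed just to keep up with Terra's output.
mbytes_per_second = TERRA_GBYTES_PER_DAY * 1e9 / SECONDS_PER_DAY / 1e6

# How many days of Terra output equal the current data holdings.
days_to_match_holdings = HOLDINGS_GBYTES / TERRA_GBYTES_PER_DAY

print(f"{mbytes_per_second:.1f} Mbytes/s sustained")
print(f"{days_to_match_holdings:.0f} days of Terra data equal current holdings")
```

Under these assumptions, Terra alone requires roughly 10.6 Mbytes/s of sustained throughput, and about 136 days of its output would match the current holdings.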
[Figure: MODIS instrument]
E75Chang
Why Adaptive Computing?
• Instrument dependent processing
• Data products involve many different algorithms
• Algorithms often change over the lifetime of the instrument
E76Chang
MATCH Compiler
• Current mappings are done by hand
• Hardware description languages (Verilog, VHDL)
• C program interface to adaptive compute engine
• Requires low-level understanding of the architecture
• MATCH == MATlab Compiler for Heterogeneous computing systems
• MATLAB codes compiled to a configurable computing system automatically
• Embedded processors, DSPs, and FPGAs
• Performance goals
• Within a factor of 2-4 of the best manual approach
• Optimize performance under resource constraints
E77Chang
MATCH Compiler Framework
• Parse MATLAB programs into intermediate representation
• Build data and control dependence graph
• Identify scopes for fine-grain, medium-grain, and coarse-grain parallelism
• Map operations to multiple FPGAs, multiple embedded processors and multiple DSP processors
• Automatic parallelization, scheduling, and mapping
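The framework steps above can be illustrated with a toy example. The sketch below is not the MATCH compiler's algorithm; it assumes a hand-written dependence graph for four hypothetical operations and groups them into levels, so that operations in the same level do not depend on each other and could, in principle, be mapped to different devices.

```python
# Toy dependence-graph leveling (illustrative only, not MATCH's algorithm).
# Each key is an operation; its value lists the operations it depends on.
# The operation names are hypothetical.
deps = {
    "diff": [],                     # diff = pixel - weight
    "sq": ["diff"],                 # sq = diff .^ 2
    "k2_lookup": [],                # load k2(c); independent of diff and sq
    "scaled": ["sq", "k2_lookup"],  # scaled = k2(c) * sum(sq)
}

def parallel_levels(deps):
    """Group operations into levels; ops in one level are mutually independent."""
    remaining = {op: set(d) for op, d in deps.items()}
    levels = []
    while remaining:
        # Operations whose dependences have all been scheduled already.
        ready = sorted(op for op, d in remaining.items() if not d)
        if not ready:
            raise ValueError("cycle in dependence graph")
        levels.append(ready)
        for op in ready:
            del remaining[op]
        for d in remaining.values():
            d.difference_update(ready)
    return levels

levels = parallel_levels(deps)
print(levels)
```

Each inner list is a set of operations that could run concurrently; a real compiler would additionally weigh resource constraints and communication cost when choosing a device for each one.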
E78Chang
MATCH Testbed
VME bus and chassis
Motorola MVME-2604 embedded boards
• IBM PowerPC 604
• 64 MB RAM
• OS-9 OS
• Ultra C compiler
Transtech TDMB 428 DSP board
• Four TDM 411 cards containing TI TMS 320C4 DSP, 8 MB RAM
• TI C compiler
Annapolis Wildchild board
• Nine XILINX 4010 FPGAs
• 2 MB RAM
• Wildfire software
Force 5V
• MicroSPARC CPU
• 64 MB RAM
Development Environment:
SUN Solaris 2, HP HPUX and Windows
Ultra C/C++ for MVME
TI C for TMS320
XILINX XACT for XILINX
E79Chang
Motivation for MATCH
• NASA scientists prefer MATLAB
• High-level language, good for prototyping and development
• NASA applications are well-suited to the MATCH project
• Lots of image and signal processing applications
• Same domain as users of embedded systems
• High degree of data parallelism
• Small degree of task parallelism
• NASA has an interest in adaptive technologies (ASDP)
• Will be a benchmark for the MATCH compiler
E710Chang
Multi-spectral Image Classification
• Want to classify a multi-spectral image in order to make it more useful for analysis by humans
• Used to determine type of terrain being represented
• Similar to data compression
• Similar to clustering analysis
Pixel[000][000] = Forest
Pixel[123][123] = Urban
Pixel[255][212] = Tundra
Pixel[410][230] = Water
etc…
E711Chang
Multi-Spectral Classification
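$$f(X \mid S_k) = \frac{1}{(2\pi)^{d/2}\sigma^{d}} \cdot \frac{1}{P_k} \sum_{i=1}^{P_k} \exp\left(-\frac{(X - W_{ki})^{T}(X - W_{ki})}{2\sigma^{2}}\right)$$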
E712Chang
MATLAB Iterative
for p=1:rows*cols
    % load pixel to process
    pixel = data( (p-1)*bands+1:p*bands );
    class_total = zeros(classes,1);
    class_sum = zeros(classes,1);
    % class loop
    for c=1:classes
        class_total(c) = 0;
        class_sum(c) = 0;
        % weight loop
        for w=1:bands:pattern_size(c)*bands-bands
            weight = class(c,w:w+bands-1);
            class_sum(c) = exp( -(k2(c)*sum( (pixel-weight').^2 ))) + class_sum(c);
        end
        class_total(c) = class_sum(c) * k1(c);
    end
    results(p) = find( class_total == max( class_total ) )-1;
end
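$$f(X \mid S_k) = \frac{1}{(2\pi)^{d/2}\sigma^{d}} \cdot \frac{1}{P_k} \sum_{i=1}^{P_k} \exp\left(-\frac{(X - W_{ki})^{T}(X - W_{ki})}{2\sigma^{2}}\right)$$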
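For readers who do not use MATLAB, the same computation can be sketched in Python. This is a minimal illustration of the classification formula shown with the MATLAB loop above, not the authors' implementation; the function name `classify_pixel`, the single shared `sigma`, and the tiny two-class weight set are assumptions made for the example.

```python
import math

def classify_pixel(pixel, class_weights, sigma):
    """Return the index of the class with the largest f(X|S_k).

    pixel: list of d band values (X).
    class_weights: one list of training vectors (W_ki) per class.
    sigma: smoothing parameter, assumed here to be shared by all classes.
    """
    d = len(pixel)
    norm = 1.0 / ((2.0 * math.pi) ** (d / 2.0) * sigma ** d)
    scores = []
    for weights in class_weights:
        total = 0.0
        for w in weights:
            # (X - W_ki)^T (X - W_ki): squared Euclidean distance
            dist2 = sum((x - wi) ** 2 for x, wi in zip(pixel, w))
            total += math.exp(-dist2 / (2.0 * sigma ** 2))
        # Average over the P_k training vectors of this class.
        scores.append(norm * total / len(weights))
    return scores.index(max(scores))

# Hypothetical two-band example with two classes.
class_weights = [
    [[10.0, 10.0], [12.0, 11.0]],      # class 0
    [[200.0, 180.0], [190.0, 185.0]],  # class 1
]
print(classify_pixel([11.0, 10.0], class_weights, sigma=5.0))
print(classify_pixel([195.0, 182.0], class_weights, sigma=5.0))
```

Like the MATLAB code, which subtracts 1 from the result of `find`, the sketch reports zero-based class indices.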
E713Chang
MATLAB Vectorized
% reshape data
weights = reshape(class',bands,pattern_size(1),classes);
for p=1:rows*cols
    % load pixel to process
    pixel = data( (p-1)*bands+1:p*bands);
    % reshape pixel
    pixels = reshape(pixel(:,ones(1,patterns)), bands,pattern_size(1),classes);
    % do calculation
    vec_res = k1(1).*sum(exp( -(k2(1).*sum((weights-pixels).^2)) ));
    vec_ans = find(vec_res==max(vec_res))-1;
    results(p) = vec_ans;
end
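$$f(X \mid S_k) = \frac{1}{(2\pi)^{d/2}\sigma^{d}} \cdot \frac{1}{P_k} \sum_{i=1}^{P_k} \exp\left(-\frac{(X - W_{ki})^{T}(X - W_{ki})}{2\sigma^{2}}\right)$$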
E714Chang
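Initial FPGA Mapping
[Figure: block diagram of the initial mapping across PE0 through PE4. Labeled blocks: PNN Controller, Pixel Memory, Weight Memory, Subtraction Unit, Square Unit, Band Accumulator (# of bands times), K2 Mult Unit, K2[K] Memory, exp LUT Unit, K1[K] Mem, exp Mult Unit, Class Accumulator (# weights/class times), K1 Mult Unit, Class Compare, Result Memory. Percentages shown on the slide: 5%, 67%, 85%, 82%, 82%.]
$$f(X \mid S_k) = \frac{1}{(2\pi)^{d/2}\sigma^{d}} \cdot \frac{1}{P_k} \sum_{i=1}^{P_k} \exp\left(-\frac{(X - W_{ki})^{T}(X - W_{ki})}{2\sigma^{2}}\right)$$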
E715Chang
Improving the Mapping
• Improve speed of PNN
• Utilize all eight processing elements
• Time-multiplex low-rate functions
• Vary precision of multipliers/lookups
[Figure: the initial mapping block diagram (PE0 through PE4, from Subtraction Unit through Class Compare and Result Memory), annotated with rate ratios 1:1, 1:1, 1:4, 1:4, 1:20]
$$f(X \mid S_k) = K_1 \sum_{i=1}^{P_k} \exp\left(-K_2\,(X - W_{ki})^{T}(X - W_{ki})\right)$$
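One way to read "vary precision of multipliers/lookups" is that the exponential is approximated by a lookup table whose size trades accuracy for hardware resources. The sketch below assumes a uniformly spaced table over a hypothetical input range [0, 8); the table sizes and the range are illustrative and are not the values used on the FPGAs.

```python
import math

def build_exp_lut(entries, x_max):
    """Table of exp(-x) sampled at `entries` evenly spaced points in [0, x_max)."""
    step = x_max / entries
    return [math.exp(-i * step) for i in range(entries)], step

def lut_exp_neg(x, lut, step):
    """Approximate exp(-x) using the table entry at or just below x."""
    index = min(int(x / step), len(lut) - 1)
    return lut[index]

def max_error(entries, x_max=8.0, samples=1000):
    """Worst-case absolute error of the table over evenly spaced test inputs."""
    lut, step = build_exp_lut(entries, x_max)
    return max(
        abs(lut_exp_neg(x, lut, step) - math.exp(-x))
        for x in (i * x_max / samples for i in range(samples))
    )

# Larger tables give a smaller worst-case error (hypothetical sizes).
for entries in (16, 256):
    print(entries, max_error(entries))
```

The same trade-off applies to multiplier width: fewer bits save logic at the cost of a coarser result, which is acceptable only as long as the class with the largest score does not change.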
E716Chang
Optimized Mapping
PE0: Pixel Reader
PE1: Subtract/Square
PE2: Subtract/Square
PE3: Subtract/Square
PE4: Subtract/Square
PE5: K2 Multiplier
PE6: Exponent Lookup
PE7: Class Accumulator
PE8: K1 Multiplier, Class Comparison
Percentages shown on the slide: 5%, 75%, 85%, 61%, 54%, 97%
$$f(X \mid S_k) = K_1 \sum_{i=1}^{P_k} \exp\left(-K_2\,(X - W_{ki})^{T}(X - W_{ki})\right)$$
E717Chang
Results
[Figure: raw image data and processed image]
Pixels Processed per Second (bar chart, log scale from 1 to 10000)
Reference: HP C180 Workstation
Method               Pixels/s
MATLAB Iterative     1.6
MATLAB Vectorized    35.4
Java                 149
C                    364
VHDL                 1942
VHDL(2)              5825
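The relative speedups implied by these numbers can be computed directly. The snippet below only rearranges the figures from the chart above, using MATLAB Iterative as the baseline; the dictionary layout is illustrative.

```python
# Pixels processed per second, as read from the chart above.
pixels_per_second = {
    "MATLAB Iterative": 1.6,
    "MATLAB Vectorized": 35.4,
    "Java": 149,
    "C": 364,
    "VHDL": 1942,
    "VHDL(2)": 5825,
}

baseline = pixels_per_second["MATLAB Iterative"]
speedup = {m: rate / baseline for m, rate in pixels_per_second.items()}

for method, factor in speedup.items():
    print(f"{method:18s} {factor:8.1f}x")

# Optimized hardware mapping relative to the C version.
vhdl2_vs_c = pixels_per_second["VHDL(2)"] / pixels_per_second["C"]
print(f"VHDL(2) vs C: {vhdl2_vs_c:.1f}x")
```

By this measure the optimized hardware mapping, VHDL(2), is about 16 times faster than the C version and about 3 times faster than the initial VHDL mapping.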
E718Chang
Results (Cont’d)
Pixels Processed per Second (bar chart, log scale from 1 to 10000)
Reference: MATCH Testbed (Force 5V, MicroSPARC CPU, 64 MB RAM)
Method     Pixels/s
Java       14.8
C          92
VHDL       1942
VHDL(2)    5825
E719Chang
Results (Cont’d)
Lines of Code (bar chart, axis from 0 to 3000)
Method               Lines
MATLAB Iterative     39
MATLAB Vectorized    27
Java                 474
C                    371
VHDL                 2205
VHDL(2)              2480
E720Chang
Conclusions
• NASA is interested in adaptive computing
• NASA has many candidate applications
• High processing loads and I/O requirements
• Applications are well-suited for acceleration using adaptive computing
• Scientists will want to write in MATLAB rather than C+VHDL
• Good benchmarks for the MATCH compiler
• Will help identify functions and procedures necessary for real-world applications