University of Michigan Electrical Engineering and Computer Science
MacroSS: Macro-SIMDization of Streaming Applications
Amir Hormati*, Yoonseo Choi‡, Mark Woh*,
Manjunath Kudlur†, Rodric Rabbah‡, Trevor Mudge*,
Scott Mahlke*
* Advanced Computer Arch. Lab., University of Michigan
† Nvidia Corp.
‡ IBM T.J. Watson Research Center
Importance of SIMD
• Energy- and area-efficient way to exploit data-level parallelism
• Key to performance in multimedia and communication applications
• Ubiquitous in modern processors
  – Intel: SSE, Larrabee
  – IBM: AltiVec, Cell SPE
  – ARM: NEON
[Diagram: three blocks, each labeled Control Unit, Functional Units, Cache]
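As a minimal illustration of the data-level parallelism a SIMD engine exploits, the C sketch below adds two float arrays four elements at a time using SSE intrinsics from `<xmmintrin.h>`. It assumes an x86/x86-64 target and, for brevity, an array length that is a multiple of 4; the function name `add_f32_sse` is hypothetical.

```c
#include <assert.h>
#include <xmmintrin.h>  /* SSE intrinsics; x86/x86-64 only */

/* Element-wise out[i] = a[i] + b[i] using 4-wide SSE registers.
 * Precondition (assumed for brevity): n is a multiple of 4. */
void add_f32_sse(const float *a, const float *b, float *out, int n) {
    assert(n % 4 == 0);
    for (int i = 0; i < n; i += 4) {
        __m128 va = _mm_loadu_ps(a + i);  /* load 4 floats; no alignment required */
        __m128 vb = _mm_loadu_ps(b + i);
        /* one instruction performs 4 additions, then 4 floats are stored */
        _mm_storeu_ps(out + i, _mm_add_ps(va, vb));
    }
}
```

For n = 8 the loop issues two 4-wide additions instead of eight scalar ones. The same loop on AltiVec or NEON would need that ISA's own intrinsics.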
Stream Computing
• Prevalent in embedded, desktop, and server systems
• Many existing optimizations map and schedule streaming applications onto parallel architectures
• Retargetability is a major advantage of streaming languages
• Task, pipeline, and data-level parallelism are mapped onto core-level parallelism
• Data-level parallelism is not exploited on SIMD engines
Traditional Vectorization on Streaming Applications
[Chart: Speedup (x) with ICC + Auto Vectorize for AudioBeam, BeamFormer, DCT, FFT, FM Radio, Matrix Multiply, Matrix Multiply Block, Bitonic Sort, FilterBank, MP3 Decoder, and Average; y-axis from 0 to 3.5]
Why are SIMD engines underutilized?
• Finding data-level parallelism suitable for SIMD engines
• Proper data alignment
• Complicated compiler optimizations and transformations
• Wide variety of SIMD standards
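Data alignment is one concrete source of this complexity: aligned SSE loads and stores such as `_mm_load_ps` and `_mm_store_ps` require 16-byte-aligned addresses. A common compiler workaround, sketched below assuming an x86/x86-64 target with SSE, is loop peeling: run scalar iterations until the pointer is aligned, switch to aligned vector operations, and finish the remainder with a scalar tail. This illustrates the general technique, not the approach taken in this work; `scale_inplace_peeled` is a hypothetical name.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <xmmintrin.h>  /* SSE intrinsics; x86/x86-64 only */

/* Multiply x[0..n) by k in place; returns the number of peeled iterations. */
size_t scale_inplace_peeled(float *x, size_t n, float k) {
    size_t i = 0;
    /* Prologue: scalar iterations until x + i is a multiple of 16 bytes. */
    while (i < n && ((uintptr_t)(x + i) % 16) != 0) {
        x[i] *= k;
        i++;
    }
    /* floats are 4-byte aligned, so at most 3 iterations are peeled */
    assert(i < 4);
    size_t peeled = i;

    __m128 vk = _mm_set1_ps(k);  /* broadcast k into all 4 lanes */
    /* Body: aligned 4-wide loads/stores are now safe. */
    for (; i + 4 <= n; i += 4) {
        __m128 v = _mm_load_ps(x + i);
        _mm_store_ps(x + i, _mm_mul_ps(v, vk));
    }
    /* Epilogue: leftover elements that do not fill a whole vector. */
    for (; i < n; i++) x[i] *= k;
    return peeled;
}
```

Called on `buf + 1` where `buf` is 16-byte aligned, the prologue peels three elements before the vector body starts; every such prologue and epilogue is extra code the compiler must generate and verify.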
In this work…
• Macro-level SIMDization techniques for streaming languages
• MacroSS compiler for the StreamIt language
• Hardware-based buffer optimizations for packing/unpacking operations
• Evaluation of MacroSS on Intel Core i7
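To make the packing/unpacking overhead concrete, the software-only sketch below gathers one field from four interleaved records into a single SSE register (one record per lane), operates on it, and scatters the result back. The arithmetic is deliberately trivial; the point is the four scalar loads and four scalar stores around each vector instruction, which is the kind of data movement that buffer optimizations aim to reduce. It assumes x86/x86-64 with SSE, does not model the hardware mechanism proposed in this work, and all function names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <xmmintrin.h>  /* SSE intrinsics; x86/x86-64 only */

/* Packing: lane j receives field k of record j (records are `stride` floats). */
static __m128 pack_strided(const float *in, size_t stride, size_t k) {
    /* 4 scalar loads combined into one register */
    return _mm_setr_ps(in[0 * stride + k], in[1 * stride + k],
                       in[2 * stride + k], in[3 * stride + k]);
}

/* Unpacking: write lane j back to field k of record j. */
static void unpack_strided(float *out, size_t stride, size_t k, __m128 v) {
    float lanes[4];
    _mm_storeu_ps(lanes, v);        /* spill the register to memory */
    for (size_t j = 0; j < 4; j++)  /* 4 scalar stores */
        out[j * stride + k] = lanes[j];
}

/* Add 1.0f to every field of 4 interleaved records, one field at a time. */
void add_one_packed(const float *in, float *out, size_t stride) {
    assert(stride > 0);
    __m128 one = _mm_set1_ps(1.0f);
    for (size_t k = 0; k < stride; k++) {
        __m128 v = pack_strided(in, stride, k);
        unpack_strided(out, stride, k, _mm_add_ps(v, one));
    }
}
```

Each field costs one vector add but eight scalar memory operations, so the packing and unpacking, not the arithmetic, dominate.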
StreamIt
• Main constructs:
  – Filter: encapsulates computation