BERKELEY PAR LAB The Future of ... · Michael Anderson, Bryan Catanzaro, Jike Chong, Katya Gonina, Dorothea Kolossa, Chao-Yue Lai, Mark Murphy, Bor-Yiing
No question we can build these processors
Big question: How do we program 32 – 512 processors to solve your favorite application?
Perform content-based image retrieval
Large-vocabulary speech recognition
MRI image reconstruction
Value-at-risk analysis in quantitative finance
Etc.
Parallel processors are more efficient, if we can program them.
So … the future of microprocessors is really about software.
That’s why the rest of the talk is all about programming applications on parallel microprocessors.
Today’s code is serial, but the world is parallel: we just need to find the parallelism in our applications and reflect it in our software
Our formula in a bit of detail: MRI reconstruction
Other applications where our formula has proven itself
Applications in progress
Pediatric MRI is difficult:
Children cannot sit still, breathhold
Low tolerance for long exams
Anesthesia is costly and risky
Like to accelerate MRI acquisition
Advanced MRI techniques exist, but require data- and compute-intense algorithms for image reconstruction
Reconstruction must be fast, or time saved in accelerated acquisition is lost in computing reconstruction
Non-starter for clinical use
100X faster reconstruction
Higher-quality, faster MRI
This image: 8 month-old patient with cancerous mass in liver
256 x 84 x 154 x 8 data size
Serial Recon: 1 hour
Parallel Recon: 1 minute
Fast enough for clinical use
Software currently deployed at Lucile Packard Children's Hospital for clinical study of the reconstruction technique
Our formula in a bit of detail: MRI reconstruction
Other applications where our formula has proven itself
Applications in progress
Support-Vector Machine Mini-Framework
Algorithmic changes and parallel implementation lead to performance speedup: Core 2 Duo versus G80
Computation               LIBSVM     Our algorithm,         Our algorithm,
                                     2-core Parallel CPU    16-core Parallel GPU
SVM Training (geo-mean)   771.8 s    ---                    38.5 s
SVM Classification        41.42 s    4.21 s                 0.38 s
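The speedups implied by the table above can be checked with a few lines of arithmetic. The sketch below is only an illustration: the timings are copied from the table, while the dictionary keys (`cpu_2core`, `gpu_16core`) and the helper name `speedups` are labels invented here, not names from the original work.

```python
# Timings in seconds, copied from the table above.
# The key names are hypothetical labels chosen for this sketch.
timings = {
    "SVM Training (geo-mean)": {"LIBSVM": 771.8, "gpu_16core": 38.5},
    "SVM Classification": {"LIBSVM": 41.42, "cpu_2core": 4.21, "gpu_16core": 0.38},
}


def speedups(row, baseline="LIBSVM"):
    """Return baseline_time / time for every non-baseline column of a row."""
    base = row[baseline]
    return {name: base / t for name, t in row.items() if name != baseline}


if __name__ == "__main__":
    for task, row in timings.items():
        for name, factor in speedups(row).items():
            print(f"{task}: {name} is {factor:.1f}x faster than LIBSVM")
```

Under these numbers, training on the GPU comes out at about 20x over LIBSVM, and classification at roughly 10x on the 2-core CPU and roughly 109x on the GPU.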
Fast support vector machine training and classification, Catanzaro, Sundaram, Keutzer, International Conference on Machine Learning 2008
100X speed-up
793 downloads since release in 10/2008
Achieved 11x speedup over sequential version
Allows 3.5x faster than real time recognition
Our technique is being deployed in a hotline call-center data analytics company
Used to search content, track service quality and provide early detection of service issues
Scalable HMM based Inference Engine in Large Vocabulary Continuous Speech Recognition, Kisun You, Jike Chong, Youngmin Yi, Ekaterina Gonina, Christopher Hughes, Wonyong Sung and Kurt Keutzer, IEEE Signal Processing Magazine, March 2010
Value-at-Risk Computation with Monte Carlo Method
Summarizes a portfolio’s vulnerabilities to market movements
Important to algorithmic trading, derivative usage and highly leveraged hedge funds
Improved implementation to run 60x faster on a parallel microprocessor
Four Steps of Monte Carlo Method in Finance
1. Uniform Random Number Generation
2. Market Parameter Transformation
3. Instrument Pricing
4. Data Assimilation
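The four steps above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the implementation from the talk: it assumes independent, normally distributed risk-factor returns and a linear ("delta") portfolio, and every function name, parameter, and number below is hypothetical.

```python
import random
from statistics import NormalDist


def uniform_samples(n_scenarios, n_factors, seed=0):
    """Step 1: uniform random numbers strictly inside (0, 1)."""
    rng = random.Random(seed)

    def draw():
        u = rng.random()      # random() returns values in [0, 1)
        while u == 0.0:       # inv_cdf in step 2 needs 0 < u < 1
            u = rng.random()
        return u

    return [[draw() for _ in range(n_factors)] for _ in range(n_scenarios)]


def to_market_moves(uniforms, vols):
    """Step 2: map uniforms to normal risk-factor returns (assumed independent)."""
    std_normal = NormalDist()
    return [[vol * std_normal.inv_cdf(u) for u, vol in zip(row, vols)]
            for row in uniforms]


def portfolio_losses(moves, exposures):
    """Step 3: price the portfolio per scenario (assumed linear in each factor)."""
    return [-sum(e * m for e, m in zip(exposures, row)) for row in moves]


def value_at_risk(losses, confidence=0.95):
    """Step 4: aggregate the scenario losses and read off the loss quantile."""
    ordered = sorted(losses)
    index = min(len(ordered) - 1, int(confidence * len(ordered)))
    return ordered[index]


if __name__ == "__main__":
    exposures = [1_000_000.0, 500_000.0]   # hypothetical positions
    vols = [0.01, 0.02]                    # hypothetical daily volatilities
    u = uniform_samples(20_000, len(exposures), seed=42)
    losses = portfolio_losses(to_market_moves(u, vols), exposures)
    print(f"95% VaR estimate: {value_at_risk(losses):,.0f}")
```

Steps 1 through 3 treat each scenario independently, which is what makes the method a natural fit for a parallel processor; only step 4 needs to combine results across scenarios.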
Matthew Dixon, Jike Chong, Kurt Keutzer, “Acceleration of Market Value-at-Risk Estimation”, Workshop on High Performance Computing in Finance at Super Computing 2009, November 15, 2009.
Applications
Our formula in a bit of detail: MRI reconstruction
Other applications where our formula has proven itself
Applications in progress
Can locate humans in images
20x speedup through algorithmic improvements and parallel implementation
Work can be extended to pose estimation for controller-free video game interfaces using ordinary web cameras
Option Pricing Application
Price an option - a tradable financial security whose value depends on the value of an underlying asset and market parameters.
Black-Scholes equation:
Speedup:
6x pricing 1 option
25x pricing 128 options on Larrabee
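The slide does not say which pricing method was used, so the sketch below is only a stand-in: it applies the standard closed-form Black-Scholes formula for a European call and prices a batch of 128 options. The function names and all input values are hypothetical. It shows why a batch is easy to parallelize: each option is priced independently of the others.

```python
import math


def norm_cdf(x):
    """Standard normal cumulative distribution function."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


def black_scholes_call(spot, strike, rate, vol, maturity):
    """Closed-form Black-Scholes price of a European call option."""
    sqrt_t = math.sqrt(maturity)
    d1 = (math.log(spot / strike) + (rate + 0.5 * vol * vol) * maturity) / (vol * sqrt_t)
    d2 = d1 - vol * sqrt_t
    return spot * norm_cdf(d1) - strike * math.exp(-rate * maturity) * norm_cdf(d2)


def price_batch(options):
    """Price each (spot, strike, rate, vol, maturity) tuple independently.

    No iteration depends on another, so this loop is the part that could be
    spread across cores or vector lanes.
    """
    return [black_scholes_call(*option) for option in options]


if __name__ == "__main__":
    # 128 hypothetical options that differ only in strike.
    batch = [(100.0, 80.0 + i, 0.05, 0.2, 1.0) for i in range(128)]
    prices = price_batch(batch)
    print(f"priced {len(prices)} options, first = {prices[0]:.4f}")
```

A single option offers only the parallelism inside one formula evaluation, while a batch also offers one independent task per option, which may be part of why the batch case reports the larger speedup.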
• Optical Flow involves computing the motion vectors (“flow field”) between the consecutive frames of a video
• Involves solving a non‐linear optimization problem
Speedup
32x linear solver
7x overall
[Figure: runtime breakdown, Serial vs. Parallel. Components: Linear solver, Matrix creation, Interpolation & Warping, Downsampling, Filter, Other, Memcopy CPU-GPU]
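The gap between a 32x speedup on one component and a 7x speedup overall is what Amdahl's law predicts when part of the runtime is accelerated less. The sketch below is a rough illustration only: it assumes that nothing but the linear solver is accelerated, which the slide does not state, and the per-component timings from the chart did not survive extraction. The function names are hypothetical.

```python
def overall_speedup(fraction, component_speedup):
    """Amdahl's law: `fraction` of serial runtime is sped up, the rest is unchanged."""
    return 1.0 / ((1.0 - fraction) + fraction / component_speedup)


def required_fraction(target_overall, component_speedup):
    """Invert Amdahl's law: fraction of serial runtime the component must occupy."""
    return (1.0 - 1.0 / target_overall) / (1.0 - 1.0 / component_speedup)


if __name__ == "__main__":
    # Figures from the slide: 32x on the linear solver, 7x overall.
    fraction = required_fraction(7.0, 32.0)
    print(f"linear solver share of serial runtime: {fraction:.1%}")
    print(f"check: overall speedup = {overall_speedup(fraction, 32.0):.2f}x")
```

Under that simplifying assumption, the linear solver would need to account for a little under 90% of the serial runtime. In practice the other components and the added CPU-GPU copy also change, so this is a consistency check rather than a reconstruction of the measured breakdown.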
2009 ITRS - Functions/chip and Chip Size
[Figure: ITRS Roadmap, Figure 9b (Logic), including ’11 snapshot analysis. Vertical axis in Mtransistors and square millimeters. Series: 2009 ITRS Cost-Performance MPU functions per chip at production (Mtransistors); 2009 ITRS High-Performance MPU functions per chip at production (Mtransistors)]
Parallelism Delivers Moore’s Law (and More) in the Future
[Figure: Transistors, Serial Performance, and Parallel Performance plotted over time, 1970 to 2020]
Tip of the Iceberg
Today we’re only exploiting the smallest tip of the iceberg of computation that will be available in the future
How do we deliver all those new applications to the user?
I’ve talked about how to translate Moore’s Law into useful applications via microprocessors.
Our next speakers will talk about how to deliver those applications to you.