26th International Conference on VLSI, January 2013, Pune, India
Architectural Alternatives for Energy Efficient Performance Scaling
Sudhakar Yalamanchili
School of Electrical and Computer Engineering, Georgia Institute of Technology
Outline
• Impending Power and Thermal Limits to Multicore
• New Rules: Scaling Performance
• Application: Co-Design of a Multicore Architecture and Thread Scheduler
Moore’s Law
From wikipedia.org
• Performance scaled with number of transistors
• Dennard scaling: power scaled with feature size
Goal: Sustain Performance Scaling
New Rules: The End of Dennard Scaling
[Figure: MOSFET cross-section labeled with GATE, SOURCE, DRAIN, channel length L, and oxide thickness tox]
• Change in scaling factors
• Slower scaling in power per transistor → increasing power densities
From R. Dennard, et al., “Design of ion-implanted MOSFETs with very small physical dimensions,” IEEE Journal of Solid State Circuits, vol. SC-9, no. 5, pp. 256-268, Oct. 1974.
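The link between slower per-transistor power scaling and rising power density can be made concrete with a textbook idealization. This is a sketch, not a derivation taken from the slide: the scaling factor $\kappa$, capacitance $C$, supply voltage $V$, frequency $f$, and transistor area $A$ are notation introduced here, and the "voltage fixed" row is an assumed limiting case in which supply voltage stops scaling while frequency still scales.

$$
\begin{aligned}
\text{Dennard scaling:}\quad & C \to \frac{C}{\kappa},\quad V \to \frac{V}{\kappa},\quad f \to \kappa f,\quad A \to \frac{A}{\kappa^{2}} \\
& P = C V^{2} f \;\to\; \frac{C}{\kappa}\cdot\frac{V^{2}}{\kappa^{2}}\cdot\kappa f = \frac{P}{\kappa^{2}},
\qquad \frac{P}{A} \;\to\; \frac{P/\kappa^{2}}{A/\kappa^{2}} = \frac{P}{A} \\
\text{Voltage fixed:}\quad & P \;\to\; \frac{C}{\kappa}\cdot V^{2}\cdot\kappa f = P,
\qquad \frac{P}{A} \;\to\; \frac{P}{A/\kappa^{2}} = \kappa^{2}\,\frac{P}{A}
\end{aligned}
$$

Under the first set of rules power density stays constant from one generation to the next; under the second it grows by $\kappa^{2}$ per generation.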
Impending Power and Thermal Limits for Multicore
Mukhopadhyay and Yalamanchili (2009)
• Based on scaling using Pentium-class cores modeled with IntSim¹
• Chip-level performance will be power and thermally limited!
¹ D. Sekar, A. Naeemi, R. Sarvari, J. Davis, and J. Meindl, “IntSim: A CAD tool for optimization of multilevel interconnect networks,” Proceedings of the IEEE/ACM International Conference on Computer-Aided Design, 2007
Impending Power and Thermal Limits for Multicore
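[Figure: performance vs. year. One curve is the performance predicted by Moore’s Law, the other is the performance limited by power/thermal constraints; the region between them is labeled the Dark Silicon Gap.]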
Managing the Physics
[Figure: peak die temperature and CPU & GPU relative power vs. time (seconds). Series: GPU power, CPU CU0 power, CPU CU1 power, peak die temperature. Annotations: "CPU power is limited, GPU running at max DVFS state", "Thermal coupling", "Temp throttling". Floorplan labels: CU0, CU1, GPU.]
• Thermal coupling between CPU and GPU accelerates temperature rise
• Induces premature throttling
AMD Trinity APU
Paul, Manne, Bircher, & Yalamanchili 2012
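The coupling effect described on this slide can be illustrated with a toy lumped thermal model. The sketch below is not the model used in the cited work: it assumes two thermal nodes (CPU and GPU), each with a thermal resistance to ambient and a thermal capacitance, joined by a coupling conductance. The function name and every parameter value are hypothetical and chosen only to make the effect visible.

```python
def time_to_throttle(p_cpu_w, p_gpu_w, coupling_w_per_c,
                     throttle_c=65.0, t_amb_c=40.0,
                     r_th_c_per_w=2.0, c_th_j_per_c=20.0,
                     dt_s=1.0, max_s=400.0):
    """Forward-Euler simulation of two coupled thermal nodes.

    Returns the first time (seconds) at which the CPU node reaches
    throttle_c, or None if it never does within max_s seconds.
    All default values are illustrative assumptions, not measurements.
    """
    t_cpu = t_gpu = t_amb_c
    steps = int(max_s / dt_s)
    for step in range(1, steps + 1):
        # Net heat flow into each node: dissipated power, minus heat leaving
        # to ambient, plus heat exchanged with the neighboring node.
        q_cpu = (p_cpu_w - (t_cpu - t_amb_c) / r_th_c_per_w
                 + coupling_w_per_c * (t_gpu - t_cpu))
        q_gpu = (p_gpu_w - (t_gpu - t_amb_c) / r_th_c_per_w
                 + coupling_w_per_c * (t_cpu - t_gpu))
        t_cpu += q_cpu / c_th_j_per_c * dt_s
        t_gpu += q_gpu / c_th_j_per_c * dt_s
        if t_cpu >= throttle_c:
            return step * dt_s
    return None


if __name__ == "__main__":
    # Same CPU power in both runs; only the GPU activity changes.
    print("GPU idle  :", time_to_throttle(10.0, 0.0, coupling_w_per_c=0.5))
    print("GPU active:", time_to_throttle(10.0, 20.0, coupling_w_per_c=0.5))
```

With these assumed values the CPU node stays below the throttle threshold when the GPU is idle, but crosses it once the GPU is active, even though the CPU's own power is unchanged. That is the sense in which coupling induces premature throttling.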
Outline
• Impending Power and Thermal Limits to Multicore
• New Rules: Scaling Performance
• Application: Co-Design of a Multicore Architecture and Thread Scheduler
Post Dennard Architecture Performance Scaling
$$
\text{Perf}\left(\frac{\text{ops}}{\text{s}}\right) = \text{Power}\,(\text{W}) \times \text{Efficiency}\left(\frac{\text{ops}}{\text{joule}}\right)
$$
W. J. Dally, Keynote IITC 2012
Operator_cost + Data_movement_cost
Three operands x 64 bits/operand
$$
\text{Energy} = \#\text{bits} \times \text{dist (mm)} \times \text{energy per bit-mm}
$$
Borkar & Chien, 2011
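The two relations on this slide can be combined in a few lines of Python. This is a minimal sketch under stated assumptions: it reads the "Operator_cost + Data_movement_cost" annotation as the energy per operation (so efficiency is its reciprocal), and the operator cost, energy per bit-mm, distances, and power budget are hypothetical placeholders rather than figures from the talk.

```python
def data_movement_energy_j(bits, dist_mm, energy_per_bit_mm_j):
    """Energy = #bits x distance (mm) x energy per bit-mm."""
    return bits * dist_mm * energy_per_bit_mm_j


def efficiency_ops_per_joule(operator_cost_j, data_movement_cost_j):
    """Assumed reading of the slide: energy per op is the sum of both costs."""
    return 1.0 / (operator_cost_j + data_movement_cost_j)


def performance_ops_per_s(power_w, efficiency_ops_per_j):
    """Perf (ops/s) = Power (W) x Efficiency (ops/joule)."""
    return power_w * efficiency_ops_per_j


# From the slide: three operands x 64 bits/operand.
BITS_PER_OP = 3 * 64

# Hypothetical values, for illustration only.
OPERATOR_COST_J = 10e-12        # assumed energy of the arithmetic itself
ENERGY_PER_BIT_MM_J = 0.1e-12   # assumed wire energy per bit per mm
POWER_BUDGET_W = 100.0          # assumed chip power budget

if __name__ == "__main__":
    for dist_mm in (1.0, 10.0):
        move_j = data_movement_energy_j(BITS_PER_OP, dist_mm, ENERGY_PER_BIT_MM_J)
        eff = efficiency_ops_per_joule(OPERATOR_COST_J, move_j)
        perf = performance_ops_per_s(POWER_BUDGET_W, eff)
        print(f"dist={dist_mm} mm  movement={move_j:.3e} J  perf={perf:.3e} ops/s")
```

At a fixed power budget, the only way to raise performance in this model is to lower the energy per operation, and the movement term grows linearly with the distance the operands travel.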
Scaling Performance: Cost of Data Movement
Embedded Platforms (Goal: 1-100 GOps/W)
Big Science: To Exascale (Goal: 20 MW/Exaflop)
• Sustain performance scaling through massive concurrency
• Data movement becomes more expensive than computation
Courtesy: Sandia National Labs: R. Murphy.
Cost of Data Movement
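The efficiency goals stated on this slide can be restated as energy budgets per operation. The arithmetic below is a unit conversion only, assuming "Exaflop" means $10^{18}$ floating-point operations per second and "GOps" means $10^{9}$ operations per second.

$$
\begin{aligned}
\text{Exascale goal:}\quad & \frac{20\ \text{MW}}{1\ \text{Exaflop/s}} = \frac{20 \times 10^{6}\ \text{J/s}}{10^{18}\ \text{flop/s}} = 2 \times 10^{-11}\ \text{J/flop} = 20\ \text{pJ/flop} \\
\text{Embedded goal:}\quad & 1\ \text{GOps/W} = 10^{9}\ \text{ops/J} \;\Rightarrow\; 1\ \text{nJ/op},
\qquad 100\ \text{GOps/W} = 10^{11}\ \text{ops/J} \;\Rightarrow\; 10\ \text{pJ/op}
\end{aligned}
$$

Each budget has to cover both the operation itself and the movement of its operands.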
Post Dennard Architecture Performance Scaling
$$
\text{Perf}\left(\frac{\text{ops}}{\text{s}}\right) = \text{Power}\,(\text{W}) \times \text{Efficiency}\left(\frac{\text{ops}}{\text{joule}}\right)
$$
W. J. Dally, Keynote IITC 2012
Operator_cost + Data_movement_cost
Three operands x 64 bits/operand
Specialization → heterogeneity and asymmetry
$$
\text{Energy} = \#\text{bits} \times \text{dist (mm)} \times \text{energy per bit-mm}
$$
Borkar & Chien, 2011
Hardware Power-Performance Tradeoffs
[Figure: GOps/Watt vs. Programmability/Flexibility. Device classes plotted: In-Order Processor, OOO Processor, GPU, FPGA, DSP (LP), ASIC. Example parts labeled: Atom, Westmere-EP, NVIDIA Tesla, Xilinx Virtex 6, TMS320671D. Inset images: Model T (image credits: freecaroffers.net, freewebs.com).]
Customization is key to energy efficiency!
Computer Architecture Today: Speculation and Locality