Power Consumption by Integrated Circuits Lin Zhong ELEC518, Spring 2011
Power consumption of processing
• Dynamic power
Busy power vs. delay vs. energy
$P_{dyn} = a \cdot C \cdot V_{dd}^2 \cdot f$
$t \propto \dfrac{V_{dd}}{V_{dd} - V_T}$
Analysis and Design of Digital ICs, Hodges et al
Core 2 Duo for example
• Intel® Core™2 Duo processor
– T7800 at 2.6GHz
– T7700 at 2.4GHz, available on Thinkpad T61p
– 0.75-1.35V, 35 Watts
• Intel® Core™2 Duo Low Voltage
– L7500 at 1.6GHz, available on Thinkpad X61
– 0.75-1.3V, 17 Watts
• Intel® Core™2 Duo Ultra Low Voltage
– U7500 at 1.06GHz, available on Dell D430
– 0.75-0.975V, 10 Watts
Switching energy
$e = \frac{1}{2} \cdot C \cdot V^2$
Switching power
$P = b \cdot C \cdot V^2 = a \cdot C \cdot V^2 \cdot f$
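As a rough numerical illustration of the switching-power formula, the sketch below evaluates P = a·C·V²·f in Python. The function name and the sample values for a, C, V, and f are assumptions chosen for illustration; they are not figures from the slides.

```python
def switching_power(a, C, V, f):
    """Switching power P = a * C * V^2 * f, in watts.

    a: activity factor, C: switched capacitance (F),
    V: supply voltage (V), f: clock frequency (Hz).
    """
    return a * C * V ** 2 * f


# Hypothetical operating point, chosen only to show the scaling behaviour.
base = switching_power(a=0.1, C=1e-9, V=1.2, f=2.0e9)
half_voltage = switching_power(a=0.1, C=1e-9, V=0.6, f=2.0e9)
half_frequency = switching_power(a=0.1, C=1e-9, V=1.2, f=1.0e9)

print(f"baseline:       {base:.3f} W")
print(f"half voltage:   {half_voltage / base:.2f}x baseline")
print(f"half frequency: {half_frequency / base:.2f}x baseline")
```

Halving the voltage cuts power to a quarter, while halving the frequency only cuts it in half, which is why the supply voltage is the more attractive knob.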
Higher integration
• Selling the chipset (or solution or platform)
– Intel Centrino
• Centrino Duo includes Core 2 Duo processor, 9XX Express-series chipset, and Wi-Fi adapter
– TI TCS2600 chipset
System-on-a-chip (SoC)
• TI OMAP
SiP: Multiple-chip product (MCP)
Siemens SX66 PDA Phone
Audiovox PPC6601KIT
32MB
400MHz
Source: Intel.com
SiP: Stacked-die approach
Qualcomm 3G CDMA2000 chip
Seven power regimes, 100 clock regimes
ISSCC 2004
Moore’s Law
known
Exciting Unknown
MOSFET at nanoscale
Sunlin Chou, “Extending Moore’s Law in the Nanotechnology Era” (www.intel.com).
Given workload L and deadline T
• L measured by # of CPU cycles
• Clock speed $f \ge L/T$
• Time to finish: $t = L/f$
• Energy to finish: $P \cdot t = a \cdot C \cdot V^2 \cdot f \cdot t = a \cdot C \cdot V^2 \cdot L$
Effect of lower clock speed (f)
Power consumption
$P = a \cdot C \cdot V^2 \cdot f$
Energy consumption
$E = P \cdot t = a \cdot C \cdot V^2 \cdot f \cdot t = a \cdot C \cdot V^2 \cdot L$
Effect of lower supply voltage (V)
Power consumption
$P = a \cdot C \cdot V^2 \cdot f = k \cdot V^3 = x \cdot f^3$
Energy consumption
$E = P \cdot t = a \cdot C \cdot V^2 \cdot f \cdot t = a \cdot C \cdot V^2 \cdot L$
Maximum clock speed
$f = b \cdot V$
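The two energy results can be compared numerically. The sketch below assumes the linear relation f = b·V from this slide, so that lowering the clock also allows a lower voltage; the function names and the constants are hypothetical and serve only to show the trend.

```python
def energy_fixed_voltage(a, C, V, L):
    """E = a * C * V^2 * L: at a fixed voltage the clock frequency cancels out."""
    return a * C * V ** 2 * L


def energy_scaled_voltage(a, C, b, f, L):
    """Energy when the voltage is lowered together with the clock, V = f / b."""
    V = f / b
    return a * C * V ** 2 * L


# Hypothetical constants: b is chosen so that 2 GHz needs 1.0 V.
a, C, b, L = 0.1, 1e-9, 2.0e9, 1.0e9

full = energy_scaled_voltage(a, C, b, f=2.0e9, L=L)  # runs at V = 1.0
half = energy_scaled_voltage(a, C, b, f=1.0e9, L=L)  # runs at V = 0.5

print(f"energy at half speed / energy at full speed = {half / full:.2f}")
```

Lowering the clock alone saves no energy for a fixed workload; the saving comes entirely from the lower voltage that the slower clock permits.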
Given workload L and deadline T: single processor
• The processor can run at any frequency (voltage)
– $f = b \cdot V$
• The processor can be completely off when work is done (zero power when idle)
• To minimize energy consumption, at which frequency should the processor run?
– $f \ge L/T$ (in order to meet the deadline)
– $E = P \cdot t = a \cdot C \cdot V^2 \cdot f \cdot t = a \cdot C \cdot V^2 \cdot L$
– f = ????
[Figure: frequency f vs. time, deadline T. $f_1 = L/T$; $f_2 = L/(T/2) = 2 f_1$]
[Figure: power P vs. time, deadline T. $P_1 = x \cdot f^3$; $P_2 = 2^3 P_1$]
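To make the comparison concrete, the sketch below computes power and energy for the two schedules, assuming P = x·f³ while running and zero power once the work is done. The constant x and the workload size are hypothetical.

```python
def power(x, f):
    """Power while running, using the cubic model P = x * f^3."""
    return x * f ** 3


def energy_to_finish(x, f, L):
    """Energy for L cycles at frequency f; the processor is off afterwards."""
    return power(x, f) * (L / f)


x = 1e-27   # hypothetical constant, gives 1 W at 1 GHz
L = 1.0e9   # workload in cycles
T = 1.0     # deadline in seconds

f1 = L / T        # just meets the deadline
f2 = 2 * f1       # finishes in T/2

power_ratio = power(x, f2) / power(x, f1)
energy_ratio = energy_to_finish(x, f2, L) / energy_to_finish(x, f1, L)
print(f"power ratio:  {power_ratio:.1f}")
print(f"energy ratio: {energy_ratio:.1f}")
```

Under these assumptions, running twice as fast draws eight times the power for half the time, so it costs four times the energy.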
Given workload L and deadline T: M processors
• The workload can be divided without overhead: $L = L_1 + L_2 + \dots + L_M$ ($L \ge L_i \ge 0$)
• To minimize energy consumption, at which frequency should processor i run?
– $f_i = L_i/T$ and $V = u \cdot L_i$
– $E_i = a \cdot C \cdot V^2 \cdot L_i = w \cdot L_i^3$
Given workload L and deadline T: M processors
• The workload can be divided without overhead: $L = L_1 + L_2 + \dots + L_M$ ($L \ge L_i \ge 0$)
• To minimize the TOTAL energy consumption, how should the workload be allocated?
– $E = E_1 + E_2 + \dots + E_M = w \cdot L_1^3 + w \cdot L_2^3 + \dots + w \cdot L_M^3$
– $= w (L_1^3 + L_2^3 + \dots + L_M^3)$
From high school
• $[(a+b)/2]^2 \le (a^2+b^2)/2$
Quadratic mean ≥ Arithmetic mean ≥ Geometric mean ≥ Harmonic mean
From high school (Contd.)
• $[(a+b)/2]^3 \le (a^3+b^3)/2$ (for $a, b \ge 0$)
– $E = w (L_1^3 + L_2^3 + \dots + L_M^3)$ ??? $(L_1 + L_2 + \dots + L_M)^3$
From college: Convex (Concave)
By definition of “convex”
Jensen’s Inequality (finite form)
• $\varphi(x)$ is convex
– $\varphi(t \cdot x_1 + (1-t) \cdot x_2) \le t \cdot \varphi(x_1) + (1-t) \cdot \varphi(x_2)$
http://en.wikipedia.org/wiki/Jensen%27s_inequality#Proof_1_.28finite_form.29
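• $a_i = 1/n$
• $\varphi(x) = x^2$ (convex)
• $\varphi(x) = x^3$ (convex for $x \ge 0$)
– $E = w (L_1^3 + L_2^3 + \dots + L_M^3) = w \cdot M \cdot (L_1^3 + L_2^3 + \dots + L_M^3)/M$
– $\ge w \cdot M \cdot [(L_1 + L_2 + \dots + L_M)/M]^3 = w \cdot L^3 / M^2$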
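The bound from Jensen's inequality can be checked numerically. The sketch below compares an equal split of the workload with random splits, assuming the per-processor energy model E_i = w·L_i³; the helper names, the seed, and the values of L, M, and w are hypothetical.

```python
import random


def total_energy(loads, w=1.0):
    """Total energy E = w * sum(L_i^3) over all processors."""
    return w * sum(load ** 3 for load in loads)


def equal_split(L, M):
    """Give each of the M processors the same share L / M."""
    return [L / M] * M


def random_split(L, M, rng):
    """Cut [0, L] at M - 1 random points to get M non-negative shares."""
    cuts = sorted(rng.uniform(0, L) for _ in range(M - 1))
    bounds = [0.0] + cuts + [L]
    return [hi - lo for lo, hi in zip(bounds, bounds[1:])]


L, M = 12.0, 4
bound = L ** 3 / M ** 2               # w * L^3 / M^2 with w = 1
best = total_energy(equal_split(L, M))

rng = random.Random(0)
worst_case_gap = min(
    total_energy(random_split(L, M, rng)) - bound for _ in range(1000)
)

print(f"bound = {bound}, equal split = {best}")
print(f"smallest (random split - bound) = {worst_case_gap:.3f}")
```

In this experiment the equal split meets the bound and no random split falls below it, which is consistent with the inequality: under these assumptions the workload should be spread evenly.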
More about Convexity
[Figure: Cost vs. Return curve]
Example | Cost | Return
Workload distribution | Energy | Workload finished within T
Eating | Price of apples | Pleasure from eating apples
Helicopter engine | Price of engine | Engine thrust
Law of diminishing marginal returns | Cost of production | Increase in production
More about Convexity
• Greedy optimization works
• Combine simpler/cheaper components
[Figure: Cost vs. Return curve]
Check the assumptions
• Power consumption is zero when the processor is not active
Idle power (Static power)
[Equations: $P_{static}$ depends exponentially on temperature $T$ and on supply voltage $V_{dd}$]
When the IC is idle but not powered off, e.g. SRAM
Leakage power
Scaling down
Scaling down (Contd.)
Scaling: Not that simple (Contd.)
Thermodynamics: Gas
Quantum dynamics: Individual molecules
Uniform (central limit theorem)
High variation and likely defective
Tunneling effect
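[Figure: frequency f vs. time, deadline T. $f_1 = L/T$; $f_2 = L/(T/2) = 2 f_1$]
[Figure: power P vs. time, deadline T. $P_1 = x \cdot f^3$]
[Figure: power P vs. time, deadline T. $P_1 = x \cdot f^3 + P_{static}$]
[Figure: power P vs. time, deadline T. $P_1 = x \cdot f^3 + P_{static}$; $P_2 = 2^3 x \cdot f^3 + P_{static}$]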
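With static power included, running faster is no longer always worse. Comparing $(x f^3 + P_{static}) \cdot T$ against $(2^3 x f^3 + P_{static}) \cdot T/2$ shows that the fast schedule wins once $P_{static} > 6 \cdot x f^3$. The sketch below checks this break-even point, assuming the processor switches off at no cost as soon as the work is done; the function names and constants are hypothetical.

```python
def energy(x, f, p_static, L):
    """Energy to finish L cycles at frequency f, then switch off for free."""
    return (x * f ** 3 + p_static) * (L / f)


def cheaper_speed(x, f1, p_static, L):
    """Return which of the two schedules (f1 or 2*f1) uses less energy."""
    slow = energy(x, f1, p_static, L)
    fast = energy(x, 2 * f1, p_static, L)
    return "f1" if slow <= fast else "2*f1"


x, L, T = 1e-27, 1.0e9, 1.0   # hypothetical; dynamic power at f1 is 1 W
f1 = L / T

for p_static in (0.0, 5.0, 7.0):   # watts, below and above 6 * x * f1^3
    print(f"P_static = {p_static} W -> run at {cheaper_speed(x, f1, p_static, L)}")
```

The outcome depends entirely on the ratio of static to dynamic power, which is one reason static power matters for frequency selection.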
Why is static power important?
ITRS, 2009
Pentium II (Klamath) and III (Coppermine)
7.5M Transistors (Pentium II), 28M Transistors (Pentium III)
Core 2 Duo (Conroe)
64KB L1 cache, 4MB L2 cache, 291M Transistors
Core 1
Core 2
Solutions to “never-enough” challenge
234M transistors
24M go to L2 cache
8 SPE, each 20.9M transistors (167M transistors)
Each has 4 64KB SRAM (12M transistors)
SRAM takes 122M transistors (>50%)
Multiple power/clock domains
TI OMAP 2 architecture, ISSCC 2005
Multimedia phone: NTT DoCoMo 3G FOMA 902i to be released with OMAP2420
Given workload L and deadline T: single processor
• One processor can run at any frequency (voltage)
– $f = b \cdot V$
• The processor can be completely off when work is done (zero power when idle)
– Given $P_{static}$
– Given energy overhead of shutting down the processor ($E_{overhead}$)
• To minimize energy consumption, at which frequency should the processor run?
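One possible way to approach this question is sketched below. It assumes the power model P = x·f³ + P_static while the processor is on, zero power once it is shut down, a single lump-sum shutdown cost, and no upper limit on frequency. Under those assumptions the energy x·f²·L + P_static·L/f is minimized at f = (P_static / (2x))^(1/3), which the code compares against simply running at L/T. The function name and all constants are hypothetical.

```python
def best_frequency(x, p_static, e_overhead, L, T):
    """Pick the lower-energy option between two candidate schedules.

    Slow: run at f = L/T for the whole period, no shutdown needed.
    Fast: run at the frequency minimizing x*f^2*L + p_static*L/f,
          finish early, and pay e_overhead to shut down.
    Returns (frequency, energy).
    """
    f_min = L / T
    e_slow = x * f_min ** 2 * L + p_static * T

    # Unconstrained minimizer of x*f^2*L + p_static*L/f.
    f_crit = (p_static / (2 * x)) ** (1 / 3)
    if f_crit <= f_min:
        return f_min, e_slow

    e_fast = x * f_crit ** 2 * L + p_static * L / f_crit + e_overhead
    return (f_crit, e_fast) if e_fast < e_slow else (f_min, e_slow)


x, L, T = 1e-27, 1.0e9, 1.0   # hypothetical constants
print(best_frequency(x, p_static=0.0, e_overhead=0.0, L=L, T=T))
print(best_frequency(x, p_static=16.0, e_overhead=0.0, L=L, T=T))
print(best_frequency(x, p_static=16.0, e_overhead=10.0, L=L, T=T))
```

With no static power the slowest feasible clock wins; with high static power a faster clock followed by shutdown can win, unless the shutdown overhead eats the saving.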
[Figure: power P vs. time, deadline T. $P_1 = x \cdot f^3 + P_{static}$; $P_2 = 2^3 x \cdot f^3 + P_{static}$]
Why is there overhead to powering off a circuit?
Clock generator
• Resonant circuit + amplifier
• Resonant circuit (Oscillator)
– Crystal oscillator (>2x10^9/yr)
• ~10kHz to ~10MHz
• Quartz, ceramics (low cost, low accuracy), surface acoustic wave (SAW) quartz crystal (expensive, accurate)
• Real-time clocks
– 32.768kHz (2^15 Hz), 4.194304MHz (2^22 Hz)
• Application-specific
– 4.9152MHz (4 x 1.2288MHz, CDMA baseband frequency)……
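The real-time-clock frequencies above are powers of two in hertz, so a chain of divide-by-two stages brings them down to exactly 1 Hz. The short sketch below checks this arithmetic; the function name is hypothetical.

```python
def divider_stages_to_1hz(freq_hz):
    """Number of divide-by-2 stages needed to reach 1 Hz.

    Returns None if freq_hz is not a positive power of two.
    """
    if freq_hz <= 0 or freq_hz & (freq_hz - 1):
        return None
    return freq_hz.bit_length() - 1


for freq_hz in (32_768, 4_194_304, 4_915_200):
    print(freq_hz, divider_stages_to_1hz(freq_hz))

# The application-specific crystal is an integer multiple of the CDMA rate.
print(4 * 1_228_800 == 4_915_200)
```

The application-specific 4.9152 MHz crystal is not a power of two; it is chosen instead as a multiple of the 1.2288 MHz rate it has to generate.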
Oscillator (Contd.)
• LC/RLC circuit
• Ring oscillator
– Application other than oscillator?
• Voltage-controlled oscillator (VCO)
– Varicap: variable capacitance diode (tuning diode)
– Phase-locked loop for high-speed clock (next slide)
– Frequency scaling of IC for energy saving
Phase-locked loop (PLL)
• High-speed clock from a master oscillator
• Digital PLL
• Clock generation, recovery, synchronization
– Digital computing, RF communication
[Figure: PLL block diagram. Master oscillator and phase-frequency detector drive the VCO through a control voltage; a frequency divider (N) sits in the feedback path]
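The effect of the divider in the feedback path can be illustrated with a toy model. The sketch below is a heavily simplified, frequency-only loop: it nudges the VCO until the divided-down output matches the master oscillator, so the VCO settles at N times the reference. A real PLL compares phase and includes a loop filter, neither of which is modelled here; the function name, gain, and frequencies are hypothetical.

```python
def simulate_pll(f_ref, n, gain=0.5, steps=200, f_vco=0.0):
    """Toy frequency-tracking loop with a divide-by-n in the feedback path.

    Each step compares the reference with the divided VCO output and
    adjusts the VCO in proportion to the error.
    """
    for _ in range(steps):
        error = f_ref - f_vco / n      # detector output (frequency error)
        f_vco += gain * n * error      # control voltage moves the VCO
    return f_vco


f_out = simulate_pll(f_ref=10e6, n=100)   # 10 MHz master oscillator
print(f"VCO settles near {f_out / 1e6:.1f} MHz")
```

Changing N changes the output clock without touching the master oscillator, which is what makes the PLL useful for frequency scaling.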
Given workload L and deadline T: single processor
• The processor can run at any frequency (voltage)
– $f = b \cdot V$
• The processor can be completely off when work is done (zero power when idle)
• To minimize energy consumption, at which frequency should the processor run?
– $f \ge L/T$ (in order to meet the deadline)
– $E = P \cdot t = a \cdot C \cdot V^2 \cdot f \cdot t = a \cdot C \cdot V^2 \cdot L$
– f = ????
Threshold voltage
Vdd scales slowly & Vth scales even more slowly
• Vth is limited by the thermal voltage
• Vdd needs to stay considerably higher than Vth to curb leakage current
• This ends up breaking the scaling rules
– low channel mobility
Plummer and Griffin, 2001 (Data from ITRS/NTRS)
Check the assumptions (Contd.)
• The workload can be divided without overhead: $L = L_1 + L_2 + \dots + L_M$ ($L \ge L_i \ge 0$)
• Communication cost between processors!!!
Quadrotor vs. Helicopter
De Bothezat Quadrotor, 1923.
Quadrotor vs. Helicopter
A.R. Drone, 2010
Wire power consumption
Inter-processor communication