Low-power Clock Trees for CPUs
Dong-Jin Lee, Myung-Chul Kim and Igor L. Markov, Dept. of EECS, University of Michigan
ICCAD 2010, Dong-Jin Lee, University of Michigan

Dec 19, 2015

Page 1: Low-power Clock Trees for CPUs

Dong-Jin Lee, Myung-Chul Kim and Igor L. Markov
Dept. of EECS, University of Michigan

ICCAD 2010, Dong-Jin Lee, University of Michigan

Page 2: Outline

■ Motivation and challenges
■ Modeling and objectives
  − Local skew with variation
  − Local-skew slack
  − Modeling process variation
■ Proposed methodology and techniques
  − Initial tree construction and buffer insertion
  − Robustness improvements
  − Wire snaking and delay buffer insertion
■ Empirical validation
■ Summary

Page 3: Motivation

■ Clock networks
  − Contribute a significant fraction of dynamic power
  − A limiting factor in high-performance CPUs and SoCs
■ Challenges
  − Interconnect is lagging in performance while transistors continue scaling
  − Multi-objective optimization
    – Traditional clock network synthesis constraints
    – The increasing impact of process variation
    – Power-performance-cost trade-offs

Page 4: Tree vs Mesh

■ Objectives
  − Minimize skew of a high-performance clock tree
  − Minimize the impact of PVT variations
  − Clock trees vs meshes, subject to skew < 7.5ps

[Figure: robustness plotted against power efficiency, with regions labeled Trees, Meshes and Ideal clock networks]

Page 5: Our Contributions

■ The notion of local-skew slack for clock trees
■ A tabular technique to estimate the impact of variations
■ A path-based technique to enhance the robustness
■ A time-budgeting algorithm for clock-tree tuning with minimal power resources
■ Fine tuning of clock trees: accurate, fast, power efficient
■ Implementation: Contango2.0
■ Strong empirical results: low skew, robustness, low power

Page 6: Modeling and Objectives

Page 7: Local Skew

■ Main objective (concept)
  − Minimize local skew in the presence of variation
■ Definition: Skew
  − Ψ : clock tree
  − λ(si) : the clock latency (insertion delay) at sink si ∈ Ψ
■ Definition: Global skew (ω^Ψ)
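A conventional way to write these two definitions is sketched below. It assumes that skew is taken as the absolute difference of sink latencies, which the slide text does not confirm, so treat it as a reading consistent with the notation above rather than the authors' exact formula.

```latex
\begin{align}
  \mathrm{skew}(s_i, s_j) &= \left| \lambda(s_i) - \lambda(s_j) \right|,
      \qquad s_i, s_j \in \Psi \\
  \omega^{\Psi} &= \max_{s_i, s_j \in \Psi} \left| \lambda(s_i) - \lambda(s_j) \right|
      \;=\; \max_{s \in \Psi} \lambda(s) - \min_{s \in \Psi} \lambda(s)
\end{align}
```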

Page 8: Local Skew

■ Definition: The worst nominal local skew (ω^Ψ_Δ)
  − Δ : local skew distance bound
  − dist(si,sj) : Manhattan distance between si and sj ∈ Ψ
■ Definition: The worst local skew with variation (ω^Ψ_{Δ,ν,y})
  − ν : variation model
  − y : yield (0 < y ≤ 1)
  − f(t) : the cumulative distribution function of ω^Ψ_{Δ,ν}
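The difference between global skew and nominal local skew can be made concrete with a small sketch. It assumes that local skew is the largest absolute latency difference over sink pairs whose Manhattan distance is below Δ; the strict comparison, the function names and the sample data are all illustrative assumptions, not taken from the slides.

```python
from itertools import combinations


def manhattan(p, q):
    """Manhattan distance between two (x, y) points."""
    return abs(p[0] - q[0]) + abs(p[1] - q[1])


def global_skew(latency):
    """Largest latency difference over all sink pairs."""
    values = list(latency.values())
    return max(values) - min(values)


def worst_local_skew(latency, position, delta):
    """Largest latency difference over sink pairs closer than delta.

    Assumption: a pair counts as local when dist(si, sj) < delta.
    """
    worst = 0.0
    for a, b in combinations(latency, 2):
        if manhattan(position[a], position[b]) < delta:
            worst = max(worst, abs(latency[a] - latency[b]))
    return worst


# Hypothetical sinks: latencies in ps, positions in um.
latency = {"s1": 100.0, "s2": 103.0, "s3": 110.0}
position = {"s1": (0, 0), "s2": (30, 40), "s3": (500, 500)}

print(global_skew(latency))                      # all pairs
print(worst_local_skew(latency, position, 100))  # only s1 and s2 are local
```

In this toy data the far-away sink s3 dominates global skew but does not affect local skew, which is why a local-skew objective can be much easier to meet than a global one.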

Page 9: Modeling and Objectives - Example

■ Worst local skew with variation (ω^Ψ_{Δ,ν,y})
  − Probability density function of ω^Ψ_{Δ,ν}
  − Ω_Δ = 7.5ps, y = 95%, ω^Ψ_{Δ,ν,y} < Ω_Δ
  − ω^Ψ_{Δ,ν,y} = 6.05ps

[Figure: panels labeled PDF, CDF, Inverse CDF and PDF of ω^Ψ_{Δ,ν}, skew axis from 0 to 10 ps; y = 0.95 and ω^Ψ_{Δ,ν,y} = 6.05ps are marked against Ω_Δ]
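A minimal sketch of how such a yield-based skew could be read off Monte-Carlo samples, assuming ω^Ψ_{Δ,ν,y} is the smallest t whose empirical CDF value f(t) reaches y. The function names and the synthetic Gaussian samples are hypothetical stand-ins for SPICE results.

```python
import math
import random


def skew_at_yield(samples, y):
    """Smallest sample t such that at least a fraction y of samples are <= t."""
    if not 0.0 < y <= 1.0:
        raise ValueError("yield must satisfy 0 < y <= 1")
    ordered = sorted(samples)
    # Number of samples that must lie at or below t; the small epsilon
    # guards against floating-point noise in y * len(ordered).
    k = max(1, math.ceil(y * len(ordered) - 1e-9))
    return ordered[k - 1]


def meets_limit(samples, y, limit):
    """True when the skew at yield y stays strictly below the skew limit."""
    return skew_at_yield(samples, y) < limit


# Hypothetical worst-local-skew samples in ps (not measured data).
rng = random.Random(0)
samples = [rng.gauss(5.0, 0.6) for _ in range(2000)]
print(skew_at_yield(samples, 0.95), meets_limit(samples, 0.95, 7.5))
```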

Page 10: Optimization Objectives

■ Building variation-tolerant clock trees
  − such that ω_{Δ,ν,y} < Ω_Δ (Ω_Δ : local skew limit)
  − subject to slew constraints
■ Minimizing clock-tree power

[Figure: PDFs of local skew over 0 to 10 ps, with ω^Ψ_{Δ,ν,y} and Ω_Δ marked]

Page 11: Local-skew Slack σ(s) for sink s ∈ Ψ

■ Definition
  − σ(s) is the minimum amount of additional delay for s, so that the tree satisfies ω^Ψ_Δ < Ω_Δ
■ Example (Ω_Δ = 5ps)
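One way to compute such slacks is a fixed-point relaxation: repeatedly raise the latency of any sink that lags a nearby sink by more than the limit, until nothing changes. The sketch below is an illustration under stated assumptions, not the authors' slack-computation algorithm: it only adds delay (never removes it), it uses a non-strict bound (difference ≤ limit), and the neighbor lists are assumed to hold the sinks within distance Δ.

```python
def local_skew_slacks(latency, neighbors, limit):
    """Least extra delay per sink so that neighboring sinks differ by <= limit.

    latency:   sink name -> nominal latency (ps)
    neighbors: sink name -> sinks within the local-skew distance bound
    limit:     local skew limit (ps), assumed positive
    """
    tuned = dict(latency)
    changed = True
    while changed:
        changed = False
        for s in tuned:
            for t in neighbors.get(s, ()):
                needed = tuned[t] - limit  # s may lag t by at most `limit`
                if tuned[s] < needed:
                    tuned[s] = needed
                    changed = True
    return {s: tuned[s] - latency[s] for s in latency}


# Hypothetical chain of three sinks with a 5 ps limit.
latency = {"a": 0.0, "b": 10.0, "c": 20.0}
neighbors = {"a": ["b"], "b": ["a", "c"], "c": ["b"]}
print(local_skew_slacks(latency, neighbors, 5.0))
```

The chain shows why a single pass is not enough: delaying b to keep up with c forces a to be delayed further as well, so the slack of a depends on sinks that are not its direct neighbors.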

Page 12: Modeling Process Variation

■ The impact of variation on skew(si,sj) depends on the tree path length, the number of buffers and the buffer type between si and sj
■ Notation
  − T : technology node
  − B : buffer and wire library
  − ν : variation model
■ Variation-estimation table Ξ_{T,B,ν,y}[w,b,t]
  − worst-case increase in skew (with probability y) between two sinks connected by a tree path of length w with b buffers of type t
  − w : tree path length
  − b : number of buffers
  − t : buffer type

[Figure: example tree path annotated with w, b (2 in the example) and t; labels A, B, C, D]

Page 13: Modeling Process Variation

■ varEst(si,sj)
  − the worst-case variational skew(si,sj)
■ Key constraint
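A table-driven estimate of this kind might look as follows. The sketch assumes that varEst(si,sj) is a lookup in Ξ using the path's w, b and t, and that the key constraint is "nominal skew plus varEst stays below Ω_Δ"; both are assumptions, since the slide's formulas are not part of the text. The bin boundaries, buffer type names and table entries are invented for illustration.

```python
import bisect

# Hypothetical characterization data: upper bounds of path-length bins (um)
# and, per buffer type and buffer count, the skew increase (ps) for each bin.
LENGTH_BINS = [500, 1000, 2000, 4000]
XI = {
    "small": {1: [0.8, 1.2, 1.9, 3.0], 2: [1.0, 1.5, 2.3, 3.6]},
    "large": {1: [0.5, 0.8, 1.2, 2.0], 2: [0.6, 1.0, 1.5, 2.4]},
}


def var_est(w, b, t):
    """Look up the estimated skew increase, rounding w up to the next bin."""
    i = bisect.bisect_left(LENGTH_BINS, w)
    if i == len(LENGTH_BINS):
        raise ValueError("path longer than the characterized range")
    return XI[t][b][i]


def satisfies_key_constraint(nominal_skew, w, b, t, limit):
    """Assumed form of the constraint: nominal skew + varEst < limit."""
    return nominal_skew + var_est(w, b, t) < limit


print(var_est(750, 2, "small"))
print(satisfies_key_constraint(2.0, 750, 2, "small", 7.5))
```

Rounding the path length up to the next bin keeps the estimate conservative, at the cost of some pessimism for paths just above a bin boundary.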

Page 14: Initial Tree Construction

■ ZST-DME algorithm* based on Elmore delay
■ A simple and robust technique for obstacle avoidance**
■ Initial buffer insertion
  − t0 : the initial buffer type for initial buffer insertion
  − Use the variation-estimation table with path lengths from the initial tree
  − Once t0 is determined, we adapt the fast variant of van Ginneken's algorithm*** for initial buffer insertion
  − Minimize insertion delay, reliable slew rate

* : J.-H. Huang et al., "On Bounded-Skew Routing Tree Problem," DAC '95
** : D.-J. Lee et al., "Contango: Integrated Optimization of SoC Clock Networks," DATE '10
*** : W. Shi et al., "A Fast Algorithm for Optimal Buffer Insertion," Trans. on CAD 24(6), 2005
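The balancing step at the heart of zero-skew tree construction under Elmore delay can be sketched as below. This is the classical zero-skew merge of two subtrees joined by one wire, not the full DME algorithm (which works with merging segments) and not Contango's implementation. The parameter names and the numeric values are assumptions, with r and c denoting wire resistance and capacitance per unit length.

```python
def elmore_wire_delay(length, load_cap, r, c):
    """Elmore delay of a uniform wire of given length driving load_cap."""
    return r * length * (0.5 * c * length + load_cap)


def zero_skew_split(t1, c1, t2, c2, length, r, c):
    """Fraction x of the wire assigned to subtree 1 so both delays match.

    t1, t2: delays from each subtree root to its sinks
    c1, c2: downstream capacitance of each subtree
    A result outside [0, 1] means the wire must be lengthened instead,
    which this sketch does not handle.
    """
    numerator = (t2 - t1) + r * length * (c2 + 0.5 * c * length)
    denominator = r * length * (c1 + c2 + c * length)
    return numerator / denominator


# Hypothetical subtrees and wire parameters in arbitrary consistent units.
t1, c1, t2, c2 = 5.0, 2.0, 0.0, 1.0
length, r, c = 100.0, 0.1, 0.2
x = zero_skew_split(t1, c1, t2, c2, length, r, c)
d1 = t1 + elmore_wire_delay(x * length, c1, r, c)
d2 = t2 + elmore_wire_delay((1.0 - x) * length, c2, r, c)
print(x, d1, d2)
```

The slower, heavier subtree gets the shorter share of the wire, so the merge point sits closer to it.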

Page 15: Robustness Improvement

■ Improve robustness after initial buffer insertion so that ω^Ψ_{Δ,ν,y} < Ω_Δ holds after skew optimization
■ The target buffer type for a tree path between sinks si and sj, t(si,sj), is defined as the smallest t such that
  − choosing smaller buffers reduces capacitance
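Selecting a target buffer type can be pictured as a scan from the smallest buffer upward. The selection condition on the slide is not included in the text, so the sketch assumes it has the form "nominal skew plus the table estimate stays below Ω_Δ". The buffer names and the toy estimator are hypothetical.

```python
def target_buffer_type(w, b, nominal_skew, limit, buffer_types, estimate):
    """Return the smallest buffer type that meets the assumed constraint.

    buffer_types: type names ordered from smallest to largest
    estimate:     callable (w, b, t) -> estimated skew increase in ps
    Returns None when even the largest type is not robust enough.
    """
    for t in buffer_types:
        if nominal_skew + estimate(w, b, t) < limit:
            return t
    return None


# Hypothetical estimator: larger buffers are assumed to vary less.
PER_MM = {"x1": 4.0, "x2": 2.5, "x4": 1.5}


def toy_estimate(w, b, t):
    return PER_MM[t] * (w / 1000.0)


print(target_buffer_type(1000, 2, 4.0, 7.5, ["x1", "x2", "x4"], toy_estimate))
```

Scanning from the smallest type reflects the slide's point that smaller buffers cost less capacitance, so a larger type is used only where robustness demands it.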

Page 16: Local Skew Optimization: Wire Snaking

■ Local-skew optimization techniques
  − based on the optimal tuning amount from the slack computation algorithms with varEst(si,sj)
■ Improved wire snaking algorithm
  − speed, accuracy and routing resources

[Figure: iterative wire snaking on edge e, with T^i_target(e) ≥ T^i_actual(e) in every iteration
  iteration 1: T^1_target(e) = 11ps, T^1_actual(e) = 7ps
  iteration 2: T^2_target(e) = 4ps, T^2_actual(e) = 3ps
  iteration 3: T^3_target(e) = 1ps, T^3_actual(e) = 1ps
  cumulative T_actual(e): 7ps, 10ps, 11ps against T_target(e) = 11ps]

Page 17: Delay Model for Wire Snaking

■ α : to keep T^i_actual(e) ≤ T^i_target(e) efficiently
■ The delay model for wire snaking aims for T^i_actual(e) to satisfy the above inequality with the highest α possible
■ Look-up tables for length estimation
  − to enhance the quality of estimation by wire snaking
  − a set of SPICE simulations for each technology environment, which includes the technology model, the types of buffers and wires, and the variation specification
■ We achieved α values between 60% and 70% for the ISPD 2010 CNS contest benchmarks
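The effect of α on convergence can be illustrated with a toy loop. It assumes α is the fraction of the requested delay that a snaking step actually delivers, so each step undershoots and the next target is the remainder; this reading of α, the function name and the tolerance are assumptions, and the multiplication stands in for a SPICE measurement.

```python
def snake_until_converged(target, alpha, tol=0.5, max_iters=20):
    """Repeatedly add delay to an edge without ever overshooting the target.

    Returns the list of (target_i, actual_i) pairs and the remaining delay.
    """
    history = []
    remaining = target
    for _ in range(max_iters):
        if remaining <= tol:
            break
        actual = alpha * remaining  # stand-in for a SPICE-measured delay
        history.append((remaining, actual))
        remaining -= actual
    return history, remaining


# Hypothetical run: 11 ps requested, each step delivers 65% of its target.
history, remaining = snake_until_converged(11.0, 0.65)
for step, (t_target, t_actual) in enumerate(history, start=1):
    print(step, round(t_target, 2), round(t_actual, 2))
```

A higher α shrinks the remainder faster and so needs fewer evaluation rounds, while keeping α below 1 preserves the property that the actual delay never exceeds the target.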

Page 18: Optimal Node Selection for Wire Snaking

■ Wire snaking at buffer outputs is more accurate than at other nodes
■ Limiting wire snaking to buffer outputs reduces the number of SPICE calls
■ Example

Page 19: Delay Buffer Insertion

■ Highly unbalanced sink capacitances or layout obstacles may result in significant local skew
■ Delay buffer insertion
  − Skew can be reduced by the delay of the inserted buffer
  − More precise wire snaking becomes possible because the inserted buffer isolates the target node
■ Example

Page 20: ISPD'10 Clock Network Synthesis Contest

■ 45nm 2GHz CPU benchmarks from IBM and Intel
■ Evaluation
  − Monte-Carlo SPICE simulations with PVT variations
  − Skew and slew constraints (7.5ps, 100ps)
  − Objective: total capacitance, a proxy for dynamic power
■ A rare opportunity to compare multiple strategies for clock-network synthesis

Page 21: Example of Our Clock Tree

■ ispd10cns07

Page 22: Empirical Validation

■ ISPD 2010 benchmarks
  − 2.6ps nominal local skew
  − Smaller capacitance than CNSrouter and NTUclock by 4.22× and 4.13× resp.
  − Our clock trees yield > 95%, while CNSrouter violates yield constraints on 3 benchmarks and NTUclock on 7

Page 23: ICCAD 2010 Proceedings

■ Local skew constraints are all satisfied
■ Smaller capacitance than NTU and CUHK by 2.09× and 4.24× resp.
■ More robust with smaller capacitance

Skew columns give ω^Ψ_{Δ,ν,y}; Cap. is capacitance. The Cap. entries in the Avg. row are total capacitance relative to Contango2.

Bench   NTU             CUHK            Contango2
        Skew    Cap.    Skew    Cap.    Skew    Cap.
cns01   7.16    445     7.23    1168    7.01    198
cns02   7.33    934     7.35    2100    7.34    376
cns03   4.88    184     3.95    94      4.18    56
cns04   4.09    196     7.25    125     4.46    72
cns05   3.81    89      7.27    74      4.41    38
cns06   7.49    16      6.79    87      6.05    48
cns07   6.24    23      5.97    128     4.58    73
cns08   5.47    23      5.37    97      5.15    52
Avg.    5.81    2.09    6.40    4.24    5.40    1.0
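The Avg. row can be reproduced from the per-benchmark entries. The short check below assumes, based on the arithmetic, that the skew averages are plain means and that the capacitance figures are ratios of column totals normalized to Contango2; the variable names are illustrative.

```python
# Per-benchmark values copied from the table (cns01 to cns08).
skew = {
    "NTU": [7.16, 7.33, 4.88, 4.09, 3.81, 7.49, 6.24, 5.47],
    "CUHK": [7.23, 7.35, 3.95, 7.25, 7.27, 6.79, 5.97, 5.37],
    "Contango2": [7.01, 7.34, 4.18, 4.46, 4.41, 6.05, 4.58, 5.15],
}
cap = {
    "NTU": [445, 934, 184, 196, 89, 16, 23, 23],
    "CUHK": [1168, 2100, 94, 125, 74, 87, 128, 97],
    "Contango2": [198, 376, 56, 72, 38, 48, 73, 52],
}

# Mean skew per tool and total capacitance relative to Contango2.
mean_skew = {tool: sum(v) / len(v) for tool, v in skew.items()}
total_cap = {tool: sum(v) for tool, v in cap.items()}
cap_ratio = {tool: total_cap[tool] / total_cap["Contango2"] for tool in cap}

for tool in skew:
    print(tool, f"{mean_skew[tool]:.2f}", f"{cap_ratio[tool]:.2f}")
```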

Page 24: Skew Profiles for Contango2 & CNSrouter

■ Probability density functions (PDF) for skew on ISPD'10 benchmarks

Page 25: Trade-off - Power vs Robustness to Variations

■ When local skew constraints are tight, large buffers ensure robustness but increase capacitance
  − Much capacitance can be saved when local skew constraints are loose
■ Experiments on ispd10cns08

Page 26: Summary

■ A tree solution for CPU clock routing
  − Reduces power consumption under tight skew constraints in the presence of variation
  − Clock trees can be tuned to have nominal skew below 5 ps and low total skew in the presence of variation
  − 4× capacitance improvement on average over mesh structures
■ Our clock trees have a higher yield than meshes
  − Meshes are not as easy to tune for nominal skew

Page 27: Questions and Answers

Thank you!!

Questions?

Page 28: Questions and Answers