Clock Distribution and Balancing Methodology For Large and Complex ASIC Designs
Dr. Kaijian Shi, Principal Consultant, Synopsys, Inc. (Professional Services)
Page 1: Clock Balance Ieee Seminar04

Clock Distribution and Balancing Methodology For Large and Complex ASIC Designs

Dr. Kaijian Shi, Principal Consultant

Synopsys, Inc. (Professional Services)

Page 2: Clock Balance Ieee Seminar04

Agenda

Part 1: Clock tree synthesis: Fundamentals
- Classical clock tree synthesis methods
- Advanced clock tree synthesis

Part 2: Clock tree synthesis: Engineer perspective
- Custom vs. automatic clock tree synthesis
- Clock phase delay control
- Clock skew control
- Clock duty cycle distortion control
- Clock gating efficiency
- Clock signal integrity

Summaries

Page 3: Clock Balance Ieee Seminar04

Classical clock tree synthesis methods

Step 1: Generate a clock tree
Step 2: Tune the clock tree to meet:
- Skew target
- Slew target
- Other required constraints

Page 4: Clock Balance Ieee Seminar04

Clock tree generation based on structure and load balance (H-tree)

[Figure: two H-tree examples, labeled "Structure balance" and "Structure and load balance"]

Page 5: Clock Balance Ieee Seminar04

Clock tree generation based on structure and load balance (Fish-bone)

[Figure: fish-bone clock tree; label: tapping point]

Page 6: Clock Balance Ieee Seminar04

Clock tree tuning

- Extract clock tree from layout
- Build buffered RC network or SPICE deck
- Calculate clock tree skews
- Tune clock tree branch delays:
  - Adjust tapping points
  - Size buffers
  - Size wires
  - Snake route wires
  - Add dummy load
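The "build RC network, calculate skews" steps can be sketched with the Elmore delay model, a common first-order estimate for RC trees. This is an illustrative sketch, not the flow used in the talk: buffers are omitted, the tree and its R/C values are made up, and the names `elmore_delays` and `skew` are hypothetical.

```python
from typing import Dict, List


def elmore_delays(children: Dict[str, List[str]],
                  res: Dict[str, float],
                  cap: Dict[str, float],
                  root: str) -> Dict[str, float]:
    """Elmore delay from `root` to every node of an RC tree.

    res[n] is the resistance of the wire segment from n's parent to n;
    cap[n] is the capacitance lumped at node n.
    """
    downstream: Dict[str, float] = {}

    def total_cap(node: str) -> float:
        # Capacitance at this node plus everything below it.
        downstream[node] = cap[node] + sum(
            total_cap(c) for c in children.get(node, []))
        return downstream[node]

    total_cap(root)

    delay = {root: 0.0}
    stack = [root]
    while stack:
        node = stack.pop()
        for child in children.get(node, []):
            # Each segment adds R(segment) * C(downstream of segment).
            delay[child] = delay[node] + res[child] * downstream[child]
            stack.append(child)
    return delay


def skew(delay: Dict[str, float], sinks: List[str]) -> float:
    """Skew = latest sink arrival minus earliest sink arrival."""
    arrivals = [delay[s] for s in sinks]
    return max(arrivals) - min(arrivals)


# Made-up example: one trunk segment, two unequal branches.
children = {"root": ["a"], "a": ["s1", "s2"]}
res = {"a": 1.0, "s1": 1.0, "s2": 2.0}
cap = {"root": 0.0, "a": 1.0, "s1": 2.0, "s2": 2.0}
delays = elmore_delays(children, res, cap, "root")
clock_skew = skew(delays, ["s1", "s2"])
```

In this toy tree the branch to `s2` has twice the resistance of the branch to `s1`, so `s2` arrives later; the tuning actions listed above (moving the tapping point, sizing wires, adding dummy load) all change `res` or `cap` to shrink that difference.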

Page 7: Clock Balance Ieee Seminar04

Advanced clock tree synthesis methods

- 0-skew clock tree synthesis
- Clock tree synthesis considering process variations

Page 8: Clock Balance Ieee Seminar04

0-skew clock tree synthesis method

- Integrate 0-skew clock tuning into each level of CTS
- Bottom-up hierarchical process:
  - Cluster clock nodes and build a local tree using the load balance based CTS methods
  - Create a buffered RC network from the local clock tree
  - Minimize clock skew by wire sizing and snake routing

Advantages:
- Eliminates the post-CTS tuning process
- Interconnect-aware skew balance
- Efficient (local clock tree skew optimization)

Page 9: Clock Balance Ieee Seminar04

Advanced clock tree synthesis methods

- 0-skew clock tree synthesis
- Clock tree synthesis considering process variations

Page 10: Clock Balance Ieee Seminar04

Clock tree synthesis considering process variation

- Process variations (P-variations) cause unpredictable delay variations in transistors and wires -> uncontrollable skew
- The delay variations in the common part of the clock tree between launch and capture flops do not cause skew

Principle of minimizing the P-variation effect:
- Minimize the non-common part of the clock tree between launch and capture clock nodes

[Figure label: logic cloud]
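To make the "common part does not cause skew" point concrete, the sketch below splits the clock paths to a launch flop and a capture flop at their last shared node and sums only the non-common stage delays. The tree, the delays in ps, the simple plus/minus percentage variation model, and all function names are assumptions for illustration.

```python
from typing import Dict, List, Tuple


def path_to_root(parent: Dict[str, str], node: str) -> List[str]:
    """Nodes from `node` up to the clock root, inclusive."""
    path = [node]
    while node in parent:
        node = parent[node]
        path.append(node)
    return path


def non_common_delay(parent: Dict[str, str], delay: Dict[str, int],
                     launch: str, capture: str) -> Tuple[int, int]:
    """Sum of stage delays that are NOT shared by the two clock paths.

    delay[n] is the delay of the stage (buffer plus wire) that drives n.
    """
    launch_path = path_to_root(parent, launch)
    capture_path = path_to_root(parent, capture)
    common = set(launch_path) & set(capture_path)
    launch_only = sum(delay[n] for n in launch_path if n not in common)
    capture_only = sum(delay[n] for n in capture_path if n not in common)
    return launch_only, capture_only


def worst_case_skew(launch_only: int, capture_only: int,
                    variation_pct: int) -> float:
    """Skew if one branch runs slow and the other fast by variation_pct."""
    return (launch_only + capture_only) * variation_pct / 100


# Made-up tree: ffA and ffB share buffer b2; ffC hangs off another branch.
parent = {"b1": "root", "b3": "root", "b2": "b1",
          "ffA": "b2", "ffB": "b2", "ffC": "b3"}
delay = {"b1": 100, "b2": 100, "b3": 200, "ffA": 50, "ffB": 50, "ffC": 50}

near = non_common_delay(parent, delay, "ffA", "ffB")
far = non_common_delay(parent, delay, "ffA", "ffC")
near_skew = worst_case_skew(*near, variation_pct=10)
far_skew = worst_case_skew(*far, variation_pct=10)
```

Both flop pairs are nominally balanced, but the pair that shares most of its clock path is exposed to far less variation-induced skew than the pair that diverges at the root.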

Page 11: Clock Balance Ieee Seminar04

Apply to clock tree topology

- Group launch and capture clock nodes and cluster them bottom-up during clock tree topology generation
- More sophisticated methods:
  - Create a weighted directed graph to model the launch and capture clock node relationships in a complex clock structure
  - Use the graph as a constraint in clock tree generation
  - Reduce complexity by considering only clock nodes in timing-critical paths

[Figure: two clock tree topologies, labeled "Good" and "Bad"]

Page 12: Clock Balance Ieee Seminar04

Apply to clock tree layout

- Place clock tree buffers to reduce the length of the non-common portion of the clock tree

[Figure: two buffer placements, labeled "Bad" and "Good"]

Page 13: Clock Balance Ieee Seminar04

Clock tree synthesis: Engineer perspective

- Custom vs. automatic clock tree synthesis
- Clock structure in large and complex ASIC designs
- Clock phase delay control
- Clock skew control
- Clock duty cycle distortion control
- Clock gating efficiency
- Clock signal integrity

Page 14: Clock Balance Ieee Seminar04

Custom clock tree distribution and balancing

- Manually define the top levels of the clock tree down to blocks
  - H-tree, wide/shielded wires, differential buffers, etc.
- Build a local mesh or tree to distribute the clock to leaf cells
- Extract the clock tree and build a SPICE deck
- Analyze the extracted clock tree and tune it manually

Pros:
- Low-skew clock tree

Cons:
- Long and complex process
- Large power dissipation in the clock mesh

Popular in high-speed MPU design, but not suitable for ASIC and low-power designs

Page 15: Clock Balance Ieee Seminar04

Automatic clock tree synthesis (CTS)

Pros:
- Flexible and relatively easy to use
- Low skew, if the clock structure is not complex
- Clock phase delay and wire length are minimized
- max_fanout, max_cap, and max_slew targets are honored
- Good correlation between pre- and post-route clock skew

Limitations:
- Optimization is focused on expanded clock buffer trees
- Quality drops when the clock structure becomes complex and CTS constraints are not provided
- CTS constraints and guidance are needed

Page 16: Clock Balance Ieee Seminar04

Clock tree synthesis: Engineer perspective

- Custom vs. automatic clock tree synthesis
- Clock structure in large and complex ASIC designs
- Clock phase delay control
- Clock skew control
- Clock duty cycle distortion control
- Clock gating efficiency
- Clock signal integrity

Page 17: Clock Balance Ieee Seminar04

Clock structure in large and complex ASIC designs

For power saving:
- Multi-level clock gating
- Many gating domains
- Multi-clock speed domains

For user programmability and debug support:
- User-programmable clock dividers
- Various operation-mode-dependent clock distributions to support emulation and debug

Results:
- Complex clock structure with control logic
- Challenges in clock distribution and balancing

Page 18: Clock Balance Ieee Seminar04

Issues and practical solutions in automatic clock tree synthesis

- Clock phase delay control
- Clock skew control
- Clock duty cycle distortion control
- Clock gating efficiency
- Clock signal integrity

Page 19: Clock Balance Ieee Seminar04

Clock phase delay reduction

- Large phase delay => large clock tree and power dissipation
- Contributions to clock phase delay:
  - Delays of the expanded clock buffer tree
  - Delays of “non-CTS” cells in the clock distribution (gating, mux, divider)

Large clock phase delay: causes and solutions

[Figure: clock path with a clock divider, clock gating cells, clock tree buffers (CTB) and flip-flops; delay segments labeled Td1, Td2, Td3, Td4]

Page 20: Clock Balance Ieee Seminar04

Cause 1: Bad placement of “non-CTS” cells

- Large load causes large slew at “non-CTS” cell outputs
- Cascaded “non-CTS” cells amplify slew
- To fix slew violations, CTS tools insert back-to-back buffer chains
  => buffer chain delay is added to clock tree insertion delay!

[Figure: gating cell and mux (1/0) ahead of CTS, shown before and after clock tree expansion; annotation: long net => large load]

Page 21: Clock Balance Ieee Seminar04

Cause 1: Bad placement of “non-CTS” cells

- Large “non-CTS” cell delays due to large load
  - The cell delays are part of the clock phase delay
  - The cell delays cannot be reduced by CTS tools

A practical solution:
- Find “non-CTS” output nets
- Apply heavy net weight to these nets in cell placement
- Minimize net length and load on “non-CTS” cells
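The "find the nets, then weight them" step can be sketched as a small script over a netlist summary. Everything here is hypothetical: the dictionary format, the cell-type names, and the weight values are placeholders, and a real flow would apply the weights through whatever net-weight mechanism the placement tool provides.

```python
from typing import Dict

# Cell types treated as "non-CTS" cells (gating, mux, divider per the slide).
NON_CTS_TYPES = {"gating", "mux", "divider"}


def weight_non_cts_nets(clock_nets: Dict[str, str],
                        heavy: int = 10,
                        default: int = 1) -> Dict[str, int]:
    """Map each clock net to a placement weight.

    clock_nets maps a net name to the type of the cell driving it.
    Nets driven by non-CTS cells get the heavy weight so the placer
    keeps them short; all other nets keep the default weight.
    """
    return {net: (heavy if driver in NON_CTS_TYPES else default)
            for net, driver in clock_nets.items()}


# Hypothetical clock nets and their driver types.
clock_nets = {
    "clk_div_out": "divider",
    "clk_gated_a": "gating",
    "clk_sel": "mux",
    "clk_buf_3": "buffer",
}
weights = weight_non_cts_nets(clock_nets)
heavy_nets = sorted(n for n, w in weights.items() if w > 1)
```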

Page 22: Clock Balance Ieee Seminar04

Cause 2: Bad placement of local cells at high levels of a clock tree

- Though not required to be balanced in timing, the local flops are still balanced on loading => increased clock phase delay

[Figure: Flop1, Flop2 and Flop3 with clock tree buffers (CTB), shown before and after expansion; the top-level delay is marked]

Page 23: Clock Balance Ieee Seminar04

Solution 1: Place local flops in a small region

- Constrain the local flops to a region in a small area
- The effect is limited when there are many local flops

[Figure: Flop1, Flop2 and Flop3 grouped together with clock tree buffers (CTB) after expansion; the top-level delay is marked]

Page 24: Clock Balance Ieee Seminar04

Solution 2: Isolate the local flops

- Insert and place an isolation buffer to minimize load on the clock source => eliminates the local flops’ load on the clock tree

[Figure: Flop1, Flop2 and Flop3 behind a buffer (CTB), shown before and after expansion; the top-level delay is marked]

Page 25: Clock Balance Ieee Seminar04

Cause 3: Bad floorplan

- The longest clock path determines the clock phase delay

[Figure: floorplan showing PLLs and a flop]

Page 26: Clock Balance Ieee Seminar04

Solution: CTS-friendly floorplan

- Equalize Steiner distance from the clock source to leaf cells
- Constrain cells connected to the high levels of the clock tree to a small region close to the clock source
- Do not place a subchip where it forces clock paths to detour around it
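A quick floorplan check for the first guideline is to compare source-to-leaf distances. The sketch below uses Manhattan distance as a stand-in for the routed Steiner distance of a single source-to-sink connection, which ignores blockages and detours; the coordinates and function names are made up for illustration.

```python
from typing import Dict, Tuple

Point = Tuple[int, int]


def manhattan(a: Point, b: Point) -> int:
    """Rectilinear distance between two placement coordinates."""
    return abs(a[0] - b[0]) + abs(a[1] - b[1])


def distance_spread(source: Point, leaves: Dict[str, Point]) -> int:
    """Longest minus shortest source-to-leaf distance."""
    distances = [manhattan(source, p) for p in leaves.values()]
    return max(distances) - min(distances)


# Hypothetical floorplans: clock source in a corner vs. in the center.
leaves = {"blkA": (0, 0), "blkB": (100, 0), "blkC": (0, 100), "blkD": (100, 100)}
corner_spread = distance_spread((0, 0), leaves)
center_spread = distance_spread((50, 50), leaves)
```

A smaller spread means the clock tree needs less delay padding on the short branches, which keeps the overall phase delay down.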

Page 27: Clock Balance Ieee Seminar04

An example of a CTS-friendly floorplan

- Challenges: rigid pin constraints and a fixed-size hard IP core
- Equalize Steiner distance from the clock source to leaf cells
- Constrain the clock management block to a small region close to the PLL
- No hard macros in the std_cell placement area

[Figure: example floorplan; block labels: coarse and fine programmable DLLs, Mega DSP Core, Interface, DPLL x 2, CLK & RST Mgr., MPU Core, DMA Controller, Interfaces, CLOCK, SDRAM/DDR]

Page 28: Clock Balance Ieee Seminar04

Issues and practical solutions in automatic clock tree synthesis

- Clock phase delay control
- Clock skew control
- Clock duty cycle distortion control
- Clock gating efficiency
- Clock signal integrity

Page 29: Clock Balance Ieee Seminar04

Clock skew control

- Localized clock skew can be used for good causes:
  - Borrow time from non-critical paths to meet timing of critical paths
  - Reduce IR drop and EM caused by simultaneous clock switching
- In the general case, skew degrades design speed and causes malfunction due to hold time violations

=> Clock skew needs to be minimized
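Both sides of this trade-off show up in the standard setup and hold slack equations for a flop-to-flop path. The sketch below is a textbook-style illustration with made-up numbers in ps; skew is defined here as capture clock arrival minus launch clock arrival, and the function names are hypothetical.

```python
def setup_slack(period: int, skew: int, clk_to_q: int,
                logic_max: int, setup: int) -> int:
    """Positive skew (capture clock later) gives the path more time."""
    return period + skew - (clk_to_q + logic_max + setup)


def hold_slack(skew: int, clk_to_q: int, logic_min: int, hold: int) -> int:
    """Positive skew eats into the hold margin."""
    return clk_to_q + logic_min - hold - skew


# Made-up critical path: misses setup by 50 ps with zero skew.
period, clk_to_q, setup, hold = 1000, 100, 100, 50
logic_max, logic_min = 850, 50

results = {
    skew: (setup_slack(period, skew, clk_to_q, logic_max, setup),
           hold_slack(skew, clk_to_q, logic_min, hold))
    for skew in (0, 100, 200)
}
```

With 100 ps of skew the path borrows enough time to meet setup while hold slack drops to zero; at 200 ps the same path fails hold, which is the malfunction the slide warns about.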

Page 30: Clock Balance Ieee Seminar04

Cause 1: Multiple clock dividing paths

- CTS tools usually balance through one dividing path
- High fan-in mux has large delay variations from inputs

[Figure: clock dividing FSM with 1/2, 1/4, 1/6, 1/8 and bypass paths, a clock select mux, and CTS]

Page 31: Clock Balance Ieee Seminar04

Solution: Balance dividing paths through a delay equalizer

- Paths from Clock-in through the flop and through the delay line are balanced

[Figure: delay equalizer circuit; labels: Div clock-in, flop (D, Q, CLR), Clock-in, Reset, Delay line, Div/bypass select mux (1/0), Clock-out]

Page 32: Clock Balance Ieee Seminar04

The delay equalizer implementation 1

Pros:
- Easy to implement

Cons:
- FSM flops need to be skewed earlier than the bypass clock to avoid a cycle shift of the equalizer output clock

[Figure labels: clock dividing FSM (1/2, 1/4, 1/6, 1/8, bypass), dividing clock select, dividing clock delay equalizer, Div/bypass clock select, CTS]

Page 33: Clock Balance Ieee Seminar04

The delay equalizer implementation 2

Pros:
- Saves one flop
- Divided clocks and bypass clock are always in phase

Cons:
- Need to redesign the clock divider logic

The flop in the equalizer is integrated into the divider as the last-stage flop.

[Figure labels: Clock-in, clock dividing FSM, clock select, dividing clock delay equalizer, CTS]

Page 34: Clock Balance Ieee Seminar04

Constraints on dividing/bypass select signal

- Select during power-on => static signal => no constraint
- Run-time programmable => needs a constraint
  - The signal switches only when the clocks are in the same phase

[Figure: waveforms of Clock, Div_2, Div_4 and Div_8]
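For ideal dividers, the cycles at which every divided clock rises together can be found with a least common multiple. The sketch below assumes all dividers start in phase at input cycle 0 and that a divide-by-n clock rises on every n-th input cycle; `aligned_cycles` is a hypothetical helper, not part of any tool.

```python
from functools import reduce
from math import gcd
from typing import List


def lcm(a: int, b: int) -> int:
    """Least common multiple of two positive integers."""
    return a * b // gcd(a, b)


def aligned_cycles(ratios: List[int], horizon: int) -> List[int]:
    """Input-clock cycles (below horizon) where all divided clocks rise together.

    A ratio of 1 stands for the bypass (undivided) clock.
    """
    step = reduce(lcm, ratios, 1)
    return list(range(0, horizon, step))


# Clock, Div_2, Div_4, Div_8 as in the waveform figure.
safe_switch = aligned_cycles([1, 2, 4, 8], horizon=32)
# Adding a divide-by-6 path stretches the interval between safe points.
safe_switch_with_6 = aligned_cycles([1, 2, 4, 6, 8], horizon=48)
```

Restricting select changes to these cycles is one way to express the "same phase" constraint in a testbench or a control FSM.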

Page 35: Clock Balance Ieee Seminar04

Cause 2: Operation mode dependent multiple clock distributions

Four levels of gating:
1. Chip level clock gating
2. Functional domain level clock gating
3. Block level clock gating
4. Transaction level clock gating

[Figure: multi-mode clock distribution with functional paths and test mode paths. Labels: DPLL1, DPLL2, Func_clk, Test_clk, Test_sel, muxes (1/0), dividers (1/x), GATING cells, CTS subtrees, DOMAIN1, DOMAIN2, asynchronous module, bsr_clock. Control signals: chip_func_enable, func_enable, func_enable1, func_enable2, block1_func_enable, block2_func_enable, no_transaction, test_enable, boundary_scan_enable]

Page 36: Clock Balance Ieee Seminar04

Solution: A multi-mode clock balance strategy

- Extract a common clock distribution topology
- Mode-dependent clock path selection criteria:
  - Minimize unbalanced leaf cells
  - Maximize balanced leaf cells in timing-critical paths
- Define constraints for local paths outside of the common clock distribution
- Balance the common clock distribution and local paths with the defined constraints

=> Clock tree is balanced in various modes

Page 37: Clock Balance Ieee Seminar04

Solution: A multi-mode clock balance strategy

[Figure: the same multi-mode clock distribution diagram as on page 35, showing functional paths and test mode paths through DPLL1, DPLL2, muxes, dividers (1/x), GATING cells and CTS subtrees for DOMAIN1, DOMAIN2 and the asynchronous module]

Page 38: Clock Balance Ieee Seminar04

Function mode inverting, test mode non-inverting clock paths

- Problem: the inverter causes clock skew in test mode
- Solution: implement an XNOR gate

[Figure: two circuits, each with Clock_in, Test_mode and Clock_out; one uses a mux (1/0)]
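The XNOR behavior can be checked with a short truth table: with `test_mode` low the gate inverts the clock, and with `test_mode` high it passes the clock through, so both modes go through the same cell. The function name `clock_out` is illustrative only.

```python
def clock_out(clock_in: int, test_mode: int) -> int:
    """XNOR of the clock and the mode select (both 0 or 1)."""
    return int(not (clock_in ^ test_mode))


# Truth table as (clock_in, test_mode) -> clock_out.
truth_table = {(clk, mode): clock_out(clk, mode)
               for mode in (0, 1) for clk in (0, 1)}

# Function mode (test_mode = 0): output is the inverted clock.
function_mode_inverts = all(truth_table[(clk, 0)] == 1 - clk for clk in (0, 1))
# Test mode (test_mode = 1): output follows the clock.
test_mode_passes = all(truth_table[(clk, 1)] == clk for clk in (0, 1))
```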

Page 39: Clock Balance Ieee Seminar04

Issues and practical solutions in automatic clock tree synthesis

- Clock phase delay control
- Clock skew control
- Clock duty cycle distortion control
- Clock gating efficiency
- Clock signal integrity

Page 40: Clock Balance Ieee Seminar04

Clock duty cycle distortion control

- Applicable to a design utilizing both clock edges
- Causes setup time violations in timing-critical paths
- Main causes of clock duty cycle distortion:
  - Rise and fall delay variations of library cells
    - The larger the load on the cells, the larger the delay variations
    - The deeper the clock tree, the larger the clock duty cycle distortion
  - PLL-introduced distortion

Page 41: Clock Balance Ieee Seminar04

Solution 1: Insert inverter pairs in a clock tree

- Find a mid-point in the clock tree at which to invert the rise/fall delay variation, so that it cancels the non-inverted rise/fall delay variation
- Issues:
  - A number of inverters need to be added after the clock tree is expanded
  - Finding optimal insertion points in the expanded clock tree is not easy

[Figure: clock tree of CTB buffers and INV cells driving flip-flops; annotations: rise/fall delay delta = -100ps and rise/fall delay delta = +100ps]
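The cancellation can be illustrated by tracking a source rising edge and a source falling edge through a chain of stages. In the sketch below each stage has separate output-rise and output-fall delays, and an inverting stage flips the edge polarity. The stage delays, the placement of the two inverters (one at the mid-point, one at the end to restore polarity), and the function names are assumptions chosen to reproduce a 100 ps delta.

```python
from typing import Dict, List, Tuple

# (inverting, output_rise_delay_ps, output_fall_delay_ps)
Stage = Tuple[bool, int, int]


def edge_delays(stages: List[Stage]) -> Dict[str, int]:
    """Total delay seen by a source rising edge and a source falling edge."""
    totals = {}
    for source_edge in ("rise", "fall"):
        edge, total = source_edge, 0
        for inverting, rise_ps, fall_ps in stages:
            if inverting:
                edge = "fall" if edge == "rise" else "rise"
            # Delay depends on the direction of the edge at the stage output.
            total += rise_ps if edge == "rise" else fall_ps
        totals[source_edge] = total
    return totals


def high_phase_change(stages: List[Stage]) -> int:
    """Change in high-phase width; valid for an even number of inverters."""
    totals = edge_delays(stages)
    return totals["fall"] - totals["rise"]


buf: Stage = (False, 50, 75)   # buffer with slower falling output
inv: Stage = (True, 30, 30)    # inverter, assumed symmetric

buffers_only = high_phase_change([buf, buf, buf, buf])
with_inverter_pair = high_phase_change([buf, buf, inv, buf, buf, inv])
```

With buffers only, the high phase stretches by 100 ps; with an inverter at the mid-point and another at the end, the second half of the tree applies the opposite delta and the distortion cancels.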

Page 42: Clock Balance Ieee Seminar04

Solution 2: Implement a clock phase chopper – case 1: longer high phase

- Restore 50/50 duty cycle by delaying the rising edge of the high clock phase

Pros:
- Single insertion point and easy to implement
- Implemented before clock tree expansion
- Local delay tuning causes little disturbance in layout

[Figure: chopper between the clock source and the clock tree, built around a delay line Td; waveforms of the input clock and output clock, with Td marked]

Page 43: Clock Balance Ieee Seminar04

Solution 2: Implement a clock phase chopper – case 2: longer low phase

- A clock phase chopper for 50-Td/50+Td duty cycle distortion
- An OR gate chopper to delay the falling edge of the clock

[Figure: chopper between the clock source and the clock tree, built around a delay line Td; waveforms of the input clock and output clock, with Td marked]
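Both chopper cases can be checked on a sampled periodic waveform. The OR gate for case 2 is named on the slide; using an AND gate for case 1 is an assumption here, chosen because ANDing a clock with its delayed copy delays only the rising edge. The sample counts and function names are illustrative.

```python
from typing import List


def periodic_clock(period: int, high: int) -> List[int]:
    """One period of a clock, high for the first `high` samples."""
    return [1 if t < high else 0 for t in range(period)]


def delayed(wave: List[int], td: int) -> List[int]:
    """Periodic waveform delayed by td samples (the delay line)."""
    n = len(wave)
    return [wave[(t - td) % n] for t in range(n)]


def and_chopper(wave: List[int], td: int) -> List[int]:
    """Case 1 (assumed AND gate): delays the rising edge by td."""
    return [a & b for a, b in zip(wave, delayed(wave, td))]


def or_chopper(wave: List[int], td: int) -> List[int]:
    """Case 2 (OR gate): delays the falling edge by td."""
    return [a | b for a, b in zip(wave, delayed(wave, td))]


def duty(wave: List[int]) -> float:
    """Fraction of the period spent high."""
    return sum(wave) / len(wave)


# 60/40 clock fixed by trimming 10 samples from the high phase.
fixed_long_high = duty(and_chopper(periodic_clock(100, 60), td=10))
# 40/60 clock fixed by extending the high phase by 10 samples.
fixed_long_low = duty(or_chopper(periodic_clock(100, 40), td=10))
```

In both cases the delay line equals the distortion Td, so a single tunable delay at one insertion point is enough to restore the 50/50 duty cycle.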

Page 44: Clock Balance Ieee Seminar04

Issues and practical solutions in automatic clock tree synthesis

- Clock phase delay control
- Clock skew control
- Clock duty cycle distortion control
- Clock gating efficiency
- Clock signal integrity

Page 45: Clock Balance Ieee Seminar04

Clock gating efficiency

- Maximize gating effect to reduce idle power
- Minimize the ungated part of the clock tree
  - Place clock gating cells close to clock sources

[Figure: two clock tree examples built from clock tree buffers (CTB), clock gating cells and flops]

Page 46: Clock Balance Ieee Seminar04

Issues and practical solutions in automatic clock tree synthesis

- Clock phase delay control
- Clock skew control
- Clock duty cycle distortion control
- Clock gating efficiency
- Clock signal integrity

Page 47: Clock Balance Ieee Seminar04

Clock signal integrity

- Issue: large net load on non-CTS cells in the clock paths can cause slew and IR drop violations
  - Solution: net weight and region based placement
- Issue: crosstalk-induced violations
  - Solution: identify critical clock routes and reroute the nets with wide spacing or shielding

[Figure: clock divider and CTS; annotation: long net => large load]

Page 48: Clock Balance Ieee Seminar04

Summary

Clock tree synthesis: Fundamentals
- Classical clock tree synthesis methods
- Advanced clock tree synthesis

Clock tree synthesis: Engineer perspective
- Custom vs. automatic clock tree synthesis
- Clock phase delay control
- Clock skew control
- Clock duty cycle distortion control
- Clock gating efficiency
- Clock signal integrity

Page 49: Clock Balance Ieee Seminar04

Q & A