Understanding Clock Tree Synthesis Log Messages

Understanding cts log_messages

May 24, 2015


understanding CLOCK TREE SYNTHESIS MESSAGES
Transcript
Page 1: Understanding cts log_messages

Understanding Clock Tree Synthesis Log Messages

© Synopsys 2012 1

Page 2: Understanding cts log_messages

Agenda

• Prerequisites for Clock Tree Synthesis

• Enabling Useful Debug Messages in IC Compiler Clock Tree Synthesis

• Clock Tree Synthesis Log Messages

• Clock Tree Optimization Log Messages



Page 4: Understanding cts log_messages

Prerequisite 1: Run the check_clock_tree Command

• Run the check_clock_tree command prior to clock tree synthesis, and fix the issues reported.

• This command checks the following, and reports issues that can lead to bad QoR:
– Clock tree structure
– Constraints
– Clock tree exceptions

Page 5: Understanding cts log_messages

Prerequisite 2: Ensure Placement Legality

• For clock tree synthesis to proceed without any errors, it is necessary to have a legally placed design.

• Use the check_legality command to check whether the design is properly placed and legalized, prior to CTS.

• In case of legality issues, use the legalize_placement command to resolve these issues.

Note:
• Clock tree synthesis will abort in case of placement legality issues.
• In some cases, like overlapping standard cells, it may still proceed and issue a warning during placement legality checking, but continuing with placement legality issues may lead to bad QoR.

Warning: Some cells in the design are not legal. (CTS-242)

Page 6: Understanding cts log_messages

Default Constraints

• The default constraints that clock tree synthesis uses are as follows:

– Maximum transition time: 0.5ns
– Maximum capacitance: 0.6pF
– Maximum fanout: 2000

Page 7: Understanding cts log_messages

Design Rule Constraints

• In addition to the clock tree design rule constraint values specified using set_clock_tree_options, IC Compiler also considers the design rule constraint values from the logic library and the design.

• The following table summarizes how IC Compiler determines the design rule constraint values used during the design rule fixing stage of clock tree synthesis and optimization.

Case 1: Default behavior
cts_use_lib_max_fanout=false
cts_use_sdc_max_fanout=false
cts_force_user_constraints=false
– Maximum capacitance: the minimum value from the set_clock_tree_options value, the CTS default value (0.6pF), the logic library, and the SDC constraints
– Maximum transition time: the minimum value from the set_clock_tree_options value, the CTS default value (0.5ns), the logic library, and the SDC constraints
– Maximum fanout: the value set using set_clock_tree_options

Case 2: Use library and SDC settings for maximum fanout
cts_use_lib_max_fanout=true
cts_use_sdc_max_fanout=true
cts_force_user_constraints=false
– Maximum capacitance: the minimum value from the set_clock_tree_options value, the CTS default value (0.6pF), the logic library, and the SDC constraints
– Maximum transition time: the minimum value from the set_clock_tree_options value, the CTS default value (0.5ns), the logic library, and the SDC constraints
– Maximum fanout: the minimum value from the logic library, the SDC constraints, and the set_clock_tree_options value

Case 3: Use only user-set settings for clock tree synthesis and clock tree optimization
cts_force_user_constraints=true
– Maximum capacitance: the value set using set_clock_tree_options
– Maximum transition time: the value set using set_clock_tree_options
– Maximum fanout: the value set using set_clock_tree_options
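The selection rules in the table can be restated as a small function. This is a minimal Python sketch, not IC Compiler code: the function name `effective_constraint`, its parameters, the fallback to the CTS default when no user value is given, and the handling of only one fanout variable being true are all assumptions made for illustration.

```python
# Illustrative restatement of the constraint table; every name here is hypothetical.
CTS_DEFAULTS = {"max_capacitance": 0.6, "max_transition": 0.5, "max_fanout": 2000}


def _min_defined(*values):
    """Return the smallest value that is not None, or None if all are undefined."""
    defined = [v for v in values if v is not None]
    return min(defined) if defined else None


def effective_constraint(kind, user=None, lib=None, sdc=None,
                         use_lib_max_fanout=False, use_sdc_max_fanout=False,
                         force_user_constraints=False):
    """Pick the design rule value following Cases 1 to 3 of the table."""
    default = CTS_DEFAULTS[kind]
    # Assumption: with no user value, fall back to the CTS default.
    user_or_default = user if user is not None else default
    if force_user_constraints:              # Case 3
        return user_or_default
    if kind == "max_fanout":
        candidates = [user_or_default]      # Case 1
        if use_lib_max_fanout:              # Case 2 adds the library value
            candidates.append(lib)
        if use_sdc_max_fanout:              # Case 2 adds the SDC value
            candidates.append(sdc)
        return _min_defined(*candidates)
    # Capacitance and transition: minimum over every defined source.
    return _min_defined(user, default, lib, sdc)
```

For example, `effective_constraint("max_transition", user=0.1, sdc=0.05)` selects the SDC value, while adding `force_user_constraints=True` selects the user value.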

Page 8: Understanding cts log_messages

Constraints Specified Using the set_clock_tree_options Command

• Library units are used for time and capacitance values specified by using the set_clock_tree_options command.

• The smallest values accepted for the -max_capacitance and -max_transition options of the set_clock_tree_options command are 1fF and 1ps, respectively.

• For example, if the library units are pF and ps, and you specify the following command, IC Compiler will issue an error:
icc_shell> set_clock_tree_options -max_cap 0.0009 -max_tran 0.300
Error: User max_cap constraint (0.900000 fF) is too small. (CTS-206)
Error: User max_tran constraint (0.300000 ps) is too small. (CTS-207)

– IC Compiler will not accept these small values, and will use the previously specified values or the default values for maximum capacitance and maximum transition during clock tree synthesis.
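The unit arithmetic behind the two errors above can be checked with a few lines of Python. This is only a sketch of the stated 1fF and 1ps floors; the helper name `accepted` and the unit tables are hypothetical and are not part of the tool.

```python
# Hypothetical helper: checks a value against the 1 fF / 1 ps floor stated above.
TO_FEMTOFARAD = {"pF": 1e3, "fF": 1.0}
TO_PICOSECOND = {"ns": 1e3, "ps": 1.0}


def accepted(value, unit, kind):
    """Return True if value (given in library units) is at least 1 fF or 1 ps."""
    scale = TO_FEMTOFARAD if kind == "cap" else TO_PICOSECOND
    return value * scale[unit] >= 1.0


print(accepted(0.0009, "pF", "cap"))   # 0.0009 pF is 0.9 fF, below the floor
print(accepted(0.300, "ps", "tran"))   # 0.3 ps, below the floor
print(accepted(0.300, "ns", "tran"))   # the same number in ns is 300 ps
```

The last call shows why the library units matter: the same number, 0.300, is rejected with ps units but would pass with ns units.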


Page 10: Understanding cts log_messages

Enabling Debug Messages

• To enable clock tree synthesis debug messages in IC Compiler, use:
set cts_use_debug_mode true

• Many of the messages discussed in this presentation are available only when you enable the debug mode.


Page 12: Understanding cts log_messages

Messages in the compile_clock_tree Command Log

• Before clock tree synthesis:
– Design update
– Buffer and inverter information
– Clock tree constraints
– Clock structure before clock tree synthesis

• During clock tree synthesis:
– Clustering
– Meeting target early delay
– Gate level clock tree synthesis results

• After clock tree synthesis:
– Summary report
– Embedded clock tree optimization
– DRC fixing beyond exceptions
– Placement legalization

Page 13: Understanding cts log_messages

Overview of the compile_clock_tree Command Log

START_CMD: compile_clock_tree CPU: 55 s ( 0.02 hr) ELAPSE: 288 s ( 0.08 hr) MEM-PEAK: 203 Mb Wed Dec 28 22:33:54 2011 (PSYN-508)
CTS: CTS Operating Condition(s): MAX(Worst)

Prelude:
START_FUNC: prelude CPU: 55 s ( 0.02 hr) ELAPSE: 288 s ( 0.08 hr) MEM-PEAK: 203 Mb Wed Dec 28 22:33:54 2011 (PSYN-508)
Loading design 'ORCA_TOP'
…
Information: Design Library and main library capacitance units are matched - 1.000 pf.
END_FUNC: prelude CPU: 56 s ( 0.02 hr) ELAPSE: 288 s ( 0.08 hr) MEM-PEAK: 203 Mb Wed Dec 28 22:33:54 2011 (PSYN-508)
…

Extraction related messages:
****************************************************************
Information: TLUPlus based RC computation is enabled. (RCEX-141)
****************************************************************
Information: The distance unit in Capacitance and Resistance is 1 micron. (RCEX-007)
Information: The RC model used is TLU+. (RCEX-015)
…

CTS: Blockage Aware Algorithm
CTS: Marking Ignore Pins....
…

Buffer characterization:
Warning: too small maximum transition (=0.300000) defined at library cell dl02d4. (CTS-619)
CTS: buffer estimated skew target delay driving res input cap
CTS: invbdk [0.009 0.010] [0.043 0.058] [0.197 0.213] [0.059 0.059]
...

CTS: Prepare sources for clock domain SD_DDR_CLK
CTS: Prepare sources for clock domain SDRAM_CLK
CTS: Prepare sources for clock domain SYS_2x_CLK
…
CTS: Region Aware Algorithm is automatically turned off when design has no region or only has one region.
CTS: Info: Found net sys_2x_clk, on cell I_RISC_CORE/I_REG_FILE/REG_FILE_B_RAM is macro. Will not treat as pad.
…
clean drc fixing cell first...
In all, 0 drc fixing cell(s) are cleaned
In all, 0 drc fixing cell(s) beyond exception pins are cleaned
…
CTS: I_SDRAM_TOP/I_SDRAM_IF/sd_mux_dq_out_8/S is implicit ignore
CTS: I_SDRAM_TOP/I_SDRAM_IF/sd_mux_dq_out_9/S is implicit ignore
…

Page 14: Understanding cts log_messages

CTS: I_SDRAM_TOP/I_SDRAM_IF/sd_mux_dq_out_8/S is implicit ignore
CTS: I_SDRAM_TOP/I_SDRAM_IF/sd_mux_dq_out_11/S is implicit ignore
…
Warning: Ignore net sd_CK since it has no synchronous pins. (CTS-231)
CTS: Info: will use target transition value for initial CTS stages

Pruning of buffers and inverters:
Pruning library cells (r/f, pwr)
Min drive = 0.000372606.
…
Final pruned buffer set (7 buffers):
bufbd1
…
CTDN lib estimation: buffers should result in better clock power.

Reporting global clock tree constraints:
CTS: BA: Net 'sdram_clk'
CTS: Starting clock tree synthesis ...
CTS: Conditions = worst(1)
CTS: Global design rule constraints [rise fall]
CTS: max transition = worst[0.300 0.300] GUI = worst[0.300 0.300] SDC = undefined/ignored
…
Information: Removing clock transition on clock PCI_CLK ... (CTS-103)

Clock tree synthesis:
CTS: gate level 1 clock tree synthesis
CTS: clock net = sdram_clk
CTS: gate level 1 clock tree synthesis results
CTS: clock net : sdram_clk
…
CTS: Clock tree synthesis completed successfully
CTS: CPU time: 18 seconds

Reporting the results of clock tree synthesis:
CTS: Reporting clock tree violations ...
…
CTS: ------------------------------------------------
CTS: Clock Tree Synthesis Summary
CTS: ------------------------------------------------
…

Embedded clock tree optimization:
CTS: Starting block level clock tree optimization
…
CTS: gate level 1 clock tree optimization
CTS: clock net = pclk

Page 15: Understanding cts log_messages

Gate Upsizing During Clock Tree Synthesis

• The compile_clock_tree command will upsize all the preexisting cells in the clock tree before building the clock tree.

Information: Replaced the library cell of sys_ctl/sunburst_clk_mux_div1/clk_buf from bufbd4 to bufbdf. (CTS-152)

– Preexisting gate: sys_ctl/sunburst_clk_mux_div1/clk_buf

• In the previous example, the preexisting gate is upsized from a bufbd4 to a bufbdf.

• This upsizing helps in reducing the number of buffer levels needed to build the clock tree, thereby reducing the buffer count.

Page 16: Understanding cts log_messages

Maximum Capacitance and Transition Related Warnings

• Even if the set_clock_tree_options command does not issue any errors when you set the maximum capacitance and transition constraints, the compile_clock_tree command can issue warnings if the values are too small.

Warning: too small maximum transition (=0.050000) defined at pin instCLK1GC1/Q. (CTS-620)
Warning: too small maximum capacitance (=0.050000) defined at pin instCLK1GC1/Q. (CTS-620)
Warning: too small maximum transition (=0.050000) defined at library cell bufbdk. (CTS-619)

– Max trans = 50ps is too tight for the pin instCLK1GC1/Q
– Max cap = 50fF is too tight for the pin instCLK1GC1/Q

• Tight constraints can cause clock tree synthesis to use an excessive number of buffers to build the clock trees.

Page 17: Understanding cts log_messages

Buffers and Inverters Used During Clock Tree Synthesis

• Before synthesizing the clock tree, IC Compiler characterizes each buffer and inverter.
– To see the characterization details, set the following variable to true:
set cts_do_characterization true
– After characterization is done, characterized values for each buffer and inverter are reported:

CTS: buffer estimated skew target delay driving res input cap
CTS: bufbdf [0.013 0.015] [0.217 0.200] [0.210 0.248] [0.007 0.007]
CTS: inv0da [0.018 0.021] [0.097 0.119] [0.294 0.347] [0.036 0.036]
CTS: bufbd7 [0.025 0.030] [0.223 0.234] [0.415 0.503] [0.008 0.008]
CTS: bufbd4 [0.047 0.053] [0.347 0.357] [0.786 0.880] [0.004 0.004]

– Callouts on the slide: bufbdf is a buffer and inv0da is an inverter; the two values in a bracket are the rise delay and the fall delay.

• Driving resistance determines the drive strength of the buffer or inverter.
• The smaller the driving resistance, the greater the drive strength.
• In the previous example, bufbdf is the buffer with the highest drive strength.
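Reading the characterization table by eye gets tedious for large libraries. The following Python sketch parses the rows quoted above and picks the cell with the smallest driving resistance. The column order is taken from the header line in the log; the function name and the choice to compare the larger of the two bracketed values are assumptions made for this example.

```python
import re

LOG = """\
CTS: bufbdf [0.013 0.015] [0.217 0.200] [0.210 0.248] [0.007 0.007]
CTS: inv0da [0.018 0.021] [0.097 0.119] [0.294 0.347] [0.036 0.036]
CTS: bufbd7 [0.025 0.030] [0.223 0.234] [0.415 0.503] [0.008 0.008]
CTS: bufbd4 [0.047 0.053] [0.347 0.357] [0.786 0.880] [0.004 0.004]
"""

# One cell name followed by four bracketed pairs, in header order:
# estimated skew, target delay, driving res, input cap.
ROW = re.compile(r"^CTS:\s+(\S+)" + r"\s+\[([\d.]+) ([\d.]+)\]" * 4 + r"\s*$")


def parse_characterization(text):
    """Map each cell name to its four characterized value pairs."""
    cells = {}
    for line in text.splitlines():
        m = ROW.match(line)
        if m:
            v = [float(x) for x in m.groups()[1:]]
            cells[m.group(1)] = {
                "estimated_skew": (v[0], v[1]),
                "target_delay": (v[2], v[3]),
                "driving_res": (v[4], v[5]),
                "input_cap": (v[6], v[7]),
            }
    return cells


cells = parse_characterization(LOG)
# Smallest driving resistance means strongest drive.
strongest = min(cells, key=lambda name: max(cells[name]["driving_res"]))
print(strongest)
```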

Page 18: Understanding cts log_messages

Unbalanced Buffers

• Buffers and inverters that have a big difference between their rise and fall delays, which is referred to as the rise/fall delay skew, are reported.
CTS: inverter inv0da: rise/fall delay skew = 0.204816 (> 0.200000)

• Remove unbalanced buffers from the buffer list specified for clock tree synthesis, as they might cause bad skew.
– Use the set_clock_tree_references command to specify the buffers and inverters that should be used for clock tree synthesis.

Page 19: Understanding cts log_messages

Pruning of Buffers and Inverters

• Pruning is a process by which IC Compiler selects the buffers and inverters which are best suited for clock tree synthesis, based on the buffer and inverter characterization, and prevents the remaining ones from being used.

• IC Compiler prunes the buffers and inverters based on drive strength and power:
Pruning library cells (r/f, pwr)
Min drive = 0.264263.
Pruning inv0d0 because drive of 0.149845 is less than 0.264263.
Pruning inv0d2 because it is (w/ power-considered) inferior to invbd2.

• IC Compiler calculates a minimum drive value based on heuristics. Buffers and inverters whose drive strength is less than the minimum drive value are considered as weak drivers and are pruned by IC Compiler.

• It is not possible to override the default pruning process.
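To see at a glance why each cell was dropped, the two pruning message forms quoted above can be sorted into reasons. This Python sketch only recognizes those two forms; the function name and the wording of the reason strings are illustrative, and other pruning messages may exist that it does not cover.

```python
import re

LOG = """\
Pruning library cells (r/f, pwr)
Min drive = 0.264263.
Pruning inv0d0 because drive of 0.149845 is less than 0.264263.
Pruning inv0d2 because it is (w/ power-considered) inferior to invbd2.
"""

WEAK = re.compile(r"^Pruning (\S+) because drive of ([\d.]+) is less than ([\d.]+)\.$")
INFERIOR = re.compile(
    r"^Pruning (\S+) because it is \(w/ power-considered\) inferior to (\S+)\.$")


def pruning_reasons(text):
    """Map each pruned cell to a short reason string."""
    reasons = {}
    for line in text.splitlines():
        m = WEAK.match(line)
        if m:
            reasons[m.group(1)] = f"weak drive ({m.group(2)} < {m.group(3)})"
            continue
        m = INFERIOR.match(line)
        if m:
            reasons[m.group(1)] = f"inferior to {m.group(2)}"
    return reasons


print(pruning_reasons(LOG))
```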

Page 20: Understanding cts log_messages

Maximum Transition, Maximum Capacitance and Timing Constraints

Before clock tree synthesis begins, all the global clock tree constraints are reported in the log, in the format shown below:

CTS: Global design rule constraints [rise fall]
CTS: max transition = worst[0.050 0.050] GUI = worst[0.100 0.100] SDC = worst[0.050 0.050]
CTS: max capacitance = worst[0.600 0.600] GUI = worst[0.600 0.600] SDC = undefined/ignored
CTS: max fanout = 2000 GUI = 2000 SDC = undefined/ignored
CTS: Global timing/clock tree constraints
CTS: clock skew = worst[0.100]
CTS: insertion delay = worst[2.000]
CTS: levels per net = 200

Callouts on the slide:
– The first value on each design rule constraint line is the value used by CTS.
– GUI is the default value or the value set using set_clock_tree_options.
– SDC is the value from SDC.
– Undefined means no value is specified in SDC.
– Ignored means the value from SDC is ignored as the cts_force_user_constraints variable is set to true.
– The skew and insertion delay targets are the values set using the set_clock_tree_options command.
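When comparing runs, it helps to pull the used, GUI, and SDC values out of each design rule constraint line. The Python sketch below does that for the three line shapes shown on this slide. The function names are hypothetical, and the pattern assumes the exact layout of these sample lines.

```python
import re

LINE = re.compile(
    r"^CTS:\s+(max \w+)\s+=\s+(.+?)\s+GUI\s+=\s+(.+?)\s+SDC\s+=\s+(.+)$")
PAIR = re.compile(r"\w+\[([\d.]+) ([\d.]+)\]")


def _value(text):
    """Convert one field: None for undefined/ignored, a (rise, fall) pair, or an int."""
    text = text.strip()
    if text == "undefined/ignored":
        return None
    m = PAIR.fullmatch(text)
    if m:
        return (float(m.group(1)), float(m.group(2)))
    return int(text)  # fanout is printed as a plain integer


def parse_drc_line(line):
    """Split one global design rule constraint line into its three sources."""
    m = LINE.match(line)
    if not m:
        return None
    return {
        "constraint": m.group(1),
        "used": _value(m.group(2)),
        "gui": _value(m.group(3)),
        "sdc": _value(m.group(4)),
    }


tran = parse_drc_line(
    "CTS: max transition = worst[0.050 0.050] GUI = worst[0.100 0.100] SDC = worst[0.050 0.050]")
fanout = parse_drc_line("CTS: max fanout = 2000 GUI = 2000 SDC = undefined/ignored")
print(tran)
print(fanout)
```

In the transition line, the used value equals the SDC value and is tighter than the GUI value, which is the case where the minimum of the available sources is chosen.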

Page 21: Understanding cts log_messages

Clock Tree Synthesis Target Specifications

• Target specifications are the internal targets for clock tree synthesis, but are not guaranteed. Only target constraints are guaranteed to be achieved.
CTS: Global target spec [rise fall]
CTS: transition = worst[0.250 0.250]
CTS: capacitance = worst[0.300 0.300]
CTS: fanout = 32 (This target fanout value is not considered by CTS)

• Target specifications:
– maxTransSpec: Min(0.25, 80% of max_transition constraints)
– maxCapSpec: Min(0.30, 80% of max_capacitance constraints)
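The two Min(...) rules are easy to evaluate by hand, but a short Python sketch makes the effect of tightening the constraints visible. The function name `target_specs` is made up for this example; the constants 0.25, 0.30, and 80% come from the slide.

```python
def target_specs(max_transition, max_capacitance):
    """Apply the two Min(...) rules quoted above; values are in library units."""
    return {
        "maxTransSpec": min(0.25, 0.8 * max_transition),
        "maxCapSpec": min(0.30, 0.8 * max_capacitance),
    }


print(target_specs(0.5, 0.6))   # CTS default constraints
print(target_specs(0.3, 0.3))   # tighter user constraints
```

With the default constraints (0.5 and 0.6) the fixed caps of 0.25 and 0.30 win, which matches the target spec lines on this slide. With both constraints at 0.300, the 80% term wins and both targets come out at about 0.24, which is consistent with the 0.240 targets printed in the gate level examples later in the deck.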

Page 22: Understanding cts log_messages

Preexisting Clock Tree Information in the Log File

Before starting to build the clock tree, the preexisting clock tree structure is printed in the log file.

CTS: Design infomation
CTS: total gate levels = 8
CTS: Root clock net CLK2
CTS: clock gate levels = 2
CTS: clock sink pins = 4
CTS: level 2: gates = 1
CTS: level 1: gates = 1
CTS: Buffer/Inverter list for CTS for clock net CLK2:
CTS: invbdk
CTS: bufbdk
...
CTS: Root clock net CLK1
CTS: clock gate levels = 8
CTS: clock sink pins = 8431
CTS: level 8: gates = 2
CTS: level 7: gates = 3
CTS: level 6: gates = 4
CTS: level 5: gates = 3
CTS: level 4: gates = 1
CTS: level 3: gates = 5
CTS: level 2: gates = 4
CTS: level 1: gates = 1
CTS: Buffer/Inverter list for CTS for clock net CLK1:
CTS: invbdk
CTS: bufbdk
...

Callouts on the slide:
– total gate levels: maximum number of gate levels available
– clock gate levels: number of gate levels for clock CLK2
– clock sink pins: number of sinks
– level N: gates = M: existing gate levels and number of gates at each level
– The level list runs through the gate levels from the flip-flops towards the clock source

Page 23: Understanding cts log_messages

Real Gates and Guide Buffers

• You may see the term real gates in the preexisting clock tree structure information section:
CTS: Root clock net CLK1
CTS: clock gate levels = 16
CTS: clock sink pins = 70644
...
CTS: level 13: gates = 14 (real gates = 4)
CTS: level 12: gates = 111 (real gates = 101)
CTS: level 11: gates = 146 (real gates = 136)
CTS: level 10: gates = 2488 (real gates = 2478)

• Real gates are preexisting gates in the clock tree, and are not gates added by the tool.

• Guide buffers are buffers or inverters that are inserted by the tool before it begins to build the tree. They are intended to help clock tree synthesis build a better clock tree.

• The number of guide buffers inserted at each level can be determined from the difference between gates and real gates.
– In the above example, the tool has added 10 guide buffers at each of the clock tree levels shown.
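The subtraction described above can be automated for long level lists. This is a small Python sketch over the four lines quoted on this slide; the function name is hypothetical.

```python
import re

LOG = """\
CTS: level 13: gates = 14 (real gates = 4)
CTS: level 12: gates = 111 (real gates = 101)
CTS: level 11: gates = 146 (real gates = 136)
CTS: level 10: gates = 2488 (real gates = 2478)
"""

LEVEL = re.compile(r"^CTS: level (\d+): gates = (\d+) \(real gates = (\d+)\)$")


def guide_buffers_per_level(text):
    """Return {level: gates - real gates} for every level line that reports both."""
    counts = {}
    for line in text.splitlines():
        m = LEVEL.match(line)
        if m:
            level, gates, real = map(int, m.groups())
            counts[level] = gates - real
    return counts


print(guide_buffers_per_level(LOG))  # {13: 10, 12: 10, 11: 10, 10: 10}
```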

Page 24: Understanding cts log_messages

Buffers and Inverters Used

• Before it begins to build the clock tree, the tool will list all the buffers and inverters it will use to build the tree:
CTS: Buffer/Inverter list for CTS for clock net sdram_clk:
CTS: CLKBUFX20
CTS: CLKBUFX16
CTS: CLKBUFX12
CTS: Buffer/Inverter LEQ cell list for Boundary Cell for clock net sdram_clk:
CTS: CLKBUFX20
CTS: CLKBUFX16
CTS: CLKINVX8
CTS: Buffer/Inverter LEQ cell list for CTO for clock net sdram_clk:
CTS: CLKBUFX20
CTS: CLKBUFX16
CTS: CLKINVX8
CTS: Buffer/Inverter list for DelayInsertion for clock net sdram_clk:
CTS: CLKBUFX20
CTS: CLKBUFX16
CTS: CLKINVX8

Callouts on the slide:
– List for CTS: CTS uses this list
– LEQ cell list for Boundary Cell: CTS uses this list for inserting boundary cells
– LEQ cell list for CTO: CTO uses this list for sizing
– List for DelayInsertion: CTO uses this list for delay insertion

• You can change the buffer and inverter list by using the following command:
set_clock_tree_references

Page 25: Understanding cts log_messages

Clock Tree Synthesis Removes User-Specified Ideal Attributes on Clocks

• Synthesized clocks are set to be propagated, and clock transition, which is an attribute of an ideal clock, is removed:
CTS: Information: Removing clock transition on clock SP0XCLK ... (CTS-103)
CTS: Information: Removing clock transition on clock SP0RCLK ... (CTS-103)

• Latency, another attribute of an ideal clock, is also removed:
CTS: Information: Removing clock latency on pin Idma_scr_wrap0__Idma_scrba0_m2m0_wrap/I_dma_scrba0_m2m0/ I_dma@ ... (CTS-098)

• Source latency is removed for generated clocks:
Information: Removing clock source latency on clock CLK1GC1 ... (CTS-289)

• These messages are informational only, and no action is required.

Page 26: Understanding cts log_messages

Overlap or Reconvergent Paths

• Overlap or reconvergent paths occur when multiple clocks can drive a node.

• IC Compiler issues warnings about such paths:
Warning: Either the driven net has been synthesized previously or clock path overlaps/reconverges at pin periph/U1852/Y. (CTS-209)

• Such messages should be treated as informational, rather than as warnings.
– IC Compiler has no problems handling such situations.
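Since several of the messages covered so far are informational, a quick tally by message code helps separate them from the ones that deserve attention. The Python sketch below counts the trailing codes in a list of log lines. The set of informational codes is taken from the slides above; the function names are hypothetical, and the sample lines are copied from this deck rather than from a single real log.

```python
import re
from collections import Counter

# Codes that this deck describes as informational, or safe to treat as such.
INFORMATIONAL = {"CTS-103", "CTS-098", "CTS-289", "CTS-209"}
CODE = re.compile(r"\(([A-Z]+-\d+)\)\s*$")


def tally_codes(lines):
    """Count how often each trailing message code appears."""
    counts = Counter()
    for line in lines:
        m = CODE.search(line)
        if m:
            counts[m.group(1)] += 1
    return counts


def needs_review(counts):
    """Return the codes that are not on the informational list."""
    return sorted(code for code in counts if code not in INFORMATIONAL)


sample = [
    "CTS: Information: Removing clock transition on clock SP0XCLK ... (CTS-103)",
    "CTS: Information: Removing clock transition on clock SP0RCLK ... (CTS-103)",
    "Information: Removing clock source latency on clock CLK1GC1 ... (CTS-289)",
    "Warning: Some cells in the design are not legal. (CTS-242)",
]
counts = tally_codes(sample)
print(counts)
print(needs_review(counts))
```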

Page 27: Understanding cts log_messages

Gate Level-by-Level Clock Tree Synthesis

• Clock tree building is done gate level by gate level, starting from the sinks to the clock root.

• For each gate level, just before the synthesis starts, the following information will be printed in the log:
CTS: gate level 2 clock tree synthesis
CTS: clock net = I_BLENDER_1/gclk
CTS: driving pin = I_BLENDER_1/U483/Z
CTS: gate level 2 design rule constraints [rise fall]
CTS: max transition = worst[0.300 0.300]
CTS: max capacitance = worst[0.300 0.300]
CTS: max fanout = 2000
CTS: gate level 2 target spec [rise fall]
CTS: transition = worst[0.240 0.240]
CTS: capacitance = worst[0.240 0.240]
CTS: driver cap. = worst[0.088 0.088]
CTS: fanout = 32
CTS: gate level 2 timing constraints
CTS: clock skew = worst[0.000]
CTS: levels per net = 200
CTS: -----------------------------------------------
CTS: Starting clustering for bufbda with target load = worst[0.240 0.240]

– The clock net and driving pin lines give the net and driver at this gate level.

Page 28: Understanding cts log_messages

Clustering During Clock Tree Synthesis

• The clock tree building starts with clustering. Clustering is the process of dividing a set of sink pins (fanouts) into groups. Each group is driven by a buffer. The instances of a cluster are all close to each other.

• The following message says that 423 sink pins are divided into 27 clusters, each with approximately 423/27 sink pins:

CTS: gate level 2 clock tree synthesis
...
CTS: gate level 2 design rule constraints [rise fall]
CTS: max transition = worst[0.300 0.300]
CTS: max capacitance = worst[0.300 0.300]
CTS: max fanout = 2000
CTS: gate level 2 target spec [rise fall]
CTS: transition = worst[0.240 0.240]
CTS: capacitance = worst[0.240 0.240]
CTS: driver cap. = worst[0.088 0.088]
CTS: fanout = 32
CTS: gate level 2 timing constraints
...
CTS: -----------------------------------------------
CTS: Starting clustering for bufbda with target load = worst[0.240 0.240]
CTS: Completed 423 to 27 clustering
CTS: BA: lp (1.520, 0.673): skew (0.149, 0.080) c(1.481, 0.198) viol(n y)
CTS: -----------------------------------------------
CTS: Starting clustering for bufbda with target load = worst[0.240 0.240]
CTS: Completed 27 to 4 clustering
CTS: BA: lp (0.673, 0.597): skew (0.080, 0.105) c(0.198, 0.026) viol(n n)
CTS: -----------------------------------------------

Callouts on the slide:
– One buffer level is added with each clustering.
– skew (a, b) is the skew (before clustering, after clustering).
– viol(...) represents DRCs (cap, trans): y means a violation is present, n means no violation.
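The clustering lines can be summarized per step: how many pins went into how many clusters, the average cluster size, the skew before and after, and whether capacitance or transition violations remain. The Python sketch below does this for the lines quoted on this slide. The function name and dictionary keys are illustrative, and the (cap, trans) order of the viol flags follows the slide callout.

```python
import re

LOG = """\
CTS: Completed 423 to 27 clustering
CTS: BA: lp (1.520, 0.673): skew (0.149, 0.080) c(1.481, 0.198) viol(n y)
CTS: Completed 27 to 4 clustering
CTS: BA: lp (0.673, 0.597): skew (0.080, 0.105) c(0.198, 0.026) viol(n n)
"""

COMPLETED = re.compile(r"Completed (\d+) to (\d+) clustering")
BA = re.compile(r"skew \(([\d.]+), ([\d.]+)\).*viol\(([ny]) ([ny])\)")


def clustering_steps(text):
    """Return one dict per 'Completed N to M clustering' step."""
    steps = []
    for line in text.splitlines():
        m = COMPLETED.search(line)
        if m:
            pins, clusters = int(m.group(1)), int(m.group(2))
            steps.append({"pins": pins, "clusters": clusters,
                          "avg_pins_per_cluster": pins / clusters})
            continue
        m = BA.search(line)
        if m and steps:
            # Attach the BA summary to the most recent clustering step.
            steps[-1].update(
                skew_before=float(m.group(1)),
                skew_after=float(m.group(2)),
                cap_violation=m.group(3) == "y",
                trans_violation=m.group(4) == "y",
            )
    return steps


for step in clustering_steps(LOG):
    print(step)
```

For the first step this gives an average of roughly 15.7 pins per cluster, which is the 423/27 figure mentioned above.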

Page 29: Understanding cts log_messages

Clustering With Hookup Pins

• Hookup pins are input pins of gates or macros.

• Unlike clock pins of flip-flops and latches (sink pins), hookup pins have a nonzero phase delay that must be balanced with the sink pins.

Page 30: Understanding cts log_messages

Clustering With Hookup Pins

• Initially, the tool makes attempts to cluster hookup pins along with the normal sinks (trial clustering).

CTS: gate level 1 clock tree synthesis
...
CTS: gate level 1 design rule constraints [rise fall]
CTS: max transition = worst[0.300 0.300]
CTS: max capacitance = worst[0.300 0.300]
CTS: max fanout = 2000
CTS: gate level 1 target spec [rise fall]
CTS: transition = worst[0.240 0.240]
CTS: capacitance = worst[0.240 0.240]
CTS: driver cap. = worst[0.150 0.150]
CTS: fanout = 32
CTS: gate level 1 timing constraints
...
CTS: -----------------------------------------------
CTS: Starting clustering for bufbda with target load = worst[0.240 0.240]
CTS: Completed 480 to 34 clustering
CTS: Starting clustering for bufbda with target load = worst[0.240 0.240]
CTS: Completed 34 to 6 clustering
CTS: BA: this delay [max min] (skew) = worst[0.000 0.000] (0.000)
CTS: BA: next delay [max min] (skew) = worst[0.124 0.124] (0.000)
CTS: BA: target cap = 0.070 pf
CTS: Starting clustering for bufbda with target load = worst[0.240 0.240]
CTS: BA: CAC set: target cap = 0.070317: targetWireCap = 0.274866
CTS: Completed 479 to 39 clustering
CTS: BA: lp (1.574, 0.770): skew (0.821, 0.451) c(1.737, 0.269) viol(n y)
CTS: -----------------------------------------------

Callouts on the slide:
– In this example, there are 479 sinks and 1 hookup pin.
– Trial clustering: the 480 to 34 and 34 to 6 steps.
– Actual clustering: the 479 to 39 step.

• At the trial clustering stage, the hookup pin is considered along with the other sink pins, and (479+1) to 34 to 6 clustering is obtained.

• At the actual clustering stage, the tool clusters the 479 sink pins separately from the hookup pin.

Page 31: Understanding cts log_messages

Clustering With Hookup Pins: Hookup Pin Clustered With Sinks

• If the trial clustering gives good QoR results, the following message (shown in blue on the slide) is displayed:

CTS: BA: lp (1.968, 2.031): skew (0.257, 0.194) c(0.076, 0.072) viol(y y)
CTS: -----------------------------------------------
CTS: Starting clustering for bufbd7 with target load = worst[0.000 0.005]
CTS: BA: rootNetCap = 0.071776: targ cap = 0.045000: targ wirecap = 0.000000: not relaxed
CTS: Completed 2 to 2 clustering
CTS: Starting clustering for bufbd7 with target load = worst[0.000 0.005]
CTS: BA: rootNetCap = 0.071776: targ cap = 0.045000: targ wirecap = 0.000000: not relaxed
CTS: Completed 2 to 1 clustering
CTS: BA: this delay [max min] (skew) = worst[2.040 1.844] (0.196)
CTS: BA: next delay [max min] (skew) = worst[2.161 1.965] (0.196)
CTS: BA: target cap = 0.048 pf
CTS: Pin 1: periph/U5659/A is selected for next level
CTS: delay [max min] (skew) = worst[1.976 1.921] (0.055)
CTS: Starting clustering for bufbd7 with target load = worst[0.000 0.005]
CTS: Completed 2 to 2 clustering
CTS: BA: lp (2.031, 2.153): skew (0.194, 0.210) c(0.072, 0.026) viol(n n)
CTS: -----------------------------------------------

• When the phase delay of the hookup pin periph/U5659/A matches with the delay of the already built tree at that gate level, it will be clustered at that buffer level.

Page 32: Understanding cts log_messages

Meeting Target Early Delay

• After the synthesis of the root clock net (gate level 1 synthesis), the tool checks if the delay constraint set by the user is being met or not.

• If it is not met, the tool inserts some buffers at the root clock net to achieve the target delay specified by the user.

• In the following message, 16 buffers are inserted at the root clock net to increase the delay from 0.569ns to 2ns, which is the user specified target.

CTS: gate level 1 clock tree synthesis
CTS: clock net = sys_clk
CTS: driving pin = sys_clk
CTS: gate level 1 design rule constraints [rise fall]
...
CTS: gate level 1 target spec [rise fall]
...
CTS: gate level 1 timing constraints
CTS: clock skew = worst[0.000]
CTS: insertion delay = worst[2.000]
CTS: levels per net = 200
CTS: -----------------------------------------------
CTS: Starting clustering for CLKBUF_X20 with target load = worst[0.211 0.270]
...
CTS: -----------------------------------------------
CTS: Starting clustering for CLKBUF_X20 with target load = worst[0.211 0.270]
CTS: Completed 19 to 2 clustering
CTS: BA: lp (0.563, 0.569): skew (0.142, 0.112) c(0.008, 0.008) viol(n n)
CTS: -----------------------------------------------
CTS: Inserting delay cells for clock tree sys_clk ...
CTS: current delay = worst[0.569] worst[0.457]
CTS: constraint = worst[2.000] worst[0.000]
CTS: inserted 16 (buffd3) delay cells to the clock net sys_clk

– The insertion delay value under the timing constraints is the constraint set by the user.
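The delay insertion lines above carry three numbers worth extracting together: the current delay, the target, and how many delay cells were added. This is a minimal Python sketch over those lines; the function name and result keys are hypothetical, and only the first value of each line is read.

```python
import re

LOG = """\
CTS: Inserting delay cells for clock tree sys_clk ...
CTS: current delay = worst[0.569] worst[0.457]
CTS: constraint = worst[2.000] worst[0.000]
CTS: inserted 16 (buffd3) delay cells to the clock net sys_clk
"""


def delay_insertion_summary(text):
    """Return the delay gap and inserted cell count, or None if a line is missing."""
    current = re.search(r"current delay = \w+\[([\d.]+)\]", text)
    target = re.search(r"constraint = \w+\[([\d.]+)\]", text)
    inserted = re.search(r"inserted (\d+) \((\S+)\) delay cells", text)
    if not (current and target and inserted):
        return None
    return {
        "current": float(current.group(1)),
        "target": float(target.group(1)),
        "gap": float(target.group(1)) - float(current.group(1)),
        "cells": int(inserted.group(1)),
        "cell_type": inserted.group(2),
    }


print(delay_insertion_summary(LOG))
```

Here the gap is 2.000 minus 0.569, so the 16 inserted cells together have to add about 1.431ns of delay.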

Page 33: Understanding cts log_messages

Synthesis Results of One Gate Level

After the synthesis of a gate level, the results are printed in the log.

CTS: gate level 1 clock tree synthesis results
CTS: clock net : sdram_clk
CTS: driving pin: sdram_clk
CTS: load pins : 5 sink pins, 0 gates/macros pins, 0 ignore pins
CTS: buffer level 1: bufbd7 (1)
CTS: buffer level 2: bufbd7 (1)
CTS: clock tree skew = worst[0.036]
CTS: longest path delay = worst[0.327](rise)
CTS: shortest path delay = worst[0.291](rise)
CTS: total capacitance = worst[0.389 0.389]
CTS: buffer level phase delay
CTS: 1 (I): worst[0.293](rise), worst[0.256](rise); skew = worst[0.036]
CTS: (O): worst[0.151](rise), worst[0.129](rise); skew = worst[0.022]
CTS: 2 (I): worst[0.150](rise), worst[0.128](rise); skew = worst[0.022]
CTS: (O): worst[0.004](rise), worst[0.000](rise); skew = worst[0.004]
CTS: buffer level output transition delays [rise fall]
CTS: level 0: worst[0.088 0.085] worst[0.088 0.085]
CTS: load 0: worst[0.088 0.085] worst[0.088 0.085]
CTS: level 1: worst[0.111 0.115] worst[0.091 0.092]
CTS: load 1: worst[0.111 0.115] worst[0.091 0.092]
CTS: level 2: worst[0.158 0.153] worst[0.080 0.071]
CTS: load 2: worst[0.158 0.153] worst[0.080 0.071]
CTS: buffer level total load capacitance
CTS: level 0: worst[0.045 0.045]
CTS: level 1: worst[0.093 0.093]
CTS: level 2: worst[0.251 0.251]
CTS: drc violations: 0 0

[Figure: diagram of the synthesized subtree with points A, B, C and buffer levels 1 and 2]

Callouts on the slide:
– clock tree skew, longest path delay, shortest path delay: skew and insertion delay at the driving pin A (here sdram_clk)
– worst: operating condition
– The load capacitance values are added and reported as the total capacitance of the subtree.
– drc violations: number of cap violations, followed by number of trans violations
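Two relationships in this report can be checked numerically: the skew should equal the longest path delay minus the shortest, and the per-level load capacitances should add up to the total capacitance. The Python sketch below checks both on the values from this slide. The helper name `first_number` and the tolerance are assumptions; the tolerance allows for the three-decimal rounding of the printed values.

```python
import re

LOG = """\
CTS: clock tree skew = worst[0.036]
CTS: longest path delay = worst[0.327](rise)
CTS: shortest path delay = worst[0.291](rise)
CTS: total capacitance = worst[0.389 0.389]
CTS: buffer level total load capacitance
CTS: level 0: worst[0.045 0.045]
CTS: level 1: worst[0.093 0.093]
CTS: level 2: worst[0.251 0.251]
"""

TOLERANCE = 0.0015  # printed values are rounded to three decimals


def first_number(label, text):
    """Return the first bracketed number on the line that starts with label."""
    m = re.search(re.escape(label) + r"\s*[=:]\s*\w+\[([\d.]+)", text)
    return float(m.group(1)) if m else None


skew = first_number("clock tree skew", LOG)
longest = first_number("longest path delay", LOG)
shortest = first_number("shortest path delay", LOG)
total_cap = first_number("total capacitance", LOG)

# Per-level capacitances are listed after the section header line.
section = LOG.split("buffer level total load capacitance", 1)[1]
level_caps = [float(x) for x in re.findall(r"level \d+: \w+\[([\d.]+)", section)]

skew_matches = abs((longest - shortest) - skew) < TOLERANCE
cap_matches = abs(sum(level_caps) - total_cap) < TOLERANCE
print(skew_matches, cap_matches)
```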

Page 34: Understanding cts log_messages

Maximum Transition and Capacitance Violations

• After each gate level is synthesized, the maximum capacitance and maximum transition violations at that gate level are reported

CTS: gate level 3 clock tree synthesis results
...
CTS: buffer level total load capacitance
...
CTS: capacitance violation on periph/CTS_755
CTS: capacitance = worst[0.052 0.052]
CTS: constraint = worst[0.050 0.050]
CTS: capacitance violation on periph/CTS_757
CTS: capacitance = worst[0.051 0.051]
CTS: constraint = worst[0.050 0.050]
...
CTS: transition delay violation at periph/CLKBUFX20_G3B1I3/A
CTS: transition delay = worst[0.052 0.050] worst[0.052 0.050]
CTS: constraint = worst[0.050 0.050]
CTS: transition delay violation at periph/CLKBUFX20_G3B2I14/A
CTS: transition delay = worst[0.053 0.051] worst[0.053 0.051]
CTS: constraint = worst[0.050 0.050]
...
CTS: drc violations: 18 5

Slide callouts on the drc violations line:
• Number of cap violations
• Number of trans violations
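
The per-gate-level counts can be pulled out of a log with a few lines of scripting. The following is a minimal Python sketch, not part of IC Compiler; the function name and the sample text are illustrative, and it assumes the first number is the cap count and the second the trans count, following the order of the slide callouts.

```python
import re

# Matches the per-gate-level line, e.g. "CTS: drc violations: 18 5".
DRC_LINE = re.compile(r"CTS:\s*drc violations:\s*(\d+)\s+(\d+)")

def drc_violation_counts(log_text):
    """Return one (cap, trans) tuple per 'drc violations' line, in log order.

    The (cap, trans) ordering is an assumption taken from the slide callouts.
    """
    return [(int(cap), int(trans)) for cap, trans in DRC_LINE.findall(log_text)]

# Hypothetical log excerpt assembled from the lines shown on the slides.
sample = """CTS: gate level 1 clock tree synthesis results
CTS: drc violations: 0 0
CTS: gate level 3 clock tree synthesis results
CTS: drc violations: 18 5
"""
counts = drc_violation_counts(sample)
print(counts)
```

Gate levels with nonzero counts are the ones whose individual violation messages are worth reading first.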

Page 35: Understanding cts log_messages

A More Complex Synthesis Result

CTS: gate level 1 clock tree synthesis results
CTS: clock net : clk
CTS: driving pin: clk
CTS: load pins : 80 sink pins, 0 gates/macros pins, 0 ignore pins
CTS: buffer level 1: CLKBUFX20 (1)
CTS: buffer level 2: CLKBUFX20 (2) CLKBUFX12 (1)
CTS: clock tree skew = worst[0.001]
CTS: longest path delay = worst[0.248](rise)
CTS: shortest path delay = worst[0.246](rise)
CTS: total capacitance = worst[0.549 0.549]
CTS: buffer level phase delay
CTS: 1 (I): worst[0.247](rise), worst[0.246](rise); skew = worst[0.001]
CTS: (O): worst[0.141](rise), worst[0.140](rise); skew = worst[0.001]
CTS: 2 (I): worst[0.141](rise), worst[0.140](rise); skew = worst[0.001]
CTS: (O): worst[0.001](rise), worst[0.000](rise); skew = worst[0.001]
CTS: buffer level output transition delays [rise fall]
CTS: level 0: worst[0.000 0.000] worst[0.000 0.000]
CTS: load 0: worst[0.000 0.000] worst[0.000 0.000]
CTS: level 1: worst[0.089 0.076] worst[0.089 0.076]
CTS: load 1: worst[0.089 0.076] worst[0.089 0.076]
CTS: level 2: worst[0.109 0.093] worst[0.104 0.091]
CTS: load 2: worst[0.109 0.093] worst[0.104 0.091]
CTS: buffer level total load capacitance
CTS: level 0: worst[0.038 0.038]
CTS: level 1: worst[0.108 0.108]
CTS: level 2: worst[0.403 0.403]
CTS: drc violations: 0 0

Page 36: Understanding cts log_messages

Gate Level and Buffer Level Nomenclature

Figure: clock tree schematic starting at the clock source pin, with gate levels 1 to 4 marked and the buffer levels inside each gate level labeled (for example, "Buffer level 1 of gate level 2" and "Buffer level 2 of gate level 2").

• Red: Preexisting gates
• Black: CTS introduced gates
• At each gate level, the clock tree is built bottom-up, but the buffer names are changed to appear top-down
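
Instance names in the log excerpts, such as CLKBUFX20_G3B1I3 and CLKBUF_X20_G2B2I1, end in a suffix of the form _G<n>B<n>I<n>. The Python sketch below decodes that suffix under the assumption that G is the gate level, B the buffer level, and I an instance index. That reading is inferred from the nomenclature on this slide and is not stated in the deck; the function name is illustrative.

```python
import re

# Suffix pattern seen on CTS-inserted instances in the log excerpts.
SUFFIX = re.compile(r"_G(\d+)B(\d+)I(\d+)$")

def decode_buffer_name(instance_name):
    """Split a '_G<n>B<n>I<n>' suffix into its three numbers.

    Assumed meaning (not confirmed by the slides): G = gate level,
    B = buffer level, I = instance index. Returns None when the name
    has no such suffix. Expects an instance name without a pin.
    """
    match = SUFFIX.search(instance_name)
    if match is None:
        return None
    gate, buffer_level, index = (int(g) for g in match.groups())
    return {"gate_level": gate, "buffer_level": buffer_level, "instance": index}

print(decode_buffer_name("CLKBUFX20_G3B1I3"))
print(decode_buffer_name("periph/U122"))
```

Strip a trailing pin name (for example /A or /Z) before calling the function, since the pattern is anchored at the end of the string.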

Page 37: Understanding cts log_messages

DRC Violation Report After Synthesis

• After the complete clock tree is built, all the remaining DRC violations in the entire clock tree are reported in the log file:

CTS: Clock tree synthesis completed successfully
CTS: CPU time: 50 seconds
CTS: Reporting clock tree violations ...
CTS: Global design rules:
CTS: maximum transition delay [rise,fall] = [0.05,0.05]
CTS: maximum capacitance = 0.05
CTS: maximum fanout = 2000
CTS: maximum buffer levels per net = 200
CTS: transition delay violation at sdram_clk
CTS: user specified transition delay = worst[0.056 0.050] worst[0.056 0.050]
CTS: constraint = worst[0.050 0.050]
CTS: transition delay violation at CLKBUF_X20_G1B21I1/Z
CTS: transition delay = worst[0.051 0.050] worst[0.051 0.050]
CTS: constraint = worst[0.050 0.050]
CTS: capacitance violation on CTS_6557
CTS: capacitance = worst[0.074 0.074]
CTS: constraint = worst[0.050 0.050]
CTS: Summary of clock tree violations:
CTS: Total number of transition violations = 2
CTS: Total number of capacitance violations = 1

Slide callouts:
• Constraints (the global design rules)
• Reports only transition and capacitance violations
• Total transition and capacitance violations (the summary lines)
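
To cross-check the summary totals against the individual messages, the violation lines can be tallied by kind. This is a minimal Python sketch that relies only on the message wording shown in the excerpt; the function name and sample text are illustrative.

```python
import re
from collections import Counter

# "transition delay violation at <pin>" or "capacitance violation on <net>".
VIOLATION = re.compile(
    r"CTS:\s*(transition delay|capacitance) violation (?:at|on) (\S+)"
)

def list_violations(log_text):
    """Return (kind, object_name) pairs in the order they appear in the log."""
    return VIOLATION.findall(log_text)

# Log lines copied from the excerpt on this slide.
sample = """CTS: transition delay violation at sdram_clk
CTS: user specified transition delay = worst[0.056 0.050] worst[0.056 0.050]
CTS: constraint = worst[0.050 0.050]
CTS: transition delay violation at CLKBUF_X20_G1B21I1/Z
CTS: transition delay = worst[0.051 0.050] worst[0.051 0.050]
CTS: constraint = worst[0.050 0.050]
CTS: capacitance violation on CTS_6557
CTS: capacitance = worst[0.074 0.074]
CTS: constraint = worst[0.050 0.050]
"""
violations = list_violations(sample)
tally = Counter(kind for kind, _ in violations)
print(tally)
```

For this excerpt the tally agrees with the reported totals of 2 transition violations and 1 capacitance violation.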

Page 38: Understanding cts log_messages

Summary Report After Clock Tree Synthesis

CTS: ------------------------------------------------
CTS: Clock Tree Synthesis Summary
CTS: ------------------------------------------------
CTS: 5 clock domain synthesized
CTS: 30 gated clock nets synthesized
CTS: 26 buffer trees inserted
CTS: 722 buffers used (total size = 45974.2)
CTS: 752 clock nets total capacitance = worst[76.868 76.868]

• Each gate level can have multiple nets

Page 39: Understanding cts log_messages

Clock-by-Clock Summary

• A summary is reported for each clock:

CTS: ------------------------------------------------
CTS: Clock-by-Clock Summary
CTS: ------------------------------------------------
CTS: Root clock net pclk
CTS: 3 gated clock nets synthesized
CTS: 2 buffer trees inserted
CTS: 2 buffers used (total size = 159.667)
CTS: 5 clock nets total capacitance = worst[0.514 0.514]
CTS: clock tree skew = worst[0.341]
CTS: longest path delay = worst[5.959](rise)
CTS: shortest path delay = worst[5.619](rise)
CTS: Root clock net sys_clk
...

• Buffer tree is inserted only if necessary
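
When a design has many clocks, it can help to collect the per-clock numbers into a table. The Python sketch below groups the skew and path delay lines under each "Root clock net" header. It is an illustration based only on the line formats shown in the excerpt; the function name is hypothetical.

```python
import re

ROOT = re.compile(r"CTS:\s*Root clock net (\S+)")
METRIC = re.compile(
    r"CTS:\s*(clock tree skew|longest path delay|shortest path delay)"
    r"\s*=\s*worst\[([\d.]+)\]"
)

def clock_summaries(log_text):
    """Map each root clock net to the skew and delay values listed under it."""
    summaries = {}
    current = None
    for line in log_text.splitlines():
        root = ROOT.search(line)
        if root:
            current = summaries.setdefault(root.group(1), {})
            continue
        metric = METRIC.search(line)
        if metric and current is not None:
            current[metric.group(1)] = float(metric.group(2))
    return summaries

# Log lines copied from the excerpt on this slide.
sample = """CTS: Root clock net pclk
CTS: 3 gated clock nets synthesized
CTS: clock tree skew = worst[0.341]
CTS: longest path delay = worst[5.959](rise)
CTS: shortest path delay = worst[5.619](rise)
CTS: Root clock net sys_clk
"""
summary = clock_summaries(sample)
print(summary["pclk"])
```

In this excerpt the reported skew (0.341) is within rounding of the longest minus the shortest path delay (5.959 - 5.619 = 0.340).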

Page 40: Understanding cts log_messages

Embedded Clock Tree Optimization

• After clock tree synthesis, embedded clock tree optimization begins
• The characteristics of the buffers and inverters used are reported again

CTS: buffer estimated skew target delay driving res input cap
CTS: bufbdf [0.013 0.015] [0.217 0.200] [0.210 0.248] [0.007 0.007]
CTS: inv0da [0.018 0.021] [0.097 0.119] [0.294 0.347] [0.036 0.036]
...

• The global constraints for the clock tree are also reported again

CTS: Global design rule constraints [rise fall]
CTS: max transition = worst[0.050 0.050] GUI = worst[0.050 0.050] SDC = undefined/ignored
...
CTS: Global timing/clock tree constraints
CTS: clock skew = worst[0.000]
...
CTS: Global target spec [rise fall]
CTS: transition = worst[0.040 0.040]
...

Note: Embedded clock tree optimization is called only when the compile_clock_tree command is used. It is not called when the clock_opt command is used.

Page 41: Understanding cts log_messages

More Messages on Real Gates and Guide Buffers

• At the beginning of optimization, you might get the following messages:

CTS: Root clock net chip_sclk_src
CTS: clock gate levels = 75
CTS: clock sink pins = 125896
...
CTS: level 73: gates = 3 (real gates = 1)
CTS: level 72: gates = 2 (no real gates, guide buffers only)

• All the gates are guide buffers and inverters inserted during clock tree synthesis.

• This information is similar to the one printed prior to clock tree synthesis.
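
With dozens of gate levels, a short script can list which levels contain real gates. The following Python sketch parses the two line variants shown above; the function name and the returned field names are illustrative, and treating "no real gates, guide buffers only" as zero real gates is an assumption.

```python
import re

# Handles both "(real gates = N)" and "(no real gates, guide buffers only)".
LEVEL = re.compile(
    r"CTS:\s*level (\d+): gates = (\d+) "
    r"\((?:real gates = (\d+)|no real gates, guide buffers only)\)"
)

def gate_levels(log_text):
    """Return one record per 'level N: gates = ...' line."""
    rows = []
    for level, gates, real in LEVEL.findall(log_text):
        real_gates = int(real) if real else 0  # assumption: "no real gates" means 0
        rows.append({
            "level": int(level),
            "gates": int(gates),
            "real_gates": real_gates,
        })
    return rows

# Log lines copied from the excerpt on this slide.
sample = """CTS: level 73: gates = 3 (real gates = 1)
CTS: level 72: gates = 2 (no real gates, guide buffers only)
"""
levels = gate_levels(sample)
print(levels)
```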

Page 42: Understanding cts log_messages

Gate Level Optimization

• Clock tree optimization is also done for each gate level, similar to when the clock tree is built

• Before a gate level is optimized, the current skew, longest path delay, and shortest path delay from the driving pin of that gate level are reported.

CTS: gate level 2 clock tree optimization
CTS: clock net = I_BLENDER_1/gclk
CTS: driving pin = I_BLENDER_1/U483/Z
CTS: clock tree skew = worst[0.517]
CTS: longest path delay = worst[5.339](rise)
CTS: shortest path delay = worst[4.822](fall)

• That gate level is then optimized

Page 43: Understanding cts log_messages

Buffer Sizing

• The following message indicates that buffer sizing was successful

CTO-BS: Starting buffer sizing ...
Information: Replaced the library cell of CLKBUF_X20_G2B2I1 from CLKBUF_X20 to CLKBUF_X16. (CTS-152)
CTO-BS: CPU time = 0 seconds for buffer sizing

• Clock tree optimization tries to resize buffers to improve skew and insertion delay. If resizing is not beneficial, the original cell master is restored.

CTO-BS: Starting buffer sizing ...
CTO-BS: Restoring original cellMaster <CLKBUF_X20> of <CLKBUF_X20_G2B2I4>
CTO-BS: CPU time = 1 seconds for buffer sizing

Page 44: Understanding cts log_messages

Gate Sizing

CTO-GS: Starting gate sizing ...
Information: Replaced the library cell of I7188625 from TLQMUX2X60 to TULQMUX2ZSX40. (CTS-152)
Information: Replaced the library cell of I7586451 from TLTMUX2X60 to TLTMUX2X50. (CTS-152)
Information: Replaced the library cell of I3342873 from TULTMUX2X50 to TLTMUX2ZSX60. (CTS-152)
Information: Replaced the library cell of I1387108 from TULTMUX2X80 to TULTMUX2ZSX80. (CTS-152)
...
Information: Replaced the library cell of I6717862 from THQMUX2ZSX80 to TSTMUX2ZSX20. (CTS-152)
Information: Replaced the library cell of I9359863 from TLTMUX2ZSX80 to TULTMUX2ZSX60. (CTS-152)
Information: Replaced the library cell of I10258160 from TLTMUX2ZSX60 to TLTMUX2ZSX40. (CTS-152)
Information: Replaced the library cell of I7636259 from TLTMUX2ZFFX80 to TULTMUX2ZSX60. (CTS-152)
CTO-GS: 1: Sized 14/40 cell instances (tested 40X247)
CTO-GS: delay (from) = worst[9.104] worst[8.633]; skew = worst[0.471]
CTO-GS: delay (to) = worst[9.104] worst[8.633]; skew = worst[0.471]
CTO-GS: improvement = worst[0.106%]
Information: Replaced the library cell of I2130284 from TLTMUX2X80 to TLTMUX2ZSX40. (CTS-152)
Information: Replaced the library cell of I8618764 from TLTMUX2ZFFX80 to TLTMUX2X80. (CTS-152)
Information: Replaced the library cell of I1749911 from TULTMUX2ZFFFX80 to TULTMUX2ZFFX80. (CTS-152)
Information: Replaced the library cell of I3342873 from TLTMUX2ZSX60 to TLTMUX2ZSX40. (CTS-152)
Information: Replaced the library cell of I8872989 from TULTMUX2ZFFFX60 to TLTMUX2ZFFX80. (CTS-152)
Information: Replaced the library cell of I1387108 from TULTMUX2ZSX80 to TULTMUX2X50. (CTS-152)
CTO-GS: 2: Sized 6/40 cell instances (tested 40X247)
CTO-GS: delay (from) = worst[9.104] worst[8.633]; skew = worst[0.471]
CTO-GS: delay (to) = worst[9.104] worst[8.633]; skew = worst[0.471]
CTO-GS: improvement = worst[0.000%]
CTO-GS: Summary of cell sizing
CTO-GS: Sized 20/40 cell instances (tested 80X247)
CTO-GS: delay (from) = worst[9.104] worst[8.633]; skew = worst[0.471]
CTO-GS: delay (to) = worst[9.104] worst[8.633]; skew = worst[0.471]
CTO-GS: improvement = worst[0.106%]
CTO-GS: CPU time = 2413 seconds for gate sizing

Slide callouts:
• 14 cells sized (the first group of CTS-152 messages)
• Summary of the first round of sizing: the number of gates sized (here 14 out of 40 gates) and the improvement in skew
• Overall summary of gate sizing done at this gate level: a total of 14 + 6 = 20 gates sized, giving a 0.106% improvement in skew at this gate level
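
The relationship between the per-round lines and the overall summary (14 + 6 = 20) can be checked with a short script. This Python sketch only relies on the "Sized N/M cell instances" wording shown in the log; the function name and the abbreviated sample are illustrative.

```python
import re

# Round lines look like "CTO-GS: 1: Sized 14/40 cell instances ...".
ROUND = re.compile(r"CTO-GS:\s*(\d+): Sized (\d+)/(\d+) cell instances")
# The summary line has no round number: "CTO-GS: Sized 20/40 cell instances ...".
SUMMARY = re.compile(r"CTO-GS:\s*Sized (\d+)/(\d+) cell instances")

def sizing_totals(log_text):
    """Return ({round: cells sized}, cells sized reported in the summary)."""
    per_round = {int(rnd): int(sized) for rnd, sized, _ in ROUND.findall(log_text)}
    summary = SUMMARY.search(log_text)
    reported = int(summary.group(1)) if summary else None
    return per_round, reported

# Abbreviated excerpt from the gate sizing log on this slide.
sample = """CTO-GS: 1: Sized 14/40 cell instances (tested 40X247)
CTO-GS: improvement = worst[0.106%]
CTO-GS: 2: Sized 6/40 cell instances (tested 40X247)
CTO-GS: improvement = worst[0.000%]
CTO-GS: Summary of cell sizing
CTO-GS: Sized 20/40 cell instances (tested 80X247)
"""
per_round, reported = sizing_totals(sample)
print(per_round, reported)
```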

Page 45: Understanding cts log_messages

Gate Relocation

• Gate relocation works on preexisting gates.

• If you have no preexisting gates, you might see the following message:

CTO-GR: gate relocation is skipped since there are no hookup pins

Page 46: Understanding cts log_messages

A Successful Gate Relocation

CTO-GR: Starting gate relocation ...
CTO-GR: delay [max min] (skew) = worst[9.023 8.563] (0.460)
CTO-GR: 1: Relocated 1/40 cell instances (tested 2 cell instances at 47 points)
CTO-GR: delay (from) = worst[9.023] worst[8.563]; skew = worst[0.460]
CTO-GR: delay (to) = worst[9.023] worst[8.563]; skew = worst[0.460]
CTO-GR: improvement = worst[0.000%]
CTO-GR: delay [max min] (skew) = worst[9.018 8.563] (0.455)
CTO-GR: 2: Relocated 2/40 cell instances (tested 5 cell instances at 83 points)
CTO-GR: delay (from) = worst[9.023] worst[8.563]; skew = worst[0.460]
CTO-GR: delay (to) = worst[9.018] worst[8.563]; skew = worst[0.455]
CTO-GR: improvement = worst[1.118%]
CTO-GR: Summary of cell relocation
CTO-GR: Relocated 3/40 cell instances (tested 7 cell instances at 130 points)
CTO-GR: delay (from) = worst[9.023] worst[8.563]; skew = worst[0.460]
CTO-GR: delay (to) = worst[9.018] worst[8.563]; skew = worst[0.455]
CTO-GR: improvement = worst[1.118%]
CTO-GR: CPU time = 2 seconds for gate relocation

Slide callouts:
• 2 cells were tried at 47 new locations, 1 was moved
• Initial skew (the from line), final skew (the to line), and improvement in skew
• Overall summary of gate relocation at this gate level

Page 47: Understanding cts log_messages

Gate Relocation: Failed Attempts

CTO-GR: Starting gate relocation ...
CTO-GR: Summary of cell relocation
CTO-GR: Relocated 0/1 cell instances (tested 1 cell instances at 24 points)
CTO-GR: delay (from) = worst[1.207] worst[0.980]; skew = worst[0.227]
CTO-GR: delay (to) = worst[1.207] worst[0.980]; skew = worst[0.227]
CTO-GR: improvement = worst[0.000%]
CTO-GR: CPU time = 0 seconds for gate relocation

• In this example, clock tree optimization tried to move one gate instance to 24 different locations. Since the attempts did not improve the QoR, the gate relocation was abandoned.

Page 48: Understanding cts log_messages

Buffer Relocation

• Buffer relocation is done on all buffers inserted by clock tree synthesis

CTO-BR: Buffer relocation ...
CTO-BR: Optimization level: net
CTO-BR: delay [max min] (skew) = worst[9.087 8.503] (0.584)
CTO-BR: 1: Relocated 1/6 cell instances (tested 6 cell instances at 74 points)
CTO-BR: delay (from) = worst[9.099] worst[8.503]; skew = worst[0.596]
CTO-BR: delay (to) = worst[9.087] worst[8.503]; skew = worst[0.584]
CTO-BR: improvement = worst[2.013%]
CTO-BR: delay [max min] (skew) = worst[9.087 8.503] (0.584)
CTO-BR: 2: Relocated 1/6 cell instances (tested 5 cell instances at 62 points)
CTO-BR: delay (from) = worst[9.087] worst[8.503]; skew = worst[0.584]
CTO-BR: delay (to) = worst[9.087] worst[8.503]; skew = worst[0.584]
CTO-BR: improvement = worst[0.000%]
CTO-BR: Summary of cell relocation
CTO-BR: Relocated 2/6 cell instances (tested 11 cell instances at 136 points)
CTO-BR: delay (from) = worst[9.099] worst[8.503]; skew = worst[0.596]
CTO-BR: delay (to) = worst[9.099] worst[8.503]; skew = worst[0.584]
CTO-BR: improvement = worst[2.013%]
CTO-BR: CPU time = 0 seconds for buffer relocation

• The information is similar to gate relocation
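
The improvement percentage in the buffer relocation summary is consistent with a relative reduction in skew, (from - to) / from. The Python sketch below reproduces the 2.013% figure from the printed skews 0.596 and 0.584. This formula is inferred from the displayed numbers and is not a documented definition; other excerpts in this deck do not reproduce exactly from the printed skews, possibly because the log prints rounded values.

```python
def skew_improvement_percent(skew_from, skew_to):
    """Relative skew reduction as a percentage of the starting skew.

    Assumed formula, inferred from the buffer relocation excerpt.
    """
    return (skew_from - skew_to) / skew_from * 100.0

# Buffer relocation summary: skew goes from 0.596 to 0.584.
improvement = skew_improvement_percent(0.596, 0.584)
print(f"{improvement:.3f}%")
```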

Page 49: Understanding cts log_messages

Post Embedded Clock Tree Synthesis

• After the embedded clock tree optimization, the tool prints the summary.
• It has the same format as the summary printed after clock tree synthesis.

CTS: ------------------------------------------------
CTS: Clock Tree Optimization Summary
CTS: ------------------------------------------------
CTS: 4 clock domain synthesized
CTS: 5 gated clock nets synthesized
CTS: 5 buffer trees inserted
CTS: 1000 buffers used (total size = 16570.8)
CTS: 1005 clock nets total capacitance = worst[14.010 14.010]
CTS: ------------------------------------------------
CTS: Clock-by-Clock Summary
CTS: ------------------------------------------------
CTS: Root clock net sdram_clk
CTS: 1 gated clock nets synthesized
CTS: 1 buffer trees inserted
CTS: 302 buffers used (total size = 5039.47)
CTS: 303 clock nets total capacitance = worst[4.170 4.170]
CTS: clock tree skew = worst[0.035]
CTS: longest path delay = worst[2.041](rise)
CTS: shortest path delay = worst[2.006](fall)
CTS: Root clock net sys_2x_clk
...

• After the summary, all the trans and cap violations on the clock tree are also reported.

CTS: Global design rules:
CTS: maximum transition delay [rise,fall] = [0.05,0.05]
CTS: maximum capacitance = 0.05
CTS: maximum fanout = 2000
CTS: maximum buffer levels per net = 200
CTS: transition delay violation at sdram_clk
CTS: user specified transition delay = worst[0.056 0.050] worst[0.056 0.050]
CTS: constraint = worst[0.050 0.050]
CTS: transition delay violation at buffd2_G1B1I1/Z
...
CTS: Summary of clock tree violations:
CTS: Total number of transition violations = 3994
CTS: Total number of capacitance violations = 1

Page 50: Understanding cts log_messages

DRC Fixing Beyond Exceptions

• After embedded clock tree optimization, the tool starts fixing the DRC violations beyond exceptions.

• The messages are similar to those printed during clustering:

CTS: fixing DRC beyond exception pins under clock CLK1
CTS: gate level 2 DRC fixing (exception level 1)
CTS: clock net = CLK1_G1IP
CTS: driving pin = bufbd2_G1IP_1/Z
CTS: gate level 2 design rule constraints [rise fall]
CTS: max transition = worst[0.100 0.100]
CTS: max capacitance = worst[0.600 0.600]
CTS: max fanout = 2000
CTS: -----------------------------------------------
CTS: Starting clustering for bufbdf with target load = worst[0.056 0.056]
CTS: Completed 4 to 1 clustering
CTS: -----------------------------------------------
CTS: Starting clustering for bufbd7 with target load = worst[0.050 0.050]
CTS: Completed 1 to 1 clustering
CTS: ------------------------------------------------

• After the DRC violations are fixed, the overall summary and the clock-by-clock summary of DRC fixing beyond exceptions are reported.

Page 51: Understanding cts log_messages

Placement Legalization is Called After Clock Tree Synthesis

• When clock tree synthesis places a clock tree buffer or inverter, it places it at a legal location, but the location might already be occupied. This causes overlaps, which need to be resolved.

• The tool calls the placement legalizer, which moves the cells to resolve the overlaps.

• After legalization, the cells with large displacement are reported in the log

Largest displacement cells:
Cell: periph/U122 (AND3X)
Input location: (906.380 1597.520)
Legal location: (897.140 1582.400)
Displacement: 17.720 um, e.g. 3.52 row height.
Total 6 cells has large displacement (e.g. > 15.120 um or 3 row height)

Slide callout: the cell shown is 1 of 6 cells that were displaced
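
The displacement in the excerpt matches the straight-line distance between the input and legal locations. The Python sketch below reproduces 17.720 um and 3.52 row heights. The row height of 5.04 um is not printed directly; it is inferred from the threshold "15.120 um or 3 row height", and the use of Euclidean distance is an observation from these numbers rather than a documented definition.

```python
import math

def displacement(input_xy, legal_xy, row_height_um):
    """Return (distance in um, distance in row heights) between two locations."""
    dx = input_xy[0] - legal_xy[0]
    dy = input_xy[1] - legal_xy[1]
    distance_um = math.hypot(dx, dy)  # straight-line (Euclidean) distance
    return distance_um, distance_um / row_height_um

# Assumption: row height inferred from "> 15.120 um or 3 row height".
ROW_HEIGHT_UM = 15.120 / 3

# Locations of periph/U122 from the log excerpt.
distance_um, rows = displacement((906.380, 1597.520), (897.140, 1582.400), ROW_HEIGHT_UM)
print(f"{distance_um:.3f} um, {rows:.2f} row heights")
```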

Page 52: Understanding cts log_messages

Agenda

• Prerequisites for Clock Tree Synthesis

• Enabling Useful Debug Messages in IC Compiler Clock Tree Synthesis

• Clock Tree Synthesis Log Messages

• Clock Tree Optimization Log Messages


Page 53: Understanding cts log_messages

The optimize_clock_tree Command Log File Messages

• Optimization options
• Report before optimization
• Optimization
• Report after optimization

Page 54: Understanding cts log_messages

Standalone Optimization Using the optimize_clock_tree Command

• Standalone optimization differs from embedded optimization in the algorithms used

• Some of the log messages are similar to those printed when you use the compile_clock_tree command:
  - Design update information
  - Buffer characterization
  - Pruning of cells
  - List of cells used for clock tree optimization

Page 55: Understanding cts log_messages

CTS-352 Warning

• The default delay calculation engine is Elmore. Elmore delay calculation might lead to inferior accuracy in skew and latency estimation.

• Enable the Arnoldi delay calculation engine for more accurate delay calculation during optimization by using the following command:

set_delay_calculation -clock_arnoldi

• Otherwise, the optimize_clock_tree command issues the following warning:

Warning: set_delay_calculation is currently set to 'elmore'. 'clock_arnoldi' is suggested. (CTS-352)
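
Message IDs such as CTS-352 and CTS-152 appear in parentheses at the end of a message, which makes them easy to count. The Python sketch below tallies the IDs in a log; the function name is illustrative, and it assumes each message, including its ID, sits on a single line.

```python
import re
from collections import Counter

# An ID in parentheses at the end of a line, e.g. "(CTS-352)".
ID_AT_END = re.compile(r"\(([A-Z]+-\d+)\)\s*$", re.MULTILINE)

def message_id_counts(log_text):
    """Count how often each message ID appears at the end of a log line."""
    return Counter(ID_AT_END.findall(log_text))

# Messages copied from the excerpts in this deck, one per line.
sample = """Warning: set_delay_calculation is currently set to 'elmore'. 'clock_arnoldi' is suggested. (CTS-352)
Information: Replaced the library cell of I7586451 from TLTMUX2X60 to TLTMUX2X50. (CTS-152)
Information: Replaced the library cell of I2130284 from TLTMUX2X80 to TLTMUX2ZSX40. (CTS-152)
CTO-GS: improvement = worst[0.106%]
"""
id_counts = message_id_counts(sample)
print(id_counts)
```

A nonzero count for CTS-352 indicates that the run used the Elmore engine.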

Page 56: Understanding cts log_messages

Optimization Options

• Before starting optimization, the optimize_clock_tree command reports the root pin and the optimization options for each clock.

• The following are the options that you have specified by using the set_clock_tree_optimization_options command

Initializing parameters for clock CLK2GC:
Root pin: instCLK2GC/Q
Using the following optimization options:
gate sizing : on
gate relocation : on
preserve levels : off
area recovery : on
relax insertion delay : off
balance rc : off

Page 57: Understanding cts log_messages

Preoptimization Report

• Before the tool begins to optimize the clock tree, it reports some of the current characteristics of the clock tree:

*****************************************
* Preoptimization report (clock 'CLK3') *
*****************************************
Corner 'max'
Estimated Skew (r/f/b) = (0.073 0.000 0.073)
Estimated Insertion Delay (r/f/b) = (1.903 -inf 1.903)
Corner 'RC-ONLY'
Estimated Skew (r/f/b) = (0.005 0.000 0.005)
Estimated Insertion Delay (r/f/b) = (0.008 -inf 0.008)
Wire capacitance = 0.8 pf
Total capacitance = 2.3 pf
Max transition = 0.448 ns
Cells = 24 (area=67.500000)
Buffers = 23 (area=67.500000)
Buffer Types
============
bufbd2: 1
bufbdf: 8
bufbd7: 5
bufbd4: 3
bufbd1: 6

Slide callouts:
• Clock name (in the report banner)
• CTS corner (the Corner lines)
• The starting skew and ID for the clock as seen by CTO
• Maximum transition value present in the clock tree
• Information about the buffers and inverters present in the clock tree

Page 58: Understanding cts log_messages

Optimization Messages

• During optimization, the tool prints out messages for sizing, insertion and removal, and switching of metal layers:

Deleting cell I_SDRAM_TOP/bufbda_G1B1I10 and output net I_SDRAM_TOP/sdram_clk_G1B1I10.
iteration 1: (0.314104, 3.328620)
Total 1 buffers removed on clock CLK3
Start (3.256, 3.527), End (3.015, 3.329)
....
iteration 2: (0.313991, 3.314841)
iteration 3: (0.308073, 3.295621)
Total 2 cells sized on clock CLK3
Start (3.015, 3.329), End (2.988, 3.296)
....
iteration 6: (0.305181, 3.275623)
Total 1 delay buffers added on clock sck_in12 (LP)
Start (2.975, 3.283), End (2.970, 3.276)
....
Switch to low metal layer for clock 'CLK3':
Total 9 out of 13 nets switched to low metal layer for clock 'CLK3' with largest cap change 0.00 percent

Slide callouts:
• The four groups of messages are Buffer Removal, Cell Sizing, Buffer Insertion, and Metal layer switching
• The values on each iteration line are (skew, ID)
• Start (sp, lp): Initial delays
• End (sp, lp): Final delays
• sp: shortest path delay
• lp: longest path delay
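
Since the Start and End pairs hold the shortest and longest path delays, the skew before and after each step can be taken as lp - sp. The Python sketch below does this for the three Start/End lines in the excerpt. Treating skew as lp - sp is an assumption; it agrees with the skew printed on the iteration lines (for example 0.314104 after buffer removal) to within rounding.

```python
import re

START_END = re.compile(
    r"Start \(([\d.]+), ([\d.]+)\), End \(([\d.]+), ([\d.]+)\)"
)

def skew_changes(log_text):
    """Return (start_skew, end_skew) per Start/End line, as lp - sp.

    Assumption: skew is the longest minus the shortest path delay.
    """
    changes = []
    for s_sp, s_lp, e_sp, e_lp in START_END.findall(log_text):
        start_skew = float(s_lp) - float(s_sp)
        end_skew = float(e_lp) - float(e_sp)
        changes.append((round(start_skew, 3), round(end_skew, 3)))
    return changes

# Start/End lines copied from the excerpt on this slide.
sample = """Start (3.256, 3.527), End (3.015, 3.329)
Start (3.015, 3.329), End (2.988, 3.296)
Start (2.975, 3.283), End (2.970, 3.276)
"""
changes = skew_changes(sample)
print(changes)
```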

Page 59: Understanding cts log_messages

Optimization Messages

• If the area recovery option is enabled, the tool performs area recovery after optimizing each clock, and reports the changes made to that clock:

Area recovery optimization for clock 'CLK3':
15% 23% 30% 46% 53% 61% 76% 84% 92% 100%
Deleting cell cell I_SDRAM_TOP/bufbda_G1B1I9 and output net I_SDRAM_TOP/sdram_clk_G1B1I9.
Total 1 buffers removed (all paths) for clock 'CLK3'

Page 60: Understanding cts log_messages

Post Optimization Report

• After completing the optimization of a clock, the tool reports the new characteristics of the clock tree.

• This is similar to the information printed before optimization:

**************************************************
* Multicorner optimization report (clock 'CLK3') *
**************************************************
Corner 'max'
Estimated Skew (r/f/b) = (0.041 0.000 0.041)
Estimated Insertion Delay (r/f/b) = (1.725 -inf 1.725)
Corner 'RC-ONLY'
Estimated Skew (r/f/b) = (0.007 0.000 0.007)
Estimated Insertion Delay (r/f/b) = (0.009 -inf 0.009)
Wire capacitance = 0.8 pf
Total capacitance = 2.3 pf
Max transition = 0.356 ns
Cells = 24 (area=59.000000)
Buffers = 23 (area=59.000000)
Buffer Types
============
bufbd7: 4
bufbdf: 6
bufbd4: 5
bufbd1: 7
bufbd2: 1
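
A quick consistency check on a report of this kind is that the per-type buffer counts add up to the Buffers total. The Python sketch below parses the tail of the report and performs that check; the function name is illustrative, and the parsing relies only on the line formats shown in the excerpt.

```python
import re

BUFFERS = re.compile(r"Buffers = (\d+)")
# Lines such as "bufbd7: 4" under the "Buffer Types" heading.
BUFFER_TYPE = re.compile(r"^(\w+): (\d+)$", re.MULTILINE)

def buffer_census(report_text):
    """Return (reported buffer total, {library cell: count})."""
    total = int(BUFFERS.search(report_text).group(1))
    by_type = {name: int(count) for name, count in BUFFER_TYPE.findall(report_text)}
    return total, by_type

# Tail of the post-optimization report for clock CLK3.
sample = """Cells = 24 (area=59.000000)
Buffers = 23 (area=59.000000)
Buffer Types
============
bufbd7: 4
bufbdf: 6
bufbd4: 5
bufbd1: 7
bufbd2: 1
"""
total, by_type = buffer_census(sample)
print(total, sum(by_type.values()))
```

Here the five buffer types sum to 23, matching the reported total.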

Page 61: Understanding cts log_messages

Reporting the Longest and Shortest Paths

• The longest and shortest paths corresponding to all corners are reported soon after the post optimization report:

++ Longest path for clock CLK3 in corner 'max':
object fan cap trn inc arr r location
clk3 (port) 32 0 0 r ( 440 748)
clk3 (net) 13 97
…
I_SDRAM_TOP/I_SDRAM_READ_FIFO/reg_array_reg_3__8_/CP (senrq1)
167 4 289 r ( 521 520)

++ Shortest path for clock CLK3 in corner 'max':
object fan cap trn inc arr r location
clk3 (port) 32 0 0 r ( 440 748)
clk3 (net) 13 97
…
I_SDRAM_TOP/I_SDRAM_READ_FIFO/reg_array_reg_4__11_/CP (senrq1)
217 4 247 r ( 687 656)

• Placement legalization related messages are located at the end of the optimize_clock_tree command log

Page 62: Understanding cts log_messages

Thank you

