Motion Estimation Processor for Last Generation
Mobile Devices
Nuno Carlos André Sebastião
Dissertation submitted for the degree of Master in
Electrical and Computer Engineering
Jury
President: Doctor Jose Antonio Beltran Gerald
Supervisor: Doctor Paulo Ferreira Godinho Flores
Members: Doctor Leonel Augusto Pires Seabra de Sousa
MSc Nuno Filipe Valentim Roma
October 2007
Page 3
Acknowledgments
I could not have accomplished this project without the help and support of some people to whom I want to express my gratitude. First of all, my supervisor, Prof. Paulo Flores, for all of his work, help and suggestions, which made this project possible, and also for the time spent reviewing this dissertation. I also want to thank my co-supervisor, MSc Nuno Roma, for his time, encouragement and suggestions throughout this project and for reviewing this dissertation.
I also want to thank MSc Tiago Dias for his support during the project, and all the remaining researchers of the SIPS group at INESC-ID, Lisboa, who in some way or another helped me in my work.
Finally, I want to thank my family for all of their support and understanding throughout my entire
life, but especially in these last few months.
Abstract
The use of video coding in battery-supplied platforms has led to the development of efficient low-power video coding systems. Since motion estimation is the most computationally expensive part of video coding, an efficient processor for motion estimation, the Adaptive Motion Estimation Processor (AMEP), was previously proposed and implemented in an FPGA. This dissertation focuses on the Application Specific Integrated Circuit (ASIC) implementation of the AMEP processor and on the design changes required for it to be efficiently tested after manufacturing. The processor was described in VHDL and implemented in the UMC CMOS 0.18µm 1P6M technology process, using a standard cell library from Faraday Technology Corporation.
Dedicated test structures were added to the circuit to allow the verification of the manufactured circuit. A dedicated test controller was developed and implemented to efficiently test the memories included in the processor. Two scan chains were built to provide an efficient way to apply test vectors that verify the correct operation of the circuit's internal logic. The IEEE 1149.1 standard (JTAG) was also implemented, to allow the circuit's interconnections to be tested when it is mounted on a board and to provide a standard interface for circuit testing. In particular, it allows the control of the developed internal memory Built-In Self Test (BIST) controller.
The layout of the circuit was obtained using EDA tools for synthesis, placement, routing, clock tree generation, power planning and the other tasks required to achieve a fully functional layout that implements the function described in the VHDL source. Analysis of the obtained layout indicated that the AMEP is able to work at a maximum clock frequency of 100 MHz while consuming only 14.5 mW, which makes it suitable for motion estimation in battery-supplied devices.
Keywords
Motion Estimation Dedicated Processor; Application Specific Integrated Circuit (ASIC) Implementation; Standard Cell Library; Design for Test (DFT) Techniques; Memory Test (BIST)
Contents
1 Introduction
1.1 Dissertation Outline
2 Processor Architecture
2.1 Introduction
2.2 Motion Estimation
2.3 Motion Estimation Algorithms
2.4 Instruction Set
2.5 Microarchitecture
2.6 Interface
3 Design for Test
3.1 Introduction
3.2 Circuit Testing
3.3 Automatic Test Pattern Generation
3.4 Observability and Controllability
3.5 Scan Structures
3.6 JTAG Boundary Scan
3.7 Memory Test
4 ASIC Design
4.1 Introduction
4.2 Design Flow
4.3 Foundry and Technology Selection
4.4 Library and Technology Characterization
4.4.1 Memories
4.5 Packaging
4.6 Pin Positioning
5 FrontEnd - From Behavioral VHDL code to Verilog netlist
5.1 Introduction
5.2 Tools
5.2.1 Design Compiler
5.2.2 DFT Compiler
5.2.3 BSD Compiler
5.2.4 TetraMAX
5.3 Workflow
5.3.1 Basic workflow
5.3.2 Workflow with insertion of scan chains
5.3.3 Workflow with JTAG insertion
5.3.4 Workflow for test generation
6 BackEnd - From Verilog netlist to GDS Layout
6.1 Introduction
6.2 Tools
6.2.1 First Encounter
6.2.2 NanoRoute
6.3 Workflow
7 Results
7.1 Introduction
7.2 Results
8 Conclusions
8.1 Conclusions
8.2 Future Work
A VHDL Code
A.1 Memory Test Controller VHDL Code
B Scripts and Configuration Files
B.1 Synopsys
B.1.1 Configuration Files
B.1.2 Scripts
B.1.2.A Design Compiler Script file
B.1.2.B TetraMAX Script file
B.2 Cadence
B.2.1 Configuration Files
B.2.1.A Configuration file for importing the design to Encounter
B.2.1.B I/O assignment file
B.2.1.C Clock Tree Synthesis configuration file
B.2.2 Scripts
List of Figures
2.1 Composition of a macroblock.
2.2 Current and previous frames used in motion estimation.
2.3 AMEP Architecture.
2.4 AMEP external interface.
2.5 Video coding platform.
3.1 D type Flip-Flop.
3.2 JTAG Basic Boundary Scan Cell.
3.3 JTAG Boundary Shift Register and TAP controller connections.
3.4 TAP state machine.
3.5 Implemented March Test.
3.6 Simplified Memory BIST Controller architecture.
4.1 Generic workflow for ASIC design.
4.2 I/O cell and pad combinations.
4.3 Bonding pad layout.
4.4 Power rings for I/O buffers and core cells.
4.5 Memory interfaces.
4.6 Diagram of I/O cells position.
5.1 Top level design structure required by BSD Compiler.
5.2 Synopsys Basic Workflow.
5.3 Synopsys Workflow with scan structures.
5.4 AMEP interface after inserting scan chains.
5.5 Synopsys JTAG Workflow.
5.6 AMEP interface for JTAG insertion by BSD Compiler.
5.7 Synopsys TetraMAX ATPG Workflow.
6.1 Design flow for Encounter.
6.2 Die block size, floorplan and core size.
7.1 AMEP chip layout.
List of Tables
2.1 AMEP Instruction Set.
2.2 AMEP Instruction Set Architecture.
4.1 Faraday’s FSA0A C Standard Cell Library General Characteristics.
4.2 I/O cell dimensions.
5.1 Cost Function default priority
6.1 Cadence tools versions
7.1 Results from synthesis tool.
7.2 Layout results.
7.3 Power analysis results.
List of Acronyms
AGU : Address Generation Unit
AMEP : Adaptive Motion Estimation Processor
ASIC : Application Specific Integrated Circuit
ASIP : Application Specific Instruction Set Processor
ATE : Automatic Test Equipment
ATPG : Automatic Test Pattern Generation
BIST : Built-In Self Test
BSDL : Boundary Scan Description Language
BSR : Boundary Scan Register
CAD : Computer Aided Design
CLCC : Ceramic Leadless Chip Carrier
CMOS : Complementary Metal Oxide Semiconductor
CTS : Clock Tree Synthesis
DFT : Design For Test
DRC : Design Rule Check
ECO : Engineering Change Order
EDA : Electronic Design Automation
ESD : Electrostatic Discharge
FAN : Fanout Oriented
FPGA : Field Programmable Gate Array
FSBM : Full Search Block Matching
GDS : Graphic Data System
GTECH : Generic Technology
GUI : Graphical User Interface
HDL : Hardware Description Language
IC : Integrated Circuit
IDDQ : Quiescent Supply Current
IEEE : Institute of Electrical and Electronics Engineers
ISA : Instruction Set Architecture
I/O : Input/Output
JTAG : Joint Test Action Group
LEF : Library Exchange Format
LSSD : Level Sensitive Scan Design
LVS : Layout versus Schematic
ME : Motion Estimation
MPEG : Moving Picture Experts Group
MVFAST : Motion Vector Field Adaptive Search Technique
PIOS : Programmable I/O on Silicon
PODEM : Path Oriented Decision Making
P&R : Place and Route
PCB : Printed Circuit Board
RAM : Random Access Memory
RAPS : Random Path Sensitization
RTL : Register Transfer Level
RISC : Reduced Instruction Set Computer
SRAM : Static RAM
SAD : Sum of Absolute Differences
SADU : SAD Unit
SDC : Synopsys Design Constraints
SDF : Standard Delay Format
SSF : Single Stuck-at Fault
STIL : Standard Test Interface Language
TAP : Test Access Port
TCK : Test Clock input
TDI : Test Data Input
TDO : Test Data Output
TMS : Test Mode Select
TRST : Test Reset
VHDL : VHSIC Hardware Description Language
VHSIC : Very High Speed Integrated Circuit
VLSI : Very Large Scale Integration
1 Introduction
The demand for consumer products that include real-time video communication and video recording capabilities has been increasing over recent years, a trend made possible mainly by the use of digital video. However, uncompressed digital video would not allow these capabilities to exist in consumer products, due to its high bandwidth and storage requirements. Video coding and compression techniques are therefore essential to reduce the bandwidth and storage space required for video communication and video storage. As a consequence, video coding systems have been assuming an increasingly important role in personal communications, wireless multimedia and remote video surveillance. Meanwhile, the MPEG-4 and H.264 video coding standards were established to meet these requirements in terms of image quality and bandwidth. However, since such a wide range of target applications imposes quite different constraints, such as power consumption or computational resources, these standards have been implemented in pure software, in pure hardware, or in a mixture of both. As an example, low power consumption is a mandatory requirement in battery-supplied portable devices, such as 3G mobile phones, PDAs and remote assistance devices.
The high compression rates involved in these new technologies require prediction techniques that reduce temporal redundancy. In particular, the motion-compensated prediction mechanism constructs a prediction of the current frame using blocks from previous frames: a block of the current picture is predicted by translating a block of the previous image by a given motion vector. Motion estimation is the process by which this motion vector is determined. It is the most computationally expensive part of most current compression formats, and highly optimized dedicated hardware is often necessary to determine the motion vectors in battery-supplied platforms [1]. Such dedicated hardware usually plays the role of a co-processor that is tightly interconnected with the main video coding system. Furthermore, such a co-processor should not only allow efficient control of its power consumption but also be flexible enough to implement most present and upcoming block-matching algorithms. An efficient architecture for such a co-processor, the Adaptive Motion Estimation Processor (AMEP), was proposed in [1], specially optimized for the implementation of fast block-matching and even data-adaptive motion estimation algorithms.
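The translation of a previous-frame block by a motion vector can be illustrated with a short sketch. It assumes frames stored as lists of rows of integer luminance samples and a displaced block lying entirely inside the previous frame; the helper names `predict_block` and `residual` are hypothetical and are not part of the AMEP design or of any standard's reference software.

```python
from __future__ import annotations


def predict_block(prev: list[list[int]], bx: int, by: int,
                  mv: tuple[int, int], n: int = 16) -> list[list[int]]:
    """Motion-compensated prediction of the n x n block whose top-left corner
    is (bx, by): copy the block of the previous frame displaced by mv = (dx, dy).

    Frames are lists of rows (frame[row][column]); the displaced block is
    assumed to lie entirely inside `prev` (no border handling).
    """
    dx, dy = mv
    return [row[bx + dx:bx + dx + n] for row in prev[by + dy:by + dy + n]]


def residual(cur: list[list[int]], pred: list[list[int]],
             bx: int, by: int, n: int = 16) -> list[list[int]]:
    """Prediction error (current block minus prediction) left to be coded."""
    return [[cur[by + y][bx + x] - pred[y][x] for x in range(n)]
            for y in range(n)]
```

When the motion vector matches the actual displacement of the content, the residual is zero (or small), which is why coding the vector together with the residual needs far fewer bits than coding the block itself.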
A first prototype of the proposed motion estimation processor was implemented in a Field Programmable Gate Array (FPGA) [1], to prove the processor's functionality and validate its architecture. An ASIC based on this architecture is now to be implemented, to demonstrate not only that it performs motion estimation efficiently but also that it is suitable for battery-supplied platforms.
The main objective of this work is the implementation of the AMEP in an ASIC, using a standard cell library based on the UMC CMOS 0.18µm 1P6M technology process. This dissertation describes all the steps taken to reach the final GDSII description of the circuit layout, which will afterwards be sent to an ASIC foundry to be manufactured.
In this work, special attention was given during the design phase to the testability of the circuit after manufacturing. For an FPGA implementation such test procedures are unnecessary, due to the nature of the device.
However, implementing the desired circuit in an ASIC requires the manufactured circuit to be submitted to a set of test procedures that validate the correct manufacturing of the whole chip. Hardware structures dedicated to test usually have to be inserted to improve the testability of the circuit. This may require changing the architecture (to explicitly include these structures), using the synthesis tool to insert them automatically, or both. Furthermore, the test procedures and the inserted test structures can also be used to validate the correct design of the circuit and possibly help to locate design flaws, which is particularly helpful while the design is still in the prototyping stage.
The circuit was described in the VHSIC Hardware Description Language (VHDL), which allows this complex system to be described in a technology-independent way, using a programming-like language. The synthesis process then translates this description into a technology-dependent netlist, which implements the circuit's function using the standard cells available in the library. This netlist represents the connections between the standard cells at the logic level.
Once the circuit's function is represented as a gate-level, technology-dependent netlist, the physical implementation of the circuit can begin. The first phase is placement, which consists of determining the actual location of each standard cell in a two-dimensional floorplan. The second phase is routing, which consists of interconnecting the inputs and outputs of all cells using the available metal layers, according to the generated gate-level netlist.
A completely routed design must then pass the Layout versus Schematic (LVS) and Design Rule Check (DRC) verifications, which ensure, respectively, that the connections made in the routing phase correspond to those in the netlist and that the design complies with the design rules set by the foundry where the Integrated Circuit (IC) will be manufactured. The routed design (which includes the layout of the interconnections) is then merged with the layout of the used cells to obtain the information required for manufacturing. This information is then used to produce the masks required for IC manufacturing.
1.1 Dissertation Outline
This dissertation is organized into eight chapters and three appendices. Besides this introduction, Chapter 2 describes the processor that will be manufactured and its architecture. It also summarizes some algorithms that may be programmed to implement motion estimation.
Chapter 3 describes generic test structures, design for test techniques and the test pattern generation process. The particular implementation of these structures and the adopted techniques are discussed in more detail in the chapter that covers the frontend phase (Chapter 5). Memory testing is also addressed in Chapter 3, together with the architecture of the specifically designed memory BIST controller.
Chapter 4 presents a generic workflow for the ASIC implementation. It starts by explaining the main motivations for choosing a given technology and then characterizes the technology and standard cell library that were actually selected, including the available memory devices. Some constraints imposed by the adopted technology and standard cell library are also discussed, since some of them restrict the options available during the design implementation.
Chapter 5 describes the frontend stage of the AMEP circuit design. It presents the synthesis process and describes the capabilities of the tools and the workflow used. The technological constraints and design options that were taken into account regarding the test structures are also explained in this chapter.
The backend stage is described in Chapter 6, including the capabilities of the tools used and the workflow followed. Some generic options and techniques referred to in Chapter 4 are particularized for the implementation of the AMEP circuit.
Chapter 7 presents the results of this work, including the final layout. It also presents the simulation-based timing and power consumption results of the different implementations, to assess the validity of the included test structures.
Chapter 8 states the main conclusions of this work and points out possible directions for future work.
The VHDL code developed for the memory BIST controller is presented in Appendix A. Appendix B presents all the command files used with the various tools.
2 Processor Architecture
2.1 Introduction
Motion estimation is a fundamental operation in video encoding, used to exploit the temporal correlation in sequences of images. It is, however, the most computationally costly part of video encoding systems. With the increasing demand for video encoding in portable battery-supplied devices, dedicated low-power hardware is often necessary to perform this part of video encoding [2].
Several algorithms can be used to implement the Motion Estimation (ME) procedure. The Full Search Block Matching (FSBM) algorithm provides the optimal solution, but it is also the most computationally expensive. However, faster, sub-optimal and adaptive algorithms also exist. These algorithms, such as the Motion Vector Field Adaptive Search Technique (MVFAST), exploit temporal correlations by considering information about past motion vectors and previously computed error values in order to predict and adapt the actual search space.
To efficiently implement these adaptive ME algorithms, a new Application Specific Instruction Set Processor (ASIP) was proposed in [2]. A minimal, specialized instruction set for ME, composed of only eight different instructions, was defined. To support this instruction set, a simple and efficient micro-architecture was designed and implemented [2].
2.2 Motion Estimation
A video stream can be defined as a time-ordered sequence of two-dimensional images representing moving scenes. Each image is composed of a set of picture elements (pixels) with discrete intensity levels, distributed in a rectangular matrix. In a color video stream, each image is composed of three components: Red, Green and Blue (RGB). Since these components are highly correlated, most video coding systems transform the RGB color space into a less correlated one: a luminance component (Y) and two color-difference, or chrominance, components (Cb and Cr). These components are usually processed as three independent images [3]. Furthermore, the chrominance components are subsampled relative to the luminance component, because the human eye is less sensitive to color. This technique is often applied in digital image compression and allows fewer bits to be used to represent a given image.
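The color-space change and the chrominance subsampling can be sketched as follows. The conversion coefficients are an assumption made for illustration: the full-range ITU-R BT.601-derived values used by JPEG/JFIF were chosen here, while each video standard defines its own matrix, value ranges and chroma sample positions. Averaging each 2 × 2 neighborhood is likewise only one possible down-sampling filter; it halves a chroma plane in both dimensions, as in the 4:2:0 format.

```python
from __future__ import annotations


def rgb_to_ycbcr(r: float, g: float, b: float) -> tuple[float, float, float]:
    """Convert one 8-bit RGB pixel to full-range YCbCr.

    Assumption: JPEG/JFIF (BT.601-derived) coefficients; other standards use
    different coefficients and value ranges.
    """
    y = 0.299 * r + 0.587 * g + 0.114 * b
    cb = 128 - 0.168736 * r - 0.331264 * g + 0.5 * b
    cr = 128 + 0.5 * r - 0.418688 * g - 0.081312 * b
    return y, cb, cr


def subsample_420(plane: list[list[float]]) -> list[list[float]]:
    """Halve a chroma plane horizontally and vertically by averaging each
    2 x 2 neighborhood. Assumes even width and height."""
    h, w = len(plane), len(plane[0])
    return [
        [(plane[i][j] + plane[i][j + 1] + plane[i + 1][j] + plane[i + 1][j + 1]) / 4
         for j in range(0, w, 2)]
        for i in range(0, h, 2)
    ]
```

For example, a 16 × 16 chroma plane becomes an 8 × 8 plane, which is the size of the single Cb (or Cr) block of a 4:2:0 macroblock.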
In most current video compression standards, the image is divided into blocks. A macroblock is defined as the fundamental unit of information for motion compensation and consists of a 16 × 16 matrix of luminance (Y) pixels (4 blocks of 8 × 8 pixels) and two matrices of chrominance (Cb and Cr) pixels. The number of chrominance pixels varies according to the chrominance pixel structure defined in the video sequence header, which usually has one of three formats: 4:2:0, 4:2:2 or 4:4:4. In the 4:2:0 subsampling format, the resolution of the chroma components is half of the luminance resolution in both the horizontal and vertical dimensions (4 Y blocks, 1 Cr block and 1 Cb block). In the 4:2:2 format, the chroma components have the same vertical resolution as the luminance component, but half its horizontal resolution (4 Y blocks, 2 Cr blocks and 2 Cb blocks). In the 4:4:4 format all components have identical resolutions (4 Y blocks, 4 Cr blocks and 4 Cb blocks) [4]. Figure 2.1 shows the composition of the macroblock in these different formats [4].
Figure 2.1: Composition of a macroblock (Y, Cb and Cr sample matrices, width × height). (a) 4:2:0 format: Y 16 × 16, Cb and Cr 8 × 8. (b) 4:2:2 format: Y 16 × 16, Cb and Cr 8 × 16. (c) 4:4:4 format: Y, Cb and Cr 16 × 16.
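The block counts given for each format follow directly from the horizontal and vertical chroma subsampling factors. A small sketch that derives them is shown below; the `SUBSAMPLING` table and the function name are illustrative.

```python
# (horizontal, vertical) chroma subsampling divisors for each format
SUBSAMPLING = {"4:2:0": (2, 2), "4:2:2": (2, 1), "4:4:4": (1, 1)}


def macroblock_layout(fmt: str, mb: int = 16, blk: int = 8) -> dict:
    """Chroma matrix size and number of blk x blk blocks in one mb x mb macroblock."""
    hdiv, vdiv = SUBSAMPLING[fmt]
    chroma_w, chroma_h = mb // hdiv, mb // vdiv
    y_blocks = (mb // blk) ** 2
    c_blocks = (chroma_w // blk) * (chroma_h // blk)
    return {
        "chroma_size": (chroma_w, chroma_h),  # (width, height) of Cb and of Cr
        "Y_blocks": y_blocks,
        "Cb_blocks": c_blocks,
        "Cr_blocks": c_blocks,
        "total_blocks": y_blocks + 2 * c_blocks,
    }
```

For example, `macroblock_layout("4:2:2")` reports 8 × 16 chroma matrices and 8 blocks in total (4 Y, 2 Cb and 2 Cr), matching the description above.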
To achieve the highest possible compression ratio in video streams, the temporal correlation between images is exploited. A common technique used for this purpose is motion compensation, which uses macroblocks from past (and in some cases also “future”) images, together with the calculated motion vectors, to construct a prediction of the current image. Since the information contained in a motion vector is far smaller than the information required to encode a macroblock, the compression ratio is higher when this technique is used.
However, the high compression ratios provided by motion compensation come at a significant computational cost. To find the motion vector of the current macroblock, a search procedure must be carried out within a defined search area of another (past or future) image, to find the best matching candidate macroblock. This involves computing a distortion measure for every candidate macroblock in the search area. To accomplish this, block matching algorithms are usually applied to find the best match for each macroblock in a reference frame, according to a search algorithm and a given distortion measure.
Motion estimation is thus a computationally expensive task: it can take more than 80% of the operations required to implement an MPEG-4 video encoder [2]. Although general purpose processors can be used to accomplish this task, they tend to be very inefficient, especially in battery-supplied devices, where high power consumption cannot be afforded. As a consequence, a specialized processor that efficiently implements ME algorithms with low power consumption is advisable in such environments.
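A rough count helps to put this cost in perspective. The sketch below counts only the absolute differences evaluated by an exhaustive full search; the frame size, search range and frame rate used in the example are illustrative assumptions, not figures taken from [2].

```python
def fsbm_abs_diffs(width: int, height: int, p: int, fps: int, n: int = 16) -> int:
    """Absolute differences per second evaluated by full-search block matching.

    Assumptions: n x n macroblocks tiling the frame, displacements from -p to +p
    pixels in each direction ((2p + 1)^2 candidates), and one absolute
    difference per pixel per candidate; additions and comparisons are ignored.
    """
    candidates = (2 * p + 1) ** 2
    per_macroblock = candidates * n * n
    macroblocks = (width // n) * (height // n)
    return per_macroblock * macroblocks * fps
```

With CIF frames (352 × 288 pixels, i.e. 396 macroblocks), a ±16 pixel search range and 30 frames/s, this gives 1089 × 256 × 396 × 30 = 3,311,953,920, roughly 3.3 × 10^9 absolute differences per second, before the accumulations and comparisons are even counted.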
2.3 Motion Estimation Algorithms
Several Motion Estimation algorithms have been proposed in the literature [4]. They try to find
the best match for each macroblock in a reference frame according to a search algorithm and a
given distortion measure. Most of the proposed algorithms use the Sum of Absolute Differences (SAD) as the distortion measure [1]. Figure 2.2 represents the reference macroblock in the current frame and the corresponding search area of the block matching algorithm in a previous frame. The best candidate macroblock is also represented in the previous frame, together with the respective motion vector.
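For reference, the SAD between two equally sized blocks is simply the sum of the per-pixel absolute differences; a lower value means a better match. The sketch below assumes blocks given as lists of rows of pixel values. The function name `sad` is our own.

```python
def sad(block_a, block_b):
    """Sum of Absolute Differences between two equally sized pixel blocks."""
    return sum(abs(a - b)
               for row_a, row_b in zip(block_a, block_b)
               for a, b in zip(row_a, row_b))
```

For 8-bit pixels and a 16 × 16 macroblock, the largest possible value is 256 × 255 = 65280, which still fits in a 16-bit word.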
Figure 2.2: Current and previous frames used in motion estimation: (a) previous frame, showing the search area and the motion vector; (b) current frame.
The optimum Full Search Block Matching (FSBM) algorithm is an exhaustive search algorithm. It obtains the best match for a given reference block by examining all possible displaced candidates within the search area. However, it requires a large number of computations, which makes it difficult to implement in most real-time portable or battery-supplied encoding systems.
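To make the cost concrete, here is a sketch of FSBM under stated assumptions. The search range is ±p pixels around the co-located position, candidates falling outside the reference frame are skipped, and ties keep the first candidate found. The names `block` and `full_search` and the default p = 7 are illustrative choices, not taken from the AMEP firmware. With p = 7 there are (2p+1)² = 225 candidates, each costing 256 absolute differences, that is 57,600 per macroblock.

```python
def sad(block_a, block_b):
    """Sum of Absolute Differences between two equally sized pixel blocks."""
    return sum(abs(a - b) for ra, rb in zip(block_a, block_b) for a, b in zip(ra, rb))


def block(frame, x, y, n):
    """n x n block of `frame` whose top-left pixel is (x, y)."""
    return [row[x:x + n] for row in frame[y:y + n]]


def full_search(cur, ref, x, y, p=7, n=16):
    """Exhaustive block matching: evaluate every displacement (dx, dy) in
    [-p, p] x [-p, p] that keeps the candidate inside `ref`; return (mv, sad)."""
    h, w = len(ref), len(ref[0])
    target = block(cur, x, y, n)
    best = None
    for dy in range(-p, p + 1):
        for dx in range(-p, p + 1):
            cx, cy = x + dx, y + dy
            if 0 <= cx <= w - n and 0 <= cy <= h - n:
                cost = sad(target, block(ref, cx, cy, n))
                if best is None or cost < best[1]:
                    best = ((dx, dy), cost)
    return best
```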
Meanwhile, faster, sub-optimum algorithms have also been proposed. These algorithms reduce the search space by guiding the search pattern according to general characteristics of the motion and to the computed distortion values. They can be grouped into two main classes:
- regular search pattern algorithms, which treat each macroblock independently, assuming that the distortion decreases monotonically as the search moves towards the best match;
- algorithms that also exploit interblock correlations, both in space and in time, to adapt the search patterns.
The three step search, the four step search and the diamond search algorithms are examples of fast regular search pattern algorithms. These algorithms consider a predetermined sequence of possible locations during the search procedure. Adaptive algorithms, such as MVFAST, can use information from adjacent macroblocks to obtain an initial prediction of the motion vector.
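As an illustration of a regular search pattern algorithm, the sketch below implements the classic three step search for a ±7 range. It evaluates the centre and its eight neighbours at distance 4, moves to the best one, then repeats with steps 2 and 1. That amounts to at most 25 distinct positions, instead of the 225 examined by FSBM. This is not the adaptive algorithm targeted by the AMEP. The helper names and the handling of out-of-frame candidates (infinite cost) are our own assumptions.

```python
def sad(block_a, block_b):
    """Sum of Absolute Differences between two equally sized pixel blocks."""
    return sum(abs(a - b) for ra, rb in zip(block_a, block_b) for a, b in zip(ra, rb))


def block(frame, x, y, n):
    """n x n block of `frame` whose top-left pixel is (x, y)."""
    return [row[x:x + n] for row in frame[y:y + n]]


def three_step_search(cur, ref, x, y, n=16, step=4):
    """Three step search: test the centre and its 8 neighbours at distance
    `step`, recentre on the best one, halve the step, stop after step 1."""
    h, w = len(ref), len(ref[0])
    target = block(cur, x, y, n)

    def cost(mv):
        cx, cy = x + mv[0], y + mv[1]
        if 0 <= cx <= w - n and 0 <= cy <= h - n:
            return sad(target, block(ref, cx, cy, n))
        return float("inf")  # candidate outside the reference frame

    best = (0, 0)
    best_cost = cost(best)
    while step >= 1:
        centre_x, centre_y = best  # the pattern is centred on the previous winner
        for dy in (-step, 0, step):
            for dx in (-step, 0, step):
                mv = (centre_x + dx, centre_y + dy)
                c = cost(mv)
                if c < best_cost:
                    best, best_cost = mv, c
        step //= 2
    return best, best_cost
```

Because it follows the locally best direction, the search may settle on a sub-optimal vector when the SAD surface is not monotonic. This is the price paid for the reduced number of computations.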
The sub-optimum algorithms require far fewer computations than the FSBM [1], which makes them particularly well suited for low-power applications. Among these, the data-adaptive algorithms usually provide the best performance, both in the number of computations involved and in video quality and bit rate [1]. Consequently, implementing these data-adaptive ME algorithms on dedicated hardware structures is often the best option for battery-supplied devices.
2.4 Instruction Set
The ASIP designed to implement data-adaptive ME algorithms is characterized by a specialized datapath and a minimal, optimized instruction set. This instruction set meets the requirements of most ME algorithms, including adaptive ones [2]. The AMEP follows a Reduced Instruction Set Computer (RISC) philosophy and has the instruction set shown in table 2.1.
Table 2.1: AMEP Instruction Set.

Instruction category     Instruction   Description
Control                  J             Jump
Register data transfer   MOVR          Move register to register
Register data transfer   MOVC          Move immediate to register
Arithmetic               DIV2          Integer division by 2
Arithmetic               ADD           Add two register values
Arithmetic               SUB           Subtract two register values
Graphics                 SAD16         Compute sum of absolute differences
Memory data transfer     LD            Load local memory with pixel data
This specialized instruction set enables the processor to implement several ME algorithms. It is composed of only eight instructions. Their encoding allows the motion vectors to be determined with low power consumption and a small implementation area [2].
The Instruction Set Architecture (ISA) is based on a register-register architecture, due to its simplicity and efficiency. Its reduced set of operations focuses on the instructions most widely executed in ME algorithms. The AMEP register file consists of 24 general purpose registers and eight special purpose registers, each capable of storing one 16-bit word [2].
The operations supported by the AMEP ISA are divided into five categories, as shown in table 2.1. These instructions are encoded in a fixed 16-bit format, according to table 2.2 [1, 2]. Each instruction has an opcode and up to three operands, depending on the instruction's category.
Table 2.2: AMEP Instruction Set Architecture (fields listed from bit 15, on the left, down to bit 0).

Instruction   Fields
LD            000 | t | -
J             001 | cc | - | Address
MOVR          010 | Rd | - | Rs
MOVC          011 | t | Rd | Constant
SAD16         100 | - | Rd | Rs1 | Rs2
DIV2          101 | - | Rd | Rs1 | -
ADD           110 | - | Rd | Rs1 | Rs2
SUB           111 | - | Rd | Rs1 | Rs2
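Table 2.2 lists a 3-bit opcode as the leftmost field of every instruction. A decoder can therefore identify an instruction from bits 15 to 13 alone. The sketch below assumes exactly that opcode position. It deliberately does not model the widths of the remaining fields, which cannot be reliably recovered from the table here. `OPCODES` and `mnemonic` are illustrative names.

```python
# Opcode values from Table 2.2, assumed to occupy bits 15..13 of each word.
OPCODES = {
    0b000: "LD", 0b001: "J", 0b010: "MOVR", 0b011: "MOVC",
    0b100: "SAD16", 0b101: "DIV2", 0b110: "ADD", 0b111: "SUB",
}


def mnemonic(word):
    """Return the mnemonic of a 16-bit AMEP instruction word."""
    if not 0 <= word <= 0xFFFF:
        raise ValueError("AMEP instructions are 16 bits wide")
    return OPCODES[word >> 13]  # keep only the three most significant bits
```

Since all eight 3-bit codes are used, every 16-bit word decodes to some instruction. Any invalid encodings must therefore lie in the operand fields.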
The memory data transfer operation, LD, loads macroblock and search area pixels into the corresponding local memories. The loading itself is performed independently of the instruction, by a dedicated address generation unit. The 1-bit control field (t) of the LD operation defines whether macroblock or search area data should be loaded.

The jump control operation, J, changes a program's control flow by updating the program counter with an immediate value that corresponds to an effective address.

The register data transfer operations, MOVR and MOVC, load data into a general purpose or a special purpose register of the register file. For a MOVR operation, the data to be moved is the content of another register. For a MOVC operation, the data is an 8-bit immediate value, loaded into the register's high- or low-order byte depending on the control field (t).

The graphics operation SAD16 computes the similarity measure between a reference macroblock and a candidate macroblock. It does so by computing the SAD value for two sets of sixteen pixels (the minimum number of pixels for a macroblock in the MPEG-4 video coding standard) and accumulating the result in a special purpose register.

The arithmetic operations ADD, SUB and DIV2 perform, respectively, addition, subtraction and integer division by two [2].
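The register-level behaviour of three of these instructions can be sketched as follows. Several assumptions are ours and are not stated in the text: MOVC preserves the byte it does not write, and `high=True` stands for the t value that selects the high-order byte. DIV2 is modelled as a 16-bit arithmetic shift right, suggested by the ASR block in Figure 2.3. Register 31 is used as a hypothetical SAD accumulator. Note that an arithmetic shift rounds negative odd values towards minus infinity (−7 becomes −4), which differs from truncating division.

```python
MASK16 = 0xFFFF


def movc(regs, rd, const8, high):
    """MOVC: write an 8-bit immediate into the high (high=True) or low byte of
    register rd, keeping the other byte (assumed behaviour)."""
    const8 &= 0xFF
    if high:
        regs[rd] = (regs[rd] & 0x00FF) | (const8 << 8)
    else:
        regs[rd] = (regs[rd] & 0xFF00) | const8


def div2(regs, rd, rs):
    """DIV2 modelled as an arithmetic shift right of a 16-bit two's complement value."""
    value = regs[rs]
    signed = value - 0x10000 if value & 0x8000 else value
    regs[rd] = (signed >> 1) & MASK16


def sad16(regs, acc, ref_pixels, cand_pixels):
    """SAD16: add the SAD of two sets of 16 pixels to the accumulator register."""
    if not len(ref_pixels) == len(cand_pixels) == 16:
        raise ValueError("SAD16 operates on two sets of 16 pixels")
    partial = sum(abs(a - b) for a, b in zip(ref_pixels, cand_pixels))
    regs[acc] = (regs[acc] + partial) & MASK16
```

Building a 16-bit constant takes two MOVC instructions, one per byte. A full 16 × 16 macroblock comparison takes sixteen SAD16 instructions accumulating into the same register.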
2.5 Microarchitecture
The designed microarchitecture for the AMEP follows strict power and area driven policies to
support its implementation in portable and mobile platforms. It presents a modular structure and
is composed by simple and efficient units to optimize the data processing. Figure 2.3 shows the
processor architecture.
Figure 2.3: AMEP Architecture (firmware RAM, PC, IR and instruction decoding; register file R0 to R31; SAD Unit (SADU); Address Generation Unit (AGU); ASR; adders (Σ); multiplexers; macroblock memory (MBMEM) and search area memory (SAMEM); Negative and Zero flags).
The datapath includes the hardware needed to implement the arithmetic operations included
in the instruction set. For the most complex and specific instructions, such as the SAD16 and
LD instructions, the datapath also includes specialized units to improve the efficiency of such
operations: the SAD Unit (SADU) and the Address Generation Unit (AGU), respectively [2].
The SADU calculates the SAD value between two macroblocks and can be implemented using several possible architectures. The choice of a specific architecture influences the circuit area, the consumed power and the number of cycles needed to compute the SAD value. This number ranges from one clock cycle, using a parallel processing architecture, up to sixteen clock cycles, using a serial processing architecture [2]. Since this processor is intended for low-power (mobile) platforms, the prototype uses the serial SADU due to its reduced power consumption.
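The trade-off can be illustrated with a cycle-level sketch of the two extremes for one SAD16 operation. We assume the serial unit processes one pixel pair per clock cycle and the parallel unit computes all sixteen differences and their sum in a single cycle. Both return the same SAD; only the cycle count (and, in hardware, the area and power) differ. The function names are hypothetical.

```python
def sad_serial(ref_pixels, cand_pixels):
    """Serial SADU model: one absolute difference accumulated per clock cycle.
    Returns (sad, cycles)."""
    acc, cycles = 0, 0
    for a, b in zip(ref_pixels, cand_pixels):
        acc += abs(a - b)
        cycles += 1
    return acc, cycles


def sad_parallel(ref_pixels, cand_pixels):
    """Parallel SADU model: 16 subtractors and an adder tree in a single cycle."""
    return sum(abs(a - b) for a, b in zip(ref_pixels, cand_pixels)), 1
```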
The AGU generates the necessary addresses to fetch all the pixels for both a macroblock and
an entire search area. This unit is capable of working in parallel with the remaining functional
units, to maximize the efficiency of data processing [2].
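The AGU itself is hardware, and its exact address sequence is not described here. As a rough model, the sketch below generates raster-order byte addresses for a rectangular region of a frame stored row by row in the external memory, checking that each address fits the 20-bit address space of the external interface. The frame width (176, QCIF), the base address and the ±8 search range in the usage below are hypothetical values, chosen only so that a 32 × 32 search area (1024 bytes) fits the 2048-byte search area memory.

```python
ADDR_BITS = 20  # width of the external addr port (1 MB address space)


def block_addresses(base, frame_width, x, y, width, height):
    """Yield the raster-order byte addresses of a width x height region whose
    top-left pixel is (x, y), for a frame stored row by row from `base`."""
    for row in range(y, y + height):
        for col in range(x, x + width):
            addr = base + row * frame_width + col
            if addr >= 1 << ADDR_BITS:
                raise ValueError("address exceeds the 20-bit address space")
            yield addr
```

For example, `block_addresses(0, 176, 16, 16, 16, 16)` yields the 256 addresses of a macroblock. `block_addresses(0, 176, 8, 8, 32, 32)` yields the 1024 addresses of the surrounding ±8 search area.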
The AMEP architecture also includes three memory blocks: program memory (firmware), search area memory (SA MEM) and macroblock memory (MB MEM). The macroblock and search area memories are dual port Static RAM (SRAM) memories with 512 8-bit words and 2048 8-bit words, respectively. The program memory is a single port SRAM memory with 1024 16-bit words. The macroblock and search area memories locally store the pixels of the current macroblock and of the entire search area, allowing the efficient execution of the SAD16 operation. Their dual port configuration enables new data to be written while the SADU is executing. This allows the SADU to process data continuously, improving efficiency. The use of these memories requires special care when designing the test procedures and planning the layout.
2.6 Interface
The external interface of the implemented processor is shown in figure 2.4 [1].
Figure 2.4: AMEP external interface (inputs clk, en, rst and gnt; outputs req, done, #oe_we and the 20-bit addr port; 8-bit bidirectional data port).
This interface is used in normal operation mode, in which the AMEP works as a coprocessor interconnected with the main processor of the video encoding platform (in this case, a Power-PC), as illustrated in figure 2.5. For testing purposes, additional inputs and outputs are used.
According to [1], the interface with the external frame memory was designed to allow 8-bit data transfers within a 1 MB (2^20-byte) address space.

Figure 2.5: Video coding platform (the AMEP core, the Power-PC, and the memory controller with its RAM, connected through the data, addr, req, gnt, done, en, rst and #oe_we signals).

The interface with the external memory bank is implemented through three I/O ports:
- a 20-bit output port that specifies the memory address for the data transfers (addr);
- an 8-bit bidirectional port for transferring data (data);
- a 1-bit output port that sets whether a load or a store operation is performed (#oe_we).

Since the external frame memory is shared with the video encoder, the interface also has two extra 1-bit control ports. These implement the required handshake protocol with the bus master: the req port requests control of the bus, and the gnt port allows the bus master to grant such control.

The coordinates of the best matching motion vectors are also output through the data port. This operation takes two clock cycles: one to output the motion vector's low-order 8 bits (horizontal coordinate) and another to output its high-order 8 bits (vertical coordinate). In addition, every time a new value is output through the data port, the done output port is toggled. This signals to the video encoder that new data is waiting to be read at the data port.
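The output protocol for a motion vector can be sketched as the sequence of (data, done) values seen on two consecutive cycles, together with the decoding performed on the encoder side. The text does not specify the number format of the coordinates. We assume each is an 8-bit two's complement value, and the function names are hypothetical.

```python
def mv_output_sequence(dx, dy, done=0):
    """(data, done) pairs driven on two consecutive cycles: the low-order byte
    (horizontal) first, then the high-order byte (vertical). `done` toggles
    each time a new byte is placed on the data port."""
    word = ((dy & 0xFF) << 8) | (dx & 0xFF)  # assumed 8-bit two's complement fields
    sequence = []
    for byte in (word & 0xFF, word >> 8):
        done ^= 1
        sequence.append((byte, done))
    return sequence


def to_signed8(byte):
    """Interpret an 8-bit value as two's complement."""
    return byte - 256 if byte & 0x80 else byte


def decode_mv(low, high):
    """Encoder side: rebuild (horizontal, vertical) from the two received bytes."""
    return to_signed8(low), to_signed8(high)
```

Because done toggles rather than pulses, the encoder can detect each new byte by comparing the current level of done with the last level it observed.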
The processor firmware, corresponding to the compiled assembly code of the considered ME algorithm, is also downloaded into the program memory through the data port. To do so, the processor must be in programming mode, which it enters whenever a high level is simultaneously applied to the rst and en input ports. In this operating mode, after acquiring bus ownership, the master processor supplies memory addresses through the addr port and loads the corresponding instructions into the internal program RAM. The processor exits programming mode as soon as the last position of the 1K × 16-bit program memory is filled. Since the data port is only 8 bits wide, each 16-bit instruction takes two clock cycles to be loaded into the program memory, which is organized in little-endian format.
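The download stream can be sketched with Python's `struct` module: each 16-bit word is split into two bytes, low-order byte first (little-endian). Since programming mode only ends when the last program memory position is written, the whole 1K-word memory is sent. That amounts to 2 × 1024 = 2048 byte transfers. Padding unused positions with zero words is our assumption and is not stated in the text.

```python
import struct

PROGRAM_WORDS = 1024  # 1K x 16-bit program memory


def firmware_bytes(words):
    """Byte stream sent through the 8-bit data port in programming mode:
    one little-endian 16-bit word per two transfers, padded to fill the
    whole program memory (zero padding is an assumption)."""
    if len(words) > PROGRAM_WORDS:
        raise ValueError("firmware does not fit in the program memory")
    padded = list(words) + [0] * (PROGRAM_WORDS - len(words))
    return struct.pack("<%dH" % PROGRAM_WORDS, *padded)
```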
3 Design for Test

Contents
3.1 Introduction
3.2 Circuit Testing
3.3 Automatic Test Pattern Generation
3.4 Observability and Controllability
3.5 Scan Structures
3.6 JTAG Boundary Scan
3.7 Memory Test
3.1 Introduction
Testing is one of the most important steps when manufacturing a chip. A defective component
integrated into a system will most probably result in an unusable system. The cost of replacing
or repairing such a system is often much higher than the cost of testing the component before its integration. Testing can be done at chip, board or system level. The decision to test every chip, board or system, or just a sample of each, is influenced by several factors, namely the test cost per unit, the yield and the repair/substitution cost. Testing procedures are also important during prototyping. They can be used to verify the correct implementation of the design and to ensure that the circuit performs as intended by the designer, and they can help determine the cause of a potential flaw.
3.2 Circuit Testing
A fault is a physical defect that occurs in a circuit and that may change the circuit's logic function. An error is a wrong value present at the output of a defective circuit. An error is a consequence of a fault, and a fault is only observable through the error it causes. Faults might exist in a circuit and never cause an error (e.g. when redundancy exists). A complete test would verify the circuit outputs for every input combination in every state (in sequential circuits) at full clock speed. Such a test is impractical for the size of today's Very Large Scale Integration (VLSI) designs, but it would guarantee a functional circuit (at test time).
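A quick calculation shows why exhaustive testing is impractical. A circuit with n primary inputs and m state bits has 2^(n+m) input/state combinations. The sketch below counts them and estimates the test time under the optimistic (and, for real sequential circuits, unrealistic) assumption that one combination can be applied per clock cycle. The circuit sizes used in the usage example are hypothetical.

```python
def exhaustive_test_time(inputs, state_bits, clock_hz):
    """Number of input/state combinations and the seconds needed to apply each
    once, assuming one combination per clock cycle."""
    patterns = 2 ** (inputs + state_bits)
    return patterns, patterns / clock_hz
```

Even a small block with 32 inputs and 32 flip-flops, tested at 100 MHz, would require 2^64 patterns and roughly 5,800 years.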
In its most basic use, testing provides a “good/defective” decision after manufacturing, for a chip, a board or an entire system. The information acquired by testing a system can also be used for debugging and to diagnose a malfunction, whether it is caused by a manufacturing defect or by a design error.
When integrated circuits were less complex, testing was relatively simple. Almost all of the internal circuit nodes could easily be controlled by changing the primary inputs of the circuit, and the logic value of a given node could easily be observed by propagating it to the primary outputs. As the complexity of integrated circuits increased, controlling and observing the logic values became more and more challenging. With the inherent increase in the number of possible faults, testing became extremely cumbersome and time consuming.
To reduce the complexity of the testing procedure, new techniques were developed to increase controllability and observability and to provide in-circuit testing. These techniques have the advantage of increasing fault coverage and reducing test complexity and test time. However, they often increase the development time, chip area, I/O pin count, power dissipation and even the number of possible faults in the circuit. The additional hardware also degrades the circuit's performance, even if only marginally, because it increases the number of logic levels in the design.
There are several techniques to increase fault coverage, but the chosen methods must take into account the additional costs incurred when hardware is added for testing purposes. In sequential circuits, testing can be done either at the nominal clock frequency or at a reduced clock frequency. At nominal clock frequency, additional faults can be detected (e.g. timing-related faults caused by the charging of capacitances). However, it is usually hard to perform an efficient full-speed test using only the primary inputs.

To overcome such difficulties, a Built-In Self Test (BIST) can be performed. This requires a BIST controller that generates predetermined test vectors and analyzes the circuit responses to determine their correctness. In this way, test vectors can be delivered at nominal clock frequency, and the need for external test equipment is eliminated. However, the additional cost in circuit area and complexity often makes this type of test procedure unfeasible. Therefore, tradeoffs between costs and benefits must be made, and the techniques most adequate to the circuit under development should be applied.
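Two classic BIST building blocks are a Linear Feedback Shift Register (LFSR), which generates pseudo-random test patterns, and a signature register, which compacts the circuit responses into a short signature. The sketch below shows generic versions of both. They are not a description of any circuit in this work. The LFSR uses the maximal-length taps 16, 14, 13 and 11 (period 65535). The signature register divides the response bit stream by x^16 + x^12 + x^5 + 1, so any single-bit error changes the signature. Multi-bit errors can occasionally alias to the correct signature.

```python
def lfsr16(seed=0xACE1):
    """Fibonacci LFSR with taps 16, 14, 13, 11 (maximal length, period 65535).
    Yields successive 16-bit pseudo-random test patterns."""
    state = seed
    while True:
        yield state
        bit = (state ^ (state >> 2) ^ (state >> 3) ^ (state >> 5)) & 1
        state = (state >> 1) | (bit << 15)


def signature(response_bits, poly=0x1021):
    """Serial signature register: remainder of the response bit stream divided
    by x^16 + x^12 + x^5 + 1."""
    sig = 0
    for b in response_bits:
        feedback = ((sig >> 15) & 1) ^ b
        sig = (sig << 1) & 0xFFFF
        if feedback:
            sig ^= poly
    return sig
```

In a BIST scheme, the controller would apply the LFSR patterns at full speed and compare the final signature with the one obtained by simulating the fault-free circuit.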
Ad-hoc techniques to improve testability include the insertion of test points, sequential circuit initialization and the avoidance of redundant logic. Test methods have since evolved. Instead of relying only on ad-hoc strategies, several techniques now allow a structured Design For Test (DFT). Structured DFT techniques include boundary scan and internal scan chains [5]. DFT techniques are supported by most major synthesis tools. Moreover, combining DFT and ad-hoc techniques is sometimes useful to overcome certain design difficulties.
A multitude of physical defects may occur during chip manufacturing. Ideally, the chip should be tested for all of these possible defects. Some of these defects are equivalent in nature and can therefore be modeled in the same way with regard to the effects they produce.
Logical faults represent the effect of physical faults on the behavior of the modeled system. Modeling physical faults as logical faults reduces the analysis complexity for two reasons: many different physical faults can be modeled by the same logical fault, and the analysis is performed at a logical rather than a physical level. Additionally, some logical fault models are technology independent [5].
The most widely used fault model is the Single Stuck-at Fault (SSF) model. This model assumes that only one fault exists in the circuit under test and that the faulty node is stuck at the logic value 1 or 0. Hence, a circuit with n nodes has 2n possible faults (a stuck-at-0 and a stuck-at-1 for each node). Some of these faults are said to be equivalent: faults f1 and f2 are equivalent if every test vector that detects one of them also detects the other.
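A toy example makes these definitions concrete. A 2-input AND gate has three nodes (a, b, z) and hence 2 × 3 = 6 single stuck-at faults. Simulating the good and faulty gates for every input vector gives the set of vectors that detects each fault. Faults with identical sets are equivalent. The gate and the node names are hypothetical.

```python
from itertools import product

NODES = ("a", "b", "z")


def and_gate(a, b, fault=None):
    """2-input AND gate; `fault` = (node, stuck_value) injects a single stuck-at fault."""
    if fault and fault[0] == "a":
        a = fault[1]
    if fault and fault[0] == "b":
        b = fault[1]
    z = a & b
    if fault and fault[0] == "z":
        z = fault[1]
    return z


def detecting_vectors(fault):
    """Input vectors for which the good and the faulty gate produce different outputs."""
    return frozenset(v for v in product((0, 1), repeat=2)
                     if and_gate(*v) != and_gate(*v, fault=fault))


faults = [(node, value) for node in NODES for value in (0, 1)]  # 2n = 6 faults
tests = {f: detecting_vectors(f) for f in faults}
```

Here a stuck-at-0, b stuck-at-0 and z stuck-at-0 are all detected only by the vector (1, 1), so they are equivalent and a test generator only needs to target one of them.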
Besides this model, the transition delay fault model is used to detect slow-to-rise or slow-to-fall faults at single nodes. The path delay fault model is used to detect timing faults in the circuit's critical paths at the nominal working frequency. It is thus used to detect manufacturing defects or process variations that have a negative impact on the circuit's timing.
Some physical defects induce logical faults only under certain conditions, which might not occur at test time. To detect some of these defects (e.g. a drain-source short circuit in the p-MOSFET of a CMOS gate), a quiescent supply current (IDDQ) test might also be used. This type of test measures variations in the supply current and is able to detect some defects at the physical level.
It should be noted, however, that the SSF model does not take into account many of the possible faults that may occur in today's submicron VLSI designs. Nevertheless, it is a technology independent fault model that represents many different physical faults, and experience shows that tests that detect SSFs also detect many non-classical faults [5]. Moreover, Automatic Test Pattern Generation (ATPG) tools widely support this fault model. The test vectors generated using the SSF model can usually be applied to a circuit without expensive Automatic Test Equipment (ATE).
3.3 Automatic Test Pattern Generation
ATPG is a method that automatically generates input vectors that detect a given fault, by making the circuit output differ when that fault is present. Test generation is a complex problem influenced by various factors. Among these, the most important are the cost of test generation, the quality of the generated test and the cost of applying the test. A low-cost method for generating test patterns is a random pattern generator. However, both the cost of determining the test quality (by fault simulation) and the cost of test application (due to the large amount of test data) may be too high. Deterministic test generation, on the other hand, produces test vectors by processing a model of the circuit. Its generation cost is higher than that of random generation, but the test quality is usually higher. The cost of test application may also be significantly lower, due to the smaller amount of test data.
The quality of the generated test vectors is measured by the fault coverage, whose definition differs between authors. In [5], the fault coverage for detectable faults is the number of detected faults relative to the number of detectable faults according to the used fault model. The number of detectable faults is the total number of faults in the design minus the number of undetectable faults. In [6], the fault coverage is defined as the ratio between the detected faults and all faults. The test coverage is defined as the ratio between the detected faults and the number of detectable faults according to the used fault model. The nomenclature adopted in this work follows the definitions given in [6]. Equation 3.1 is used to calculate the test coverage [6] and equation 3.2 is used to calculate the fault coverage.
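$$\text{test coverage} = \frac{\#\,\text{detected faults}}{\#\,\text{total faults} - \#\,\text{undetectable faults}} \tag{3.1}$$

$$\text{fault coverage} = \frac{\#\,\text{detected faults}}{\#\,\text{total faults}} \tag{3.2}$$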
According to equation 3.1, all proven redundant faults (which are included in the undetectable faults) are excluded from the fault universe. This requires the test generation algorithm to be able to identify redundant faults [5].
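Equations 3.1 and 3.2 translate directly into code. The fault counts in the usage example are made up for illustration: 1000 faults, of which 20 are undetectable and 931 are detected.

```python
def coverage(detected, total, undetectable):
    """Return (test coverage, fault coverage), following equations 3.1 and 3.2."""
    test_cov = detected / (total - undetectable)
    fault_cov = detected / total
    return test_cov, fault_cov
```

With these numbers the test coverage is 931/980 = 95%, while the fault coverage is only 93.1%. Test coverage is always at least as high as fault coverage, because undetectable faults are removed from the denominator.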
Test generation can be fault oriented or fault independent. Fault oriented algorithms aim to generate a test for a specific fault [5]. They include the D-algorithm, the 9-V algorithm, the Path Oriented Decision Making (PODEM) algorithm and the Fanout Oriented (FAN) algorithm. These algorithms belong to a class of test generation algorithms referred to as path-sensitization algorithms. They require the determination of an initial set of faults, the selection of a target fault and the maintenance of a set of remaining undetected faults [5]. To detect a certain fault, the faulty node must first be set to the logic value opposite to the one imposed by the fault under analysis (fault activation). The resulting value must then be propagated by sensitizing a path from that node up to a primary output (fault propagation).

Fault independent algorithms aim to compute a set of test vectors that detect a large set of SSFs, without targeting any individual fault. A test vector detects half of the SSFs along each of its critical paths (one of the two stuck-at faults on every critical line). It is therefore desirable to generate tests that produce long critical paths, and the critical-path test generation algorithm follows this approach [5].
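Fault activation and propagation can be illustrated on a small hypothetical circuit, z = (a AND b) OR c, with the target fault a stuck-at-0. Activation requires a = 1. Propagating the fault effect through the AND gate requires its side input at the non-controlling value b = 1, and through the OR gate the side input at the non-controlling value c = 0. The test vector is therefore (1, 1, 0). The exhaustive check in the sketch only confirms this hand derivation; it is not an implementation of the D-algorithm or PODEM.

```python
from itertools import product


def circuit(a, b, c, fault=None):
    """z = (a AND b) OR c, with an optional single stuck-at fault (node, value)."""
    def node(name, value):
        return fault[1] if fault and fault[0] == name else value

    a, b, c = node("a", a), node("b", b), node("c", c)
    n1 = node("n1", a & b)
    return node("z", n1 | c)


# Hand-derived test for a stuck-at-0: activation a = 1, propagation b = 1, c = 0.
TEST_VECTOR = (1, 1, 0)


def detecting_vectors(fault):
    """All input vectors that distinguish the faulty circuit from the good one."""
    return {v for v in product((0, 1), repeat=3)
            if circuit(*v) != circuit(*v, fault=fault)}
```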
The advantage of random test generation is the simplicity of vector generation. The main
disadvantage is that the set of randomly generated vectors that detect a given set of faults is much
larger than the set of deterministically generated test vectors. There are combined test genera-
tion methods, like Random Path Sensitization (RAPS), that attempt to merge the advantages of
deterministic and random test generation methods [5].
All the previously mentioned algorithms are meant for combinational logic. Test vector generation for sequential circuits is significantly more difficult, because testing a certain fault may require applying several test vectors in a specific order. Some test generation methods for sequential circuits use iterative array models, in which each array element represents a time frame. Test generation for a sequential circuit is then done by converting it into a combinational circuit, to which the previous test generation methods can be applied. Simulation-based test generation can also be used for sequential circuits, by generating and simulating trial vectors. Based on the simulation results, these trial vectors are evaluated using a predefined cost function, and the best trial vector is added to the test sequence. Other methods generate test vectors for sequential circuits using Register Transfer Level (RTL) models or random test generators [5].
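The iterative array idea can be sketched on a hypothetical one-flip-flop circuit. The next state is s' = x, the output is z = s AND y, and the state is reset to 0. The target fault is the flip-flop's data input x stuck-at-0. Since the output depends only on the stored state, no single time frame can detect this fault. Unrolling the circuit into successive frames and searching the resulting combinational model finds the shortest detecting sequence: x = 1 in the first frame, then y = 1 in the second. The brute-force search is only for illustration; practical tools use the ATPG algorithms above on the unrolled model.

```python
from itertools import product


def run(sequence, fault=None):
    """Time-frame expanded simulation: one iteration per array element.
    Circuit: next state s' = x, output z = s AND y, state reset to 0.
    `fault` = ("x", v) forces the flip-flop's data input x to v."""
    s, outputs = 0, []
    for x, y in sequence:
        outputs.append(s & y)
        if fault and fault[0] == "x":
            x = fault[1]
        s = x
    return outputs


def find_test(fault, max_frames=3):
    """Shortest input sequence whose good and faulty outputs differ, if any."""
    for k in range(1, max_frames + 1):
        for seq in product(product((0, 1), repeat=2), repeat=k):
            if run(seq) != run(seq, fault):
                return seq
    return None
```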
3.4 Observability and Controllability
Observability may be defined as the ability to observe the changes at the internal nodes
through the primary outputs. On the other hand, controllability is the ability to control the internal
nodes using the primary inputs.
To evaluate the value present at a given circuit node, there must be a path through the logic circuit up to a primary output, such that a change in the node's logic value induces an equivalent change in an output's logic value. In this way, the node's value can be observed at a primary output. Similarly, to force a node to a certain logic value, the primary inputs must be set in such a way that the node's logic value can be imposed.
To test a circuit, specific values (test vectors) must be applied to the primary inputs to control the logic value of some internal nodes and to allow their propagation to the primary outputs. This makes it possible to observe and verify the correctness of the logic values at those nodes. To achieve both tasks, the internal nodes must be simultaneously controllable and observable.
To control and observe a certain node in a sequential circuit, logic values may need to be passed through several memory elements (such as latches or flip-flops). This naturally increases the complexity of generating test vectors for a certain fault. It is also often required that the control signals of those memory elements (set, reset, enable and clock signals) remain controllable during the entire test. In certain situations, some of these control signals are generated by logic inside the circuit and are therefore difficult to control from the primary inputs. To avoid such situations, additional hardware (such as multiplexers) is added to these control lines, so that, in test mode, their values can be controlled from a primary input. It is also advisable not to use gated clocks, as they are harder to control during test mode and many ATPG tools do not support them during test vector generation.
Observability can be enhanced by inserting observation points in the circuit. Observation points are dedicated outputs that are directly connected to an internal node and allow the observation of that node's logic value. If these points are carefully chosen, they can increase the ability to detect faults. Although this technique requires additional I/O pins, it is often the only available method to increase observability and, consequently, fault coverage.
3.5 Scan Structures
As explained, testing requires the delivery of test patterns to the circuit inputs in order to control the logic value of a given node. It also requires the propagation of the resulting logic value at that node to a primary output, so that it can be observed. In sequential circuits, this task is significantly more complex due to the memory elements present in the circuit. One way to simplify the delivery of test patterns is to rearrange these memory elements, at test time, so that they form a shift register (called a scan chain). This shift register greatly increases controllability and observability, as it becomes much easier to set and capture the logic value of a node deep inside the circuit.
In order to implement this scan chain, the ordinary flip-flops used in the design must be re-
placed by scan flip-flops. These scan flip-flops have additional inputs that enable them to either
function in normal mode or in test mode. When in test mode, they are usually connected to form
a shift register. Figure 3.1 shows the additional hardware that is needed to transform a normal
flip-flop into a multiplexed scan D flip-flop.
Figure 3.1: D type Flip-Flop: (a) non-scan flip-flop (inputs D and Clock, output Q); (b) multiplexed scan flip-flop, where a multiplexer controlled by Scan_Enable selects between D (input 0) and Scan_in (input 1).
The scan chain is first used to shift the test vectors into the circuit. After one or more clock cycles, which allow the values to propagate through the combinational and/or sequential logic, it is used to capture the resulting values and shift them out. During these shift operations, data propagates between the flip-flops that form the scan chain (shift register). To capture the values of the combinational logic, the normal inputs of the flip-flops are selected (using a control signal). The flip-flops are then set back to scan mode so that the captured values can be shifted out.
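The shift/capture/shift sequence can be sketched at the bit level for a chain of multiplexed scan flip-flops. In shift mode (scan enable high), every flip-flop takes the value of its predecessor and the first one takes Scan_in. In normal mode, one capture clock loads the outputs of the combinational logic. The 3-flip-flop next-state function `comb` is hypothetical. For simplicity, the sketch unloads the response without overlapping it with the loading of the next pattern, although real testers usually do overlap the two.

```python
def clock(ffs, scan_enable, scan_in, comb):
    """One clock edge of a chain of multiplexed scan flip-flops. Returns the new
    contents and, in shift mode, the bit seen at scan-out before the edge."""
    if scan_enable:
        return [scan_in] + ffs[:-1], ffs[-1]
    return comb(ffs), None


def apply_pattern(pattern, comb):
    """Shift `pattern` in, capture once in normal mode, shift the response out."""
    n = len(pattern)
    ffs = [0] * n
    for bit in reversed(pattern):  # after n shifts the chain holds `pattern`
        ffs, _ = clock(ffs, 1, bit, comb)
    ffs, _ = clock(ffs, 0, 0, comb)  # capture cycle
    response = []
    for _ in range(n):  # the last flip-flop's value leaves the chain first
        ffs, out = clock(ffs, 1, 0, comb)
        response.append(out)
    return response[::-1]


def comb(s):
    """Hypothetical combinational logic feeding a 3-flip-flop chain."""
    return [s[1] ^ s[2], s[0] & s[2], s[0] | s[1]]
```

Every flip-flop becomes directly settable and observable through only two pins (Scan_in and scan-out), at the cost of n shift cycles per pattern.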
Because of the additional logic elements, the addition of scan chains and of the corresponding scan flip-flops increases circuit area and power consumption, and it affects circuit timing. Different scan styles have been proposed [5]. In general, they differ in the penalty incurred in each of these factors and in the complexity of generating the test control signals.
Among the scan styles (with the associated scan flip-flops) most commonly used and supported by ATPG tools, the designer can choose between the Multiplexed Flip-Flop, the Clocked-Scan, the Level Sensitive Scan Design (LSSD) and the Auxiliary-Clock LSSD styles [7]. The most suitable scan style for a given circuit can be chosen based on each style's advantages and disadvantages. However, when standard cell libraries are used, the available types of scan flip-flops must also be taken into account, since each style requires different flip-flops that may not be available. Even if the standard cell library does not include any type of scan flip-flop, a multiplexed flip-flop can still be assembled from discrete cells (normal flip-flops and multiplexers) to implement the multiplexed flip-flop scan style. However, this strategy incurs a larger penalty in timing and area. The standard cell library adopted in this work only contains multiplexed scan flip-flops. As a consequence, the multiplexed flip-flop scan style was chosen.
3.6 JTAG Boundary Scan
Testing component interconnections at board level has become more complex with the advent of multilayer PCBs and non-lead-frame ICs. To overcome this difficulty, the Joint Test Action Group (JTAG) proposed a process to test the interconnections between board components (ICs). This process included a Test Access Port (TAP) controller and special I/O cells in every chip. These special I/O cells (boundary scan cells) are controlled by the TAP controller and can be serially connected, at test time, to implement a Boundary Scan Register (BSR). Figure 3.2 shows a basic boundary scan cell used to build the BSR; other cells are used depending on the function of the pin. This cell can be used on input and output pins, but not on three-state pins.
Figure 3.2: JTAG Basic Boundary Scan Cell (two multiplexer and flip-flop stages clocked by Clock-DR and Update-DR; signals IN, OUT, Shift In, Shift Out, Shift DR and Mode).
Boundary scan cells can be classified as observe-only, control-only and control-and-observe cells. Observe-only cells are typically used with the clock signal, since no control should be exerted on it. A control-only cell can be used for the enable signal of three-state buffers, while control-and-observe cells can be used on all two-state inputs and outputs. A three-state driver usually has a more complex cell. It is composed of two control-and-observe cells (one for the input and another for the output) and might include a control-only cell for the enable signal.
The BSR can be used to shift values in and out of the various I/O pins of the chip and thus to set and capture the signals propagated through the pathways of the Printed Circuit Board (PCB). Figure 3.3 represents a possible interconnection between the I/O cells of several chips to implement a BSR, together with the signals required by the TAP controller.
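How the BSR sets and captures signals across the board can be sketched with a serial shift model. This is a simplified illustration, assuming a hypothetical board in which the output cells of one chip (chip A) are wired pin-for-pin to the input cells of a second chip (chip B) on the same chain, and merging each cell's update stage into its shift stage; the function names and the injected stuck-at-0 net are illustrative only.

```python
from typing import Optional


def shift_chain(chain: list, tdi_bits: list) -> tuple:
    """Shift bits serially through a boundary scan register.

    chain[0] is the cell next to TDI and chain[-1] the cell next to TDO.
    Returns the new register contents and the bits that appeared on TDO.
    """
    chain = list(chain)
    tdo = []
    for bit in tdi_bits:
        tdo.append(chain[-1])        # the last cell drives TDO
        chain = [bit] + chain[:-1]   # every cell takes the value of its predecessor
    return chain, tdo


def interconnect_test(pattern: list, stuck_at_0_net: Optional[int] = None) -> list:
    """Drive `pattern` from chip A's output cells and return what chip B's input cells capture."""
    n = len(pattern)
    chain = [0] * (2 * n)            # TDI -> chip A cells (0..n-1) -> chip B cells (n..2n-1) -> TDO
    # Preload chip A: the last n bits shifted end up in its cells, so send the pattern reversed.
    chain, _ = shift_chain(chain, [0] * n + pattern[::-1])
    # Chip A drives the board nets and chip B captures them.
    nets = chain[:n]
    if stuck_at_0_net is not None:
        nets[stuck_at_0_net] = 0     # hypothetical faulty PCB trace
    chain[n:] = nets
    # Shift everything out; chip B's cells leave TDO last cell first, so reverse them.
    _, tdo = shift_chain(chain, [0] * (2 * n))
    return tdo[:n][::-1]


print(interconnect_test([1, 0, 1, 1]))                     # [1, 0, 1, 1]
print(interconnect_test([1, 1, 1, 1], stuck_at_0_net=2))   # [1, 1, 0, 1]
```

A mismatch between the driven pattern and the captured one points directly at the faulty net, without any access to the board traces themselves.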
Since this hardware is already available inside every chip, the TAP controller can be extended to control additional test functions, such as BIST, scan chains and other user-defined hardware. This interface can also be used to load programming values into programmable devices such as FPGAs.
The JTAG proposal became IEEE Standard 1149.1 [8] in 1990.
By implementing the JTAG interface, the IC not only becomes easier to test with existing test equipment that complies with the IEEE 1149.1 Standard, but can also be tested, together with its connections, when included in a larger system.
As described in the standard, the TAP includes the Test Clock input (TCK), Test Mode Select (TMS), Test Data Input (TDI) and Test Data Output (TDO) connections and, when a power-up reset of the test logic is not performed, it also provides a Test Reset (TRST) connection. All of the TAP inputs and outputs are dedicated connections and should not be used for any other purpose. To be compliant with the standard, the TMS, TDI and TRST inputs must behave as if a logic 1 were applied when they are undriven (an internal pull-up must be present at these inputs).
[Figure: three chips, each with core logic, boundary scan cells (IN, OUT, Shift In from the previous cell, Shift Out to the next cell) and a TAP, connected to the TCK, TMS, TDI and TDO signals]
Figure 3.3: JTAG Boundary Shift Register and TAP controller connections.
The JTAG TAP controller includes a state machine that is controlled by the TMS signal. By driving this signal with the appropriate values, the internal TAP state machine is steered according to the state diagram in figure 3.4.
According to the IEEE standard, the implementation of the BYPASS, EXTEST, SAMPLE and
PRELOAD instructions is mandatory. All other instructions that may be implemented are either
optional instructions, defined in the standard, or user specified instructions.
For every IEEE 1149.1 compliant device, there has to be a Boundary Scan Description
Language (BSDL) file associated with it. This file describes the nature of the IC pins (input,
output or bidirectional pin), the logical correspondence between signal names and physical pins,
the identification of the pins used by the JTAG interface, the description of the instruction register,
the implemented instructions and their opcodes, the identification of each data shift register that
is accessed in each of the instructions and a description of the BSR, listing all the cells in it and
their functionality. This file allows IEEE 1149.1 compliant test equipment to know the capabilities
of the circuit’s test logic and, if an additional file containing the description of IC interconnections
in a system board is given, allows it to perform test procedures on the system board using the
assembled BSR, as represented in figure 3.3.
[Figure: TAP controller state diagram with the states Test Logic Reset, Run Test / Idle, Select DR, Capture DR, Shift DR, Exit1 DR, Pause DR, Exit2 DR, Update DR, Select IR, Capture IR, Shift IR, Exit1 IR, Pause IR, Exit2 IR and Update IR; each transition is labelled with the TMS value (0 or 1)]
Figure 3.4: TAP state machine.
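The state diagram of figure 3.4 can also be written as a next-state table. The sketch below encodes the TAP controller transitions defined by IEEE 1149.1 (using the standard state names) and checks one well-known consequence of that diagram: holding TMS at 1 for five TCK cycles reaches Test Logic Reset from any state. The table and the `walk` helper are illustrative names, not part of the processor's design files.

```python
# Next state of the IEEE 1149.1 TAP controller: state -> (next if TMS = 0, next if TMS = 1).
TAP_NEXT = {
    "Test-Logic-Reset": ("Run-Test/Idle", "Test-Logic-Reset"),
    "Run-Test/Idle":    ("Run-Test/Idle", "Select-DR-Scan"),
    "Select-DR-Scan":   ("Capture-DR", "Select-IR-Scan"),
    "Capture-DR":       ("Shift-DR", "Exit1-DR"),
    "Shift-DR":         ("Shift-DR", "Exit1-DR"),
    "Exit1-DR":         ("Pause-DR", "Update-DR"),
    "Pause-DR":         ("Pause-DR", "Exit2-DR"),
    "Exit2-DR":         ("Shift-DR", "Update-DR"),
    "Update-DR":        ("Run-Test/Idle", "Select-DR-Scan"),
    "Select-IR-Scan":   ("Capture-IR", "Test-Logic-Reset"),
    "Capture-IR":       ("Shift-IR", "Exit1-IR"),
    "Shift-IR":         ("Shift-IR", "Exit1-IR"),
    "Exit1-IR":         ("Pause-IR", "Update-IR"),
    "Pause-IR":         ("Pause-IR", "Exit2-IR"),
    "Exit2-IR":         ("Shift-IR", "Update-IR"),
    "Update-IR":        ("Run-Test/Idle", "Select-DR-Scan"),
}


def walk(tms_bits, state="Test-Logic-Reset"):
    """Return the states visited, one per rising edge of TCK."""
    visited = []
    for tms in tms_bits:
        state = TAP_NEXT[state][tms]
        visited.append(state)
    return visited


print(walk([0, 1, 0, 0]))  # ['Run-Test/Idle', 'Select-DR-Scan', 'Capture-DR', 'Shift-DR']
print(all(walk([1] * 5, s)[-1] == "Test-Logic-Reset" for s in TAP_NEXT))  # True
```

The five-ones property is why a test controller can always bring the TAP to a known state, even without a TRST connection.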
3.7 Memory Test
When memory blocks are present in a circuit, they also need to be tested. However, memory cells are usually tested using a more complex fault model than the SSF model, because memories have more physical faults that cannot be modeled by stuck-at lines. Hence, bridging faults and coupling faults need to be taken into account. A rather exhaustive memory test could be performed, at nominal speed, by writing a bit and verifying that it was written correctly and that none of the remaining bits changed their value, and then writing the complementary value to the same bit and repeating the same verification. Although this is a thorough test, it would take too much time to complete, since the number of operations grows with the square of the number of bits. Consequently, instead of checking all of the remaining memory bits for every changed bit, common testing procedures only consider the surrounding bit cells, as these are the most likely to be affected by transitions on a given bit cell. Although this drastically reduces the test time, it requires information about the physical memory cell layout.
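To see why checking every other bit after each write is too slow, the operation counts of the two approaches can be compared. The sketch assumes the exhaustive procedure described above (for each bit and each value, one write followed by a read of all n bits) and a march test with a fixed number of operations per cell; the memory size used in the example is hypothetical.

```python
def exhaustive_ops(n_bits: int) -> int:
    # For every bit and for both values: one write plus a read of all n bits.
    return n_bits * 2 * (1 + n_bits)


def march_ops(n_bits: int, ops_per_cell: int) -> int:
    # A march test applies a fixed number of operations to every cell.
    return n_bits * ops_per_cell


n = 8 * 1024  # hypothetical 8 Kbit memory
print(exhaustive_ops(n))  # 134234112
print(march_ops(n, 4))    # 32768, e.g. {⇑(w0,r0,w1);⇓(r1)} has 4 operations per cell
```

The exhaustive count grows as n² while the march count grows as n, so the gap widens quickly as the memory size increases.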
A march test is composed of a set of march elements. An ascending (descending) march element is a finite sequence of read or write operations that is repeated at each memory cell in ascending (descending) address order. Each march element is applied to every cell in memory, which means that if a pattern is applied to one cell then it must be applied to all cells. All operations of a march element are performed at one address before proceeding to the next
address [9]. Faults are detected in the read operations, when the values read are compared with the values expected by the test. Read and write operations are denoted by the r and w symbols, followed by the value to be read or written (e.g. r0 or w1). A march element can contain several read or write operations for the same address. This is written as (w0,r0,w1), meaning that, for every address, a write 0 is followed by a read 0 and by a write 1 operation. An ascending march element is denoted by the ⇑ symbol, while the ⇓ symbol denotes a descending march element. The ⇕ symbol denotes a march element that may be applied in either ascending or descending address order [10].
For example, the march test {⇑(w0,r0,w1);⇓(r1)} starts at the lowest address and performs a write 0, followed by a read 0 and a write 1 at that address. It then increments the address by one position and performs the same operations. When the last address is reached and all of its operations are done, the first march element is concluded. The second element starts at the highest address and performs a read 1 operation. It then decrements the address by one position and repeats the read 1 operation. If a value read is not 1, a fault is detected. When the lowest address has been processed, the march element is concluded, as is the entire march test.
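The march notation maps directly onto a small interpreter. The sketch below is an illustration only, assuming a bit-addressed memory model with injectable stuck-at faults (`BitMemory`, `run_march` and the fault dictionary are hypothetical names); it runs the example {⇑(w0,r0,w1);⇓(r1)} and reports every failing read.

```python
UP, DOWN = "up", "down"  # a ⇕ element (either order) can be mapped to either one


class BitMemory:
    """Bit-addressed memory model with optional stuck-at faults (hypothetical fault injection)."""

    def __init__(self, size, stuck_at=None):
        self.stuck_at = dict(stuck_at or {})  # address -> forced value
        self.cells = [self.stuck_at.get(a, 0) for a in range(size)]

    def write(self, addr, value):
        self.cells[addr] = self.stuck_at.get(addr, value)

    def read(self, addr):
        return self.cells[addr]


def run_march(memory, size, march_test):
    """Apply a march test; return (address, operation) pairs whose read value mismatched."""
    failures = []
    for direction, operations in march_test:
        addresses = range(size) if direction == UP else range(size - 1, -1, -1)
        for addr in addresses:
            # All operations of the element are applied to this address before moving on.
            for op, value in operations:
                if op == "w":
                    memory.write(addr, value)
                elif memory.read(addr) != value:
                    failures.append((addr, f"r{value}"))
    return failures


# {⇑(w0,r0,w1); ⇓(r1)}
EXAMPLE = [(UP, [("w", 0), ("r", 0), ("w", 1)]), (DOWN, [("r", 1)])]

print(run_march(BitMemory(8), 8, EXAMPLE))          # []
print(run_march(BitMemory(8, {5: 0}), 8, EXAMPLE))  # [(5, 'r1')]
```

A stuck-at-0 cell is caught by the r1 of the descending element, while a stuck-at-1 cell would be caught by the r0 of the ascending element, which is why both values are written and read back.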
The previous notation assumes that individual bit cells are addressed. In word-oriented memories, however, this is not the case: whole words, rather than individual bits, are written into and read from the memory. The notation adopted in this work therefore replaces the 0 and 1 values in the march elements with the values written in a word (represented in hexadecimal).
To test the memory blocks embedded in the designed processor, a march test was produced. Addressing these memory blocks and observing their outputs is not possible using the chip inputs and outputs that are used during normal operation. However, since the chip includes scan chains, these memories can be addressed and their outputs observed through these chains. This is a low-cost option, as it requires no hardware beyond the scan flip-flops that are already included. However, it is a poor testing method, since it takes a long time to complete (which might not be a problem when prototyping) and it does not allow the memories to be tested at full clock speed. To enable at-speed testing with higher fault coverage, a BIST controller was designed. This controller provides the testing options required by the memories and enables the designer to gather information that can help in diagnosing the design and, possibly, in still using a partially defective memory.
As stated in section 4.4, the memory layout is not available for the adopted library, so the physical structure of the memories is not known. Therefore, the test patterns were chosen assuming that adjacent bits of a word may correspond to adjacent memory cells, in order to achieve a higher fault coverage. The march test produced for these memories, shown in figure 3.5, detects all transition faults, all stuck-at faults and all address decoding faults. However, since the memory structure is not known and the test is not exhaustive, only some coupling faults and some state coupling faults will be detected. The memory test patterns used are 01010101b
(55h), 10101010b (AAh), 00000000b (00h) and 11111111b (FFh).
[Figure: march test composed of write (w) and read (r) operations with the data patterns 00h, 55h, AAh and FFh]
Figure 3.5: Implemented March Test.
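The benefit of using 55h and AAh alongside 00h and FFh can be illustrated with a bridging fault between two neighbouring bits. The sketch assumes a hypothetical wired-AND short between bits 0 and 1 of one word and a plain write-then-read pass per pattern, not the full march test of figure 3.5 (`store_with_bridge` and `detects` are illustrative names): with 00h or FFh both bits always hold the same value, so the short is invisible, whereas 55h and AAh place opposite values on adjacent bits and expose it.

```python
def store_with_bridge(word: int, bit_a: int = 0, bit_b: int = 1) -> int:
    """Value actually stored when a wired-AND short links two adjacent bits (hypothetical fault)."""
    both = ((word >> bit_a) & 1) & ((word >> bit_b) & 1)
    mask = (1 << bit_a) | (1 << bit_b)
    return (word & ~mask) | (both << bit_a) | (both << bit_b)


def detects(pattern: int, faulty_addr: int = 3, size: int = 8) -> bool:
    """Write `pattern` to every word, read every word back and report any mismatch."""
    memory = [0] * size
    for addr in range(size):
        memory[addr] = store_with_bridge(pattern) if addr == faulty_addr else pattern
    return any(memory[addr] != pattern for addr in range(size))


for pattern in (0x00, 0xFF, 0x55, 0xAA):
    print(f"{pattern:02X}h detects the short: {detects(pattern)}")
```

This only holds under the stated assumption that adjacent word bits are physically adjacent cells, which is exactly the assumption behind the choice of patterns.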
The test is performed by comparing the values read from the memory with the values expected at the various addresses. In the event of a mismatch, the BIST controller stops its operation. By means of a scan chain, the address of the failing cell can then be shifted out of the controller. Furthermore, the controller can resume the test sequence from the failed address, in order to complete it. For prototyping purposes, this has the advantage of returning more information than a simple pass/fail result. The designer can then use this information to still partially operate a defective circuit (e.g. if a program memory cell is defective, the designer could write assembly code that avoids that particular address and still use all of the remaining circuit).
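The stop-and-resume behaviour can be sketched, from the designer's side, as a generator that pauses at each mismatch. This is a hypothetical software model, assuming a simple write/read pass per pattern and stuck-bit faults (`FaultyMemory`, `bist_sequence` and the fault masks are illustrative names); each resumption of the generator plays the role of scanning out the failing address and resuming the test.

```python
from typing import Iterator


class FaultyMemory:
    """8-bit word memory with hypothetical stuck bits, given as per-address bit masks."""

    def __init__(self, size: int, stuck0=None, stuck1=None):
        self.words = [0] * size
        self.stuck0 = stuck0 or {}  # address -> bits forced to 0
        self.stuck1 = stuck1 or {}  # address -> bits forced to 1

    def write(self, addr: int, value: int) -> None:
        value &= ~self.stuck0.get(addr, 0)
        value |= self.stuck1.get(addr, 0)
        self.words[addr] = value & 0xFF

    def read(self, addr: int) -> int:
        return self.words[addr]


def bist_sequence(mem: FaultyMemory, size: int, patterns=(0x55, 0xAA)) -> Iterator[tuple]:
    """Write/read pass per pattern; pause (yield) on every mismatch until resumed."""
    for pattern in patterns:
        for addr in range(size):
            mem.write(addr, pattern)
        for addr in range(size):
            value = mem.read(addr)
            if value != pattern:
                yield addr, pattern, value  # the controller stops here until told to resume


mem = FaultyMemory(16, stuck0={4: 0x01}, stuck1={9: 0x80})
failures = list(bist_sequence(mem, 16))  # each step stands for "scan out, then resume"
bad = {addr for addr, _, _ in failures}
usable = [addr for addr in range(16) if addr not in bad]
print(failures)  # [(4, 85, 84), (9, 85, 213)]
print(usable)
```

Collecting every failing address in this way, rather than stopping at the first one, is what allows a program to be written around the defective locations.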
As mentioned before, in section 2.5, the memories used in the AMEP are two dual-port SRAMs
with 8-bit words each and one single-port SRAM with 16-bit words. The memories available in
the adopted technology will be described, in detail, in section 4.4.1. Since the single-port SRAM
uses the bytewrite capability, the BIST controller will also need to test this feature.
Since these memories do not provide a dedicated interface for test data, multiplexers have to be added to select the data applied to the memory inputs: test data from the memory BIST controller or normal data from the implemented processor.
The architecture of the implemented memory BIST controller is shown in figure 3.6. This con-
troller is composed of one comparator (with one of the inputs registered) for error detection, an
up/down counter for sequential address generation and a shift register for bytewrite enable signal
generation (if the memory has bytewrite). The controller’s state machine has 31 states and is responsible for implementing the march test. The interface of the controller with the surrounding circuitry includes three input control signals and two output result signals. The input control signals are bisten (enable), bistrst (reset) and bistgo (start/resume test sequence), and the output result signals are bistrslt (fault detected) and bistend (end of test sequence). While the test sequence is being performed, the controller enable signal, bisten, is high. To actually start the test sequence, the bistgo signal must be high during one clock cycle. The bistrslt signal indicates the test result (logic value 0 if no error was detected; logic value 1 if an error was detected). The bistend signal indicates the end of the test sequence. If no error is detected, the bistend signal is set high and the bistrslt signal remains low. However, if an error is detected during the test sequence, the bistrslt signal is set high, while the bistend signal remains low, and the controller enters a pause state. In this state, the controller waits for the bistgo signal to go high,
indicating that the result has been read and the memory address has been scanned out (if desired by the user), and thus the test sequence may be resumed. The controller also includes output signals to address the memory (bistaddr), to set the memory data inputs (bistctr_dout) and to control the memory write enable signals (bistbwen and bistwen). The bistctr_din input of the BIST controller is driven by the memory data output.
Considering that this controller is part of a power-efficient processor, it should also minimize its own power consumption. As a consequence, the controller should be deactivated during the normal operation mode of the processor, by deasserting its enable signal. This guarantees that no transitions occur in its sequential elements and, consequently, that the power consumption of the memory BIST controller is reduced to a minimum. Nevertheless, the bistctr_din input, which is driven by the memory output, will naturally present some switching activity during normal operation mode. Since this input directly drives the combinational comparator, some power would be consumed by the comparator logic. Therefore, an array of AND gates is placed between the memory output and the comparator input to block the propagation of any switching activity during the normal operation mode, thus minimizing the inherent power consumption.
The VHDL code used to describe and synthesize the memory BIST controller can be found in Appendix A. This VHDL description allows for the synthesis of memory BIST controllers for several memory configurations.
[Figure: state machine (inputs bisten, bistrst, bistgo and CLK) controlling a shift register, an M-bit up/down counter (DIR, RST and EN inputs) and a comparator with an N-bit registered input; bistwen, bistbwen, bistaddr and bistctr_dout go to the memory, bistctr_din comes from the memory, and bistend and bistrslt are the result outputs]
Figure 3.6: Simplified Memory BIST Controller architecture.
To avoid routing congestion and the unnecessary BIST controller complexity that different memory configurations would add, a dedicated memory BIST controller was implemented for each of the three memory blocks of the processor. Since each memory BIST controller needs three input control signals and two output signals, a direct connection would require fifteen additional pins to control the controllers and observe the test results. To avoid this unnecessary use of I/O pins, the enable signals (bisten) of the memory BIST controllers are encoded using a two-bit signal. Moreover, since only one controller is active at a time, all the individual bistgo inputs can be driven by the same global bistgo signal. Furthermore, the bistend and bistrslt output signals of the three controllers can be multiplexed to reduce the number of extra output pins. With this approach, the number of Input/Outputs (I/Os) assigned exclusively to the memory BIST structures is reduced from fifteen to five.
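The pin budget can be checked with a short calculation, together with one possible encoding. The sketch assumes a hypothetical 2-bit code in which 0 disables all controllers and 1 to 3 enable one controller each; bistrst is not among the signals whose sharing is described above, so it is not counted in the reduced budget. The function names are illustrative.

```python
N_CONTROLLERS = 3
INPUTS_PER_CTRL = 3   # bisten, bistrst, bistgo
OUTPUTS_PER_CTRL = 2  # bistend, bistrslt


def decode_enable(code: int) -> list:
    """Hypothetical 2-bit code: 0 disables all controllers, 1..3 enables controller code - 1."""
    return [1 if code == i + 1 else 0 for i in range(N_CONTROLLERS)]


def mux_outputs(code: int, results: list) -> tuple:
    """Route (bistend, bistrslt) of the enabled controller to the two shared output pins."""
    return (0, 0) if code == 0 else results[code - 1]


direct_pins = N_CONTROLLERS * (INPUTS_PER_CTRL + OUTPUTS_PER_CTRL)
shared_pins = 2 + 1 + OUTPUTS_PER_CTRL  # encoded bisten + global bistgo + multiplexed outputs
print(direct_pins, shared_pins)         # 15 5
```

Because at most one controller is enabled at a time, the same code that selects the active controller can also drive the output multiplexer select.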
4 ASIC Design
Contents
4.1 Introduction
4.2 Design Flow
4.3 Foundry and Technology Selection
4.4 Library and Technology Characterization
4.5 Packaging
4.6 Pin Positioning
4.1 Introduction
An ASIC design usually begins with its description using a Hardware Description Language
(HDL). As soon as the functional or structural description has been validated by simulation, the
design can be synthesized using a standard cell library (the frontend stage). This implementation
step translates the design into basic design blocks (standard cells). These standard cells must
then be placed inside the available die area (placement) and the connections between these cells
should be made (routing) using the metal layers available in the chosen manufacturing process
(the backend stage). Although the design may be simulated during the several stages, a final simulation with timing information should be done to validate the layout prior to fabrication, in order to ensure a high probability of first-time success.
This chapter explains the steps that were required to manufacture the processor from a given HDL description, using a standard cell library. The selection of the adopted technology is discussed and the standard cell library used is characterized. Some of the design options that were taken are also explained, and some general guidelines that should be observed during implementation are discussed.
4.2 Design Flow
Figure 4.1 shows the generic workflow to obtain a layout starting from an HDL description.
The description of a digital circuit is often performed using an HDL, which allows the function, structure or behavior of a digital circuit to be described using a text-based, programming-like syntax. Furthermore, HDLs usually allow a given circuit to be described using a mixture of structural, behavioral and RTL descriptions. With these capabilities, complex systems can be easily described and simulated, in a technology-independent way, in order to verify that they have the desired functionality.
When a full structural description of the circuit is adopted, using the netlist format, the designer individually instantiates the required cells from the library and assembles them into a circuit that performs as intended. This allows full control of the implementation, but drastically increases the design time. On the other hand, when a behavioral description of the circuit is adopted, the designer describes the intended functions and algorithms of the circuit. Nevertheless, a full behavioral description (using an algorithmic description) of a circuit may not be synthesizable. Therefore, the circuit has to be described in a way that is implementable in hardware and understood by synthesis tools. This is usually called an RTL description. In an RTL description, the circuit is described as a set of register elements and a set of transfer functions that describe the data flow between the register elements. The structure of the description closely resembles the model of a sequential circuit (sequential elements + combinational logic). Therefore, the designer needs to adhere to a description style that will be correctly interpreted by the synthesis tool.
[Figure: frontend stages (Synthesis, Test Insertion) and backend stages (Place&Route, Sign-Off Timing Analysis, LVS/DRC) leading from the HDL description to the layout]
Figure 4.1: Generic workflow for ASIC design.
Between RTL and structural descriptions, the RTL description is much simpler for the designer
but might not always achieve the desired results in terms of performance. Consequently, some
designers adopt a mix of both RTL and structural descriptions, making a gate-level description
only for the critical blocks. After being correctly described, the RTL or structural descriptions
can be implemented using the available cells in the library. This translation process between the
RTL or structural descriptions and the standard cells is called synthesis and is performed by the
synthesis tool. Although behavioral descriptions can also be synthesized, they present additional problems when interpreted by synthesis tools.
Among the several HDLs, Verilog and VHDL are the most widely used and supported languages. In particular, the implemented ME processor was described in VHDL, using RTL and structural description styles.
As soon as it is described, the circuit is simulated in order to verify if it performs as intended
by the designer. After this simulation, the design is synthesized against a given target technology.
This can either be an FPGA or a standard cell library. In this work, the target technology is a
standard cell library for UMC 0.18µm 1P6M process.
A standard cell is a group of transistors and interconnecting structures that implement a simple
logic gate (e.g. NAND, NOR), a combinational logic function (e.g. 1-bit full adder, multiplexer), a
sequential element (e.g. D-type flip-flop) or even a memory cell. A set of standard cells implementing several different logic functions is usually called a standard cell library. The cells, and consequently the logic functions, that compose this library are determined by the standard cell manufacturer.
Each standard cell has a logic description (the logic function it implements) that corresponds to a physical implementation (layout). The logic description of the cell is usually called the logical view and the physical implementation is called the layout view. The logical view is an
abstraction level that has the cell’s truth table (for combinational logic) or the state transition table
(for a sequential element). This view allows automatic synthesis tools to implement a complex
system by interpreting a circuit’s logic function without being aware of the physical implementation
details. Additional views are available, to characterize other attributes of the cell (e.g. timing,
power, interface), that are used in the different phases of circuit design.
Constraints can be given at this phase in order to guide the synthesis and produce a circuit
that meets the designer’s expectations. These constraints are used by the synthesis algorithms to
select the most adequate architecture and choose the appropriate cell in the library. In general, a
library has cells with the same logical function but with different characteristics (e.g. area, power
consumption and propagation delays). These constraints are not mandatory and do not need to
be set in order for a tool to synthesize a circuit. Nevertheless, if no constraints are explicitly set
by the designer, the outcome of the synthesis process is the result of default constraints. Usually,
most synthesis tools tend to synthesize a circuit for minimum area when no other constraints
are set, which may lead to results quite different from the designer’s expectations. To obtain a better result, an iterative process should be followed, in which the constraints are introduced and their values are tuned in each cycle until a satisfactory result is obtained. At the end of the
synthesis process, a gate-level netlist representing the interconnections between the standard
cells that compose the design is obtained.
A simulation of this synthesized circuit should then be performed to ensure that the synthesized
circuit still performs as intended. This simulation may already take into account some timing
information regarding the cell delays and, if it exists, an estimate of the interconnection delays
between the cells. Even though it is not a complete timing simulation, it is useful enough to detect some design errors.
At this stage, most synthesis tools also allow test-related structures (e.g. scan chains) to be inserted into the circuit through their own interfaces. Therefore, these structures do not have to be described
using the HDL. Furthermore, synthesis tools are usually able to perform testability checks before
automatically assembling the test structures. These capabilities simplify the DFT step, removing
a very significant part of the designer’s workload.
After the circuit has been synthesized, the steps to physically implement the generated netlist
are performed. Power planning is usually done before cell placement. The power planning phase consists of defining the power rings and power stripes (for power and ground) that distribute VDD and GND over the entire chip. An initial estimate of the current requirements must be made to define the geometry and number of stripes. After the chip is completely routed, information regarding the power consumption can be extracted and analyzed to determine whether the initial power structures are correctly sized. If the power constraints are not satisfied, the power and ground nets must be resized to meet the required values, and another iteration, which may include placement and/or routing, must be done.
Placing the cells and routing signals are the next steps in the design flow. In the placement
phase, the standard cells are placed inside the available silicon area. Typically, the standard cells have a constant size in one of their dimensions (e.g. all cells have the same height but different widths), allowing the placement tool to distribute them in rows, ensuring that none of them overlap and possibly leaving extra spacing between cells. This step is extremely important, since it has a direct impact on circuit timing, routing congestion and routability. For better results,
constraints should be given to the placement engine, so it can have information about the required
timing of the circuit.
In synchronous digital circuits, the clock signal should arrive at all of the synchronous cells at the same time. This requirement implies that all of the clock paths should have the same propagation time. Nevertheless, this is extremely difficult to accomplish using only the propagation delay of the signal lines, since it would require all paths to have exactly the same length and load. Clock skew is defined as the maximum difference between the clock arrival times at the sequential elements. The maximum allowable clock skew is such that no data signal transition caused by a given clock transition arrives at the next clocked element in its path before that same clock transition does (assuming a hold time of zero). One approach to reducing clock skew consists of inserting delay buffers in the shortest paths, so that the arrival times of the clock signals are approximately the same. This procedure is performed automatically during clock tree synthesis, which analyzes the several paths and inserts buffers in order to reduce or eliminate the clock skew.
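The skew definition and the race condition above can be expressed as a small check. The sketch uses hypothetical flip-flop names and clock arrival times in picoseconds, and models the condition with a hold time that defaults to zero; `min_data_delay_ps` stands for the shortest clock-to-output plus logic delay of the path.

```python
def clock_skew(arrival_ps: dict) -> int:
    """Maximum difference between the clock arrival times at the sequential elements."""
    return max(arrival_ps.values()) - min(arrival_ps.values())


def race_free(launch: str, capture: str, arrival_ps: dict,
              min_data_delay_ps: int, hold_ps: int = 0) -> bool:
    """True if data launched at `launch` reaches `capture` no earlier than its clock edge (+ hold)."""
    earliest_data = arrival_ps[launch] + min_data_delay_ps
    return earliest_data >= arrival_ps[capture] + hold_ps


arrival = {"FF1": 100, "FF2": 450, "FF3": 200}  # hypothetical clock arrival times (ps)
print(clock_skew(arrival))                       # 350
print(race_free("FF1", "FF2", arrival, 300))     # False: data at 400 ps, clock at 450 ps
print(race_free("FF1", "FF2", arrival, 500))     # True
```

Inserting delay buffers on the early clock branch (FF1 here) reduces the skew and turns the failing path into a race-free one.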
After the placement, the power planning and the clock tree synthesis are done, the design is
ready to be routed. Routing is the procedure that implements the interconnections between the inputs and outputs of the various cells using the available metal layers. Today’s routing engines not only try to avoid congestion and comply with the given timing constraints, but also try to reduce adverse effects such as crosstalk. The routing process also connects power and ground
structures to all of the cells. The routing of power structures is usually performed before the routing
of signal lines.
After the design has been routed, parasitic extraction can be performed to obtain timing information from the resulting layout. This information can be incorporated into a more detailed
simulation model to validate the final layout where cell and interconnection delays are considered.
Moreover, an electrical simulation, using an extracted electrical model, could also be performed
on the clock network tree to verify if it performs according to specifications.
An important aspect of the manufacturing process is the yield. If the yield is too low, the high fabrication cost per working unit can make the design economically unviable. As a consequence, the designer should take the yield into account, and Computer Aided Design (CAD) tools should also provide the means to increase it [11]. One possible approach is to add redundant logic. As an example, memories are usually fabricated with additional cells (built-in redundancy) that can be used to replace defective cells. Unfortunately, this is not the case in this work, since the memories used do not have any additional cells. As mentioned in [12], via duplication also improves yield. Wire widening and spreading are other techniques that improve the manufacturing yield [13]. In this work, via duplication was the only method used to improve the yield.
4.3 Foundry and Technology Selection
Several different technologies are available to implement a given circuit. Complementary Metal
Oxide Semiconductor (CMOS) is currently the most used technology for IC manufacturing, due to
its low static power consumption. Within CMOS technology, several foundries offer various process dimensions, each with its own set of design rules.
As a consequence, foundry and technology selection is a crucial aspect of an ASIC design. It influences the area, the power consumption, the delays and operating frequency, the available cells and memories, the manufacturing costs and the manufacturing dates (runs). Support for the standard cell library and the availability of the corresponding configuration files may also constrain the set of usable Electronic Design Automation (EDA) tools (e.g. if the library does not have characterization files for a given tool), or the other way around (the available tools constraining the choice of standard cell libraries).
The circuit is manufactured through EUROPRACTICE, through which the standard cell library was also acquired. The EUROPRACTICE IC service allows prototypes to be produced at relatively low cost by using Multi Project Wafer runs. Each wafer is composed of designs from several participants, thus distributing the cost of mask production among the various participants (proportionally to the occupied area). Furthermore, universities and other research institutes, which usually have small prototype designs, have access to EUROPRACTICE’s mini@SIC program. This program reduces the fabrication costs of small designs by reducing the minimum design area imposed on each participant.
Among the supported processes and foundries under EUROPRACTICE’s mini@SIC program,
the UMC foundry, with its 0.18µm CMOS process with 1 poly and 6 metal layers (UMC L180 1P6M
MM/RFCMOS), is the available implementation technology with the most stable libraries and with
a financial cost covered by the project budget. In this process, the general Multi Project Wafers are divided into blocks of 5 x 5 mm each. The mini@SIC program further subdivides each of these 5 x 5 mm blocks into 9 regular square sub-blocks. Designs may occupy one, two, three,
four, six or nine of these sub-blocks. Nevertheless, using nine of these sub-blocks is economically
discouraged, since using a complete 5 x 5 mm block (the equivalent of the nine sub-blocks)
on the general program is less expensive. A design that occupies one sub-block may have a
maximum size of 1525 x 1525 µm, while a two sub-block design may have a maximum size of
3240 x 1525 µm.
During a preliminary phase of this project, the Standard Cell Library from Virtual Silicon [14] was used. Since UMC has discontinued support for this Standard Cell Library, an alternative Standard Cell Library from Faraday Technology [15] was adopted. This change of library implied a subsequent change in the architecture and interface of the memories used, which required an adaptation of the processor, including the memory BIST controller. It also changed the capabilities of the I/O cells, which led to a new selection of these cells. Furthermore, the available core cells also changed, which led to different synthesis results.
4.4 Library and Technology Characterization
The FSA0A_C library [15] is a 0.18µm standard cell library tailored for the UMC 0.18µm logic process. The nominal supply voltage is 1.8V for the core cells and 3.3V for the I/O cells, with some I/O cells being 5V tolerant. Table 4.1 shows the general characteristics of this library [15].
Table 4.1: Faraday's FSA0A_C Standard Cell Library General Characteristics.
Characteristic | Description
Technology | UMC's 0.18µm 1.8V / 3.3V 1P6M logic process
Minimum drawn channel length | 0.18µm
Supply voltage | 1.62V to 1.98V for core cells; 2.97V to 3.63V for 3.3V I/O cells
Performance | Td = 27.5ps / stage (measured with a 101-stage inverter ring and a typical process operated under 1.8V, 25°C)
Gate density | 110K gates / mm²
Power consumption | 29 nW / MHz / gate (measured with a 2-input NAND, output load = 2 standard loads, and a typical process operated under 1.8V, 25°C)
Reference cell area | 9.794µm² (2-input NAND with normal driving strength - ND2)
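The gate density and power figures of Table 4.1 allow a quick sizing check. The Python sketch below (all function names are hypothetical) converts the one- and two-sub-block limits of the mini@SIC program into an upper bound on gate count, and scales the 29 nW/MHz/gate figure to a given design size and clock frequency. It assumes the whole sub-block is available for core logic, which overestimates capacity because the I/O ring, power rings and memories also need area; the 20k gates at 100 MHz in the example are illustrative values, not AMEP results.

```python
# Back-of-the-envelope area and power estimates from Table 4.1.
# ASSUMPTION: the whole sub-block is filled with core logic; in practice the
# I/O ring, power rings and memories take a large share of it.

GATES_PER_MM2 = 110_000      # gate density, Table 4.1
NW_PER_MHZ_PER_GATE = 29     # power consumption, Table 4.1 (NAND2 reference)


def area_mm2(width_um: int, height_um: int) -> float:
    """Area of a width x height rectangle given in micrometres, in mm^2."""
    return width_um * height_um / 1_000_000


def max_gates(width_um: int, height_um: int) -> int:
    """Upper bound on gate count; integer arithmetic avoids rounding surprises."""
    return width_um * height_um * GATES_PER_MM2 // 1_000_000


def dynamic_power_mw(gates: int, freq_mhz: float) -> float:
    """Power in mW, scaling the per-gate figure of Table 4.1 linearly."""
    return gates * freq_mhz * NW_PER_MHZ_PER_GATE / 1_000_000


if __name__ == "__main__":
    for label, (w, h) in {"1 sub-block": (1525, 1525),
                          "2 sub-blocks": (3240, 1525)}.items():
        print(f"{label}: {area_mm2(w, h):.3f} mm^2, at most {max_gates(w, h)} gates")
    # Hypothetical design size and clock, chosen only for illustration.
    print(f"20k gates at 100 MHz: {dynamic_power_mw(20_000, 100):.1f} mW")
```

Since the per-gate power is an average measured on a single NAND2 cell, the result should be read as an order-of-magnitude estimate only.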
This Standard Cell Library is composed of core cells and I/O cells. The core cells include all
logic function cells like AND, NAND, OR, NOR, XOR, NXOR, Multiplexers, Flip-Flops, Latches,
1-bit full and half adders and other cells. These cells are used to build the logic core. The I/O cells
come in two formats: Inline and Staggered. The Inline format is recommended for core limited
designs, while the Staggered format is recommended for I/O limited designs. The dimensions of
the I/O cells available in this library are presented in table 4.2 [15]. Both I/O cell formats can be
combined with inline or staggered pads, which gives four possible combinations of I/O cells and pads, as shown in figure 4.2.
Table 4.2: I/O cell dimensions.
I/O cell | Height (µm) | Width (µm) | Bonding pad position
Inline I/O cell | 140.12 | 62.62 | Outside I/O cell
Staggered I/O cell | 235.60 | 34.10 | Outside I/O cell
Figure 4.2: I/O cell and pad combinations.
The I/O cells available in this library do not include a physical pad in their description, and no pad cell is defined in the library. As such, a custom-made pad must be used. However, since the physical layout of the I/O cells is not available when these libraries are supplied by EUROPRACTICE, designing a custom pad to connect with the library cells is not possible. Instead, EUROPRACTICE made available a generic bonding pad, which was specially designed to properly connect with these I/O cells and complies with the UMC Bonding Pad Layout Rules [16]. This pad measures 69 x 69 µm, with a passivation window of 65 x 65 µm. Together with the metal layers that connect it to the standard I/O cells, the pad occupies 69 x 79 µm, as shown in figure 4.3.
The I/O cells of this library can be programmed after being implemented on silicon. Programmable I/O on Silicon (PIOS) allows the user to enable pull-up or pull-down resistors, as well as Schmitt trigger control for inputs. It also allows the slew rate and driving capacity of outputs to be programmed. These features are controlled through additional control pins in these cells. Hence, although these features could be of interest, they would require additional input pins to achieve the desired configurations. Since there is no need for such features in this project, these cells were configured by hardwiring their control inputs to the desired values. The input and bidirectional I/O cells were configured to use neither Schmitt triggers nor pull-up or pull-down resistors, with the exception of the TMS, TRST and TDI inputs of the JTAG TAP controller, which were programmed to include pull-up resistors. The output and bidirectional I/O cells were programmed to have a 2mA output driving capacity (the minimum possible value) with a fast slew rate.
Figure 4.3: Bonding pad layout (69 x 69 µm pad with a 65 x 65 µm passivation window; 69 x 79 µm including the metal tracks).
This library requires three power supplies: one power supply net for the core (VCCK at 1.8V) and two power supply nets for the I/O (VCC3I and VCC3O, both at 3.3V). Figure 4.4 shows a representation of the power supplies for this cell library [15]. The VCCK power net supplies the internal cells, the 1.8V input drivers and the output pre-drivers. The VCC3I net supplies the 3.3V input receivers and the I/O control logic, while the VCC3O net supplies the 3.3V output buffers. Every power net has its ground counterpart; therefore, there are also the GNDK, GNDI and GNDO ground nets. The connection to these power and ground nets is made through special I/O cells, which connect the pad to the internal power and ground nets. Hence, if separate power and ground I/O cells are used to connect each of these nets individually, a minimum of 6 power and ground pads is required in the design. The adopted library also provides power and ground I/O cells (named VCC3IO and GNDIO) that simultaneously supply the VCC3I and VCC3O power nets and the GNDI and GNDO ground nets, thus reducing the minimum number of power and ground pads to 4. However, these cells have a lower current driving capacity and should only be used if the current required by the I/O cells is low.
Other cells with specific functions are also included in the library. Corner cells, for instance, are provided to ensure the continuity of the I/O power rings at the corners of the die. Since a chip can have only inline I/O cells, only staggered I/O cells, or a mixture of inline I/O cells on one of its sides and staggered I/O cells on another, there are three different types of corner cells (inline-only, staggered-only and inline-staggered). There are also empty cells, to be added to the I/O ring, that provide continuity of the well as well as power and ground rails for the I/O. Usually, the I/O power rings would have to be placed by the designer. However, in this library the I/O power rings are already included in the layout of the I/O and empty cells. Therefore, the designer only has to ensure that all I/O cells and empty cells are placed adjacently (abutted).
Figure 4.4: Power rings for I/O buffers and core cells.
Faraday's Standard Cell Library also includes Electrostatic Discharge (ESD) protection circuitry in I/O cells, to prevent an ESD event from damaging the circuit. All I/O cells include these
components to provide current paths for ESD events. According to [17], when this library is used,
the designer only has to make sure that the pads that supply the VCC3I and VCC3O power nets
are connected to the same pin in the package. The same is required for the pads that supply the
GNDI and GNDO ground nets. If the designer uses the VCC3IO and GNDIO cells, this rule can
be ignored, because such a connection is already assured in the I/O cell.
Another special cell included in the library is the filler cell. This cell can be used to fill the empty spaces between standard core cells, in order to provide continuity of the well and, if the designer so chooses, decoupling capacitance. Tie1 and Tie0 cells are also provided to allow nets to be connected to power and ground, respectively. To preserve ESD robustness, it is advisable to connect all nets with fixed logic values to these cells instead of connecting them directly to the power or ground nets (this rule does not apply to I/O cell inputs, which can be directly connected to power or ground) [15].
Because the library was acquired through EUROPRACTICE, it does not include the layout view. As a consequence, the designer has no knowledge of the layout of the cells and memories, which prevents any type of analysis that requires this information.
The adopted standard cell library is designed for the UMC L180 1P6M GII Logic process. However, the process available in the mini@SIC program is the UMC L180 1P6M MM/RFCMOS. The basic difference between these two processes is the thickness of the top-level metal layer: in the GII Logic process the metal 6 layer is 8kÅ (0.8µm) thick, while in the MM/RFCMOS process it is 20kÅ (2µm) thick. Therefore, a different set of topological layout rules applies to the metal 6 layer. Since the layout rules for the thick top-level metal (20kÅ) are stricter, designs that follow the 8kÅ rules, as those implemented with the adopted standard cell library do, will fail the DRC checks of the 20kÅ process whenever metal 6 is used. For this reason, the metal 6 layer is not used for routing in this project, in order to avoid DRC violations.
4.4.1 Memories
The memory devices available in this library include single and dual port SRAM and have the
interfaces shown in figure 4.5. The single-port SRAM, used in the program memory, supports both
word write and byte write operations (the WEB port includes the write enable signals for each of the
word’s bytes). This is particularly useful since the processor’s program memory loading procedure
is done using a one byte interface. Data is input through port DI and stored in the memory
position addressed by A. The write operation is performed in a given byte of the memory word
if the respective byte-write enable signal, in port WEB, is low. Memory read and write operations
are only performed if the CS signal is high. The three-state output buffers are only active if the OE
signal is high.
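The byte-write behaviour described above can be illustrated with a small behavioural model. The Python sketch below is hypothetical (class, method and helper names are not from the Faraday views) and only follows the pin semantics given in the text: CS and OE active high, one active-low enable bit per byte in WEB. It assumes that a read during a write returns the new data and that byte addresses map to byte lanes in little-endian order; the real macro may differ on both points.

```python
from typing import List, Optional


class SinglePortSRAM:
    """Behavioural sketch of a byte-writable single-port SRAM (not the vendor model)."""

    def __init__(self, words: int, bytes_per_word: int) -> None:
        self.bytes_per_word = bytes_per_word
        self.mem = [[0] * bytes_per_word for _ in range(words)]

    def clock(self, cs: int, oe: int, a: int,
              di: List[int], web: List[int]) -> Optional[List[int]]:
        """One CK edge. Returns DO, or None while the three-state outputs are off."""
        if cs:
            for lane in range(self.bytes_per_word):
                if web[lane] == 0:                    # active-low byte write enable
                    self.mem[a][lane] = di[lane] & 0xFF
        # ASSUMPTION: a read during a write returns the new data (write-through).
        return list(self.mem[a]) if cs and oe else None


def load_byte(ram: SinglePortSRAM, byte_addr: int, value: int) -> None:
    """Write one byte through a byte-wide loading interface.

    ASSUMPTION: byte lane = byte_addr % bytes_per_word (little-endian lanes).
    """
    word, lane = divmod(byte_addr, ram.bytes_per_word)
    web = [1] * ram.bytes_per_word
    web[lane] = 0                                     # enable only the target byte
    di = [0] * ram.bytes_per_word
    di[lane] = value
    ram.clock(cs=1, oe=0, a=word, di=di, web=web)


if __name__ == "__main__":
    ram = SinglePortSRAM(words=4, bytes_per_word=2)
    for addr, value in enumerate([0x12, 0x34, 0x56]):  # program loaded byte by byte
        load_byte(ram, addr, value)
    print(ram.clock(cs=1, oe=1, a=0, di=[0, 0], web=[1, 1]))
```

Loading a word one byte at a time only touches the selected lane, which is why the byte-write capability suits the one-byte program loading interface.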
Dual-port SRAMs allow independent read and write access to the memory contents through both ports (port A and port B). Each port has its own clock signal (CKA and CKB). It is up to the designer to ensure that accesses made through both ports maintain data coherence. The dual-port SRAMs used in the macroblock and search area memories have 8-bit (1 byte) words; therefore, the byte-write capability is not used in these two memories. Nevertheless, the generic model of these memories also supports byte-write on both ports through the WEAN and WEBN ports. These memories also have two pairs of data ports: DIA and DIB for data input, and DOA and DOB for data output. Address ports A and B specify the address for each port, while the OEA and OEB signals control the three-state output buffers of each port. The chip select signals CSA and CSB control the operation of each port.
Like the standard cells, these memories are represented in the design through several different views. The memory views are usually generated by a memory compiler, which is capable of generating a predetermined set of memories. When using the EUROPRACTICE services, these memories are generated on request. However, the supplied memories do not include all the views needed by the synthesis tools. This absence does not compromise the resulting circuit, because the memories are defined through a structural description (they are explicitly instantiated in the VHDL code rather than inferred by the synthesis tool), but it impairs the analyses performed by the synthesis tools. For instance, the timing, power and area analyses do not take the memory elements into account.
Figure 4.5: Memory interfaces. (a) Single Port SRAM: CK, CS, OE, A, DI, DO and WEB. (b) Dual Port SRAM: CKA, CKB, CSA, CSB, OEA, OEB, A, B, DIA, DIB, DOA, DOB, WEAN and WEBN.
4.5 Packaging
After being manufactured, the chip has to be encapsulated. Encapsulation protects the silicon die from the environment and provides a mechanically robust interface. Packages for IC encapsulation are available in several materials, pin counts and form factors. Some packages are meant for permanent placement, while others are designed to be connected using sockets and to withstand the mechanical stress of being inserted into and removed from the socket.
At the prototyping stage, the AMEP package should be socket-oriented, because the test platform is unique and, as such, multiple prototypes will have to be tested using a single socket.
Depending on the package manufacturer and product line, the available pin counts may vary
significantly but they are usually available at discrete values. By using the EUROPRACTICE
Packaging service, several ceramic packages are available. Among these, the Ceramic Leadless
Chip Carrier (CLCC) provides a square package with socket connection capability. The available
pin count for this type of package is 44 or 68 pins (in the required range for the AMEP).
The AMEP functional interface requires 35 signal pins. With the additional power and test pins,
the required number of pins in the package will be greater than 44. As a consequence, the CLCC
package with 68 pins was chosen.
The package area where the die is placed is usually square. If the die is also square, bonding should present no difficulties. However, if the die is rectangular, some package pins may not be available for connection, as they may violate the maximum and minimum angles allowed between the bondwire and the package [18]. Since the AMEP die is rectangular (as will be seen in Chapter 7), and due to the adopted pad pitch (the distance between the centers of two adjacent pads), die dimensions and packaging rules, a maximum of 56 pins of the 68-pin CLCC package are available for bonding.
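The package choice can be checked with a simple pin budget. The Python sketch below adds the 35 signal pins stated in the text, the five JTAG pins (TDI, TDO, TMS, TCK, TRST) and the eight power and ground pads of figure 4.6 (two VCCK/GNDK pairs and two VCC3IO/GNDIO pairs); the total of 48 matches the number of pads drawn in that figure. The function names are hypothetical, and the 56-pin bonding limit is taken from the text rather than derived.

```python
SIGNAL_PINS = 35            # signal pins stated in the text
JTAG_PINS = 5               # TDI, TDO, TMS, TCK, TRST
CORE_POWER_PADS = 2 * 2     # two VCCK/GNDK pairs
IO_POWER_PADS = 2 * 2       # two VCC3IO/GNDIO pairs
CLCC_OPTIONS = (44, 68)     # CLCC pin counts offered through EUROPRACTICE
BONDABLE_ON_CLCC68 = 56     # limit imposed by die shape, pad pitch and bonding rules


def required_pads() -> int:
    """Total number of pads that must be bonded."""
    return SIGNAL_PINS + JTAG_PINS + CORE_POWER_PADS + IO_POWER_PADS


def pick_package(required: int, options=CLCC_OPTIONS) -> int:
    """Smallest package pin count that is not below the required pad count."""
    candidates = sorted(n for n in options if n >= required)
    if not candidates:
        raise ValueError(f"no package offers {required} pins")
    return candidates[0]


if __name__ == "__main__":
    need = required_pads()
    pkg = pick_package(need)
    print(f"{need} pads -> CLCC{pkg}, bondable: {need <= BONDABLE_ON_CLCC68}")
```

The 44-pin option is ruled out by the power and JTAG pads alone, and the 48 required pads stay within the 56 bondable positions of the 68-pin package.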
4.6 Pin Positioning
The pad pitch must comply with the minimum requirements of both the technology and the bonding process. The minimum pad pitch required by the technology is 60µm [16], while the pad pitch recommended by EUROPRACTICE for bonding is 90µm [18]. A pad pitch below this recommended value incurs extra costs. Since the design is not I/O limited, the 90µm value is adequate and was adopted as the minimum pitch in this work.
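A rough count of pad slots shows why a 90µm pitch does not constrain this design. The Python sketch below is illustrative only. It assumes the largest two-sub-block die (3240 x 1525 µm) instead of the actual AMEP die given in Chapter 7. It also assumes a corner cell as deep as an inline I/O cell (140.12µm). Finally, it ignores the bondwire angle rules that limit the CLCC 68 package to 56 usable pins. The function names are hypothetical.

```python
import math

MIN_PITCH_TECH_UM = 60.0     # technology minimum [16]
PITCH_UM = 90.0              # EUROPRACTICE recommended pitch [18]
INLINE_IO_WIDTH_UM = 62.62   # Table 4.2
CORNER_UM = 140.12           # ASSUMPTION: corner cell as deep as an inline I/O cell


def pads_per_edge(edge_um: float, pitch_um: float = PITCH_UM,
                  corner_um: float = CORNER_UM) -> int:
    """Pad slots along one die edge, after reserving a corner cell at each end."""
    usable = edge_um - 2 * corner_um
    return max(0, math.floor(usable / pitch_um))


def pitch_is_valid(pitch_um: float) -> bool:
    """For a single row of inline I/O cells: technology minimum and cell width."""
    return pitch_um >= MIN_PITCH_TECH_UM and pitch_um >= INLINE_IO_WIDTH_UM


if __name__ == "__main__":
    w, h = 3240.0, 1525.0    # ASSUMPTION: largest two-sub-block die
    total = 2 * (pads_per_edge(w) + pads_per_edge(h))
    print(f"pitch ok: {pitch_is_valid(PITCH_UM)}, pad slots: {total}")
```

Under these assumptions, the periphery offers roughly 90 pad slots, well above the 48 pads the design needs. This is consistent with the statement that the design is not I/O limited.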
The distribution of the signals through the various pins of the package has an impact on cross-talk effects, IR (voltage) drop inside the chip, clock and signal delays and routing congestion (both inside the chip and on the board). For instance, clock pads have to be placed relatively far from power and ground pads, to avoid interference on these lines.
In order to have a balanced power distribution, two pairs of core power cells (VCCK/GNDK) were used to supply enough current to the core, and these were positioned on opposite sides of the chip. Two pairs of VCC and GND pads for the I/O cells (VCC3IO and GNDIO, which simultaneously supply VCC3I and VCC3O, and GNDI and GNDO) were also added to the design. Figure 4.6 shows the adopted placement of the I/O cells in the AMEP layout.
Figure 4.6: Diagram of I/O cells position. The 48 pads surrounding the chip core are Address[0] to Address[19], Data[0] to Data[7], CLK, rst, en, #oe_we, gnt, reqdone, test_mode, the JTAG pins TDI, TDO, TMS, TCK and TRST, and the power/ground pads VCCK_1, VCCK_2, GNDK_1, GNDK_2, VCC3IO_1, VCC3IO_2, GNDIO_1 and GNDIO_2 (legend: input, output, bidirectional, power/ground).
5 FrontEnd - From Behavioral VHDL code to Verilog netlist
Contents
5.1 Introduction
5.2 Tools
5.3 Workflow
5.1 Introduction
This chapter describes the process of compiling the VHDL source code into a gate-level Verilog netlist mapped onto the selected technology (the frontend stage). Three distinct workflows are used at this stage: a basic flow, which makes no changes to the design structure; a second flow, which inserts scan structures; and a third flow, which inserts JTAG structures. The scan insertion flow is an extension of the basic flow, whereas the JTAG insertion flow is independent and can be performed after either of the other two. The main characteristics and capabilities of the tools used in the frontend are also described.
5.2 Tools
In this work, the Synopsys Inc. software package was used to perform the synthesis of the
HDL code, as well as the insertion of the test structures in the circuit. Although several soft-
ware manufacturers provide synthesis tools, the Synopsys package was chosen because it is the
software with the best support and is the industry’s de facto reference software for synthesis.
The software package is composed of several tools, which are invoked by Design Compiler, the main application of the package, whenever they are needed. This significantly reduces the designer's workload, because all the functions are integrated into a single tool with a single interface. Design Compiler can be accessed either through a command-line interface or through a Graphical User Interface (GUI). Although the command line is useful in certain situations, the GUI is more user-friendly and conveys more information to the designer, which is especially useful while the design is still under development. Two graphical interfaces are available: Design Analyzer and Design Vision. Since Design Vision is the most complete and functional GUI and is the one recommended by Synopsys, it was chosen as the main interface.
The version of the Synopsys tools used in this work is Version Y-2006.06 for Linux (May 25, 2006), which was supplied through the EUROPRACTICE software package.
5.2.1 Design Compiler
Design Compiler is the synthesis and timing analysis application from Synopsys Inc.
Designs described in an HDL are first compiled and mapped onto a set of Generic Technology (GTECH) cells. These GTECH cells are technology-independent cells that describe the functions of certain blocks, which can then be implemented in any technology. For instance, a generic sequential cell could be implemented as a D flip-flop with associated logic to perform a synchronous reset.
Design Compiler allows the designer to set optimization constraints, which guide the tool towards the solution closest to the designer's objectives. Besides these optimization constraints, Design Compiler also uses design rule constraints, such as maximum fanout, maximum capacitance and maximum transition time. The design rule constraints are set in the technology libraries and take precedence over the optimization constraints. Moreover, a designer may override the technology design rules by making them more restrictive. However, setting too many constraints, or setting unrealistic values for them, may have an adverse effect and drive the algorithms to a solution that is far from the designer's objectives, and worse than the one that could be achieved without such tight constraints.
In the compile phase, Design Compiler optimizes the design and translates the GTECH cells
into the target technology cells. It is during this optimization and translation process that the
defined constraints are taken into account. These constraints are used to calculate cost functions
and are prioritized according to table 5.1 [19].
Table 5.1: Cost function default priority.
Constraint (descending priority) | Constraint type
maximum transition time | Design Rule Constraint
maximum fanout | Design Rule Constraint
maximum capacitance | Design Rule Constraint
cell degradation | Design Rule Constraint
maximum delay | Optimization Constraint
minimum delay | Optimization Constraint
maximum power | Optimization Constraint
maximum area | Optimization Constraint
Two cost functions are calculated during the gate-level optimization of the compile phase: the Design Rule Cost Function and the Optimization Constraints Cost Function, which group the design rule constraints and the optimization constraints, respectively. The cost functions are calculated from the differences between the values set for the constraints and the actual values. The objective is to reduce these cost functions to zero. Design Compiler evaluates each component independently, in order of importance, and accepts an optimization step if it decreases the cost of one component without increasing any higher-priority cost [19].
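The acceptance rule just described can be written as a short function. The Python sketch below is a simplified model of that rule, not Design Compiler's internal algorithm; the function name and the tuple representation of costs are assumptions. Costs are listed in the default priority order of Table 5.1, and a step is accepted as soon as one component decreases while every component above it stays the same.

```python
from typing import Sequence

# Cost components in the default priority order of Table 5.1 (highest first).
PRIORITY = ("max_transition", "max_fanout", "max_capacitance", "cell_degradation",
            "max_delay", "min_delay", "max_power", "max_area")


def accept_step(old: Sequence[float], new: Sequence[float]) -> bool:
    """Accept a step that lowers one component without raising any higher one.

    `old` and `new` hold one cost per component, ordered as in PRIORITY.
    """
    for before, after in zip(old, new):
        if after < before:
            return True    # improvement, and no higher-priority cost has risen
        if after > before:
            return False   # a higher-priority cost would increase
    return False           # nothing improved
```

Walking the components in priority order makes this rule equivalent to a lexicographic comparison: for equal-length inputs, `accept_step(old, new)` equals `tuple(new) < tuple(old)`. In particular, a step that reduces the delay violation is accepted even if it increases area, because area has the lowest priority.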
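The design rule cost function is calculated according to equation 5.1:
$$\mathrm{Cost}_{\mathrm{design}} = \sum \Delta_{\text{max transition}} + \sum \Delta_{\text{max fanout}} + \sum \Delta_{\text{max capacitance}} \tag{5.1}$$
The optimization constraints cost function is calculated according to equation 5.2:
$$\mathrm{Cost}_{\mathrm{optimization}} = \sum \Delta_{\text{max delay}} + \sum \Delta_{\text{min delay}} + \sum \Delta_{\text{max power}} + \sum \Delta_{\text{max area}} \tag{5.2}$$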
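Each summation term in equations 5.1 and 5.2 accumulates, over all nets or paths, how far a value misses its constraint. The Python sketch below illustrates this with hypothetical helper names and made-up values. For maximum-type constraints the violation is the excess over the limit, while for the minimum delay it is the shortfall below the limit. As in equation 5.1, the sketch adds terms with different units; how Design Compiler normalizes or weights them internally is not covered here.

```python
from typing import Iterable, Tuple


def delta(actual: float, limit: float, kind: str) -> float:
    """Violation of one constraint by one object (0 when the constraint is met).

    kind "max": actual must not exceed limit (transition, fanout, capacitance,
    max delay, power, area). kind "min": actual must not fall below limit.
    """
    if kind == "max":
        return max(0.0, actual - limit)
    if kind == "min":
        return max(0.0, limit - actual)
    raise ValueError(f"unknown constraint kind: {kind}")


def component(samples: Iterable[Tuple[float, float]], kind: str = "max") -> float:
    """One summation term of equation 5.1 or 5.2, over (actual, limit) pairs."""
    return sum(delta(actual, limit, kind) for actual, limit in samples)


if __name__ == "__main__":
    # Hypothetical (actual, limit) pairs for two nets or paths each.
    transition_ns = [(1.2, 1.0), (0.9, 1.0)]
    fanout = [(12, 10), (8, 10)]
    capacitance_pf = [(0.4, 0.5), (0.7, 0.5)]
    min_delay_ns = [(0.05, 0.10), (0.30, 0.10)]
    cost_design = (component(transition_ns) + component(fanout)
                   + component(capacitance_pf))
    print(f"Cost_design = {cost_design:.2f}")
    print(f"min delay term = {component(min_delay_ns, 'min'):.2f}")
```

The per-component values computed by `component` are exactly what the priority-ordered acceptance rule compares.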
Design Compiler uses the concept of path groups to perform timing-related optimizations and, consequently, to calculate the cost functions. A path group is a set of paths that can be defined either implicitly or explicitly. Path groups are implicitly defined when a clock signal is defined: all the paths between elements clocked by that signal are automatically added to the same path group. A user can explicitly define other path groups, according to his needs. Path groups can be used to guide Design Compiler in performing timing optimizations in circuit regions chosen by the designer. In this work only one clock signal exists and, since there was no need for particular optimizations, no additional path groups were defined.
Among the various cost function components, the maximum delay is particularly important, since it determines a key goal: the maximum operating frequency. The maximum delay cost can be determined using two methods: the Worst Negative Slack Method or the Critical Range Negative Slack Method. The Worst Negative Slack Method takes into account only the delay of the worst violating path in each path group (the critical path). The Critical Range Negative Slack Method takes into account the violating paths of each path group that are within a specified delay margin (referred to as the critical range) of the worst violator [19]. The latter method, although more computationally intensive, has the advantage of optimizing not only the critical path but also the near-critical paths, which might become critical after Place and Route (P&R), since complete timing information is not yet available at this phase.
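The difference between the two methods can be illustrated for a single path group. The Python sketch below is a simplified reading of the description above, not Design Compiler's exact formula; the function names and slack values are hypothetical. The first function counts only the worst violation. The second adds the violations of every negative-slack path whose slack lies within the critical range of the worst one.

```python
from typing import Sequence


def worst_negative_slack_cost(slacks: Sequence[float]) -> float:
    """Only the worst violating path of the group counts."""
    worst = min(slacks)
    return -worst if worst < 0 else 0.0


def critical_range_cost(slacks: Sequence[float], critical_range: float) -> float:
    """Sum of the violations of all paths within `critical_range` of the worst one."""
    worst = min(slacks)
    if worst >= 0:
        return 0.0
    return sum(-s for s in slacks if s < 0 and s <= worst + critical_range)


if __name__ == "__main__":
    group = [-0.5, -0.4, -0.1, 0.2]   # hypothetical slacks (ns) of one path group
    print(worst_negative_slack_cost(group), critical_range_cost(group, 0.2))
```

With a critical range of zero, the second function reduces to the first. A wider range pulls more near-critical paths into the cost, so the optimizer has a reason to improve them as well.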
5.2.2 DFT Compiler
DFT Compiler is responsible for determining the architecture of the scan structures and inserting them into the design. This tool is integrated with Design Compiler, which passes DFT commands to DFT Compiler for processing. Therefore, it provides integrated design-for-test capabilities, including constraint-driven scan insertion during compile. DFT Compiler replaces normal cells with scan cells and interconnects them to form the scan chains. During this process, additional signals and input pins are inserted, to allow the scan chains to be controlled from the primary inputs. DFT Compiler is also responsible for generating the appropriate output files for ATPG and for ATE operation.
Several scan styles are supported by DFT Compiler, namely: the Multiplexed Flip-Flop Scan
Style, the Clocked-Scan Scan Style, the LSSD Scan Style and the Auxiliary-Clock LSSD Scan
Style [7]. The designer may choose among these scan styles the one that best fits his require-
ments and that can be supported by the cells available in the target technology.
DFT Compiler is capable of performing scan chain insertion either on unmapped designs
(from a HDL source), or on mapped designs without scan structures (from a netlist) or on mapped
designs with scan structures (in this case, DFT Compiler only optimizes the netlist). Inserting
scan structures on unmapped designs achieves the best results, since DFT Compiler and Design Compiler can work simultaneously on the same design to perform constraint-driven scan insertion (Synopsys calls this a Test-Ready Compile) [20].
A test protocol must also be created during the DFT Compiler session. The test protocol defines the test signals, their timing and the initialization sequences. It can either be generated automatically, based on the signal definitions given to DFT Compiler, or read from a Standard Test Interface Language (STIL) file. Test initialization sequences are patterns that must be applied sequentially to a circuit's inputs to place it in test mode. These initialization sequences must be given to DFT Compiler when the design includes custom test structures that are already defined in the source files and that may have relatively complex or unusual ways of entering test mode. If the design requires a test initialization sequence, it has to be described in the STIL file, since DFT Compiler does not support this type of definition through its internal commands.
Using the defined test protocol, DFT Compiler performs a DFT DRC analysis to determine which test rule violations, if any, occur. DFT DRC checks for violations that prevent scan insertion or data capture, or that reduce fault coverage. For instance, an uncontrollable clock or an uncontrollable asynchronous control signal of a flip-flop prevents that flip-flop from being included in a scan chain. If the asynchronous control signals of a given flip-flop are asserted during the test procedure, that flip-flop is also prevented from being inserted in the scan chain. A data capture violation is reported if the clock signal drives the data input or more than one input pin of the same flip-flop, or if a black box component drives the clock or an asynchronous control signal of a flip-flop. A data capture violation is also reported if three-state bus contention occurs [20]. The use of black boxes in the design, as is the case of the processor memory blocks, reduces fault coverage, since the outputs of such blocks cannot be determined by DFT Compiler.
Violations present in the design should be corrected in the HDL description. Nevertheless, DFT Compiler can automatically correct some of them using a feature called AutoFix, which fixes scan rule violations associated with uncontrollable clocks, uncontrollable asynchronous set and reset signals, and three-state signals. AutoFix is able to fix violations in LSSD and Multiplexed Flip-Flop Scan Style designs [20]. It adds multiplexers to the inputs of the violating flip-flops so that these inputs can be controlled during test mode. Besides the signal needed to control these multiplexers (the test mode signal), AutoFix may add other signals and ports to the design.
To control the scan chain operation, scan enable and test mode signals are used. The scan enable signal controls the multiplexer of each multiplexed scan flip-flop in the design, selecting its data input between the regular circuit connection (in normal mode) and the output of the previous flip-flop in the serial scan chain (in serial shift mode). The test mode signal keeps all the flip-flops' asynchronous control signals (reset, preset) deasserted during the test procedure. This is necessary because, if these control signals are generated by internal logic, they may be asserted during test, which makes the generation of test vectors more complex and may even impair the test procedure. The scan enable signal is active in scan (serial shift) mode, while the test mode signal is active during the entire test procedure.
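The roles of the scan enable and test mode signals can be seen in a cycle-level model of a multiplexed flip-flop scan chain. The Python sketch below is hypothetical (the class and argument names are not DFT Compiler output). For simplicity, it evaluates the asynchronous reset at the clock edge. When scan_enable is 1, each flop loads its predecessor's output. When it is 0, each flop captures its functional input. When test_mode is 1, an internally generated reset is ignored.

```python
from typing import List


class MuxScanChain:
    """Flip-flops with a 2:1 mux on D, chained through their scan inputs (sketch)."""

    def __init__(self, length: int) -> None:
        self.q: List[int] = [0] * length   # q[0] is fed by scan_in, q[-1] drives scan_out

    def clock(self, scan_enable: int, scan_in: int, d: List[int],
              test_mode: int = 0, internal_reset: int = 0) -> int:
        """Apply one clock edge and return scan_out as seen before the edge."""
        scan_out = self.q[-1]
        if internal_reset and not test_mode:
            self.q = [0] * len(self.q)         # internal reset clears every flop
        elif scan_enable:
            self.q = [scan_in] + self.q[:-1]   # shift: each flop takes its predecessor's Q
        else:
            self.q = list(d)                   # capture: each flop takes its functional D
        return scan_out


if __name__ == "__main__":
    chain = MuxScanChain(3)
    for bit in (1, 0, 1):                      # scan load
        chain.clock(scan_enable=1, scan_in=bit, d=[0, 0, 0])
    chain.clock(scan_enable=0, scan_in=0, d=[0, 1, 1])   # capture the circuit response
    unloaded = [chain.clock(scan_enable=1, scan_in=0, d=[0, 0, 0]) for _ in range(3)]
    print(unloaded)                            # last flop comes out first
```

A full test cycle is therefore scan load, one capture clock, and scan unload. The test_mode gating ensures that internal logic cannot reset the chain halfway through a shift.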
5.2.3 BSD Compiler
BSD Compiler is responsible for implementing the IEEE 1149.1 standard and for verifying that the design complies with it. It is able to insert the boundary scan cells and the TAP controller, and to produce the files needed to make the device interoperable with IEEE 1149.1 compliant test equipment. This tool is also integrated into the Design Compiler synthesis environment.
BSD Compiler requires the definition of a top level design in which all inputs to the core logic
and the inputs for IEEE 1149.1 functionality are defined and have I/O pad cells associated with
them. Only the inputs to the core logic should be connected to the core design. Figure 5.1 shows
the required interface [21].
Figure 5.1: Top level design structure required by BSD Compiler. The Top Level Design contains the Core Design; the core inputs i1 to in, the core outputs o1 to on and the scan signals test_si, test_se and test_so pass through the top level, which also holds the TAP pins TDI, TMS, TRST, TCK and TDO.
According to the IEEE 1149.1 standard, the TMS, TRST and TDI input lines must behave as if a logic 1 were applied to them when they are left undriven. In this work, this is accomplished by using pull-up resistors, which are enabled by configuring the respective PIOS cells available in the library.
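The reason an undriven TMS must read as 1 becomes clear from the TAP controller state machine defined by IEEE 1149.1. With TMS held at 1, five TCK rising edges bring the controller to Test-Logic-Reset from any state. A floating, pulled-up TMS therefore keeps the boundary scan logic out of the way of normal operation. The Python sketch below encodes the standard's 16-state transition table. It is an illustration of the standard, not the controller generated by BSD Compiler, and the names `TAP_NEXT` and `run` are hypothetical.

```python
from typing import Iterable

# IEEE 1149.1 TAP controller: state -> (next state if TMS=0, next state if TMS=1)
TAP_NEXT = {
    "Test-Logic-Reset": ("Run-Test/Idle", "Test-Logic-Reset"),
    "Run-Test/Idle": ("Run-Test/Idle", "Select-DR-Scan"),
    "Select-DR-Scan": ("Capture-DR", "Select-IR-Scan"),
    "Capture-DR": ("Shift-DR", "Exit1-DR"),
    "Shift-DR": ("Shift-DR", "Exit1-DR"),
    "Exit1-DR": ("Pause-DR", "Update-DR"),
    "Pause-DR": ("Pause-DR", "Exit2-DR"),
    "Exit2-DR": ("Shift-DR", "Update-DR"),
    "Update-DR": ("Run-Test/Idle", "Select-DR-Scan"),
    "Select-IR-Scan": ("Capture-IR", "Test-Logic-Reset"),
    "Capture-IR": ("Shift-IR", "Exit1-IR"),
    "Shift-IR": ("Shift-IR", "Exit1-IR"),
    "Exit1-IR": ("Pause-IR", "Update-IR"),
    "Pause-IR": ("Pause-IR", "Exit2-IR"),
    "Exit2-IR": ("Shift-IR", "Update-IR"),
    "Update-IR": ("Run-Test/Idle", "Select-DR-Scan"),
}


def run(state: str, tms_bits: Iterable[int]) -> str:
    """Apply one TCK rising edge per TMS bit and return the final state."""
    for tms in tms_bits:
        state = TAP_NEXT[state][tms]
    return state


if __name__ == "__main__":
    # A TMS pulled up to 1 reaches Test-Logic-Reset within five TCKs from any state.
    assert all(run(s, [1] * 5) == "Test-Logic-Reset" for s in TAP_NEXT)
    print("TMS held at 1 for 5 TCKs always resets the TAP")
```

The same table also shows how a test sequence reaches the data register: from Run-Test/Idle, the TMS sequence 1, 0, 0 ends in Shift-DR.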
BSD Compiler is also responsible for generating the Boundary Scan Description Language (BSDL) file, which contains: the nature of each pin in the design (input, output or bidirectional); the logical correspondence between signal names and physical pins; the identification of the pins used by the IEEE 1149.1 TAP interface; the description of the instruction register; the implemented instructions, their opcodes and the data shift register accessed by each instruction; and a description of all BSR cells and their functionality (e.g. observe-only, observe-and-control).
The test vectors for the boundary scan logic and the TAP controller are generated by BSD Compiler and can then be simulated by TetraMAX. These vectors are generated by BSD Compiler, instead of using the ATPG capabilities of TetraMAX, because BSD Compiler has architectural knowledge of the inserted logic that TetraMAX lacks. Therefore, BSD Compiler can generate the test vectors without resorting to generic algorithms, which would require more computational effort and the definition of complex initialization patterns.
5.2.4 TetraMAX
TetraMAX is the ATPG tool from Synopsys. It is capable of generating test patterns that maximize fault/test coverage using a minimum number of test vectors, for various design types and flows. Functional and stuck-at testing are the traditional circuit testing methods. Functional testing exercises the device as it would actually be used in the target application. However, this type of testing has only a limited ability to test the integrity of the device's internal nodes. With scan testing, the sequential elements of the device are connected into chains and used as primary inputs and primary outputs for testing purposes. By using ATPG techniques, a much larger number of internal faults can be tested than with functional testing alone [6].
This tool has three different ATPG modes: Basic-Scan ATPG, Fast-Sequential ATPG and Full-Sequential ATPG. In Basic-Scan mode, TetraMAX works as a full-scan, combinational-only ATPG tool; in this mode, all sequential elements have to be included in a scan chain in order to achieve high fault coverage. Fast-Sequential mode provides limited support for partial-scan designs (designs where not all sequential elements belong to scan chains). This mode allows multiple capture procedures (clock transitions) between scan load and scan unload, allowing data to be propagated through nonscan sequential elements such as nonscan flip-flops and Random Access Memories (RAMs). In this case, all clock and reset signals of these nonscan elements must be controllable from a primary input. Full-Sequential ATPG is similar to Fast-Sequential ATPG, although in this case the clock and reset signals of the nonscan sequential elements do not need to be controlled from a primary input [6].
TetraMAX is capable of generating test patterns for the following fault models: SSF, IDDQ, transition delay, path delay and bridging. Among these, SSF is the fault model adopted in this work. The only files required for generating test vectors are the design netlist and the models (described in Verilog) of the cells used. For complex designs, TetraMAX also requires a test protocol file, which gives specific information about the test structures and how to use them properly [6]. This file contains test initialization procedures, capture procedures, shift procedures and others that allow TetraMAX to set proper values at the inputs of the test structures in order to use them effectively.
For all designs, TetraMAX needs information identifying the clock ports, the asynchronous set and reset ports, the scan chain inputs and outputs, and the ports that place the design in test mode, enable the shifting of the scan chains and globally control bidirectional drivers, together with their active states [6]. In simple designs, some of this information can be given directly using TetraMAX commands. In more complex designs, a STIL test protocol file that includes all the necessary information must be provided.
TetraMAX also performs design checks that verify, among other aspects, the connection of the scan chain inputs and outputs, whether all clocks and asynchronous set and reset signals connected to scan chain flip-flops are controlled only by primary input ports, and whether any internal multiple-driver net can be in contention.
5.3 Workflow
5.3.1 Basic workflow
The basic workflow that was implemented in this work using the Synopsys Design Compiler tool is outlined in figure 5.2. The figure includes the commands used to accomplish each of the steps. These commands are part of the script presented in section B.1.2.
[Flowchart: Develop HDL files → Specify libraries (link_library, target_library, symbol_library, synthetic_library) → Read design (analyze, elaborate) → Define design environment (set_operating_conditions, set_wire_load_model) → Set design constraints (create_clock, set_clock_uncertainty, set_max_dynamic_power) → Select compile strategy (top-down) → Optimize/map design (compile) → Analyze and resolve design problems (check_design, report_area, report_constraint, report_timing) → Save the design database (write)]
Figure 5.2: Synopsys Basic Workflow.
Before starting the synthesis process, the designer has to make sure that all the necessary libraries are available. These libraries are files that describe the standard cells that can be used during the mapping process, as well as their characteristics. The link and target libraries are technology libraries that define the set of cells and related information, such as cell names, cell pin names, delay arcs, pin loading, design rules and operating conditions [19]. The symbol library contains the symbols for schematic viewing of the design and must be present if the GUI, Design Vision, is to be used [19]. The location of all these libraries can be set either in the configuration files or from the command line. In this work, the libraries were set in the configuration files, because they remain the same throughout the implementation flow.
After the libraries were defined, the design was read into the Design Compiler work environment using HDL Compiler, which interprets the VHDL description of the circuit and converts it into a logic gate description based on generic GTECH cells. These stages correspond to the “analyze” and “elaborate” commands in the workflow.
The operating conditions and wire load model are set in the next stages. Several operating conditions, representing different process corners, are defined in the libraries. Normally, the Typical, Best and Worst conditions are defined. The Best conditions setting is used to check for hold violations, while the Worst conditions setting is used to check for setup violations. The Best and Worst conditions were used simultaneously during synthesis, so that the synthesis tool could perform timing analysis and cell selection based on the most unfavorable conditions.
The wire load model is an estimate of the characteristics (area, capacitance and resistance) of the interconnecting nets after routing. Since no information about the interconnection nets is available at this stage, this estimate is necessary to assess the delays and perform timing analysis of the design before Place and Route (P&R). These wire load models are predefined in the libraries. The choice of a specific wire load model is based on the designer's perception of the design's interconnection characteristics after routing. After P&R, it is possible to validate this choice and, if it produced suboptimal results, select another wire load model and perform a new iteration. The chosen wire load model influences the synthesized circuit, since there are several available cells with the same logic function but different characteristics (area, drive strength, propagation delay, power consumption and input capacitance) that the synthesis tool can select to achieve its goals (an area goal, a timing goal or both). If the chosen wire load model represents interconnections with small resistance and capacitance values, the synthesis tool will choose cells with lower drive capability, because they still meet a given timing constraint and occupy a smaller area. However, if after P&R the interconnections have more resistance and capacitance than the values in the wire load model, the timing constraints may be violated. On the other hand, if the wire load model represents interconnections with high resistance and capacitance values, the synthesis tool may choose cells with high drive capability that will unnecessarily occupy more area and consume more power. In this work, the model named “G30K” was chosen. “G30K” is a mid-range model that seems adequate for a design with the characteristics of the AMEP.
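To make the trade-off above concrete, the following sketch estimates a net's parasitics from a wire load table and picks the smallest cell that meets a delay budget. All numbers, the two model names and the BUFX cell names are hypothetical and are not taken from the “G30K” model or the Faraday library; the delay is a simple Elmore-style estimate, not the synthesis tool's delay calculator.

```python
# Hypothetical wire load models (NOT the library's "G30K" data): estimated net
# length per fanout, extrapolation slope, and per-unit-length R (kOhm) and C (pF).
WIRE_LOAD_MODELS = {
    "light": {"lengths": {1: 10, 2: 20, 3: 30, 4: 40}, "slope": 10,
              "res_per_unit": 0.001, "cap_per_unit": 0.002},
    "heavy": {"lengths": {1: 40, 2: 80, 3: 120, 4: 160}, "slope": 40,
              "res_per_unit": 0.001, "cap_per_unit": 0.002},
}

# Hypothetical buffer cells: (name, drive resistance in kOhm, area in arbitrary units).
CELLS = [("BUFX1", 2.0, 1.0), ("BUFX2", 1.0, 1.5), ("BUFX4", 0.5, 2.5)]
PIN_CAP = 0.01  # assumed input capacitance (pF) of each driven pin


def net_length(model, fanout):
    """Look up the estimated net length; extrapolate linearly past the table."""
    table = model["lengths"]
    if fanout in table:
        return table[fanout]
    top = max(table)
    if fanout > top:
        return table[top] + model["slope"] * (fanout - top)
    raise ValueError("fanout must be a positive integer")


def net_delay(model_name, fanout, drive_res):
    """Elmore-style delay estimate in ns (kOhm * pF = ns)."""
    model = WIRE_LOAD_MODELS[model_name]
    length = net_length(model, fanout)
    r_wire = length * model["res_per_unit"]
    c_wire = length * model["cap_per_unit"]
    c_pins = fanout * PIN_CAP
    return drive_res * (c_wire + c_pins) + r_wire * (c_wire / 2 + c_pins)


def select_cell(model_name, fanout, budget_ns):
    """Return the smallest-area cell whose estimated delay meets the budget."""
    for name, drive_res, _area in sorted(CELLS, key=lambda cell: cell[2]):
        if net_delay(model_name, fanout, drive_res) <= budget_ns:
            return name
    return None  # no cell meets the budget: a timing violation
```

With a 0.3 ns budget and a fanout of 4, `select_cell("light", 4, 0.3)` returns the small `BUFX1`, while `select_cell("heavy", 4, 0.3)` needs the larger `BUFX4`. This mirrors the argument in the text: an optimistic model favors small cells that may later fail timing, while a pessimistic one spends area and power on stronger drivers.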
The next stage in the design flow is to set the constraints, using the appropriate commands. Since there is no need to tighten the design rule constraints already defined in the libraries, only optimization constraints were set in this work. Among the several available constraints, only the clock, clock uncertainty and maximum power constraints were set. The clock constraint specifies the clock period of the circuit, while the clock uncertainty specifies the allowed clock skew. These constraints are only guidelines for the synthesis tool, and compliance with them must be verified in subsequent steps. Note that a wire load model is used instead of the real wire characteristics. Consequently, if the synthesis tool reports that the design meets the timing constraint, this does not necessarily mean that the constraints are still met in the final circuit, after P&R. Therefore, further checks are made during and after P&R.
The value set for the clock period constraint (10 ns) was based on the maximum clock frequency obtained in previous synthesis results for the FPGA implementation of the circuit and on the required characteristics of the AMEP [2]. The final maximum power constraint (9.5 mW) was set after a few iterations that revealed approximate values for the circuit's power consumption. It is worth noting that this power estimate does not take the memory blocks into account, because no views of them were available to the synthesis tool. Moreover, after an initial iteration, it was verified that the occupied area (1.20 mm²) was much smaller than the available area (5.2 mm²). Consequently, since unnecessary constraints should be avoided, no maximum area constraint was set.
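The arithmetic behind these constraints can be checked with a few lines. The 10 ns period and the two area figures come from the text above; the clock uncertainty, setup time and path delay used in the slack example are hypothetical values chosen only to show how the uncertainty eats into the available period.

```python
CLOCK_PERIOD_NS = 10.0     # clock period constraint used in this work
CORE_AREA_MM2 = 1.20       # area after the initial synthesis iteration
AVAILABLE_AREA_MM2 = 5.2   # area available for the design


def max_frequency_mhz(period_ns):
    """Clock frequency corresponding to a period given in ns."""
    return 1000.0 / period_ns


def setup_slack_ns(period_ns, uncertainty_ns, setup_ns, path_delay_ns):
    """Setup slack of a register-to-register path; negative means a violation."""
    return period_ns - uncertainty_ns - setup_ns - path_delay_ns


utilization = CORE_AREA_MM2 / AVAILABLE_AREA_MM2  # about 0.23 (23 %)
```

`max_frequency_mhz(CLOCK_PERIOD_NS)` gives 100 MHz, and the occupied area is roughly 23 % of the available area, which supports leaving the area unconstrained. With an assumed 0.5 ns uncertainty, 0.2 ns setup time and an 8.9 ns path, `setup_slack_ns(10.0, 0.5, 0.2, 8.9)` leaves about 0.4 ns of slack.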
The top-down compile strategy was chosen because the design constraints are set at the top level, as a global objective. The various sub-blocks of the design are then synthesized and optimized automatically, so as to guarantee that the global constraints are met. This reduced the workload while still achieving the necessary goals. Alternatively, it is possible to set individual constraints and compile each of the sub-blocks separately; the design would then be assembled from the already compiled sub-blocks.
According to the basic synthesis workflow shown in figure 5.2, the design is compiled in the next stage. During this phase, the design is mapped and optimized according to the defined constraints. If, after compilation, the design does not meet the specified constraints, new constraints should be set or the design should be changed. After compilation, the design is saved and exported into the formats required by the back-end tools. In this basic flow, the design would be saved in a proprietary format (DDC) and also exported as a Verilog netlist, while the timing constraints would be saved in an SDC format file, to be later imported into the P&R tool.
5.3.2 Workflow with insertion of scan chains
The basic flow, described in section 5.3.1, does not account for test structures. Since the implemented design requires test structures that are not defined in the VHDL source, the previous workflow (shown in figure 5.2) must be extended to insert and configure these test structures. The workflow that was followed in this work to insert the scan chains is outlined in figure 5.3.
[Flowchart: Develop HDL files → Read design (analyze, elaborate) → Define design environment (set_operating_conditions, set_wire_load_model) → Set design constraints (create_clock, set_clock_uncertainty, set_max_dynamic_power) → Set scan style (set_scan_configuration) → Define clocks and asyncs (set_dft_signal) → Check design rules (create_test_protocol, dft_drc; correct problems) → Select compile strategy (top-down) → Run Test-Ready compile (compile -scan) → Check constraints (report_constraint; adjust constraints or compile strategy) → Check design rules (create_test_protocol, dft_drc; correct problems) → Set scan configuration (set_scan_configuration, set_scan_path) → Preview scan chains (preview_dft; adjust scan configuration) → Build scan chains (insert_dft) → Check design rules (dft_drc; correct problems) → Check constraints (report_constraint; adjust constraints or compile strategy) → Save testable design → Optimized netlist with scan]
Figure 5.3: Synopsys Workflow with scan structures.
Synopsys tools support various workflows for inserting scan structures, depending on the initial state of a design. The workflow outlined in this work follows the Unmapped Design Flow [20], because the initial design is read from a VHDL description without any defined scan structures, which are then inserted along with the basic design flow. The commands shown in this figure can be found in the script presented in section B.1.2.
This flow starts by defining the libraries, reading the design files, defining the design environment and setting the design constraints, just like the basic flow. Afterwards, the scan style must be defined. In this work, the chosen scan style was the multiplexed flip-flop scan style, as this is the only scan style supported by the cells of the selected target technology. Next, the primary input and primary output pins used by the scan structures were set. This directs DFT Compiler to use specific pins for the test signals. In this work, the pins used by the test structures belong to I/O cells that were instantiated in the VHDL code. DFT Compiler must therefore be instructed to use these internal pins (hookup pins) as the sources of the test signals; otherwise, it would create additional ports in the top-level design. For the test structures implemented in this work, the signals that have to be specified are the scan clock (which is also the system clock), the scan enable signals (test_se1 and test_se2), the test mode signal (test_mode), and the scan in (test_si1 and test_si2) and scan out (test_so1 and test_so2) signals. Any set or reset signals should also be specified, so that Design Compiler is aware of their function (in this case, only the reset signal, rst, was defined).
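The behaviour of the multiplexed flip-flop scan style can be illustrated with a small behavioural model. This is a sketch of a single generic chain, not of the AMEP netlist: `MuxDScanChain`, `load` and `unload` are hypothetical names, and the circuit in this work has two such chains, each with its own scan enable (test_se1 and test_se2).

```python
class MuxDScanChain:
    """Behavioural model of a chain of multiplexed (mux-D) scan flip-flops.

    With scan_enable = 1 every flip-flop loads the output of its predecessor
    (flip-flop 0 loads scan_in); with scan_enable = 0 each one loads its
    functional D input (capture).
    """

    def __init__(self, length):
        self.q = [0] * length

    def clock(self, scan_enable, scan_in=0, d=None):
        """Apply one clock edge and return the scan_out value seen before it."""
        scan_out = self.q[-1]
        if scan_enable:
            self.q = [scan_in] + self.q[:-1]
        else:
            if d is None or len(d) != len(self.q):
                raise ValueError("capture needs one D value per flip-flop")
            self.q = list(d)
        return scan_out


def load(chain, pattern):
    """Shift a pattern in so that flip-flop i ends up holding pattern[i]."""
    for bit in reversed(pattern):
        chain.clock(scan_enable=1, scan_in=bit)


def unload(chain):
    """Shift the chain contents out, returned in flip-flop order."""
    out = [chain.clock(scan_enable=1) for _ in range(len(chain.q))]
    return list(reversed(out))
```

A test then consists of loading a pattern, applying one capture clock with scan enable low, and unloading the response. For example, with a toy combinational block that simply inverts each flip-flop's value, loading `[1, 0, 1, 1]`, capturing with `d=[1 - b for b in chain.q]` and unloading yields `[0, 1, 0, 0]`.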
The next step is the creation of a test protocol. Using the previously defined signals, DFT Compiler automatically generates the test protocol. Since this circuit does not have any special test initialization sequence, the automatically generated test protocol is sufficient. After the test protocol was created, the DFT DRC was performed. At this stage, if the design presents violations, these should be carefully analyzed to determine whether they can or should be corrected in order to improve test coverage. If such corrections are needed, the HDL source files have to be edited or, if the violations can be corrected automatically, the AutoFix feature has to be enabled. At this stage, the design presented a few DFT DRC violations (uncontrollable flip-flop reset lines and clock signals feeding data inputs). The violations involving the uncontrollable reset signals of the flip-flops can be fixed using the AutoFix feature. Since the remaining violations would have required changing the processor description, were of minor importance and had a small impact on the fault coverage, they were not corrected.
At this point, the design is ready to be compiled. The compile strategy is the same as in the basic workflow (a top-down compile strategy). Test-Ready Compilation is done using an additional switch in the compile command (-scan), as shown in figure 5.3 and in section B.1.2. After compilation, the design already contains scan cells, but they are not yet interconnected to form the scan chain shift registers; the flip-flops are connected into scan chains at a later stage. At this stage, it was verified that the imposed constraints were still met. Had they not been met, either the set of constraints or the compile strategy would have to be changed before a new compile iteration.
The test protocol was regenerated at this point and the DFT DRC was performed once again, in order to check whether any additional violations had appeared due to the compilation process. If no violations are reported, or if the reported violations are considered of minor importance, the next step in the workflow is the configuration of the scan chains. If there are major violations, the test signals or their timing must be redefined and a new test protocol generated. In this work, the violations present at this point were the same as those found in the previous DFT DRC, since AutoFix had not yet been applied.
The configuration of the scan chains involves determining how many scan chains should exist, which scan elements (flip-flops) belong to each of them and which signals control each chain. The implemented circuit requires one independent scan chain formed by the error address registers of the three memory BIST controllers (scan chain 1). This chain allows the addresses of possibly faulty memory positions to be extracted without interfering with the rest of the circuit. A second scan chain was built with the remaining flip-flops (scan chain 2). The resulting main scan chain has about 750 flip-flops, while the chain used to extract the values from the memory BIST controllers has 30 flip-flops. Usually, scan chains should be balanced (have the same number of flip-flops) to minimize test time. This is not the case in the implemented circuit, because one of the chains has a very specific purpose (address extraction without interfering with the remaining circuit), which restricts the number of flip-flops it contains.
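The cost of the imbalance can be estimated with the usual first-order model of scan test time: all chains shift in parallel, so each load/unload takes as many clock cycles as the longest chain, and the unload of one pattern overlaps the load of the next. The chain lengths below are the ones reported above; the 1000-pattern count is a hypothetical figure, not a TetraMAX result for the AMEP.

```python
def scan_test_cycles(num_patterns, chain_lengths):
    """First-order estimate of the clock cycles needed to apply a scan test set.

    Every pattern needs one load (overlapped with the previous unload) plus one
    capture cycle, and a final unload is needed after the last pattern.
    """
    longest = max(chain_lengths)
    return (num_patterns + 1) * longest + num_patterns


actual = scan_test_cycles(1000, [750, 30])      # AMEP-like unbalanced chains
balanced = scan_test_cycles(1000, [390, 390])   # same 780 flip-flops, balanced
```

Under these assumptions the unbalanced configuration needs 751 750 cycles against 391 390 for a balanced split, roughly twice the shift time. In the AMEP this cost is accepted because scan chain 1 must remain dedicated to memory BIST address extraction.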
Since the previewed scan chains had the intended configuration, they were then implemented in the circuit. The scan chains are implemented by connecting the scan flip-flops that were inserted into the design at compile time. At this stage, all the multiplexers and logic elements that AutoFix found necessary to resolve violations were also added. After the scan chains were built, the result was an optimized netlist that represents the circuit described in the HDL source files and also includes the test structures that provide better test coverage.
A final DFT DRC was then performed to ensure that there were no violations or that any remaining violations were tolerable. In this work, besides the violations of clock signals feeding data inputs, there were violations caused by the enable signal of the output three-state buffers being affected by the value of a scan chain element. This last violation is also tolerable, since the circuit will not be tested while other devices are connected to the bidirectional bus. A final check was also made to ensure that the design still complies with the imposed constraints. If the resulting design had violations or did not meet the constraints, new iterations would have to be performed.
Finally, the design was saved (in DDC format) along with its test protocol (in STIL format). The output netlist was written in Verilog so that it could be imported into the P&R tool.
After the insertion of the scan structures, the AMEP has the interface shown in figure 5.4. The test_mode signal is used to force the circuit into test mode, which allows the reset inputs of the flip-flops to be controlled from a primary input. The test_se1 and test_se2 inputs control the operation of scan chain 1 (for memory BIST address extraction) and scan chain 2, respectively. Data is input to the respective scan chains through the test_si1 and test_si2 ports, and output through the test_so1 and test_so2 ports.
[Block diagram: AMEP with ports data (8 bits), addr (20 bits), #oe_we, done, req, gnt, clk, en, rst, ram_bisten (2 bits), test_se1, test_se2, test_mode, test_si1, test_si2, test_so1 and test_so2.]
Figure 5.4: AMEP interface after inserting scan chains.
5.3.3 Workflow with JTAG insertion
The workflow to insert the JTAG interface can either begin at the stage where the gate-level netlist of the design is read or continue from the stage where the scan insertion workflow ended, as in this implementation. Figure 5.5 shows the workflow that was followed, containing only the steps performed by BSD Compiler.
[Flowchart: Read design netlist or continue previous workflow → Set boundary scan specifications (set_dft_signal, set_bsd_instruction, set_scan_path) → Set design constraints (create_clock, set_clock_uncertainty, set_max_dynamic_power) → Read pin map (read_pin_map) → Preview boundary scan (preview_dft) → Insert boundary scan logic (insert_dft) → Generate gate-level netlist (write) → Generate BSDL file (write_bsdl) → Generate BSD patterns (create_bsd_patterns)]
Figure 5.5: Synopsys JTAG Workflow.
The insertion of the JTAG logic is done by Synopsys BSD Compiler and requires a special top-level design, as mentioned earlier. The interface used is shown in figure 5.6. For this purpose, a new VHDL entity was defined. In this entity, the I/O cells for the JTAG tdi, tdo, tms, tck and trst signals were instantiated. The core logic, which already contains its I/O cells, was also instantiated and properly connected.
[Block diagram: a top-level design instantiating the AMEP core. The top level has the JTAG ports TDI, TDO, TMS, TCK and TRST, and the ports CLK, EN, RST, GNT, REQ, DONE, #OE_WE, ADDR (20 bits), DATA (8 bits) and test_mode. The core's test_se1, test_se2, test_si1, test_si2, test_so1, test_so2 and ram_bisten (2 bits) pins are connected internally.]
Figure 5.6: AMEP interface for JTAG insertion by BSD Compiler.
With the JTAG interface, the memory BIST enable signals and the test structure control signals can be driven through the TAP controller, with the exception of the test_mode signal. Therefore, the I/O cells for the ram_bisten, test_se1, test_se2, test_si1, test_si2, test_so1 and test_so2 signals are not needed, as these signals will be driven by the TAP controller logic or their data will be input and output through the TDI and TDO ports of the JTAG interface. As can be seen in the script in section B.1.2, these cells are removed and the resulting open nets are connected using commands available in Design Compiler.
After setting the operating conditions and the design constraints, and defining the existing clock signals of the core logic, the ports associated with each of the JTAG interface signals were defined. It is important to define the clock signals of the core logic, because the boundary scan cell for a clock pin should be an observe-only cell; if a clock signal is not specified, BSD Compiler will place a control-and-observe cell on that clock input.
In order to properly connect the Boundary Scan Register (BSR) cells, BSD Compiler needs the pin mapping used after packaging, so that it can connect adjacent scan cells. This is defined in a pin mapping file. The mapping must match the pin layout order used in the P&R phase.
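The two ideas above, ordering the BSR by package pin and choosing an observe-only cell for clock inputs, can be sketched as follows. The pin numbers are hypothetical (they are not the AMEP pin-out), only a handful of ports are listed, and the sketch ignores the extra control cells that bidirectional and three-state pins require.

```python
# Hypothetical pin map: package pin number -> top-level port (not the real AMEP pin-out).
PIN_MAP = {3: "REQ", 1: "CLK", 2: "RST", 4: "GNT", 5: "DONE"}
CLOCK_PORTS = {"CLK"}  # ports declared as clocks before boundary scan insertion


def boundary_scan_chain(pin_map, clock_ports):
    """Order the BSR cells by package pin number and choose each cell's type."""
    chain = []
    for pin in sorted(pin_map):
        port = pin_map[pin]
        kind = "observe-only" if port in clock_ports else "control-and-observe"
        chain.append((pin, port, kind))
    return chain
```

Here `boundary_scan_chain(PIN_MAP, CLOCK_PORTS)` places the CLK cell first, as an observe-only cell, followed by RST, REQ, GNT and DONE in pin order. If "CLK" were missing from `CLOCK_PORTS`, it would receive a control-and-observe cell, which is the situation the text warns against.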
The TAP controller is configured in the next step. In this circuit, a 4-bit instruction register was used, with a binary instruction format, which gives a total of 16 possible instructions. Among these, the mandatory instructions EXTEST, SAMPLE and PRELOAD must have their opcodes defined. The mandatory BYPASS instruction has an opcode in which all instruction register bits are at a high logic value (logic 1). The IEEE 1149.1 standard also specifies other, optional instructions. This design implements the HIGHZ and IDCODE instructions defined in the standard. The HIGHZ instruction is quite useful, since this IC has a three-state bidirectional bus that may be connected to a shared system bus: this instruction places the output drivers of the IC in high impedance, allowing other devices connected to the same bus to be tested. The IDCODE instruction allows the device to be identified in a larger system and its current version to be checked. This design also implements additional user-defined instructions to control the memory BIST controllers. These are SELECTSAMEM (for the search area memory), SELECTMBMEM (for the macroblock memory) and SELECTINSTMEM (for the instruction memory), which drive the signals needed to enable the respective memory BIST controller without requiring dedicated package pins.
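To show how an instruction reaches this 4-bit register, the sketch below models the 16-state TAP controller defined by IEEE 1149.1 and shifts an opcode in through TDI. Only the BYPASS opcode (all ones) is fixed by the text; the other opcode used in the example is arbitrary, and TDO, the data registers and the actual instruction decoding are not modelled.

```python
# IEEE 1149.1 TAP controller: state -> (next state if TMS=0, next state if TMS=1).
TAP_NEXT = {
    "Test-Logic-Reset": ("Run-Test/Idle", "Test-Logic-Reset"),
    "Run-Test/Idle": ("Run-Test/Idle", "Select-DR-Scan"),
    "Select-DR-Scan": ("Capture-DR", "Select-IR-Scan"),
    "Capture-DR": ("Shift-DR", "Exit1-DR"),
    "Shift-DR": ("Shift-DR", "Exit1-DR"),
    "Exit1-DR": ("Pause-DR", "Update-DR"),
    "Pause-DR": ("Pause-DR", "Exit2-DR"),
    "Exit2-DR": ("Shift-DR", "Update-DR"),
    "Update-DR": ("Run-Test/Idle", "Select-DR-Scan"),
    "Select-IR-Scan": ("Capture-IR", "Test-Logic-Reset"),
    "Capture-IR": ("Shift-IR", "Exit1-IR"),
    "Shift-IR": ("Shift-IR", "Exit1-IR"),
    "Exit1-IR": ("Pause-IR", "Update-IR"),
    "Pause-IR": ("Pause-IR", "Exit2-IR"),
    "Exit2-IR": ("Shift-IR", "Update-IR"),
    "Update-IR": ("Run-Test/Idle", "Select-DR-Scan"),
}

IR_LENGTH = 4                   # 4-bit instruction register, as in the AMEP
NUM_OPCODES = 2 ** IR_LENGTH    # 16 possible instructions
BYPASS = (1 << IR_LENGTH) - 1   # BYPASS opcode: all IR bits at logic 1


def step(state, tms):
    """Advance the TAP controller by one TCK rising edge."""
    return TAP_NEXT[state][tms]


def walk(state, tms_bits):
    """Apply a sequence of TMS values and return the final state."""
    for tms in tms_bits:
        state = step(state, tms)
    return state


def load_instruction(opcode, state="Run-Test/Idle"):
    """Shift `opcode` into the IR (LSB first) and return (final state, IR value)."""
    # Run-Test/Idle -> Select-DR-Scan -> Select-IR-Scan -> Capture-IR -> Shift-IR
    state = walk(state, [1, 1, 0, 0])
    assert state == "Shift-IR"
    shift_reg = 0
    for i in range(IR_LENGTH):
        tdi = (opcode >> i) & 1
        # TDI enters at the MSB end of the shift register; TDO would leave at the LSB.
        shift_reg = (shift_reg >> 1) | (tdi << (IR_LENGTH - 1))
        # TMS=1 on the last bit moves to Exit1-IR while that bit is shifted in.
        state = step(state, 1 if i == IR_LENGTH - 1 else 0)
    state = step(state, 1)  # Exit1-IR -> Update-IR: the new instruction becomes active
    return state, shift_reg
```

`load_instruction(BYPASS)` ends in Update-IR with the register holding `0b1111`. A useful property of this state machine, checkable with `walk`, is that five consecutive TCK cycles with TMS high bring the controller to Test-Logic-Reset from any state.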
After a preview, the implemented configuration was accepted, and BSD Compiler generated the necessary logic and automatically compiled and optimized only the top-level design, which includes the BSR and the TAP controller. The BSDL file was generated after the compilation of the JTAG logic. This file contains information that is essential to characterize the implemented JTAG logic and allows test equipment to use the available test features. BSD Compiler also checks the compliance of the design with the IEEE 1149.1 standard. This step should be performed to verify that the design and its JTAG logic comply with the standard, so that the device is interoperable with other devices that follow the same standard.
Boundary scan test patterns must also be generated using BSD Compiler; the generated patterns are then simulated using TetraMAX.
5.3.4 Workflow for test generation
The workflow for test vector generation using TetraMAX is represented in figure 5.7. This flow assumes that a STIL protocol file is available for the synthesized design. Such a file can be generated automatically by DFT Compiler, as was done in this work. Otherwise, a STIL protocol file has to be written manually or, for simple designs without test structures, created using TetraMAX.
Before reading the design netlist, all the cells used must be read into the internal library. This is done using the read_netlist command with the -library option, so that the TetraMAX Verilog models of the I/O cells, standard cells and memory blocks supplied by Faraday are imported into the internal library.
[Flowchart: Read library models (read_netlist -library) → Read netlist (read_netlist) → Build ATPG model (run_build_model) → Perform test design rule checking (set_drc, run_drc) → Prepare to run ATPG (set_atpg, remove_faults, add_faults) → Run ATPG (run_atpg) → Review test coverage (report) → Rerun ATPG (run_atpg) → Save test patterns (write_patterns)]
Figure 5.7: Synopsys TetraMAX ATPG Workflow.
After the library cells are read, the design is read with the same read_netlist command. Then, the circuit model for ATPG is built with the run_build_model command. The STIL protocol file is then specified using the set_drc command. The ATPG settings are defined next. In this work, the SSF model was used and all stuck-at faults were added. Some of these faults are inherently untestable, due to the constraints set on the primary inputs (e.g. since the test_mode signal is held at a high value during test procedures, a stuck-at-1 fault on this node is undetectable).
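The effect of an input constraint on fault coverage can be reproduced on a toy circuit with exhaustive single-stuck-at fault simulation. The netlist below (out = (a AND b) AND tm, with `tm` standing in for test_mode) is hypothetical and much simpler than the AMEP, and exhaustive pattern enumeration replaces the ATPG algorithms TetraMAX actually uses.

```python
from itertools import product

NODES = ["a", "b", "tm", "n1", "out"]
FAULTS = [(node, value) for node in NODES for value in (0, 1)]  # all SSF faults


def evaluate(inputs, fault=None):
    """Evaluate out = (a AND b) AND tm; `fault` is (node, stuck_value) or None."""
    def val(node, value):
        return fault[1] if fault is not None and fault[0] == node else value

    a = val("a", inputs["a"])
    b = val("b", inputs["b"])
    tm = val("tm", inputs["tm"])
    n1 = val("n1", a & b)
    return val("out", n1 & tm)


def patterns():
    """All legal test patterns: a and b are free, tm is constrained to 1."""
    for a, b in product((0, 1), repeat=2):
        yield {"a": a, "b": b, "tm": 1}


def simulate():
    """Split the fault list into detected and undetected faults."""
    detected, undetected = [], []
    for fault in FAULTS:
        if any(evaluate(p) != evaluate(p, fault) for p in patterns()):
            detected.append(fault)
        else:
            undetected.append(fault)
    return detected, undetected
```

Running `simulate()` detects 9 of the 10 faults (90 % coverage); the only undetected one is `("tm", 1)`, because the constrained good value of tm is already 1, exactly as in the test_mode example above.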
Initially, a Basic-Scan ATPG run was performed. This is the fastest test generation mode and detects most of the faults in the design, since this is a full-scan design (all flip-flops are included in scan chains). Nevertheless, some faults remain untestable due to the existing memory modules, which are also considered sequential elements.
A second ATPG run was then performed in Full-Sequential mode. This run took longer, but detected some additional faults that had not been detected by the first run.
The generated test patterns were then saved in the formats required for export to the ATE and for simulation in a logic simulator. Additionally, a fault simulation using the TetraMAX simulation engine may be performed. This is possible in this particular flow, but it is normally used only when pattern generation is done outside TetraMAX and the latter is used only to verify the fault coverage.
6 BackEnd - From Verilog netlist to GDS Layout
Contents
6.1 Introduction
6.2 Tools
6.3 Workflow
6.1 Introduction
A verilog netlist represents the logical connections between several components. These com-
ponents have to be physically placed on the die and connected to achieve a functional circuit.
Creating the power structures is also necessary to provide enough power for each cell in the
chip. This section describes the workflow followed to reach a final layout (described in Graphic
Data System (GDS) format), which is then used to produce the fabrication masks, starting from a
verilog netlist and using the Cadence Encounter platform.
6.2 Tools
The tools used for Place and Route belong to the Cadence Encounter family of products. Cadence software is widely used in the industry and is the de facto reference software for placement and routing. Moreover, it is supported by the major foundries, which supply the necessary files and libraries for this software.
Of the various Cadence packages available, SoC Encounter was selected. This package of tools is used to perform placement, power planning, routing, clock tree synthesis, optimizations and GDSII generation. Table 6.1 shows the versions of the SoC Encounter tools used.
Table 6.1: Cadence tools versions
Tool              Version
First Encounter   v04.10-s374_1 (32bit) 05/12/2005 20:09 (Linux 2.4)
NanoRoute         v04.10-s891 NR050505-1434/USR29-UB (database version 2.30, 20) super threading v1.4
6.2.1 First Encounter
The main application in the SoC Encounter package is First Encounter. Some functions are performed directly by First Encounter, while others are performed by other tools that are launched from the First Encounter interface.
The First Encounter tool requires a technology description file and a physical library that
describes the standard cells. Both of these items should be available in Library Exchange
Format (LEF) and must be provided by the foundry.
First Encounter is able to perform RC parameter extraction of the routed design. For this purpose, a capacitance table file should be provided to achieve a better quality of results. Otherwise, First Encounter extracts RC values based on capacitance and resistance values that it calculates using default process parameters and heuristic equations. The capacitance values of this table are calculated with a 3-D field solver, in this case the Cadence Field Solver (Coyote).
Delay calculation is also performed by the Encounter software, if the RC values and the cell
timing libraries have been supplied. Encounter is capable of reading the timing libraries in Synop-
sys Technology Library format (.lib) or in Timing Library Format (.tlf).
The synthesis of the clock tree is also performed by Encounter. Clock Tree Synthesis (CTS)
analyzes the clocks in a design and inserts buffers (or inverters) to reduce or eliminate clock
skew. The CTS process can be performed in automatic or manual mode. In automatic mode, the
number of buffer levels and the number of buffers per level are automatically determined based on
the timing constraints set in the clock tree specification file (e.g. maximum clock skew, maximum
and minimum delay), which are set by the designer. In manual mode, the number of levels and the
number of buffers per level are individually set by the designer before performing the synthesis of
the clock tree.
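As a rough illustration of what automatic mode has to decide, the sketch below sizes a balanced buffer tree from the number of clock sinks and a maximum buffer fanout. This is a simplified, hypothetical model, not Encounter's CTS algorithm, which also balances skew and insertion delay using the constraints in the clock tree specification file; the 780 sinks (roughly the flip-flops in the two scan chains of chapter 5) and the maximum fanout of 16 are assumptions.

```python
import math


def buffer_tree(num_sinks, max_fanout):
    """Return the number of buffers per level, from the root to the leaves.

    Each buffer drives at most `max_fanout` loads; levels are added until a
    single root buffer remains.
    """
    if num_sinks < 1 or max_fanout < 2:
        raise ValueError("need at least one sink and a fanout of at least 2")
    levels = []
    count = num_sinks
    while True:
        count = math.ceil(count / max_fanout)
        levels.append(count)
        if count == 1:
            break
    return list(reversed(levels))
```

For 780 sinks and a maximum fanout of 16, `buffer_tree(780, 16)` gives `[1, 4, 49]`, that is, three levels and 54 buffers. In manual mode the designer would fix these numbers directly instead of deriving them.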
Encounter is also able to perform power analysis. It analyzes the power usage, power grid IR drop and power grid electromigration of a design [22]. This analysis should be performed at the sign-off stage to validate the circuit's power structures. Nevertheless, it relies on information that may not be available in all standard cell libraries.
6.2.2 NanoRoute
NanoRoute is Cadence’s recommended routing engine. It performs concurrent signal integrity, timing-driven and manufacturing-aware routing of cell, block, or mixed cell and block level designs [23].
NanoRoute is usually invoked by Encounter, but it can also work in standalone mode, either through a graphical interface or in batch mode. In this project, NanoRoute is primarily invoked from First Encounter, except for the LVS check, for which it is run in standalone mode.
NanoRoute performs routing in two stages: global routing and detailed routing. The global routing stage minimizes congestion and optimizes signal timing by planning the interconnections globally; this plan is created by routing the signal nets at the level of global routing cells. The detailed routing stage creates the final routing by implementing the nets according to the design rules and connecting the pins of each cell or block to the corresponding nets. During detailed routing, NanoRoute also automatically performs search-and-repair to fix any remaining problems in the nets, and it automatically determines when to stop this process [23].
6.3 Workflow
The workflow, using Cadence Encounter, to produce a layout from a verilog netlist is presented
in figure 6.1. The script that implements this workflow can be found in section B.2.2.
Figure 6.1: Design flow for Encounter (stages shown: Data Preparation; Pre-Placement Optimization; Floorplanning; Placement; Power Planning; Pre-CTS Optimization; Clock Tree Synthesis; Post-CTS Optimization; Routing; DRC; Repair violations; Post-Route Optimization; Analysis and Sign-Off; Layout (GDSII format)).
Initial data preparation was performed before running the Encounter software. This step included preparing the capacitance table and the standard cell timing library, and creating the I/O assignment file. A capacitance table should be created to achieve better quality of results in the extraction of RC parameters [22]. To create this capacitance table, the Coyote 3-D field solver was used. The field solver requires a technology description file in ICT (IceCaps Technology file) format, which is supplied by UMC for the adopted process technology. This technology description file describes the process parameters (e.g. the thickness of the conducting layers, the interlayer planar dielectric constant and its thickness, the conductor resistances, etc.). The resistance values are directly defined in the technology description file, while the capacitance values are calculated from the information provided in the same file. This is a one-time operation, and the generated capacitance table can be reused in future designs that use the same process.
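As a rough intuition for the quantities involved, the Python sketch below computes the resistance of a wire from its sheet resistance (R = Rs · L / W) and its parallel-plate capacitance to the layer below (C = ε0 · εr · W · L / t). This first-order approximation ignores fringing and coupling fields, which is precisely why a 3-D field solver such as Coyote is used; all numerical values are hypothetical and are not taken from the UMC technology file.

```python
EPS0 = 8.854e-12  # vacuum permittivity, F/m


def wire_resistance(sheet_res_ohm_sq, length_m, width_m):
    """R = Rs * L / W (sheet resistance times the number of squares)."""
    return sheet_res_ohm_sq * length_m / width_m


def parallel_plate_cap(eps_r, length_m, width_m, dielectric_thickness_m):
    """C = eps0 * eps_r * (W * L) / t, ignoring fringing and coupling fields."""
    return EPS0 * eps_r * width_m * length_m / dielectric_thickness_m


# Hypothetical 1 mm long, 0.5 um wide wire over 0.8 um of dielectric (eps_r = 4.0).
L, W, T = 1e-3, 0.5e-6, 0.8e-6
r = wire_resistance(0.08, L, W)  # 0.08 ohm/sq is an assumed sheet resistance
c = parallel_plate_cap(4.0, L, W, T)
print(f"R = {r:.0f} ohm, C = {c * 1e15:.1f} fF")
```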
In the adopted cell library, provided by Faraday, the timing files are only available in lib format. Although Encounter should be able to read these files to build its internal cell timing library, it failed to do so. Therefore, the lib files were converted into tlf format files, to be used by Encounter, with the syn2tlf conversion utility.
The I/O assignment file specifies the location of the various I/O cells around the die periphery. This file was created to arrange the I/O cells according to the diagram in section 4.6. The power and ground cells were instantiated and placed in this I/O assignment file, as were the corner cells, which provide continuity of the I/O power and ground rings.
Besides the library and technology files, Encounter requires a Verilog netlist with the design information which, in this work, is the result of the synthesis performed by Design Compiler. It also requires a timing constraints file, in Synopsys Design Constraints (SDC) format, for timing-driven optimizations. This file was also generated by Design Compiler during the frontend phase and contains the timing information of the clock signals.
Encounter uses cell footprint information to identify functionally equivalent cells, which allows it to perform optimizations such as replacing a given buffer with a higher-drive buffer (buffer resizing). These footprints are set, for each cell, in the standard cell library files. Moreover, Encounter requires the footprints of the buffer, inverter and delay cells to be defined, so that it can use them during buffer resizing. However, the available library defines the same footprint for buffer and delay cells, even though it contains dedicated delay cells [15]. Although a delay cell may be equivalent to a series of buffers, its use should be restricted to providing delays. With this standard cell library, Encounter uses buffer and delay cells interchangeably. Moreover, the selected standard cell library also provides dedicated cells (buffers and inverters) for the clock signals, which must also be specified in Encounter so that they are used during clock tree synthesis.
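The Python sketch below illustrates, under simplified assumptions, how footprint information lets a tool treat cells as interchangeable during resizing: cells are grouped by footprint, and the weakest cell in the group whose drive strength covers the requirement is chosen. The cell names, footprints and drive values are hypothetical (they are not the Faraday library's), and a real tool selects cells by timing rather than by a single drive number. Note how a delay cell that shares the buffer footprint becomes a legal resizing candidate, which is the behaviour described above.

```python
from collections import defaultdict

# Hypothetical library: cell name -> (footprint, relative drive strength)
LIBRARY = {
    "BUF1": ("buf", 1),
    "BUF2": ("buf", 2),
    "BUF4": ("buf", 4),
    "DLY1": ("buf", 1),  # delay cell declared with the buffer footprint
    "INV1": ("inv", 1),
    "INV2": ("inv", 2),
}


def group_by_footprint(library):
    """Map each footprint to its cells, sorted by (drive, name)."""
    groups = defaultdict(list)
    for name, (footprint, drive) in library.items():
        groups[footprint].append((drive, name))
    return {fp: sorted(cells) for fp, cells in groups.items()}


def resize(cell, required_drive, library=LIBRARY):
    """Return the weakest cell with the same footprint that meets required_drive."""
    footprint = library[cell][0]
    for drive, name in group_by_footprint(library)[footprint]:
        if drive >= required_drive:
            return name
    return None  # no equivalent cell is strong enough


print(group_by_footprint(LIBRARY)["buf"])
print(resize("DLY1", 2))  # the delay cell is replaced by a buffer: BUF2
```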
After reading in the design, a pre-placement optimization is performed. This first optimization step was used to remove buffers that may have been inserted by Design Compiler (the synthesis tool) to meet the timing constraints. Since Encounter also performs timing-driven optimizations, it will add the required buffers where they are needed, taking into account the placement and routing information, which was unavailable during synthesis.
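The buffer removal described above can be pictured as a simple netlist transformation: each buffer is deleted, and the sinks of its output net are reconnected to the net that drove its input, following chains of buffers back to the original driver. The Python sketch below is a hypothetical model of that idea (the netlist representation and the instance and net names are assumptions); it is not how Encounter implements the step, and it ignores inverters and timing entirely.

```python
def remove_buffers(sinks, buffers):
    """Remove buffers by merging each buffer's output net into its input net.

    sinks   -- dict: net name -> list of sink pins ("INSTANCE/PIN") on that net
    buffers -- dict: buffer instance -> (input net, output net)
    Returns a new dict of sinks in which the buffers and their output nets are gone.
    """
    # Each buffer output net becomes an alias of the net driving the buffer input.
    alias = {out_net: in_net for in_net, out_net in buffers.values()}

    def root(net):
        # Follow chains of buffers back to the original driving net.
        while net in alias:
            net = alias[net]
        return net

    merged = {}
    for net, pins in sinks.items():
        # Buffer input pins disappear together with the buffers themselves.
        kept = [p for p in pins if p.split("/")[0] not in buffers]
        merged.setdefault(root(net), []).extend(kept)
    return merged


# Hypothetical netlist: n1 -> BUF_a -> n2 -> BUF_b -> n3 -> {U2/A, U3/B}
sinks = {"n1": ["U1/A", "BUF_a/A"], "n2": ["BUF_b/A"], "n3": ["U2/A", "U3/B"]}
buffers = {"BUF_a": ("n1", "n2"), "BUF_b": ("n2", "n3")}
print(remove_buffers(sinks, buffers))  # {'n1': ['U1/A', 'U2/A', 'U3/B']}
```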
The floorplanning step is performed next. Determining the floorplan dimensions can be an iterative process and, in a normal flow, the occupied area should be minimized. However, since EUROPRACTICE only offers discrete die dimensions, this iterative area minimization was not performed in this work. Instead, the available die space was used to spread the I/O buffers as much as possible, in order to minimize the dissipated power per unit area.
Due to the number of I/O cells needed (including the pads) and the required pad pitch, a single 1525 µm × 1525 µm sub-block would not be large enough to fit all the I/O cells using inline I/O cells with inline pads. If staggered pads were used with the same inline I/O cells, all of the I/O cells and pads would fit in a single sub-block. However, that would reduce the core size to the point where the core logic and memory blocks would no longer fit. Furthermore, if staggered I/O cells and staggered pads were used, the available core area would be reduced even further. Therefore, two sub-blocks (3240 µm × 1525 µm) are required.
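To make the sizing argument concrete, the Python sketch below counts how many pad sites fit around a die for a given pad pitch. It assumes that staggered pads achieve half of the inline pitch and that a fixed corner region at each end of every side is unusable; the pitch, corner size and required pad count are hypothetical, since the actual values are not restated here. It also ignores the loss of core area caused by staggered I/O cells, which is the other constraint discussed above.

```python
import math


def pads_on_periphery(width_um, height_um, pitch_um, corner_um, staggered=False):
    """Count the pad sites available around a rectangular die.

    A corner region of `corner_um` at each end of every side is assumed unusable,
    and staggered pads are assumed to achieve half of the inline pitch.
    """
    effective_pitch = pitch_um / 2 if staggered else pitch_um

    def per_side(side_um):
        return math.floor((side_um - 2 * corner_um) / effective_pitch)

    return 2 * per_side(width_um) + 2 * per_side(height_um)


REQUIRED_PADS = 64  # hypothetical number of I/O and power pads
for width, height in [(1525, 1525), (3240, 1525)]:
    for staggered in (False, True):
        sites = pads_on_periphery(width, height, pitch_um=100, corner_um=200,
                                  staggered=staggered)
        print(f"{width} x {height} um, staggered={staggered}: "
              f"{sites} sites, fits={sites >= REQUIRED_PADS}")
```

With these assumed numbers, a single inline sub-block offers 44 sites, a single staggered sub-block 88 and two inline sub-blocks 78, which mirrors the qualitative argument above.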
The floorplan dimensions set in Encounter are the dimensions of the area where the core and I/O cells are placed. According to the UMC Topological Layout Rules [24], a die seal ring must be present in the final layout. EUROPRACTICE adds this die seal ring, in accordance with the UMC rules, outside of the stipulated design area, so the designer does not need to account for this structure in the design area dimensions. The die seal ring has a minimum width of 10 µm and a minimum spacing of 10 µm to the pad metal edge. The pad zone is not included when setting the floorplan dimensions in Encounter. Therefore, the floorplan dimensions are the dimensions of the die (two sub-blocks) minus the pad zone on both sides (2 × 79 µm ≈ 160 µm). Figure 6.2 shows the die block and floorplan dimensions (the floorplan includes the I/O cell zone).
Figure 6.2: Die block size, floorplan and core size. (Die block: 3240 µm × 1525 µm, bordered by the pad zone; floorplan: 3080 µm × 1365 µm, consisting of the I/O cell zone surrounding the core.)
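The floorplan dimensions in Figure 6.2 follow from subtracting the pad zone from the die size. The short Python check below reproduces them, assuming (as the 2 × 79 µm term implies) that the 79 µm pad zone lies on every side of the die. With the exact margin the floorplan would be 2 µm larger in each direction, so rounding the margin up to 160 µm is the conservative choice.

```python
DIE_W_UM, DIE_H_UM = 3240, 1525  # two EUROPRACTICE sub-blocks
PAD_ZONE_UM = 79                 # pad zone depth on each side of the die
MARGIN_UM = 160                  # 2 x 79 um, rounded up as in the text

exact = (DIE_W_UM - 2 * PAD_ZONE_UM, DIE_H_UM - 2 * PAD_ZONE_UM)
floorplan = (DIE_W_UM - MARGIN_UM, DIE_H_UM - MARGIN_UM)
print("exact:", exact)          # exact: (3082, 1367)
print("floorplan:", floorplan)  # floorplan: (3080, 1365)
```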
After defining the floorplan, the previously created I/O cell position file was loaded. The positions of the I/O cells may still be changed if, after an initial placement, a different arrangement is found to improve the design routability, or if the power grid analysis shows that the number of power and ground connections must be changed.
The memory blocks (hard blocks) were placed before inserting the power structures or any other cells. Hard blocks can be placed either manually or automatically. The memories must be placed taking their power dissipation into account, because if they are placed too close together, the temperature in that area might rise above the recommended values. Nevertheless, placing the memory blocks too far apart could make it harder to meet the timing constraints. In this work, an initial automatic placement of the cells was made to set the location of the memory blocks according to the timing constraints. However, this automatic placement does not take the power dissipation of the memories into account, so it can only be considered a guide towards a position that is optimal in terms of both timing and power dissipation. In fact, the initial automatic placement put the memories too close together. Therefore, they were moved further apart and their status was set to pre-placed, which indicates that subsequent placement iterations must keep these blocks in their preset positions. A block halo (an empty zone around each block) was added to the memory blocks to prevent any cell from being placed in this area, reserving space for the power and ground rings of these blocks. Furthermore, this block halo also avoids design rule violations that may occur when standard cells are adjacent to the memories.
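The halo requirement can be checked with a simple geometric test: if the halos of two memories overlap, there is not enough room between them for both block power rings. The Python sketch below implements that test for axis-aligned rectangles; the coordinates, instance names and 20 µm halo are hypothetical, and the thermal criterion discussed above is not modelled.

```python
from itertools import combinations


def expand(rect, halo):
    """Grow an (x0, y0, x1, y1) rectangle by `halo` on every side."""
    x0, y0, x1, y1 = rect
    return (x0 - halo, y0 - halo, x1 + halo, y1 + halo)


def overlaps(a, b):
    """True if two axis-aligned rectangles share a non-zero area."""
    return a[0] < b[2] and b[0] < a[2] and a[1] < b[3] and b[1] < a[3]


def halo_conflicts(macros, halo):
    """Return the pairs of macros whose halos overlap (placed too close together)."""
    grown = {name: expand(rect, halo) for name, rect in macros.items()}
    return [(a, b) for a, b in combinations(sorted(grown), 2)
            if overlaps(grown[a], grown[b])]


# Hypothetical memory placements, coordinates in um.
macros = {
    "search_area": (100, 100, 700, 600),
    "macroblock": (720, 100, 1000, 400),
    "instruction": (2200, 100, 2700, 500),
}
print(halo_conflicts(macros, 20))  # [('macroblock', 'search_area')]
```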
After placing the memory blocks, the main power structures were added during the power planning stage. Designing a power grid can be difficult, and reaching a satisfactory result may involve an iterative process: an initial power structure was implemented and afterwards verified and, if needed, it would be redesigned to correct any problems. The initial power structure is composed of one power ring and one ground ring, each 20 µm wide, and 13 evenly spaced pairs of stripes (one stripe for the power net and another for the ground net), each 10 µm wide. Additionally, each memory block has its own power and ground ring pair (block power ring), 10 µm wide. The global power and ground rings were implemented in the upper metal layers (metal4 and metal5), because these layers have lower resistance than the lower metal layers [24] and, since these rings carry all the current supplied to the chip, they are likely candidates for high IR drop. The memory power and ground rings were also implemented in the upper metal layers, in order to reduce the IR drop. From this point forward, the term "power net/ring" refers to both the power and the ground net/ring.
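To see why the stripes are tied to the rings at both ends, consider a stripe of total resistance R that delivers a current I drawn uniformly along its length. Fed from a single end, the worst drop (at the far end) is IR/2; fed from both ends, the worst point moves to the middle and the drop falls to IR/8. The Python sketch below encodes these two closed-form results; the sheet resistance and current in the usage example are hypothetical values, not UMC data.

```python
def max_ir_drop(total_current_a, wire_resistance_ohm, fed_both_ends=True):
    """Worst-case IR drop (V) on a wire with uniformly distributed load current.

    Fed from one end, the worst point is the far end and the drop is I*R/2.
    Fed from both ends, the worst point is the middle and the drop is I*R/8.
    """
    factor = 8 if fed_both_ends else 2
    return total_current_a * wire_resistance_ohm / factor


# Hypothetical stripe: 10 um wide, 1365 um long, 0.04 ohm/sq, drawing 1 mA in total.
stripe_r = 0.04 * 1365 / 10  # R = Rs * L / W
for both in (False, True):
    drop_mv = max_ir_drop(1e-3, stripe_r, fed_both_ends=both) * 1e3
    print(f"fed from both ends={both}: {drop_mv:.2f} mV")
```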
Encounter supports designs with more than one power domain. Therefore, Encounter needs to know to which power net each cell's power and ground pins must be connected. At this phase, the global power ring, the memory power rings and the power stripes are implemented. However, the power connections to each individual cell are not yet implemented, only described (using the globalNetConnect command). The power routing process, performed later, will effectively connect the existing power structures to the power pins of the cells and memories using the appropriate metal tracks.
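The information described with globalNetConnect can be modelled as a list of wildcard rules that map power pins to global nets. The Python sketch below resolves such rules and reports any power pin left unconnected. It is a conceptual model only: the rule format is not the command's syntax, and the pin and instance names (VCC, GND, VDD1, mem_*) are hypothetical rather than taken from the Faraday library.

```python
from fnmatch import fnmatchcase

# Hypothetical connection rules: (global net, pin name pattern, instance pattern),
# tried in order; the first matching rule wins.
RULES = [
    ("VDD", "VCC", "*"),
    ("GND", "GND", "*"),
    ("VDD", "VDD*", "mem_*"),  # memories assumed to use different pin names
    ("GND", "VSS*", "mem_*"),
]


def resolve(instances, rules=RULES):
    """Map every (instance, power pin) pair to a global net; list unmatched pins."""
    connections, unconnected = {}, []
    for inst, pins in instances.items():
        for pin in pins:
            net = next((n for n, pin_pat, inst_pat in rules
                        if fnmatchcase(pin, pin_pat) and fnmatchcase(inst, inst_pat)),
                       None)
            if net is None:
                unconnected.append((inst, pin))
            else:
                connections[(inst, pin)] = net
    return connections, unconnected


instances = {"U1": ["VCC", "GND"], "mem_sa": ["VDD1", "VSS1"], "U2": ["VPWR"]}
conns, missing = resolve(instances)
print(conns)
print("unconnected:", missing)  # unconnected: [('U2', 'VPWR')]
```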
Timing-driven placement with high effort was performed next. This placement strategy places the cells (excluding the memories, which were pre-placed) so as to achieve the best timing. During this process, several placement and trial routing iterations are performed automatically until a solution is reached.
After the placement and power planning stages, but before synthesizing the clock tree, an optimization was performed. In this pre-CTS optimization phase, Encounter replaces cells with functionally equivalent cells of different drive strengths (gate resizing). It also performs global buffer insertion and netlist restructuring to repair setup time violations and design rule violations, and to improve the timing slacks (the difference between the required time imposed by the timing constraint and the calculated arrival time) [25].
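Using the slack definition above (required time minus arrival time), a register-to-register setup check under an ideal pre-CTS clock reduces to a subtraction per path. The Python sketch below computes the setup slack of a few hypothetical paths against an assumed 10 ns clock period (consistent with the 100 MHz operating frequency used in Chapter 7) and summarises them as the worst slack and the total negative slack (TNS); the path names and delays are invented for illustration.

```python
def setup_slacks(paths, clock_period_ns):
    """Setup slack of each path: required time minus data arrival time (ns).

    paths maps a path name to (data arrival time, setup time of the capturing flop).
    An ideal clock (no skew, no latency) is assumed, as before clock tree synthesis.
    """
    return {name: (clock_period_ns - setup) - arrival
            for name, (arrival, setup) in paths.items()}


def summarize(slacks):
    """Return the worst slack and the total negative slack (TNS)."""
    worst = min(slacks.values())
    tns = sum(s for s in slacks.values() if s < 0)
    return worst, tns


# Hypothetical paths, checked against an assumed 10 ns clock period.
paths = {"p1": (9.2, 0.3), "p2": (10.1, 0.2), "p3": (6.0, 0.3)}
slacks = setup_slacks(paths, 10.0)
print(slacks)
print(summarize(slacks))
```

Here only p2 violates setup, so it is the path the optimizer would target with resizing or buffering.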
The next step in the backend design flow was the synthesis of the clock tree. The Clock Tree Synthesis (CTS) configuration file includes constraints on the maximum clock skew, the maximum and minimum delays, and the maximum depth of logic in the clock tree. The maximum clock skew was set to 300 ps, the maximum delay to 1.5 ns and the minimum delay to 0 ns. The maximum depth of the clock tree was set to 8 logic levels. Several other options are available to control the synthesis of the clock tree, but they were not needed for this design. To build the clock tree, CTS routes the clock networks based on the constraints set in the configuration file, and then optimizes the clock tree to improve the skew, which includes resizing buffers or inverters, adding buffers, refining placement and correcting routing [25].
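The CTS constraints quoted above can be checked against the insertion delays and buffer depths reported for the clock sinks. The Python sketch below does so for a hypothetical set of flip-flops, computing the global skew as the difference between the largest and smallest insertion delay. The sink names, delays and depths are invented for illustration, and a real CTS report may compute skew only between related sinks.

```python
# Constraint values taken from the CTS configuration described above.
CTS_SPEC = {"max_skew_ps": 300, "max_delay_ps": 1500, "min_delay_ps": 0, "max_depth": 8}


def check_clock_tree(sinks, spec=CTS_SPEC):
    """sinks: dict flop name -> (insertion delay in ps, number of clock buffer levels)."""
    delays = [delay for delay, _ in sinks.values()]
    skew = max(delays) - min(delays)  # global skew over all sinks
    return {
        "skew_ps": skew,
        "skew_ok": skew <= spec["max_skew_ps"],
        "delay_ok": spec["min_delay_ps"] <= min(delays) and max(delays) <= spec["max_delay_ps"],
        "depth_ok": all(levels <= spec["max_depth"] for _, levels in sinks.values()),
    }


# Hypothetical clock sinks.
sinks = {"ff_a": (1120, 6), "ff_b": (1290, 7), "ff_c": (1050, 6)}
print(check_clock_tree(sinks))
```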
Immediately after clock tree synthesis, another optimization was performed. Post-CTS optimization repairs the remaining design rule violations and setup time violations, as well as hold time violations (only when this does not worsen the setup timing), and corrects the timing information [22].
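The condition attached to hold fixing (do not worsen setup) can be illustrated with a small Python model in which each inserted delay cell adds the same delay to the shortest and to the longest path through the fixed net. This is a simplified, hypothetical model assuming an ideal clock (zero skew); it is not Encounter's algorithm, and the delay values in the example are invented.

```python
def fix_hold(min_arrival, max_arrival, hold, setup, period, delay_step):
    """Return how many delay cells to insert to fix a hold violation, or None.

    Each delay cell is assumed to add `delay_step` ns to both the shortest
    (min_arrival) and the longest (max_arrival) path through the fixed net.
    A cell is only accepted if the setup slack stays non-negative.
    All times are in ns, and an ideal clock (zero skew) is assumed.
    """
    cells = 0
    while min_arrival + cells * delay_step < hold:  # hold slack still negative
        cells += 1
        setup_slack = period - setup - (max_arrival + cells * delay_step)
        if setup_slack < 0:
            return None  # fixing hold here would create a setup violation
    return cells


print(fix_hold(0.05, 8.0, hold=0.2, setup=0.3, period=10.0, delay_step=0.1))   # 2
print(fix_hold(0.05, 9.65, hold=0.2, setup=0.3, period=10.0, delay_step=0.1))  # None
```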
Filler cells were added in the core, and empty cells in the I/O zone, before the routing phase. Filler and empty cells exist in different widths. They were added starting with the widest cell and ending with the narrowest, so that the empty spaces in the core and in the I/O ring, respectively, are occupied by the widest cells first. This is particularly relevant for the filler cells, because the narrowest filler cell does not provide decoupling capacitance. If new cells need to be inserted after routing (e.g. antenna diodes), these filler cells can be removed automatically.
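The widest-first insertion order can be modelled as a greedy fill of each empty gap. The Python sketch below assumes hypothetical filler widths measured in placement sites, including a 1-site cell so that any gap can be filled; the actual widths available in the Faraday library are not listed here. Because the narrowest cell is used only for the final remainder, as few cells without decoupling capacitance as possible end up in the layout.

```python
FILLER_WIDTHS = [64, 32, 16, 8, 4, 2, 1]  # hypothetical widths in placement sites


def fill_gap(gap_sites, widths=FILLER_WIDTHS):
    """Fill a gap of `gap_sites` sites with filler cells, widest first."""
    cells = []
    for width in sorted(widths, reverse=True):
        count, gap_sites = divmod(gap_sites, width)  # use as many of this width as fit
        cells += [width] * count
    if gap_sites:
        raise ValueError("gap cannot be filled exactly with the given widths")
    return cells


print(fill_gap(107))  # [64, 32, 8, 2, 1]
```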
The routing phase starts with the routing of the power structures. The SRoute engine is used
to perform routing of special nets like power nets. After power routing, all cells (including I/Os)
and blocks were connected to the power nets.
The global and detailed routing were performed next by the NanoRoute routing engine. The first routing iteration was timing-driven, in order to meet the timing constraints. When performing non-timing-driven routing, NanoRoute may detour some nets to avoid creating violations, but when performing timing-driven routing it does not detour timing-critical nets. Instead, it forces them to be routed as short as possible, which can create congestion and design rule violations. Later, when design rule checking takes precedence, these nets are detoured [22].
Since the timing-driven routing left some design rule violations (e.g. a short circuit between the clock network and a power network), these had to be fixed. Encounter is able to delete the violating nets and then route them again. As such, the violating nets were deleted and Engineering Change Order (ECO) routing was used to route the changed (deleted) nets. This routing step must be non-timing-driven, in order to avoid creating other violations. Since there were no violating nets after this second routing iteration, the flow could proceed to the next step.
Once a design free of design rule violations was obtained, a post-route optimization was performed. This optimization step fixes the remaining timing problems and, additionally, any design rule violations that may have been introduced by the optimization itself.
At this phase, the design was fully routed, and RC extraction and final timing verification were performed. RC parameters were extracted from the final layout in order to perform the delay calculation. The result of the delay calculation is a back-annotation Standard Delay Format (SDF) file that contains timing information on the net delays and that can be used in a final simulation of the design.
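As a first-order illustration of how extracted RC values become net delays, the Python sketch below computes the Elmore delay at the far end of an RC ladder, in which every resistance is multiplied by the total capacitance downstream of it. Encounter's delay calculator uses more accurate models, so this is only an approximation; the segment values in the usage example are hypothetical.

```python
def elmore_delay_ladder(resistances, capacitances):
    """Elmore delay (s) at the far end of an RC ladder.

    Segment i has a series resistance resistances[i] (ohm) followed by a
    capacitance capacitances[i] (F) to ground. Each resistor is multiplied
    by all of the capacitance downstream of it.
    """
    if len(resistances) != len(capacitances):
        raise ValueError("one capacitance per resistive segment expected")
    delay = 0.0
    downstream = sum(capacitances)
    for r, c in zip(resistances, capacitances):
        delay += r * downstream
        downstream -= c  # this node's capacitance is not downstream of later resistors
    return delay


# Hypothetical wire: four 40-ohm segments of 5 fF each, driving a 10 fF input pin
# (the pin capacitance is lumped into the last node).
delay_s = elmore_delay_ladder([40.0] * 4, [5e-15, 5e-15, 5e-15, 15e-15])
print(f"Elmore delay ~ {delay_s * 1e12:.2f} ps")
```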
A power grid analysis was performed to validate the planning of the power structures. To perform this analysis, Encounter needs a pad location file, which indicates the source locations of the power nets. Usually, these sources are the power I/O cells, but they can be changed if other locations for the power cells need to be studied. The pad location file was created manually, defining the locations of the core power I/O cells as the power net sources. The power analysis detected no violations of the electromigration rules. Additionally, the maximum IR drop (5 mV) was within acceptable values. An estimate of the consumed power was also obtained.
At this stage, the design was saved in DEF format so that it could be imported into the
NanoRoute tool for a final LVS verification. This LVS check certified that the layout connections
corresponded to the netlist connections.
After all validations were performed, the design is ready to be exported to GDSII format. This file contains the geometry of the metal layers used in routing, as well as the positions of the cells in the layout. To generate this file, a mapping file had to be supplied. This mapping file establishes the correspondence between the layer names used in the Encounter software (e.g. metal1, via, metal2) and the corresponding GDS layer numbers, according to the UMC rules.
This GDSII file, containing the layout of the metal layers used in routing, is sent to EUROPRACTICE, which merges it with the standard cell layouts. In this way, a complete GDSII file containing all of the layout information is obtained.
This concludes the steps required to obtain, from the VHDL source code and using a standard cell library, a GDSII file describing the processor's layout.
The complete GDSII file is then sent to the foundry, which uses it to produce the masks for IC manufacturing.
7 Results
Contents
7.1 Introduction
7.2 Results
7.1 Introduction
After the synthesis, placement and routing steps, the design is complete. This chapter presents the obtained circuit layout and the results of the final timing analysis of the circuit. Values obtained from the synthesis process are also presented in order to compare the costs of inserting the test structures. All of the tools were run on a Linux machine (CentOS release 4.4) with a dual-core AMD Opteron processor at 2 GHz and 2 GB of memory.
7.2 Results
To assess the impact of the test structures on the circuit's area and performance, four circuits were initially synthesized: the AMEP processor without any test structures (Basic); with the additional memory BIST controllers (Basic + Mem test); with the memory BIST controllers and the scan chains (Basic + Mem test + scan); and with the memory BIST structures, the scan chains and the IEEE 1149.1 compliant boundary scan logic (Basic + Mem test + scan + JTAG). The presented results were obtained with the synthesis tool and are used only to compare the costs of the implemented test structures. The worst-case conditions defined in the libraries were used to obtain these values. The presented area values are the sum of the area reported by the synthesis tool and the area occupied by the three memories (0.67 mm²). This is necessary because the synthesis tool is unaware of the area occupied by the memories, since the available memory libraries do not include a view for this tool. The resulting areas and minimum clock periods after synthesis are summarized in table 7.1.
Table 7.1: Results from synthesis tool.
Circuit                           Occupied Area (mm²)   Relative Area   Minimum Period (ns)
Basic                             1.17                  100.0%          9.99
Basic + Mem test                  1.20                  102.8%          9.99
Basic + Mem test + scan           1.21                  103.9%          10.00
Basic + Mem test + scan + JTAG    1.90                  162.7%          10.00
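The relative areas in Table 7.1 are ratios to the Basic circuit. The Python sketch below recomputes them from the listed (rounded) areas; the results differ slightly from the table (e.g. 102.6% instead of 102.8%), presumably because the table was computed from the unrounded values reported by the tool. The second function removes the 0.67 mm² of memory, which is the same in every configuration, to show the overhead relative to the synthesized logic alone.

```python
# Areas (mm^2) as listed in Table 7.1, including the 0.67 mm^2 of the three memories.
AREAS = {
    "Basic": 1.17,
    "Basic + Mem test": 1.20,
    "Basic + Mem test + scan": 1.21,
    "Basic + Mem test + scan + JTAG": 1.90,
}


def relative_areas(areas, reference="Basic"):
    """Area of each configuration as a percentage of the reference configuration."""
    base = areas[reference]
    return {name: round(100.0 * a / base, 1) for name, a in areas.items()}


def logic_only_relative_areas(areas, memory_area=0.67, reference="Basic"):
    """Same ratio, with the (unchanged) memory area removed from every configuration."""
    base = areas[reference] - memory_area
    return {name: round(100.0 * (a - memory_area) / base, 1) for name, a in areas.items()}


print(relative_areas(AREAS))
print(logic_only_relative_areas(AREAS))
```

On these rounded values, the JTAG configuration is about 2.5 times the synthesized logic area of the Basic circuit, which makes the cost of the boundary scan logic discussed below more visible than the whole-die percentage does.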
These results show that the insertion of dedicated test structures, such as the memory BIST controllers and the scan chains, has a relatively small impact on both the circuit timing and the occupied area. However, the area increase due to the implementation of the boundary scan logic (IEEE 1149.1) is significant. This increase results mainly from the implementation of the IEEE 1149.1 TAP controller. Nevertheless, the boundary scan logic will be included in the final circuit, since there is enough area available on the die and it will allow the connections to be tested at board level.
The results obtained after placement and routing of the circuit with the memory BIST controllers, the scan chains and the IEEE 1149.1 compliant boundary scan logic are presented in table 7.2. The power consumption value is an estimate of the core power, computed by Encounter assuming a net toggle probability of 45%. Since a 50% probability would mean that every net in the circuit toggles on every positive clock edge, 45% is a reasonable estimate for the overall net toggle probability, because the test-dedicated parts of the circuit are disabled in normal operating mode, which reduces the overall toggle probability.
Table 7.2: Layout results.
Occupied Area                  4.9 mm²
Minimum Period                 9.8 ns
Consumed Power (@100 MHz)      14.5 mW
From these results, it can be seen that the initial wire load model used in the synthesis tool
was a good estimate, since the design met the set timing constraints (the circuit is able to run with
a maximum clock frequency of 102MHz). With a power consumption of 14.5mW @ 100MHz, the
AMEP processor meets its requirements for power consumption and therefore it will be capable
of efficiently implementing motion estimation algorithms in battery-supplied devices.
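The figures in Table 7.2 can be turned into a few derived quantities. The Python sketch below computes the maximum frequency from the minimum period, the energy per clock cycle at 100 MHz, and the effective switched capacitance implied by P = C_eff · V² · f. The last value assumes that leakage is negligible and lumps the toggle probability into C_eff, so it is only an order-of-magnitude figure.

```python
T_MIN_NS = 9.8   # minimum clock period (Table 7.2)
P_W = 14.5e-3    # estimated core power at 100 MHz (Table 7.2)
F_HZ = 100e6     # operating frequency used for the power estimate
VDD = 1.8        # core supply voltage

f_max_mhz = 1e3 / T_MIN_NS                 # 1 / 9.8 ns, expressed in MHz
energy_per_cycle_pj = P_W / F_HZ * 1e12    # E = P / f
c_eff_pf = P_W / (VDD ** 2 * F_HZ) * 1e12  # P = C_eff * V^2 * f, leakage neglected

print(f"f_max ~ {f_max_mhz:.0f} MHz")
print(f"energy per cycle ~ {energy_per_cycle_pj:.0f} pJ")
print(f"effective switched capacitance ~ {c_eff_pf:.1f} pF")
```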
Furthermore, the power analysis performed by Encounter also indicated that the maximum current densities in the various layers and vias are within the values recommended by UMC to comply with the electromigration rules at a temperature of 125 °C [24]. Table 7.3 summarizes these values per process layer. Note that the current density in the metal 6 layer is zero, since this metal layer was not used for routing.
Table 7.3: Power analysis results.
Layer/Via   Maximum [24]   Actual
Metal 1     0.44 mA/µm     0.22 mA/µm
Metal 2     0.53 mA/µm     0.02 mA/µm
Metal 3     0.53 mA/µm     0.13 mA/µm
Metal 4     0.53 mA/µm     0.15 mA/µm
Metal 5     0.53 mA/µm     0.05 mA/µm
Metal 6     0.89 mA/µm     0.00 mA/µm
Via12       0.21 mA/cut    0.02 mA/cut
Via23       0.21 mA/cut    0.01 mA/cut
Via34       0.21 mA/cut    0.01 mA/cut
Via45       0.21 mA/cut    0.01 mA/cut
Via56       0.21 mA/cut    0.01 mA/cut
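The electromigration margins in Table 7.3 can be expressed as the fraction of the allowed current density actually used on each layer and via. The Python sketch below does this with the table's values and confirms that every layer stays below its limit, the worst case being Metal 1 at half of the allowed density.

```python
# (limit, actual) current density from Table 7.3; mA/um for metals, mA/cut for vias.
EM_TABLE = {
    "Metal 1": (0.44, 0.22), "Metal 2": (0.53, 0.02), "Metal 3": (0.53, 0.13),
    "Metal 4": (0.53, 0.15), "Metal 5": (0.53, 0.05), "Metal 6": (0.89, 0.00),
    "Via12": (0.21, 0.02), "Via23": (0.21, 0.01), "Via34": (0.21, 0.01),
    "Via45": (0.21, 0.01), "Via56": (0.21, 0.01),
}


def em_utilization(table):
    """Fraction of the allowed current density used on each layer or via."""
    return {layer: actual / limit for layer, (limit, actual) in table.items()}


util = em_utilization(EM_TABLE)
worst = max(util, key=util.get)
print(f"worst: {worst} at {util[worst]:.0%} of the limit")
assert all(u <= 1.0 for u in util.values()), "electromigration limit exceeded"
```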
The maximum estimated IR drop, calculated by Encounter, is 5 mV, which is an acceptable value for this circuit and this technology (about 0.3% of the 1.8 V supply voltage). In conclusion, the current density and maximum IR drop values indicate that the initial power structures were adequately sized.
The AMEP final design, including two scan chains and the JTAG boundary scan logic (IEEE
1149.1), has been obtained using the design flow described in Chapter 5 and Chapter 6. The final
layout of the circuit is presented in figure 7.1 where different blocks and cells are identified.
Figure 7.1: AMEP chip layout (labelled elements: search area memory, macroblock memory, instruction memory, power rings, I/O cell, pad and corner cell).
It can be observed that the memory blocks (search area, macroblock and instruction memories) are spread across the die to avoid excessive temperatures. The I/O cells and their pads, as well as the corner cells, are located at the periphery of the die. The global power ring is also visible in the space between the I/O cells and the core area.
A final LVS check and simulation were performed and confirmed the validity of this layout.
8 Conclusions
Contents
8.1 Conclusions
8.2 Future Work
8.1 Conclusions
Motion Estimation is the most computationally expensive part of a video encoder system.
Therefore, an efficient architecture for motion estimation was proposed in [2].
In this dissertation, a structured workflow was defined, and followed, to implement the Adaptive
Motion Estimation Processor (AMEP) [2] on an Application Specific Integrated Circuit (ASIC) using
a standard cell library. The implementation of the AMEP circuit starts from a VHDL description
and ends with the final physical description of the layout, exported in a GDSII format file, which is
sent to the foundry for manufacture. During this process, several EDA tools were used to perform
the various steps.
The Synopsys Inc. software was used to perform the steps of the frontend phase (synthesis
and insertion of test structures) of this project, while the Cadence SoC Encounter platform was
used for the steps in the backend phase (placement, routing and sign-off analysis).
The UMC CMOS L180 1P6M MM/RFCMOS process technology [24] with the corresponding
standard cell library from Faraday Technology Corporation [15] were chosen to implement this
circuit.
Special attention was paid to enhancing the circuit's testability, in order to validate the circuit after manufacture and to assist in the detection of possible design errors. Therefore, a memory BIST controller was designed and implemented to allow the memories used in the processor to be tested. This controller implements functions that are useful during the prototype stage of the processor, such as allowing the address of a failing memory position to be extracted. The implementation of this controller required changing and extending the VHDL description of the processor to include this test-dedicated hardware.
Furthermore, two scan chains were created, during the synthesis stage, to improve testability,
using the available options of the synthesis tool. Additionally, the IEEE 1149.1 TAP controller
and the associated Boundary Scan Register (BSR) were implemented to provide test capability
of the circuit’s interconnections, when integrated into a board, and allow control of the internal
memory BIST controllers, reducing the number of additional pins dedicated to test structures.
The test patterns, used to verify that the chip is properly manufactured, were generated using the
Synopsys TetraMAX tool.
After completing the steps of the frontend phase, a Verilog netlist was obtained, which represents the interconnections between the standard cells that implement the circuit's functions. This netlist is the basis for the backend phase. This phase starts with the placement of the cells inside the available die area. The three memory blocks were manually placed due to temperature constraints. The remaining cells were automatically placed in order to achieve the best timing. The power structures necessary to supply the required current to every cell in the chip were created and, in the final verifications, validated. The clock tree was synthesized by Encounter, which ensured that the clock skew was less than 300 ps, approximately 3% of the minimum clock period (9.8 ns).
After performing the routing of the signal nets, a final timing analysis concluded that the mini-
mum clock period is 9.8ns, which corresponds to a maximum working frequency of 102MHz. An
LVS check and a simulation with timing details were performed and validated the circuit’s cor-
rect implementation. A power analysis was also performed and revealed that the created power
structures complied with the electromigration rules set by UMC and that the maximum IR drop
is 5mV, which is less than 1% of the 1.8V supply. This analysis also concluded that the core of
the manufactured processor will have a maximum power consumption of 14.5mW @ 100MHz.
Therefore, the low power consumption estimated for the manufactured chip makes it adequate to
perform motion estimation on battery-supplied devices.
8.2 Future Work
In order to produce the AMEP integrated circuit, the final layout will be sent to EUROPRACTICE for manufacture in the 22 October run. Afterwards, it will be packaged in a CLCC68 package. Meanwhile, a connection board will be developed to interface the manufactured AMEP with an existing video coding platform.
It is also required to develop the software needed to perform the test of the circuit. This soft-
ware is responsible for managing all the signals necessary to deliver the generated test vectors
through the scan chains and read the resulting output values. This software must also compare
the read values from the circuit with the expected outputs, in order to verify the correct manufac-
ture of the chip.
Moreover, the manufactured AMEP will be used in the video coding platform to perform real-
time motion estimation, while its power consumption is measured, to assess its compliance with
the low power constraints imposed by the battery-supplied devices.
Bibliography
[1] T. Dias, S. Momcilovic, N. Roma, and L. Sousa, “Adaptive motion estimation processor for autonomous video devices,” EURASIP Journal on Embedded Systems, special issue on Embedded Systems for Portable and Mobile Video Platforms, vol. 2007, no. 57234, pp. 1–10, May 2007.
[2] S. Momcilovic, T. Dias, N. Roma, and L. Sousa, “Application specific instruction set processor for adaptive video motion estimation,” in Proc. of 9th EUROMICRO Conference on Digital System Design: Architectures, Methods and Tools - DSD’2006. IEEE Computer Society, August 2006, pp. 160–167.
[3] N. Roma, “Processadores dedicados para estimação de movimento em sequências de vídeo,” Master’s thesis, Universidade Técnica de Lisboa - Instituto Superior Técnico, Lisboa, Jan. 2001.
[4] V. Bhaskaran and K. Konstantinides, Image and Video Compression Standards: Algorithms and Architectures. Kluwer Academic Publishers, 1995.
[5] M. Abramovici, M. A. Breuer, and A. D. Friedman, Digital Systems Testing and Testable Design. Computer Science Press, 1990.
[6] TetraMAX ATPG User Guide (Version Y-2006.06), Synopsys, Inc., June 2006.
[7] DFT Compiler Understanding Test Automation User Guide (DB Mode) (Version X-2005.09), Synopsys, Inc., September 2005.
[8] IEEE 1149.1-2001 - Standard Test Access Port and Boundary-Scan Architecture, IEEE, June 2001.
[9] M. L. Bushnell and V. D. Agrawal, Essentials of Electronic Testing. Kluwer Academic Publishers, 2000.
[10] “An optimal march test for locating faults in DRAMs,” in Records of the 1993 IEEE International Workshop on Memory Testing. IEEE Computer Society, August 1993, pp. 61–66.
[11] I. Koren, “Should yield be a design objective?” in Proc. IEEE 2000 First International Symposium on Quality Electronic Design. IEEE Computer Society, March 2000, pp. 115–120.
[12] N. Harrison, “A simple via duplication tool for yield enhancement,” in Proc. of the 2001 IEEE International Symposium on Defect and Fault Tolerance in VLSI Systems (DFT01). IEEE Computer Society, October 2001, pp. 39–47.
[13] H. H. Chen and C. K. Wong, “Wiring for manufacturability and yield maximization in computer-aided VLSI design,” in Proc. of Technical Papers. 1993 International Symposium on VLSI Technology, Systems, and Applications. IEEE Computer Society, May 1993, pp. 68–72.
[14] eSi-Route/11™ High Performance 0.18µ Standard Cell Library - Part Number: UMCL18U250 (Rev. 2.4), Virtual Silicon Technology, Inc., November 2001.
[15] FARADAY ASIC CELL LIBRARY FSA0A_C 0.18µm STANDARD CELL (v1.0), Faraday Technology Corporation, August 2004.
[16] Bonding Pad Layout Guidelines (Ver. 5 P1), UMC, October 2001.
[17] 0.18µm (FSA0A_C) Standard Cell Library ESD Application Note (v1.0), Faraday Technology Corporation, September 2004.
[18] Ceramic packaging guidelines for UMC technologies (v1.1), EUROPRACTICE IC SERVICE, December 2003.
[19] Design Compiler User Guide (Version Y-2006.06), Synopsys, Inc., June 2006.
[20] DFT Compiler User Guide Vol. 1: Scan (XG Mode) (Version Y-2006.06), Synopsys, Inc., June 2006.
[21] BSD Compiler User Guide (XG Mode) (Version Y-2006.06), Synopsys, Inc., June 2006.
[22] Encounter User Guide (Product Version 4.1.5), Cadence Design Systems, Inc., May 2005.
[23] NanoRoute Technology Reference (Product Version 4.1.5), Cadence Design Systems, Inc., May 2005.
[24] 0.18um Mixed-Mode and RFCMOS 1.8V/3.3V 1P6M Metal Metal Capacitor Process Technology Layout Rule (Ver. 2.9 P.1), UMC, May 2006.
[25] Encounter Timing Closure Guide (Product Version 4.1.3), Cadence Design Systems, Inc., December 2004.
A VHDL Code
Contents
A.1 Memory Test Controller VHDL Code
A.1 Memory Test Controller VHDL Code
---------------------------------------------------------------------------------------------------------------------------------------------------------------- File : mem_wrapper_1port .vhd-- Author(s) : Nuno Sebastiao-- Date : 14/02/07--------------------------------------------------------------------------------- Description :-- Interface for memory module with integrated BIST controller .-- ADDRESS_WIDTH is the memory address width-- DATA_WIDTH is the memory data width-- BYTEWRITE is different than ’0’ if bytewrite capability exists in the memory.---- Note: DATA_WIDTH must be greater than 8-- If BYTEWRITE is available , DATA_WIDTH must be divisible by 8--------------------------------------------------------------------------------------------------------------------------------------------------------------
library IEEE;
use IEEE.STD_LOGIC_1164.all;

entity SU180_1024X8X2BM1_WRAPPER is
  port (
    CLK      : in  STD_LOGIC;
    CS       : in  STD_LOGIC;
    OE       : in  STD_LOGIC;
    nWEl     : in  STD_LOGIC;  -- Write enable signal, low byte (active low)
    nWEh     : in  STD_LOGIC;  -- Write enable signal, high byte (active low)
    ADDR     : in  STD_LOGIC_VECTOR(9 downto 0);
    DI       : in  STD_LOGIC_VECTOR(15 downto 0);
    DO       : out STD_LOGIC_VECTOR(15 downto 0);
    bisten   : in  STD_LOGIC;
    bistgo   : in  STD_LOGIC;
    bistrst  : in  STD_LOGIC;
    bistrslt : out STD_LOGIC;
    bistend  : out STD_LOGIC
  );
end SU180_1024X8X2BM1_WRAPPER;
architecture Behavioral of SU180_1024X8X2BM1_WRAPPER is

  -- If BIST = 1, the memory BIST controller will be synthesized.
  constant BIST          : INTEGER  := 1;
  constant DATA_WIDTH    : POSITIVE := 16;
  constant ADDRESS_WIDTH : POSITIVE := 10;
  constant BYTEWRITE     : INTEGER  := 1;

  component mem_bist_controller_1port
    generic (
      ADDRESS_WIDTH : POSITIVE := 8;
      DATA_WIDTH    : POSITIVE := 8;
      BYTEWRITE     : INTEGER  := 0
    );
    port (
      clk          : in  STD_LOGIC;
      rst          : in  STD_LOGIC;
      en           : in  STD_LOGIC;
      go           : in  STD_LOGIC;
      bistctr_din  : in  STD_LOGIC_VECTOR(DATA_WIDTH-1 downto 0);
      fault_dtct   : out STD_LOGIC;
      bistend      : out STD_LOGIC;
      bisten       : out STD_LOGIC;
      bistoen      : out STD_LOGIC;
      bistwen      : out STD_LOGIC;
      bistcen      : out STD_LOGIC;
      bistbwen     : out STD_LOGIC_VECTOR(DATA_WIDTH/8-1 downto 0);
      bistaddr     : out STD_LOGIC_VECTOR(ADDRESS_WIDTH-1 downto 0);
      bistctr_dout : out STD_LOGIC_VECTOR(DATA_WIDTH-1 downto 0)
    );
  end component;
  component SU180_1024X8X2BM1
    port (
      A0, A1, A2, A3, A4, A5, A6, A7, A8, A9         : IN  std_logic;
      DO0, DO1, DO2, DO3, DO4, DO5, DO6, DO7,
      DO8, DO9, DO10, DO11, DO12, DO13, DO14, DO15 : OUT std_logic;
      DI0, DI1, DI2, DI3, DI4, DI5, DI6, DI7,
      DI8, DI9, DI10, DI11, DI12, DI13, DI14, DI15 : IN  std_logic;
      WEB0, WEB1                                     : IN  std_logic;
      CK, CS, OE                                     : IN  std_logic
    );
  end component;
  component reg_e
    generic (
      WIDTH : POSITIVE := 32);
    port (
      CLK  : in  STD_LOGIC;
      CE   : in  STD_LOGIC;
      Din  : in  STD_LOGIC_VECTOR(WIDTH-1 downto 0);
      Dout : out STD_LOGIC_VECTOR(WIDTH-1 downto 0)
    );
  end component;

  component mux_2to1
    generic (
      WIDTH : POSITIVE);
    port (
      S : in  STD_LOGIC;
      A : in  STD_LOGIC_VECTOR(WIDTH-1 downto 0);
      B : in  STD_LOGIC_VECTOR(WIDTH-1 downto 0);
      O : out STD_LOGIC_VECTOR(WIDTH-1 downto 0)
    );
  end component;

  signal mem_cen, mem_oen     : STD_LOGIC;
  signal mem_wen              : STD_LOGIC;
  signal mem_di               : STD_LOGIC_VECTOR(DATA_WIDTH-1 downto 0);
  signal mem_adr              : STD_LOGIC_VECTOR(ADDRESS_WIDTH-1 downto 0);
  signal mem_bwen, mem_bwen_s : STD_LOGIC_VECTOR(DATA_WIDTH/8-1 downto 0);
  signal bwen                 : STD_LOGIC_VECTOR(DATA_WIDTH/8-1 downto 0);

  signal dout_s     : STD_LOGIC_VECTOR(DATA_WIDTH-1 downto 0);
  signal fault_dtct : STD_LOGIC;

  signal bisten_s, bistcen, bistoen, bistwen : STD_LOGIC;
  signal bistdi   : STD_LOGIC_VECTOR(DATA_WIDTH-1 downto 0);
  signal bistadr  : STD_LOGIC_VECTOR(ADDRESS_WIDTH-1 downto 0);
  signal bistbwen : STD_LOGIC_VECTOR(DATA_WIDTH/8-1 downto 0);
begin

  bwen(0) <= nWEl;
  bwen(1) <= nWEh;

  -- mem_bwen and mem_wen are active low
  MEM_BWEN_GEN: for i in DATA_WIDTH/8-1 downto 0 generate
    mem_bwen(i) <= mem_bwen_s(i) or mem_wen;
  end generate;

  BIST_CTRL_GEN_1PORT: if (BIST = 1) generate
    BIST_CTRL: mem_bist_controller_1port
      generic map (
        ADDRESS_WIDTH => ADDRESS_WIDTH,
        DATA_WIDTH    => DATA_WIDTH,
        BYTEWRITE     => BYTEWRITE)
      port map (
        clk          => CLK,
        rst          => bistrst,
        en           => bisten,
        go           => bistgo,
        bistctr_din  => dout_s,
        fault_dtct   => fault_dtct,
        bistend      => bistend,
        bisten       => bisten_s,
        bistoen      => bistoen,
        bistwen      => bistwen,
        bistcen      => bistcen,
        bistbwen     => bistbwen,
        bistaddr     => bistadr,
        bistctr_dout => bistdi
      );
    -- bisten_s is active low
    MEM_ADRA_SEL: mux_2to1
      generic map (
        WIDTH => ADDRESS_WIDTH)
      port map (
        S => bisten_s,
        A => bistadr,
        B => ADDR,
        O => mem_adr
      );

    MEM_DINA_SEL: mux_2to1
      generic map (
        WIDTH => DATA_WIDTH)
      port map (
        S => bisten_s,
        A => bistdi,
        B => DI,
        O => mem_di
      );

    -- wen/bistwen is active low
    -- mem_wen is active low
    MEM_WEN_SEL: process (bisten_s, bistwen)
    begin
      case bisten_s is
        when '0'    => mem_wen <= bistwen;
        when others => mem_wen <= '0';
      end case;
    end process;

    -- bistcen is active low
    -- cen is active high
    -- mem_cen is active high
    MEM_CEN_SEL: process (bisten_s, bistcen, CS)
    begin
      case bisten_s is
        when '0'    => mem_cen <= not bistcen;
        when others => mem_cen <= CS;
      end case;
    end process;

    -- bistoen is active low
    -- oen is active high
    -- mem_oen is active high
    MEM_OEN_SEL: process (bisten_s, bistoen, OE)
    begin
      case bisten_s is
        when '0'    => mem_oen <= not bistoen;
        when others => mem_oen <= OE;
      end case;
    end process;

    MEM_BWEN_SEL: mux_2to1
      generic map (
        WIDTH => DATA_WIDTH/8)
      port map (
        S => bisten_s,
        A => bistbwen,
        B => bwen,
        O => mem_bwen_s
      );
  end generate;
  NO_BIST_CTRL: if (BIST /= 1) generate
    mem_adr    <= ADDR;
    mem_di     <= DI;
    mem_bwen_s <= bwen;
    mem_wen    <= '0';
    mem_cen    <= CS;
    mem_oen    <= OE;
  end generate;
  MEM_1K_16_BW: if ((DATA_WIDTH = 16) and (ADDRESS_WIDTH = 10) and (BYTEWRITE /= 0)) generate
    RAM_1K_16: SU180_1024X8X2BM1
      port map (
        A0   => mem_adr(0),  A1   => mem_adr(1),  A2   => mem_adr(2),  A3   => mem_adr(3),
        A4   => mem_adr(4),  A5   => mem_adr(5),  A6   => mem_adr(6),  A7   => mem_adr(7),
        A8   => mem_adr(8),  A9   => mem_adr(9),
        DO0  => dout_s(0),   DO1  => dout_s(1),   DO2  => dout_s(2),   DO3  => dout_s(3),
        DO4  => dout_s(4),   DO5  => dout_s(5),   DO6  => dout_s(6),   DO7  => dout_s(7),
        DO8  => dout_s(8),   DO9  => dout_s(9),   DO10 => dout_s(10),  DO11 => dout_s(11),
        DO12 => dout_s(12),  DO13 => dout_s(13),  DO14 => dout_s(14),  DO15 => dout_s(15),
        DI0  => mem_di(0),   DI1  => mem_di(1),   DI2  => mem_di(2),   DI3  => mem_di(3),
        DI4  => mem_di(4),   DI5  => mem_di(5),   DI6  => mem_di(6),   DI7  => mem_di(7),
        DI8  => mem_di(8),   DI9  => mem_di(9),   DI10 => mem_di(10),  DI11 => mem_di(11),
        DI12 => mem_di(12),  DI13 => mem_di(13),  DI14 => mem_di(14),  DI15 => mem_di(15),
        WEB0 => mem_bwen(0), WEB1 => mem_bwen(1),
        CK   => CLK,
        CS   => mem_cen,
        OE   => mem_oen
      );
  end generate;

  DO       <= dout_s;
  bistrslt <= fault_dtct;

end Behavioral;
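The wrapper above depends on several active-low conventions that are easy to mix up. The Python function below is a minimal bit-level sketch of how the SRAM macro's CS, OE, WEB0 and WEB1 pins end up being driven. It is a hypothetical software reference model written for illustration and is not part of the original design. One assumption needs stating: the source of `mux_2to1` is not listed, so the sketch assumes it selects input `A` when `S = '0'`, which agrees with the `case bisten_s` processes. The function name `wrapper_controls` is also illustrative.

```python
from typing import Tuple


def wrapper_controls(bisten_s: int, cs: int, oe: int, nwel: int, nweh: int,
                     bistcen: int, bistoen: int, bistwen: int,
                     bistbwen: Tuple[int, int]) -> Tuple[int, int, int, int]:
    """Return (CS, OE, WEB0, WEB1) as seen by the SU180_1024X8X2BM1 macro.

    Hypothetical model of SU180_1024X8X2BM1_WRAPPER; every value is 0 or 1.
    Assumes mux_2to1 selects its A input when S = 0.
    """
    if bisten_s == 0:              # bisten_s is active low: the BIST drives the RAM
        mem_cen = 1 - bistcen      # bistcen active low -> mem_cen active high
        mem_oen = 1 - bistoen      # bistoen active low -> mem_oen active high
        mem_wen = bistwen          # both active low
        mem_bwen_s = bistbwen
    else:                          # functional mode
        mem_cen, mem_oen, mem_wen = cs, oe, 0
        mem_bwen_s = (nwel, nweh)  # bwen(0) <= nWEl; bwen(1) <= nWEh
    # MEM_BWEN_GEN: OR of active-low enables, so a byte is written only when
    # both its byte enable and the global write enable are low.
    web0, web1 = (b | mem_wen for b in mem_bwen_s)
    return mem_cen, mem_oen, web0, web1
```

For example, in functional mode with `CS = 1`, `OE = 0` and only `nWEl` low, the call `wrapper_controls(1, 1, 0, 0, 1, 1, 1, 1, (1, 1))` returns `(1, 0, 0, 1)`. In that case only the low byte is written, and the BIST-side inputs are ignored.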
--------------------------------------------------------------------------------
-- Project      : AMEP
-- Affiliations : PARSIG - Parallel Structures and Signal Processing
--                SIPS - Signal Processing Systems Group
--                INESC-ID - Institute for Systems and Computer Engineering:
--                           Research and Development in Lisbon
-- Funding      : FCT Project POSI/EEA-CPS/60765 (2005/01/01-2008/12/31)
--------------------------------------------------------------------------------
-- File         : mem_bist_controller_1port.vhd
-- Author(s)    : Nuno Sebastiao
-- Date         : 02/07/07
--------------------------------------------------------------------------------
-- Copyright (c) 2005-8 Signal Processing Systems Group - INESC-ID, Lisbon
--------------------------------------------------------------------------------
-- Description :
--   Memory BIST Controller
--   ADDRESS_WIDTH is the address width of the memory to be tested
--   DATA_WIDTH is the data width of the memory to be tested
--   BYTEWRITE is different from '0' if bytewrite capability exists in the memory.
--
--   Note: DATA_WIDTH must be greater than 8 and divisible by PATTERN_WIDTH
--         If BYTEWRITE is available, DATA_WIDTH must be divisible by 8
--
--   PATTERN_WIDTH is the width of the pattern bits generated by the state machine.
--   The state machine must be changed accordingly.
--------------------------------------------------------------------------------

--{{ Section below this comment is automatically maintained
--   and may be overwritten
--{entity {mem_bist_controller} architecture {mem_bist_controller}}
library IEEE;
use IEEE.STD_LOGIC_1164.all;

entity mem_bist_controller_1port is
  generic (
    ADDRESS_WIDTH : POSITIVE := 8;
    DATA_WIDTH    : POSITIVE := 8;
    BYTEWRITE     : INTEGER  := 0
  );
  port (
    clk          : in  STD_LOGIC;
    rst          : in  STD_LOGIC;
    en           : in  STD_LOGIC;
    go           : in  STD_LOGIC;
    bistctr_din  : in  STD_LOGIC_VECTOR(DATA_WIDTH-1 downto 0);
    fault_dtct   : out STD_LOGIC;
    bistend      : out STD_LOGIC;
    bisten       : out STD_LOGIC;
    bistoen      : out STD_LOGIC;
    bistwen      : out STD_LOGIC;
    bistcen      : out STD_LOGIC;
    bistbwen     : out STD_LOGIC_VECTOR(DATA_WIDTH/8-1 downto 0);
    bistaddr     : out STD_LOGIC_VECTOR(ADDRESS_WIDTH-1 downto 0);
    bistctr_dout : out STD_LOGIC_VECTOR(DATA_WIDTH-1 downto 0)
  );
end mem_bist_controller_1port;
architecture Behavioral of mem_bist_controller_1port is

  constant PATTERN_WIDTH : POSITIVE := 2;

  function sub_gt_zero (n, m : positive) return natural is
    variable result : integer;
  begin
    if (n > m) then
      result := n - m;
    else
      result := 0;
    end if;
    return result;
  end sub_gt_zero;
  component mem_bist_controller_sm
    generic (
      DATA_WIDTH : POSITIVE := 8);
    port (
      clk                : in  STD_LOGIC;
      rst                : in  STD_LOGIC;
      en                 : in  STD_LOGIC;
      addr_gen_endcount  : in  STD_LOGIC;
      go                 : in  STD_LOGIC;
      dout_cmp           : in  STD_LOGIC;
      bwen_end           : in  STD_LOGIC;
      fault_dtct         : out STD_LOGIC;
      bistend            : out STD_LOGIC;
      bisten             : out STD_LOGIC;
      bistoen            : out STD_LOGIC;
      bistwen            : out STD_LOGIC;
      bistcen            : out STD_LOGIC;
      bistportsel        : out STD_LOGIC;
      bistbwen_gen_en    : out STD_LOGIC;
      bistbwen_gen_reset : out STD_LOGIC;
      bistbwen_gen_din   : out STD_LOGIC;
      addr_gen_rst       : out STD_LOGIC;
      addr_gen_en        : out STD_LOGIC;
      addr_gen_dir       : out STD_LOGIC;
      cmp_reg_en         : out STD_LOGIC;
      pattern            : out STD_LOGIC_VECTOR(1 downto 0)
    );
  end component;

  component updown_counter
    generic (
      WIDTH : POSITIVE := 8);
    port (
      clk   : in  STD_LOGIC;
      en    : in  STD_LOGIC;
      dir   : in  STD_LOGIC;
      rst   : in  STD_LOGIC;
      count : out STD_LOGIC_VECTOR(WIDTH-1 downto 0)
    );
  end component;
  component shift_reg
    generic (
      WIDTH : POSITIVE := 8);
    port (
      clk   : in  STD_LOGIC;
      en    : in  STD_LOGIC;
      reset : in  STD_LOGIC;
      din   : in  STD_LOGIC;
      dout  : out STD_LOGIC_VECTOR(WIDTH-1 downto 0)
    );
  end component;
  component cmp_eq
    generic (
      WIDTH : POSITIVE);
    port (
      A : in  STD_LOGIC_VECTOR(WIDTH-1 downto 0);
      B : in  STD_LOGIC_VECTOR(WIDTH-1 downto 0);
      O : out STD_LOGIC
    );
  end component;

  component reg_re
    generic (
      WIDTH : POSITIVE := 32);
    port (
      CLK  : in  STD_LOGIC;
      CE   : in  STD_LOGIC;
      RST  : in  STD_LOGIC;
      Din  : in  STD_LOGIC_VECTOR(WIDTH-1 downto 0);
      Dout : out STD_LOGIC_VECTOR(WIDTH-1 downto 0)
    );
  end component;

  signal bistbwen_s, bistbwen_s_reg, byte_select : STD_LOGIC_VECTOR(DATA_WIDTH/8-1 downto 0);
  signal mem_data_to_cmp, mem_patt_to_cmp : STD_LOGIC_VECTOR(DATA_WIDTH-1 downto 0);
  signal bistctr_dout_s, bistctr_dout_s_reg : STD_LOGIC_VECTOR(DATA_WIDTH-1 downto 0);
  signal pattern : STD_LOGIC_VECTOR(PATTERN_WIDTH-1 downto 0);
  signal curr_addr, curr_addr_reg, end_up_addr, end_down_addr : STD_LOGIC_VECTOR(ADDRESS_WIDTH-1 downto 0);
  signal bistwen_s, bistwen_s_reg : STD_LOGIC_VECTOR(0 downto 0);
  signal cmp_reg_en : STD_LOGIC;
  signal c_bwen, c_wen, c_dout, c_din, c_addr : STD_LOGIC;
  signal addr_gen_en, addr_gen_rst, addr_gen_dir, addr_gen_endcount : STD_LOGIC;
  signal bistbwen_gen_en, bistbwen_gen_reset, bistbwen_gen_din : STD_LOGIC;
  signal bwen_end : STD_LOGIC;
begin

  SM: mem_bist_controller_sm
    generic map (
      DATA_WIDTH => DATA_WIDTH)
    port map (
      clk                => clk,
      rst                => rst,
      en                 => en,
      addr_gen_endcount  => addr_gen_endcount,
      go                 => go,
      dout_cmp           => c_dout,
      bwen_end           => bwen_end,
      fault_dtct         => fault_dtct,
      bistend            => bistend,
      bisten             => bisten,
      bistoen            => bistoen,
      bistwen            => bistwen_s(0),
      bistcen            => bistcen,
      bistportsel        => open,
      bistbwen_gen_en    => bistbwen_gen_en,
      bistbwen_gen_reset => bistbwen_gen_reset,
      bistbwen_gen_din   => bistbwen_gen_din,
      addr_gen_rst       => addr_gen_rst,
      addr_gen_en        => addr_gen_en,
      addr_gen_dir       => addr_gen_dir,
      cmp_reg_en         => cmp_reg_en,
      pattern            => pattern
    );

  ADDR_GEN: updown_counter
    generic map (
      WIDTH => ADDRESS_WIDTH)
    port map (
      clk   => clk,
      en    => addr_gen_en,
      dir   => addr_gen_dir,
      rst   => addr_gen_rst,
      count => curr_addr
    );
  DATAOUT_CMP: cmp_eq
    generic map (
      WIDTH => DATA_WIDTH)
    port map (
      A => mem_patt_to_cmp,
      B => mem_data_to_cmp,
      O => c_dout
    );

  CURR_ADDRESS_REG: reg_re
    generic map (
      WIDTH => ADDRESS_WIDTH)
    port map (
      CLK  => clk,
      CE   => cmp_reg_en,
      RST  => rst,
      Din  => curr_addr,
      Dout => curr_addr_reg
    );

  MEM_DATAIN_REG: reg_re
    generic map (
      WIDTH => DATA_WIDTH)
    port map (
      CLK  => clk,
      CE   => cmp_reg_en,
      RST  => rst,
      Din  => bistctr_dout_s,
      Dout => bistctr_dout_s_reg
    );

  BISTCTR_DATAOUT: for i in 0 to (DATA_WIDTH/PATTERN_WIDTH-1) generate
    bistctr_dout_s(PATTERN_WIDTH*i+(PATTERN_WIDTH-1) downto PATTERN_WIDTH*i) <= pattern;
  end generate BISTCTR_DATAOUT;
  BYTEWRITE_STRUCTS: if (BYTEWRITE /= 0) generate
    BYTEWRITE_GEN: shift_reg
      generic map (
        WIDTH => DATA_WIDTH/8)
      port map (
        clk   => clk,
        en    => bistbwen_gen_en,
        reset => bistbwen_gen_reset,
        din   => bistbwen_gen_din,
        dout  => bistbwen_s
      );

    DATABYTESELECT: for i in 0 to (DATA_WIDTH/8-1) generate
      byte_select(i) <= (not bistbwen_s_reg(i)) and en;
    end generate DATABYTESELECT;

    DATABYTESTOCMP: for i in 0 to (DATA_WIDTH/8-1) generate
      DATABYTEGEN: for j in 0 to 7 generate
        mem_data_to_cmp(i*8+j) <= bistctr_din(i*8+j) and byte_select(i);
        mem_patt_to_cmp(i*8+j) <= bistctr_dout_s_reg(i*8+j) and byte_select(i);
      end generate DATABYTEGEN;
    end generate DATABYTESTOCMP;

    BWEN_REG: reg_re
      generic map (
        WIDTH => DATA_WIDTH/8)
      port map (
        CLK  => clk,
        CE   => cmp_reg_en,
        RST  => rst,
        Din  => bistbwen_s,
        Dout => bistbwen_s_reg
      );

    bwen_end <= bistbwen_s(0);
  end generate BYTEWRITE_STRUCTS;

  NOBYTEWRITE_STRUCTS: if (BYTEWRITE = 0) generate
    BYTEGEN: for i in 0 to (DATA_WIDTH-1) generate
      mem_data_to_cmp(i) <= bistctr_din(i) and en;
    end generate BYTEGEN;
    mem_patt_to_cmp <= bistctr_dout_s_reg;
    c_bwen          <= '1';
    bwen_end        <= '1';
    bistbwen_s      <= (others => '0');
  end generate NOBYTEWRITE_STRUCTS;
  ADDRESSENDCOUNT: process (curr_addr, addr_gen_dir, end_down_addr, end_up_addr)
  begin
    if ((curr_addr = end_down_addr) and (addr_gen_dir = '1')) or
       ((curr_addr = end_up_addr) and (addr_gen_dir = '0')) then
      addr_gen_endcount <= '1';
    else
      addr_gen_endcount <= '0';
    end if;
  end process;

  bistwen       <= bistwen_s(0);
  bistctr_dout  <= bistctr_dout_s;
  bistaddr      <= curr_addr;
  bistbwen      <= bistbwen_s;
  end_up_addr   <= (others => '1');
  end_down_addr <= (others => '0');

end Behavioral;
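The data path of `mem_bist_controller_1port` performs three simple operations. It replicates the 2-bit pattern across the data word (`BISTCTR_DATAOUT`), compares only the byte lanes whose registered active-low byte enable is 0 (`DATABYTESELECT`/`DATABYTESTOCMP` followed by `cmp_eq`), and flags the last address of a sweep (`ADDRESSENDCOUNT`). The Python functions below are a minimal behavioural sketch of those three blocks for the `BYTEWRITE /= 0` configuration. They are hypothetical helpers written for illustration and are not part of the design. Words are modelled as unsigned integers with bit 0 as the LSB, and `bwen_reg` stands for the registered shift-register output `bistbwen_s_reg`.

```python
def replicate(pattern: int, data_width: int, pattern_width: int = 2) -> int:
    """BISTCTR_DATAOUT: repeat the pattern across the whole data word."""
    word = 0
    for i in range(data_width // pattern_width):
        word |= pattern << (pattern_width * i)
    return word


def byte_mask(bwen_reg: int, data_width: int, en: int) -> int:
    """DATABYTESELECT: a lane is compared when its active-low bwen bit is 0 and en = 1."""
    mask = 0
    for i in range(data_width // 8):
        if en and ((bwen_reg >> i) & 1) == 0:
            mask |= 0xFF << (8 * i)
    return mask


def compare(din: int, expected: int, bwen_reg: int, data_width: int, en: int = 1) -> bool:
    """DATABYTESTOCMP + cmp_eq: True (no fault) when the selected byte lanes match."""
    mask = byte_mask(bwen_reg, data_width, en)
    return (din & mask) == (expected & mask)


def addr_endcount(curr_addr: int, direction: int, address_width: int) -> bool:
    """ADDRESSENDCOUNT: direction 0 counts up to all ones, direction 1 down to zero."""
    end_up = (1 << address_width) - 1
    return (curr_addr == 0 and direction == 1) or (curr_addr == end_up and direction == 0)
```

Under these assumptions, `replicate(0b01, 16)` gives `0x5555`, which is the word written by the first March element on the 16-bit memory. Also, `compare(0x12FF, 0xAAFF, 0b10, 16)` is `True`, because only the low byte lane is selected when `bwen_reg = 0b10`.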
--------------------------------------------------------------------------------
-- File        : mem_bist_controller_sm.vhd
-- Author(s)   : Nuno Sebastiao
-- Date        : 07/02/07
--------------------------------------------------------------------------------
-- Description :
--   Memory BIST Controller State Machine for memory with bytewrite.
--   This controller implements the following MARCH test:
--   {up(w01); up(r01,w10); up(r10,w01); down(r01,w10); down(r10,w01); up(r01);
--    up(w00); up(r00,w11); down(r11,w00); up(r00)}
--   The first 10 steps are done using the entire memory word length and using
--   port A, while the last 6 steps are done for every byte and using port B.
--------------------------------------------------------------------------------
library IEEE;
use IEEE.STD_LOGIC_1164.all;

entity mem_bist_controller_sm is
  generic (
    DATA_WIDTH : POSITIVE := 8);
  port (
    clk                : in  STD_LOGIC;
    rst                : in  STD_LOGIC;
    en                 : in  STD_LOGIC;
    addr_gen_endcount  : in  STD_LOGIC;
    go                 : in  STD_LOGIC;
    dout_cmp           : in  STD_LOGIC;
    bwen_end           : in  STD_LOGIC;
    fault_dtct         : out STD_LOGIC;
    bistend            : out STD_LOGIC;
    bisten             : out STD_LOGIC;
    bistoen            : out STD_LOGIC;
    bistwen            : out STD_LOGIC;
    bistcen            : out STD_LOGIC;
    bistportsel        : out STD_LOGIC;
    bistbwen_gen_en    : out STD_LOGIC;
    bistbwen_gen_reset : out STD_LOGIC;
    bistbwen_gen_din   : out STD_LOGIC;
    addr_gen_rst       : out STD_LOGIC;
    addr_gen_en        : out STD_LOGIC;
    addr_gen_dir       : out STD_LOGIC;
    cmp_reg_en         : out STD_LOGIC;
    pattern            : out STD_LOGIC_VECTOR(1 downto 0)
  );
end mem_bist_controller_sm;
architecture Behavioral of mem_bist_controller_sm is

  component reg_re
    generic (
      WIDTH : POSITIVE := 32);
    port (
      CLK  : in  STD_LOGIC;
      CE   : in  STD_LOGIC;
      RST  : in  STD_LOGIC;
      Din  : in  STD_LOGIC_VECTOR(WIDTH-1 downto 0);
      Dout : out STD_LOGIC_VECTOR(WIDTH-1 downto 0)
    );
  end component;

  type STATE_TYPE is (Idle, Init, Step1a, Step1, Step2, Step3, Step4, Step5, Step6, Step7, Step8,
                      Step9, Step10, Step11, Step12, Step13, Step14, Step15, Step16,
                      Init_1to2, Init_3to4, Init_5to6, Init_7to8, Init_9to10, Init_10to11,
                      Init_11to12, Init_13to14, Init_15to16, Pause, Finala, Final);

  signal curr_state, next_state : STATE_TYPE;
  signal return_state, next_return : STATE_TYPE;
  signal n_fault, addr_gen_en_s : STD_LOGIC;
  signal output_preserv : STD_LOGIC_VECTOR(3 downto 0);
  signal output_preserv_en : STD_LOGIC;
  signal addr_gen_dir_s, bistbwen_gen_din_s : STD_LOGIC;
  signal pattern_s : STD_LOGIC_VECTOR(1 downto 0);
  signal outp_to_preserve : STD_LOGIC_VECTOR(3 downto 0);

begin

  n_fault          <= dout_cmp;
  addr_gen_dir     <= addr_gen_dir_s;
  bistbwen_gen_din <= bistbwen_gen_din_s;
  pattern          <= pattern_s;

  STATE_UPDATE: process (clk, rst, en, next_state, next_return)
  begin
    if (rst = '1') then
      curr_state   <= IDLE;
      return_state <= IDLE;
    elsif (clk = '1' and clk'event) then
      if (en = '1') then
        curr_state   <= next_state;
        return_state <= next_return;
      end if;
    end if;
  end process;
  NEXT_STATE_EVAL: process (curr_state, return_state, go, addr_gen_endcount, bwen_end, n_fault)
  begin
    next_state  <= curr_state;
    next_return <= return_state;

    case curr_state is
      when Idle =>
        if (go = '1') then
          next_state <= Init;
        end if;
      when Init =>
        next_state <= Step1a;
      when Step1a =>
        next_state <= Step1; next_return <= Step1;
      when Step1 =>
        if addr_gen_endcount = '1' then
          next_state <= Init_1to2; next_return <= Init_1to2;
        end if;
        if n_fault = '0' then next_state <= Pause; end if;
      when Init_1to2 =>
        next_state <= Step2; next_return <= Step2;
        if n_fault = '0' then next_state <= Pause; end if;
      when Step2 =>
        next_state <= Step3; next_return <= Step3;
        if n_fault = '0' then next_state <= Pause; end if;
      when Step3 =>
        if addr_gen_endcount = '1' then
          next_state <= Init_3to4; next_return <= Init_3to4;
        else
          next_state <= Step2; next_return <= Step2;
        end if;
        if n_fault = '0' then next_state <= Pause; end if;
      when Init_3to4 =>
        next_state <= Step4; next_return <= Step4;
        if n_fault = '0' then next_state <= Pause; end if;
      when Step4 =>
        next_state <= Step5; next_return <= Step5;
        if n_fault = '0' then next_state <= Pause; end if;
      when Step5 =>
        if addr_gen_endcount = '1' then
          next_state <= Init_5to6; next_return <= Init_5to6;
        else
          next_state <= Step4; next_return <= Step4;
        end if;
        if n_fault = '0' then next_state <= Pause; end if;
      when Init_5to6 =>
        next_state <= Step6; next_return <= Step6;
        if n_fault = '0' then next_state <= Pause; end if;
      when Step6 =>
        next_state <= Step7; next_return <= Step7;
        if n_fault = '0' then next_state <= Pause; end if;
      when Step7 =>
        if addr_gen_endcount = '1' then
          next_state <= Init_7to8; next_return <= Init_7to8;
        else
          next_state <= Step6; next_return <= Step6;
        end if;
        if n_fault = '0' then next_state <= Pause; end if;
      when Init_7to8 =>
        next_state <= Step8; next_return <= Step8;
        if n_fault = '0' then next_state <= Pause; end if;
      when Step8 =>
        next_state <= Step9; next_return <= Step9;
        if n_fault = '0' then next_state <= Pause; end if;
      when Step9 =>
        if addr_gen_endcount = '1' then
          next_state <= Init_9to10; next_return <= Init_9to10;
        else
          next_state <= Step8; next_return <= Step8;
        end if;
        if n_fault = '0' then next_state <= Pause; end if;
      when Init_9to10 =>
        next_state <= Step10; next_return <= Step10;
        if n_fault = '0' then next_state <= Pause; end if;
      when Step10 =>
        if addr_gen_endcount = '1' then
          next_state <= Init_10to11; next_return <= Init_10to11;
        end if;
        if n_fault = '0' then next_state <= Pause; end if;
      when Init_10to11 =>
        next_state <= Step11; next_return <= Step11;
        if n_fault = '0' then next_state <= Pause; end if;
      when Step11 =>
        if (addr_gen_endcount = '1' and bwen_end = '1') then
          next_state <= Init_11to12; next_return <= Init_11to12;
        end if;
        if n_fault = '0' then next_state <= Pause; end if;
      when Init_11to12 =>
        next_state <= Step12; next_return <= Step12;
        if n_fault = '0' then next_state <= Pause; end if;
      when Step12 =>
        next_state <= Step13; next_return <= Step13;
        if n_fault = '0' then next_state <= Pause; end if;
      when Step13 =>
        if (addr_gen_endcount = '1' and bwen_end = '1') then
          next_state <= Init_13to14; next_return <= Init_13to14;
        else
          next_state <= Step12; next_return <= Step12;
        end if;
        if n_fault = '0' then next_state <= Pause; end if;
      when Init_13to14 =>
        next_state <= Step14; next_return <= Step14;
        if n_fault = '0' then next_state <= Pause; end if;
      when Step14 =>
        next_state <= Step15; next_return <= Step15;
        if n_fault = '0' then next_state <= Pause; end if;
      when Step15 =>
        if (addr_gen_endcount = '1' and bwen_end = '1') then
          next_state <= Init_15to16; next_return <= Init_15to16;
        else
          next_state <= Step14; next_return <= Step14;
        end if;
        if n_fault = '0' then next_state <= Pause; end if;
      when Init_15to16 =>
        next_state <= Step16; next_return <= Step16;
        if n_fault = '0' then next_state <= Pause; end if;
      when Step16 =>
        if (addr_gen_endcount = '1' and bwen_end = '1') then
          next_state <= Finala; next_return <= Finala;
        end if;
        if n_fault = '0' then next_state <= Pause; end if;
      when Finala =>
        next_state <= Final; next_return <= Final;
        if n_fault = '0' then next_state <= Pause; end if;
      when Final =>
        if go = '1' then
          next_state <= Idle;
        end if;
      when Pause =>
        if (go = '1') then
          next_state <= return_state;
        end if;
      when others =>
        next_state <= Idle;
    end case;
  end process;
  -----------------------------------------------------------------------------
  -- Output preserve
  -----------------------------------------------------------------------------
  outp_to_preserve <= addr_gen_dir_s & bistbwen_gen_din_s & pattern_s;

  OUTPUTPRESERVE_REG: reg_re
    generic map (
      WIDTH => 4)
    port map (
      CLK  => clk,
      CE   => output_preserv_en,
      RST  => rst,
      Din  => outp_to_preserve,
      Dout => output_preserv
    );

  -----------------------------------------------------------------------------
  -- Signal assignment statements for combinatorial outputs
  -----------------------------------------------------------------------------
  addr_gen_en_s <= not addr_gen_endcount and n_fault;

  addr_gen_en_assignment:
  addr_gen_en <= '1'                        when (curr_state = Step1a) else
                 addr_gen_en_s              when (curr_state = Step1)  else
                 addr_gen_en_s              when (curr_state = Step3)  else
                 addr_gen_en_s              when (curr_state = Step5)  else
                 addr_gen_en_s              when (curr_state = Step7)  else
                 addr_gen_en_s              when (curr_state = Step9)  else
                 addr_gen_en_s              when (curr_state = Step10) else
                 addr_gen_en_s and bwen_end when (curr_state = Step11) else
                 addr_gen_en_s and bwen_end when (curr_state = Step13) else
                 addr_gen_en_s and bwen_end when (curr_state = Step15) else
                 addr_gen_en_s and bwen_end when (curr_state = Step16) else
                 '1'                        when (curr_state = Init_7to8) else  -- for counter wrap-around
                 go and not addr_gen_endcount when (curr_state = Pause and return_state = Step1)  else
                 go and not addr_gen_endcount when (curr_state = Pause and return_state = Step2)  else
                 go and not addr_gen_endcount when (curr_state = Pause and return_state = Step4)  else
                 go and not addr_gen_endcount when (curr_state = Pause and return_state = Step6)  else
                 go and not addr_gen_endcount when (curr_state = Pause and return_state = Step8)  else
                 go and not addr_gen_endcount when (curr_state = Pause and return_state = Step10) else
                 go and not addr_gen_endcount and bwen_end
                   when (curr_state = Pause and return_state = Step11) else
                 go and not addr_gen_endcount and bwen_end
                   when (curr_state = Pause and return_state = Step12) else
                 go and not addr_gen_endcount and bwen_end
                   when (curr_state = Pause and return_state = Step14) else
                 go and not addr_gen_endcount and bwen_end
                   when (curr_state = Pause and return_state = Step16) else
                 '0';

  addr_gen_rst_assignment:
  addr_gen_rst <= '1' when (curr_state = Init)        else
                  '1' when (curr_state = Init_1to2)   else
                  '1' when (curr_state = Init_3to4)   else
                  '1' when (curr_state = Init_9to10)  else
                  '1' when (curr_state = Init_10to11) else
                  '1' when (curr_state = Init_11to12) else
                  '1' when (curr_state = Init_15to16) else
                  '0';

  addr_gen_dir_s_assignment:
  addr_gen_dir_s <= '1' when (curr_state = Step7)     else
                    '1' when (curr_state = Init_7to8) else
                    '1' when (curr_state = Step9)     else
                    '1' when (curr_state = Step15)    else
                    '0' when (curr_state = Step1a)    else
                    '0' when (curr_state = Step1)     else
                    '0' when (curr_state = Step3)     else
                    '0' when (curr_state = Step5)     else
                    '0' when (curr_state = Step10)    else
                    '0' when (curr_state = Step11)    else
                    '0' when (curr_state = Step13)    else
                    '0' when (curr_state = Step16)    else
                    output_preserv(3) when (curr_state = Pause) else
                    'X';

  cmp_reg_en_assignment:
  cmp_reg_en <= '1' when (curr_state = Init)   else
                '1' when (curr_state = Step1a) else
                go  when (curr_state = Pause)  else
                n_fault;

  pattern_s_assignment:
  pattern_s <= "01" when (curr_state = Step1a)      else
               "01" when (curr_state = Step1)       else
               "01" when (curr_state = Init_1to2)   else
               "01" when (curr_state = Step2)       else
               "10" when (curr_state = Step3)       else
               "10" when (curr_state = Init_3to4)   else
               "10" when (curr_state = Step4)       else
               "01" when (curr_state = Step5)       else
               "01" when (curr_state = Init_5to6)   else
               "01" when (curr_state = Step6)       else
               "10" when (curr_state = Step7)       else
               "10" when (curr_state = Init_7to8)   else
               "10" when (curr_state = Step8)       else
               "01" when (curr_state = Step9)       else
               "01" when (curr_state = Init_9to10)  else
               "01" when (curr_state = Step10)      else
               "01" when (curr_state = Init_10to11) else
               "00" when (curr_state = Step11)      else
               "00" when (curr_state = Init_11to12) else
               "00" when (curr_state = Step12)      else
               "11" when (curr_state = Step13)      else
               "11" when (curr_state = Init_13to14) else
               "11" when (curr_state = Step14)      else
               "00" when (curr_state = Step15)      else
               "00" when (curr_state = Init_15to16) else
               "00" when (curr_state = Step16)      else
               output_preserv(1 downto 0) when (curr_state = Pause) else
               "XX";

  bistwen_assignment:
  bistwen <= '0' when (curr_state = Step1a) else
             '0' when (curr_state = Step1)  else
             '0' when (curr_state = Step3)  else
             '0' when (curr_state = Step5)  else
             '0' when (curr_state = Step7)  else
             '0' when (curr_state = Step9)  else
             '0' when (curr_state = Step11) else
             '0' when (curr_state = Step13) else
             '0' when (curr_state = Step15) else
             '1';

  bisten_assignment:
  bisten <= '1' when (curr_state = Idle) else
            '0';

  bistoen <= '0';
  bistcen <= '0';

  bistbwen_gen_en_assignment:
  bistbwen_gen_en <= '1' when (curr_state = Init_10to11) else
                     '1' when (curr_state = Step11)      else
                     '1' when (curr_state = Step13)      else
                     '1' when (curr_state = Step15)      else
                     '1' when (curr_state = Step16)      else
                     '0';

  bistbwen_gen_reset_assignment:
  bistbwen_gen_reset <= '1' when (curr_state = Init) else
                        '0';

  bistbwen_gen_din_s_assignment:
  bistbwen_gen_din_s <= bwen_end when (curr_state = Step11)      else
                        bwen_end when (curr_state = Step13)      else
                        bwen_end when (curr_state = Step15)      else
                        bwen_end when (curr_state = Step16)      else
                        '1'      when (curr_state = Init_10to11) else
                        output_preserv(2) when (curr_state = Pause) else
                        'X';

  fault_dtct_assignment:
  fault_dtct <= '1' when (curr_state = Pause) else
                '0';

  output_preserv_en_assignment:
  output_preserv_en <= '0' when (curr_state = Idle)  else
                       '0' when (curr_state = Pause) else
                       '1';

  bistportsel_assignment:
  bistportsel <= '1' when (curr_state = Step11)      else
                 '1' when (curr_state = Init_11to12) else
                 '1' when (curr_state = Step12)      else
                 '1' when (curr_state = Step13)      else
                 '1' when (curr_state = Init_13to14) else
                 '1' when (curr_state = Step14)      else
                 '1' when (curr_state = Step15)      else
                 '1' when (curr_state = Init_15to16) else
                 '1' when (curr_state = Step16)      else
                 '0';

  bistend_assignment:
  bistend <= '1' when (curr_state = Final) else
             '0';

end Behavioral;
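The header of `mem_bist_controller_sm.vhd` gives the March sequence that the state machine walks through. The Python sketch below is a behavioural golden model of that sequence and can serve as a cross-check of the March notation together with the pattern and direction tables above. It is hypothetical and is not derived from the RTL at cycle level. It makes four assumptions:

- the memory is a word-addressed RAM with optional stuck-at-0 cells;
- the March elements are applied to the whole word in the first phase and to each byte lane in the second phase;
- during the byte-wise phase, the byte lanes are visited one after another at each address, which is the role of the byte-enable shift register;
- the test stops at the first mismatch, which is the point where the hardware enters `Pause`.

The names `Ram` and `run_march` are illustrative.

```python
from typing import FrozenSet, List, Optional, Tuple

Op = Tuple[str, int]            # ("r" or "w", 2-bit data pattern)
Element = Tuple[str, List[Op]]  # ("up" or "down", operations applied per address)

# Word-wide phase (Steps 1-10 of the state machine): 10 operations per address.
WORD_PHASE: List[Element] = [
    ("up", [("w", 0b01)]),
    ("up", [("r", 0b01), ("w", 0b10)]),
    ("up", [("r", 0b10), ("w", 0b01)]),
    ("down", [("r", 0b01), ("w", 0b10)]),
    ("down", [("r", 0b10), ("w", 0b01)]),
    ("up", [("r", 0b01)]),
]

# Byte-wise phase (Steps 11-16): 6 operations per address and per byte lane.
BYTE_PHASE: List[Element] = [
    ("up", [("w", 0b00)]),
    ("up", [("r", 0b00), ("w", 0b11)]),
    ("down", [("r", 0b11), ("w", 0b00)]),
    ("up", [("r", 0b00)]),
]


def replicate(pattern: int, width: int) -> int:
    """Repeat a 2-bit pattern across a width-bit word (e.g. 0b01 -> 0x5555)."""
    return sum(pattern << (2 * i) for i in range(width // 2))


class Ram:
    """Word-addressed RAM model; stuck_at_0 holds faulty (address, bit) cells."""

    def __init__(self, depth: int,
                 stuck_at_0: FrozenSet[Tuple[int, int]] = frozenset()) -> None:
        self.cells = [0] * depth
        self.stuck_at_0 = stuck_at_0

    def write(self, addr: int, value: int, mask: int) -> None:
        word = (self.cells[addr] & ~mask) | (value & mask)  # only masked bits change
        for fault_addr, bit in self.stuck_at_0:
            if fault_addr == addr:
                word &= ~(1 << bit)                          # faulty cell stays at 0
        self.cells[addr] = word

    def read(self, addr: int) -> int:
        return self.cells[addr]


def run_march(ram: Ram, depth: int, width: int) -> Tuple[Optional[int], int]:
    """Apply both phases; return (first failing address or None, operations issued)."""
    ops = 0
    word_lanes = [(1 << width) - 1]
    byte_lanes = [0xFF << (8 * b) for b in range(width // 8)]

    def apply(element: Element, lanes: List[int]) -> Optional[int]:
        nonlocal ops
        direction, operations = element
        addresses = range(depth) if direction == "up" else range(depth - 1, -1, -1)
        for addr in addresses:
            for mask in lanes:  # byte lanes are visited in turn at each address
                for kind, pattern in operations:
                    ops += 1
                    data = replicate(pattern, width) & mask
                    if kind == "w":
                        ram.write(addr, data, mask)
                    elif (ram.read(addr) & mask) != data:
                        return addr  # mismatch: the RTL would enter Pause here
        return None

    for phase, lanes in ((WORD_PHASE, word_lanes), (BYTE_PHASE, byte_lanes)):
        for element in phase:
            failing = apply(element, lanes)
            if failing is not None:
                return failing, ops
    return None, ops
```

For a fault-free 16-word, 16-bit memory, `run_march(Ram(16), 16, 16)` returns `(None, 352)`. This corresponds to 10 operations per address in the word-wide phase plus 6 per address and per byte lane in the byte-wise phase. For the 1024 x 16 memory wrapped above, the same count gives 1024 x (10 + 2 x 6) = 22528 memory operations. These are memory operations, not clock cycles. A stuck-at-0 on bit 0 of address 5 is caught by the first read of the second March element, after 27 operations.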
B. Scripts and Configuration Files
Contents
B.1 Synopsys
B.2 Cadence
B.1 Synopsys
B.1.1 Configuration Files
#
# ".synopsys_dc.setup" Initialization File for
# Dc_Shell and Design_Analyzer
#
.....
#
# Site-Specific Variables
#

# from the System Variable Group
set link_force_case "check_reference"
set synthetic_library ""
set target_library { /home2/ncas/asiclibs/UMC18/faraday/fsa0a_c/2005Q4v1.2/SC/FrontEnd/synopsys/fsa0a_c_sc_wc.db /home2/ncas/asiclibs/UMC18/faraday/fsa0a_c/2005Q4v1.2/IO/FrontEnd/synopsys/fsa0a_c_io_wc.db }
set link_library [concat * $target_library]
set physical_library ""
set search_path [list . ${synopsys_root}/libraries/syn ${synopsys_root}/dw/sim_ver ${synopsys_root}/dw/syn_ver /home2/ncas/asiclibs/UMC18/faraday/fsa0a_c/2005Q4v1.2/SC/FrontEnd/synopsys /home2/ncas/asiclibs/UMC18/faraday/fsa0a_c/2005Q4v1.2/IO/FrontEnd/synopsys ]
set command_log_file "./command.log"
set designer "Nuno Sebastiao"
set company "INESC-ID"
set find_converts_name_lists "false"
set symbol_library { /home2/ncas/asiclibs/UMC18/faraday/fsa0a_c/2005Q4v1.2/SC/FrontEnd/synopsys/fsa0a_c_sc.sdb /home2/ncas/asiclibs/UMC18/faraday/fsa0a_c/2005Q4v1.2/IO/FrontEnd/synopsys/fsa0a_c_io.sdb }
.....
B.1.2 Scripts
B.1.2.A Design Compiler Script file
##############################################
# Synopsys Design Compiler Script File
##############################################

set reanalyze 1
set jtag 1
set scan 1

set version "v3.4_io_tryB"
set design_directory "~/synopsys/work/amepv3.4_io"

set log_directory "~/synopsys/work/syn/log"
set db_directory "~/synopsys/work/syn/db"
set report_directory "~/synopsys/work/syn/reports"
set sclib_directory "/home2/ncas/asiclibs/UMC18/faraday/fsa0a_c/2005Q4v1.2/SC/FrontEnd/synopsys"
set iolib_directory "/home2/ncas/asiclibs/UMC18/faraday/fsa0a_c/2005Q4v1.2/IO/FrontEnd/synopsys"
set hdlin_enable_dft_drc_info true
set_min_library $sclib_directory/fsa0a_c_sc_wc.db -min_version $sclib_directory/fsa0a_c_sc_bc.db
set_min_library $iolib_directory/fsa0a_c_io_wc.db -min_version $iolib_directory/fsa0a_c_io_bc.db
if {$reanalyze} {
  analyze -library WORK -format vhdl "$design_directory/amep_config_pack.vhd
    $design_directory/amep_alias_pack.vhd
    $design_directory/functions_pack.vhd
    $design_directory/misc_logic/and.vhd
    $design_directory/misc_logic/or.vhd
    $design_directory/misc_logic/xor.vhd
    $design_directory/misc_logic/decoder_5bit.vhd
    $design_directory/misc_logic/pencoder_4bit.vhd
    $design_directory/misc_logic/buffer_tristate.vhd
    $design_directory/misc_logic/arithmetic/adder_half.vhd
    $design_directory/misc_logic/arithmetic/adder_full.vhd
    $design_directory/misc_logic/arithmetic/adder_cla_pack.vhd
    $design_directory/misc_logic/arithmetic/adder_cla_blockA.vhd
    $design_directory/misc_logic/arithmetic/adder_cla_blockB.vhd
    $design_directory/misc_logic/arithmetic/adder_cla.vhd
    $design_directory/misc_logic/arithmetic/adder_csa.vhd
    $design_directory/misc_logic/arithmetic/PrefixAnd.vhd
    $design_directory/misc_logic/arithmetic/Incrementer.vhd
    $design_directory/misc_logic/arithmetic/multiplier.vhd
    $design_directory/misc_logic/comparators/cmp_eq.vhd
    $design_directory/misc_logic/comparators/cmp_lt.vhd
    $design_directory/misc_logic/counters/cntr_re.vhd
    $design_directory/misc_logic/counters/cntr_circular.vhd
    $design_directory/misc_logic/counters/cntr_circular_ld.vhd
    $design_directory/misc_logic/multiplexers/mux_2to1.vhd
    $design_directory/misc_logic/multiplexers/mux_4to1.vhd
    $design_directory/misc_logic/multiplexers/mux_8to1.vhd
    $design_directory/misc_logic/multiplexers/mux_16to1.vhd
    $design_directory/misc_logic/multiplexers/mux_32to1.vhd
    $design_directory/misc_logic/multiplexers/muxb_4to1.vhd
    $design_directory/misc_logic/registers/reg_e.vhd
    $design_directory/misc_logic/registers/reg_re.vhd
    $design_directory/misc_logic/registers/reg_le.vhd
    $design_directory/misc_logic/registers/reg_le_const.vhd
    $design_directory/misc_logic/registers/reg_se.vhd
    $design_directory/memories/updown_counter.vhd
    $design_directory/memories/shift_reg.vhd
    $design_directory/memories/mem_bist_controller_sm.vhd
    $design_directory/memories/mem_bist_controller_1port.vhd
    $design_directory/memories/mem_bist_controller_2port.vhd
    $design_directory/memories/SU180_1024X8X2BM1_WRAPPER.vhd
    $design_directory/memories/SJ180_2048X8X1BM1_WRAPPER.vhd
    $design_directory/memories/SJ180_512X8X1BM1_WRAPPER.vhd
    $design_directory/amep_units/sadu/cout_detector_B_block.vhd
    $design_directory/amep_units/sadu/cout_detector_A_block.vhd
    $design_directory/amep_units/sadu/cout_detector.vhd
    $design_directory/amep_units/sadu/sad_cmp.vhd
    $design_directory/amep_units/sadu/amep_sadu_lp.vhd
    $design_directory/amep_units/sadu/amep_sadu_parallel_adder.vhd
    $design_directory/amep_units/sadu/amep_sadu_parallel.vhd
    $design_directory/amep_units/agu_2/amep_agu_multiplier.vhd
    $design_directory/amep_units/agu_2/amep_agu_addr_decoder.vhd
    $design_directory/amep_units/agu_2/amep_agu_controller_ld_sm.vhd
    $design_directory/amep_units/agu_2/amep_agu_controller_ld.vhd
    $design_directory/amep_units/agu_2/amep_agu_controller_sad_sm.vhd
    $design_directory/amep_units/agu_2/amep_agu_controller_sad.vhd
    $design_directory/amep_units/agu_2/amep_agu.vhd
    $design_directory/amep_units/amep_alu.vhd
    $design_directory/amep_units/amep_ad_unit.vhd
    $design_directory/amep_core_id_decoder_sm.vhd
    $design_directory/amep_core_id_decoder.vhd
    $design_directory/amep_core_if.vhd
    $design_directory/amep_core_id.vhd
    $design_directory/amep_core_exe.vhd
    $design_directory/amep_core.vhd
    $design_directory/io_cells.vhd
    $design_directory/amep_core_iocells.vhd"
if {$jtag} {
analyze -library WORK -format vhdl "$design_directory/jtag_io_cells.vhd $design_directory/amep_core_iocells_jtag.vhd"
}
}
elaborate amep_core_iocells -architecture Behavioral -library WORK
set compile_delete_unloaded_sequential_cells false
set_operating_conditions -min BCCOM -min_library fsa0a_c_sc_bc -max WCCOM -max_library fsa0a_c_sc_wc
set_wire_load_mode top
set_wire_load_model -name G30K -library fsa0a_c_sc_wc
set_critical_range 1 amep_core_iocells
create_clock -name "CLK_IN" -period 10 -waveform { 0 5 } { CLK }
set_clock_uncertainty 0.1 CLK_IN
set_drive 0 CLK
set_max_dynamic_power 9.5 mW
if {$scan} {
set_scan_configuration -style multiplexed_flip_flop
set test_default_strobe 40.0
set test_default_strobe_width 1.0
set test_default_bidir_delay 0.0
set test_default_delay 0.0
set test_default_period 100.0
set_dft_signal -view existing_dft -type ScanClock -port CLK -timing [list 1 21]
set_dft_signal -view existing_dft -type Reset -port RST -active_state 1
set_dft_signal -type ScanEnable -port test_se -hookup_pin IO_CELLS_INST/test_se_i -active_state 1
set_dft_signal -type ScanEnable -port test_se2 -hookup_pin IO_CELLS_INST/test_se2_i -active_state 1
set_dft_signal -type TestMode -port test_mode -hookup_pin IO_CELLS_INST/test_mode_i -active_state 1
set_dft_signal -type ScanDataOut -port test_so1 -hookup_pin IO_CELLS_INST/test_so1_i
set_dft_signal -type ScanDataOut -port test_so2 -hookup_pin IO_CELLS_INST/test_so2_i
set_dft_signal -type ScanDataIn -port test_si1 -hookup_pin IO_CELLS_INST/test_si1_i
set_dft_signal -type ScanDataIn -port test_si2 -hookup_pin IO_CELLS_INST/test_si2_i
create_test_protocol
dft_drc
set_dft_configuration -fix_clock enable -fix_set enable -fix_reset enable
}
if {$scan} {
compile -scan -map_effort high -area_effort medium -power_effort high
} else {
compile -map_effort high -area_effort medium -power_effort high
}
report_constraint -all_violators
if {$scan} {
create_test_protocol
dft_drc
set_scan_configuration -replace false
set_scan_path chain1 -view spec -include_elements {AMEP_CORE_INST/FETCH/CODE_RAM/BIST_CTRL/CURR_ADDRESS_REG AMEP_CORE_INST/EXECUTE/AGU/SA_MEMORY/BIST_CTRL/CURR_ADDRESS_REG AMEP_CORE_INST/EXECUTE/AGU/MB_MEMORY/BIST_CTRL/CURR_ADDRESS_REG} -complete true -scan_enable test_se -scan_data_in test_si1 -scan_data_out test_so1
set_scan_path chain2 -scan_enable test_se2 -scan_data_in test_si2 -scan_data_out test_so2
set_dft_configuration -fix_clock enable -fix_set enable -fix_reset enable
set_dft_signal -type TestData -port CLK
set_dft_signal -type TestData -port RST
set_autofix_configuration -type reset -test_data RST
preview_dft -show all
dft_drc -v
insert_dft
report_constraint -all_violators
dft_drc -v
report_scan_path
estimate_test_coverage
set test_stil_netlist_format verilog
set version "scan_$version"
if {$jtag} {
set version "jtag_$version"
remove_cell {IO_CELLS_INST/ram_bisten0_iocell IO_CELLS_INST/ram_bisten1_iocell IO_CELLS_INST/test_se_iocell IO_CELLS_INST/test_se2_iocell IO_CELLS_INST/test_si1_iocell IO_CELLS_INST/test_si2_iocell IO_CELLS_INST/test_so1_iocell IO_CELLS_INST/test_so2_iocell}
remove_net {IO_CELLS_INST/RAM_BISTEN_i[0] IO_CELLS_INST/RAM_BISTEN_i[1] IO_CELLS_INST/test_se_i IO_CELLS_INST/test_se2_i IO_CELLS_INST/test_si1_i IO_CELLS_INST/test_si2_i IO_CELLS_INST/test_so1_i IO_CELLS_INST/test_so2_i}
connect_net {IO_CELLS_INST/RAM_BISTEN[0]} {IO_CELLS_INST/RAM_BISTEN_i[0]}
connect_net {IO_CELLS_INST/RAM_BISTEN[1]} {IO_CELLS_INST/RAM_BISTEN_i[1]}
connect_net IO_CELLS_INST/test_se IO_CELLS_INST/test_se_i
connect_net IO_CELLS_INST/test_se2 IO_CELLS_INST/test_se2_i
connect_net IO_CELLS_INST/test_si1 IO_CELLS_INST/test_si1_i
connect_net IO_CELLS_INST/test_si2 IO_CELLS_INST/test_si2_i
connect_net IO_CELLS_INST/test_so1 IO_CELLS_INST/test_so1_i
connect_net IO_CELLS_INST/test_so2 IO_CELLS_INST/test_so2_i
elaborate amep_core_iocells_jtag -architecture Structural -library WORK
current_design ./amep_core_iocells_jtag.db:amep_core_iocells_jtag
set_dft_configuration -bsd enable -scan disable
set_dont_touch amep_core_iocells
set synthetic_library {dw_foundation.sldb}
set link_library [concat $target_library $synthetic_library *]
set_operating_conditions -min BCCOM -min_library fsa0a_c_sc_bc -max WCCOM -max_library fsa0a_c_sc_wc
set_wire_load_mode top
set_wire_load_model -name G30K -library fsa0a_c_sc_wc
set_critical_range 1 amep_core_iocells
create_clock -name "CLK_IN" -period 10 -waveform { 0 5 } { CLK }
set_clock_uncertainty 0.1 CLK_IN
set_drive 0 CLK
set_dft_signal -view existing_dft -type TCK -port tck -timing {10 30}
set_max_dynamic_power 9.5 mW
disconnect_net -all *Logic0*
set_dft_signal -view spec -type tck -port tck
set_dft_signal -view spec -type tdi -port tdi
set_dft_signal -view spec -type tdo -port tdo
set_dft_signal -view spec -type tms -port tms
set_dft_signal -view spec -type trst -port trst
read_pin_map $design_directory/amep_package1.map
set_bsd_configuration -style synchronous -instruction_encoding binary -ir_width 4 -asynchronous_reset true -check_pad_designs all -control_cell_max_fanout 3
set_bsd_instruction {IDCODE} -register DEVICE_ID -capture_value {16'h13333111} -code {0110}
set_bsd_instruction {HIGHZ} -code {0111}
set_bsd_instruction {SELECTSAMEM} -register BYPASS -code {1001} -inst_enable {AMEP_CORE_IOCELLS_INST/RAM_BISTEN[0]}
set_bsd_instruction {SELECTMBMEM} -register BYPASS -code {1010} -inst_enable {AMEP_CORE_IOCELLS_INST/RAM_BISTEN[1]}
set_bsd_instruction {SELECTINSTMEM} -register BYPASS -code {1011} -inst_enable {AMEP_CORE_IOCELLS_INST/RAM_BISTEN[1] AMEP_CORE_IOCELLS_INST/RAM_BISTEN[0]}
set_bsd_instruction {EXTEST} -code {0010}
set_bsd_instruction {SAMPLE PRELOAD} -code {0011}
preview_dft -bsd all
insert_dft
write_bsdl -output $db_directory/amep_compiled_$version.bsdl
check_bsd
create_bsd_patterns -effort high -type functional
report_constraint -all_violators
}
write_test_protocol -output $db_directory/amep_compiled_$version.spf
}
write -hierarchy -format ddc -output $db_directory/amep_compiled_${version}_clk_corrected.ddc
write_sdc $db_directory/amep_compiled_$version.sdc
report_power -analysis_effort high > $report_directory/power_$version.rpt
report_area -hierarchy -nosplit > $report_directory/area_$version.rpt
report_constraint -significant_digits 2 > $report_directory/constraints_$version.rpt
report_timing -path full -delay max -nworst 1 -max_paths 1 -significant_digits 2 -sort_by group > $report_directory/timing_$version.rpt
change_names -hierarchy -rules verilog
write -hierarchy -format verilog -output $db_directory/amep_compiled_$version.v
exit
B.1.2.B TetraMAX Script file
########################################
# TetraMAX Script File
######################################
read netlist /home2/ncas/asiclibs/UMC18/faraday/fsa0a_c/2005Q4v1.2/SC/FrontEnd/tetramax/fsa0a_c_sc_tmax.lib -library
read netlist /home2/ncas/asiclibs/UMC18/faraday/fsa0a_c/2005Q4v1.2/IO/FrontEnd/tetramax/fsa0a_c_io_tmax.lib -library
read netlist /home2/ncas/asiclibs/UMC18/faraday/mem_files/7june07/SU180_1024X8X2BM1/SU180_1024X8X2BM1.tmax -library
read netlist /home2/ncas/asiclibs/UMC18/faraday/mem_files/7june07/SJ180_2048X8X1BM1/SJ180_2048X8X1BM1.tmax -library
read netlist /home2/ncas/asiclibs/UMC18/faraday/mem_files/7june07/SJ180_512X8X1BM1/SJ180_512X8X1BM1.tmax -library
read netlist /home/ncas/synopsys/work/syn/db/amep_compiled_jtag_scan_v3.4_io_tryB.v
run build_model amep_core_iocells_jtag
set drc /home/ncas/synopsys/work/syn/db/amep_compiled_jtag_scan_v3.4_io_tryB.spf
run drc
set atpg -full_seq_time 0
set atpg -abort_limit 30
remove faults -all
add faults -all
set faults -fault_coverage
run atpg basic_scan -ndetects 2
run atpg fast_sequential_only
run atpg full_sequential_only
report faults -summary
write patterns /home/ncas/synopsys/work/test/amep_v3.3_io_scan_test_patterns.bin -internal -format binary
write patterns /home/ncas/synopsys/work/test/amep_v3.3_io_scan_test_patterns.vhdl -internal -format vhdl93 -serial
write patterns /home/ncas/synopsys/work/test/amep_v3.3_io_scan_test_patterns.parallel.stil99 -internal -format stil99 -nopatinfo -parallel 0 -nocore
write patterns /home/ncas/synopsys/work/test/amep_v3.3_io_scan_test_patterns.serial.stil99 -internal -format stil99 -nopatinfo -serial -nocore
write patterns /home/ncas/synopsys/work/test/amep_v3.3_io_scan_test_patterns.parallell.v -internal -format verilog_single_file -parallel 0
write patterns /home/ncas/synopsys/work/test/amep_v3.3_io_scan_test_patterns.serial.v -internal -format verilog_single_file -serial
set patterns external /home/ncas/synopsys/work/test/amep_v3.3_io_scan_test_patterns.serial.stil99
remove faults -all
add faults -all
run fault_sim -sequential -nodrop_faults -ndetects 1
B.2 Cadence
B.2.1 Configuration Files
B.2.1.A Configuration file for importing the design to Encounter
#################################################
#
# FirstEncounter Input configuration file
#
#################################################
# Created by First Encounter v04.10-s374_1 on Fri May 11 14:18:51 2007
global rda_Input
set cwd /home/ncas/synopsys/work/cadence
set rda_Input(import_mode) {-treatUndefinedCellAsBbox 0 -keepEmptyModule 1 }
set rda_Input(ui_netlist) "../syn/db/amep_compiled_jtag_scan_v3.4_io_tryB_uniquified.v"
set rda_Input(ui_netlisttype) {Verilog}
set rda_Input(ui_ilmlist) {}
set rda_Input(ui_ilmspef) {}
set rda_Input(ui_settop) {1}
set rda_Input(ui_topcell) {amep_core_iocells_jtag}
set rda_Input(ui_celllib) {}
set rda_Input(ui_iolib) {}
set rda_Input(ui_areaiolib) {}
set rda_Input(ui_blklib) {}
set rda_Input(ui_kboxlib) {}
set rda_Input(ui_gds_file) {}
set rda_Input(ui_oa_oa2lefversion) {}
set rda_Input(ui_timelib,min) "/home2/ncas/asiclibs/UMC18/faraday/mem_files/7june07/SU180_1024X8X2BM1/SU180_1024X8X2BM1_BC.lib \
/home2/ncas/asiclibs/UMC18/faraday/mem_files/7june07/SJ180_2048X8X1BM1/SJ180_2048X8X1BM1_BC.lib \
/home2/ncas/asiclibs/UMC18/faraday/mem_files/7june07/SJ180_512X8X1BM1/SJ180_512X8X1BM1_BC.lib \
/home2/ncas/asiclibs/UMC18/faraday/fsa0a_c/2005Q4v1.2/SC/FrontEnd/tlf/fsa0a_c_sc_bc.tlf \
/home2/ncas/asiclibs/UMC18/faraday/fsa0a_c/2005Q4v1.2/IO/FrontEnd/synopsys/fsa0a_c_io_bc.lib"
set rda_Input(ui_timelib,max) "/home2/ncas/asiclibs/UMC18/faraday/mem_files/7june07/SU180_1024X8X2BM1/SU180_1024X8X2BM1_WC.lib \
/home2/ncas/asiclibs/UMC18/faraday/mem_files/7june07/SJ180_2048X8X1BM1/SJ180_2048X8X1BM1_WC.lib \
/home2/ncas/asiclibs/UMC18/faraday/mem_files/7june07/SJ180_512X8X1BM1/SJ180_512X8X1BM1_WC.lib \
/home2/ncas/asiclibs/UMC18/faraday/fsa0a_c/2005Q4v1.2/SC/FrontEnd/tlf/fsa0a_c_sc_wc.tlf \
/home2/ncas/asiclibs/UMC18/faraday/fsa0a_c/2005Q4v1.2/IO/FrontEnd/synopsys/fsa0a_c_io_wc.lib"
set rda_Input(ui_timelib) "/home2/ncas/asiclibs/UMC18/faraday/mem_files/7june07/SU180_1024X8X2BM1/SU180_1024X8X2BM1_TC.lib \
/home2/ncas/asiclibs/UMC18/faraday/mem_files/7june07/SJ180_2048X8X1BM1/SJ180_2048X8X1BM1_TC.lib \
/home2/ncas/asiclibs/UMC18/faraday/mem_files/7june07/SJ180_512X8X1BM1/SJ180_512X8X1BM1_TC.lib \
/home2/ncas/asiclibs/UMC18/faraday/fsa0a_c/2005Q4v1.2/SC/FrontEnd/tlf/fsa0a_c_sc_tc.tlf \
/home2/ncas/asiclibs/UMC18/faraday/fsa0a_c/2005Q4v1.2/IO/FrontEnd/synopsys/fsa0a_c_io_tc.lib"
set rda_Input(ui_smodDef) {}
set rda_Input(ui_smodData) {}
set rda_Input(ui_dpath) {}
set rda_Input(ui_tech_file) {}
set rda_Input(ui_io_file) {}
set rda_Input(ui_timingcon_file) "../syn/db/amep_compiled_jtag_scan_v3.4_io_tryB.sdc"
set rda_Input(ui_latency_file) {}
set rda_Input(ui_scheduling_file) {}
set rda_Input(ui_buf_footprint) {}
set rda_Input(ui_delay_footprint) {}
set rda_Input(ui_inv_footprint) {}
set rda_Input(ui_leffile) "/home2/ncas/asiclibs/UMC18/faraday/fsa0a_c/2005Q4v1.2/SC/BackEnd/lef/header6_V55.lef \
/home2/ncas/asiclibs/UMC18/faraday/mem_files/7june07/SU180_1024X8X2BM1/SU180_1024X8X2BM1.lef \
/home2/ncas/asiclibs/UMC18/faraday/mem_files/7june07/SJ180_2048X8X1BM1/SJ180_2048X8X1BM1.lef \
/home2/ncas/asiclibs/UMC18/faraday/mem_files/7june07/SJ180_512X8X1BM1/SJ180_512X8X1BM1.lef \
/home2/ncas/asiclibs/UMC18/faraday/fsa0a_c/2005Q4v1.2/SC/BackEnd/lef/fsa0a_c_sc.lef \
/home2/ncas/asiclibs/UMC18/faraday/fsa0a_c/2005Q4v1.2/IO/BackEnd/lef/fsa0a_c_io.lef"
set rda_Input(ui_cts_cell_footprint) {}
set rda_Input(ui_cts_cell_list) {}
set rda_Input(ui_core_cntl) {aspect}
set rda_Input(ui_aspect_ratio) {0.441368}
set rda_Input(ui_core_util) {0.20228}
set rda_Input(ui_core_height) {935.0}
set rda_Input(ui_core_width) {2649.44}
set rda_Input(ui_core_to_left) {40.56}
set rda_Input(ui_core_to_right) {40.0}
set rda_Input(ui_core_to_top) {40.0}
set rda_Input(ui_core_to_bottom) {40.0}
set rda_Input(ui_max_io_height) {0}
set rda_Input(ui_row_height) {5.04}
set rda_Input(ui_isHorTrackHalfPitch) {0}
set rda_Input(ui_isVerTrackHalfPitch) {1}
set rda_Input(ui_ioOri) {R0}
set rda_Input(ui_isOrigCenter) {0}
set rda_Input(ui_exc_net) {}
set rda_Input(ui_delay_limit) {1000}
set rda_Input(ui_net_delay) {1000.0 ps}
set rda_Input(ui_net_load) {0.5pf}
set rda_Input(ui_in_tran_delay) {0.1ps}
set rda_Input(ui_captbl_file) "/home2/ncas/asiclibs/UMC18/UMC18_1P6M_MMC/umc18MMC.capTbl"
set rda_Input(ui_defcap_scale) {1.0}
set rda_Input(ui_detcap_scale) {1.0}
set rda_Input(ui_xcap_scale) {1.0}
set rda_Input(ui_res_scale) {1.0}
set rda_Input(ui_shr_scale) {1.0}
set rda_Input(ui_time_unit) {none}
set rda_Input(ui_cap_unit) {}
set rda_Input(ui_oa_reflib) {}
set rda_Input(ui_oa_abstractname) {}
set rda_Input(ui_oa_layoutname) {}
set rda_Input(ui_sigstormlib) {}
set rda_Input(ui_cdb_file) {}
set rda_Input(ui_echo_file) {}
set rda_Input(ui_xilm_file) {}
set rda_Input(ui_qxtech_file) {}
set rda_Input(ui_qxlib_file) {}
set rda_Input(ui_qxconf_file) {}
set rda_Input(ui_pwrnet) {VCC}
set rda_Input(ui_gndnet) {GND}
set rda_Input(flip_first) {1}
set rda_Input(double_back) {1}
set rda_Input(assign_buffer) {1}
set rda_Input(ui_pg_connections) ""
set rda_Input(ui_gen_footprint) {0}
B.2.1.B I/O assignment file
#######################################################
#
# Silicon Perspective Corp.
#
# FirstEncounter IO Assignment
#
#######################################################
Version: 1
### NORTH SIDE ###
Orient: R180
Offset: 140.12
Pad: AMEP_CORE_IOCELLS_INST/IO_CELLS_INST/address_iocell_3 N
Skip: 104.78
Pad: AMEP_CORE_IOCELLS_INST/IO_CELLS_INST/address_iocell_4 N
Skip: 104.78
Pad: AMEP_CORE_IOCELLS_INST/IO_CELLS_INST/address_iocell_5 N
Skip: 104.78
Pad: AMEP_CORE_IOCELLS_INST/IO_CELLS_INST/address_iocell_6 N
Skip: 104.78
Pad: AMEP_CORE_IOCELLS_INST/IO_CELLS_INST/address_iocell_7 N
Skip: 104.78
Pad: IO_VCC_0 N VCC3IOD
Skip: 104.78
Pad: AMEP_CORE_IOCELLS_INST/IO_CELLS_INST/address_iocell_8 N
Skip: 104.78
Pad: AMEP_CORE_IOCELLS_INST/IO_CELLS_INST/address_iocell_9 N
Skip: 104.78
Pad: AMEP_CORE_IOCELLS_INST/IO_CELLS_INST/address_iocell_10 N
Skip: 104.78
Pad: AMEP_CORE_IOCELLS_INST/IO_CELLS_INST/address_iocell_11 N
Skip: 104.78
Pad: AMEP_CORE_IOCELLS_INST/IO_CELLS_INST/address_iocell_12 N
Skip: 104.78
Pad: IO_GND_0 N GNDIOD
Skip: 104.78
Pad: AMEP_CORE_IOCELLS_INST/IO_CELLS_INST/address_iocell_13 N
Skip: 104.78
Pad: AMEP_CORE_IOCELLS_INST/IO_CELLS_INST/address_iocell_14 N
Skip: 104.78
Pad: AMEP_CORE_IOCELLS_INST/IO_CELLS_INST/address_iocell_15 N
Skip: 104.78
Pad: AMEP_CORE_IOCELLS_INST/IO_CELLS_INST/address_iocell_16 N
Skip: 104.78
Pad: AMEP_CORE_IOCELLS_INST/IO_CELLS_INST/address_iocell_17 N
### WEST SIDE ###
Orient: R270
Offset: 140.12
Pad: JTAG_IO_CELLS_INST/tdi_iocell W
Skip: 227.54
Pad: CORE_VCC_0 W VCCKD
Skip: 130.82
Pad: AMEP_CORE_IOCELLS_INST/IO_CELLS_INST/clk_iocell W
Skip: 130.82
Pad: CORE_GND_0 W GNDKD
Skip: 34.1
Pad: AMEP_CORE_IOCELLS_INST/IO_CELLS_INST/address_iocell_0 W
Skip: 34.1
Pad: AMEP_CORE_IOCELLS_INST/IO_CELLS_INST/address_iocell_1 W
Skip: 34.1
Pad: AMEP_CORE_IOCELLS_INST/IO_CELLS_INST/address_iocell_2 W
### SOUTH SIDE ###
Orient: R0
Offset: 140.12
Pad: AMEP_CORE_IOCELLS_INST/IO_CELLS_INST/test_mode_iocell S
Skip: 104.78
Pad: JTAG_IO_CELLS_INST/trst_iocell S
Skip: 104.78
Pad: JTAG_IO_CELLS_INST/tms_iocell S
Skip: 104.78
Pad: JTAG_IO_CELLS_INST/tdo_iocell S
Skip: 104.78
Pad: AMEP_CORE_IOCELLS_INST/IO_CELLS_INST/gnt_iocell S
Skip: 104.78
Pad: IO_GND_1 S GNDIOD
Skip: 104.78
Pad: AMEP_CORE_IOCELLS_INST/IO_CELLS_INST/req_iocell S
Skip: 104.78
Pad: AMEP_CORE_IOCELLS_INST/IO_CELLS_INST/done_iocell S
Skip: 104.78
Pad: AMEP_CORE_IOCELLS_INST/IO_CELLS_INST/data_iocell_0 S
Skip: 104.78
Pad: AMEP_CORE_IOCELLS_INST/IO_CELLS_INST/data_iocell_1 S
Skip: 104.78
Pad: AMEP_CORE_IOCELLS_INST/IO_CELLS_INST/data_iocell_2 S
Skip: 104.78
Pad: IO_VCC_1 S VCC3IOD
Skip: 104.78
Pad: AMEP_CORE_IOCELLS_INST/IO_CELLS_INST/data_iocell_3 S
Skip: 104.78
Pad: AMEP_CORE_IOCELLS_INST/IO_CELLS_INST/data_iocell_4 S
Skip: 104.78
Pad: AMEP_CORE_IOCELLS_INST/IO_CELLS_INST/data_iocell_5 S
Skip: 104.78
Pad: AMEP_CORE_IOCELLS_INST/IO_CELLS_INST/data_iocell_6 S
Skip: 104.78
Pad: AMEP_CORE_IOCELLS_INST/IO_CELLS_INST/data_iocell_7 S
### EAST SIDE ###
Orient: R90
Offset: 140.12
Pad: AMEP_CORE_IOCELLS_INST/IO_CELLS_INST/rst_iocell E
Skip: 34.1
Pad: AMEP_CORE_IOCELLS_INST/IO_CELLS_INST/en_iocell E
Skip: 130.82
Pad: CORE_GND_1 E GNDKD
Skip: 130.82
Pad: JTAG_IO_CELLS_INST/tck_iocell E
Skip: 130.82
Pad: CORE_VCC_1 E VCCKD
Skip: 34.1
Pad: AMEP_CORE_IOCELLS_INST/IO_CELLS_INST/oe_nwe_iocell E
Skip: 34.1
Pad: AMEP_CORE_IOCELLS_INST/IO_CELLS_INST/address_iocell_19 E
Skip: 38.44
Pad: AMEP_CORE_IOCELLS_INST/IO_CELLS_INST/address_iocell_18 E
Orient: R0
Pad: NE_CORNER NE CORNERD
Orient: R90
Pad: NW_CORNER NW CORNERD
Orient: R180
Pad: SW_CORNER SW CORNERD
Orient: R270
Pad: SE_CORNER SE CORNERD
B.2.1.C Clock Tree Synthesis configuration file
#
# FirstEncounter(TM) Clock Synthesis Technology File Format
#
AutoCTSRootPin CLK
RouteClkNet YES
NoGating NO
DetailReport YES
SetDPinAsSync NO
PostOpt YES
OptAddBuffer YES
MaxDelay 1.5ns
MinDelay 0ns # default value
MaxDepth 8
Buffer BUF12CK BUF8CK BUF6CK BUF4CK BUF3CK BUF2CK BUF1CK INV12CK INV8CK INV6CK INV4CK INV3CK INV2CK INV1CK DELA DELB DELC DLY1 DLY2 DLY3 DLY4
MaxSkew 300ps
SinkMaxTran 150ps
BufMaxTran 150ps
ExcludedPin
+ amep_core_iocells_jtag_BSR_top_inst/amep_core_iocells_jtag_data_in_1
ThroughPin
End
B.2.2 Scripts
############################################
# Cadence Encounter Script File
###########################################
loadConfig /home/ncas/synopsys/work/cadence/amep_compiled_jtag_scan_v3.4_io_tryB.conf 0
setUIVar rda_Input ui_cts_cell_list {BUF1CK BUF2CK BUF3CK BUF4CK BUF6CK BUF8CK BUF12CK INV1CK INV2CK INV3CK INV4CK INV6CK INV8CK INV12CK}
setUIVar rda_Input ui_delay_footprint I+OI
setUIVar rda_Input ui_buf_footprint I+OI
setUIVar rda_Input ui_inv_footprint I+O!I
commitConfig
floorPlan -d 3029.94 1314.40 50 50 50 50
fit
loadiofile "../syn/db/amep_compiled_jtag_scan_v3.4_io_tryB.io"
createRouteBlk -box 140.74 140.74 2889.20 1173.60 -layer 6
removeBufferTree
addRing -spacing_bottom 0.8 -width_left 20 -width_bottom 20 -width_top 20 -spacing_top 0.8 -layer_bottom metal5 -center 1 -stacked_via_top_layer metal5 -width_right 20 -around core -jog_distance 0.8 -offset_bottom 0.8 -layer_top metal5 -threshold 0.8 -offset_left 0.8 -spacing_right 0.8 -spacing_left 0.8 -offset_right 0.8 -offset_top 0.8 -layer_right metal4 -nets {GND VCC } -stacked_via_bottom_layer metal1 -layer_left metal4
clearGlobalNets
globalNetConnect GND -type pgpin -pin GND -inst *
globalNetConnect VCC -type pgpin -pin VCC -inst *
globalNetConnect VCC -type tiehi
globalNetConnect GND -type tielo
amoebaPlace -timingdriven
checkPlace
setObjFPlanBox Instance AMEP_CORE_IOCELLS_INST/AMEP_CORE_INST/EXECUTE/AGU/MB_MEMORY/RAM_512_8 577.641 230.099 959.561 589.699
setObjFPlanBox Instance AMEP_CORE_IOCELLS_INST/AMEP_CORE_INST/EXECUTE/AGU/SA_MEMORY/RAM_2048_8 849.956 704.641 1815.396 1088.421
setObjFPlanBox Instance AMEP_CORE_IOCELLS_INST/AMEP_CORE_INST/FETCH/CODE_RAM/RAM_1K_16 2173.549 230.06 2656.269 589.04
setBlockPlacementStatus -name AMEP_CORE_IOCELLS_INST/AMEP_CORE_INST/FETCH/CODE_RAM/RAM_1K_16 -status preplaced
setBlockPlacementStatus -name AMEP_CORE_IOCELLS_INST/AMEP_CORE_INST/EXECUTE/AGU/SA_MEMORY/RAM_2048_8 -status preplaced
setBlockPlacementStatus -name AMEP_CORE_IOCELLS_INST/AMEP_CORE_INST/EXECUTE/AGU/MB_MEMORY/RAM_512_8 -status preplaced
addRing -spacing_bottom 2 -width_left 10 -width_bottom 10 -width_top 10 -spacing_top 2 -layer_bottom metal5 -stacked_via_top_layer metal5 -width_right 10 -around each_block -jog_distance 0.44 -layer_top metal5 -threshold 0.44 -spacing_right 2 -spacing_left 2 -offset_bottom 3 -offset_left 3 -offset_right 3 -offset_top 3 -type block_rings -layer_right metal4 -nets {GND VCC } -stacked_via_bottom_layer metal1 -layer_left metal4
addStripe -block_ring_top_layer_limit metal4 -max_same_layer_jog_length 0.88 -padcore_ring_bottom_layer_limit metal4 -set_to_set_distance 200 -stacked_via_top_layer metal5 -padcore_ring_top_layer_limit metal4 -spacing 1 -xleft_offset 100 -xright_offset 100 -merge_stripes_value 0.44 -layer metal4 -block_ring_bottom_layer_limit metal4 -width 10 -nets {GND VCC } -stacked_via_bottom_layer metal1
amoebaPlace -timingdriven -highEffort
checkPlace
setDrawMode place
reclaimArea
timeDesign -preCTS
optDesign -preCTS
setCTSMode -useCTSRouteGuide
specifyClockTree -clkfile amep_compiled_jtag_scan_v3.4_io_tryB.ctstch
createSaveDir amep_core_iocells_cts
ckSynthesis -rguide amep_core_iocells_cts/amep_core_iocells_cts.guide -report amep_core_iocells_cts/amep_core_iocells_cts.ctsrpt -forceReconvergent
saveClockNets -output amep_core_iocells_cts/amep_core_iocells_cts.ctsntf
saveNetlist amep_core_iocells_cts/amep_core_iocells_cts.v
savePlace amep_core_iocells_cts/amep_core_iocells_cts.place
timeDesign -postCTS
optDesign -postCTS -setup -hold
addFiller -cell FILLERCC FILLERBC FILLERAC FILLER8C FILLER4C FILLER2C FILLER1 -prefix FILLER
addFiller -cell FILLERCC FILLERBC FILLERAC FILLER8C FILLER4C FILLER2C FILLER1 -prefix FILLER
addFiller -cell FILLERCC FILLERBC FILLERAC FILLER8C FILLER4C FILLER2C FILLER1 -prefix FILLER
addFiller -cell FILLERCC FILLERBC FILLERAC FILLER8C FILLER4C FILLER2C FILLER1 -prefix FILLER
addIoFiller -cell EMPTY16D -prefix IOFILLER
addIoFiller -cell EMPTY8D -prefix IOFILLER
addIoFiller -cell EMPTY4D -prefix IOFILLER
addIoFiller -cell EMPTY2D -prefix IOFILLER
addIoFiller -cell EMPTY1D -prefix IOFILLER
sroute -jogControl { preferWithChanges differentLayer }
getNanoRouteMode -quiet
getNanoRouteMode -quiet envSuperThreading
setNanoRouteMode -quiet -drouteFixAntenna true
setNanoRouteMode -quiet -routeInsertAntennaDiode false
setNanoRouteMode -quiet -routeReInsertFillerCellList filler_cell_list.txt
setNanoRouteMode -quiet -timingEngine CTE
setNanoRouteMode -quiet -routeWithTimingDriven true
setNanoRouteMode -quiet -routeWithEco false
setNanoRouteMode -quiet -routeWithSiDriven true
setNanoRouteMode -quiet -routeTdrEffort 5
setNanoRouteMode -quiet -routeSiEffort normal
setNanoRouteMode -quiet -routeWithSiPostRouteFix false
setNanoRouteMode -quiet -drouteAutoStop true
setNanoRouteMode -quiet -routeSelectedNetOnly false
setNanoRouteMode -quiet -drouteStartIteration default
setNanoRouteMode -quiet -envNumberProcessor 1
setNanoRouteMode -quiet -drouteEndIteration default
setNanoRouteMode -drouteUseViaOfCut 4
setNanoRouteMode -drouteUseBiggerOverhangViaFirst true
setNanoRouteMode -drouteOptimizeUseMultiCutVia true
trialRoute -handlePreroute
setCteReport
writeDesignTiming .timing_file.tif
freeTimingGraph
globalDetailRoute
clearDrc
verifyGeometry
editDeleteViolations
setNanoRouteMode -quiet -routeWithTimingDriven false
setNanoRouteMode -quiet -routeWithEco true
setNanoRouteMode -quiet -routeWithSiDriven false
globalDetailRoute
clearDrc
verifyGeometry
timeDesign -postRoute
optDesign -postRoute -setup -hold
setOpCond -maxLibrary fsa0a_c_sc_wc -max WCCOM -minLibrary fsa0a_c_sc_bc -min BCCOM
setExtractRCMode -detail -rcdb amep_core_iocells.rcdb -relative_c_t 0.00999999977648 -total_c_t 5.0 -reduce 5 -noise
extractRC -outfile amep_core_iocells.cap
setDelayCalMode -signalStorm
delayCal -sdf amep_core_iocells.sdf
setAnalysisMode -setup -async -skew -clockTree
buildTimingGraph
reportSlacks -outfile amep_core_iocells.slk
rptSlackClkDomain -infile amep_core_iocells.slk
autoFetchDCSources VCC
autoFetchDCSources GND
savePadLocation -outfile /home/ncas/synopsys/work/cadence/amep_core_iocells_jtag.pp
saveToggleProbability -outfile /home/ncas/synopsys/work/cadence/amep_core_iocells_jtag.tg {CLK_IN 100.000 0.450}
updatePower -irDropAnalysis average -postCTS -toggleFile amep_core_iocells_jtag.tg -pad amep_core_iocells_jtag.pp -report power -reportInstanceVoltage instance.voltage -reportInstancePower instance.power -reportRailAnalysis power.graph -mode floorplan VCC
saveDesign amep_core_iocells_jtag.enc
streamOut amep_core_iocells_jtag -mapFile /home2/ncas/asiclibs/UMC18/GDSstreamOut.map -libName DesignLib -structureName amep_core_iocells_jtag -stripes 1 -units 1000 -mode ALL
defOut -floorplan -netlist -routing amep_core_iocells_jtag.def
saveNetlist amep_core_iocells_jtag_final.v