
Motion Estimation Processor for Last Generation Mobile Devices

Nuno Carlos André Sebastião

Dissertation submitted to obtain the Master's Degree in

Electrical and Computer Engineering

Jury

President: Dr. José António Beltran Gerald

Supervisor: Dr. Paulo Ferreira Godinho Flores

Members: Dr. Leonel Augusto Pires Seabra de Sousa

MSc Nuno Filipe Valentim Roma

October 2007

Acknowledgments

I could not have accomplished this project without the help and support of some people to whom I want to express my gratitude. First of all, I thank my supervisor, Prof. Paulo Flores, for all of his work, help and suggestions, which made this project possible, and also for the time spent reviewing this dissertation. I also want to thank my co-supervisor, MSc Nuno Roma, for his time, encouragement and suggestions throughout this project and for reviewing this dissertation.

I also want to thank MSc Tiago Dias for his support during the project, and all of the remaining researchers of the SIPS group at INESC-ID, Lisboa, who in one way or another helped me in performing my work.

Finally, I want to thank my family for all of their support and understanding throughout my entire life, but especially in these last few months.


Abstract

The use of video coding in battery-supplied platforms has led to the development of efficient low-power video coding systems. The most computationally expensive part of video coding is motion estimation. Therefore, an efficient processor for motion estimation, the Adaptive Motion Estimation Processor (AMEP), was previously proposed and implemented in an FPGA. This dissertation focuses on the Application Specific Integrated Circuit (ASIC) implementation of the AMEP and on the design changes required so that it can be efficiently tested after manufacturing. The processor was described in VHDL and implemented in the UMC CMOS 0.18 µm 1P6M technology process using a standard cell library from Faraday Technology Corporation.

Dedicated test structures were added to the circuit to allow the verification of the manufactured circuit. A dedicated test controller was developed and implemented to efficiently test the memories included in the processor. Two scan chains were built to provide an efficient way to input test vectors that verify the correct operation of the circuit’s internal logic. The IEEE 1149.1 standard (JTAG) was also implemented, to allow testing of the circuit’s interconnections when it is mounted on a board and to provide a standard interface for circuit testing. In particular, it allows the control of the memory Built-In Self Test (BIST) controller that was developed.

The layout of the circuit was obtained using EDA tools for synthesis, placement, routing, clock tree generation, power planning and the other tasks required to achieve a fully functional layout that implements the function described in the VHDL source. Analysis of the obtained layout indicated that the AMEP is able to work at a maximum clock frequency of 100 MHz while consuming only 14.5 mW, which makes it suitable for motion estimation in battery-supplied devices.

Keywords

Motion Estimation Dedicated Processor; Application Specific Integrated Circuit (ASIC) Implementation; Standard Cell Library; Design for Test (DFT) Techniques; Memory Test (BIST)


Contents

1 Introduction
1.1 Dissertation Outline

2 Processor Architecture
2.1 Introduction
2.2 Motion Estimation
2.3 Motion Estimation Algorithms
2.4 Instruction Set
2.5 Microarchitecture
2.6 Interface

3 Design for Test
3.1 Introduction
3.2 Circuit Testing
3.3 Automatic Test Pattern Generation
3.4 Observability and Controllability
3.5 Scan Structures
3.6 JTAG Boundary Scan
3.7 Memory Test

4 ASIC Design
4.1 Introduction
4.2 Design Flow
4.3 Foundry and Technology Selection
4.4 Library and Technology Characterization
4.4.1 Memories
4.5 Packaging
4.6 Pin Positioning

5 FrontEnd - From Behavioral VHDL code to Verilog netlist
5.1 Introduction
5.2 Tools
5.2.1 Design Compiler
5.2.2 DFT Compiler
5.2.3 BSD Compiler
5.2.4 TetraMAX
5.3 Workflow
5.3.1 Basic workflow
5.3.2 Workflow with insertion of scan chains
5.3.3 Workflow with JTAG insertion
5.3.4 Workflow for test generation

6 BackEnd - From Verilog netlist to GDS Layout
6.1 Introduction
6.2 Tools
6.2.1 First Encounter
6.2.2 NanoRoute
6.3 Workflow

7 Results
7.1 Introduction
7.2 Results

8 Conclusions
8.1 Conclusions
8.2 Future Work

A VHDL Code
A.1 Memory Test Controller VHDL Code

B Scripts and Configuration Files
B.1 Synopsys
B.1.1 Configuration Files
B.1.2 Scripts
B.1.2.A Design Compiler Script file
B.1.2.B TetraMAX Script file
B.2 Cadence
B.2.1 Configuration Files
B.2.1.A Configuration file for importing the design to Encounter
B.2.1.B I/O assignment file
B.2.1.C Clock Tree Synthesis configuration file
B.2.2 Scripts

List of Figures

2.1 Composition of a macroblock.
2.2 Current and previous frames used in motion estimation.
2.3 AMEP Architecture.
2.4 AMEP external interface.
2.5 Video coding platform.

3.1 D type Flip-Flop.
3.2 JTAG Basic Boundary Scan Cell.
3.3 JTAG Boundary Shift Register and TAP controller connections.
3.4 TAP state machine.
3.5 Implemented March Test.
3.6 Simplified Memory BIST Controller architecture.

4.1 Generic workflow for ASIC design.
4.2 I/O cell and pad combinations.
4.3 Bonding pad layout.
4.4 Power rings for I/O buffers and core cells.
4.5 Memory interfaces.
4.6 Diagram of I/O cells position.

5.1 Top level design structure required by BSD Compiler.
5.2 Synopsys Basic Workflow.
5.3 Synopsys Workflow with scan structures.
5.4 AMEP interface after inserting scan chains.
5.5 Synopsys JTAG Workflow.
5.6 AMEP interface for JTAG insertion by BSD Compiler.
5.7 Synopsys TetraMAX ATPG Workflow.

6.1 Design flow for Encounter.
6.2 Die block size, floorplan and core size.

7.1 AMEP chip layout.

List of Tables

2.1 AMEP Instruction Set.
2.2 AMEP Instruction Set Architecture.

4.1 Faraday’s FSA0A_C Standard Cell Library General Characteristics.
4.2 I/O cell dimensions.

5.1 Cost Function default priority.

6.1 Cadence tools versions.

7.1 Results from synthesis tool.
7.2 Layout results.
7.3 Power analysis results.

List of Acronyms

AGU : Address Generation Unit

AMEP : Adaptive Motion Estimation Processor

ASIC : Application Specific Integrated Circuit

ASIP : Application Specific Instruction Set Processor

ATE : Automatic Test Equipment

ATPG : Automatic Test Pattern Generation

BIST : Built-In Self Test

BSDL : Boundary Scan Description Language

BSR : Boundary Scan Register

CAD : Computer Aided Design

CLCC : Ceramic Leadless Chip Carrier

CMOS : Complementary Metal Oxide Semiconductor

CTS : Clock Tree Synthesis

DFT : Design For Test

DRC : Design Rule Check

ECO : Engineering Change Order

EDA : Electronic Design Automation

ESD : Electrostatic Discharge

FAN : Fanout Oriented

FPGA : Field Programmable Gate Array

FSBM : Full Search Block Matching

GDS : Graphic Data System

GTECH : Generic Technology

GUI : Graphical User Interface

HDL : Hardware Description Language

IC : Integrated Circuit

IDDQ : quiescent supply current

IEEE : Institute of Electrical and Electronics Engineers

ISA : Instruction Set Architecture

I/O : Input/Output

JTAG : Joint Test Action Group

LEF : Library Exchange Format

LSSD : Level Sensitive Scan Design

LVS : Layout versus Schematic

ME : Motion Estimation

MPEG : Moving Picture Experts Group

MVFAST : Motion Vector Field Adaptive Search Technique

PIOS : Programmable I/O on Silicon

PODEM : Path Oriented Decision Making

P&R : Place and Route

PCB : Printed Circuit Board

RAM : Random Access Memory

RAPS : Random Path Sensitization

RTL : Register Transfer Level

RISC : Reduced Instruction Set Computer

SRAM : Static RAM

SAD : Sum of Absolute Differences

SADU : SAD Unit

SDC : Synopsys Design Constraints

SDF : Standard Delay Format

SSF : Single Stuck-at Fault

STIL : Standard Test Interface Language

TAP : Test Access Port

TCK : Test Clock input

TDI : Test Data Input

TDO : Test Data Output

TMS : Test Mode Select

TRST : Test Reset

VHDL : VHSIC Hardware Description Language

VHSIC : Very High Speed Integrated Circuit

VLSI : Very Large Scale Integration

1 Introduction

The demand for consumer products with real-time video communication and video recording capabilities has been increasing over the last years. This trend is mainly possible due to the use of digital video. However, uncompressed digital video would not allow these capabilities to exist in consumer products, due to its high bandwidth and storage space requirements. Video coding and compression techniques are therefore essential to reduce the bandwidth and space requirements of video communication and video storage. As a consequence, video coding systems have been assuming an increasingly important role in personal communications, wireless multimedia and remote video surveillance. Meanwhile, the MPEG-4 and H.264 video coding standards were established to meet these requirements in terms of image quality and bandwidth. However, with such a wide range of target applications imposing quite different constraints, such as power consumption or computational resources, these standards have been implemented in pure software, in pure hardware or in a mixture of both. As an example, low-power operation is a mandatory requirement in battery-supplied portable devices, such as 3G mobile phones, PDAs and remote assistance devices.

The high compression rates involved in these new technologies impose the use of prediction techniques to minimize temporal redundancy. In particular, the motion compensation prediction mechanism constructs a prediction of the current frame by using blocks from previous frames. Basically, a block of the current picture is predicted by translating a block from the previous image by a given motion vector. Motion estimation is the process by which this motion vector is determined. This is the most computationally expensive part of most current compression formats, and the use of highly optimized, dedicated hardware to determine the motion vector in battery-supplied platforms is often necessary [1]. Such dedicated hardware structures usually play the role of a co-processor that is tightly interconnected with the main video coding system. Furthermore, such a co-processor should not only allow efficient control of the power consumption but also be flexible enough to allow the implementation of most present and upcoming block matching algorithms. One efficient architecture for such co-processors, the Adaptive Motion Estimation Processor (AMEP), was proposed in [1]. It is specially optimized for the implementation of fast block-matching or even data-adaptive motion estimation algorithms.

A first prototype of the proposed motion estimation processor was implemented using a Field Programmable Gate Array (FPGA) [1], to prove the processor functionality and validate its architecture. An ASIC based on this architecture is going to be implemented, to demonstrate not only that it is able to efficiently perform motion estimation but also that it is suitable for battery-supplied platforms.

The main objective of this work is the implementation of the AMEP in an ASIC, using a standard cell library based on the UMC CMOS 0.18 µm 1P6M technology process. This dissertation describes all the steps taken to reach a final GDSII description of the circuit layout. Afterwards, this layout will be sent to an ASIC foundry to be manufactured.

In this work, special attention was given, in the design phase, to the test capabilities of the circuit after being manufactured. For an implementation in an FPGA device such test procedures are unnecessary, due to the nature of the device.

However, the implementation of the desired circuit in an ASIC requires the manufactured circuit to be submitted to a set of test procedures, in order to validate the correct manufacture of the whole chip. Most often, hardware structures dedicated to test must be inserted to improve the testability of the circuit. This might require a change of the architecture (to explicitly include these structures), the use of the synthesis tool to automatically insert them, or both. Furthermore, the test procedures and the inserted test structures can also be used to validate the correct design of the circuit and may help to identify design flaws. This is particularly helpful while the design is in the prototyping stage.

The circuit was described using the VHSIC Hardware Description Language (VHDL), which allows this complex system to be described in a technology-independent way using a programming-like language. The synthesis process then translates this description into a technology-dependent netlist that implements the circuit’s function using the standard cells available in the library. This netlist represents the connections between the several standard cells, at a logical level.

Once the circuit’s function is represented as a gate-level, technology-dependent netlist, the physical implementation of the circuit can begin. The first phase is placement, which consists of determining the actual location of each of the used standard cells in a two-dimensional floorplan. The second phase is routing, which consists of interconnecting all of the cells’ inputs and outputs using the available metal layers, according to the generated gate-level netlist.

A completely routed design should then pass the Layout versus Schematic (LVS) and Design Rule Check (DRC) verifications, to ensure that the connections made in the routing phase correspond to those in the netlist and that the design complies with the design rules set by the foundry where the Integrated Circuit (IC) will be manufactured. The routed design (which includes the layout of the connections) is then merged with the layout of the used cells, to obtain the information required for manufacture. This information is then used to produce the masks required for IC manufacture.

1.1 Dissertation Outline

This dissertation is organized in eight chapters and three appendixes. Besides this introduction, Chapter 2 describes the processor that will be manufactured and its architecture. It also summarizes some algorithms that may be programmed to implement motion estimation.

Chapter 3 describes generic test structures, design for test techniques and the test pattern generation process. A more detailed description of the particular implementation of these structures and of the adopted techniques is given in the chapter that covers the frontend phase (Chapter 5). Chapter 3 also addresses memory testing and describes the architecture of the specifically designed memory BIST controller.

Chapter 4 presents a generic workflow for the ASIC implementation. It starts by explaining the main motivations for choosing a given technology and by characterizing the technology and the standard cell library that were actually selected, including the available memory devices. Some constraints imposed by the adopted technology and standard cell library are also discussed, since some of them restrict the available options during the design implementation.

Chapter 5 describes the frontend stage for the AMEP circuit. It shows the synthesis process and describes the tools’ capabilities and the workflow that was used. The technological constraints and design options that were taken into account regarding the test structures are also explained in this chapter.

The backend stage is described in Chapter 6, including the capabilities of the tools that were used and the workflow that was followed. Some generic options and techniques that are referred to in Chapter 4 are particularized for the implementation of the AMEP circuit.

Chapter 7 presents the results of this work, including the final layout. It also presents the simulation-based timing and power consumption values of the different implementations, to assess the validity of the included test structures.

Chapter 8 states the main conclusions of this work and addresses possible directions for future work.

The VHDL code developed for the memory BIST controller is presented in Appendix A. Appendix B presents all the command files used in the several tools.

2 Processor Architecture

2.1 Introduction

Motion estimation is a fundamental operation in video encoding, used to exploit temporal correlation in sequences of images. It is, however, the most computationally costly part of video encoding systems. With the increasing demand for video encoding in portable battery-supplied devices, the use of dedicated hardware with low power consumption to perform this part of video encoding is often necessary [2].

Several algorithms can be used to implement the Motion Estimation (ME) procedure. The Full Search Block Matching (FSBM) algorithm provides the optimal solution but it is also the most computationally expensive. Nevertheless, other non-optimum, faster and adaptive algorithms exist. These algorithms, like the Motion Vector Field Adaptive Search Technique (MVFAST), exploit the temporal correlations by considering information about past motion vectors and previously computed error values in order to predict and adapt the actual search space.

To efficiently implement these adaptive ME algorithms, a new Application Specific Instruction Set Processor (ASIP) was proposed in [2]. A minimal and specialized instruction set, specific for ME and composed of only eight different instructions, was defined. To support this instruction set, a simple and efficient micro-architecture was designed and implemented [2].

2.2 Motion Estimation

A video stream can be defined as a sequence of two-dimensional images, ordered in time, representing motion scenes. Each image is composed of a set of picture elements (pixels) with discrete intensity levels, distributed in a rectangular matrix. In a color video stream, each image is composed of three components: Red, Green and Blue (RGB). Because these components are highly correlated, most video coding systems transform the RGB color space into a less correlated space: a luminance component (Y) and two color-difference or chrominance components (Cb and Cr). Usually these components are processed as three independent images [3]. Furthermore, the chrominance components are subsampled in relation to the luminance component, because the human eye is less sensitive to color than to brightness. This technique is often applied in digital image compression and allows fewer bits to be used to represent a given image.
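The color transform and the chroma subsampling described above can be illustrated with a minimal sketch. The text does not state which conversion matrix is used, so the full-range ITU-R BT.601 weights (as adopted by JFIF) are assumed here, and the function names `rgb_to_ycbcr` and `subsample_420` are hypothetical. Averaging each 2 × 2 neighbourhood is only one of several possible subsampling filters.

```python
def rgb_to_ycbcr(r, g, b):
    """Convert one 8-bit RGB pixel to YCbCr.

    Assumption: full-range BT.601 weights (JFIF style); the source text
    does not specify the matrix. Results are returned as unclamped floats.
    """
    y = 0.299 * r + 0.587 * g + 0.114 * b
    cb = 128.0 - 0.168736 * r - 0.331264 * g + 0.5 * b
    cr = 128.0 + 0.5 * r - 0.418688 * g - 0.081312 * b
    return y, cb, cr


def subsample_420(plane):
    """Halve a chroma plane horizontally and vertically.

    Each output sample is the mean of a 2x2 neighbourhood. The plane is a
    list of rows and is assumed to have even width and height.
    """
    rows, cols = len(plane), len(plane[0])
    return [
        [
            (plane[i][j] + plane[i][j + 1]
             + plane[i + 1][j] + plane[i + 1][j + 1]) / 4.0
            for j in range(0, cols, 2)
        ]
        for i in range(0, rows, 2)
    ]
```

For a 16 × 16 chroma plane, `subsample_420` returns an 8 × 8 plane, that is, a quarter of the original number of chrominance samples.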

In the majority of current video compression standards, the image is divided into blocks. A macroblock is defined as the fundamental unit of information for motion compensation and consists of a 16 × 16 matrix of luminance (Y) pixels (4 blocks of 8 × 8 pixels) and two matrices of chrominance (Cb and Cr) pixels. The number of chrominance pixels varies according to the chrominance format defined in the video sequence header, which is usually one of three: 4:2:0, 4:2:2 or 4:4:4. In the 4:2:0 subsampling format, the resolution of the chroma components is half of the luminance resolution in both the horizontal and the vertical dimensions (4 Y blocks, 1 Cr block and 1 Cb block). In the 4:2:2 format, the chroma components have the same vertical resolution as the luminance component, but the horizontal resolution is halved (4 Y blocks, 2 Cr blocks and 2 Cb blocks). In the 4:4:4 format all components have identical resolutions (4 Y blocks, 4 Cr blocks and 4 Cb blocks) [4]. Figure 2.1 shows the composition of the macroblocks in these different formats [4].

Figure 2.1: Composition of a macroblock. (a) 4:2:0 format: Y 16 × 16, Cb and Cr 8 × 8. (b) 4:2:2 format: Y 16 × 16, Cb and Cr 8 × 16 (width × height). (c) 4:4:4 format: Y, Cb and Cr 16 × 16.
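The block counts quoted for each chrominance format follow directly from the subsampling factors. The sketch below recomputes them; the table `CHROMA_DIVISORS` and the function `macroblock_layout` are illustrative names, not part of any standard or of the AMEP design.

```python
# (horizontal divisor, vertical divisor) applied to the luminance resolution
CHROMA_DIVISORS = {"4:2:0": (2, 2), "4:2:2": (2, 1), "4:4:4": (1, 1)}


def macroblock_layout(chroma_format, mb_size=16, block_size=8):
    """Return the chroma dimensions and 8x8 block counts of one macroblock."""
    h_div, v_div = CHROMA_DIVISORS[chroma_format]
    chroma_width = mb_size // h_div
    chroma_height = mb_size // v_div
    luma_blocks = (mb_size // block_size) ** 2
    chroma_blocks = (chroma_width // block_size) * (chroma_height // block_size)
    return {
        "luma_blocks": luma_blocks,
        "chroma_size": (chroma_width, chroma_height),  # (width, height)
        "blocks_per_chroma_component": chroma_blocks,
        "total_blocks": luma_blocks + 2 * chroma_blocks,
    }
```

Under these assumptions a macroblock holds 6 blocks in 4:2:0, 8 blocks in 4:2:2 and 12 blocks in 4:4:4, which matches the counts given in the text.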

To achieve a compression ratio in video streams that is as high as possible, the time correlation between images is exploited. A common technique used for this purpose is motion compensation, which uses macroblocks from past (and in some cases also “future”) images, together with the calculated motion vectors, to construct a prediction of the current image. Since the information contained in a motion vector is far less than the information required to encode a macroblock, the compression ratio is higher when such a technique is used.
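As a rough illustration of motion compensation, the sketch below builds the prediction of a block by fetching the displaced block from a reference frame, and computes the residual that would remain to be encoded alongside the motion vector. Frames are plain lists of rows, the function names are hypothetical, and the displaced block is assumed to lie entirely inside the reference frame (no border handling).

```python
def predict_block(reference, top, left, motion_vector, size):
    """Return the size x size block of `reference` displaced by (dy, dx).

    (top, left) is the position of the block being predicted in the
    current frame. Assumes the displaced block stays inside the frame.
    """
    dy, dx = motion_vector
    return [
        row[left + dx:left + dx + size]
        for row in reference[top + dy:top + dy + size]
    ]


def residual(current_block, predicted_block):
    """Pixel-wise difference between the current block and its prediction."""
    return [
        [c - p for c, p in zip(cur_row, pred_row)]
        for cur_row, pred_row in zip(current_block, predicted_block)
    ]


def reconstruct(predicted_block, residual_block):
    """Decoder side: add the residual back to the prediction."""
    return [
        [p + r for p, r in zip(pred_row, res_row)]
        for pred_row, res_row in zip(predicted_block, residual_block)
    ]
```

When the prediction is good, most residual values are close to zero, which is why a motion vector plus a small residual takes far less information than the macroblock itself.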

Despite the high compression ratios provided by motion compensation, the corresponding

increase in computational effort is significant. In fact, to find the current macroblock’s motion

vector, a search procedure (within a defined search area) must be carried out in another (past or

future) image area, to find the best matching candidate macroblock. This involves the calculation

of a distortion measure for every candidate macroblock of the search area. To accomplish this,

block matching algorithms are usually applied to find the best match for each macroblock in a

reference frame, according to a search algorithm and a given distortion measure.

Motion estimation is therefore a computationally expensive task: it can take more than 80% of the operations required to implement an MPEG-4 video encoder [2]. Although general purpose processors can be used to accomplish this task, they tend to be very inefficient, especially in battery-supplied devices, which cannot sustain a high power consumption. As a consequence, the use of a specialized processor to efficiently implement the ME algorithms with low power consumption is advisable in such environments.

2.3 Motion Estimation Algorithms

Several Motion Estimation algorithms have been proposed in the literature [4]. They try to find the best match for each macroblock in a reference frame according to a search algorithm and a given distortion measure. Most of the proposed algorithms use the Sum of Absolute Differences (SAD) as the distortion measure [1]. Figure 2.2 represents the reference macroblock in the current frame and the corresponding search area, used by the block matching algorithm, in a previous frame. The best candidate macroblock is also represented in the previous frame, together with the respective motion vector.

Figure 2.2: Current and previous frames used in motion estimation. (a) Previous frame, showing the search area and the motion vector. (b) Current frame.

The optimum Full Search Block Matching (FSBM) algorithm is an exhaustive search algorithm that obtains the best match for a given macroblock within a search area, by examining all possible displaced candidates within that search area. However, it requires a large number of computations, which makes it difficult to implement in most real-time portable or battery-supplied encoding systems.
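The exhaustive nature of the FSBM algorithm can be illustrated with a minimal Python sketch. It is not the implementation used in this work: frames are assumed to be lists of rows of integer pixel values, and the names sad and full_search, as well as the convention of skipping candidates that fall outside the frame, are assumptions made for the example.

```python
def sad(cur, ref, cx, cy, rx, ry, n=16):
    """Sum of absolute differences between two n x n blocks.

    The block in `cur` starts at (cx, cy); the block in `ref` at (rx, ry).
    """
    total = 0
    for j in range(n):
        for i in range(n):
            total += abs(cur[cy + j][cx + i] - ref[ry + j][rx + i])
    return total


def full_search(cur, ref, cx, cy, search_range, n=16):
    """Examine every displacement in [-search_range, +search_range]^2.

    Returns (dx, dy, sad_value) for the best matching candidate.
    Candidates that fall outside the reference frame are skipped.
    """
    height, width = len(ref), len(ref[0])
    best = None
    for dy in range(-search_range, search_range + 1):
        for dx in range(-search_range, search_range + 1):
            rx, ry = cx + dx, cy + dy
            if rx < 0 or ry < 0 or rx + n > width or ry + n > height:
                continue
            cost = sad(cur, ref, cx, cy, rx, ry, n)
            if best is None or cost < best[2]:
                best = (dx, dy, cost)
    return best
```

For a search range of ±p pixels, up to (2p + 1)² candidates are evaluated per macroblock, each requiring n² absolute differences, which is the source of the computational cost mentioned above.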

Meanwhile, faster, sub-optimum algorithms have also been proposed. These algorithms re-

duce the search space by guiding the search pattern according to general characteristics of the

motion, as well as the computed values for distortion. These algorithms can be grouped into two

main classes: regular search pattern algorithms, that treat each macroblock independently as-

suming that the distortion decreases monotonically as the search moves towards the best match

direction; and algorithms that also exploit interblock correlations, both in space and time, to adapt

the search patterns. The three step search, the four step search and the diamond search algo-

rithms are examples of fast regular search pattern algorithms. These algorithms have a predeter-

mined possible sequence of locations that are considered along the search procedure. Adaptive

algorithms, like the MVFAST, potentially use information from adjacent macroblocks to obtain an

initial prediction of the motion vector.
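As an illustration of a regular search pattern, the following Python sketch follows the usual formulation of the three step search. It is a simplified example rather than the firmware used in this work: the distortion measure is passed in as a hypothetical function cost(dx, dy), and the initial step of 4 pixels (covering displacements up to ±7) is an assumption.

```python
def three_step_search(cost, first_step=4):
    """Three step search around the (0, 0) displacement.

    `cost(dx, dy)` returns the distortion (e.g. the SAD) for a candidate
    displacement. At each step the eight neighbours of the current best
    position are evaluated and the step size is then halved.
    Returns (dx, dy, best_cost, number_of_evaluations).
    """
    best_dx, best_dy = 0, 0
    best_cost = cost(0, 0)
    evaluations = 1
    step = first_step
    while step >= 1:
        centre_x, centre_y = best_dx, best_dy
        for dy in (-step, 0, step):
            for dx in (-step, 0, step):
                if dx == 0 and dy == 0:
                    continue  # the centre has already been evaluated
                c = cost(centre_x + dx, centre_y + dy)
                evaluations += 1
                if c < best_cost:
                    best_cost = c
                    best_dx, best_dy = centre_x + dx, centre_y + dy
        step //= 2
    return best_dx, best_dy, best_cost, evaluations
```

With these settings only 25 candidates are evaluated, against the 225 candidates of an exhaustive search over the same ±7 range, but the result is only guaranteed to be optimal when the distortion decreases monotonically towards the best match.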

The sub-optimum algorithms require far fewer computations than the FSBM [1], which makes them particularly well suited for low-power applications. Among these, the data-adaptive algorithms usually provide the best performance, not only in terms of the number of computations involved, but also in terms of the resulting video quality and bit rate [1]. Consequently, the use of data-adaptive algorithms with dedicated hardware structures for implementing these ME algorithms is often the best option for battery-supplied devices.

2.4 Instruction Set

The designed ASIP to implement data-adaptive ME algorithms is characterized by a special-

ized data-path and a minimum and optimized instruction set to meet the requirements of most

ME algorithms, including adaptive ones [2]. In fact, this AMEP follows a Reduced Instruction Set

Computer (RISC) philosophy and has the instruction set shown in table 2.1.

Table 2.1: AMEP Instruction Set.

Instruction category     Instruction  Description
Control                  J            Jump
Register data transfer   MOVR         Move register to register
Register data transfer   MOVC         Move immediate to register
Arithmetic               DIV2         Integer division by 2
Arithmetic               ADD          Add two register values
Arithmetic               SUB          Subtract two register values
Graphics                 SAD16        Compute sum of absolute differences
Memory data transfer     LD           Load local memory with pixel data

This specialized instruction set enables the processor to implement several ME algorithms. It is

composed of only eight instructions, with the appropriate encoding, that enables the determination

of the motion vectors with low power consumption and with a small implementation area [2].

The Instruction Set Architecture (ISA) is based on a register-register architecture, due to its simplicity and efficiency, and its reduced number of operations focuses on the most frequently executed instructions in ME algorithms. The AMEP register file consists of 24 general purpose registers and eight special purpose registers, each capable of storing one 16-bit word [2].

The operations supported by the AMEP ISA are divided into five categories, as shown in

table 2.1. These instructions are encoded using a 16-bit fixed-format, according to table 2.2 [1, 2].

Each instruction has an opcode and up to three operands, depending on the instruction’s category.

Table 2.2: AMEP Instruction Set Architecture. Fields are listed from bit 15 (left) down to bit 0 (right); “-” marks unused bits.

Instruction  Opcode  Remaining fields
LD           000     t, -
J            001     cc, -, Address
MOVR         010     Rd, -, Rs
MOVC         011     t, Rd, Constant
SAD16        100     -, Rd, Rs1, Rs2
DIV2         101     -, Rd, Rs1, -
ADD          110     -, Rd, Rs1, Rs2
SUB          111     -, Rd, Rs1, Rs2
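The fixed 16-bit format makes the first decoding step straightforward. The Python sketch below only extracts the opcode, assuming that it occupies the three most significant bits (bits 15 to 13), as suggested by table 2.2. The widths of the remaining fields are not decoded here, and the name decode_opcode is hypothetical.

```python
# Opcode values taken from table 2.2 (assumed to be bits 15..13).
OPCODES = {
    0b000: "LD",
    0b001: "J",
    0b010: "MOVR",
    0b011: "MOVC",
    0b100: "SAD16",
    0b101: "DIV2",
    0b110: "ADD",
    0b111: "SUB",
}


def decode_opcode(word):
    """Return the mnemonic of a 16-bit AMEP instruction word."""
    if not 0 <= word <= 0xFFFF:
        raise ValueError("instruction words are 16 bits wide")
    return OPCODES[word >> 13]


print(decode_opcode(0x8000))  # opcode bits 100
```

Since all eight 3-bit combinations are used, every 16-bit word maps to one of the eight instructions.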

The memory data transfer operation, LD, allows loading of macroblock and search area pixels into the corresponding local memories. The loading itself is performed independently of the instruction, using a special unit for address generation. Whether the macroblock or the search area data should be loaded is defined by the 1-bit control field (t) of the LD operation. The jump control operation, J, allows a change in a program’s control flow, by updating the program counter with an immediate value that corresponds to an effective address. The register data transfer operations, MOVR and MOVC, load data into a general purpose register or a special purpose register of the register file. In the case of a MOVR operation, the data to be moved is the content of another register. In the case of a MOVC operation, the data is an 8-bit immediate value that can be loaded into the register’s high or low order byte, depending on the control field (t). The graphics operation SAD16 computes the similarity measure between a reference macroblock and a candidate macroblock by computing the SAD value for two sets of sixteen pixels (the minimum number of pixels for a macroblock in the MPEG-4 video coding standard), accumulating the result in a special purpose register. The arithmetic operations ADD, SUB and DIV2 perform, respectively, addition, subtraction and integer division by two [2].

2.5 Microarchitecture

The microarchitecture designed for the AMEP follows strict power and area driven policies to support its implementation in portable and mobile platforms. It presents a modular structure and is composed of simple and efficient units to optimize the data processing. Figure 2.3 shows the processor architecture.

Figure 2.3: AMEP Architecture. (Block diagram showing the register file R0 to R31, the adder (Σ) and ASR blocks with the Negative and Zero flags, the SADU, the AGU, the PC and IR registers, the instruction decoding logic, the firmware RAM and the MB MEM and SA MEM local memories, interconnected through multiplexers.)

The datapath includes the hardware needed to implement the arithmetic operations included

in the instruction set. For the most complex and specific instructions, such as the SAD16 and

LD instructions, the datapath also includes specialized units to improve the efficiency of such

operations: the SAD Unit (SADU) and the Address Generation Unit (AGU), respectively [2].

The SADU calculates the SAD value between two macroblocks. It can be implemented using several possible architectures. The choice of a specific architecture influences the circuit area, the consumed power and the number of cycles needed to compute the SAD value, which ranges from one clock cycle, using a parallel processing architecture, up to sixteen clock cycles, using a serial processing architecture [2]. Since this processor is to be used in low-power (mobile) platforms, the prototype will use the serial SADU due to its reduced power consumption.

The AGU generates the necessary addresses to fetch all the pixels for both a macroblock and

an entire search area. This unit is capable of working in parallel with the remaining functional

units, to maximize the efficiency of data processing [2].

The AMEP architecture also includes three memory blocks: program memory (firmware), search area memory (SA MEM) and macroblock memory (MB MEM). The macroblock and search area memories are dual port Static RAM (SRAM) memories with 512 8-bit words and 2048 8-bit words, respectively. The program memory is a single port SRAM memory with 1024 16-bit words. These memories are used to locally store the pixels of the current macroblock and of the whole search area, to allow the efficient execution of the SAD16 operation. The dual port configuration of the search area and macroblock SRAM memories enables new data to be written while the SADU is executing. This allows the SADU to process data continuously, improving the efficiency. The use of these memories in the architecture requires some care when designing the test procedures and planning the layout.

2.6 Interface

The external interface of the implemented processor is shown in figure 2.4 [1].

Figure 2.4: AMEP external interface. (Ports: clk, en, rst, req, gnt, done, #oe_we, the 20-bit addr port and the 8-bit data port.)

This interface is used in normal operation mode. For testing purposes, additional inputs and outputs will be used. In normal operation mode, the AMEP works as a coprocessor that is interconnected with the main processor of the video encoding platform (in this case, a Power-PC), as illustrated in figure 2.5.

Figure 2.5: Video coding platform. (Block diagram showing the AMEP core, the Power-PC, the memory controller and the RAM, connected through the data, addr, #oe_we, req and gnt signals.)

According to [1], the interface with the external frame memory was designed to allow 8-bit data transfers from a 1 MB memory address space. The interface with the external memory bank uses three I/O ports: a 20-bit output port that specifies the memory address for the data transfers (addr), an 8-bit bidirectional port for transferring data (data) and a 1-bit output port that sets whether it is a load or store operation (#oe_we). Since the external frame memory is shared with the video encoder, the interface also has two extra 1-bit control ports to implement the required handshake protocol with the bus master: the req port allows requesting control of the bus, while the gnt port allows the bus master to grant such control. The coordinates of the best matching motion vectors are also output through the data port. This operation requires two distinct clock cycles to complete: one to output the motion vector’s low-order 8 bits (horizontal coordinate) and a second to output its high-order 8 bits (vertical coordinate). In addition, every time a new value is output through the data port, the status of the done output port is toggled, to signal to the video encoder that new data is waiting to be read at the data port.
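The two-cycle output of a motion vector can be sketched as follows. This Python fragment is a behavioural illustration only: the function name output_sequence is hypothetical, and the two's complement representation of negative coordinates and the initial level of the done port are assumptions not stated in the text.

```python
def output_sequence(mv_x, mv_y, done=0):
    """Return the (data, done) pairs seen on the ports, one per clock cycle.

    The horizontal coordinate (low-order 8 bits) is output first and the
    vertical coordinate (high-order 8 bits) second. The done port toggles
    every time a new value is placed on the data port.
    """
    sequence = []
    for coordinate in (mv_x, mv_y):
        done ^= 1                       # toggle done for each new value
        sequence.append((coordinate & 0xFF, done))
    return sequence


print(output_sequence(3, -2))
```

The video encoder would detect each change of the done level and read one byte from the data port in response.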

The processor firmware, corresponding to the compiled assembly code of the considered ME algorithm, is also downloaded into the program memory through the data port. To do so, the processor must be in the programming mode, which it enters whenever a high level is simultaneously applied to the rst and en input ports. In this operating mode, after having acquired the bus ownership, the master processor supplies memory addresses through the addr port and loads the corresponding instructions into the internal program RAM. The processor exits this programming mode as soon as the last memory position of the 1K × 16-bit program memory is filled in. Each 16-bit instruction takes two clock cycles to be loaded into the program memory, which is organized in little-endian format.
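The byte stream needed to fill the program memory can be sketched in Python as shown below. The sketch assumes that the little-endian organization means that the low-order byte of each instruction is transferred first; the function name firmware_byte_stream is hypothetical.

```python
PROGRAM_WORDS = 1024  # size of the program memory, in 16-bit words


def firmware_byte_stream(words):
    """Convert 1024 instruction words into the 8-bit transfer sequence.

    Each 16-bit word produces two bytes (one per clock cycle), low-order
    byte first (assumed from the little-endian organization).
    """
    if len(words) != PROGRAM_WORDS:
        raise ValueError("the whole program memory must be filled")
    stream = bytearray()
    for word in words:
        stream += word.to_bytes(2, "little")
    return bytes(stream)


image = firmware_byte_stream([0x1234] + [0x0000] * (PROGRAM_WORDS - 1))
print(len(image), image[:2].hex())
```

Under these assumptions, a complete firmware download takes 2 × 1024 = 2048 transfers through the 8-bit data port.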

3 Design for Test

Contents
3.1 Introduction
3.2 Circuit Testing
3.3 Automatic Test Pattern Generation
3.4 Observability and Controllability
3.5 Scan Structures
3.6 JTAG Boundary Scan
3.7 Memory Test

3.1 Introduction

Testing is one of the most important steps when manufacturing a chip. A defective component integrated into a system will most probably result in an unusable system. The cost of replacing or repairing such a system is often much higher than the cost of testing the component before its integration. Testing can be done at chip, board or system level. The decision to test every chip, board or system, or just a sample of each, is influenced by several factors, namely the test cost per unit, the yield and the repair/substitution cost. Testing procedures are also important at the prototyping stage, since they can be used to verify the correct implementation of the design and to ensure that the circuit performs as intended by the designer, while also helping to determine the cause of a potential flaw.

3.2 Circuit Testing

A fault is a physical defect that occurs in a circuit and that may cause the change of the circuit’s

logic function. An error is a wrong value that is present at a defective circuit’s output.

An error is a consequence of a fault, and a fault is only observable through the error it causes. Faults might exist in a circuit and never cause an error (e.g. when redundancy exists). A complete test would verify the circuit outputs for every input combination at every state (in sequential circuits) at full clock speed. Such a test is impractical given the size of today’s Very Large Scale Integration (VLSI) designs, but it would guarantee a functional circuit (at test time).

In its most basic use, testing serves to make a “good/defective” decision after manufacturing, whether for a chip, a board or an entire system. Additionally, the information acquired by testing a system can also be used for debugging and for diagnosing a malfunction, whether it is caused by a manufacturing defect or by a design error.

When integrated circuits had a reduced complexity level, testing was relatively simple, as

almost all of the internal circuit nodes could easily be controlled by simply changing the primary

inputs of the circuit, and the logical value of a certain node could also be easily observed by

propagating it to the primary outputs. As the complexity of integrated circuits increased, controlling

and observing the logic values became more and more challenging, and with the inherent increase

of possible faults, testing became extremely cumbersome and time consuming.

To reduce the complexity of the testing procedure, new techniques were developed to increase controllability and observability and to provide in-circuit testing. These techniques have the advantage of increasing fault coverage and reducing the test complexity and test time. However, they often present the disadvantages of increasing the development time, chip area, I/O pin count, power dissipation and even the number of possible faults in the circuit. Due to the additional hardware, these techniques also decrease the circuit’s performance, even if only marginally (because of the increase in the number of logic levels in the design).

There are several techniques to increase the fault coverage, but the chosen methods must take into account the additional costs that are incurred when new hardware for testing purposes is added. In sequential circuits, testing can be done either at the nominal clock frequency or at a reduced clock frequency. At the nominal clock frequency, additional faults can be detected (e.g. timing-related faults caused by the charging of capacitances), but most of the time it is hard to perform an efficient test at full clock speed using only the primary inputs. To overcome such difficulties, a Built-In Self Test (BIST) could be performed. This requires the use of a BIST controller that generates pre-determined test vectors and analyzes the circuit responses to determine their correctness. This way, test vectors can be delivered at the nominal clock frequency, with the advantage of eliminating the need for external test equipment. However, the additional cost, in circuit area and complexity, often makes this type of test procedure unviable. Therefore, tradeoffs between costs and benefits must be made and the techniques most adequate to the circuit under development should be applied.

Ad-hoc techniques to improve testability include the insertion of test points, sequential circuit initialization and the avoidance of redundant logic. Meanwhile, test methods have evolved and, instead of relying only on ad-hoc test strategies, several techniques now allow a structured Design For Test (DFT). Structured DFT techniques include boundary scan and internal scan chains [5]. DFT techniques are supported by most major synthesis tools. Moreover, combining structured DFT techniques with ad-hoc techniques is sometimes useful to overcome certain design difficulties.

A multitude of physical defects may occur during chip manufacture. Ideally the chip should be

tested for all of these possible defects. Some of these defects are equivalent, in their nature, and

as such can be modeled in the same way with regard to the effects they produce.

Logical faults represent the effect of physical faults on the behavior of the modeled system. By modeling physical faults as logical faults, the analysis complexity is reduced, since many different physical faults can be modeled by the same logical fault and the analysis is performed at a logical rather than a physical level. Additionally, some logical fault models are technology independent [5].

The most widely used fault model is the Single Stuck-at Fault (SSF) model. This model assumes that only one fault exists in the circuit under test and that the faulty node is stuck at the logical value 1 or 0. A circuit with n nodes therefore has 2n possible faults (a stuck-at-0 and a stuck-at-1 for each node). Some of these faults are said to be equivalent. Faults f1 and f2 are equivalent if all of the test vectors that detect one of them also detect the other.
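The fault count and the notion of equivalence can be illustrated on a very small circuit. The Python sketch below uses a hypothetical example circuit, y = (a AND b) OR c, with five nodes (a, b, c, the AND output n1 and the output y); it enumerates the 2n = 10 single stuck-at faults and groups together the faults that are detected by exactly the same test vectors. All names are introduced only for this example.

```python
from itertools import product

NODES = ("a", "b", "c", "n1", "y")
FAULTS = [(node, stuck) for node in NODES for stuck in (0, 1)]


def simulate(a, b, c, fault=None):
    """Evaluate y = (a AND b) OR c, optionally with one stuck-at fault.

    `fault` is a (node, stuck_value) pair or None for the good circuit.
    """
    def value(node, good_value):
        if fault is not None and fault[0] == node:
            return fault[1]
        return good_value

    a, b, c = value("a", a), value("b", b), value("c", c)
    n1 = value("n1", a & b)
    return value("y", n1 | c)


def detecting_vectors(fault):
    """Set of input vectors whose output differs from the good circuit."""
    return frozenset(v for v in product((0, 1), repeat=3)
                     if simulate(*v) != simulate(*v, fault=fault))


# Faults with identical detecting sets are equivalent.
classes = {}
for fault in FAULTS:
    classes.setdefault(detecting_vectors(fault), []).append(fault)

for vectors, members in classes.items():
    print(sorted(vectors), members)
```

For this circuit the ten faults collapse into six equivalence classes; for instance, a stuck-at-0, b stuck-at-0 and n1 stuck-at-0 are all detected only by the vector (1, 1, 0), so a test set needs to target only one representative of each class.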

Besides this model, the transition delay fault model is used for detection of single node slow-

to-rise or slow-to-fall faults. The path delay fault model is used to detect timing faults in the circuit’s

critical paths, at nominal working frequency, and is thus used to detect manufacturing defects or

process variations that have a negative impact on the circuit’s timing.

Some physical defects induce logical faults only under certain conditions, which might not occur at test time. To test for some of these defects (e.g. a drain-source short circuit in the p-MOSFET of a CMOS gate), a quiescent supply current (IDDQ) test might also be used. This type of test measures variations in the supply current and is able to detect some defects at the physical level.

Hence, it should be noted that the SSF model does not take into account many of the possible

faults that may occur in today’s submicron VLSI designs. However, it is a technology independent

fault model that represents many different physical faults. Experience shows that tests that detect

SSFs also detect many non classical faults [5]. Moreover, Automatic Test Pattern Generation

(ATPG) tools widely support this fault model and the test vectors generated using the SSF model

can usually be applied to a circuit without using expensive Automatic Test Equipment (ATE).

3.3 Automatic Test Pattern Generation

ATPG is a method to automatically generate an input vector that enables the detection of a given fault, based on the different circuit output observed in the presence of that fault. Test generation is a complex problem which is influenced by various factors. Among these factors, the cost of the test generation, the quality of the generated test and the cost of applying the test are the most important. A low cost method for generating test patterns is a random pattern generator. However, both the cost of determining the test quality (by fault simulation) and the cost of test application (due to the large amount of test data) may be too high. On the other hand, deterministic test generation produces test vectors by processing a model of the circuit. Although its generation cost is higher than that of random generation, the test quality is usually higher and the cost of test application may be significantly lower due to a smaller amount of test data.

The quality of the generated test vectors is measured by the fault coverage. The definition of fault coverage differs between authors. In [5], the fault coverage for detectable faults is a relative measure that indicates the number of detected faults in relation to the number of detectable faults, according to the fault model used (the number of faults in the design minus the number of undetectable faults). In [6], the fault coverage is defined as the ratio between the detected faults and all faults. According to this author, the test coverage is the ratio between the detected faults and the number of detectable faults, according to the fault model used. The nomenclature adopted in this work follows the definitions given in [6]. Equation 3.1 is used to calculate the test coverage [6] and equation 3.2 is used to calculate the fault coverage.

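\[
\text{test coverage} = \frac{\#\text{detected faults}}{\#\text{total faults} - \#\text{undetectable faults}} \tag{3.1}
\]

\[
\text{fault coverage} = \frac{\#\text{detected faults}}{\#\text{total faults}} \tag{3.2}
\]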

According to equation 3.1, all proven redundant faults (included in the undetectable faults) are excluded from the fault universe. This requires the test generation algorithm to be able to identify redundant faults [5].
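The difference between the two measures can be seen with a small numeric example. The Python sketch below evaluates equations 3.1 and 3.2 directly; the function names and the fault counts used are hypothetical and serve only to illustrate the calculation.

```python
def compute_test_coverage(detected, total, undetectable):
    """Equation 3.1: detected faults over detectable faults."""
    return detected / (total - undetectable)


def compute_fault_coverage(detected, total):
    """Equation 3.2: detected faults over all faults."""
    return detected / total


# Hypothetical design: 1000 faults, 40 proven undetectable, 930 detected.
print(compute_test_coverage(930, 1000, 40))   # 930 / 960
print(compute_fault_coverage(930, 1000))      # 930 / 1000
```

Whenever undetectable faults exist, the test coverage is higher than the fault coverage, since the undetectable faults are removed from the denominator.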

Test generation can be fault oriented or fault independent. Fault oriented algorithms, which aim to generate a test for a specific fault, include the D-algorithm, the 9-V algorithm, the Path Oriented Decision Making (PODEM) algorithm and the Fanout Oriented (FAN) algorithm [5]. These algorithms belong to a class of test generation algorithms referred to as path-sensitization algorithms and require the determination of an initial set of faults, the selection of a target fault and the maintenance of a set of remaining undetected faults [5]. To detect a given fault, the faulty node must first be set to the logic value opposite to the one imposed by the fault under analysis (fault activation). The resulting value must then be propagated by sensitizing a path from that node up to a primary output (fault propagation). Fault independent algorithms aim to compute a set of test vectors that detect a large set of SSFs, without targeting any individual fault. Bearing in mind that half of the SSFs along a critical path of a test vector are detected by that test, it is desirable to generate tests that produce long critical paths. The critical-path test generation algorithm does this [5].

The advantage of random test generation is the simplicity of vector generation. The main disadvantage is that the set of randomly generated vectors needed to detect a given set of faults is much larger than the corresponding set of deterministically generated test vectors. There are combined test generation methods, like Random Path Sensitization (RAPS), that attempt to merge the advantages of deterministic and random test generation methods [5].
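Random test generation combined with fault simulation can be sketched as follows. The Python example is a toy illustration and not one of the algorithms of [5]: it uses a hypothetical three-input circuit, y = (a AND b) OR c, draws random input vectors and keeps only those that detect at least one previously undetected single stuck-at fault. The function names, the seed and the vector limit are assumptions.

```python
import random

NODES = ("a", "b", "c", "n1", "y")
FAULTS = [(node, stuck) for node in NODES for stuck in (0, 1)]


def simulate(a, b, c, fault=None):
    """Evaluate y = (a AND b) OR c, optionally with one stuck-at fault."""
    def value(node, good_value):
        if fault is not None and fault[0] == node:
            return fault[1]
        return good_value

    a, b, c = value("a", a), value("b", b), value("c", c)
    n1 = value("n1", a & b)
    return value("y", n1 | c)


def random_test_generation(seed=0, max_vectors=1000):
    """Draw random vectors until every fault is detected (or the limit).

    Returns (kept_vectors, vectors_tried, fault_coverage).
    """
    rng = random.Random(seed)
    undetected = set(FAULTS)
    kept = []
    tried = 0
    while undetected and tried < max_vectors:
        vector = tuple(rng.randint(0, 1) for _ in range(3))
        tried += 1
        good = simulate(*vector)
        # Fault simulation: which remaining faults change the output?
        newly = {f for f in undetected
                 if simulate(*vector, fault=f) != good}
        if newly:
            kept.append(vector)
            undetected -= newly
    coverage = 1 - len(undetected) / len(FAULTS)
    return kept, tried, coverage


kept, tried, coverage = random_test_generation()
print(len(kept), "vectors kept out of", tried, "tried; coverage", coverage)
```

The number of vectors tried is usually larger than the number kept, which reflects the fault simulation and test application costs associated with purely random generation.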

All the previously mentioned algorithms are meant for combinational logic. Test vector gen-

eration for sequential circuits is significantly more difficult because the test of a certain fault may

require the input of various test vectors in sequential order. Some test generation methods for se-

quential circuits use iterative array models, in which each array element represents a time frame.

This way, sequential circuits test generation is done by converting the sequential circuits into com-

binational circuits, where previous test generation methods can then be used. Simulation based

test generation can also be used to generate test vectors in sequential circuits, by generating and

simulating trial vectors. Based on the simulation results, these trial vectors are evaluated using a

predefined cost function and the best trial vector is added to the test sequence. Other methods

exist to generate test vectors for sequential circuits that use Register Transfer Level (RTL) models

or random test generators [5].

3.4 Observability and Controllability

Observability may be defined as the ability to observe the changes at the internal nodes

through the primary outputs. On the other hand, controllability is the ability to control the internal

nodes using the primary inputs.

To evaluate the value present at a given circuit node, there has to be a path through the logic circuit up to a primary output, such that a change in the node’s logic value induces a corresponding change in an output’s logic value. This way, the node’s value can be observed on a primary output. Similarly, to force a node to a certain logic value, the primary inputs must be defined in such a way that the node’s logic value can be set.

To test a circuit, specific values must be set at the primary inputs (test vectors) to control the

logic value of some internal nodes and to allow its propagation to the primary outputs, in order

to observe and certify the correctness of the logic values at those nodes. To achieve both these

tasks, the internal nodes must be simultaneously controllable and observable.

To control and observe a certain node in a sequential circuit, logic values may need to be passed through several memory elements (such as latches or flip-flops). This naturally increases the complexity of generating test vectors for a certain fault. It is also often required that the control signals of those memory elements (set, reset, enable and clock signals) are controllable during the entire testing time. In certain situations, some of these control signals are generated by the logic inside the circuit and, therefore, are difficult to control from the primary inputs. To avoid such situations, additional hardware (such as multiplexers) is added to these control lines, so that, when in test mode, these values can be controlled from a primary input. Additionally, it is also advisable not to use gated clocks, as they can be harder to control during test mode, and many ATPG tools do not support them during test vector generation.

Enhancing observability can be accomplished by inserting observation points in the circuit.

The observation points are dedicated outputs that are directly connected to an internal node and

allow the observation of that node’s logic value. If these points are carefully chosen, they can

increase the ability to detect faults. Although this technique requires additional I/O pins, this is

often the only available method to increase the observability and to increase the fault coverage.

3.5 Scan Structures

As explained, testing requires the delivery of test patterns to the circuit inputs, in order to control the logic value of a given node. It also requires the propagation of the resulting logic value present at that node to a primary output, so that it can be observed. In sequential circuits, this task is significantly more complex, due to the memory elements present in the circuit. One way to simplify the delivery of the test patterns is to rearrange, at test time, these memory elements in order to form a shift register (called a scan chain). With this shift register, the controllability and the observability are greatly increased, as it becomes much easier to set and capture the logic value of a node which is deep inside the circuit.

In order to implement this scan chain, the ordinary flip-flops used in the design must be replaced by scan flip-flops. These scan flip-flops have additional inputs that enable them to function either in normal mode or in test mode. When in test mode, they are usually connected to form a shift register. Figure 3.1 shows the additional hardware that is needed to transform a normal flip-flop into a multiplexed scan D flip-flop.

Figure 3.1: D type Flip-Flop. (a) Non-scan flip-flop, with the D and Clock inputs and the Q output. (b) Multiplexed scan flip-flop, in which a multiplexer controlled by Scan_Enable selects between the D and Scan_in inputs.

The scan chain is used to shift the test vectors into the circuit. After one or more clock cycles, which allow the values to propagate through the combinational and/or sequential logic, it is used to capture the resulting values and shift them out. During these shift operations, data propagates between the flip-flops that form the scan chain (shift register). To capture the values of the combinational logic, the normal inputs of the flip-flops are selected (using a control signal) and then the flip-flops are set back to scan mode to allow the captured values to be shifted out.
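The shift, capture and shift-out sequence can be modelled behaviourally. The following Python sketch is a simplified model, not a description of the circuit implemented in this work: it assumes a single chain of multiplexed scan flip-flops, a single capture cycle, and a hypothetical next_state function standing for the combinational logic; the class and function names are introduced for the example.

```python
class ScanChain:
    """Chain of multiplexed scan flip-flops sharing one clock."""

    def __init__(self, length, next_state):
        self.ffs = [0] * length
        self.next_state = next_state  # models the combinational logic

    def clock(self, scan_enable, scan_in=0):
        """Apply one clock edge and return the scan-out bit.

        scan_enable = 1: shift mode (each flip-flop loads its neighbour).
        scan_enable = 0: normal mode (flip-flops load the logic outputs).
        """
        scan_out = self.ffs[-1]       # value at the end of the chain
        if scan_enable:
            self.ffs = [scan_in] + self.ffs[:-1]
        else:
            self.ffs = list(self.next_state(self.ffs))
        return scan_out


def scan_test(chain, pattern):
    """Shift a pattern in, capture once, and shift the response out."""
    for bit in reversed(pattern):     # pattern[i] ends up in flip-flop i
        chain.clock(1, bit)
    chain.clock(0)                    # capture cycle in normal mode
    shifted = [chain.clock(1) for _ in pattern]
    return shifted[::-1]              # response[i] came from flip-flop i


# Hypothetical combinational logic between the three flip-flops.
logic = lambda s: [s[0] ^ s[1], s[1] & s[2], 1 - s[2]]
print(scan_test(ScanChain(3, logic), [1, 1, 0]))
```

In practice the response of one pattern is shifted out while the next pattern is shifted in, which this sketch omits for clarity.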

The addition of scan chains and of the corresponding scan flip-flops increases the circuit area and the consumed power and affects the circuit timing, because of the additional logic elements. Different scan styles have been proposed [5], and in general they differ in the penalty incurred in each of these factors and in the complexity of generating the test control signals.

The scan styles (with the associated scan flip-flops) available for the designer to choose from

are, among those most commonly used and supported by ATPG tools, the Multiplexed Flip-Flop,

the Clocked-Scan, Level Sensitive Scan Design (LSSD) and the Auxiliary-Clock LSSD [7]. The

choice for the most suitable scan style for a given circuit can be made based on each style’s

advantages and disadvantages. However, when standard cell libraries are used, the available

types of scan flip-flops must also be taken into account, since each of these methods requires

different flip-flops that may not be available. Nevertheless, even if the standard cell library does

not include any type of scan flip-flops, it is still possible to assemble a multiplexed flip-flop using

discrete cells (normal flip-flops and multiplexers) to enable the implementation of the multiplexed

flip-flop scan style. However, this strategy incurs a larger penalty in timing and area. The adopted

standard cell library in this work only contains multiplexed scan flip-flops. As a consequence, the

multiplexed flip-flop scan style was chosen.


3.6 JTAG Boundary Scan

The test of component interconnections at board level has become more complex with the advent of multilayer PCBs and non-lead-frame ICs. To overcome this difficulty, the Joint Test Action Group (JTAG) proposed a process to test the interconnections between board components (ICs) that included a Test Access Port (TAP) controller and special I/O cells in every chip. These special I/O cells (boundary scan cells) are controlled by the TAP controller and can be serially connected, at test time, to implement a Boundary Scan Register (BSR). Figure 3.2 shows a basic boundary scan cell that is used to build the BSR. Other cells are used according to the function of the pin. This cell can be used on input and output pins but not on three-state pins.

Figure 3.2: JTAG Basic Boundary Scan Cell. (Two D flip-flops and two multiplexers, with the IN, OUT, Shift In, Shift Out, Shift DR, Clock-DR, Update-DR and Mode signals.)

Boundary scan cells can be classified as observe-only, control-only and control-and-observe
cells. Observe-only cells are typically used with the clock signal, since no control should be
exerted on it. A control-only cell can be used for the enable signal of three-state buffers, while
control-and-observe cells can be used on all the two-state inputs and outputs. A three-state driver
usually has a more complex cell, composed of two control-and-observe cells (one for the input
and another for the output), and might include a control-only cell for the enable signal.

The BSR can be used to shift values in and out of the I/O pins of the various chips and thus
set and capture the signals propagated through the pathways of the Printed Circuit Board (PCB).
Figure 3.3 represents a possible interconnection between the I/O cells of several chips to
implement a BSR, together with the signals required by the TAP controller.
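The capture, shift and update operations of the BSR can be illustrated with a small behavioural model. The sketch below is only an illustration written in Python: the class and method names are hypothetical, and the polarity of the Mode and Shift DR signals (1 selects the test path) is an assumption taken from common practice, not from the cell library.

```python
from dataclasses import dataclass


@dataclass
class BoundaryScanCell:
    """Behavioural model of one basic cell: a shift flop and an update flop."""
    shift_ff: int = 0   # flop clocked by Clock-DR
    update_ff: int = 0  # flop clocked by Update-DR

    def clock_dr(self, pin_in, shift_in, shift_dr):
        # Assumed polarity: Shift DR = 1 takes Shift In, Shift DR = 0 captures IN.
        self.shift_ff = shift_in if shift_dr else pin_in

    def update_dr(self):
        self.update_ff = self.shift_ff

    def out(self, pin_in, mode):
        # Assumed polarity: Mode = 1 drives the update flop, Mode = 0 passes IN.
        return self.update_ff if mode else pin_in


class BoundaryScanRegister:
    """Chain of cells; cell 0 is nearest TDI, the last cell feeds TDO."""

    def __init__(self, n):
        self.cells = [BoundaryScanCell() for _ in range(n)]

    def capture(self, pins):
        for cell, pin in zip(self.cells, pins):
            cell.clock_dr(pin, 0, shift_dr=0)

    def shift(self, tdi):
        """One Clock-DR edge in shift mode; returns the bit leaving the chain."""
        tdo = self.cells[-1].shift_ff
        previous = [cell.shift_ff for cell in self.cells]
        for i, cell in enumerate(self.cells):
            cell.clock_dr(0, tdi if i == 0 else previous[i - 1], shift_dr=1)
        return tdo

    def update(self):
        for cell in self.cells:
            cell.update_dr()
```

In this model the captured pin values leave the chain last cell first, while the bits shifted in become the new driven values once `update()` is called and Mode is set to 1.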

With the already available hardware inside every chip, the TAP controller could be modified to

control additional test functions, such as BIST, scan chains and other user defined hardware. This

interface can also be used to load programming values into programmable devices like FPGAs.

The JTAG proposal became IEEE Standard 1149.1 [8] in 1990.

By implementing the JTAG interface, the IC not only becomes easier to test using already
available test equipment that complies with the IEEE 1149.1 Standard, but also allows the
testing of the circuit and its connections when included in a larger system.

As described in the standard, the TAP includes Test Clock input (TCK), Test Mode Select
(TMS), Test Data Input (TDI) and Test Data Output (TDO) connections and, when a power-up
reset of the test logic is not performed, it also provides a Test Reset (TRST) connection. All of the
TAP inputs and outputs are dedicated connections and should not be used for any other purpose.
In order to be compliant with the standard, the TMS, TDI and TRST inputs must behave as if a
logic 1 were applied when the input is undriven (an internal pull-up must be present at these inputs).

Figure 3.3: JTAG Boundary Shift Register and TAP controller connections. (Elements shown: three
chips, each with Core Logic, a TAP and boundary scan cells chained from Shift In to Shift Out;
shared TCK and TMS signals; TDI and TDO at the ends of the chain.)

The JTAG TAP controller includes a state machine that is controlled by the TMS signal. By
driving this signal with the appropriate values, the internal TAP state machine is controlled
according to the state diagram in figure 3.4.
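The TMS-driven behaviour of the TAP state machine can be written as a transition table. The following Python sketch uses the state names of figure 3.4 and the transitions defined by IEEE 1149.1; the table and function names are hypothetical and serve only as an illustration.

```python
# Next state for each TAP state, indexed by the TMS value (0 or 1)
# sampled on the rising edge of TCK.
TAP_NEXT = {
    "Test Logic Reset": ("Run Test/Idle", "Test Logic Reset"),
    "Run Test/Idle":    ("Run Test/Idle", "Select DR"),
    "Select DR":        ("Capture DR", "Select IR"),
    "Capture DR":       ("Shift DR", "Exit1 DR"),
    "Shift DR":         ("Shift DR", "Exit1 DR"),
    "Exit1 DR":         ("Pause DR", "Update DR"),
    "Pause DR":         ("Pause DR", "Exit2 DR"),
    "Exit2 DR":         ("Shift DR", "Update DR"),
    "Update DR":        ("Run Test/Idle", "Select DR"),
    "Select IR":        ("Capture IR", "Test Logic Reset"),
    "Capture IR":       ("Shift IR", "Exit1 IR"),
    "Shift IR":         ("Shift IR", "Exit1 IR"),
    "Exit1 IR":         ("Pause IR", "Update IR"),
    "Pause IR":         ("Pause IR", "Exit2 IR"),
    "Exit2 IR":         ("Shift IR", "Update IR"),
    "Update IR":        ("Run Test/Idle", "Select DR"),
}


def tap_walk(state, tms_bits):
    """Return the state reached after applying a sequence of TMS values."""
    for tms in tms_bits:
        state = TAP_NEXT[state][tms]
    return state
```

A useful property visible in this table is that holding TMS at 1 for five TCK cycles brings the controller to Test Logic Reset from any state, which is why the TMS pull-up mentioned above leaves an undriven device in reset.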

According to the IEEE standard, the implementation of the BYPASS, EXTEST, SAMPLE and

PRELOAD instructions is mandatory. All other instructions that may be implemented are either

optional instructions, defined in the standard, or user specified instructions.

For every IEEE 1149.1 compliant device, there has to be a Boundary Scan Description

Language (BSDL) file associated with it. This file describes the nature of the IC pins (input,

output or bidirectional pin), the logical correspondence between signal names and physical pins,

the identification of the pins used by the JTAG interface, the description of the instruction register,

the implemented instructions and their opcodes, the identification of each data shift register that

is accessed in each of the instructions and a description of the BSR, listing all the cells in it and

their functionality. This file allows IEEE 1149.1 compliant test equipment to know the capabilities

of the circuit’s test logic and, if an additional file containing the description of IC interconnections

in a system board is given, allows it to perform test procedures on the system board using the

assembled BSR, as represented in figure 3.3.

Figure 3.4: TAP state machine. (States shown: Test Logic Reset, Run Test / Idle, Select DR,
Capture DR, Shift DR, Exit1 DR, Pause DR, Exit2 DR, Update DR, Select IR, Capture IR, Shift IR,
Exit1 IR, Pause IR, Exit2 IR and Update IR; each transition is labelled with the TMS value, 0 or 1.)

3.7 Memory Test

When memory blocks are present in a circuit, they also need to be tested. However, memory
cells are usually tested using a more complex fault model than the SSF model, because memories
have more physical faults that cannot be modeled by stuck-at lines. Hence, bridging faults and
coupling faults need to be taken into account. A rather exhaustive memory test could be
performed, at nominal speed, by writing a bit and verifying that it was written correctly and that
none of the remaining bits had their value changed. Then, the complementary value should
be written to the same bit, again verifying that it was written correctly and that none of the
remaining bits had their value changed. Although this is a thorough test, it would take too much
time to complete. Consequently, instead of testing, for every changed bit, all the remaining memory
bits, common testing procedures only consider the surrounding bit cells, as these are the most
likely to be affected by transitions on a given bit cell. Although this drastically reduces the test
time, it requires information about the physical memory cell layout.
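The difference in test length between the two approaches can be estimated with a simple operation count. The sketch below is only a rough model under stated assumptions: each write, each read-back and each check of another cell counts as one memory operation, every cell is assumed to have the same number of neighbours (edge cells are ignored), and the function names are hypothetical.

```python
def exhaustive_ops(n_bits):
    """Operations when every other bit is checked after each write.

    For each bit and each of the two values: one write, one read-back
    of the written bit and one read of each of the remaining bits.
    """
    per_value = 1 + 1 + (n_bits - 1)
    return 2 * n_bits * per_value


def neighbourhood_ops(n_bits, neighbours):
    """Operations when only the surrounding cells are checked (assumed count)."""
    per_value = 1 + 1 + neighbours
    return 2 * n_bits * per_value


if __name__ == "__main__":
    n = 8 * 1024  # hypothetical memory size in bits
    print(exhaustive_ops(n), neighbourhood_ops(n, 8))
```

Under this model the exhaustive test grows quadratically with the number of bits, while the neighbourhood test grows linearly, which is the reason for the drastic reduction in test time.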

A march test is composed of a set of march elements. An ascending (descending) march
element is a finite sequence of read or write operations that is repeated in each memory cell in
ascending (descending) address order. The march test is applied to each cell in memory before
proceeding to the next cell, which means that if a pattern is applied to one cell then it must be
applied to all cells. All operations of a march element are done before proceeding to the next
address [9]. The faults that may exist are detected in the read operations, when the read values
are compared with the values defined in the test. The read and write operations are denoted by
the r and w symbols, followed by the value to be read or written (e.g. r0 or w1). A march element
can contain several read or write operations for the same address. This is represented as
(w0,r0,w1), in which, for every address, a write 0 followed by a read 0 and by a write 1 operation
is performed. An ascending march element is denoted by the ⇑ notation while the ⇓ notation
denotes a descending march element. The ⇕ notation denotes a march element that may be either
ascending or descending [10].

As an example, the march test {⇑(w0,r0,w1);⇓(r1)} would start at the lowest address and
perform a write 0, followed by a read 0 and a write 1 to that address. It would then increase the
address by one position and perform the same operations. When the last address is reached and
all the operations are done, the first march element is concluded. The next element starts at the
highest address and performs a read 1 operation. Then it decreases the address by one position
and repeats the read 1 operation. If the read value is not 1, a fault is detected. When the address
reaches its lowest value, the march element is concluded, as well as the entire march test.
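The example above can be reproduced with a short interpreter for the march notation. This is a minimal Python sketch, not the implemented test logic: the memory model, the way the fault is injected and all names are hypothetical.

```python
class FaultyMemory:
    """Bit-oriented memory with optional stuck-at cells (address -> forced value)."""

    def __init__(self, size, stuck_at=None):
        self.cells = [0] * size
        self.stuck_at = stuck_at or {}

    def __len__(self):
        return len(self.cells)

    def __getitem__(self, addr):
        return self.stuck_at.get(addr, self.cells[addr])

    def __setitem__(self, addr, value):
        self.cells[addr] = value


def run_march(memory, march):
    """Apply a march test; return a list of (address, expected, read) mismatches."""
    failures = []
    size = len(memory)
    for direction, operations in march:
        addresses = range(size) if direction == "up" else range(size - 1, -1, -1)
        for addr in addresses:
            # All operations of the element are done before the next address.
            for op, value in operations:
                if op == "w":
                    memory[addr] = value
                else:
                    read = memory[addr]
                    if read != value:
                        failures.append((addr, value, read))
    return failures


# {up(w0,r0,w1); down(r1)} from the example in the text.
EXAMPLE_MARCH = [
    ("up", [("w", 0), ("r", 0), ("w", 1)]),
    ("down", [("r", 1)]),
]
```

With a fault-free memory the returned list is empty; a cell stuck at 0 is reported by the r1 operation of the second element, and a cell stuck at 1 by the r0 operation of the first.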

The previous notation assumes that individual bit cells are addressed. In word-oriented
memories, however, this is not the case: words, and not individual bits, are written into the
memories. The notation adopted in this work replaces the 0 and 1 values in the march elements
with the values written in a word (represented in hexadecimal).

To test the memory blocks embedded in the designed processor, a march test was produced.
Addressing these memory blocks and observing their outputs is not possible using the chip’s
inputs and outputs that are used under normal operation. However, since the chip includes scan
chains, these memories can be addressed and their outputs observed using these chains. This
is a low cost option, as it requires no hardware beyond the already included scan flip-flops.
However, it is a poor testing method, as it requires a large amount of time to conclude (which might
not be a problem when prototyping) and it does not allow the memories to be tested at full clock
speed. To enable testing at full clock speed with a higher fault coverage, a BIST controller was
designed. This controller provides the testing options required by the memories and enables the
designer to gather information that could help in diagnosing the design and, possibly, still use a
partially defective memory.

As stated in section 4.4, due to the adopted library the memory layout is not available and
thus the physical structure of the memories is not known. Therefore, the test patterns were chosen
bearing in mind the possibility that adjacent word bits correspond to adjacent memory cells, so as
to achieve a higher fault coverage. The march test that was produced for these memories, shown in
figure 3.5, detects all transition faults, all stuck-at faults and all address decoding faults. However,
since the structure of the memories is not known and the test is not exhaustive, only some coupling
faults and some state coupling faults will be detected. The memory test patterns used are 01010101b
(55h), 10101010b (AAh), 00000000b (00h) and 11111111b (FFh).

Figure 3.5: Implemented March Test. (March elements composed of r and w operations on the
55h, AAh, 00h and FFh patterns.)

The test is done by comparing the values that are read from the memory with the values
that are supposed to be stored at the various addresses. In the event of a mismatch, the BIST
controller will stop its operation. By means of a scan chain, the controller has the capability to
shift out the address of the failing cell. Furthermore, the controller also has the capability to
resume the test sequence (from the failed address), in order to complete it. For prototyping
purposes this has the advantage of returning more information than a simple good/defective test
result. The designer can then use this information to partially operate a defective circuit (e.g. if
a program memory cell is defective, the designer could write assembly code that avoids that
particular address and still be able to use all of the remaining circuit).

As mentioned before, in section 2.5, the memories used in the AMEP are two dual-port SRAMs

with 8-bit words each and one single-port SRAM with 16-bit words. The memories available in

the adopted technology will be described, in detail, in section 4.4.1. Since the single-port SRAM

uses the bytewrite capability, the BIST controller will also need to test this feature.

Since these memories do not provide a dedicated interface for test data, multiplexers have to

be added in order to control the data applied into the memory inputs: test data from the memory

BIST controller or normal data from the implemented processor.

The architecture of the implemented memory BIST controller is shown in figure 3.6. This
controller is composed of one comparator (with one of the inputs registered) for error detection, an
up/down counter for sequential address generation and a shift register for bytewrite enable signal
generation (if the memory has bytewrite). The controller’s state machine has 31 states and is
responsible for implementing the march test. The controller interface with the outer circuitry includes
three input control signals and two output result signals. The input control signals are bisten
(enable), bistrst (reset) and bistgo (start/resume test sequence) and the output result signals
are bistrslt (fault detected) and bistend (end of test sequence). While performing the test
sequence, the controller’s enable signal, bisten, is high. To actually start the test sequence, the
bistgo signal must be high during one clock cycle. The bistrslt signal indicates the test result
(logic value 0 if no error was detected; logic value 1 if an error was detected). The bistend signal
indicates the end of the test sequence. If no error is detected, the bistend signal is set high and
the bistrslt signal remains low. However, if an error is detected during the test sequence,
the bistrslt signal is set high, while the bistend signal remains low, and the controller
enters a pause state. In this state, the controller waits for the bistgo signal to go high,
indicating that the result has been read and the memory address has been scanned out (if desired
by the user), and thus that the test sequence may be resumed. The controller also includes output
signals to address the memory (bistaddr), to set the memory data inputs (bistctr_dout) and to
control the memory write enable signals (bistbwen and bistwen). The bistctr_din input of the
BIST controller is driven by the memory data output.
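The start, pause and resume handshake described above can be sketched as a behavioural model. This Python sketch is not the VHDL description of Appendix A: the class names, the injected fault and the three-element march are hypothetical, and clearing bistrslt when the test is resumed is an assumption, since the text does not state what happens to that signal after a resume.

```python
class StuckBitMemory:
    """Word memory where one bit of one address always reads 0 (hypothetical fault)."""

    def __init__(self, size, bad_addr, bad_bit):
        self.words = [0] * size
        self.bad_addr = bad_addr
        self.bad_bit = bad_bit

    def __len__(self):
        return len(self.words)

    def read(self, addr):
        word = self.words[addr]
        if addr == self.bad_addr:
            word &= ~(1 << self.bad_bit)
        return word

    def write(self, addr, value):
        self.words[addr] = value


class MemoryBistModel:
    """Behavioural model of the bistgo / bistrslt / bistend handshake."""

    def __init__(self, memory, march):
        self.memory = memory
        self.steps = []
        size = len(memory)
        for direction, operations in march:
            addresses = range(size) if direction == "up" else range(size - 1, -1, -1)
            for addr in addresses:
                for op, value in operations:
                    self.steps.append((addr, op, value))
        self.position = 0
        self.bistrslt = 0
        self.bistend = 0
        self.failing_address = None  # value that would be scanned out

    def pulse_bistgo(self):
        """Start or resume the sequence; stop at the first mismatch or at the end."""
        self.bistrslt = 0  # assumption: the result flag is cleared on resume
        while self.position < len(self.steps):
            addr, op, value = self.steps[self.position]
            self.position += 1
            if op == "w":
                self.memory.write(addr, value)
            elif self.memory.read(addr) != value:
                self.bistrslt = 1
                self.failing_address = addr
                return  # pause state: wait for the next bistgo pulse
        self.bistend = 1


# Illustrative word-oriented march; NOT the march test of figure 3.5.
ILLUSTRATIVE_MARCH = [
    ("up", [("w", 0x55)]),
    ("up", [("r", 0x55), ("w", 0xAA)]),
    ("down", [("r", 0xAA)]),
]
```

In this model a first call to `pulse_bistgo()` on a memory with a defective bit stops with bistrslt high and bistend low, and a second call resumes from the operation that follows the failing read until bistend is set.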

Considering that this controller is part of a power efficient processor, it should also minimize its
power consumption. As a consequence, the controller should be deactivated during the normal
operation mode of the processor, by deactivating the controller’s enable signal. This guarantees
that no transitions occur in the sequential elements and, consequently, that the memory BIST
controller reduces its power consumption to a minimum. Nevertheless, the bistctr_din input, which
is driven by the memory output, will naturally present some switching activity during the normal
operation mode. Since this input directly drives the combinational comparator, some power would be
consumed by the comparator logic. Therefore, an array of AND gates is placed between the memory
output and the comparator input to block the propagation of any switching activity during the
normal operation mode, thus minimizing the inherent power consumption.
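The effect of the AND gate array can be illustrated by counting bit transitions at the comparator input with and without gating. The sketch below is a simplified Python illustration; the function names, the use of bisten as the gating signal and the sample data are assumptions.

```python
def gate_input(mem_dout, bisten, width=8):
    """AND each memory output bit with the enable (assumed to be bisten)."""
    mask = (1 << width) - 1 if bisten else 0
    return mem_dout & mask


def count_toggles(values):
    """Number of bit transitions between consecutive values."""
    return sum(bin(a ^ b).count("1") for a, b in zip(values, values[1:]))


# Hypothetical memory output activity during normal operation.
memory_outputs = [0x00, 0xFF, 0x0F]

ungated = count_toggles(memory_outputs)
gated = count_toggles([gate_input(v, bisten=0) for v in memory_outputs])
```

With the enable low the comparator input stays at zero, so no transitions reach the comparator logic, whereas with the enable high the memory output is passed through unchanged.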

The VHDL code used to describe and synthesize the memory BIST controller can be found
in Appendix A. This VHDL description allows for the synthesis of memory BIST controllers for
several memory configurations.

Figure 3.6: Simplified Memory BIST Controller architecture. (Blocks shown: State Machine,
Shift Register, Up/Down Counter, Comparator and Register; inputs bisten, bistrst, bistgo, CLK
and bistctr_din (from memory); outputs bistctr_dout, bistwen, bistbwen and bistaddr (to memory),
bistend and bistrslt.)

To avoid routing congestion and unnecessary added BIST controller complexity due to
different memory configurations, a dedicated memory BIST controller was implemented for each of
the three memory blocks of the processor. Since each of the memory BIST controllers needs
three control and two output signals, the total number of additional pins to control and observe the
test result is fifteen. Nevertheless, to avoid the unnecessary use of I/O pins, the enable signals
(bisten) of the memory BIST controllers are encoded using a two bit signal. Moreover, since
only one controller is active at a time, all the individual bistgo inputs can be driven by the
same global bistgo signal. Furthermore, the bistend and bistrslt output signals of
the three controllers can be multiplexed to reduce the need for extra output pins. By using this
approach, the required number of Input/Outputs (I/Os) is reduced from fifteen to five I/Os
exclusively assigned to the memory BIST structures.
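The pin sharing scheme can be sketched as a decoder for the two bit enable and a multiplexer for the result signals. In this Python sketch the encoding (0 selects no controller, 1 to 3 select one controller each) and all names are assumptions; the text does not state how bistrst is driven, so the pin count below simply follows the signals that the text lists.

```python
N_CONTROLLERS = 3


def decode_enable(select):
    """Two bit select -> individual bisten signals (assumed encoding)."""
    return tuple(int(select == index + 1) for index in range(N_CONTROLLERS))


def mux_results(select, bistend, bistrslt):
    """Route the outputs of the selected controller to the shared pins."""
    if select == 0:
        return 0, 0
    return bistend[select - 1], bistrslt[select - 1]


# Pin count before sharing: three inputs and two outputs per controller.
pins_before = N_CONTROLLERS * (3 + 2)

# Pin count after sharing: two select bits, one global bistgo,
# one multiplexed bistend and one multiplexed bistrslt.
pins_after = 2 + 1 + 1 + 1
```

Since only one controller is enabled at a time, the multiplexed outputs always belong to the controller that is currently running its test sequence.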

4 ASIC Design

Contents
4.1 Introduction
4.2 Design Flow
4.3 Foundry and Technology Selection
4.4 Library and Technology Characterization
4.5 Packaging
4.6 Pin Positioning

4.1 Introduction

An ASIC design usually begins with its description using a Hardware Description Language

(HDL). As soon as the functional or structural description has been validated by simulation, the

design can be synthesized using a standard cell library (the frontend stage). This implementation

step translates the design into basic design blocks (standard cells). These standard cells must

then be placed inside the available die area (placement) and the connections between these cells

should be made (routing) using the metal layers available in the chosen manufacturing process

(the backend stage). Although the design may be simulated during the several stages, a final

simulation with timing information should be done to validate the layout, prior to fabrication, in

order to assure a high probability of first time success.

This chapter explains the steps that were required to manufacture the processor, using a
standard cell library, from a given HDL description. The selection of the particular
technology that was adopted is discussed and the standard cell library used is characterized.
Some of the design options that were taken are also explained and some general guidelines that
should be observed during implementation are discussed.

4.2 Design Flow

Figure 4.1 shows the generic workflow to obtain a layout, starting from an HDL description.

The description of a digital circuit is often performed using an HDL, which allows the
function, structure or behavior of the circuit to be described using a text-based, programming-like
syntax. Furthermore, HDLs usually allow a given circuit to be described using a mixture of
structural, behavioral and RTL descriptions. With these capabilities, complex systems can be easily
described and simulated, in order to verify that they have the desired functionalities, in a
technology independent way.

When a full structural description of the circuit is adopted, using the netlist format, the designer
individually instantiates the required cells from the library and assembles them into a circuit that
performs as intended. This allows full control of the implementation, but drastically increases
the design time. On the other hand, when a behavioral description of the circuit is adopted,
the designer describes the intended functions and algorithms of the circuit. Nevertheless, a full
behavioral description (using an algorithmic description) of a circuit may not be synthesizable.
Therefore, the circuit has to be described in a way that is implementable in hardware and that is
understood by synthesis tools. This is usually called an RTL description. In an RTL description, the
circuit is described as a set of register elements and a set of transfer functions that describe the
data flow between the register elements. The structure of the description is very similar to the
model of a sequential circuit (sequential elements + combinational logic). Therefore, the designer
needs to adhere to a given description style that will be correctly interpreted by the synthesis tool.

Figure 4.1: Generic workflow for ASIC design. (Blocks shown: HDL description, Synthesis,
Test Insertion, Place&Route, Sign-Off Timing Analysis, LVS/DRC and Layout, grouped into
Frontend and Backend stages.)

Between RTL and structural descriptions, the RTL description is much simpler for the designer
but might not always achieve the desired results in terms of performance. Consequently, some
designers adopt a mix of both RTL and structural descriptions, making a gate-level description
only for the critical blocks. After being correctly described, the RTL or structural descriptions
can be implemented using the available cells in the library. This translation process between the
RTL or structural descriptions and the standard cells is called synthesis and is performed by the
synthesis tool. Although synthesis from behavioral descriptions is possible, such descriptions
present additional problems when interpreted by synthesis tools.

Among the several HDLs, Verilog and VHDL are the most widely used and supported
languages. In particular, the description of the implemented ME processor was done in the VHDL
language using an RTL and structural description style.

As soon as it is described, the circuit is simulated in order to verify that it performs as intended
by the designer. After this simulation, the design is synthesized against a given target technology.
This can either be an FPGA or a standard cell library. In this work, the target technology is a
standard cell library for the UMC 0.18µm 1P6M process.

A standard cell is a group of transistors and interconnecting structures that implement a simple
logic gate (e.g. NAND, NOR), a combinational logic function (e.g. 1-bit full adder, multiplexer), a
sequential element (e.g. D type flip-flop) or even a memory cell. A set of standard cells, which
implement several different logic functions, is usually called a standard cell library. Which cells
and, consequently, which logic functions compose this library is determined by the standard cell
manufacturer.

Each standard cell has a logic description (the logic function it implements) that corresponds

to a physical implementation (layout). The logic description of the cell is usually named as the

logical view and the physical implementation is named the layout view. The logical view is an

abstraction level that has the cell’s truth table (for combinational logic) or the state transition table

(for a sequential element). This view allows automatic synthesis tools to implement a complex

system by interpreting a circuit’s logic function without being aware of the physical implementation

details. Additional views are available, to characterize other attributes of the cell (e.g. timing,

power, interface), that are used in the different phases of circuit design.

Constraints can be given at this phase in order to guide the synthesis and produce a circuit

that meets the designer’s expectations. These constraints are used by the synthesis algorithms to

select the most adequate architecture and choose the appropriate cell in the library. In general, a

library has cells with the same logical function but with different characteristics (e.g. area, power

consumption and propagation delays). These constraints are not mandatory and do not need to

be set in order for a tool to synthesize a circuit. Nevertheless, if no constraints are explicitly set

by the designer, the outcome of the synthesis process is the result of default constraints. Usually,

most synthesis tools tend to synthesize a circuit for minimum area when no other constraints

are set, which may lead to results quite different from the designer’s expectations. To obtain a

better result, an iterative process shall be followed, in which the constraints are introduced and

their values properly tuned in each cycle, until a satisfactory result is obtained. At the end of the

synthesis process, a gate-level netlist representing the interconnections between the standard

cells that compose the design is obtained.
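The role of constraints in choosing among cells with the same logical function can be illustrated with a toy selection routine. The Python sketch below is not how any particular synthesis tool works: the cell names and their area and delay figures are hypothetical, and the rule (smallest area that meets the delay constraint, minimum area when unconstrained) only mirrors the default behaviour described in the text.

```python
# Hypothetical variants of a 2-input NAND: (name, area in um^2, delay in ns).
NAND2_VARIANTS = [
    ("NAND2_A", 10.0, 0.12),
    ("NAND2_B", 13.0, 0.08),
    ("NAND2_C", 20.0, 0.05),
]


def select_cell(variants, max_delay=None):
    """Return the smallest-area variant that meets the delay constraint.

    With no constraint, the minimum-area cell is chosen. Returns None
    when no variant is fast enough.
    """
    candidates = [
        cell for cell in variants
        if max_delay is None or cell[2] <= max_delay
    ]
    if not candidates:
        return None
    return min(candidates, key=lambda cell: cell[1])
```

Tightening `max_delay` moves the selection towards larger and faster variants, which is the kind of trade-off the designer tunes in each iteration of the synthesis process.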

A simulation of this synthesized circuit should then be performed to ensure that it still
performs as intended. This simulation may already take into account some timing
information regarding the cell delays and, if it exists, an estimate of the interconnection delays
between the cells. Even though it is not a complete timing simulation, it is useful enough to detect
some design errors.

At this stage, most synthesis tools also allow the insertion in the circuit of test related structures

(e.g. scan chains) using their interfaces. Therefore, these structures do not have to be described

using the HDL. Furthermore, synthesis tools are usually able to perform testability checks before

automatically assembling the test structures. These capabilities simplify the DFT step, removing

a very significant part of the designer’s workload.

After the circuit has been synthesized, the steps to physically implement the generated netlist
are performed. Power planning is usually done before cell placement. The power
planning phase consists of defining the power rings and power stripes (for power and ground)
that will take VDD and GND to the entire chip. An initial estimate of the current requirements
must be made to define the geometry and number of stripes. After the chip is completely routed,
information regarding the power consumption can be extracted and analyzed, to determine if the
initial power structures are correctly sized. If power constraints are not satisfied, the power and
ground nets must be resized in order to meet the required values and another iteration, which
may include placement and/or routing, must be done.

Placing the cells and routing signals are the next steps in the design flow. In the placement
phase, the standard cells are placed inside the available silicon area. Typically, the standard cells
have a constant size in one of their dimensions (e.g. all cells can have the same height but different
widths), allowing them to be distributed in rows by the placement tool, which ensures that none of
them overlap and may leave extra inter-cell spacing. This procedure is extremely important,
since it has a direct impact on circuit timing, routing congestion and feasibility. For better results,
constraints should be given to the placement engine, so it can have information about the required
timing of the circuit.

In synchronous digital circuits, the clock signal should arrive, at the same time, at all of the
synchronous cells. This requirement implies that all of the clock paths should have the
same propagation time. Nevertheless, this is extremely difficult to accomplish using only
the delay imposed by the propagation on the signal lines, since it would require that all paths
have the exact same length and load. Clock skew is defined as the maximum difference of
the clock arrival times at sequential elements. The maximum allowable clock skew is such that
no data signal transition, consequence of a given clock transition, will arrive at the next clocked
element in its path before that clock transition (considering a hold time of zero). One approach
to reducing clock skew consists of inserting delay buffers in the shortest paths so that the
arrival times of the clock signals are approximately the same. Such a procedure is automatically
conducted during the synthesis of the clock trees, which analyses the several paths and inserts
buffers in order to reduce or eliminate the clock skew.
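The definition of clock skew and the idea of balancing it with delay buffers can be expressed in a few lines. The following Python sketch is only an illustration: the arrival times are hypothetical, delays are assumed to be freely adjustable in picoseconds (real buffers have discrete delays), and the function names are not taken from any tool.

```python
def clock_skew(arrivals):
    """Maximum difference between clock arrival times (in ps)."""
    return max(arrivals.values()) - min(arrivals.values())


def balancing_delays(arrivals):
    """Extra delay to insert on each path so that all arrivals match the latest."""
    latest = max(arrivals.values())
    return {sink: latest - time for sink, time in arrivals.items()}


def race_free(min_data_delay, skew):
    """With zero hold time, data must not arrive before the skewed clock edge."""
    return min_data_delay >= skew


# Hypothetical clock arrival times at three flip-flops, in picoseconds.
arrivals = {"ff_a": 410, "ff_b": 455, "ff_c": 430}

extra = balancing_delays(arrivals)
balanced = {sink: arrivals[sink] + extra[sink] for sink in arrivals}
```

After the extra delays are added to the shortest paths, all arrival times coincide with the latest one and the computed skew becomes zero.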

After the placement, the power planning and the clock tree synthesis are done, the design is
ready to be routed. Routing is the procedure that implements the interconnection of the inputs
and outputs of the various cells using the available metal layers. Today’s routing engines not only
try to avoid congestion and comply with the given timing constraints, but also try to reduce adverse
effects such as cross-talk. The routing process also connects the power and ground
structures to all of the cells. The routing of power structures is usually performed before the routing
of signal lines.

After the design has been routed, it is possible to perform a parasitic extraction and obtain timing
information from the resulting layout. This information can be incorporated into a more detailed
simulation model, in which cell and interconnection delays are considered, to validate the final layout.
Moreover, an electrical simulation, using an extracted electrical model, could also be performed
on the clock network tree to verify that it performs according to specifications.

An important aspect of the manufacturing process is the yield. If the yield is too low, the design
could become economically unviable due to the high fabrication cost per unit, making a
working chip too expensive. As a consequence, the designer should take this
into account and Computer Aided Design (CAD) tools should also provide the means to increase
the yield [11]. One possible alternative is the usage of additional logic. As an example, when
memories are fabricated, they usually have additional cells (built-in redundancy) that can be used
to replace defective cells. Unfortunately, this is not the case in this work, since the memories used
do not possess any additional cells. As mentioned in [12], via duplication also improves yield.
Wire widening and spreading are other techniques that improve the manufacturing yield [13]. In this
work, via duplication was the only method used to improve the yield.

4.3 Foundry and Technology Selection

Several different technologies are available to implement a given circuit. Complementary Metal
Oxide Semiconductor (CMOS) is currently the most used technology for IC manufacturing, due to
its low static power consumption. Within CMOS, there are several foundries with various process
dimensions and their own sets of design rules.

As a consequence, foundry and technology selection is a crucial aspect
of an ASIC design. It influences the area, the power consumption, the delays and operating
frequency, the available cells and memories, the manufacturing costs and the manufacturing dates
(runs). Support for the standard cell library and the availability of the corresponding configuration
files may also constrain the set of usable Electronic Design Automation (EDA) tools (e.g. if the
library does not have characterization files for a given tool) or the other way around (the available
tools constraining the choice of standard cell libraries).

The manufacturing of the considered circuit is done through EUROPRACTICE, including the
acquisition of the standard cell library. The EUROPRACTICE IC service allows the production of
prototypes at relatively low cost, by using Multi Project Wafer runs. Each wafer is composed of
designs coming from several participants, thus distributing the cost of mask production among the
various participants (proportionally to the occupied area). Furthermore, universities and other
research institutes, which usually have small prototype designs, have access to EUROPRACTICE’s
mini@SIC program. This program reduces the fabrication costs of small designs by reducing the
minimum design area imposed on each participant.

Among the supported processes and foundries under EUROPRACTICE’s mini@SIC program,

the UMC foundry, with its 0.18µm CMOS process with 1 poly and 6 metal layers (UMC L180 1P6M

32

4.4 Library and Technology Characterization

MM/RFCMOS), is the available implementation technology with the most stable libraries and with

a financial cost covered by the budget of the project. In this process, the general Multi Project

Wafers are divided in blocks of 5 x 5 mm each. The mini@SIC program further subdivides each

of these 5 x 5 mm blocks in 9 regular square sub-blocks. Designs may occupy one, two, three,

four, six or nine of these sub-blocks. Nevertheless, using nine of these sub-blocks is economically

discouraged, since using a complete 5 x 5 mm block (the equivalent of the nine sub-blocks)

on the general program is less expensive. A design that occupies one sub-block may have a

maximum size of 1525 x 1525 µm, while a two sub-block design may have a maximum size of

3240 x 1525 µm.

During a preliminary phase of this project, the Standard Cell Library from Virtual Silicon [14] was used. Since UMC discontinued support for this Standard Cell Library, an alternative Standard Cell Library from Faraday Technology [15] was adopted. This change of library implied a subsequent change in the architecture and interface of the memories used, which required an adaptation of the processor, including the memory BIST controller. It also changed the capabilities of the I/O cells, which led to a new selection of these cells. Furthermore, the available core cells also changed, which led to different synthesis results.


The FSA0A_C library [15] is a 0.18µm standard cell library tailored for the UMC 0.18µm logic process. The nominal supply voltage is 1.8V for the core cells and 3.3V for the I/O cells, with some I/O cells being 5V tolerant. Table 4.1 shows the general characteristics of this library [15].

Table 4.1: Faraday's FSA0A_C Standard Cell Library General Characteristics.

Characteristic                 Description
Technology                     UMC's 0.18µm 1.8V / 3.3V 1P6M logic process
Minimum drawn channel length   0.18µm
Supply voltage                 1.62V to 1.98V for core cells
                               2.97V to 3.63V for 3.3V I/O cells
Performance                    Td = 27.5ps / stage (measured with a 101 stage inverter ring and a typical process operated under 1.8V, 25°C)
Gate density                   110K gates / mm²
Power consumption              29 nW / MHz / gate (measured with a 2-input NAND, output load = 2 standard loads, and a typical process operated under 1.8V, 25°C)
Reference cell area            9.794µm² (2-input NAND with normal driving strength - ND2)
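As a rough illustration of how the figures in Table 4.1 can be used, the Python sketch below turns the gate density and the per-gate power figure into first-order area and power estimates. The function names and the example gate count and clock frequency are hypothetical. The power figure applies to the measurement conditions given in the table, so the result should be read as an order-of-magnitude estimate and not as a prediction for a real design.

```python
GATE_DENSITY_PER_MM2 = 110_000       # gates / mm^2 (Table 4.1)
POWER_NW_PER_MHZ_PER_GATE = 29.0     # nW / MHz / gate (Table 4.1)


def estimate_core_area_mm2(gate_count):
    """First-order core area from the library gate density."""
    return gate_count / GATE_DENSITY_PER_MM2


def estimate_power_mw(gate_count, clock_mhz):
    """First-order power, assuming every gate behaves like the
    reference 2-input NAND used for the library figure."""
    nanowatts = POWER_NW_PER_MHZ_PER_GATE * clock_mhz * gate_count
    return nanowatts * 1e-6          # convert nW to mW


# Hypothetical example: 55,000 equivalent gates clocked at 100 MHz.
area = estimate_core_area_mm2(55_000)      # about 0.5 mm^2
power = estimate_power_mw(55_000, 100)     # about 159.5 mW
```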

This Standard Cell Library is composed of core cells and I/O cells. The core cells include all

logic function cells like AND, NAND, OR, NOR, XOR, NXOR, Multiplexers, Flip-Flops, Latches,

1-bit full and half adders and other cells. These cells are used to build the logic core. The I/O cells


come in two formats: Inline and Staggered. The Inline format is recommended for core limited

designs, while the Staggered format is recommended for I/O limited designs. The dimensions of

the I/O cells available in this library are presented in table 4.2 [15]. Both I/O cell formats can be

combined with inline or staggered pads. This would make four possible combinations between

I/O cells and pads, as shown in figure 4.2.

Table 4.2: I/O cell dimensions.

                     Height (µm)   Width (µm)   Bonding pad position
Inline I/O cell      140.12        62.62        Outside I/O cell
Staggered I/O cell   235.60        34.10        Outside I/O cell

Figure 4.2: I/O cell and pad combinations.

The I/O cells available in this library do not include a physical pad in their description, and there is no pad cell defined in the library. As such, a custom-made pad must be used. However, since the physical layout of the I/O cells is not available when these libraries are supplied by EUROPRACTICE, designing a custom pad to connect with the library cells is not possible. Instead, EUROPRACTICE made available a generic bondpad, which was specially designed to properly connect with these I/O cells and which complies with the UMC Bonding Pad Layout Rules [16]. This pad measures 69 x 69 µm, with a passivation window of 65 x 65 µm. The pad together with the metal layers connecting it to the standard I/O cells measures 69 x 79 µm, as shown in figure 4.3.

The I/O cells of this library can be programmed after being implemented on silicon. The use of Programmable I/O on Silicon (PIOS) allows the user to enable pull-up or pull-down resistances, as well as Schmitt trigger control for inputs. It also allows programming of the slew rate and driving capacity for outputs. These features are controlled by additional control pins in these cells. Hence, although these features could be of interest, changing them after fabrication would require additional input pins. Since there is no need for such features in this project, these cells were configured by hardwiring the control inputs to the desired values. The input and bidirectional I/O cells were configured to use neither Schmitt triggers nor pull-up or pull-down resistances, with the exception of the TMS, TRST and TDI inputs of the JTAG TAP controller, which were programmed to include pull-up resistances. The output and bidirectional I/O cells were programmed to have a 2mA output driving capacity (the minimum possible value) with a fast slew rate.

Figure 4.3: Bonding pad layout. (Labels: passivation window, 65µm; pad, 69µm x 69µm; pad plus metal track, 79µm.)

This library requires three power supplies: one power supply net for the core (VCCK at 1.8V) and two power supply nets for I/O (VCC3I and VCC3O, both at 3.3V). Figure 4.4 shows a representation of the power supplies for this cell library [15]. The VCCK power net supplies the internal cells, the 1.8V input drivers and the output pre-drivers. The VCC3I net supplies the 3.3V input receivers and the I/O control logic, while the VCC3O net supplies the 3.3V output buffers. Every power net has its ground counterpart, namely the GNDK, GNDI and GNDO ground nets. The connection to these power and ground nets is made through special I/O cells, which connect the pad to the internal power and ground nets. Hence, if separate power and ground I/O cells are used to connect individually to each of these nets, a minimum of 6 power and ground pads is required in the design. The adopted library also provides power and ground I/O cells (named VCC3IO and GNDIO) that simultaneously supply the VCC3I and VCC3O power nets and the GNDI and GNDO ground nets, respectively, thus reducing the minimum number of power and ground pads to 4. However, these cells have less current driving capacity and should only be used if the current expected to be drawn by the I/O cells is low.

Other cells with specific functions are also included in the library. Corner cells, for instance, are provided to allow continuity of the I/O power rings at the corners of the die. Since a chip can have only inline I/O cells, only staggered I/O cells, or a mixture of inline I/O cells on one of its sides and staggered I/O cells on another, there are three different types of corner cells (inline-only, staggered-only and inline-staggered). There are also empty cells to be added to the I/O ring, which provide continuity of the well and of the power and ground rails for I/O. Usually, I/O power rings would have to be placed by the designer. However, in this library the I/O power rings are already included in the layout of the I/O and empty cells. Therefore, the designer only has to make sure that all I/O cells and empty cells abut (i.e., are placed adjacently).

Figure 4.4: Power rings for I/O buffers and core cells.

Faraday’s Standard Cell Library also includes Electrostatic Discharge (ESD) protection cir-

cuitry in I/O cells, to prevent an ESD event from damaging the circuit. All I/O cells include these

components to provide current paths for ESD events. According to [17], when this library is used,

the designer only has to make sure that the pads that supply the VCC3I and VCC3O power nets

are connected to the same pin in the package. The same is required for the pads that supply the

GNDI and GNDO ground nets. If the designer uses the VCC3IO and GNDIO cells, this rule can

be ignored, because such a connection is already assured in the I/O cell.

Another special cell included in the library is the filler cell. This cell can be used to fill the empty spaces between standard core cells, in order to provide continuity of the well and, if the designer so chooses, to provide decoupling capacitance. Tie1 and Tie0 cells are also provided to allow the connection of nets to power and ground, respectively. It is advisable to connect all nets with fixed logic values to these cells, instead of connecting them directly to the power or ground nets, in order to preserve ESD robustness (this rule does not apply to I/O cell inputs, which can be directly connected to power or ground) [15].

Because it was acquired through EUROPRACTICE, the library does not include the layout view. As a consequence, the designer has no access to the layout of the cells and memories, which prevents the types of analysis that require this information.

The adopted standard cell library is designed for the UMC L180 1P6M GII Logic process. However, the process available in the mini@SIC program is the UMC L180 1P6M MM/RFCMOS. The basic difference between these two processes is the thickness of the top-level metal layer. In the GII Logic process the metal 6 layer is 8kÅ thick, while in the MM/RFCMOS process the metal 6 layer is 20kÅ thick. Therefore, a different set of topological layout rules exists for the metal 6 layer. Since the layout rules for the thick top-level metal process (20kÅ) are stricter, designs that follow the layout rules for the 8kÅ process will fail the DRC checks of the 20kÅ process. Consequently, if the metal 6 layer is used, designs implemented with the adopted standard cell library will fail the DRC checks of the 20kÅ process. Therefore, the metal 6 layer is not used for routing in this project, in order to avoid DRC violations.

4.4.1 Memories

The memory devices available in this library include single and dual port SRAM and have the

interfaces shown in figure 4.5. The single-port SRAM, used in the program memory, supports both

word write and byte write operations (the WEB port includes the write enable signals for each of the

word’s bytes). This is particularly useful since the processor’s program memory loading procedure

is done using a one byte interface. Data is input through port DI and stored in the memory

position addressed by A. The write operation is performed in a given byte of the memory word

if the respective byte-write enable signal, in port WEB, is low. Memory read and write operations

are only performed if the CS signal is high. The three-state output buffers are only active if the OE

signal is high.
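The control signal polarities described above (CS and OE active high, byte-write enables in WEB active low) can be summarised in a small behavioural model. The Python sketch below is only an illustration of the described interface: the class name, the word width used in the example and the choice of returning the addressed word in the same access as a write are assumptions, not properties taken from the memory datasheet.

```python
class SinglePortSram:
    """Behavioural sketch of the single-port SRAM interface in the text."""

    def __init__(self, words, bytes_per_word):
        self.bytes_per_word = bytes_per_word
        self.mem = [[0] * bytes_per_word for _ in range(words)]
        self._last_read = None       # nothing has been read yet

    def clock(self, cs, oe, a, di, web):
        """Model one clocked access and return the value seen on DO.

        cs, oe : active-high chip select and output enable
        a      : word address
        di     : list of byte values, one per byte of the word
        web    : list of active-low byte-write enables, one per byte
        Returns a list of bytes, or None when the output is not driven.
        """
        if cs:                                   # ignored when CS is low
            for i in range(self.bytes_per_word):
                if web[i] == 0:                  # low enables writing byte i
                    self.mem[a][i] = di[i] & 0xFF
            self._last_read = list(self.mem[a])
        if oe and self._last_read is not None:   # three-state buffers active
            return list(self._last_read)
        return None                              # high impedance


# Loading one word through a one-byte interface, one byte per access
# (4 bytes per word is a hypothetical width chosen for the example).
ram = SinglePortSram(words=16, bytes_per_word=4)
for i, value in enumerate([0x12, 0x34, 0x56, 0x78]):
    web = [1, 1, 1, 1]
    web[i] = 0
    di = [0, 0, 0, 0]
    di[i] = value
    ram.clock(cs=1, oe=0, a=3, di=di, web=web)
```

The loop mirrors the byte-wise program memory loading mentioned in the text: only the byte whose enable is low is modified in each access.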

Dual-port SRAMs allow independent read and write access to the memory contents through

both ports (portA and portB). Each port has its own clock signal (CLKA and CLKB). It is up to

the designer to assure that accesses made through both ports maintain data coherence. The

dual-port SRAMs, used in the macroblock and search area memories, are 8-bit (1 byte) word

memories. Therefore, in these two memories, the byte-write capability is not used. Nevertheless,

the generic model for these memories also supports byte-write in both ports through the WEAN and

WEBN ports. These memories also possess two pairs of access ports, DIA and DIB for data input,

and DOA and DOB for data output. Address ports A and B specify the address for each port, while the OEA and OEB signals control the operation of the three-state buffers of each port. The chip select signals CSA and CSB allow control over the operation of each port.

Like the standard cells, these memories are represented in the design using several different views. The memory views are usually generated by a memory compiler, which is capable of generating a predetermined set of memories. When using the EUROPRACTICE services, these memories are generated on request. However, the supplied memories do not include all the views required by the synthesis tools. This absence does not compromise the resulting circuit, because the memories are defined using a structural description (they are explicitly instantiated in the VHDL code and not inferred by the synthesis tool), but it impairs the analysis performed by the synthesis tools. For instance, the timing, power and area analyses do not take the memory elements into account.

Figure 4.5: Memory interfaces. (a) Single Port SRAM, with ports CK, CS, OE, WEB, A, DI and DO. (b) Dual Port SRAM, with ports CKA, CKB, CSA, CSB, OEA, OEB, WEAN, WEBN, A, B, DIA, DIB, DOA and DOB.

4.5 Packaging

After being manufactured, the chip has to be encapsulated. Encapsulation protects the silicon

die from environmental aggressions and assures a mechanically robust interface. Packages for

IC encapsulation are available in several materials, pin count and form factors. Some packages

are meant for permanent placement while others are designed to be connected using sockets and

sustain the mechanical stress of being inserted and removed from the socket.

At the prototyping stage, the AMEP package should be socket oriented, because there is a single test platform and, as such, multiple prototypes will need to be tested using a single socket. Depending on the package manufacturer and product line, the available pin counts may vary significantly, but they are usually restricted to discrete values. Several ceramic packages are available through the EUROPRACTICE Packaging service. Among these, the Ceramic Leadless Chip Carrier (CLCC) provides a square package with socket connection capability. In the range required for the AMEP, the available pin counts for this type of package are 44 and 68 pins.

The AMEP functional interface requires 35 signal pins. With the additional power and test pins,

the required number of pins in the package will be greater than 44. As a consequence, the CLCC

package with 68 pins was chosen.
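The package choice can be reproduced with a simple pin budget. In the Python sketch below, only the 35 functional pins, the two CLCC pin counts and the 56-pin bonding limit come from the text of this chapter. The power/ground and test pin lists are an assumption, read from the pad names legible in Figure 4.6, and the function name is hypothetical.

```python
FUNCTIONAL_PINS = 35            # stated in the text

# Assumption: pad names as they appear in Figure 4.6.
POWER_GROUND_PADS = [
    "VCCK_1", "VCCK_2", "GNDK_1", "GNDK_2",
    "VCC3IO_1", "VCC3IO_2", "GNDIO_1", "GNDIO_2",
]
TEST_PADS = ["TCK", "TMS", "TDI", "TDO", "TRST", "test_mode"]

CLCC_PIN_COUNTS = (44, 68)      # CLCC options named in the text
MAX_BONDABLE_PINS = 56          # bonding limit for the rectangular die


def smallest_package(required_pins, available=CLCC_PIN_COUNTS):
    """Return the smallest available pin count that fits, or None."""
    for pins in sorted(available):
        if pins >= required_pins:
            return pins
    return None


required = FUNCTIONAL_PINS + len(POWER_GROUND_PADS) + len(TEST_PADS)
package = smallest_package(required)
fits_bonding_limit = required <= MAX_BONDABLE_PINS
```

Under these assumptions the budget exceeds 44 pins, which selects the 68-pin package, while staying within the number of pins that can actually be bonded.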

The package area where the die is placed is usually square. If the die is also square, then bonding should offer no difficulties. However, if the die has a rectangular shape, some package pins may not be available for connection, as they may violate the maximum and minimum angle between the bondwire and the package [18]. Since the AMEP die shape is rectangular (as will be seen in Chapter 7), and due to the adopted pad pitch (distance between the centers of two adjacent pads), die dimensions and packaging rules, a maximum of 56 pins of the CLCC 68 package are available for bonding.

4.6 Pin Positioning

The pad pitch must comply with the minimum requirements of both the technology and the bonding process. The minimum pad pitch required by the technology is 60µm [16]. The pad pitch recommended by EUROPRACTICE for bonding is 90µm [18]. A pad pitch lower than this recommended value incurs extra costs. Since the design is not I/O limited, the 90µm value is adequate and was adopted as the minimum pitch in this work.

The distribution of the signals among the various package pins has an impact on cross-talk effects, IR (voltage) drop inside the chip, clock and signal delays and routing congestion (both inside the chip and on the board). For instance, clock pads have to be relatively distant from power and ground pads, to avoid interference on these lines.

In order to have a balanced power distribution, two pairs of VCC/GND power cells were used

to supply enough current to the core and these were positioned on opposite sides of the chip.

Two pairs of VCC and GND pads for I/O cells (that simultaneously supply VCC3I and VCC3O)

were also added to the design. Figure 4.6 shows the considered disposition of the I/O cells in the

AMEP layout.

Figure 4.6: Diagram of I/O cells position. (The pads are arranged around the chip core and the legend distinguishes Input, Output, Bidirectional and Power/Ground cells. Pads shown: Address[0] to Address[19], Data[0] to Data[7], CLK, #oe_we, en, rst, req, gnt, done, test_mode, TCK, TMS, TDI, TDO, TRST, VCCK_1, VCCK_2, GNDK_1, GNDK_2, VCC3IO_1, VCC3IO_2, GNDIO_1 and GNDIO_2.)


5 FrontEnd - From Behavioral VHDL code to Verilog netlist

Contents
5.1 Introduction
5.2 Tools
5.3 Workflow


5.1 Introduction

This chapter describes the process of compiling the VHDL source code into a gate-level Verilog netlist, mapped to the selected technology (the frontend stage). Three distinct workflows are used at this stage: a simple flow without any changes to the design structure, a second flow which inserts scan structures and a third flow which inserts JTAG structures. The flow that inserts the scan structures is an extension of the basic flow. On the other hand, the flow for the insertion of JTAG structures is an independent flow that can be performed after either of the previous two flows. The principal characteristics and capabilities of the tools used in the frontend are also described.

5.2 Tools

In this work, the Synopsys Inc. software package was used to perform the synthesis of the

HDL code, as well as the insertion of the test structures in the circuit. Although several soft-

ware manufacturers provide synthesis tools, the Synopsys package was chosen because it is the

software with the best support and is the industry’s de facto reference software for synthesis.

The software package is composed of several tools. These tools are invoked by Design Compiler, the main application of this package, whenever they are needed. This significantly reduces the designer's workload, because all the functions are integrated into a single tool with a single interface. Design Compiler can be accessed either through a command-line interface or through a Graphical User Interface (GUI). Although the command line is useful in certain situations, the GUI is not only more user-friendly but also able to convey more information to the designer, which is especially useful while the design is under development. There are two available graphical interfaces: Design Analyzer and Design Vision. Since Design Vision is the more complete and functional GUI and is the one recommended by Synopsys, it was chosen as the main interface.

The version of the Synopsys tools used in this work is Version Y-2006.06 for Linux – May 25,

2006. This software was supplied by the EUROPRACTICE software package.

5.2.1 Design Compiler

Design Compiler is the synthesis and timing analysis application from Synopsys Inc.

Designs described using an HDL are compiled and mapped into cells of a Generic Technology (GTECH). These GTECH cells are technology-independent cells that describe the functions of certain blocks, which can then be implemented in any technology. For instance, a generic sequential cell could be implemented as a D Flip-Flop with associated logic to perform a synchronous reset.


Design Compiler allows a designer to set optimization constraints, which guide the program towards the solution closest to the designer's objectives. Besides these optimization constraints, design rule constraints are also used by Design Compiler. These include maximum fanout, maximum capacitance, maximum transition time and other constraints. The design rule constraints are set in the technology libraries and take precedence over the optimization constraints. Moreover, a designer may override the technology design rules by making them more restrictive. However, setting too many constraints or setting unrealistic values for the constraints may have an adverse effect and guide the algorithms to a solution that is far from the designer's objectives and from the solution that could be achieved without such tight constraints.

In the compile phase, Design Compiler optimizes the design and translates the GTECH cells

into the target technology cells. It is during this optimization and translation process that the

defined constraints are taken into account. These constraints are used to calculate cost functions

and are prioritized according to table 5.1 [19].

Table 5.1: Cost Function default priority

Priority (descending order)   Constraint type
maximum transition time       Design Rule Constraint
maximum fanout                Design Rule Constraint
maximum capacitance           Design Rule Constraint
cell degradation              Design Rule Constraint
maximum delay                 Optimization Constraint
minimum delay                 Optimization Constraint
maximum power                 Optimization Constraint
maximum area                  Optimization Constraint

Two cost functions are calculated during the gate-level optimization of the compile phase.

These are the Design Rule Cost Function and the Optimization Constraints Cost Function, which

group the design rule constraints and the optimization constraints, respectively. The cost functions

are calculated based on the differences between the values set for the constraints and their actual

values. The objective is to set these cost functions to zero. The Design Compiler evaluates each

component independently, in order of importance, and accepts an optimization step if it decreases

the cost of one component without increasing higher priority costs [19].

The design rules cost function is calculated according to equation 5.1.

\[
\mathrm{Cost}_{\mathrm{design}} = \sum \Delta_{\text{max transition}} + \sum \Delta_{\text{max fanout}} + \sum \Delta_{\text{max capacitance}} \tag{5.1}
\]

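The optimization constraints cost function is calculated according to equation 5.2.

\[
\mathrm{Cost}_{\mathrm{optimization}} = \sum \Delta_{\text{max delay}} + \sum \Delta_{\text{min delay}} + \sum \Delta_{\text{max power}} + \sum \Delta_{\text{max area}} \tag{5.2}
\]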
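The two cost functions and the acceptance rule described above can be illustrated with a short sketch. The Python code below is not Design Compiler's implementation: it assumes that each Δ is the amount by which a constraint is violated (zero when the constraint is met), it ignores any weighting, and all function names are hypothetical. The component order follows Table 5.1, restricted to the terms that appear in equations 5.1 and 5.2.

```python
DESIGN_RULE_TERMS = ("max_transition", "max_fanout", "max_capacitance")
OPTIMIZATION_TERMS = ("max_delay", "min_delay", "max_power", "max_area")
PRIORITY = DESIGN_RULE_TERMS + OPTIMIZATION_TERMS   # highest priority first


def violation(limit, actual, kind="max"):
    """Delta for one constraint: how far 'actual' violates 'limit'."""
    if kind == "max":
        return max(0.0, actual - limit)
    return max(0.0, limit - actual)          # 'min' constraints


def cost(deltas, terms):
    """Sum the deltas of the given terms (equations 5.1 and 5.2).

    deltas maps a term name to the list of its individual violations.
    """
    return sum(sum(deltas.get(term, ())) for term in terms)


def accepts_step(before, after):
    """Accept a step if it lowers one component without raising any
    higher-priority component. Both arguments map term name to cost."""
    for index, term in enumerate(PRIORITY):
        if after[term] < before[term]:
            higher = PRIORITY[:index]
            if all(after[h] <= before[h] for h in higher):
                return True
    return False


def components(**values):
    """Helper: cost components, with unspecified terms set to zero."""
    return {term: values.get(term, 0.0) for term in PRIORITY}
```

For instance, a step that reduces the maximum delay cost while increasing the area cost is accepted, whereas a step that reduces the area cost at the expense of the maximum delay cost is rejected, because delay has the higher priority.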

Design Compiler uses the concept of path groups to perform timing-related optimizations and, consequently, to calculate the cost functions. A path group is a set of paths that can be implicitly or explicitly defined. Path groups are implicitly defined when a clock signal is defined: all the paths between elements clocked by that signal are automatically added to the same path group. A user can explicitly define other path groups, according to his needs. Path groups can be used to guide Design Compiler in performing timing optimizations in circuit regions chosen by the designer. In this work only one clock signal exists and, since there was no need for particular optimizations, no additional path groups were defined.

Among the various cost function components, the maximum delay is particularly important, since it influences an important goal: the maximum working frequency. The maximum delay cost can be determined using two methods: the Worst Negative Slack Method or the Critical Range Negative Slack Method. The Worst Negative Slack Method takes into account only the delay of the worst violating path in each path group (the critical path). The Critical Range Negative Slack Method takes into account the violating paths of each path group that are within a specified delay margin (referred to as the critical range) of the worst violator [19]. The latter method, although more computationally intensive, has the advantage of optimizing not only the critical path but also the near-critical paths, which might become critical after Place and Route (P&R) because complete timing information is not yet available at this phase.
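The difference between the two methods can be made concrete with a small numerical sketch. The Python code below follows only the description given in the text: slack values are negative for violating paths, path group weights are omitted, and the function names and slack values are hypothetical. The actual computation performed by Design Compiler may differ in its details.

```python
def worst_negative_slack_cost(path_groups):
    """Sum, over the path groups, the violation of the worst path only.

    path_groups maps a group name to a list of path slacks
    (negative slack means the path violates its constraint).
    """
    total = 0.0
    for slacks in path_groups.values():
        worst = min(slacks, default=0.0)
        total += max(0.0, -worst)
    return total


def critical_range_cost(path_groups, critical_range):
    """Sum the violations of all violating paths whose slack lies
    within 'critical_range' of the worst violator of their group."""
    total = 0.0
    for slacks in path_groups.values():
        worst = min(slacks, default=0.0)
        if worst >= 0:
            continue                      # no violating path in this group
        for slack in slacks:
            if slack < 0 and slack - worst <= critical_range:
                total += -slack
    return total


# Hypothetical slacks (in ns) for a design with a single clock.
groups = {"clk": [-0.5, -0.4, -0.1, 0.3]}
wns = worst_negative_slack_cost(groups)            # only the -0.5 path
crns = critical_range_cost(groups, 0.2)            # -0.5 and -0.4 paths
```

With a critical range of 0.2 ns, the path with a slack of -0.4 ns also contributes to the cost, so the optimizer has an incentive to improve it even though it is not the critical path.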

5.2.2 DFT Compiler

DFT Compiler is responsible for determining the architecture of the scan structures and for inserting them into the design. This tool is integrated with Design Compiler, which passes the DFT commands to DFT Compiler for processing. Therefore, it provides integrated design-for-test capabilities, including constraint-driven scan insertion during compile. DFT Compiler is responsible for replacing normal cells with scan cells and for interconnecting them to form the scan chains. During this process, additional signals and input pins are inserted, to allow the scan chains to be controlled from the primary inputs. DFT Compiler is also responsible for generating the appropriate output files for ATPG and for ATE operation.

Several scan styles are supported by DFT Compiler, namely: the Multiplexed Flip-Flop Scan

Style, the Clocked-Scan Scan Style, the LSSD Scan Style and the Auxiliary-Clock LSSD Scan

Style [7]. The designer may choose among these scan styles the one that best fits his require-

ments and that can be supported by the cells available in the target technology.

DFT Compiler is capable of performing scan chain insertion on unmapped designs (from an HDL source), on mapped designs without scan structures (from a netlist) or on mapped designs with scan structures (in this case, DFT Compiler only optimizes the netlist). Inserting scan structures on unmapped designs achieves the best results, since DFT Compiler and Design Compiler can work simultaneously on the same design to perform constraint-driven scan insertion (Synopsys calls this a Test-Ready Compile) [20].

A test protocol must also be created during the DFT Compiler session. The test protocol defines the test signals and their timing and initialization sequences. The test protocol can either be automatically generated, based on the signal definitions given to DFT Compiler, or be read from a Standard Test Interface Language (STIL) file. Test initialization sequences are patterns that must be sequentially applied to a circuit's inputs so that it enters test mode. These initialization sequences must be given to DFT Compiler when the design includes custom test structures that are already defined in the source files and that have relatively complex or unusual ways of entering test mode. If the design requires a test initialization sequence, it has to be described in the STIL file, since DFT Compiler does not support this type of definition using internal commands.

By using the defined test protocol, DFT Compiler is capable of performing DFT DRC analysis to determine which test rule violations occur, if any. DFT DRC checks for violations that prevent scan insertion, that prevent data capture or that reduce fault coverage. For instance, an uncontrollable clock or an uncontrollable asynchronous control signal of a flip-flop prevents that flip-flop from being included in a scan chain. If the asynchronous control signals of a given flip-flop are asserted during the test procedure, that also prevents the flip-flop from being inserted in the scan chain. A data capture violation is reported if the clock signal drives the data input or more than one input pin of the same flip-flop, or if a black box component drives the clock or an asynchronous control signal of a flip-flop. A data capture violation is also reported if a three-state bus contention occurs [20]. The use of black boxes in the design, as is the case of the processor memory blocks, reduces fault coverage, as the outputs of such blocks cannot be determined by DFT Compiler.

Violations present in the design should be corrected in the HDL description. Nevertheless, DFT Compiler offers the possibility of automatically correcting some of the violations using a feature called AutoFix. This feature automatically fixes scan rule violations associated with uncontrollable clocks, uncontrollable asynchronous set and reset signals and three-state signals. The AutoFix feature is able to fix violations in LSSD and Multiplexed Flip-Flop Scan Style designs [20]. AutoFix adds multiplexers to the violating signal inputs of the flip-flops, to allow them to be controlled during test mode. Besides the signal needed to control these multiplexers (the test mode signal), additional signals and ports may be added to the design by AutoFix.

To control the operation of the scan chains, scan enable and test mode signals are used. The scan enable signal controls the multiplexer of the multiplexed scan flip-flops in the design. It allows the flip-flops to select their data input between the regular circuit connection (in normal mode) and the output of the previous flip-flop in the serial scan chain (in serial shift mode). The test mode signal is responsible for keeping all the flip-flops' control signals (reset, preset) deasserted during the test procedure. This is necessary because, if the control signals are generated by internal logic, they may be asserted during test, which makes the generation of test vectors more complex and may even impair the test procedure. The scan enable signal is active when in scan mode (serial shift), while the test mode signal is active during the entire test procedure.
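The role of the scan enable signal can be illustrated with a minimal behavioural model of a multiplexed flip-flop scan chain. The Python sketch below is an assumption-laden illustration, not the structure generated by DFT Compiler: the chain is modelled as a list of flip-flop outputs, the function names are hypothetical, and the test mode signal (which only keeps the asynchronous controls deasserted) is not modelled.

```python
def clock_edge(state, d_inputs, scan_in, scan_enable):
    """Return the chain state after one clock edge.

    state       : current flip-flop outputs, first flip-flop first
    d_inputs    : functional data inputs (used when scan_enable is 0)
    scan_in     : bit applied to the scan input of the first flip-flop
    scan_enable : 1 selects the serial shift path, 0 the functional path
    """
    if scan_enable:
        # Each flip-flop captures the output of the previous one.
        return [scan_in] + state[:-1]
    return list(d_inputs)


def shift(state, bits):
    """Shift 'bits' into the chain and collect the bits observed at the
    scan output (the output of the last flip-flop) before each edge."""
    observed = []
    for bit in bits:
        observed.append(state[-1])
        state = clock_edge(state, None, bit, scan_enable=1)
    return state, observed


# Load a pattern, capture a functional response, then unload it.
chain, _ = shift([0, 0, 0], [1, 1, 0])                  # scan load
chain = clock_edge(chain, [1, 0, 0], 0, scan_enable=0)  # capture
chain, response = shift(chain, [0, 0, 0])               # scan unload
```

Note that the first bit shifted in ends up in the last flip-flop of the chain, and that the captured values leave the chain last flip-flop first.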



5.2.3 BSD Compiler

BSD Compiler is responsible for implementing the IEEE 1149.1 standard and for verifying that the design complies with it. It is able to insert the boundary scan cells and the TAP controller, as well as to produce the files needed to make the device interoperable with IEEE 1149.1 compliant test equipment. This tool is also integrated into the Design Compiler synthesis environment.

BSD Compiler requires the definition of a top level design in which all inputs to the core logic

and the inputs for IEEE 1149.1 functionality are defined and have I/O pad cells associated with

them. Only the inputs to the core logic should be connected to the core design. Figure 5.1 shows

the required interface [21].

Figure 5.1: Top level design structure required by BSD Compiler. (Labels in the diagram: Top Level Design; Core Design; TAP ports TDI, TMS, TRST, TCK and TDO; scan ports test_si, test_se and test_so; inputs i1 to in; outputs o1 to on.)

According to the IEEE 1149.1 standard, each of the TMS, TRST and TDI input lines must behave as if a logic 1 were applied to it when it is undriven. In this work, this is accomplished by using pull-up resistors, which are enabled by configuring the respective PIOS cells available in the library.

BSD Compiler is also responsible for generating the Boundary Scan Description Language (BSDL) file, which contains the following information: the nature of the pins in the design (input, output or bidirectional); the logical correspondence between signal names and physical pins; the identification of the pins used by the IEEE 1149.1 TAP interface; the description of the instruction register; the implemented instructions, their opcodes and the data shift register accessed by each instruction; and a description of all BSR cells and their functionality (e.g. observe-only, observe-and-control).

The test vectors that test the boundary scan logic and the TAP controller are generated by BSD Compiler. The generated test vectors can then be simulated by TetraMAX. These test vectors are generated by BSD Compiler, instead of using the ATPG capabilities of TetraMAX, because BSD Compiler has architectural knowledge of the inserted logic that TetraMAX lacks. Therefore, BSD Compiler is capable of generating the test vectors without using generic algorithms, which would require more computational effort and the definition of complex initialization patterns.

5.2.4 TetraMAX

TetraMAX is the ATPG tool from Synopsys. It is capable of generating test patterns that maximize fault/test coverage using a minimum number of test vectors in various design types and flows. Functional and stuck-at testing are the traditional circuit testing methods. Functional testing exercises the device as it would actually be used in the target application. However, this type of testing has only a limited ability to test the integrity of the device's internal nodes. With scan testing, the sequential elements of the device are connected into chains and used as primary inputs and primary outputs for testing purposes. By using ATPG techniques, a much larger number of internal faults may be tested than with functional testing alone [6].

This tool has three different ATPG modes: Basic-Scan ATPG, Fast-Sequential ATPG and Full-

Sequential ATPG. In Basic-Scan mode, TetraMAX works as a full-scan, combinational-only ATPG

tool. By using this mode, all sequential elements have to be included in a scan chain in order

to achieve a high-fault coverage. Fast-Sequential mode provides limited support for partial-scan

designs (designs where not all sequential elements belong to scan chains). This mode allows

multiple capture procedures (clock transitions) between scan load and scan unload, allowing data

to be propagated through nonscan sequential elements like nonscan flip-flops and Random Access

Memories (RAMs). In this case, all clock and reset signals of these nonscan elements must

be controllable at a primary input. Full-Sequential ATPG is similar to Fast-Sequential ATPG, al-

though in this case the clock and reset signals of the nonscan sequential elements do not need

to be controlled at a primary input [6].

TetraMAX is capable of generating test patterns for the following fault models: SSF, IDDQ,

transition delay, path delay and bridging. Among these, the SSF is the adopted fault model in

this work. The only required files for generating test vectors are the design netlist and the models

(described in Verilog) of the used cells. For complex designs, TetraMAX also requires a test

protocol file, where specific information about test structures and how to properly use them is

given [6]. This file contains test initialization procedures, capture procedures, shift procedures

and others that allow TetraMAX to set proper values at the test structure’s inputs to effectively use

them.

For all designs TetraMAX needs to have information identifying the clock ports, asynchronous

set and reset ports, scan chain inputs and outputs, ports that place the design in test mode,

that enable shifting of scan chains and that globally control bidirectional drive and their active


states [6]. In simple designs, some of this information can be given directly using TetraMAX

commands. For more complex designs, a STIL test protocol file must be provided, which includes

all the necessary information.

TetraMAX also performs design checks that verify, among other aspects, the connection of the

scan chains' inputs and outputs, whether all clock and asynchronous set and reset signals connected

to scan chain flip-flops are controlled only by primary input ports, and whether any internal

multiple-driver net can be in contention.

5.3 Workflow

5.3.1 Basic workflow

The basic workflow that was implemented in this work, using the Synopsys Design Compiler

tool, is outlined in figure 5.2. The figure includes the commands used to accomplish each of the

steps. These commands are part of the script presented in section B.1.2.

[Figure 5.2 flowchart. Steps, with the associated commands in parentheses:
- Develop HDL Files
- Specify libraries (Link_library, Target_library, Symbol_library, Synthetic_library)
- Read design (Analyze, Elaborate)
- Define design environment (Set_operating_conditions, Set_wire_load_model)
- Set design constraints (Create_clock, Set_clock_uncertainty, Set_max_dynamic_power)
- Select compile strategy (Top_down)
- Optimize/Map design (Compile)
- Analyze and resolve design problems (Check_design, Report_area, Report_constraint, Report_timing)
- Save the design database (Write)]

Figure 5.2: Synopsys Basic Workflow.

Before starting the synthesis process, the designer has to make sure that all the necessary

libraries are available. These libraries are files that describe the available standard cells that can

be used during the mapping process, as well as their characteristics. The link and target libraries


are technology libraries that define the set of cells and related information, such as cell names, cell

pin names, delay arcs, pin loading, design rules, and operating conditions [19]. The symbol library

contains the symbols for schematic viewing of the design. It must be present if the GUI, Design

Vision, is to be used [19]. The location of all these libraries can either be set in the configuration

files or using the command line. In this work, the libraries were set using the configuration files

because they remain the same throughout the implementation flow.

After the definition of the libraries, the design was read into the Design Compiler work envi-

ronment using the HDL Compiler, which interprets the VHDL code with the circuit’s description

and converts it into a logic gate description, using GTECH cells. These stages correspond to the

“analyze” and “elaborate” commands, in the workflow.

The operating conditions and wire load model are set in the next stages. There are several

operating conditions defined in the libraries, that represent different process corners. Normally,

the Typical conditions, Best conditions and Worst conditions are defined. The Best conditions

setting is used to determine hold violations, while the Worst conditions setting is used to deter-

mine setup violations. Best conditions and Worst conditions were simultaneously used, during

synthesis, to allow the synthesis tool to perform timing analysis and cell selection based on the

most unfavorable conditions. The wire load model is an estimate of the characteristics (area, ca-

pacitance and resistance) of the interconnecting nets after routing. Since, at this stage, there is

no information regarding interconnection nets, this estimate is necessary in order to assess the

delays and perform timing analysis of the design before Place and Route (P&R). These wire load

models are predefined in the libraries. The choice for the specific wire load model that will be used

is based on the designer's perception of the design’s interconnection characteristics after routing.

After P&R, it is possible to validate the choice and, if it produced non-optimum results, select

another wire load model and perform a new iteration. The chosen wire load model has influence

on the synthesized circuit, since there are various available cells, with the same logic function but

with different characteristics (area, drive strength, propagation delays, consumed power and input

capacitance) that can be selected by the synthesis tool in order to achieve its goals (whether an

area goal, a timing goal or even both). If the chosen wire load model represents interconnec-

tions with small resistance and capacitance values, the synthesis tool will choose cells with lower

drive capacity, because they still accomplish a given timing constraint and have a smaller area.

Nevertheless, if after P&R the interconnections have more resistance and capacitance than the

values of the wire load model, the timing constraints may be violated. On the other hand, if the

wire load model represents interconnections with high resistance and capacitance values, then

the synthesis tool may choose cells with high drive capacity that will, unnecessarily, occupy more

area and consume more power. In this work, the model named “G30K” was chosen. The “G30K”

is a mid-range model, that seems adequate for a design with the characteristics of the AMEP.
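The influence of the wire load model on cell selection can be sketched with a simple lumped RC delay model in Python. This is only an illustration of the trade-off described above: the cell names, areas, resistances, capacitances and the 0.30 ns limit are hypothetical values, not data from the adopted library or from the "G30K" model, and the delay formula is a simplification of what a synthesis tool actually computes.

```python
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass(frozen=True)
class Cell:
    name: str
    area_um2: float
    drive_res_kohm: float   # lower resistance means higher drive strength
    intrinsic_ns: float


@dataclass(frozen=True)
class Wire:
    cap_pf: float
    res_kohm: float


def stage_delay(cell: Cell, wire: Wire, load_cap_pf: float) -> float:
    """Lumped RC delay in ns (kOhm * pF = ns)."""
    driver_term = cell.drive_res_kohm * (wire.cap_pf + load_cap_pf)
    wire_term = wire.res_kohm * (wire.cap_pf / 2 + load_cap_pf)
    return cell.intrinsic_ns + driver_term + wire_term


def smallest_cell_meeting(cells: Sequence[Cell], wire: Wire,
                          load_cap_pf: float, max_delay_ns: float) -> Optional[Cell]:
    """Pick the smallest-area cell that still meets the delay limit."""
    ok = [c for c in cells if stage_delay(c, wire, load_cap_pf) <= max_delay_ns]
    return min(ok, key=lambda c: c.area_um2) if ok else None


# Hypothetical buffers with the same function but different drive strength.
CELLS = [Cell("BUF_X1", 10.0, 4.0, 0.05),
         Cell("BUF_X2", 15.0, 2.0, 0.05),
         Cell("BUF_X4", 25.0, 1.0, 0.06)]
optimistic = Wire(cap_pf=0.02, res_kohm=0.1)   # assumed by the wire load model
routed = Wire(cap_pf=0.06, res_kohm=0.3)       # assumed values found after P&R
LOAD_PF, LIMIT_NS = 0.02, 0.30

chosen = smallest_cell_meeting(CELLS, optimistic, LOAD_PF, LIMIT_NS)
delay_after_routing = stage_delay(chosen, routed, LOAD_PF)
safe_choice = smallest_cell_meeting(CELLS, routed, LOAD_PF, LIMIT_NS)
```

With the optimistic estimate the smallest buffer is selected, but it violates the limit once the routed wire values are applied; estimating with the routed values would have selected the next larger buffer instead.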

The next stage in the design flow is to set the constraints using the appropriate

commands. Since there is no need to restrict the design rule constraints already defined in the

libraries, only the optimization constraints were set in this work. Among the several possible con-

straints available, only the clock, the clock uncertainty and the maximum power constraints were

set. The clock constraint specifies the clock period of the circuit, while the clock uncertainty spec-

ifies the allowed clock skew. These constraints are only guidelines for the synthesis tool and their

compliance must be verified in subsequent steps. Note that a wire load model is used instead

of real wire characteristics. Consequently, if the synthesis tool reports that the design complies

with the timing constraint, it does not necessarily mean that in the final circuit, after P&R, the

constraints are still met. Therefore, subsequent checks will be made during and after the P&R.

The value that was set for the clock period constraint (10 ns) was based on the maximum clock

frequency obtained in previous synthesis results for the FPGA implementation of the circuit, and

the required characteristics of the AMEP [2]. The final maximum power constraint (9.5 mW) was

set after a few iterations that revealed approximate values for the circuit’s power consumption. It

is worth noting that this power estimate does not take into account the memory blocks, because

the views required by the synthesis tool were not available for them. Moreover, after an initial

iteration, it was verified that the occupied area (1.20 mm²) was much less than the available area

(5.2 mm²). As a consequence, since unnecessary constraints should not be used, the maximum area

constraint was not set.
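The way these constraints are checked can be sketched with a short Python calculation. The 10 ns clock period and the two area figures come from the text above; the clock uncertainty, data arrival time and setup time are hypothetical values used only to illustrate how setup slack is obtained.

```python
def setup_slack_ns(period_ns: float, uncertainty_ns: float,
                   arrival_ns: float, setup_ns: float) -> float:
    """Setup slack: time by which data arrives before it is required."""
    required_ns = period_ns - uncertainty_ns - setup_ns
    return required_ns - arrival_ns


CLOCK_PERIOD_NS = 10.0        # clock constraint used in this work
UNCERTAINTY_NS = 0.3          # hypothetical clock uncertainty
ARRIVAL_NS = 8.9              # hypothetical critical path arrival time
SETUP_NS = 0.2                # hypothetical flip-flop setup time

slack = setup_slack_ns(CLOCK_PERIOD_NS, UNCERTAINTY_NS, ARRIVAL_NS, SETUP_NS)
constraint_met = slack >= 0.0

max_freq_mhz = 1000.0 / CLOCK_PERIOD_NS      # 10 ns period in MHz
area_utilization = 1.20 / 5.2                # occupied over available area
```

A positive slack means the path meets the constraint under the estimated wire loads; as noted above, the same check has to be repeated after P&R with the real interconnection values.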

The top-down compile strategy was chosen since the design constraints are set at the top

level, as a global objective. The synthesis process of the various sub-blocks of the design is

automatically done as well as the optimization of the various blocks, in order to guarantee that

the global constraints are met. This reduced the workload and achieved the necessary goals.

Alternatively, it is possible to set individual constraints and perform compilation on each of the

sub-blocks. Afterwards, the design would be assembled with the already compiled sub-blocks.

According to the basic synthesis workflow shown in figure 5.2, design compilation should

be done in the next stage. During this phase, the design is mapped and optimized according to the

defined constraints. If, after compilation, the design does not meet the specified constraints, new

constraints should be set or the design should be changed. Then, after compilation, the design

is saved and exported into the formats required by the back-end tools. In this case, the design

would be saved in a proprietary format (DDC) and would also be exported as a Verilog netlist.

The timing constraints would be saved in a file using SDC format, in order to be later imported

into the P&R tool.

5.3.2 Workflow with insertion of scan chains

The basic flow, described in section 5.3.1, does not account for test structures. Since

the implemented design requires test structures that are not defined in the VHDL source, the pre-

vious workflow (shown in figure 5.2) must be extended to insert and architect these test structures.


The workflow that was followed in this work to achieve the insertion of scan chains is outlined in

figure 5.3.

[Figure 5.3 flowchart. Steps, with the associated commands in parentheses:
- Develop HDL Files
- Read design (Analyze, Elaborate)
- Define design environment (Set_operating_conditions, Set_wire_load_model)
- Select compile strategy (Top_down)
- Set scan style (Set_scan_configuration)
- Define clocks and asyncs (Set_dft_signal)
- Check design rules (Create_test_protocol, Dft_drc); correct problems
- Run Test-Ready compile (Compile -scan)
- Check constraints (report_constraint); adjust constraints or compile strategy
- Check design rules (Create_test_protocol, Dft_drc); correct problems
- Set scan configuration (Set_scan_configuration, Set_scan_path)
- Preview scan chains (Preview_dft); adjust scan configuration
- Build scan chains (Insert_dft), producing the optimized netlist with scan
- Check design rules (Dft_drc); correct problems
- Check constraints (report_constraint); adjust constraints or compile strategy
- Save testable design]

Figure 5.3: Synopsys Workflow with scan structures.

Synopsys tools support various workflows to insert scan structures, depending on the initial

state of a design. The workflow outlined in this work follows the Unmapped Design Flow [20],

because the initial design is read from a VHDL description without any defined scan structures,

which are subsequently inserted along with the basic design flow. The commands shown in this

figure can be found in the script presented in section B.1.2.

This flow starts by defining the libraries, reading the design files, defining the design envi-

ronment and setting the design constraints, just like the basic flow. Afterwards, it is necessary


to define the scan style. In this work, the chosen scan style was the multiplexed flip-flop scan

style, as this is the only scan style supported by the cells of the selected target technology. Next,

the primary input and primary output pins that are used by the scan structures were set. This

directs the DFT Compiler to use specific pins for the test signals. In this work, the pins used by

the test structures are I/O cells that were instantiated in the VHDL code. The DFT Compiler must

be instructed to use these internal pins (hookup pins) as the source of the test signals, otherwise it

would create additional ports in the top-level design. The signals that must be specified for the

test structures implemented in this work are the scan clock (which is also the system clock), the

scan enable signals (test_se1 and test_se2), the test mode signal (test_mode) and the scan in

(test_si1 and test_si2) and scan out (test_so1 and test_so2) signals. Any set or reset signals

should also be specified, so that the Design Compiler is aware of their function (in this case, only

the reset signal, rst, was defined).

The next step is the creation of a test protocol. By using the previously defined signals, the DFT

Compiler automatically generates the test protocol. Since this circuit does not have any special

test initialization sequence, the automatically generated test protocol is sufficient. After creating

the test protocol, the DFT DRC was performed. At this stage, if the design presents violations,

these should be carefully analyzed to determine whether or not they may or should be corrected,

in order to improve test coverage. If such corrections need to be done, the HDL source files will

have to be edited or, if they can be automatically corrected, the AutoFix feature will have to be

enabled. At this stage, this project presented a few DFT DRC violations (uncontrollable flip-flop

reset lines and clock feeding data input violations). The violations involving the uncontrollable

reset signal of the flip-flops can be fixed using the AutoFix feature. Since the remaining violations

required changing the processor description and were of minor importance and with small impact

on the fault coverage, they were not corrected.

At this phase, the design is ready to be compiled. The compile strategy is the same as in

the basic workflow (a top-down compile strategy). The Test-Ready Compilation is done using the

additional switch in the compile command (-scan), as shown in figure 5.3 and in section B.1.2.

After compilation, the design already has the scan cells inserted on it, but they are not yet inter-

connected to form the scan chain shift register. The connection of the flip-flops to form the scan

chains is performed in a later stage. At this stage, it was verified that the imposed constraints were

still met. If they were not, either the set of constraints should be changed or a different compile

strategy should be chosen and, afterwards, a new compile iteration would be undertaken.

The test protocol was regenerated at this point and the DFT DRC was performed once again,

in order to check if any additional violations have appeared due to the compilation process. If no

violations are reported or if it is considered that the reported violations are of minor importance,

the next step in the workflow is the configuration of the scan chains. If there are major violations,

then the test signals or their timings must be redefined and a new test protocol generated. In


this work, the violations that exist at this point are the same violations that existed in the last DFT

DRC, since the AutoFix has not yet been performed.

The configuration of the scan chains involves the determination of how many scan chains

should exist, which scan elements (flip-flops) belong to each of them and which signals control

each of the chains. The implemented circuit requires one independent scan chain, formed by the

error address registers of the three memory BIST controllers (scan chain 1). Such chain allows

the extraction of the addresses of possible faulty memory positions, without interfering with the

rest of the circuit. A second scan chain was built with the remaining flip-flops (scan chain 2). By

analysing the result of the two scan chains, it can be observed that the main scan chain has about

750 flip-flops while the scan chain used to extract the values from the memory BIST controllers

has 30 flip-flops. Usually, scan chains should have the same number of flip-flops, to minimize

the test time. Nevertheless, this is not the case with the implemented circuit, because one of the

chains has a very specific purpose (address extraction without interference with the remaining

circuit) which restricts the number of flip-flops in it.
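The effect of chain balancing on test time can be estimated with a short Python sketch. It assumes, purely for illustration, that both chains are shifted in parallel and that the load of one pattern overlaps the unload of the previous one, which is a common approximation and not a figure reported in this work; the number of patterns is also hypothetical. The chain lengths of 750 and 30 flip-flops are the approximate values given above.

```python
from typing import Sequence


def scan_test_cycles(chain_lengths: Sequence[int], num_patterns: int) -> int:
    """Approximate tester cycles when all chains shift in parallel.

    Each pattern needs as many shift cycles as the longest chain, plus one
    capture cycle; one extra shift pass unloads the last response.
    """
    longest = max(chain_lengths)
    return (num_patterns + 1) * longest + num_patterns


NUM_PATTERNS = 100                      # hypothetical pattern count

implemented = scan_test_cycles([750, 30], NUM_PATTERNS)
balanced = scan_test_cycles([390, 390], NUM_PATTERNS)   # same 780 flip-flops
ratio = implemented / balanced
```

Under these assumptions the unbalanced configuration needs almost twice as many cycles as a balanced one, which is the cost accepted here in exchange for a dedicated chain for memory BIST address extraction.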

Since the previewed scan chains had the intended configuration, these chains were effectively

implemented in the circuit. The implementation of the scan chains is carried out by connecting the

scan flip-flops that were previously inserted in the design at compile time. At this stage, the whole

set of multiplexers and logic elements that AutoFix found necessary to resolve violations were

also added. After building the scan chains, the result was an optimized netlist that represented

the circuit which performs as described in the HDL source files and that also includes the test

structures to provide a better test coverage.

A final DFT DRC was then performed to ensure that there were no violations or that any

remaining ones are tolerable. In this work, besides the violations of the clock signal feeding

a data input, there were violations regarding the enable signal of the output three-state buffers

being affected by the value of a scan chain element. This last violation is also tolerable since the

circuit will not be tested when other devices are connected to the bidirectional bus. A final check

to ensure that the design still complies with the imposed constraints was also done. If the resulting

design had violations or did not meet the set constraints, new iterations would have to be performed.

Finally, the design was saved (in DDC format) along with its test protocol (in STIL format). The

output netlist was written in Verilog so that it could be imported into the P&R tool.

[Figure 5.4 block diagram: the AMEP with its ports data, addr, #oe_we, done, req, gnt, clk, en, rst, ram_bisten, test_se1, test_se2, test_mode, test_si1, test_si2, test_so1 and test_so2.]

Figure 5.4: AMEP interface after inserting scan chains.

After the insertion of the scan structures, the AMEP has the interface shown in figure 5.4. The

test_mode signal is used to force the circuit into test mode, which allows control of the reset inputs

of the flip-flops from a primary input. The test_se1 and test_se2 inputs control the operation of

scan chain 1 (for memory BIST address extraction) and of scan chain 2, respectively. Data

input for the respective scan chains is done through the test_si1 and test_si2 ports. Data output

is done through the test_so1 and test_so2 ports.

5.3.3 Workflow with JTAG insertion

The workflow to insert the JTAG interface can begin from the stage where the gate-level netlist

of the design is read or by continuing at the stage where the workflow with scan structures in-

sertion ended, as is the case of this implementation. Figure 5.5 shows the considered workflow

containing only the steps that were done by BSD Compiler.

[Figure 5.5 flowchart. Steps, with the associated commands in parentheses:
- Read design netlist or continue previous workflow
- Set boundary scan specifications (Set_dft_signal, Set_bsd_instruction, Set_scan_path)
- Read Pin Map (Read_pin_map)
- Preview Boundary Scan (Preview_dft)
- Insert Boundary Scan Logic (Insert_dft)
- Generate BSD patterns (Create_bsd_patterns)
- Generate BSDL file (Write_bsdl)
- Generate Gatelevel Netlist (Write)]

Figure 5.5: Synopsys JTAG Workflow.


The insertion of the JTAG logic is done by Synopsys BSD Compiler and requires a special

top-level design, as mentioned earlier. The interface used is shown in figure 5.6. For this purpose,

a new VHDL entity was defined. In this entity, the I/O cells for the JTAG tdi, tdo, tms, tck and

trst signals were instantiated. The core logic, which already contains the I/O cells, was also

instantiated and the proper connections were made.

[Figure 5.6 block diagram: top-level design containing the AMEP core. Labelled signals: TDI, TDO, TMS, TCK, TRST, CLK, EN, RST, GNT, REQ, DONE, #OE_WE, ADDR, DATA, test_mode, test_se1, test_se2, test_si1, test_si2, test_so1, test_so2 and ram_bisten.]

Figure 5.6: AMEP interface for JTAG insertion by BSD Compiler.

With the use of a JTAG interface, the memory BIST enable signals and the test structures

control signals can be controlled through the TAP controller, with the exception of the test_mode

signal. Therefore, the use of the I/O cells for the ram_bisten, test_se1, test_se2, test_si1,

test_si2, test_so1 and test_so2 signals is not needed, as these signals will be driven by the TAP

controller logic or their input and output will be done through the TDI and TDO ports of the JTAG

interface. As can be observed in section B.1.2, the removal of these cells and the connection of

the resulting open nets is done using the commands available in Design Compiler.

After setting the operating conditions, the design constraints and after defining the existing

clock signals for the core logic, the ports associated with each of the JTAG interface signals were

defined. It is important to define the clock signals for the core logic, because the boundary scan

cell for a clock pin should be an observe-only cell and, if a clock signal is not specified, BSD

Compiler will place a control-and-observe cell in that clock input.

To properly connect the Boundary Scan Register (BSR) cells, the BSD Compiler

needs information about the pin mapping used after packaging, so that it can make the

connections between adjacent scan cells. This is defined in a pin mapping file. The mapping should

match the pin layout order used in the P&R phase.

The configuration of the TAP controller is performed in the next step. In this circuit, a 4-bit

instruction register was used, adopting a binary instruction format, which gives a total of 16 possible

instructions. Among these, the three mandatory instructions EXTEST, SAMPLE and PRELOAD

must have their opcode defined. The mandatory instruction BYPASS has an opcode that is formed

by all instruction register bits with a high logic value (logic 1). The IEEE 1149.1 standard specifies

other optional instructions. This design implements the HIGHZ and IDCODE instructions that are

defined in the standard. The HIGHZ instruction is quite useful, since this IC has a three-state

bidirectional bus that may be connected to a shared system bus. Therefore, this instruction allows

the output drivers of this IC to be placed in high impedance, allowing other devices connected to

the same bus to be tested. The IDCODE instruction allows the device to be identified in a larger

system and to check the current version of the IC. This design also implements additional user-

specified instructions to control the memory BIST controllers. These are the SELECTSAMEM

(for the search area memory), SELECTMBMEM (for the macroblock memory) and SELECTINST-

MEM (for the instruction memory), which control the necessary signals to enable the respective

memory BIST controller, without the need for dedicated package pins.
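The resulting instruction set can be pictured as a small opcode table, sketched here in Python. Only the instruction names, the 4-bit register width and the all-ones BYPASS opcode come from the text; every other opcode value below is a hypothetical assignment chosen for illustration, and decoding unused opcodes as BYPASS is an assumption about the implemented behaviour.

```python
IR_WIDTH = 4                                  # instruction register width
NUM_OPCODES = 2 ** IR_WIDTH                   # 16 possible instructions
BYPASS_OPCODE = (1 << IR_WIDTH) - 1           # all bits at logic 1

# Hypothetical opcode assignment (only BYPASS is fixed by the text).
INSTRUCTIONS = {
    "EXTEST": 0b0000,
    "SAMPLE": 0b0001,
    "PRELOAD": 0b0010,
    "IDCODE": 0b0011,
    "HIGHZ": 0b0100,
    "SELECTSAMEM": 0b0101,      # search area memory BIST
    "SELECTMBMEM": 0b0110,      # macroblock memory BIST
    "SELECTINSTMEM": 0b0111,    # instruction memory BIST
    "BYPASS": BYPASS_OPCODE,
}

_BY_OPCODE = {code: name for name, code in INSTRUCTIONS.items()}


def decode(opcode: int) -> str:
    """Return the instruction selected by an opcode (unused codes: BYPASS)."""
    if not 0 <= opcode < NUM_OPCODES:
        raise ValueError("opcode does not fit in the instruction register")
    return _BY_OPCODE.get(opcode, "BYPASS")
```

Nine instructions are defined, so seven of the sixteen opcodes remain free for future user-specified instructions.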

After a preview, the implemented configuration was accepted and BSD Compiler generated

the necessary logic and automatically compiled and optimized only the top-level design which

includes the BSR and the TAP controller. The BSDL file was generated after compilation of

the JTAG logic. This file contains information that is essential for the characterization of the

implemented JTAG logic and that will allow the test equipment to use the available test features.

Compliance of the design with the IEEE 1149.1 standard is also assured by the BSD Compiler.

This step should be performed in order to verify that the design and the implemented JTAG logic

comply with the standard to allow it to be interoperable with other devices that follow the same

standard.

Boundary scan test patterns need to be generated using the BSD Compiler. The generated

test patterns are then simulated using TetraMAX.

5.3.4 Workflow for test generation

The workflow for test vector generation using TetraMAX is represented in figure 5.7. This flow

assumes that a STIL protocol file is available for the synthesized design. This is possible by using

the DFT Compiler that automatically generates the test protocol, as is the case in this work. If not,

a manually generated STIL protocol file has to be written or could be created, using TetraMAX,

for simple designs without test structures.

Before reading the design netlist, all used cells must be read into the internal library. This is

done using the “read netlist” command with the -library option, so that the TetraMAX Verilog models

of the I/O cells, standard cells and memory blocks, supplied by Faraday, are imported into the

internal library.

[Figure 5.7 flowchart. Steps, with the associated commands in parentheses:
- Read Library Models (Read netlist)
- Read Netlist (Read netlist)
- Build ATPG Model (Run build_model)
- Perform Test Design Rule Checking (DRC) (Set drc, Run drc)
- Prepare to Run ATPG (Set atpg, Remove faults, Add faults)
- Run ATPG (Run atpg)
- Review Test Coverage (Report)
- Rerun ATPG (Run atpg)
- Save Test Patterns (Write Patterns)]

Figure 5.7: Synopsys TetraMAX ATPG Workflow.

After reading the library cells, the design is read using the same read netlist command. Then,

after importing the design, the circuit model for ATPG generation is built. This is done with the

“run build_model” command. The STIL protocol file is then defined, using the set drc command.

The ATPG settings are set next. In this work, the SSF model was used and all stuck-at faults

were added. Among these, some will be inherently untestable, due to constraints set on the primary

inputs (e.g. since the test_mode signal is constrained to a high value during test procedures,

a stuck-at-1 fault in this node is undetectable).
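The reason why such a fault is undetectable can be shown with a small exhaustive fault simulation in Python. The one-gate circuit below is hypothetical and serves only to illustrate the SSF model with a constrained input; it does not represent the AMEP logic or the TetraMAX algorithms.

```python
from itertools import product
from typing import List, Optional, Tuple


def circuit(a: int, b: int, test_mode: int, fault: Optional[int] = None) -> int:
    """y = (a AND test_mode) OR b, with an optional stuck-at fault on test_mode."""
    if fault is not None:
        test_mode = fault            # node forced to the stuck-at value
    return (a & test_mode) | b


def detecting_patterns(fault: int, constrained_test_mode: int = 1) -> List[Tuple[int, int]]:
    """List the (a, b) patterns whose output differs between good and faulty circuit."""
    found = []
    for a, b in product((0, 1), repeat=2):
        good = circuit(a, b, constrained_test_mode)
        bad = circuit(a, b, constrained_test_mode, fault=fault)
        if good != bad:
            found.append((a, b))
    return found


stuck_at_1_tests = detecting_patterns(fault=1)   # no pattern can expose it
stuck_at_0_tests = detecting_patterns(fault=0)   # exposed when a=1, b=0
```

With test_mode held at 1, the stuck-at-1 circuit is indistinguishable from the fault-free one for every input pattern, whereas the stuck-at-0 fault on the same node remains detectable.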

Initially, a basic-scan ATPG was done. This is the fastest test generation mode and will detect

most of the faults in the design, since this is a full scan design (all flip-flops are included in scan

chains). Nevertheless, some faults will remain untestable due to the existing memory modules

(these are also considered sequential elements).

A second ATPG was done, but in full-sequential mode. This run took longer, but detected

some additional faults that were not previously detected.

The generated test patterns were then saved in the necessary formats to be exported to the

ATE and to be simulated in a logical simulator. Additionally, a fault simulation using the TetraMAX

simulation engine might be performed. This is possible in this particular flow, but is normally used

only when pattern generation is done outside of TetraMAX and the latter is only used to verify fault

coverage.


6 BackEnd - From Verilog netlist to GDS Layout

Contents
6.1 Introduction
6.2 Tools
6.3 Workflow


6.1 Introduction

A Verilog netlist represents the logical connections between several components. These

components have to be physically placed on the die and connected to achieve a functional circuit.

Power structures must also be created to supply enough power to each cell in the

chip. This section describes the workflow followed to reach a final layout (described in Graphic

Data System (GDS) format), which is then used to produce the fabrication masks, starting from a

Verilog netlist and using the Cadence Encounter platform.

6.2 Tools

The tools used for Place and Route belong to the Cadence Encounter family of products.

The Cadence software is widely used in the industry and is the de facto reference software for

placement and routing. Moreover, it is supported by major foundries which supply the necessary

files and libraries for this software.

Of the various Cadence packages available, the SoC Encounter product was selected. This

package of tools is used to make the placement, power planning, routing, clock tree synthesis,

optimizations and GDSII generation. Table 6.1 shows the versions of the tools used from the SoC

Encounter package.

Table 6.1: Cadence tools versions

Tool             Version
First Encounter  v04.10-s374 1 (32bit) 05/12/2005 20:09 (Linux 2.4)
NanoRoute        Version v04.10-s891 NR050505-1434/USR29-UB (database version 2.30, 20) super threading v1.4

6.2.1 First Encounter

The main application in the SoC Encounter package is the First Encounter software. Some of

the functions are performed directly by First Encounter while others are performed by other tools

that are executed by the First Encounter software using its interface.

The First Encounter tool requires a technology description file and a physical library that

describes the standard cells. Both of these items should be available in Library Exchange

Format (LEF) and must be provided by the foundry.

First Encounter is able to perform RC parameter extraction of the routed design. For this

purpose, a capacitance file should be provided to achieve better quality of results. Otherwise, First

Encounter can extract RC values based on capacitance and resistance values that it calculates,

using default process parameters and heuristic equations. A 3-D field solver, in this case the

Cadence Field Solver (Coyote), is used to calculate the capacitance values.


Delay calculation is also performed by the Encounter software, if the RC values and the cell

timing libraries have been supplied. Encounter is capable of reading the timing libraries in Synop-

sys Technology Library format (.lib) or in Timing Library Format (.tlf).
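How extracted RC values translate into an interconnect delay can be illustrated with the Elmore delay of an RC ladder, sketched in Python below. This is a textbook first-order estimate used here only as an example; it is not claimed to be the delay calculation method used by Encounter, and all resistance and capacitance values are hypothetical.

```python
from typing import Sequence, Tuple


def elmore_delay_ns(driver_res_kohm: float,
                    segments: Sequence[Tuple[float, float]],
                    load_cap_pf: float) -> float:
    """Elmore delay from the driver to the end of an RC ladder.

    segments holds (resistance in kOhm, capacitance in pF) per wire segment,
    ordered from driver to load, with each capacitance placed at the far end
    of its segment. kOhm * pF gives ns.
    """
    if not segments:
        return driver_res_kohm * load_cap_pf
    caps = [c for _, c in segments]
    caps[-1] += load_cap_pf                  # receiver pin at the last node
    delay = 0.0
    upstream_res = driver_res_kohm
    for (res, _), cap in zip(segments, caps):
        upstream_res += res                  # total resistance up to this node
        delay += upstream_res * cap
    return delay


# Hypothetical net: 1 kOhm driver, two wire segments, 0.02 pF receiver pin.
net_delay = elmore_delay_ns(1.0, [(0.1, 0.01), (0.1, 0.01)], 0.02)
```

Each node capacitance is charged through all the resistance between it and the driver, which is why both the extracted wire values and the cell data from the timing library are needed before any delay can be computed.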

The synthesis of the clock tree is also performed by Encounter. Clock Tree Synthesis (CTS)

analyzes the clocks in a design and inserts buffers (or inverters) to reduce or eliminate clock

skew. The CTS process can be performed in automatic or manual mode. In automatic mode, the

number of buffer levels and the number of buffers per level are automatically determined based on

the timing constraints set in the clock tree specification file (e.g. maximum clock skew, maximum

and minimum delay), which are set by the designer. In manual mode, the number of levels and the

number of buffers per level are individually set by the designer before performing the synthesis of

the clock tree.
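The quantities that a clock tree specification constrains can be made concrete with a short Python sketch. The sink names, insertion delays and limits are hypothetical, and the check below only mirrors the kind of constraints mentioned above (maximum skew, minimum and maximum delay); it does not reproduce the CTS algorithm.

```python
from typing import Mapping


def clock_skew_ns(insertion_delays_ns: Mapping[str, float]) -> float:
    """Skew: difference between the latest and earliest clock arrival."""
    delays = insertion_delays_ns.values()
    return max(delays) - min(delays)


def meets_cts_spec(insertion_delays_ns: Mapping[str, float], max_skew_ns: float,
                   min_delay_ns: float, max_delay_ns: float) -> bool:
    """Check a synthesized tree against skew and insertion delay limits."""
    delays = insertion_delays_ns.values()
    return (clock_skew_ns(insertion_delays_ns) <= max_skew_ns
            and min(delays) >= min_delay_ns
            and max(delays) <= max_delay_ns)


# Hypothetical clock arrival times at three flip-flop clock pins.
sinks = {"ff_a": 1.20, "ff_b": 1.35, "ff_c": 1.28}

skew = clock_skew_ns(sinks)
relaxed_ok = meets_cts_spec(sinks, max_skew_ns=0.2, min_delay_ns=1.0, max_delay_ns=1.5)
tight_ok = meets_cts_spec(sinks, max_skew_ns=0.1, min_delay_ns=1.0, max_delay_ns=1.5)
```

In automatic mode the tool adds buffer levels until a check of this kind passes; a tighter skew limit, as in the second call, would force further buffering or rebalancing.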

Encounter is also able to perform power analysis. It analyzes the power usage, power grid

IR drop and power grid electromigration of a design [22]. This analysis should be performed at

the sign-off stage to validate the circuit’s power structures. Nevertheless, the analysis relies on

information that may not be available in all standard cell libraries.

6.2.2 NanoRoute

NanoRoute is Cadence’s recommended routing engine. It performs concurrent signal integrity,

timing-driven and manufacturing-aware routing of cell, block, or mixed cell and block level

designs [23].

NanoRoute is usually invoked by Encounter, but is able to work in standalone mode. When

in standalone mode, it can work using a graphical interface or in batch mode. In this project, the

NanoRoute software is primarily invoked from First Encounter, except when performing the LVS

check, for which it is run in standalone mode.

NanoRoute performs routing in two stages: global and detailed routing. The global routing stage

minimizes congestion and optimizes signal timing by performing global interconnection planning.

This plan is created by routing signal nets at the global cell level. The detailed routing stage

creates the final routing by implementing nets according to design rules, and connecting the pins

of each cell or block to the corresponding nets. During the detailed routing, NanoRoute also

automatically performs search-and-repair, if there are any remaining problems in the circuit nets.

The NanoRoute tool automatically determines when to stop the search-and-repair process [23].

6.3 Workflow

The workflow, using Cadence Encounter, to produce a layout from a Verilog netlist is presented

in figure 6.1. The script that implements this workflow can be found in section B.2.2.

Figure 6.1: Design flow for Encounter. (Flow stages shown: Data Preparation, Pre-Placement Optimization, Floorplanning, Power Planning, Placement, Pre-CTS Optimization, Clock Tree Synthesis, Post-CTS Optimization, Routing, DRC, Repair violations, Post-Route Optimization, Analysis and Sign-Off, Layout (GDSII format).)

Initial data preparation was performed before running the Encounter software. This step included
preparing the capacitance table, the standard cell timing library and creating the I/O

assignment file. A capacitance table should be created to achieve better quality of results in the

extraction of RC parameters [22]. To create this capacitance table, the Coyote 3-D field solver was

used. The field solver requires a technology description file, in ICT (IceCaps Technology file) for-

mat, that is supplied by UMC for the adopted process technology. This technology description file

describes the process parameters (e.g. the thickness of the conducting layers, the interlayer pla-

nar dielectric constant and its thickness, the conductors resistance, etc.). The resistance values

are directly defined in the technology description file while the capacitance values are calculated

based on information provided in the same file. This is a one-time operation and the generated

capacitance table can be used for future designs using the same process.

In the adopted cell library, provided by Faraday, the timing files are only available in lib format.

Although Encounter is supposed to read these files, to build the internal cell timing library, it was

not able to do so. Therefore, tlf format files were generated using a Synopsys utility named syn2tlf

that converted lib format files into tlf format files to be used by Encounter.

The I/O assignment file specifies the location of the various I/O cells around the die periphery.

This file was created to implement the disposition of the I/O cells according to the diagram in

section 4.6. It is in this I/O assignment file that the ground and power cells were instantiated and

placed. The corner cells, which provide continuity of the I/O power and ground rings, were also

instantiated and placed using this file.

Besides the library and technology related files, Encounter requires a Verilog netlist with the

design information, which, in this work, is the result of the synthesis process by Design Compiler.


It also requires a timing constraints file, in Synopsys Design Constraints (SDC) format, for timing

oriented optimizations. This file was also previously generated by Design Compiler during the

frontend phase and contains the timing information of the clock signals.

Encounter uses cell footprint information to determine functionally equivalent cells so it is able

to perform optimizations, such as replacing a given buffer with a higher drive capacity buffer (buffer

resizing). These footprints are set, for each cell, in the standard cell library files. Moreover, En-

counter requires the footprints of buffer, inverter and delay cells to be defined, so it can use them

during the optimizations that are performed in buffer resizing. Nevertheless, the available library

defines equal footprints for buffer and delay cells even though it has specific delay cells [15]. Al-

though a delay cell may be equivalent to a series of buffers, its function should be restricted to

providing delays. With this standard cell library, Encounter will use buffer and delay cells interchangeably.
Moreover, the selected standard cell library also provides dedicated cells (buffers and

inverters) for the clock signals which must also be specified in Encounter in order to be used

during the synthesis of the clock tree.

After reading in the design, a pre-placement optimization is done. This first optimization step

was used to remove buffers that may have been inserted by Design Compiler (the synthesis

tool) in order to comply with the timing constraints. Since Encounter will also perform timing

driven optimizations, it will add the required buffers where they are needed, taking into account

the placement and routing information, which was unavailable during the synthesis phase.

The floorplanning step is performed next. Determining the dimensions of the floorplan can be

an iterative process and, in a normal flow, the occupied area should be minimized. However, since

EUROPRACTICE defines discrete dimensions for the design, the iterative process to minimize the

area was not performed in this work. Instead, the available space in the die was used to spread
the I/O buffers as far apart as possible, in order to minimize the dissipated power per unit area.

Due to the number of I/O cells needed, including the pads, and the required pad pitch, a single

sub-block of 1525 x 1525 µm would not be enough to fit all I/O cells using inline I/O cells with inline

pads. However, if staggered pads and the same inline I/O cells were used, all of the I/O cells and

the pads would fit using only one sub-block. Nevertheless, that would reduce the core size up to

a point where the core logic and memory blocks would not fit. Furthermore, if staggered I/O cells

and staggered pads were used, the available core area would be further reduced. Therefore, the

use of two sub-blocks (3240 x 1525 µm) is required.

The floorplan dimensions that were set in the Encounter software are the dimensions of the

area where the core and I/O cells will be placed. According to UMC Topological Layout Rules [24],

a die seal ring must be present in the final layout. EUROPRACTICE will add this die seal ring,

in accordance with UMC rules, outside of the stipulated design area, so the designer does not

need to account for this structure in the design area dimensions. This die seal ring will have

a minimum width of 10µm and a minimum spacing between the pad metal edge and the seal



ring of 10µm. The pad zone area is not accounted for when setting the floorplan dimensions in

Encounter. Therefore, the dimensions set for the floorplan are the dimensions of the die (two

sub-blocks) minus the pad dimensions (2 x 79µm ≈ 160µm). Figure 6.2 shows the die block

and floorplan dimensions (which includes the I/O cells zone).
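The subtraction described above can be checked with a short worked example. This is only a sketch: the constant and function names are illustrative, and the 160µm allowance is the rounded value quoted in the text rather than the exact 2 x 79µm.

```python
# Die block and pad dimensions quoted in the text (micrometres).
DIE_WIDTH_UM = 3240
DIE_HEIGHT_UM = 1525
PAD_DEPTH_UM = 79          # depth of one pad zone
PAD_ALLOWANCE_UM = 160     # the text rounds 2 x 79 up to approximately 160


def floorplan_dimensions(die_w, die_h, allowance):
    """Return the (width, height) left for the core and the I/O cell zone."""
    return die_w - allowance, die_h - allowance


exact_allowance = 2 * PAD_DEPTH_UM
floorplan_w, floorplan_h = floorplan_dimensions(
    DIE_WIDTH_UM, DIE_HEIGHT_UM, PAD_ALLOWANCE_UM
)
print(exact_allowance, floorplan_w, floorplan_h)
```

With the rounded allowance, the result matches the 3080µm and 1365µm dimensions annotated in figure 6.2.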

Figure 6.2: Die block size, floorplan and core size. (Dimensions shown: die block 3240 µm x 1525 µm; floorplan, comprising the core and the I/O cell zone, 3080 µm x 1365 µm; the PAD zone surrounds the I/O cell zone.)

After defining the floorplan, the previously created I/O cell position file was loaded. The po-

sition of the I/O cells may be altered if it is detected that, after an initial placement, a better

disposition of cells would improve the design routability, or, after power grid analysis, there would

be the need to change the number of power and ground connections.

The memory blocks (hard blocks) were placed before inserting the power structures or any

other cells. The hard blocks can be either manually or automatically placed. The memories

must be placed taking into account their power dissipation, because if they are placed too close

together, the temperature in that area might increase above the recommended values. Never-

theless, placing the memory blocks too far apart could negatively influence the compliance with

the timing constraints. In this work, an initial automatic placement of the cells was made in order

to set the location of the memory blocks, according to the timing constraints. However, this au-

tomatic placement does not take into account the power dissipation of the memories. As such,

this initial placement could be considered a guide to find the optimal position in terms of both

timing constraints and power dissipation. Nevertheless, the initial automatic placement placed

the memories too close together. Therefore, they were moved further apart and their status was set to

pre-placed, which indicates that the next placement iteration should keep these blocks in their

pre-set position. A block halo (empty zone around the blocks) was added to the memory blocks,


to prevent any cell from being placed in this area, in order to reserve a space to add power and

ground rings to these blocks. Furthermore, this block halo also avoids design rule violations that

may occur when standard cells are adjacent to these memories.

After placing the memory blocks, the main power structures were added during the power

planning stage. Designing a power grid can be difficult and reaching a satisfactory result may

involve an iterative process. An initial power structure was implemented and, afterwards, verified.

If needed, the power structures would be redesigned to correct any problems. The initial power
structure is composed of one power ring and one ground ring, each 20µm wide, and 13 evenly
spaced pairs of stripes (one stripe for the power net and another for the ground net), each stripe
being 10µm wide. Additionally, each of the memory blocks has its own power and ground rings (block
power ring), 10µm wide. The global power and ground ring was implemented in the higher

metal layers (metal4 and metal5) because these metal layers have less resistance than the lowest

metal layers [24] and, as this ring will support all the current supply to the chip, it is a probable

candidate for high IR drop. The memory power and ground rings were also implemented using

the higher metal layers, in order to reduce the IR drop. From this point forward, the term "power
net/ring" refers to both the power and ground net/ring.

Encounter supports designs with more than one power domain. Therefore, Encounter needs

to know to which power net it must connect each cell’s power and ground pins. At this phase, the

global power ring, the memories power rings and the power stripes are implemented. However,

the power connections to each individual cell are not yet implemented, but simply described (using

the globalNetConnect command). The power routing process, performed later, will actually
connect the existing power structures to the cells and memory power pins using the appropriate

metal tracks.

Timing driven placement with high effort was performed next. This placement strategy is

performed to place the cells (excluding the memories which were pre-placed) in order to achieve

the best timing. During this process several placement and trial routing iterations are automatically

done until a solution is reached.

After the placement and the power planning stages, but before synthesizing the clock tree, an

optimization was performed. In this pre-CTS optimization phase, the Encounter software performs

the replacement of cells with other, functionally equivalent, cells but with different driving capaci-

ties (gate resizing). It also performs global buffer insertion and netlist restructuring to repair setup

time violations and design rule violations and improve the timing slacks (the difference between

the calculated timing value and the timing constraint) [25].
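The notion of timing slack can be illustrated with a minimal sketch. It assumes the common sign convention in which slack is the required time minus the arrival time, so that a negative value indicates a setup violation. The 10ns constraint corresponds to an assumed 100MHz target, and the path names and arrival times are hypothetical, not values taken from the AMEP timing reports.

```python
# Setup slack per path, in picoseconds so that the arithmetic stays exact.
CLOCK_PERIOD_PS = 10_000  # assumed 100 MHz target, i.e. a 10 ns constraint

# Hypothetical arrival times (ps); not taken from the AMEP timing reports.
arrival_ps = {"path_a": 9_200, "path_b": 10_400}


def setup_slack(required_ps, arrival):
    """Positive slack: constraint met. Negative slack: setup violation."""
    return required_ps - arrival


slacks = {name: setup_slack(CLOCK_PERIOD_PS, t) for name, t in arrival_ps.items()}
violations = sorted(name for name, s in slacks.items() if s < 0)
print(slacks, violations)
```

In this sketch, a path with negative slack is the kind of path that gate resizing, buffer insertion or netlist restructuring would be expected to target.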

The next step, in the backend design flow, was the synthesis of the clock tree. The Clock Tree

Synthesis (CTS) configuration file includes constraints information about maximum clock skew,

maximum and minimum delays and the maximum depth of logic in the clock tree. The maximum

clock skew value was set at 300ps while the maximum delay was set at 1.5ns and the minimum



delay at 0ns. The maximum depth of logic levels of the clock tree was set at 8 levels. Several

other options are available to control the synthesis of the clock tree, but are not needed for this

design. To build the clock tree, CTS routes the clock networks, based on the constraints set on the

configuration file, and then optimizes the clock tree to improve the skew, which includes resizing buffers

or inverters, adding buffers, refining placement, and correcting routing [25].
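The constraints above can be expressed as a simple check on the clock insertion delays of the clock tree sinks. The following sketch uses the skew and delay limits quoted in the text, while the function name and the sink delay values are hypothetical and serve only to illustrate how skew is derived from the delays.

```python
# Constraint values quoted in the text, in picoseconds.
MAX_SKEW_PS = 300
MAX_DELAY_PS = 1500
MIN_DELAY_PS = 0


def check_clock_tree(sink_delays_ps):
    """Return (skew, problems) for a list of clock insertion delays."""
    latest = max(sink_delays_ps)
    earliest = min(sink_delays_ps)
    skew = latest - earliest  # skew: spread between latest and earliest sink
    problems = []
    if skew > MAX_SKEW_PS:
        problems.append("skew")
    if latest > MAX_DELAY_PS:
        problems.append("max delay")
    if earliest < MIN_DELAY_PS:
        problems.append("min delay")
    return skew, problems


# Hypothetical sink delays; not taken from the AMEP clock tree report.
good = check_clock_tree([1120, 1250, 1310, 1180])
bad = check_clock_tree([900, 1250, 1600])
print(good, bad)
```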

Just after synthesis of the clock tree, a new optimization was done. Post-CTS optimization

repairs remaining design rule violations, setup time and hold time violations (only if the setup time

is not worsened) and corrects the timing information [22].

The addition of filler cells in the core and empty cells in the I/O zone was done before the
routing phase. Filler and empty cells exist in different sizes. The filler and the empty cells were
added starting with the widest cell and ending with the narrowest, so that the empty spaces in
the core and in the I/O ring, respectively, are occupied by the widest cells that fit. This is particularly
relevant for filler cells, because the narrowest filler cell does not provide decoupling
capacitance; the widest cells should therefore be added first. If there is the need to insert
new cells after routing (e.g. antenna diodes), these filler cells can be automatically removed.
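The widest-first insertion order amounts to a greedy filling of each empty gap. The sketch below illustrates the idea under stated assumptions: the filler widths, expressed in placement sites, are hypothetical and do not correspond to the cells of the adopted library, and the function is not the algorithm used by Encounter.

```python
# Hypothetical filler cell widths, in placement sites (not the library's cells).
FILLER_WIDTHS = [64, 32, 16, 8, 4, 2, 1]


def fill_gap(gap_sites, widths=FILLER_WIDTHS):
    """Fill a gap greedily, always trying the widest remaining cell first."""
    placed = []
    remaining = gap_sites
    for width in sorted(widths, reverse=True):
        while remaining >= width:
            placed.append(width)
            remaining -= width
    return placed, remaining


cells, leftover = fill_gap(45)
print(cells, leftover)
```

Because the narrowest cell is only used once nothing wider fits, the number of cells without decoupling capacitance is kept low in this scheme.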

The routing phase starts with the routing of the power structures. The SRoute engine is used

to perform routing of special nets like power nets. After power routing, all cells (including I/Os)

and blocks were connected to the power nets.

The global and detailed routing were performed next by the NanoRoute routing engine. The

first routing iteration was timing driven in order to meet the timing constraints. When performing a

non-timing-driven routing, NanoRoute might detour some nets in order to avoid creating violations

but when performing timing-driven routing it does not detour timing critical nets. Instead, it forces

them to be routed as short as possible, which can create congestion and violate design rules.

Later, when design-rule checking takes precedence, these nets will be detoured [22].

Since there were design rule violations after the timing-driven routing (e.g. a short circuit
between the clock network and a power network), these had to be fixed. Encounter has the
ability to delete the violating nets and then perform routing of the deleted nets. As such, the
violating nets were deleted and Engineering Change Order (ECO) routing was used to route
the changed (deleted) nets. This routing step needs to be non-timing-driven, in order to avoid
creating other violations. Since there were no violating nets after this second routing iteration, it
was possible to proceed to the next step.

After a design rule violation free design was reached, a post-route optimization was made.

This optimization step fixes the timing problems and, additionally, the design rule violations that

may have been introduced by this optimization process.

At this phase the design was routed, and RC extraction and the final timing verification were done.
RC parameters were extracted from the final layout in order to perform the delay calculation. The
result of the delay calculation is a back-annotation Standard Delay Format (SDF) file that contains
timing information concerning the net delays and that can be used in a final simulation of the
design.

A power grid analysis was performed to validate the correct planning of the power structures.

To perform this analysis, Encounter needs a pad location file. This file indicates the source location
for the power nets. Usually, the source locations are the power I/O cells, but this can be

changed if studying other locations for the power cells is necessary. The pad location file was

manually created in order to define the location of the core’s power I/O cells as the location of the

power nets sources. After performing the power analysis, there were no detected violations of

electromigration rules. Additionally, the maximum IR drop was within accepted values (5mV). An

estimate of the consumed power was also performed.

At this stage, the design was saved in DEF format so that it could be imported into the

NanoRoute tool for a final LVS verification. This LVS check certified that the layout connections

corresponded to the netlist connections.

After performing all validations, the design is ready to be exported to GDSII format. This file

contains the geometry of the metal layers, used in routing, as well as the position of the cells

in the layout. To generate this file, a mapping file had to be supplied. This mapping file makes

the correspondence between the layer names used in the Encounter software (e.g. metal1, via,

metal2) and the corresponding GDS layout layer number according to UMC Rules.

This GDSII file, containing the layout of the metal layers used in routing, is sent to EUROPRACTICE,
which merges this information with the layout of the standard cells. The result is a complete
GDSII file containing the full layout information.

This concludes the steps required to obtain, using a standard cell library, a GDSII file describing
the layout of the processor specified in the VHDL source code.

The complete GDSII file is then sent to the foundry, which uses it to produce the masks used
in the IC manufacture.

7. Results

Contents
7.1 Introduction
7.2 Results

7.1 Introduction

After performing the synthesis, the placement and the routing steps, the design is complete.

This chapter presents the obtained circuit layout and the results from the final timing analysis of

the circuit. Obtained values from the synthesis process are also presented in order to compare

the cost of inserting test structures. All of the tools were run on a Linux machine (CentOS release

4.4) with a Dual Core AMD Opteron Processor at 2GHz and with 2GB of memory.

7.2 Results

To assess the impact of the test structures on the circuit’s area and performance, four circuits

were initially synthesized: the AMEP processor without any test structures (Basic), with the ad-

ditional memory BIST controllers (Basic + Mem test), with the memory BIST controllers and the

scan chains (Basic + Mem test + scan) and with the memory BIST structures, the scan chains and

the IEEE 1149.1 compliant boundary scan logic (Basic + Mem test + scan + JTAG). The presented

results were obtained by the synthesis tool and are used only for comparing the different costs of

the implemented test structures. The worst conditions defined in the libraries were used to obtain

these values. The presented area values are the sum of the values reported by the synthesis tool

with the value of the area occupied by the three memories (0.67 mm²). This is necessary because
the synthesis tool is unaware of the area occupied by the memories, since the available memory
libraries do not include a view for this tool. The resulting areas and the minimum clock period,

after synthesis, are summarized in table 7.1.

Table 7.1: Results from synthesis tool.

Circuit                           Occupied Area (mm²)   Relative Area   Minimum Period (ns)
Basic                             1.17                  100.0%          9.99
Basic + Mem test                  1.20                  102.8%          9.99
Basic + Mem test + scan           1.21                  103.9%          10.00
Basic + Mem test + scan + JTAG    1.90                  162.7%          10.00

These results demonstrate that the insertion of dedicated test structures, like the memory
BIST controllers and the scan chains, has a relatively small impact on both the circuit timing and
the occupied area. However, the area increase due to the implementation of the boundary scan

logic (IEEE 1149.1) is significant. This increment results mainly from the implementation of the

TAP controller of the IEEE 1149.1 standard. Nevertheless, it will be included in the final circuit

since there is enough area available on the die and it will allow the test of the connections at a

board level.
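The relative area figures of table 7.1 can be approximately reproduced from the tabulated areas, as sketched below. Since the areas in the table are rounded to two decimal places, the recomputed percentages differ from the tabulated ones by a few tenths of a percentage point. The table percentages were presumably computed from the unrounded areas, which is an assumption of this sketch.

```python
# Occupied areas (mm^2) and relative areas (%) as listed in table 7.1.
AREAS_MM2 = {
    "Basic": 1.17,
    "Basic + Mem test": 1.20,
    "Basic + Mem test + scan": 1.21,
    "Basic + Mem test + scan + JTAG": 1.90,
}
TABLE_RELATIVE_PCT = {
    "Basic": 100.0,
    "Basic + Mem test": 102.8,
    "Basic + Mem test + scan": 103.9,
    "Basic + Mem test + scan + JTAG": 162.7,
}

baseline = AREAS_MM2["Basic"]
relative_pct = {name: 100 * area / baseline for name, area in AREAS_MM2.items()}

# Largest disagreement with the table, attributable to rounding of the areas.
max_gap = max(abs(relative_pct[n] - TABLE_RELATIVE_PCT[n]) for n in AREAS_MM2)

for name, pct in relative_pct.items():
    print(f"{name}: {pct:.1f}% (table: {TABLE_RELATIVE_PCT[name]}%)")
```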

The results obtained after placement and routing, of the circuit with the memory BIST con-

trollers, the scan chains and the IEEE 1149.1 compliant boundary scan logic, are presented in


table 7.2. The power consumption value is an estimate of the core power, computed by
Encounter based on a net toggle probability of 45%. Since a 50% probability would indicate that

every net in the circuit would toggle on every positive edge of the clock, a 45% for the overall

net toggle probability is a reasonable estimate, since the test dedicated parts of the circuit will be

disabled in normal function mode, thus reducing the overall net toggle probability.

Table 7.2: Layout results.

Occupied Area                 4.9 mm²
Minimum Period                9.8 ns
Consumed Power (@100MHz)      14.5 mW

From these results, it can be seen that the initial wire load model used in the synthesis tool

was a good estimate, since the design met the set timing constraints (the circuit is able to run with

a maximum clock frequency of 102MHz). With a power consumption of 14.5mW @ 100MHz, the

AMEP processor meets its requirements for power consumption and therefore it will be capable

of efficiently implementing motion estimation algorithms in battery-supplied devices.
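The relation between the minimum period and the maximum clock frequency quoted above can be verified with a short calculation. The variable names are illustrative, and the 100MHz target is taken from the operating frequency used for the power estimate.

```python
MIN_PERIOD_NS = 9.8       # minimum period after place and route (table 7.2)
TARGET_FREQ_MHZ = 100.0   # operating frequency used for the power estimate

# f [MHz] = 1000 / T [ns]
max_freq_mhz = 1000.0 / MIN_PERIOD_NS
target_period_ns = 1000.0 / TARGET_FREQ_MHZ

# Margin between the target period and the achieved minimum period.
margin_ns = round(target_period_ns - MIN_PERIOD_NS, 3)

print(f"{max_freq_mhz:.1f} MHz, margin {margin_ns} ns")
```

The result, slightly above 102MHz, leaves a margin of 0.2ns with respect to a 10ns clock period.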

Furthermore, the power analysis performed by Encounter also indicated that the maximum

current density values present at the various layers and vias are within the recommended values

by UMC, in order to comply with electromigration rules at a temperature of 125◦C [24]. Table 7.3

summarizes these values according to the process layer. Note that the current value at the metal
6 layer is zero, since this metal layer was not used for routing.

Table 7.3: Power analysis results.

Layer/Via   Maximum [24]    Actual
Metal 1     0.44 mA/µm      0.22 mA/µm
Metal 2     0.53 mA/µm      0.02 mA/µm
Metal 3     0.53 mA/µm      0.13 mA/µm
Metal 4     0.53 mA/µm      0.15 mA/µm
Metal 5     0.53 mA/µm      0.05 mA/µm
Metal 6     0.89 mA/µm      0.00 mA/µm
Via12       0.21 mA/cut     0.02 mA/cut
Via23       0.21 mA/cut     0.01 mA/cut
Via34       0.21 mA/cut     0.01 mA/cut
Via45       0.21 mA/cut     0.01 mA/cut
Via56       0.21 mA/cut     0.01 mA/cut

The maximum estimated IR drop, calculated by Encounter, is 5mV which is an acceptable

value for this circuit and this technology (it is 0.3% of the 1.8V supply voltage). In conclusion, the

current density values and the maximum IR drop values indicate that the initial power structures

were adequately sized.
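These two checks can be reproduced from the values in table 7.3 and the reported IR drop. The sketch below is only a worked example of the arithmetic: the data structure and names are illustrative, and the figures are those listed in the table.

```python
SUPPLY_V = 1.8
IR_DROP_V = 0.005  # 5 mV maximum IR drop reported by the power analysis

# (maximum allowed, actual) current density, as listed in table 7.3.
# Metal layers are in mA/um, vias in mA/cut.
CURRENT_DENSITY = {
    "Metal 1": (0.44, 0.22),
    "Metal 2": (0.53, 0.02),
    "Metal 3": (0.53, 0.13),
    "Metal 4": (0.53, 0.15),
    "Metal 5": (0.53, 0.05),
    "Metal 6": (0.89, 0.00),
    "Via12": (0.21, 0.02),
    "Via23": (0.21, 0.01),
    "Via34": (0.21, 0.01),
    "Via45": (0.21, 0.01),
    "Via56": (0.21, 0.01),
}

ir_drop_pct = 100 * IR_DROP_V / SUPPLY_V

# Fraction of the electromigration limit used by each layer or via.
utilisation = {
    layer: actual / limit for layer, (limit, actual) in CURRENT_DENSITY.items()
}
within_limits = all(u <= 1.0 for u in utilisation.values())
worst_layer = max(utilisation, key=utilisation.get)

print(f"IR drop: {ir_drop_pct:.2f}% of supply")
print(f"worst case: {worst_layer} at {100 * utilisation[worst_layer]:.0f}% of limit")
```

According to these values, the most heavily loaded layer is metal 1, which uses half of its allowed current density.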

The AMEP final design, including two scan chains and the JTAG boundary scan logic (IEEE

1149.1), has been obtained using the design flow described in Chapter 5 and Chapter 6. The final

layout of the circuit is presented in figure 7.1 where different blocks and cells are identified.

Figure 7.1: AMEP chip layout. (Labelled elements: Search Area Memory, Macroblock Memory, Instruction Memory, Power Rings, I/O Cell, Pad, Corner Cell.)

It can be observed that the memory blocks (search area, macroblock and instruction mem-

ories) are distributed through the die to avoid excessive temperature. The I/O cells and their

respective pads, as well as the corner cells, are present at the periphery of the die. The global

power ring is also visible in the space between the I/O cells and the core area.

A final LVS check and simulation were performed and confirmed the validity of this layout.

8. Conclusions

Contents
8.1 Conclusions
8.2 Future Work

8.1 Conclusions

Motion Estimation is the most computationally expensive part of a video encoder system.

Therefore, an efficient architecture for motion estimation was proposed in [2].

In this dissertation, a structured workflow was defined, and followed, to implement the Adaptive

Motion Estimation Processor (AMEP) [2] on an Application Specific Integrated Circuit (ASIC) using

a standard cell library. The implementation of the AMEP circuit starts from a VHDL description

and ends with the final physical description of the layout, exported in a GDSII format file, which is

sent to the foundry for manufacture. During this process, several EDA tools were used to perform

the various steps.

The Synopsys Inc. software was used to perform the steps of the frontend phase (synthesis

and insertion of test structures) of this project, while the Cadence SoC Encounter platform was

used for the steps in the backend phase (placement, routing and sign-off analysis).

The UMC CMOS L180 1P6M MM/RFCMOS process technology [24] with the corresponding

standard cell library from Faraday Technology Corporation [15] were chosen to implement this

circuit.

Special attention was paid to enhancing the circuit testability, in order to validate the circuit
after manufacture and to assist in the detection of possible design errors. Therefore, a
memory BIST controller was designed and implemented to allow the test of the memories used
in the processor. This controller implements functions that are useful during the prototype stage
of the processor, such as allowing the address of a failing memory position to be extracted. The
implementation of this controller required changing and extending the VHDL description of the
processor to include this test-dedicated hardware.

Furthermore, two scan chains were created, during the synthesis stage, to improve testability,

using the available options of the synthesis tool. Additionally, the IEEE 1149.1 TAP controller

and the associated Boundary Scan Register (BSR) were implemented to provide test capability

of the circuit’s interconnections, when integrated into a board, and allow control of the internal

memory BIST controllers, reducing the number of additional pins dedicated to test structures.

The test patterns, used to verify that the chip is properly manufactured, were generated using the

Synopsys TetraMAX tool.

After completing the steps in the frontend phase, a Verilog netlist was obtained, which represents
the interconnections between the used standard cells that implement the circuit’s functions.

This netlist is the basis for the backend phase. This phase starts with the placement of the cells

inside the available die area. The three memory blocks were manually placed due to temperature

constraints. The remaining cells were automatically placed in order to achieve the best timing.

The power structures necessary to supply the required current to every cell in the chip were

created and, in the final verifications, validated. The synthesis of the clock tree was done by


Encounter, and ensured that the clock skew was less than 300ps, which represents approximately

3% of the minimum clock period (9.8ns).

After performing the routing of the signal nets, a final timing analysis concluded that the mini-

mum clock period is 9.8ns, which corresponds to a maximum working frequency of 102MHz. An

LVS check and a simulation with timing details were performed and validated the circuit’s cor-

rect implementation. A power analysis was also performed and revealed that the created power

structures complied with the electromigration rules set by UMC and that the maximum IR drop

is 5mV, which is less than 1% of the 1.8V supply. This analysis also concluded that the core of

the manufactured processor will have a maximum power consumption of 14.5mW @ 100MHz.

Therefore, the low power consumption estimated for the manufactured chip makes it adequate to

perform motion estimation on battery-supplied devices.

8.2 Future Work

In order to produce the AMEP Integrated Circuit, the final layout will be sent to EUROPRACTICE
for manufacture in the 22nd of October run. Afterwards, it will be packaged using a

CLCC68 package. Meanwhile, a connection board to make the interface of the manufactured

AMEP with an already existing video coding platform will be developed.

It is also required to develop the software needed to perform the test of the circuit. This soft-

ware is responsible for managing all the signals necessary to deliver the generated test vectors

through the scan chains and read the resulting output values. This software must also compare

the read values from the circuit with the expected outputs, in order to verify the correct manufac-

ture of the chip.

Moreover, the manufactured AMEP will be used in the video coding platform to perform real-

time motion estimation, while its power consumption is measured, to assess its compliance with

the low power constraints imposed by the battery-supplied devices.


Bibliography

[1] T. Dias, S. Momcilovic, N. Roma, and L. Sousa, “Adaptive motion estimation processor for

autonomous video devices,” EURASIP Journal on Embedded Systems, special issue on

Embedded Systems for Portable and Mobile Video Platforms, vol. 2007, no. 57234, pp. 1–

10, May 2007.

[2] S. Momcilovic, T. Dias, N. Roma, and L. Sousa, “Application specific instruction set processor

for adaptive video motion estimation,” in Proc. of 9th EUROMICRO Conference on Digital

System Design: Architectures, Methods and Tools - DSD’2006. IEEE Computer Society,

August 2006, pp. 160–167.

[3] N. Roma, “Processadores dedicados para estimacao de movimento em sequencias de

vıdeo,” Master’s thesis, Universidade Tecnica de Lisboa - Instituto Superior Tecnico, Lisboa,

Jan. 2001.

[4] V. Bhaskaran and K. Konstantinides, Image and Video Compression Standards: Algorithms

and Architectures. Kluwer Academic Publishers, 1995.

[5] M. Abramovici, M. A. Breuer, and A. D. Friedman, Digital Systems Testing and Testable

Design. COMPUTER SCIENCE PRESS, 1990.

[6] TetraMAX ATPG User Guide (Version Y-2006.06), Synopsys, Inc., June 2006.

[7] DFT Compiler Understanding Test Automation User Guide (DB Mode) (Version X-2005.09),

Synopsys, Inc., September 2005.

[8] IEEE 1149.1-2001 - Standard Test Access Port and Boundary-Scan Architecture, IEEE, June

2001.

[9] M. L. Bushnell and V. D. Agrawal, Essentials of Electronic Testing. Kluwer Academic Pub-

lishers, 2000.

[10] “An optimal march test for locating faults in drams,” in Records of the 1993 IEEE International

Workshop on Memory Testing. IEEE Computer Society, August 1993, pp. 61–66.

[11] I. Koren, “Should yield be a design objective?” in Proc. IEEE 2000 First International Sym-

posium on Quality Electronic Design. IEEE Computer Society, March 2000, pp. 115–120.


[12] N. Harrison, “A simple via duplication tool for yield enhancement,” in Proc. of the 2001 IEEE

International Symposium on Defect and Fault Tolerance in VLSI Systems (DFT01). IEEE

Computer Society, October 2001, pp. 39–47.

[13] H. H. Chen and C. K. Wong, “Wiring for manufacturability and yield maximization in computer-

aided vlsi design,” in Proc. of Technical Papers. 1993 International Symposium on VLSI Tech-

nology, Systems, and Applications. IEEE Computer Society, May 1993, pp. 68–72.

[14] eSi-Route/11™ High Performance 0.18µ Standard Cell Library - Part Number: UMCL18U250

(Rev. 2.4), Virtual Silicon Technology, Inc, November 2001.

[15] FARADAY ASIC CELL LIBRARY FSA0A C 0.18µm STANDARD CELL (v1.0), Faraday Tech-

nology Corporation, August 2004.

[16] Bonding Pad Layout Guidelines (Ver. 5 P1), UMC, October 2001.

[17] 0.18µm (FSA0A C) Standard Cell Library ESD Application Note (v1.0), Faraday Technology

Corporation, September 2004.

[18] Ceramic packaging guidelines for UMC technologies (v1.1), EUROPRACTICE IC SERVICE,

December 2003.

[19] Design Compiler User Guide (Version Y-2006.06), Synopsys, Inc., June 2006.

[20] DFT Compiler User Guide Vol. 1: Scan (XG Mode) (Version Y-2006.06), Synopsys, Inc., June

2006.

[21] BSD Compiler User Guide (XG Mode) (Version Y-2006.06), Synopsys, Inc., June 2006.

[22] Encounter User Guide (Product Version 4.1.5), Cadence Design Systems, Inc., May 2005.

[23] NanoRoute Technology Reference (Product Version 4.1.5), Cadence Design Systems, Inc.,

May 2005.

[24] 0.18um Mixed-Mode and RFCMOS 1.8V/3.3V 1P6M Metal Metal Capacitor Process Tech-

nology Layout Rule (Ver. 2.9 P.1), UMC, May 2006.

[25] Encounter Timing Closure Guide (Product Version 4.1.3), Cadence Design Systems, Inc.,

December 2004.


A VHDL Code

Contents

A.1 Memory Test Controller VHDL Code

A.1 Memory Test Controller VHDL Code

--------------------------------------------------------------------------------
--------------------------------------------------------------------------------
-- File : mem_wrapper_1port.vhd
-- Author(s) : Nuno Sebastiao
-- Date : 14/02/07
--------------------------------------------------------------------------------
-- Description :
-- Interface for memory module with integrated BIST controller.
-- ADDRESS_WIDTH is the memory address width
-- DATA_WIDTH is the memory data width
-- BYTEWRITE is different than '0' if bytewrite capability exists in the memory.
--
-- Note: DATA_WIDTH must be greater than 8
-- If BYTEWRITE is available, DATA_WIDTH must be divisible by 8
--------------------------------------------------------------------------------
--------------------------------------------------------------------------------

library IEEE;
use IEEE.STD_LOGIC_1164.all;

entity SU180_1024X8X2BM1_WRAPPER is
  port(
    CLK      : in  STD_LOGIC;
    CS       : in  STD_LOGIC;
    OE       : in  STD_LOGIC;
    nWEl     : in  STD_LOGIC; -- Write enable signal
    nWEh     : in  STD_LOGIC; -- Write enable signal
    ADDR     : in  STD_LOGIC_VECTOR (9 downto 0);
    DI       : in  STD_LOGIC_VECTOR (15 downto 0);
    DO       : out STD_LOGIC_VECTOR (15 downto 0);
    bisten   : in  STD_LOGIC;
    bistgo   : in  STD_LOGIC;
    bistrst  : in  STD_LOGIC;
    bistrslt : out STD_LOGIC;
    bistend  : out STD_LOGIC
  );
end SU180_1024X8X2BM1_WRAPPER;

architecture Behavioral of SU180_1024X8X2BM1_WRAPPER is

  -- if BIST = 1, the memory bist controller will be synthesized.
  constant BIST          : INTEGER  := 1;
  constant DATA_WIDTH    : POSITIVE := 16;
  constant ADDRESS_WIDTH : POSITIVE := 10;
  constant BYTEWRITE     : INTEGER  := 1;

  component mem_bist_controller_1port
    generic (
      ADDRESS_WIDTH : POSITIVE := 8;
      DATA_WIDTH    : POSITIVE := 8;
      BYTEWRITE     : INTEGER  := 0
    );
    port(
      clk          : in  STD_LOGIC;
      rst          : in  STD_LOGIC;
      en           : in  STD_LOGIC;
      go           : in  STD_LOGIC;
      bistctr_din  : in  STD_LOGIC_VECTOR(DATA_WIDTH-1 downto 0);
      fault_dtct   : out STD_LOGIC;
      bistend      : out STD_LOGIC;
      bisten       : out STD_LOGIC;
      bistoen      : out STD_LOGIC;
      bistwen      : out STD_LOGIC;
      bistcen      : out STD_LOGIC;
      bistbwen     : out STD_LOGIC_VECTOR(DATA_WIDTH/8-1 downto 0);
      bistaddr     : out STD_LOGIC_VECTOR(ADDRESS_WIDTH-1 downto 0);
      bistctr_dout : out STD_LOGIC_VECTOR(DATA_WIDTH-1 downto 0)
    );
  end component;

  component SU180_1024X8X2BM1
    port(
      A0   : IN  std_logic;
      A1   : IN  std_logic;
      A2   : IN  std_logic;
      A3   : IN  std_logic;
      A4   : IN  std_logic;
      A5   : IN  std_logic;
      A6   : IN  std_logic;
      A7   : IN  std_logic;
      A8   : IN  std_logic;
      A9   : IN  std_logic;
      DO0  : OUT std_logic;
      DO1  : OUT std_logic;
      DO2  : OUT std_logic;
      DO3  : OUT std_logic;
      DO4  : OUT std_logic;
      DO5  : OUT std_logic;
      DO6  : OUT std_logic;
      DO7  : OUT std_logic;
      DO8  : OUT std_logic;
      DO9  : OUT std_logic;
      DO10 : OUT std_logic;
      DO11 : OUT std_logic;
      DO12 : OUT std_logic;
      DO13 : OUT std_logic;
      DO14 : OUT std_logic;
      DO15 : OUT std_logic;
      DI0  : IN  std_logic;
      DI1  : IN  std_logic;
      DI2  : IN  std_logic;
      DI3  : IN  std_logic;
      DI4  : IN  std_logic;
      DI5  : IN  std_logic;
      DI6  : IN  std_logic;
      DI7  : IN  std_logic;
      DI8  : IN  std_logic;
      DI9  : IN  std_logic;
      DI10 : IN  std_logic;
      DI11 : IN  std_logic;
      DI12 : IN  std_logic;
      DI13 : IN  std_logic;
      DI14 : IN  std_logic;
      DI15 : IN  std_logic;
      WEB0 : IN  std_logic;
      WEB1 : IN  std_logic;
      CK   : IN  std_logic;
      CS   : IN  std_logic;
      OE   : IN  std_logic
    );
  end component;

  component reg_e
    generic (
      WIDTH : POSITIVE := 32
    );
    port (
      CLK  : in  STD_LOGIC;
      CE   : in  STD_LOGIC;
      Din  : in  STD_LOGIC_VECTOR(WIDTH-1 downto 0);
      Dout : out STD_LOGIC_VECTOR(WIDTH-1 downto 0)
    );
  end component;

  component mux_2to1
    generic (
      WIDTH : POSITIVE
    );
    port (
      S : in  STD_LOGIC;
      A : in  STD_LOGIC_VECTOR(WIDTH-1 downto 0);
      B : in  STD_LOGIC_VECTOR(WIDTH-1 downto 0);
      O : out STD_LOGIC_VECTOR(WIDTH-1 downto 0)
    );
  end component;

  signal mem_cen, mem_oen     : STD_LOGIC;
  signal mem_wen              : STD_LOGIC;
  signal mem_di               : STD_LOGIC_VECTOR(DATA_WIDTH-1 downto 0);
  signal mem_adr              : STD_LOGIC_VECTOR(ADDRESS_WIDTH-1 downto 0);
  signal mem_bwen, mem_bwen_s : STD_LOGIC_VECTOR(DATA_WIDTH/8-1 downto 0);
  signal bwen                 : STD_LOGIC_VECTOR(DATA_WIDTH/8-1 downto 0);

  signal dout_s     : STD_LOGIC_VECTOR(DATA_WIDTH-1 downto 0);
  signal fault_dtct : STD_LOGIC;

  signal bisten_s, bistcen, bistoen, bistwen : STD_LOGIC;
  signal bistdi   : STD_LOGIC_VECTOR(DATA_WIDTH-1 downto 0);
  signal bistadr  : STD_LOGIC_VECTOR(ADDRESS_WIDTH-1 downto 0);
  signal bistbwen : STD_LOGIC_VECTOR(DATA_WIDTH/8-1 downto 0);

begin

  bwen(0) <= nWEl;
  bwen(1) <= nWEh;

  -- mem_bwen and mem_wen are active low
  MEM_BWEN_GEN : for i in DATA_WIDTH/8-1 downto 0 generate
    mem_bwen(i) <= mem_bwen_s(i) or mem_wen;
  end generate;

  BIST_CTRL_GEN_1PORT: if (BIST = 1) generate
    BIST_CTRL : mem_bist_controller_1port
      generic map (
        ADDRESS_WIDTH => ADDRESS_WIDTH,
        DATA_WIDTH    => DATA_WIDTH,
        BYTEWRITE     => BYTEWRITE
      )
      port map (
        clk          => CLK,
        rst          => bistrst,
        en           => bisten,
        go           => bistgo,
        bistctr_din  => dout_s,
        fault_dtct   => fault_dtct,
        bistend      => bistend,
        bisten       => bisten_s,
        bistoen      => bistoen,
        bistwen      => bistwen,
        bistcen      => bistcen,
        bistbwen     => bistbwen,
        bistaddr     => bistadr,
        bistctr_dout => bistdi
      );

    -- bisten_s is active low
    MEM_ADRA_SEL : mux_2to1
      generic map (
        WIDTH => ADDRESS_WIDTH
      )
      port map (
        S => bisten_s,
        A => bistadr,
        B => ADDR,
        O => mem_adr
      );

    MEM_DINA_SEL : mux_2to1
      generic map (
        WIDTH => DATA_WIDTH
      )
      port map (
        S => bisten_s,
        A => bistdi,
        B => DI,
        O => mem_di
      );

    -- wen/bistwen is active low
    -- mem_wen is active low
    MEM_WEN_SEL : process (bisten_s, bistwen)
    begin
      case bisten_s is
        when '0'    => mem_wen <= bistwen;
        when others => mem_wen <= '0';
      end case;
    end process;

    -- bistcen is active low
    -- cen is active high
    -- mem_cen is active high
    MEM_CEN_SEL : process (bisten_s, bistcen, CS)
    begin
      case bisten_s is
        when '0'    => mem_cen <= not bistcen;
        when others => mem_cen <= CS;
      end case;
    end process;

    -- bistoen is active low
    -- oen is active high
    -- mem_oen is active high
    MEM_OEN_SEL : process (bisten_s, bistoen, OE)
    begin
      case bisten_s is
        when '0'    => mem_oen <= not bistoen;
        when others => mem_oen <= OE;
      end case;
    end process;

    MEM_BWEN_SEL : mux_2to1
      generic map (
        WIDTH => DATA_WIDTH/8
      )
      port map (
        S => bisten_s,
        A => bistbwen,
        B => bwen,
        O => mem_bwen_s
      );
  end generate;

  NO_BIST_CTRL: if (BIST /= 1) generate
    mem_adr    <= ADDR;
    mem_di     <= DI;
    mem_bwen_s <= bwen;
    mem_wen    <= '0';
    mem_cen    <= CS;
    mem_oen    <= OE;
  end generate;

  MEM_1K_16_BW: if ((DATA_WIDTH = 16) and (ADDRESS_WIDTH = 10) and (BYTEWRITE /= 0)) generate
    RAM_1K_16 : SU180_1024X8X2BM1
      port map(
        A0   => mem_adr(0),
        A1   => mem_adr(1),
        A2   => mem_adr(2),
        A3   => mem_adr(3),
        A4   => mem_adr(4),
        A5   => mem_adr(5),
        A6   => mem_adr(6),
        A7   => mem_adr(7),
        A8   => mem_adr(8),
        A9   => mem_adr(9),
        DO0  => dout_s(0),
        DO1  => dout_s(1),
        DO2  => dout_s(2),
        DO3  => dout_s(3),
        DO4  => dout_s(4),
        DO5  => dout_s(5),
        DO6  => dout_s(6),
        DO7  => dout_s(7),
        DO8  => dout_s(8),
        DO9  => dout_s(9),
        DO10 => dout_s(10),
        DO11 => dout_s(11),
        DO12 => dout_s(12),
        DO13 => dout_s(13),
        DO14 => dout_s(14),
        DO15 => dout_s(15),
        DI0  => mem_di(0),
        DI1  => mem_di(1),
        DI2  => mem_di(2),
        DI3  => mem_di(3),
        DI4  => mem_di(4),
        DI5  => mem_di(5),
        DI6  => mem_di(6),
        DI7  => mem_di(7),
        DI8  => mem_di(8),
        DI9  => mem_di(9),
        DI10 => mem_di(10),
        DI11 => mem_di(11),
        DI12 => mem_di(12),
        DI13 => mem_di(13),
        DI14 => mem_di(14),
        DI15 => mem_di(15),
        WEB0 => mem_bwen(0),
        WEB1 => mem_bwen(1),
        CK   => CLK,
        CS   => mem_cen,
        OE   => mem_oen
      );
  end generate;

  DO <= dout_s;
  bistrslt <= fault_dtct;

end Behavioral;
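
The wrapper above instantiates mux_2to1, whose source file is not reproduced in this appendix. A minimal sketch of such a multiplexer is given below. The entity interface is copied from the component declaration in the wrapper. The select polarity (S = '0' selects A) is an assumption, inferred from the fact that the active-low bisten_s drives S while the BIST signals are connected to A; the original mux_2to1.vhd may differ.

```vhdl
library IEEE;
use IEEE.STD_LOGIC_1164.all;

entity mux_2to1 is
  generic (
    WIDTH : POSITIVE
  );
  port (
    S : in  STD_LOGIC;
    A : in  STD_LOGIC_VECTOR(WIDTH-1 downto 0);
    B : in  STD_LOGIC_VECTOR(WIDTH-1 downto 0);
    O : out STD_LOGIC_VECTOR(WIDTH-1 downto 0)
  );
end mux_2to1;

architecture Behavioral of mux_2to1 is
begin
  -- Assumption: S = '0' selects A (the BIST side in the wrapper);
  -- any other value of S selects B (the functional side).
  O <= A when S = '0' else B;
end Behavioral;
```

With this polarity, the multiplexers behave like the MEM_WEN_SEL, MEM_CEN_SEL and MEM_OEN_SEL processes of the wrapper, which also hand the memory to the BIST controller when bisten_s is '0'.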

--------------------------------------------------------------------------------
--------------------------------------------------------------------------------
-- Project : AMEP
-- Affiliations : PARSIG - Parallel Structures and Signal Processing
--                SIPS - Signal Processing Systems Group
--                INESC-ID - Institute for Systems and Computer Engineering:
--                Research and Development in Lisbon
-- Funding : FCT Project POSI/EEA-CPS/60765 (2005/01/01-2008/12/31)
--------------------------------------------------------------------------------
-- File : mem_bist_controller_1port.vhd
-- Author(s) : Nuno Sebastiao
-- Date : 02/07/07
--------------------------------------------------------------------------------
-- Copyright (c) 2005-8 Signal Processing Systems Group - INESC-ID, Lisbon
--------------------------------------------------------------------------------
-- Description :
-- Memory BIST Controller
-- ADDRESS_WIDTH is the address width of the memory to be tested
-- DATA_WIDTH is the data width of the memory to be tested
-- BYTEWRITE is different than '0' if bytewrite capability exists in the memory.
--
-- Note: DATA_WIDTH must be greater than 8 and divisible by PATTERN_WIDTH
-- If BYTEWRITE is available, DATA_WIDTH must be divisible by 8
--
-- PATTERN_WIDTH is the width of the Pattern bits generated by the state machine
-- The state machine must be changed accordingly.
--------------------------------------------------------------------------------
--------------------------------------------------------------------------------

--{{ Section below this comment is automatically maintained
-- and may be overwritten
--{entity {mem_bist_controller} architecture {mem_bist_controller}}


entity mem_bist_controller_1port is
  generic (
    ADDRESS_WIDTH : POSITIVE := 8;
    DATA_WIDTH    : POSITIVE := 8;
    BYTEWRITE     : INTEGER  := 0
  );
  port(
    clk          : in  STD_LOGIC;
    rst          : in  STD_LOGIC;
    en           : in  STD_LOGIC;
    go           : in  STD_LOGIC;
    bistctr_din  : in  STD_LOGIC_VECTOR(DATA_WIDTH-1 downto 0);
    fault_dtct   : out STD_LOGIC;
    bistend      : out STD_LOGIC;
    bisten       : out STD_LOGIC;
    bistoen      : out STD_LOGIC;
    bistwen      : out STD_LOGIC;
    bistcen      : out STD_LOGIC;
    bistbwen     : out STD_LOGIC_VECTOR(DATA_WIDTH/8-1 downto 0);
    bistaddr     : out STD_LOGIC_VECTOR(ADDRESS_WIDTH-1 downto 0);
    bistctr_dout : out STD_LOGIC_VECTOR(DATA_WIDTH-1 downto 0)
  );
end mem_bist_controller_1port;

architecture Behavioral of mem_bist_controller_1port is

  constant PATTERN_WIDTH : POSITIVE := 2;

  function sub_gt_zero (n,m : positive) return natural is
    variable result : integer;
  begin
    if (n>m) then
      result := n-m;
    else
      result := 0;
    end if;
    return result;
  end sub_gt_zero;

  component mem_bist_controller_sm
    generic (
      DATA_WIDTH : POSITIVE := 8
    );
    port (
      clk                : in  STD_LOGIC;
      rst                : in  STD_LOGIC;
      en                 : in  STD_LOGIC;
      addr_gen_endcount  : in  STD_LOGIC;
      go                 : in  STD_LOGIC;
      dout_cmp           : in  STD_LOGIC;
      bwen_end           : in  STD_LOGIC;
      fault_dtct         : out STD_LOGIC;
      bistend            : out STD_LOGIC;
      bisten             : out STD_LOGIC;
      bistoen            : out STD_LOGIC;
      bistwen            : out STD_LOGIC;
      bistcen            : out STD_LOGIC;
      bistportsel        : out STD_LOGIC;
      bistbwen_gen_en    : out STD_LOGIC;
      bistbwen_gen_reset : out STD_LOGIC;
      bistbwen_gen_din   : out STD_LOGIC;
      addr_gen_rst       : out STD_LOGIC;
      addr_gen_en        : out STD_LOGIC;
      addr_gen_dir       : out STD_LOGIC;
      cmp_reg_en         : out STD_LOGIC;
      pattern            : out STD_LOGIC_VECTOR (1 downto 0)
    );
  end component;

  component updown_counter
    generic (
      WIDTH : POSITIVE := 8
    );
    port (
      clk   : in  STD_LOGIC;
      en    : in  STD_LOGIC;
      dir   : in  STD_LOGIC;
      rst   : in  STD_LOGIC;
      count : out STD_LOGIC_VECTOR(WIDTH-1 downto 0)
    );
  end component;

  component shift_reg
    generic (
      WIDTH : POSITIVE := 8
    );
    port (
      clk   : in  STD_LOGIC;
      en    : in  STD_LOGIC;
      reset : in  STD_LOGIC;
      din   : in  STD_LOGIC;
      dout  : out STD_LOGIC_VECTOR(WIDTH-1 downto 0)
    );
  end component;

  component cmp_eq
    generic (
      WIDTH : POSITIVE
    );
    port (
      A : in  STD_LOGIC_VECTOR(WIDTH-1 downto 0);
      B : in  STD_LOGIC_VECTOR(WIDTH-1 downto 0);
      O : out STD_LOGIC
    );
  end component;

  component reg_re
    generic (

    port (
      CLK  : in  STD_LOGIC;
      CE   : in  STD_LOGIC;
      RST  : in  STD_LOGIC;
      Din  : in  STD_LOGIC_VECTOR(WIDTH-1 downto 0);
      Dout : out STD_LOGIC_VECTOR(WIDTH-1 downto 0)
    );
  end component;

  signal bistbwen_s, bistbwen_s_reg, byte_select : STD_LOGIC_VECTOR(DATA_WIDTH/8-1 downto 0);
  signal mem_data_to_cmp, mem_patt_to_cmp : STD_LOGIC_VECTOR(DATA_WIDTH-1 downto 0);
  signal bistctr_dout_s, bistctr_dout_s_reg : STD_LOGIC_VECTOR(DATA_WIDTH-1 downto 0);
  signal pattern : STD_LOGIC_VECTOR(PATTERN_WIDTH-1 downto 0);
  signal curr_addr, curr_addr_reg, end_up_addr, end_down_addr : STD_LOGIC_VECTOR(ADDRESS_WIDTH-1 downto 0);
  signal bistwen_s, bistwen_s_reg : STD_LOGIC_VECTOR (0 downto 0);
  signal cmp_reg_en : STD_LOGIC;
  signal c_bwen, c_wen, c_dout, c_din, c_addr : STD_LOGIC;
  signal addr_gen_en, addr_gen_rst, addr_gen_dir, addr_gen_endcount : STD_LOGIC;
  signal bistbwen_gen_en, bistbwen_gen_reset, bistbwen_gen_din : STD_LOGIC;
  signal bwen_end : STD_LOGIC;

begin

  SM: mem_bist_controller_sm
    generic map (
      DATA_WIDTH => DATA_WIDTH
    )
    port map (
      clk                => clk,
      rst                => rst,
      en                 => en,
      addr_gen_endcount  => addr_gen_endcount,
      go                 => go,
      dout_cmp           => c_dout,
      bwen_end           => bwen_end,
      fault_dtct         => fault_dtct,
      bistend            => bistend,
      bisten             => bisten,
      bistoen            => bistoen,
      bistwen            => bistwen_s(0),
      bistcen            => bistcen,
      bistportsel        => open,
      bistbwen_gen_en    => bistbwen_gen_en,
      bistbwen_gen_reset => bistbwen_gen_reset,
      bistbwen_gen_din   => bistbwen_gen_din,
      addr_gen_rst       => addr_gen_rst,
      addr_gen_en        => addr_gen_en,
      addr_gen_dir       => addr_gen_dir,
      cmp_reg_en         => cmp_reg_en,
      pattern            => pattern
    );

  ADDR_GEN: updown_counter
    generic map (
      WIDTH => ADDRESS_WIDTH
    )
    port map (
      clk   => clk,
      en    => addr_gen_en,
      dir   => addr_gen_dir,
      rst   => addr_gen_rst,
      count => curr_addr
    );

  DATAOUT_CMP: cmp_eq
    generic map (
      WIDTH => DATA_WIDTH
    )
    port map (
      A => mem_patt_to_cmp,
      B => mem_data_to_cmp,
      O => c_dout
    );

  CURR_ADDRESS_REG: reg_re
    generic map (
      WIDTH => ADDRESS_WIDTH
    )
    port map (
      CLK  => clk,
      CE   => cmp_reg_en,
      RST  => rst,
      Din  => curr_addr,
      Dout => curr_addr_reg
    );

  MEM_DATAIN_REG: reg_re
    generic map (
      WIDTH => DATA_WIDTH
    )
    port map (
      CLK  => clk,
      CE   => cmp_reg_en,
      RST  => rst,
      Din  => bistctr_dout_s,
      Dout => bistctr_dout_s_reg
    );

  BISTCTR_DATAOUT: for i in 0 to (DATA_WIDTH/PATTERN_WIDTH-1) generate
    bistctr_dout_s(PATTERN_WIDTH*i+(PATTERN_WIDTH-1) downto PATTERN_WIDTH*i) <= pattern;
  end generate BISTCTR_DATAOUT;

  BYTEWRITE_STRUCTS: if (BYTEWRITE /= 0) generate
    BYTEWRITE_GEN: shift_reg
      generic map (
        WIDTH => DATA_WIDTH/8
      )
      port map (
        clk   => clk,
        en    => bistbwen_gen_en,
        reset => bistbwen_gen_reset,
        din   => bistbwen_gen_din,
        dout  => bistbwen_s
      );

    DATABYTESELECT: for i in 0 to (DATA_WIDTH/8-1) generate
      byte_select(i) <= (not bistbwen_s_reg(i)) and en;
    end generate DATABYTESELECT;

    DATABYTESTOCMP: for i in 0 to (DATA_WIDTH/8-1) generate
      DATABYTEGEN: for j in 0 to 7 generate
        mem_data_to_cmp(i*8+j) <= bistctr_din(i*8+j) and byte_select(i);
        mem_patt_to_cmp(i*8+j) <= bistctr_dout_s_reg(i*8+j) and byte_select(i);
      end generate DATABYTEGEN;
    end generate DATABYTESTOCMP;

    BWEN_REG: reg_re
      generic map (
        WIDTH => DATA_WIDTH/8
      )
      port map (
        CLK  => clk,
        CE   => cmp_reg_en,
        RST  => rst,
        Din  => bistbwen_s,
        Dout => bistbwen_s_reg
      );

    bwen_end <= bistbwen_s(0);

  end generate BYTEWRITE_STRUCTS;

  NOBYTEWRITE_STRUCTS: if (BYTEWRITE = 0) generate
    BYTEGEN: for i in 0 to (DATA_WIDTH-1) generate
      mem_data_to_cmp(i) <= bistctr_din(i) and en;
    end generate BYTEGEN;

    mem_patt_to_cmp <= bistctr_dout_s_reg;

    c_bwen <= '1';

    bwen_end <= '1';

    bistbwen_s <= (others => '0');

  end generate NOBYTEWRITE_STRUCTS;

  ADDRESSENDCOUNT: process (curr_addr, addr_gen_dir, end_down_addr, end_up_addr)
  begin
    if ((curr_addr = end_down_addr) and (addr_gen_dir = '1')) or
       ((curr_addr = end_up_addr) and (addr_gen_dir = '0')) then
      addr_gen_endcount <= '1';
    else
      addr_gen_endcount <= '0';
    end if;
  end process;

  bistwen       <= bistwen_s(0);
  bistctr_dout  <= bistctr_dout_s;
  bistaddr      <= curr_addr;
  bistbwen      <= bistbwen_s;
  end_up_addr   <= (others => '1');
  end_down_addr <= (others => '0');

end Behavioral;
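
The address generator ADDR_GEN relies on updown_counter, which is not listed in this appendix. A possible implementation is sketched below. The interface is taken from the component declaration above. The direction encoding (dir = '0' counts up, dir = '1' counts down) follows from the ADDRESSENDCOUNT process, which pairs dir = '0' with the all-ones end address and dir = '1' with the all-zeros end address. The synchronous reset to zero and the use of IEEE.NUMERIC_STD are assumptions and not taken from the original source.

```vhdl
library IEEE;
use IEEE.STD_LOGIC_1164.all;
use IEEE.NUMERIC_STD.all;

entity updown_counter is
  generic (
    WIDTH : POSITIVE := 8
  );
  port (
    clk   : in  STD_LOGIC;
    en    : in  STD_LOGIC;
    dir   : in  STD_LOGIC;
    rst   : in  STD_LOGIC;
    count : out STD_LOGIC_VECTOR(WIDTH-1 downto 0)
  );
end updown_counter;

architecture Behavioral of updown_counter is
  signal cnt : UNSIGNED(WIDTH-1 downto 0) := (others => '0');
begin

  COUNT_PROC : process (clk)
  begin
    if rising_edge(clk) then
      if rst = '1' then
        -- Assumption: synchronous reset to address zero.
        cnt <= (others => '0');
      elsif en = '1' then
        if dir = '0' then
          cnt <= cnt + 1;   -- count up, wraps from all ones to zero
        else
          cnt <= cnt - 1;   -- count down, wraps from zero to all ones
        end if;
      end if;
    end if;
  end process;

  count <= STD_LOGIC_VECTOR(cnt);

end Behavioral;
```

The natural wrap-around of the unsigned arithmetic matches the comment "for counter wrap around" in the state machine, where a single enabled cycle in the down direction moves the counter from address zero to the highest address.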

--------------------------------------------------------------------------------
--------------------------------------------------------------------------------
-- File : mem_bist_controller_sm.vhd
-- Author(s) : Nuno Sebastiao
-- Date : 07/02/07
--------------------------------------------------------------------------------
-- Description :
-- Memory BIST Controller State Machine for memory with bytewrite
-- This controller implements the following MARCH test:
-- {up(w01); up(r01,w10); up(r10,w01); down(r01,w10); down(r10,w01); up(r01);
-- up(w00); up(r00,w11); down(r11,w00); up(r00)}
-- The first 10 steps are done using the entire memory word length and using
-- port A while the last 6 steps are done for every byte and using port B.
--------------------------------------------------------------------------------
--------------------------------------------------------------------------------


entity mem_bist_controller_sm is
  generic (
    DATA_WIDTH : POSITIVE := 8
  );
  port(
    clk                : in  STD_LOGIC;
    rst                : in  STD_LOGIC;
    en                 : in  STD_LOGIC;
    addr_gen_endcount  : in  STD_LOGIC;
    go                 : in  STD_LOGIC;
    dout_cmp           : in  STD_LOGIC;
    bwen_end           : in  STD_LOGIC;
    fault_dtct         : out STD_LOGIC;
    bistend            : out STD_LOGIC;
    bisten             : out STD_LOGIC;
    bistoen            : out STD_LOGIC;
    bistwen            : out STD_LOGIC;
    bistcen            : out STD_LOGIC;
    bistportsel        : out STD_LOGIC;
    bistbwen_gen_en    : out STD_LOGIC;
    bistbwen_gen_reset : out STD_LOGIC;
    bistbwen_gen_din   : out STD_LOGIC;
    addr_gen_rst       : out STD_LOGIC;
    addr_gen_en        : out STD_LOGIC;
    addr_gen_dir       : out STD_LOGIC;
    cmp_reg_en         : out STD_LOGIC;
    pattern            : out STD_LOGIC_VECTOR (1 downto 0)
  );
end mem_bist_controller_sm;

architecture Behavioral of mem_bist_controller_sm is

  component reg_re
    generic (

    port (
      CLK  : in  STD_LOGIC;
      CE   : in  STD_LOGIC;
      RST  : in  STD_LOGIC;
      Din  : in  STD_LOGIC_VECTOR(WIDTH-1 downto 0);
      Dout : out STD_LOGIC_VECTOR(WIDTH-1 downto 0)
    );
  end component;

  type STATE_TYPE is (Idle, Init, Step1a, Step1, Step2, Step3, Step4, Step5, Step6, Step7, Step8,
                      Step9, Step10, Step11, Step12, Step13, Step14, Step15, Step16,
                      Init_1to2, Init_3to4, Init_5to6, Init_7to8, Init_9to10, Init_10to11,
                      Init_11to12, Init_13to14, Init_15to16, Pause, Finala, Final);

  signal curr_state, next_state : STATE_TYPE;
  signal return_state, next_return : STATE_TYPE;
  signal n_fault, addr_gen_en_s : STD_LOGIC;
  signal output_preserv : STD_LOGIC_VECTOR (3 downto 0);
  signal output_preserv_en : STD_LOGIC;
  signal addr_gen_dir_s, bistbwen_gen_din_s : STD_LOGIC;
  signal pattern_s : STD_LOGIC_VECTOR (1 downto 0);
  signal outp_to_preserve : STD_LOGIC_VECTOR (3 downto 0);

begin

  n_fault <= dout_cmp;
  addr_gen_dir <= addr_gen_dir_s;
  bistbwen_gen_din <= bistbwen_gen_din_s;
  pattern <= pattern_s;

  STATE_UPDATE: process(clk, rst, en, next_state, next_return)
  begin
    if (rst = '1') then
      curr_state <= IDLE;
      return_state <= IDLE;
    elsif (clk = '1' and clk'event) then
      if (en = '1') then
        curr_state <= next_state;
        return_state <= next_return;
      end if;
    end if;
  end process;

  NEXT_STATE_EVAL: process(curr_state, return_state, go, addr_gen_endcount, bwen_end, n_fault)
  begin

    next_state <= curr_state;
    next_return <= return_state;

    case curr_state is

      when Idle =>
        if (go = '1') then
          next_state <= Init;
        end if;

      when Init =>
        next_state <= Step1a;

      when Step1a =>
        next_state <= Step1;
        next_return <= Step1;

      when Step1 =>
        if addr_gen_endcount = '1' then
          next_state <= Init_1to2;
          next_return <= Init_1to2;
        end if;
        if n_fault = '0' then
          next_state <= Pause;
        end if;

      when Init_1to2 =>
        next_state <= Step2;
        next_return <= Step2;
        if n_fault = '0' then

      when Step2 =>
        next_state <= Step3;
        next_return <= Step3;
        if n_fault = '0' then

        else
          next_state <= Step2;
          next_return <= Step2;

        else
          next_state <= Step4;
          next_return <= Step4;
        end if;
        if n_fault = '0' then

























      when Step11 =>
        if (addr_gen_endcount = '1' and bwen_end = '1') then
          next_state <= Init_11to12;
          next_return <= Init_11to12;






















      when Step16 =>
        if (addr_gen_endcount = '1' and bwen_end = '1') then
          next_state <= Finala;
          next_return <= Finala;

      when Finala =>
        next_state <= Final;
        next_return <= Final;
        if n_fault = '0' then

      when Final =>
        if go = '1' then
          next_state <= Idle;
        end if;

      when Pause =>
        if (go = '1') then
          next_state <= return_state;
        end if;

      when others =>
        next_state <= Idle;


  ---------------------------------------------------------------------------
  -- Output preserve
  ---------------------------------------------------------------------------

  outp_to_preserve <= addr_gen_dir_s & bistbwen_gen_din_s & pattern_s;

  OUTPUTPRESERVE_REG: reg_re
    generic map (
      WIDTH => 4
    )
    port map (
      CLK  => clk,
      CE   => output_preserv_en,
      RST  => rst,
      Din  => outp_to_preserve,
      Dout => output_preserv
    );

  ---------------------------------------------------------------------------
  -- Signal assignment statements for combinatorial outputs
  ---------------------------------------------------------------------------

  addr_gen_en_s <= not addr_gen_endcount and n_fault;

  addr_gen_en_assignment:
  addr_gen_en <= '1' when (curr_state = Step1a) else
                 addr_gen_en_s when (curr_state = Step1) else
                 addr_gen_en_s when (curr_state = Step3) else
                 addr_gen_en_s when (curr_state = Step5) else
                 addr_gen_en_s when (curr_state = Step7) else
                 addr_gen_en_s when (curr_state = Step9) else
                 addr_gen_en_s when (curr_state = Step10) else
                 addr_gen_en_s and bwen_end when (curr_state = Step11) else
                 addr_gen_en_s and bwen_end when (

                 curr_state = Step16) else
                 '1' when (curr_state = Init_7to8) else -- for counter wrap around
                 go and not addr_gen_endcount when (curr_state = Pause and return_state = Step1) else

                 go and not addr_gen_endcount when (curr_state = Pause and return_state = Step10) else
                 go and not addr_gen_endcount and bwen_end when (curr_state = Pause and return_state = Step11) else
                 go and not addr_gen_endcount and bwen_end when (curr_state = Pause and return_state = Step12) else
                 go and not addr_gen_endcount and bwen_end when (curr_state = Pause and return_state = Step14) else
                 go and not addr_gen_endcount and bwen_end when (curr_state = Pause and return_state = Step16) else
                 '0';

  addr_gen_rst_assignment:
  addr_gen_rst <= '1' when (curr_state = Init) else
                  '1' when (curr_state = Init_1to2) else
                  '1' when (curr_state = Init_3to4) else
                  '1' when (curr_state = Init_9to10) else
                  '1' when (curr_state = Init_10to11) else
                  '1' when (curr_state = Init_11to12) else
                  '1' when (curr_state = Init_15to16) else
                  '0';

  addr_gen_dir_s_assignment:
  addr_gen_dir_s <= '1' when (curr_state = Step7) else
                    '1' when (curr_state = Init_7to8) else
                    '1' when (curr_state = Step9) else
                    '1' when (curr_state = Step15) else
                    '0' when (curr_state = Step1a) else
                    '0' when (curr_state = Step1) else
                    '0' when (curr_state = Step3) else
                    '0' when (curr_state = Step5) else
                    '0' when (curr_state = Step10) else
                    '0' when (curr_state = Step11) else
                    '0' when (curr_state = Step13) else
                    '0' when (curr_state = Step16) else
                    output_preserv(3) when (curr_state = Pause) else
                    'X';

  cmp_reg_en_assignment:
  cmp_reg_en <= '1' when (curr_state = Init) else
                '1' when (curr_state = Step1a) else
                go when (curr_state = Pause) else
                n_fault;

  pattern_s_assignment:
  pattern_s <= "01" when (curr_state = Step1a) else
               "01" when (curr_state = Step1) else
               "01" when (curr_state = Init_1to2) else
               "01" when (curr_state = Step2) else
               "10" when (curr_state = Step3) else
               "10" when (curr_state = Init_3to4) else
               "10" when (curr_state = Step4) else
               "01" when (curr_state = Step5) else
               "01" when (curr_state = Init_5to6) else
               "01" when (curr_state = Step6) else
               "10" when (curr_state = Step7) else
               "10" when (curr_state = Init_7to8) else
               "10" when (curr_state = Step8) else
               "01" when (curr_state = Step9) else
               "01" when (curr_state = Init_9to10) else
               "01" when (curr_state = Step10) else
               "01" when (curr_state = Init_10to11) else
               "00" when (curr_state = Step11) else
               "00" when (curr_state = Init_11to12) else
               "00" when (curr_state = Step12) else
               "11" when (curr_state = Step13) else
               "11" when (curr_state = Init_13to14) else
               "11" when (curr_state = Step14) else
               "00" when (curr_state = Step15) else
               "00" when (curr_state = Init_15to16) else
               "00" when (curr_state = Step16) else
               output_preserv(1 downto 0) when (curr_state = Pause) else
               "XX";

  bistwen_assignment:
  bistwen <= '0' when (curr_state = Step1a) else
             '0' when (curr_state = Step1) else
             '0' when (curr_state = Step3) else
             '0' when (curr_state = Step5) else
             '0' when (curr_state = Step7) else
             '0' when (curr_state = Step9) else
             '0' when (curr_state = Step11) else
             '0' when (curr_state = Step13) else
             '0' when (curr_state = Step15) else
             '1';

  bisten_assignment:
  bisten <= '1' when (curr_state = Idle) else
            '0';

  bistoen <= '0';
  bistcen <= '0';

  bistbwen_gen_en_assignment:
  bistbwen_gen_en <= '1' when (curr_state = Init_10to11) else
                     '1' when (curr_state = Step11) else
                     '1' when (curr_state = Step13) else
                     '1' when (curr_state = Step15) else
                     '1' when (curr_state = Step16) else
                     '0';

  bistbwen_gen_reset_assignment:
  bistbwen_gen_reset <= '1' when (curr_state = Init) else
                        '0';

  bistbwen_gen_din_s_assignment:
  bistbwen_gen_din_s <= bwen_end when (curr_state = Step11) else
                        bwen_end when (curr_state = Step13) else

                        '1' when (curr_state = Init_10to11) else
                        output_preserv(2) when (curr_state = Pause) else
                        'X';

  fault_dtct_assignment:
  fault_dtct <= '1' when (curr_state = Pause) else
                '0';

  output_preserv_en_assignment:
  output_preserv_en <= '0' when (curr_state = Idle) else
                       '0' when (curr_state = Pause) else
                       '1';

  bistportsel_assignment:
  bistportsel <= '1' when (curr_state = Step11) else
                 '1' when (curr_state = Init_11to12) else
                 '1' when (curr_state = Step12) else

                 '0';

  bistend_assignment:
  bistend <= '1' when (curr_state = Final) else
             '0';

end Behavioral;
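
To make the march sequence stated in the header of mem_bist_controller_sm.vhd easier to follow, a simulation-only reference model is sketched below. It is not part of the original design and does not reproduce the controller timing or the per-byte phase: it only applies the ten march elements, over whole words, to a small behavioural memory with one injected stuck-at-0 fault and counts the read mismatches. The entity name march_model_tb, the memory size (DEPTH, WIDTH) and the fault location are hypothetical choices made for illustration.

```vhdl
library IEEE;
use IEEE.STD_LOGIC_1164.all;

entity march_model_tb is
end march_model_tb;

architecture sim of march_model_tb is
  -- Hypothetical sizes, chosen only to keep the simulation short.
  constant DEPTH : positive := 16;
  constant WIDTH : positive := 8;

  subtype word_t is std_logic_vector(WIDTH-1 downto 0);
  subtype patt_t is std_logic_vector(1 downto 0);
  type mem_t is array (0 to DEPTH-1) of word_t;

  -- Replicate the 2-bit pattern over the whole word, as BISTCTR_DATAOUT does.
  function expand (p : patt_t) return word_t is
    variable w : word_t;
  begin
    for i in 0 to WIDTH/2-1 loop
      w(2*i+1 downto 2*i) := p;
    end loop;
    return w;
  end function;
begin

  MARCH : process
    variable mem        : mem_t;
    variable mismatches : natural := 0;

    -- Memory write with an injected stuck-at-0 fault on bit 3 of word 5.
    procedure mem_write (a : in natural; d : in word_t) is
    begin
      mem(a) := d;
      mem(5)(3) := '0';
    end procedure;

    -- One march element: optional read-and-compare, then optional write.
    procedure march_element (up    : in boolean;
                             do_rd : in boolean; r : in patt_t;
                             do_wr : in boolean; w : in patt_t) is
      variable a : natural;
    begin
      for k in 0 to DEPTH-1 loop
        if up then a := k; else a := DEPTH-1-k; end if;
        if do_rd and (mem(a) /= expand(r)) then
          mismatches := mismatches + 1;
        end if;
        if do_wr then
          mem_write(a, expand(w));
        end if;
      end loop;
    end procedure;
  begin
    march_element(true,  false, "00", true,  "01");  -- up(w01)
    march_element(true,  true,  "01", true,  "10");  -- up(r01,w10)
    march_element(true,  true,  "10", true,  "01");  -- up(r10,w01)
    march_element(false, true,  "01", true,  "10");  -- down(r01,w10)
    march_element(false, true,  "10", true,  "01");  -- down(r10,w01)
    march_element(true,  true,  "01", false, "00");  -- up(r01)
    march_element(true,  false, "00", true,  "00");  -- up(w00)
    march_element(true,  true,  "00", true,  "11");  -- up(r00,w11)
    march_element(false, true,  "11", true,  "00");  -- down(r11,w00)
    march_element(true,  true,  "00", false, "00");  -- up(r00)

    assert mismatches > 0
      report "injected fault was not detected" severity failure;
    report "march sequence finished, read mismatches = " & integer'image(mismatches);
    wait;
  end process;

end sim;
```

In this model the faulty bit is expected to read '1' only after the w10 and w11 writes, so the mismatches are raised by the r10 and r11 reads of word 5. The hardware controller instead stops in the Pause state at the first mismatch and asserts fault_dtct.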

B Scripts and Configuration Files

Contents

B.1 Synopsys
B.2 Cadence

B.1 Synopsys

B.1.1 Configuration Files

#
#
# ".synopsys_dc.setup" Initialization File for
#
# Dc_Shell and Design_Analyzer
#

.....

#
# Site-Specific Variables
#

# from the System Variable Group
set link_force_case "check_reference"

set synthetic_library ""

set target_library { /home2/ncas/asiclibs/UMC18/faraday/fsa0a_c/2005Q4v1.2/SC/FrontEnd/synopsys/fsa0a_c_sc_wc.db /home2/ncas/asiclibs/UMC18/faraday/fsa0a_c/2005Q4v1.2/IO/FrontEnd/synopsys/fsa0a_c_io_wc.db }

set link_library [concat * $target_library]

set physical_library ""

set search_path [list . ${synopsys_root}/libraries/syn ${synopsys_root}/dw/sim_ver ${synopsys_root}/dw/syn_ver /home2/ncas/asiclibs/UMC18/faraday/fsa0a_c/2005Q4v1.2/SC/FrontEnd/synopsys /home2/ncas/asiclibs/UMC18/faraday/fsa0a_c/2005Q4v1.2/IO/FrontEnd/synopsys ]

set command_log_file "./command.log"
set designer "Nuno Sebastiao"
set company "INESC-ID"
set find_converts_name_lists "false"

set symbol_library { /home2/ncas/asiclibs/UMC18/faraday/fsa0a_c/2005Q4v1.2/SC/FrontEnd/synopsys/fsa0a_c_sc.sdb /home2/ncas/asiclibs/UMC18/faraday/fsa0a_c/2005Q4v1.2/IO/FrontEnd/synopsys/fsa0a_c_io.sdb }

.....

B.1.2 Scripts

B.1.2.A Design Compiler Script file

##############################################
# Synopsys Design Compiler Script File
##############################################

set reanalyze 1
set jtag 1
set scan 1

set version "v3.4_io_tryB"

set design_directory "~/synopsys/work/amepv3.4_io"

set log_directory "~/synopsys/work/syn/log"
set db_directory "~/synopsys/work/syn/db"
set report_directory "~/synopsys/work/syn/reports"
set sclib_directory "/home2/ncas/asiclibs/UMC18/faraday/fsa0a_c/2005Q4v1.2/SC/FrontEnd/synopsys"
set iolib_directory "/home2/ncas/asiclibs/UMC18/faraday/fsa0a_c/2005Q4v1.2/IO/FrontEnd/synopsys"

set hdlin_enable_dft_drc_info true

set_min_library $sclib_directory/fsa0a_c_sc_wc.db -min_version $sclib_directory/fsa0a_c_sc_bc.db

set_min_library $iolib_directory/fsa0a_c_io_wc.db -min_version $iolib_directory/fsa0a_c_io_bc.db

if {$reanalyze} {

  analyze -library WORK -format vhdl "
    $design_directory/amep_config_pack.vhd
    $design_directory/amep_alias_pack.vhd
    $design_directory/functions_pack.vhd
    $design_directory/misc_logic/and.vhd
    $design_directory/misc_logic/or.vhd
    $design_directory/misc_logic/xor.vhd
    $design_directory/misc_logic/decoder_5bit.vhd
    $design_directory/misc_logic/pencoder_4bit.vhd
    $design_directory/misc_logic/buffer_tristate.vhd
    $design_directory/misc_logic/arithmetic/adder_half.vhd
    $design_directory/misc_logic/arithmetic/adder_full.vhd
    $design_directory/misc_logic/arithmetic/adder_cla_pack.vhd
    $design_directory/misc_logic/arithmetic/adder_cla_blockA.vhd
    $design_directory/misc_logic/arithmetic/adder_cla_blockB.vhd
    $design_directory/misc_logic/arithmetic/adder_cla.vhd
    $design_directory/misc_logic/arithmetic/adder_csa.vhd
    $design_directory/misc_logic/arithmetic/PrefixAnd.vhd
    $design_directory/misc_logic/arithmetic/Incrementer.vhd
    $design_directory/misc_logic/arithmetic/multiplier.vhd
    $design_directory/misc_logic/comparators/cmp_eq.vhd
    $design_directory/misc_logic/comparators/cmp_lt.vhd
    $design_directory/misc_logic/counters/cntr_re.vhd
    $design_directory/misc_logic/counters/cntr_circular.vhd
    $design_directory/misc_logic/counters/cntr_circular_ld.vhd
    $design_directory/misc_logic/multiplexers/mux_2to1.vhd
    $design_directory/misc_logic/multiplexers/mux_4to1.vhd
    $design_directory/misc_logic/multiplexers/mux_8to1.vhd
    $design_directory/misc_logic/multiplexers/mux_16to1.vhd
    $design_directory/misc_logic/multiplexers/mux_32to1.vhd
    $design_directory/misc_logic/multiplexers/muxb_4to1.vhd
    $design_directory/misc_logic/registers/reg_e.vhd
    $design_directory/misc_logic/registers/reg_re.vhd
    $design_directory/misc_logic/registers/reg_le.vhd
    $design_directory/misc_logic/registers/reg_le_const.vhd
    $design_directory/misc_logic/registers/reg_se.vhd
    $design_directory/memories/updown_counter.vhd
    $design_directory/memories/shift_reg.vhd
    $design_directory/memories/mem_bist_controller_sm.vhd
    $design_directory/memories/mem_bist_controller_1port.vhd
    $design_directory/memories/mem_bist_controller_2port.vhd
    $design_directory/memories/SU180_1024X8X2BM1_WRAPPER.vhd
    $design_directory/memories/SJ180_2048X8X1BM1_WRAPPER.vhd
    $design_directory/memories/SJ180_512X8X1BM1_WRAPPER.vhd
    $design_directory/amep_units/sadu/cout_detector_B_block.vhd
    $design_directory/amep_units/sadu/cout_detector_A_block.vhd
    $design_directory/amep_units/sadu/cout_detector.vhd
    $design_directory/amep_units/sadu/sad_cmp.vhd
    $design_directory/amep_units/sadu/amep_sadu_lp.vhd
    $design_directory/amep_units/sadu/amep_sadu_parallel_adder.vhd
    $design_directory/amep_units/sadu/amep_sadu_parallel.vhd
    $design_directory/amep_units/agu_2/amep_agu_multiplier.vhd
    $design_directory/amep_units/agu_2/amep_agu_addr_decoder.vhd
    $design_directory/amep_units/agu_2/amep_agu_controller_ld_sm.vhd
    $design_directory/amep_units/agu_2/amep_agu_controller_ld.vhd
    $design_directory/amep_units/agu_2/amep_agu_controller_sad_sm.vhd
    $design_directory/amep_units/agu_2/amep_agu_controller_sad.vhd
    $design_directory/amep_units/agu_2/amep_agu.vhd
    $design_directory/amep_units/amep_alu.vhd
    $design_directory/amep_units/amep_ad_unit.vhd
    $design_directory/amep_core_id_decoder_sm.vhd
    $design_directory/amep_core_id_decoder.vhd
    $design_directory/amep_core_if.vhd
    $design_directory/amep_core_id.vhd
    $design_directory/amep_core_exe.vhd
    $design_directory/amep_core.vhd
    $design_directory/io_cells.vhd
    $design_directory/amep_core_iocells.vhd
  "

  if {$jtag} {

    analyze -library WORK -format vhdl "
      $design_directory/jtag_io_cells.vhd
      $design_directory/amep_core_iocells_jtag.vhd
    "

  }

}

elaborate amep_core_iocells -architecture Behavioral -library WORK

set compile_delete_unloaded_sequential_cells false

set_operating_conditions -min BCCOM -min_library fsa0a_c_sc_bc -max WCCOM -max_library fsa0a_c_sc_wc

set_wire_load_mode top

set_wire_load_model -name G30K -library fsa0a_c_sc_wc

set_critical_range 1 amep_core_iocells

create_clock -name "CLK_IN" -period 10 -waveform { 0 5 } { CLK }

set_clock_uncertainty 0.1 CLK_IN

set_drive 0 CLK

set_max_dynamic_power 9.5 mW

if {$scan} {

set_scan_configuration -style multiplexed_flip_flop

set test_default_strobe 40.0set test_default_strobe_width 1.0set test_default_bidir_delay 0.0set test_default_delay 0.0set test_default_period 100.0

set_dft_signal -view existing_dft -type ScanClock -port CLK -timing [list 1 21]
set_dft_signal -view existing_dft -type Reset -port RST -active_state 1
set_dft_signal -type ScanEnable -port test_se -hookup_pin IO_CELLS_INST/test_se_i -active_state 1


set_dft_signal -type ScanEnable -port test_se2 -hookup_pin IO_CELLS_INST/test_se2_i -active_state 1

set_dft_signal -type TestMode -port test_mode -hookup_pin IO_CELLS_INST/test_mode_i -active_state 1

set_dft_signal -type ScanDataOut -port test_so1 -hookup_pin IO_CELLS_INST/test_so1_i
set_dft_signal -type ScanDataOut -port test_so2 -hookup_pin IO_CELLS_INST/test_so2_i
set_dft_signal -type ScanDataIn -port test_si1 -hookup_pin IO_CELLS_INST/test_si1_i
set_dft_signal -type ScanDataIn -port test_si2 -hookup_pin IO_CELLS_INST/test_si2_i

create_test_protocol

dft_drc

set_dft_configuration -fix_clock enable -fix_set enable -fix_reset enable

}

if {$scan} {

compile -scan -map_effort high -area_effort medium -power_effort high

} else {

compile -map_effort high -area_effort medium -power_effort high

}

report_constraint -all_violators

if {$scan} {

create_test_protocol

dft_drc

set_scan_configuration -replace false

set_scan_path chain1 -view spec -include_elements {AMEP_CORE_INST/FETCH/CODE_RAM/BIST_CTRL/CURR_ADDRESS_REG AMEP_CORE_INST/EXECUTE/AGU/SA_MEMORY/BIST_CTRL/CURR_ADDRESS_REG AMEP_CORE_INST/EXECUTE/AGU/MB_MEMORY/BIST_CTRL/CURR_ADDRESS_REG} -complete true -scan_enable test_se -scan_data_in test_si1 -scan_data_out test_so1

set_scan_path chain2 -scan_enable test_se2 -scan_data_in test_si2 -scan_data_out test_so2

set_dft_configuration -fix_clock enable -fix_set enable -fix_reset enable

set_dft_signal -type TestData -port CLK
set_dft_signal -type TestData -port RST

set_autofix_configuration -type reset -test_data RST

preview_dft -show all

dft_drc -v

insert_dft


dft_drc -v

report_scan_path

estimate_test_coverage

set test_stil_netlist_format verilog

set version "scan_$version"

if {$jtag} {

set version "jtag_$version"

remove_cell {IO_CELLS_INST/ram_bisten0_iocell IO_CELLS_INST/ram_bisten1_iocell IO_CELLS_INST/test_se_iocell IO_CELLS_INST/test_se2_iocell IO_CELLS_INST/test_si1_iocell IO_CELLS_INST/test_si2_iocell IO_CELLS_INST/test_so1_iocell IO_CELLS_INST/test_so2_iocell}

remove_net {IO_CELLS_INST/RAM_BISTEN_i[0] IO_CELLS_INST/RAM_BISTEN_i[1] IO_CELLS_INST/test_se_i IO_CELLS_INST/test_se2_i IO_CELLS_INST/test_si1_i IO_CELLS_INST/test_si2_i IO_CELLS_INST/test_so1_i IO_CELLS_INST/test_so2_i}

connect_net IO_CELLS_INST/RAM_BISTEN[0] IO_CELLS_INST/RAM_BISTEN_i[0]
connect_net IO_CELLS_INST/RAM_BISTEN[1] IO_CELLS_INST/RAM_BISTEN_i[1]
connect_net IO_CELLS_INST/test_se IO_CELLS_INST/test_se_i
connect_net IO_CELLS_INST/test_se2 IO_CELLS_INST/test_se2_i
connect_net IO_CELLS_INST/test_si1 IO_CELLS_INST/test_si1_i
connect_net IO_CELLS_INST/test_si2 IO_CELLS_INST/test_si2_i
connect_net IO_CELLS_INST/test_so1 IO_CELLS_INST/test_so1_i
connect_net IO_CELLS_INST/test_so2 IO_CELLS_INST/test_so2_i

elaborate amep_core_iocells_jtag -architecture Structural -library WORK

current_design ./amep_core_iocells_jtag.db:amep_core_iocells_jtag

set_dft_configuration -bsd enable -scan disable

set_dont_touch amep_core_iocells

set synthetic_library {dw_foundation.sldb}
set link_library [concat $target_library $synthetic_library *]

set_operating_conditions -min BCCOM -min_library fsa0a_c_sc_bc -max WCCOM -max_library fsa0a_c_sc_wc

set_wire_load_mode top

set_wire_load_model -name G30K -library fsa0a_c_sc_wc

set_critical_range 1 amep_core_iocells

create_clock -name "CLK_IN" -period 10 -waveform { 0 5 } { CLK }

set_clock_uncertainty 0.1 CLK_IN

set_drive 0 CLK

set_dft_signal -view existing_dft -type TCK -port tck -timing {10 30}

set_max_dynamic_power 9.5 mW

disconnect_net -all *Logic0*

set_dft_signal -view spec -type tck -port tck
set_dft_signal -view spec -type tdi -port tdi
set_dft_signal -view spec -type tdo -port tdo
set_dft_signal -view spec -type tms -port tms
set_dft_signal -view spec -type trst -port trst

read_pin_map $design_directory/amep_package1.map

set_bsd_configuration -style synchronous -instruction_encoding binary -ir_width 4 -asynchronous_reset true -check_pad_designs all -control_cell_max_fanout 3

set_bsd_instruction {IDCODE} -register DEVICE_ID -capture_value {16'h13333111} -code {0110}

set_bsd_instruction {HIGHZ} -code {0111}

set_bsd_instruction {SELECTSAMEM} -register BYPASS -code {1001} -inst_enable {AMEP_CORE_IOCELLS_INST/RAM_BISTEN[0]}

set_bsd_instruction {SELECTMBMEM} -register BYPASS -code {1010} -inst_enable {AMEP_CORE_IOCELLS_INST/RAM_BISTEN[1]}

set_bsd_instruction {SELECTINSTMEM} -register BYPASS -code {1011} -inst_enable {AMEP_CORE_IOCELLS_INST/RAM_BISTEN[1] AMEP_CORE_IOCELLS_INST/RAM_BISTEN[0]}

set_bsd_instruction {EXTEST} -code {0010}

set_bsd_instruction {SAMPLE PRELOAD} -code {0011}

preview_dft -bsd all
insert_dft

write_bsdl -output $db_directory/amep_compiled_$version.bsdl

check_bsd

create_bsd_patterns -effort high -type functional


}

write_test_protocol -output $db_directory/amep_compiled_$version.spf

}

write -hierarchy -format ddc -output $db_directory/amep_compiled_${version}_clk_corrected.ddc

write_sdc $db_directory/amep_compiled_$version.sdc

report_power -analysis_effort high > $report_directory/power_$version.rpt


report_area -hierarchy -nosplit > $report_directory/area_$version.rpt

report_constraint -significant_digits 2 > $report_directory/constraints_$version.rpt

report_timing -path full -delay max -nworst 1 -max_paths 1 -significant_digits 2 -sort_by group > $report_directory/timing_$version.rpt

change_names -hierarchy -rules verilog

write -hierarchy -format verilog -output $db_directory/amep_compiled_$version.v

exit

B.1.2.B Tetramax Script file

#######################################
# TetraMAX Script File
#######################################

read netlist /home2/ncas/asiclibs/UMC18/faraday/fsa0a_c/2005Q4v1.2/SC/FrontEnd/tetramax/fsa0a_c_sc_tmax.lib -library

read netlist /home2/ncas/asiclibs/UMC18/faraday/fsa0a_c/2005Q4v1.2/IO/FrontEnd/tetramax/fsa0a_c_io_tmax.lib -library

read netlist /home2/ncas/asiclibs/UMC18/faraday/mem_files/7june07/SU180_1024X8X2BM1/SU180_1024X8X2BM1.tmax -library

read netlist /home2/ncas/asiclibs/UMC18/faraday/mem_files/7june07/SJ180_2048X8X1BM1/SJ180_2048X8X1BM1.tmax -library

read netlist /home2/ncas/asiclibs/UMC18/faraday/mem_files/7june07/SJ180_512X8X1BM1/SJ180_512X8X1BM1.tmax -library

read netlist /home/ncas/synopsys/work/syn/db/amep_compiled_jtag_scan_v3.4_io_tryB.v

run build_model amep_core_iocells_jtag

set drc /home/ncas/synopsys/work/syn/db/amep_compiled_jtag_scan_v3.4_io_tryB.spf
run drc

set atpg -full_seq_time 0
set atpg -abort_limit 30

remove faults -all

add faults -all

set faults -fault_coverage

run atpg basic_scan -ndetects 2

run atpg fast_sequential_only

run atpg full_sequential_only

report faults -summary

write patterns /home/ncas/synopsys/work/test/amep_v3.3_io_scan_test_patterns.bin -internal -format binary

write patterns /home/ncas/synopsys/work/test/amep_v3.3_io_scan_test_patterns.vhdl -internal -format vhdl93 -serial

write patterns /home/ncas/synopsys/work/test/amep_v3.3_io_scan_test_patterns.parallel.stil99 -internal -format stil99 -nopatinfo -parallel 0 -nocore

write patterns /home/ncas/synopsys/work/test/amep_v3.3_io_scan_test_patterns.serial.stil99 -internal -format stil99 -nopatinfo -serial -nocore

write patterns /home/ncas/synopsys/work/test/amep_v3.3_io_scan_test_patterns.parallell.v -internal -format verilog_single_file -parallel 0

write patterns /home/ncas/synopsys/work/test/amep_v3.3_io_scan_test_patterns.serial.v -internal -format verilog_single_file -serial

set patterns external /home/ncas/synopsys/work/test/amep_v3.3_io_scan_test_patterns.serial.stil99

remove faults -all

add faults -all

run fault_sim -sequential -nodrop_faults -ndetects 1

B.2 Cadence

B.2.1 Configuration Files

B.2.1.A Configuration file for importing the design to Encounter

#################################################
#                                               #
#  FirstEncounter Input configuration file      #
#                                               #
#################################################
# Created by First Encounter v04.10-s374_1 on Fri May 11 14:18:51 2007
global rda_Input
set cwd /home/ncas/synopsys/work/cadence
set rda_Input(import_mode) {-treatUndefinedCellAsBbox 0 -keepEmptyModule 1 }
set rda_Input(ui_netlist) "../syn/db/amep_compiled_jtag_scan_v3.4_io_tryB_uniquified.v"
set rda_Input(ui_netlisttype) {Verilog}
set rda_Input(ui_ilmlist) {}
set rda_Input(ui_ilmspef) {}
set rda_Input(ui_settop) {1}
set rda_Input(ui_topcell) {amep_core_iocells_jtag}
set rda_Input(ui_celllib) {}
set rda_Input(ui_iolib) {}
set rda_Input(ui_areaiolib) {}
set rda_Input(ui_blklib) {}
set rda_Input(ui_kboxlib) {}
set rda_Input(ui_gds_file) {}
set rda_Input(ui_oa_oa2lefversion) {}
set rda_Input(ui_timelib,min) "/home2/ncas/asiclibs/UMC18/faraday/mem_files/7june07/SU180_1024X8X2BM1/SU180_1024X8X2BM1_BC.lib /home2/ncas/asiclibs/UMC18/faraday/mem_files/7june07/SJ180_2048X8X1BM1/SJ180_2048X8X1BM1_BC.lib /home2/ncas/asiclibs/UMC18/faraday/mem_files/7june07/SJ180_512X8X1BM1/SJ180_512X8X1BM1_BC.lib /home2/ncas/asiclibs/UMC18/faraday/fsa0a_c/2005Q4v1.2/SC/FrontEnd/tlf/fsa0a_c_sc_bc.tlf /home2/ncas/asiclibs/UMC18/faraday/fsa0a_c/2005Q4v1.2/IO/FrontEnd/synopsys/fsa0a_c_io_bc.lib"
set rda_Input(ui_timelib,max) "/home2/ncas/asiclibs/UMC18/faraday/mem_files/7june07/SU180_1024X8X2BM1/SU180_1024X8X2BM1_WC.lib /home2/ncas/asiclibs/UMC18/faraday/mem_files/7june07/SJ180_2048X8X1BM1/SJ180_2048X8X1BM1_WC.lib /home2/ncas/asiclibs/UMC18/faraday/mem_files/7june07/SJ180_512X8X1BM1/SJ180_512X8X1BM1_WC.lib /home2/ncas/asiclibs/UMC18/faraday/fsa0a_c/2005Q4v1.2/SC/FrontEnd/tlf/fsa0a_c_sc_wc.tlf /home2/ncas/asiclibs/UMC18/faraday/fsa0a_c/2005Q4v1.2/IO/FrontEnd/synopsys/fsa0a_c_io_wc.lib"
set rda_Input(ui_timelib) "/home2/ncas/asiclibs/UMC18/faraday/mem_files/7june07/SU180_1024X8X2BM1/SU180_1024X8X2BM1_TC.lib /home2/ncas/asiclibs/UMC18/faraday/mem_files/7june07/SJ180_2048X8X1BM1/SJ180_2048X8X1BM1_TC.lib /home2/ncas/asiclibs/UMC18/faraday/mem_files/7june07/SJ180_512X8X1BM1/SJ180_512X8X1BM1_TC.lib /home2/ncas/asiclibs/UMC18/faraday/fsa0a_c/2005Q4v1.2/SC/FrontEnd/tlf/fsa0a_c_sc_tc.tlf /home2/ncas/asiclibs/UMC18/faraday/fsa0a_c/2005Q4v1.2/IO/FrontEnd/synopsys/fsa0a_c_io_tc.lib"
set rda_Input(ui_smodDef) {}
set rda_Input(ui_smodData) {}
set rda_Input(ui_dpath) {}
set rda_Input(ui_tech_file) {}
set rda_Input(ui_io_file) {}
set rda_Input(ui_timingcon_file) "../syn/db/amep_compiled_jtag_scan_v3.4_io_tryB.sdc"
set rda_Input(ui_latency_file) {}
set rda_Input(ui_scheduling_file) {}
set rda_Input(ui_buf_footprint) {}
set rda_Input(ui_delay_footprint) {}
set rda_Input(ui_inv_footprint) {}
set rda_Input(ui_leffile) "/home2/ncas/asiclibs/UMC18/faraday/fsa0a_c/2005Q4v1.2/SC/BackEnd/lef/header6_V55.lef /home2/ncas/asiclibs/UMC18/faraday/mem_files/7june07/SU180_1024X8X2BM1/SU180_1024X8X2BM1.lef /home2/ncas/asiclibs/UMC18/faraday/mem_files/7june07/SJ180_2048X8X1BM1/SJ180_2048X8X1BM1.lef /home2/ncas/asiclibs/UMC18/faraday/mem_files/7june07/SJ180_512X8X1BM1/SJ180_512X8X1BM1.lef /home2/ncas/asiclibs/UMC18/faraday/fsa0a_c/2005Q4v1.2/SC/BackEnd/lef/fsa0a_c_sc.lef /home2/ncas/asiclibs/UMC18/faraday/fsa0a_c/2005Q4v1.2/IO/BackEnd/lef/fsa0a_c_io.lef"
set rda_Input(ui_cts_cell_footprint) {}
set rda_Input(ui_cts_cell_list) {}
set rda_Input(ui_core_cntl) {aspect}
set rda_Input(ui_aspect_ratio) {0.441368}
set rda_Input(ui_core_util) {0.20228}
set rda_Input(ui_core_height) {935.0}
set rda_Input(ui_core_width) {2649.44}
set rda_Input(ui_core_to_left) {40.56}
set rda_Input(ui_core_to_right) {40.0}
set rda_Input(ui_core_to_top) {40.0}
set rda_Input(ui_core_to_bottom) {40.0}
set rda_Input(ui_max_io_height) {0}
set rda_Input(ui_row_height) {5.04}
set rda_Input(ui_isHorTrackHalfPitch) {0}
set rda_Input(ui_isVerTrackHalfPitch) {1}
set rda_Input(ui_ioOri) {R0}
set rda_Input(ui_isOrigCenter) {0}
set rda_Input(ui_exc_net) {}
set rda_Input(ui_delay_limit) {1000}
set rda_Input(ui_net_delay) {1000.0ps}
set rda_Input(ui_net_load) {0.5pf}
set rda_Input(ui_in_tran_delay) {0.1ps}
set rda_Input(ui_captbl_file) "/home2/ncas/asiclibs/UMC18/UMC18_1P6M_MMC/umc18MMC.capTbl"
set rda_Input(ui_defcap_scale) {1.0}
set rda_Input(ui_detcap_scale) {1.0}
set rda_Input(ui_xcap_scale) {1.0}
set rda_Input(ui_res_scale) {1.0}
set rda_Input(ui_shr_scale) {1.0}
set rda_Input(ui_time_unit) {none}
set rda_Input(ui_cap_unit) {}
set rda_Input(ui_oa_reflib) {}
set rda_Input(ui_oa_abstractname) {}
set rda_Input(ui_oa_layoutname) {}
set rda_Input(ui_sigstormlib) {}
set rda_Input(ui_cdb_file) {}
set rda_Input(ui_echo_file) {}
set rda_Input(ui_xilm_file) {}
set rda_Input(ui_qxtech_file) {}
set rda_Input(ui_qxlib_file) {}
set rda_Input(ui_qxconf_file) {}
set rda_Input(ui_pwrnet) {VCC}
set rda_Input(ui_gndnet) {GND}
set rda_Input(flip_first) {1}
set rda_Input(double_back) {1}
set rda_Input(assign_buffer) {1}
set rda_Input(ui_pg_connections) ""
set rda_Input(ui_gen_footprint) {0}

B.2.1.B I/O assignment file

#######################################################
#                                                     #
#  Silicon Perspective Corp.                          #
#  FirstEncounter IO Assignment                       #
#                                                     #
#######################################################

Version: 1

### NORTH SIDE ###

Orient: R180
Offset: 140.12

Pad: AMEP_CORE_IOCELLS_INST/IO_CELLS_INST/address_iocell_3 N
Skip: 104.78
Pad: AMEP_CORE_IOCELLS_INST/IO_CELLS_INST/address_iocell_4 N
Skip: 104.78
Pad: AMEP_CORE_IOCELLS_INST/IO_CELLS_INST/address_iocell_5 N
Skip: 104.78
Pad: AMEP_CORE_IOCELLS_INST/IO_CELLS_INST/address_iocell_6 N
Skip: 104.78
Pad: AMEP_CORE_IOCELLS_INST/IO_CELLS_INST/address_iocell_7 N
Skip: 104.78
Pad: IO_VCC_0 N VCC3IOD
Skip: 104.78
Pad: AMEP_CORE_IOCELLS_INST/IO_CELLS_INST/address_iocell_8 N
Skip: 104.78
Pad: AMEP_CORE_IOCELLS_INST/IO_CELLS_INST/address_iocell_9 N
Skip: 104.78
Pad: AMEP_CORE_IOCELLS_INST/IO_CELLS_INST/address_iocell_10 N
Skip: 104.78
Pad: AMEP_CORE_IOCELLS_INST/IO_CELLS_INST/address_iocell_11 N
Skip: 104.78
Pad: AMEP_CORE_IOCELLS_INST/IO_CELLS_INST/address_iocell_12 N
Skip: 104.78
Pad: IO_GND_0 N GNDIOD
Skip: 104.78
Pad: AMEP_CORE_IOCELLS_INST/IO_CELLS_INST/address_iocell_13 N
Skip: 104.78
Pad: AMEP_CORE_IOCELLS_INST/IO_CELLS_INST/address_iocell_14 N
Skip: 104.78
Pad: AMEP_CORE_IOCELLS_INST/IO_CELLS_INST/address_iocell_15 N
Skip: 104.78
Pad: AMEP_CORE_IOCELLS_INST/IO_CELLS_INST/address_iocell_16 N
Skip: 104.78
Pad: AMEP_CORE_IOCELLS_INST/IO_CELLS_INST/address_iocell_17 N

### WEST SIDE ###


Pad: JTAG_IO_CELLS_INST/tdi_iocell W
Skip: 227.54
Pad: CORE_VCC_0 W VCCKD
Skip: 130.82
Pad: AMEP_CORE_IOCELLS_INST/IO_CELLS_INST/clk_iocell W
Skip: 130.82
Pad: CORE_GND_0 W GNDKD
Skip: 34.1
Pad: AMEP_CORE_IOCELLS_INST/IO_CELLS_INST/address_iocell_0 W
Skip: 34.1
Pad: AMEP_CORE_IOCELLS_INST/IO_CELLS_INST/address_iocell_1 W
Skip: 34.1
Pad: AMEP_CORE_IOCELLS_INST/IO_CELLS_INST/address_iocell_2 W

### SOUTH SIDE ###


Pad: AMEP_CORE_IOCELLS_INST/IO_CELLS_INST/test_mode_iocell S
Skip: 104.78
Pad: JTAG_IO_CELLS_INST/trst_iocell S
Skip: 104.78
Pad: JTAG_IO_CELLS_INST/tms_iocell S
Skip: 104.78
Pad: JTAG_IO_CELLS_INST/tdo_iocell S
Skip: 104.78
Pad: AMEP_CORE_IOCELLS_INST/IO_CELLS_INST/gnt_iocell S
Skip: 104.78
Pad: IO_GND_1 S GNDIOD
Skip: 104.78
Pad: AMEP_CORE_IOCELLS_INST/IO_CELLS_INST/req_iocell S
Skip: 104.78
Pad: AMEP_CORE_IOCELLS_INST/IO_CELLS_INST/done_iocell S
Skip: 104.78
Pad: AMEP_CORE_IOCELLS_INST/IO_CELLS_INST/data_iocell_0 S
Skip: 104.78
Pad: AMEP_CORE_IOCELLS_INST/IO_CELLS_INST/data_iocell_1 S
Skip: 104.78
Pad: AMEP_CORE_IOCELLS_INST/IO_CELLS_INST/data_iocell_2 S
Skip: 104.78
Pad: IO_VCC_1 S VCC3IOD
Skip: 104.78
Pad: AMEP_CORE_IOCELLS_INST/IO_CELLS_INST/data_iocell_3 S
Skip: 104.78
Pad: AMEP_CORE_IOCELLS_INST/IO_CELLS_INST/data_iocell_4 S
Skip: 104.78
Pad: AMEP_CORE_IOCELLS_INST/IO_CELLS_INST/data_iocell_5 S
Skip: 104.78
Pad: AMEP_CORE_IOCELLS_INST/IO_CELLS_INST/data_iocell_6 S
Skip: 104.78
Pad: AMEP_CORE_IOCELLS_INST/IO_CELLS_INST/data_iocell_7 S

### EAST SIDE ###


Pad: AMEP_CORE_IOCELLS_INST/IO_CELLS_INST/rst_iocell E
Skip: 34.1
Pad: AMEP_CORE_IOCELLS_INST/IO_CELLS_INST/en_iocell E
Skip: 130.82
Pad: CORE_GND_1 E GNDKD
Skip: 130.82
Pad: JTAG_IO_CELLS_INST/tck_iocell E
Skip: 130.82
Pad: CORE_VCC_1 E VCCKD
Skip: 34.1
Pad: AMEP_CORE_IOCELLS_INST/IO_CELLS_INST/oe_nwe_iocell E
Skip: 34.1
Pad: AMEP_CORE_IOCELLS_INST/IO_CELLS_INST/address_iocell_19 E
Skip: 38.44
Pad: AMEP_CORE_IOCELLS_INST/IO_CELLS_INST/address_iocell_18 E

Orient: R0
Pad: NE_CORNER NE CORNERD

Orient: R90
Pad: NW_CORNER NW CORNERD

Orient: R180
Pad: SW_CORNER SW CORNERD

Orient: R270
Pad: SE_CORNER SE CORNERD

B.2.1.C Clock Tree Synthesis configuration file

#
# FirstEncounter(TM) Clock Synthesis Technology File Format
#

AutoCTSRootPin CLK
RouteClkNet YES
NoGating NO

DetailReport YES
SetDPinAsSync NO
PostOpt YES


OptAddBuffer YES

MaxDelay 1.5ns
MinDelay 0ns # default value

MaxDepth 8
Buffer BUF12CK BUF8CK BUF6CK BUF4CK BUF3CK BUF2CK BUF1CK INV12CK INV8CK INV6CK INV4CK INV3CK INV2CK INV1CK DELA DELB DELC DLY1 DLY2 DLY3 DLY4

MaxSkew 300ps
SinkMaxTran 150ps
BufMaxTran 150ps

ExcludedPin
+ amep_core_iocells_jtag_BSR_top_inst/amep_core_iocells_jtag_data_in_1

ThroughPin

End

B.2.2 Scripts

############################################
# Cadence Encounter Script File
############################################

loadConfig /home/ncas/synopsys/work/cadence/amep_compiled_jtag_scan_v3.4_io_tryB.conf 0

setUIVar rda_Input ui_cts_cell_list {BUF1CK BUF2CK BUF3CK BUF4CK BUF6CK BUF8CK BUF12CK INV1CK INV2CK INV3CK INV4CK INV6CK INV8CK INV12CK}

setUIVar rda_Input ui_delay_footprint I+OI
setUIVar rda_Input ui_buf_footprint I+OI
setUIVar rda_Input ui_inv_footprint I+O!I

commitConfig

floorPlan -d 3029.94 1314.40 50 50 50 50
fit

loadiofile "../syn/db/amep_compiled_jtag_scan_v3.4_io_tryB.io"

createRouteBlk -box 140.74 140.74 2889.20 1173.60 -layer 6

removeBufferTree

addRing -spacing_bottom 0.8 -width_left 20 -width_bottom 20 -width_top 20 -spacing_top 0.8 -layer_bottom metal5 -center 1 -stacked_via_top_layer metal5 -width_right 20 -around core -jog_distance 0.8 -offset_bottom 0.8 -layer_top metal5 -threshold 0.8 -offset_left 0.8 -spacing_right 0.8 -spacing_left 0.8 -offset_right 0.8 -offset_top 0.8 -layer_right metal4 -nets {GND VCC } -stacked_via_bottom_layer metal1 -layer_left metal4

clearGlobalNets
globalNetConnect GND -type pgpin -pin GND -inst *
globalNetConnect VCC -type pgpin -pin VCC -inst *
globalNetConnect VCC -type tiehi
globalNetConnect GND -type tielo

amoebaPlace -timingdriven
checkPlace

setObjFPlanBox Instance AMEP_CORE_IOCELLS_INST/AMEP_CORE_INST/EXECUTE/AGU/MB_MEMORY/RAM_512_8 577.641 230.099 959.561 589.699

setObjFPlanBox Instance AMEP_CORE_IOCELLS_INST/AMEP_CORE_INST/EXECUTE/AGU/SA_MEMORY/RAM_2048_8 849.956 704.641 1815.396 1088.421

setObjFPlanBox Instance AMEP_CORE_IOCELLS_INST/AMEP_CORE_INST/FETCH/CODE_RAM/RAM_1K_16 2173.549 230.06 2656.269 589.04

setBlockPlacementStatus -name AMEP_CORE_IOCELLS_INST/AMEP_CORE_INST/FETCH/CODE_RAM/RAM_1K_16 -status preplaced

setBlockPlacementStatus -name AMEP_CORE_IOCELLS_INST/AMEP_CORE_INST/EXECUTE/AGU/SA_MEMORY/RAM_2048_8 -status preplaced

setBlockPlacementStatus -name AMEP_CORE_IOCELLS_INST/AMEP_CORE_INST/EXECUTE/AGU/MB_MEMORY/RAM_512_8 -status preplaced

addRing -spacing_bottom 2 -width_left 10 -width_bottom 10 -width_top 10 -spacing_top 2 -layer_bottom metal5 -stacked_via_top_layer metal5 -width_right 10 -around each_block -jog_distance 0.44 -layer_top metal5 -threshold 0.44 -spacing_right 2 -spacing_left 2 -offset_bottom 3 -offset_left 3 -offset_right 3 -offset_top 3 -type block_rings -layer_right metal4 -nets {GND VCC } -stacked_via_bottom_layer metal1 -layer_left metal4

addStripe -block_ring_top_layer_limit metal4 -max_same_layer_jog_length 0.88 -padcore_ring_bottom_layer_limit metal4 -set_to_set_distance 200 -stacked_via_top_layer metal5 -padcore_ring_top_layer_limit metal4 -spacing 1 -xleft_offset 100 -xright_offset 100 -merge_stripes_value 0.44 -layer metal4 -block_ring_bottom_layer_limit metal4 -width 10 -nets {GND VCC } -stacked_via_bottom_layer metal1

amoebaPlace -timingdriven -highEffort
checkPlace
setDrawMode place

reclaimArea

timeDesign -preCTS
optDesign -preCTS

setCTSMode -useCTSRouteGuide

specifyClockTree -clkfile amep_compiled_jtag_scan_v3.4_io_tryB.ctstch

createSaveDir amep_core_iocells_cts
ckSynthesis -rguide amep_core_iocells_cts/amep_core_iocells_cts.guide -report amep_core_iocells_cts/amep_core_iocells_cts.ctsrpt -forceReconvergent
saveClockNets -output amep_core_iocells_cts/amep_core_iocells_cts.ctsntf
saveNetlist amep_core_iocells_cts/amep_core_iocells_cts.v
savePlace amep_core_iocells_cts/amep_core_iocells_cts.place

timeDesign -postCTS
optDesign -postCTS -setup -hold

addFiller -cell FILLERCC FILLERBC FILLERAC FILLER8C FILLER4C FILLER2C FILLER1 -prefix FILLER
addFiller -cell FILLERCC FILLERBC FILLERAC FILLER8C FILLER4C FILLER2C FILLER1 -prefix FILLER
addFiller -cell FILLERCC FILLERBC FILLERAC FILLER8C FILLER4C FILLER2C FILLER1 -prefix FILLER
addFiller -cell FILLERCC FILLERBC FILLERAC FILLER8C FILLER4C FILLER2C FILLER1 -prefix FILLER

addIoFiller -cell EMPTY16D -prefix IOFILLER
addIoFiller -cell EMPTY8D -prefix IOFILLER
addIoFiller -cell EMPTY4D -prefix IOFILLER
addIoFiller -cell EMPTY2D -prefix IOFILLER
addIoFiller -cell EMPTY1D -prefix IOFILLER

sroute -jogControl { preferWithChanges differentLayer }

getNanoRouteMode -quiet
getNanoRouteMode -quiet envSuperThreading
setNanoRouteMode -quiet -drouteFixAntenna true
setNanoRouteMode -quiet -routeInsertAntennaDiode false
setNanoRouteMode -quiet -routeReInsertFillerCellList filler_cell_list.txt
setNanoRouteMode -quiet -timingEngine CTE
setNanoRouteMode -quiet -routeWithTimingDriven true
setNanoRouteMode -quiet -routeWithEco false
setNanoRouteMode -quiet -routeWithSiDriven true
setNanoRouteMode -quiet -routeTdrEffort 5
setNanoRouteMode -quiet -routeSiEffort normal
setNanoRouteMode -quiet -routeWithSiPostRouteFix false
setNanoRouteMode -quiet -drouteAutoStop true
setNanoRouteMode -quiet -routeSelectedNetOnly false
setNanoRouteMode -quiet -drouteStartIteration default
setNanoRouteMode -quiet -envNumberProcessor 1
setNanoRouteMode -quiet -drouteEndIteration default

setNanoRouteMode -drouteUseViaOfCut 4
setNanoRouteMode -drouteUseBiggerOverhangViaFirst true
setNanoRouteMode -drouteOptimizeUseMultiCutVia true

trialRoute -handlePreroute
setCteReport
writeDesignTiming .timing_file.tif
freeTimingGraph
globalDetailRoute

clearDrc
verifyGeometry
editDeleteViolations

setNanoRouteMode -quiet -routeWithTimingDriven false
setNanoRouteMode -quiet -routeWithEco true
setNanoRouteMode -quiet -routeWithSiDriven false
globalDetailRoute

clearDrc
verifyGeometry

timeDesign -postRoute
optDesign -postRoute -setup -hold

setOpCond -maxLibrary fsa0a_c_sc_wc -max WCCOM -minLibrary fsa0a_c_sc_bc -min BCCOM

setExtractRCMode -detail -rcdb amep_core_iocells.rcdb -relative_c_t 0.00999999977648 -total_c_t 5.0 -reduce 5 -noise

extractRC -outfile amep_core_iocells.cap
setDelayCalMode -signalStorm
delayCal -sdf amep_core_iocells.sdf

setAnalysisMode -setup -async -skew -clockTree
buildTimingGraph


reportSlacks -outfile amep_core_iocells.slk
rptSlackClkDomain -infile amep_core_iocells.slk

autoFetchDCSources VCC
autoFetchDCSources GND
savePadLocation -outfile /home/ncas/synopsys/work/cadence/amep_core_iocells_jtag.pp

saveToggleProbability -outfile /home/ncas/synopsys/work/cadence/amep_core_iocells_jtag.pp {CLK_IN 100.000 0.450}

updatePower -irDropAnalysis average -postCTS -toggleFile amep_core_iocells_jtag.tg -pad amep_core_iocells_jtag.pp -report power -reportInstanceVoltage instance.voltage -reportInstancePower instance.power -reportRailAnalysis power.graph -mode floorplan VCC

saveDesign amep_core_iocells_jtag.enc
streamOut amep_core_iocells_jtag -mapFile /home2/ncas/asiclibs/UMC18/GDSstreamOut.map -libName DesignLib -structureName amep_core_iocells_jtag -stripes 1 -units 1000 -mode ALL
defOut -floorplan -netlist -routing amep_core_iocells_jtag.def
saveNetlist amep_core_iocells_jtag_final.v
