The Department of Electronics and Communication Engineering of Heritage Institute of Technology has published for the first time an e-journal on VLSI: new inventions and achievements.
COPYRIGHT RESERVED, VOL. 1, MARCH 2012

E-JOURNAL ON HERITAGE OF VLSI DESIGN AND TECHNOLOGY

Volume 1, March 2012

Published by ECE Dept., Heritage Institute of Technology

ECE-VLSI Presents: E-Journal on Heritage of VLSI Design and Technology


PATRONS:

Prof. Probir Roy,

Executive Director

Prof. B.B. Paira,

Director

Prof. (Dr.) Dulal Chandra Ray,

Joint Director

Prof. (Dr.) Sambhunath Biswas,

Deputy Director

Prof. (Dr.) Pranay Chaudhuri,

Principal

Prof. Siladitya Sen,

Head Of the Department

Electronics And Communication Engineering

Prof. Krishanu Datta,

Lead of Microelectronics & VLSI Design

Electronics & Communication Engineering


Note From Advisor

The ECE M.Tech VLSI students (2011-13) took the initiative to create this first edition of an e-journal on VLSI, named "Heritage of VLSI Design and Technology", under my supervision. This journal includes articles on VLSI design flow, memory design, circuit design, physical design, VLSI transistor scaling, fabrication technology and VLSI system design. Future editions will carry more articles on advanced VLSI topics as well as further refinements of the existing articles. Since VLSI is a truly interdisciplinary subject, besides the ECE Department, faculty and students from the CSE, Electrical, Instrumentation and Physics Departments are welcome to contribute VLSI-related articles to future editions of this journal.

Thanks,

Prof. Krishanu Datta

Associate Professor

B.E. (J.U.), M.Tech (IIT KGP, Gold Medalist), Ex-Engineer (Intel, USA)

Lead: Microelectronics and VLSI design

Department of Electronics and Communication Engg.

Heritage Institute of Technology


CONTENTS

TITLE | PAGE NO.
VLSI DESIGN METHODOLOGY | 5-17
VLSI MEMORY OVERVIEW | 18-33
VLSI COMBINATIONAL CIRCUIT TOPOLOGIES | 34-43
PERFORMANCE ESTIMATION OF VLSI DESIGN | 44-52
VLSI TRANSISTOR & INTERCONNECT SCALING OVERVIEW | 53-65
VLSI STANDARD CELL LAYOUT | 66-75
VLSI FABRICATION OVERVIEW | 76-83
FABRICATION OF CMOS INVERTER USED IN VLSI | 84-90
INTRODUCTION TO VHDL FOR VLSI DESIGN | 91-101
EMBEDDED REAL TIME VLSI SYSTEM | 102-106


VLSI DESIGN METHODOLOGY

Authors: Debadrita Dalal & Debashis Ghosh, M.Tech (VLSI), ECE Students (2011-13)

Abstract: In this paper, several existing techniques and methodologies are presented that significantly enhance the circuit design process for high-performance chips.

Keywords: Standard cell, custom design, semi-custom design, PLD, PAL, PLA, FPGA, design methodology.

1. Introduction: The electronics industry has achieved phenomenal growth over the last two decades, mainly due to rapid advances in integration technologies and large-scale systems design; in short, due to the advent of VLSI. The number of applications of integrated circuits in high-performance computing, telecommunications and consumer electronics has been rising steadily and at a very fast pace. Typically, the required computational power (in other words, the intelligence) of these applications is the driving force for the fast development of this field. One of the most important characteristics of information services is their increasing need for very high processing power and bandwidth (in order to handle real-time video, for example). The other important characteristic is that information services tend to become more and more personalized (as opposed to collective services such as broadcasting), which means that the devices must be more intelligent to answer individual demands and at the same time must be portable to allow more flexibility and mobility. As more and more complex functions are required in various data processing and telecommunications devices, the need to integrate these functions in a small system or package is also increasing. The level of integration, as measured by the number of logic gates in a monolithic chip, has been steadily rising for almost three decades, mainly due to rapid progress in processing and interconnect technology.

2. What Is VLSI Design: Very-large-scale integration (VLSI) is the process of creating integrated circuits by combining thousands of transistors into a single chip. VLSI began in the 1970s, when complex semiconductor and communication technologies were being developed.

Figure 1: A VLSI integrated-circuit die


The first semiconductor chips held two transistors each. Subsequent advances added more and more transistors, and, as a consequence, more individual functions or systems were integrated over time. The first integrated circuits held only a few devices, perhaps as many as ten diodes, transistors, resistors and capacitors, making it possible to fabricate one or more logic gates on a single device (small-scale integration, SSI). Improvements in technique led to devices with hundreds of logic gates, known as medium-scale integration (MSI). Further improvements led to large-scale integration (LSI), i.e. systems with at least a thousand logic gates. Current technology has moved far past this mark, and today's microprocessors have many millions of gates and billions of individual transistors. Table 1 shows the evolution of logic complexity in integrated circuits over the last three decades and marks the milestones of each era. Here, the numbers for circuit complexity should be interpreted only as representative examples to show the order of magnitude.

ERA | COMPLEXITY (number of logic gates)
Small Scale Integration (SSI) | 2^1 - 2^6
Medium Scale Integration (MSI) | 2^6 - 2^11
Large Scale Integration (LSI) | 2^11 - 2^16
Very Large Scale Integration (VLSI) | 2^16 - 2^21
Ultra Large Scale Integration (ULSI) | 2^21 - 2^26

Table 1: Evolution of logic complexity in integrated circuits.

Today, most integrated-circuit chips contain hundreds of millions of transistors, which is ULSI or beyond in complexity.
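As an illustrative sketch (not part of the original article), the gate-count ranges of Table 1 can be expressed as a small classifier; the function name and the "beyond ULSI" label below are our own:

```python
# Illustrative sketch: classify a chip by logic-gate count using the
# order-of-magnitude ranges of Table 1.
ERAS = [
    ("SSI", 2**1, 2**6),
    ("MSI", 2**6, 2**11),
    ("LSI", 2**11, 2**16),
    ("VLSI", 2**16, 2**21),
    ("ULSI", 2**21, 2**26),
]

def integration_era(gate_count: int) -> str:
    """Return the era whose gate-count range contains gate_count."""
    for name, low, high in ERAS:
        if low <= gate_count < high:
            return name
    return "beyond ULSI" if gate_count >= 2**26 else "pre-SSI"

print(integration_era(100_000))  # a 100k-gate chip falls in the VLSI era
```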

3. Challenges Of VLSI: There are many challenges that a VLSI chip faces when it is integrated using the latest process technology:

Power usage/heat dissipation: As threshold voltages have ceased to scale with advancing process technology, dynamic power dissipation has not scaled as expected. Maintaining logic complexity when scaling the design down means that the power dissipation per unit area goes up. This has given rise to techniques such as dynamic voltage and frequency scaling (DVFS) to minimize overall power.
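The effect DVFS exploits can be seen from the classic switching-power estimate P = alpha * C * Vdd^2 * f. The sketch below uses made-up numbers purely for illustration; the activity factor, capacitance and operating points are hypothetical:

```python
def dynamic_power(alpha: float, c_load: float, vdd: float, freq: float) -> float:
    """Classic switching-power estimate: P = alpha * C * Vdd^2 * f."""
    return alpha * c_load * vdd**2 * freq

# DVFS example: lowering Vdd from 1.2 V to 0.9 V and f from 2 GHz to 1.5 GHz.
p_full = dynamic_power(0.2, 1e-9, 1.2, 2e9)   # 0.576 W at full speed
p_dvfs = dynamic_power(0.2, 1e-9, 0.9, 1.5e9)  # 0.243 W after scaling
print(p_full, p_dvfs)
```

Because voltage enters quadratically, the modest voltage drop accounts for most of the roughly 58% power reduction in this example.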

Process variation: As photolithography techniques get closer to the fundamental laws of optics, achieving high accuracy in doping concentrations and etched wires is becoming more difficult and prone to errors due to variation. Designers must now simulate across multiple fabrication process corners before a chip is certified ready for production.

Stricter design rules: Due to lithography and etch issues with scaling, layout design rules have become increasingly stringent. Designers must keep ever more of these rules in mind while laying out circuits. The overhead of custom design is now reaching a tipping point, with many design houses opting to switch to electronic design automation (EDA) tools to automate their design process.

Timing/design closure: As clock frequencies scale up, designers are finding it more difficult to distribute and maintain low clock skew between high-frequency clocks across the entire chip.

First-pass success: As die sizes shrink (due to scaling) and wafer sizes go up (to reduce manufacturing cost per die), the number of dies per wafer increases, and the complexity of making suitable photomasks rises rapidly. A mask set for a modern technology can cost several million dollars. This non-recurring expense deters the old iterative philosophy of several "spin cycles" to find errors in silicon, and encourages first-pass silicon success. Several design philosophies have been developed to aid this new design flow, including design for manufacturing (DFM) and design for test (DFT).

4. Design Parameters:

For VLSI design, certain design parameters need to be satisfied:

- Less area/volume and therefore compactness.
- Less power consumption.
- Fewer testing requirements at the system level.
- Higher reliability.
- Higher performance.
- Significant cost savings.
- Shorter time to market (TTM).

5. VLSI Design Flow: The VLSI design process, at various levels, is usually evolutionary in nature. It starts with a given set of requirements. An initial design is developed and then tested against the requirements. When the requirements are not met, the design has to be improved. If such improvement is either not possible or too costly, then a revision of the requirements and an analysis of its impact must be considered. The VLSI design flow mainly spans three major domains, namely:

the behavioral domain, the structural domain and the physical domain.

5.1.1 Behavioural Representation:

The design flow starts from the algorithm that describes the behavior of the target chip. The corresponding architecture of the processor is first defined; it is mapped onto the chip surface by floor planning. The next design evolution in the behavioral domain defines finite state machines (FSMs), which are structurally implemented with functional modules such as registers and arithmetic logic units (ALUs). In this type of representation, functional design comes into play. After system specification, functional design is carried out together with design verification. Figure 2 shows that the functional design of the model is done first and then verified; if any mismatch occurs, the flow returns for a fresh design or design rectification, as described by the feedback path in the figure.

5.1.2 Structural Representation: This is the gate-level representation of the chip to be made. A logical specification describes how components are interconnected to perform a certain function; in general, this description is a list of modules and interconnections. Whereas in the behavioral domain one moves through a hierarchy of algorithm, register level and Boolean equations, at the logical level the levels of abstraction include the module level, the gate level, the switch level and the circuit level.

5.1.3 Physical Representation: The physical specification of a circuit defines how a particular part has to be constructed to yield a specific structure and, hence, behavior. In an IC process, the lowest level of physical specification is the photomask information required by the various processing steps of the fabrication process [1]. As in the behavioral and logical domains, various levels of abstraction may be defined for the physical representation of a chip.


After all the previous steps, these modules are geometrically placed onto the chip surface using CAD tools for automatic module placement, followed by routing, with the goal of minimizing the interconnect area and signal delays.

Figure 2: A view of VLSI design flow.

Figure 2 provides a more simplified view of the VLSI design flow, taking into account the various representations, or abstractions of design - behavioral, logic, circuit and mask layout. Note that the verification of design plays a very important role in every step during this process. The failure to properly verify a design in its early phases typically causes significant and expensive re-design at a later stage, which ultimately increases the time-to-market.

Although the design process has been described in a linear fashion for simplicity, in reality there are many iterations back and forth, especially between any two neighboring steps, and occasionally even between remotely separated pairs. Although top-down design flow provides excellent design process control, in reality there is no truly unidirectional top-down design flow; both top-down and bottom-up approaches have to be combined. For instance, if a chip designer defines an architecture without a close estimate of the corresponding chip area, it is very likely that the resulting chip layout exceeds the area limit of the available technology. In such a case, in order to fit the architecture into the allowable chip area, some functions may have to be removed and the design process repeated. Such changes may require significant modification of the original requirements. Thus, it is very important to feed low-level information forward to higher levels (bottom-up) as early as possible.
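The kind of bottom-up feedback described above can be illustrated with a toy area-budget check; the block areas, die limit and utilization factor below are hypothetical numbers, not taken from the article:

```python
def fits_die(block_areas_mm2, die_limit_mm2, utilization=0.7):
    """True if the summed block-area estimates fit the die, reserving a
    fixed fraction of the die for placement and routing overhead."""
    return sum(block_areas_mm2) <= die_limit_mm2 * utilization

# Early architectural check: three blocks against a 50 mm^2 die budget.
print(fits_die([12.0, 8.5, 20.0], die_limit_mm2=50.0))  # False: rework needed
```

A False result at this stage would be the signal to remove functions or revise the architecture before committing to detailed layout.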

6. VLSI Design Styles:

Figure 3: Types of VLSI design.

6.1.1. Full Custom Design:

Full-custom design is a methodology in which the design is done mostly by manual effort. It potentially maximizes the performance of the chip and minimizes its area, but is labour-intensive to implement. In a full-custom design, the entire mask design is created anew without the use of any library. However, the development cost of such a design style is becoming prohibitively high. Thus, the concept of



design reuse is becoming popular in order to reduce design cycle time and development cost. The most rigorous full-custom design can be that of a memory cell, be it static or dynamic; since the same layout is replicated, there is no alternative to full custom for high-density memory chip design. For logic chip design, a good compromise can be achieved by using a combination of different design styles on the same chip, such as standard cells, data-path cells and PLAs. In true full-custom layout, in which the geometry, orientation and placement of every transistor are decided individually by the designer, design productivity is usually very low: typically 10 to 20 transistors per day per designer. Figure 4 represents the steps involved in the full-custom design approach, also known as the bottom-up approach. The standard cell (or manually created leaf cell) is the basic building block of the IC. In hierarchy 1, blocks A, B and C are used, with an inter-relationship among them. Block A is

Figure 4: Flowchart showing steps of full custom design.

prepared from the leaf cell, as shown by the red arrow; blocks B and C are made in the same manner. Using hierarchy 1, block X in hierarchy 2 is created, and similarly the other blocks, Y and Z. Ultimately the final chip, having n levels of hierarchy, is created. It is called the bottom-up approach because the design proceeds from the lowest level of the hierarchy upward.




6.1.2 Applications Of Custom Design:

- Used in arrays and datapaths, e.g. high-performance ALUs.
- High-performance memories (SRAMs, DRAMs), due to their high repeatability and regularity.

6.2. Semi-Custom Design: This design style is also called standard-cell-based design. The standard cells are pre-designed for general use, and the same cells are used in many different chip designs. In this type of design, most of the work is done with the help of tools, so the design effort is less. Figure 8 describes the process of preparing a chip by the semi-custom design approach.

Figure 8: Flowchart showing steps of semi-custom design.

Figure 8 represents the steps involved in the semi-custom design approach, also known as the top-down approach. In this process we start from the chip-level specification and finally reach the bottommost hierarchy of the design. As shown in the figure, hierarchy 1 uses blocks A, B and C with an inter-relationship among them; using those, block X in hierarchy 2 is made, and similarly the other blocks in hierarchy 2, Y and Z, are prepared. Then finally we reach the standard cell or leaf cell. The semi-custom flow is called a top-down approach because we start from the full-chip level and work down to the standard cell.

6.2.1. Advantages Of Semi-Custom Design:

- This approach is considerably faster than the custom approach.
- Most steps are done by tools, so it is less labour-intensive.
- A standard cell library is used for design.

6.2.2 Applications Of Semi-Custom Design:

- Control units in microprocessors, due to the random nature of the logic.
- DSP and ASIC processors: for most ASIC processors, time to market is very critical, so the semi-custom approach using EDA tools and standard cells is popular.



6.2.3 Comparison Between Custom And Semi-Custom Approaches:

CUSTOM APPROACH | SEMI-CUSTOM APPROACH
Slow process. | Considerably faster process.
More engineering cost. | Less engineering cost.
Only used to design very high performance systems. | Most ASICs are currently designed using this method.
Most of the design steps are done manually. | Most of the design steps are done by EDA tools.
Labour-intensive to implement. | Not labour-intensive.
Standard cell library may or may not be needed. | Standard cell library is used.
Design productivity is usually very low. | High design productivity.
More area-efficient approach. | Less area-efficient.
Maximises the performance of the chip. | Does not maximise chip performance.
High repeatability and regularity. | Low repeatability and regularity.


6.3 Programmable Logic Devices (PLD):

PLDs are building blocks used to realize digital systems; they can be used to design combinational as well as sequential logic. In combinational circuit design, it is often easier to implement functions directly on the available hardware than to resort to Karnaugh-map minimization to reduce the number of gates.

PLDs are devices with a large number of AND and OR gates available in a single piece of hardware, and it is at the user's discretion how to implement the required logic functions. The most important feature of these devices is programmability: the functions we want to realize can be programmed according to the application or specification, as and when required, by the manufacturer or by the end user. Some even have an erasable property, which makes them reconfigurable so that the hardware can be reused.

6.3.1 A Brief History:

Before the invention of PLDs, ROM (read-only memory) was used as a programmable device. In a ROM we can store, for each combination of input variables, the corresponding outputs; basically, a ROM is nothing but a truth table in hardwired form. In a truth table we have the input combinations, and for each combination the outputs tell us whether the result is logic high or low; there may be several outputs, representing the several functions we want to realize in the circuit.
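The truth-table view can be sketched directly; the two output functions chosen below (XOR and AND) are arbitrary examples, not from the article:

```python
# A ROM as a truth table in hardwired form: the input combination is the
# address; the stored word holds one bit per output function.
# Outputs chosen for illustration: f0 = a XOR b, f1 = a AND b.
ROM = {
    (0, 0): (0, 0),
    (0, 1): (1, 0),
    (1, 0): (1, 0),
    (1, 1): (0, 1),
}

def rom_read(a: int, b: int):
    """'Address' the ROM with the input combination and return the word."""
    return ROM[(a, b)]

print(rom_read(1, 1))  # both outputs read from the stored word
```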

ROM has several disadvantages, so these days it is mainly used to distribute firmware:

- Much slower than dedicated logic circuits.
- Consumes more power.
- Usually more expensive than programmable logic devices, especially when high speed is required.
- Since most ROMs lack input or output registers, they cannot be used directly for sequential logic.

6.3.2 Operation Of PLD:

A PLD is a combination of a logic device and a memory device. The memory is used to store the pattern that was given to the chip during programming. Most of the methods for storing data in an integrated circuit have been adapted for use in PLDs. These include:

Silicon antifuses

SRAM

EPROM or EEPROM cells

Flash memory

Silicon antifuses are the storage elements used in the PAL (Programmable Array Logic), the first type of PLD. They are called antifuses because they work in the opposite way to normal fuses, which begin life as connections until they are broken by an electric current.

SRAM, or static RAM, is a volatile memory, meaning that its contents are lost each time the power is switched off. SRAM-based PLDs therefore have to be programmed every time the circuit is switched on.

An EPROM cell is a MOS (metal-oxide-semiconductor) transistor that can be switched on by trapping an electric charge permanently on its gate electrode. This is done by a PAL programmer. The charge remains for many years and can only be removed by exposing the chip to strong ultraviolet light in a device called an EPROM eraser.

Page 13: E-Journal of VLSI

COPYRIGHT RESERVED VOL. 1, MARCH 2012

13 E-JOURNAL ON HERITAGE OF VLSI DESIGN AND TECHNOLOGY

Flash memory is non-volatile in nature, retaining its contents even when the power is switched off; it can be erased and reprogrammed as required. This makes it useful for PLD memory.

As of 2005, most CPLDs (Complex PLDs) are electrically programmable and erasable, and non-volatile. This is because they are too small to justify the inconvenience of programming internal SRAM cells every time they start up, and EPROM cells are more expensive due to their ceramic package with a quartz window.

6.3.3 Programmable Logic Array (PLA):

PLA is a programmable logic device used to realize combinational logic functions. Logically, a PLA is a circuit that allows synthesizing Boolean functions in sum-of-product form. The typical implementation consists of input buffers for all inputs, the programmable AND-matrix followed by the programmable OR-matrix, and output buffers.

Figure 9: Block diagram of PLA

In 1970, Texas Instruments developed a mask-programmable IC based on the IBM read-only associative memory, or ROAM. This device, the TMS2000, was programmed by altering the metal layer during production of the IC. The TMS2000 had up to 17 inputs and 18 outputs, with 8 JK flip-flops for memory. TI coined the term programmable logic array for this device [2].

6.3.4 PLA Vs Read-Only Memory:

A combinational circuit may occasionally have don't-care conditions. When implemented with a read-only memory, a don't-care condition becomes an address input that will never occur. The words at the don't-care addresses need not be programmed and may be left in their original state (all 0's or all 1's). The result is inefficient use of the read-only memory [2]. This suggests that when the number of don't-care conditions is large, it is more economical to use a PLA. A PLA is similar in concept to a ROM; however, the PLA does not provide full decoding of the variables and does not generate all the min-terms, as a ROM does. In the PLA, the decoder is replaced by a group of AND gates, each programmed to generate a product term of the input variables. The AND and OR gates inside the PLA are initially fabricated with links among them; the specific Boolean functions are implemented in sum-of-products form by opening the appropriate links and leaving the desired connections [2].
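A rough way to see the economics: a ROM always stores 2^n words (full decoding), while a PLA needs only as many AND-array rows as there are product terms actually used. The helper names below are our own illustration:

```python
def rom_words(n_inputs: int) -> int:
    """A ROM fully decodes its inputs: one stored word per combination."""
    return 2 ** n_inputs

def pla_rows(product_terms) -> int:
    """A PLA needs one AND-array row per product term actually used."""
    return len(product_terms)

print(rom_words(10))                    # 1024 words, however few terms matter
print(pla_rows(["AC", "A'C'", "AB'"]))  # 3 rows for a three-term function
```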

Suppose we want to use a PLA to implement:

f (A,B,C) = ∑ (0,2,4,5,7).

The function comprises five min-terms:

f (A,B,C) = A'B'C' + A'BC' + AB'C' + AB'C + ABC.

Minimizing by Karnaugh Map we get:

Figure 10. Karnaugh map



Thus the five terms have been minimized to just three:

f(A,B,C) = AC + A’C’ + AB’

The function can now be implemented by a PLA as the OR of three terms.

Figure 11: Gate representation of PLA.

Therefore, in a PLA, only those terms that are needed are generated by the AND array, and the rest are discarded.
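The PLA behaviour just described can be sketched in a few lines: the minimized cover AC + A'C' + AB' is programmed as three AND rows feeding one OR, and checked exhaustively against the original min-term list. The encoding is our own illustration, not a real device format:

```python
from itertools import product

# Each product term maps input position (0=A, 1=B, 2=C) to the required
# literal value; inputs absent from the map are don't-cares for that term.
PRODUCT_TERMS = [
    {0: 1, 2: 1},   # A.C
    {0: 0, 2: 0},   # A'.C'
    {0: 1, 1: 0},   # A.B'
]

def f_pla(bits):
    """OR of the programmed product terms (sum-of-products)."""
    return any(all(bits[i] == v for i, v in term.items())
               for term in PRODUCT_TERMS)

# Exhaustive check against f(A,B,C) = sum(0,2,4,5,7).
MINTERMS = {0, 2, 4, 5, 7}
for a, b, c in product((0, 1), repeat=3):
    assert f_pla((a, b, c)) == (4 * a + 2 * b + c in MINTERMS)
print("minimized cover matches all eight input combinations")
```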

6.3.5 Programmable Array Logic (PAL):

The term Programmable Array Logic (PAL) describes a family of programmable logic devices, introduced by Monolithic Memories, Inc. (MMI) in March 1978, used to implement logic functions in digital circuits [4]. Unlike a PLA, a PAL consists of a programmable AND-gate array followed by a fixed OR-gate array. This makes PAL devices easier to program and less expensive, as they take less silicon area than PLAs. On the other hand, since the OR array is fixed, a PAL is less flexible than a PLA.

Since a PAL has a programmable AND-gate array and a fixed OR-gate array, it can be programmed to generate the required product terms in the AND-gate array, but the outputs of the AND-gate array are connected in a fixed manner to the different OR gates of the OR-gate array. Thus, unlike with a PLA, there is no need to find common product terms between the different output logic functions, and all the output functions are realized in their minimum sum-of-products forms.

Figure 12: Gate representation of PAL

Fig. 12 illustrates the internal connections of a four-input, eight-AND-gate, three-output PAL device before programming. Note that while every buffer/inverter is connected to the AND gates through links, the OR gate for F1 is connected to only three AND outputs, F2 to three AND gates, and F3 to two AND gates. So this particular device can generate only eight product terms, of which two of the three OR gates may have three product terms each and the remaining OR gate only two. Therefore, while designing with a PAL, particular attention must be paid to the fixed OR array [6].
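The fixed-OR constraint can be expressed as a simple feasibility check; the fan-in list below mirrors the 3/3/2 device of Fig. 12, and the helper is an illustrative sketch rather than a real CAD check:

```python
# OR-gate fan-ins for outputs F1, F2, F3 (fixed at fabrication; the
# device provides 3 + 3 + 2 = 8 AND gates in total).
OR_FANIN = (3, 3, 2)

def fits_pal(terms_needed, or_fanin=OR_FANIN):
    """True if every output's minimized sum-of-products needs no more
    product terms than its fixed OR gate can accept."""
    return all(need <= avail for need, avail in zip(terms_needed, or_fanin))

print(fits_pal([3, 2, 2]))  # True: every function fits its OR gate
print(fits_pal([2, 2, 3]))  # False: F3 needs 3 terms but only 2 are wired
```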

We can have registered outputs from a PAL, which is another powerful feature of a PAL chip. Here the output is fed into a flip-flop; the flip-flop output is available to the user, while the complementary output of the flip-flop is fed back implicitly into the chip as an input to the AND-gate array. This allows these outputs to be used as state-machine variables, and thus allows the synthesis of both the combinational and the sequential parts of a digital state machine in a single chip. Usually a PAL has some purely combinational outputs while others are registered.

6.3.6 Field-Programmable Gate Array (FPGA):

It is to be noted that the word "programmable" in PLDs does not indicate that they are all field-programmable, i.e. configurable at the discretion of the end user. In fact, most are mask-programmed during manufacture in the same manner as a ROM. This is particularly true of PLAs that are embedded in more complex integrated circuits such as microprocessors. PLAs that can be programmed after manufacture are called FPLAs (field-programmable logic arrays) or, more popularly, FPGAs (field-programmable gate arrays).

As the name suggests, an FPGA is field-programmable, i.e. the chip can be configured by the customer to realize any design. It is based on gate-array technology.

Figure 13: Basic architecture of FPGA

FPGAs consist of three major resources:

- Configurable logic blocks (CLBs)
- Routing blocks (programmable interconnect)
- I/O blocks

Other resources:

i. Memory
ii. Multiplexers
iii. Global clock buffers
iv. Boundary scan logic

Figure 14: Structure of a CLB

Each CLB contains:

- Many slices.
- Local routing, which provides feedback between slices in the same CLB and also to the neighboring CLBs.
- A switch matrix that provides access to general routing resources.


Figure 15: Internal block diagram of a slice

Each slice has three types of output:

- Registered output
- Non-registered output
- BUFTs associated with each CLB, accessible by all CLB outputs

A LUT (lookup table) (Fig. 15) is actually a memory array, which is what makes an FPGA configurable. For example, a four-input AND gate can be realized by a LUT in which the four inputs form a four-bit address and the stored word provides the one-bit output.
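A minimal sketch of that idea: the LUT is just a 16-entry memory indexed by the four inputs, programmed here with the truth table of a four-input AND gate.

```python
# Program a 4-input LUT: all 16 words are 0 except address 1111,
# which is exactly the truth table of a 4-input AND gate.
LUT_AND4 = [0] * 16
LUT_AND4[0b1111] = 1

def lut_eval(lut, a, b, c, d):
    """Form the 4-bit address from the inputs and read the stored bit."""
    return lut[(a << 3) | (b << 2) | (c << 1) | d]

print(lut_eval(LUT_AND4, 1, 1, 1, 1))  # 1
print(lut_eval(LUT_AND4, 1, 0, 1, 1))  # 0
```

Reprogramming the 16 stored bits turns the same hardware into any other four-input function, which is precisely why LUT contents make the FPGA configurable.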

The FPGA design flow is a four-step process:

i. Define the design using VHDL, Verilog or AHDL.

ii. Compile and simulate the design; find and fix timing violations.

iii. Estimate power consumption and perform synthesis.

iv. Download the design to the target FPGA board.

The key advantage of an FPGA is its reconfigurable hardware.

The main drawback of an FPGA is that the hardware is not an ASIC, which can lead to non-optimized power, performance and density.

7. Conclusion:

The results presented in this paper focus on existing methodologies of VLSI chip design. Recent advancements in manufacturing technology have provided the opportunity to integrate millions of transistors on embedded-core-based SoCs (Systems on Chip). True design efficiency can be achieved if these cores can be reused across various SoC-based products. Therefore, novel approaches need to be developed in order to provide a plug-and-play methodology for the core-based design paradigm.

The demand for low-power VLSI circuits for portable and handheld devices will continue to grow in the future. Hence, investigation into efficient low-power VLSI design methodologies is needed.

8. Acknowledgement:

We would like to express our heartfelt gratitude to all those who made it possible for us to complete this journal. We are deeply indebted to our supervisor, Prof. Krishanu Datta, Microelectronics & VLSI, Heritage Institute of Technology, Kolkata, India, whose help, stimulating suggestions and encouragement enabled us to complete this paper.

9. References:

1. N. H. E. Weste and K. Eshraghian, Principles of CMOS VLSI Design.

2. K. Andres, "A Texas Instruments Application Report: MOS Programmable Logic Arrays," Texas Instruments, Bulletin CA-158, October 1970. (Introduces the TMS2000 and TMS2200 series of mask-programmable PLAs.)

3. M. Morris Mano, Digital Logic and Computer Design.

4. "Monolithic Memories announces: a revolution in logic design," Electronic Design (Rochelle, NJ: Hayden Publishing) 26(6): 148B-148C, March 18, 1978. (Introductory advertisement on PAL, Programmable Array Logic.)

5. A. Saha and N. Manna, Digital Principles and Logic Design.


VLSI MEMORY OVERVIEW

Author: Sampa Paul and Nilakshi Bordoloi M.Tech (VLSI), ECE-Student (2011-13)

Abstract: In this paper, several types of memory used in VLSI are surveyed. Key VLSI memory types like SRAM, DRAM, ROM and Flash memory are described in detail.

Key Words: SRAM, DRAM, ROM, Flash.

1. Introduction: The term memory identifies storage of data (digital information essential to all digital systems) in the form of electronic devices, tapes or disks (Fig. 1). The term memory is popularly used as a shorthand for physical memory, which refers to the actual chips capable of holding data. The amount of memory required in a particular system depends on the type of application, but, in general, the number of transistors used for the information (data) storage function is much larger than the number of transistors used for logic operations and other purposes.

The ever-increasing demand for larger data storage capacity has driven fabrication technology and memory development toward more compact design rules and consequently toward higher data storage densities. Thus, the maximum realizable data storage capacity of single-chip semiconductor memory arrays approximately doubles every two years.

2. Memory Specifications:

The area efficiency of the memory array, i.e., the number of stored data bits per unit area, is one of the key design criteria that determines the overall storage capacity and consequently the memory cost per bit. Another important design parameter is memory access time, i.e., the time required to store and/or retrieve a particular data bit in the memory array.

VLSI memory
  Volatile: Register File, SRAM, DRAM
  Non-Volatile:
    ROM: MROM, PROM, EPROM, EEPROM, Flash Memory
    Secondary Storage
    Tertiary Storage

Fig. 1: VLSI Memory


The access time determines the memory speed, measured in nanoseconds; the time to access data stored in memory is an important performance criterion of the memory array. Finally, static and dynamic power consumption of the memory array is a significant factor to be considered in the design because of the increasing importance of low-power VLSI applications.

3. Memory Architecture :

Fig. 2: Memory architecture overview.

The memory core (the data storage structure) consists of individual memory cells arranged in an array of horizontal rows and vertical columns (Fig. 2); each cell can store only one bit of binary information. In this structure, there are 2^N rows (word lines) and 2^M columns (bit lines). Thus the total number of memory cells in this array is 2^N × 2^M. To access a particular memory cell (data bit) in this array, the corresponding word line and bit line must be activated (selected) according to the address coming from outside the memory array. As the signal levels at the outside (TTL signal on the memory board) and inside (CMOS signal in the memory chip) of the memory array are different, the level of the address is converted through a memory chip interface called the input address buffers. The row decoder circuit selects one out of 2^N word lines according to an N-bit row address, while the column decoder circuit selects one out of 2^M bit lines according to an M-bit column address. The performance of the chip interface circuit determines a major portion of the total memory speed, especially in high-performance SRAMs. Other chip control signals, e.g., Chip Select (CS) and Write Enable (WE), are also provided to activate the read or write operation of a particular memory chip in the memory system.
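The row/column decoding described above can be sketched as a toy Python model (the function and variable names are illustrative, not from any real memory interface):

```python
def decode_address(addr: int, n_row_bits: int, m_col_bits: int):
    """Split a flat (N+M)-bit address into row and column indices,
    the way a row decoder and a column decoder would."""
    total_bits = n_row_bits + m_col_bits
    assert 0 <= addr < 2 ** total_bits, "address out of range"
    row = addr >> m_col_bits              # top N bits pick 1 of 2^N word lines
    col = addr & ((1 << m_col_bits) - 1)  # low M bits pick 1 of 2^M bit lines
    return row, col

# A 2^10 x 2^10 array holds 2^20 = 1,048,576 one-bit cells.
row, col = decode_address(0b1100000000_0000000011, 10, 10)
print(row, col)  # 768 3
```

The split mirrors the architecture in Fig. 2: the same flat address feeds both decoders, with the row decoder consuming the upper bits.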

4. Hierarchy of storage/Memory :

Fig 3: - Various forms of Memory

4. I. Primary Storage : Primary storage (or main memory or CPU internal memory), often referred to simply as memory, is the only storage directly accessible to the CPU. The CPU continuously reads instructions stored there and executes them as required. Any data actively operated on is also stored there in a uniform manner.

4. II. Secondary Storage : Also known as external memory or auxiliary storage, it differs from primary storage in that it is not directly accessible by the CPU. The computer usually uses its input/output channels to access secondary storage and transfers the desired data using an intermediate area in primary storage. Secondary storage does not lose the data when the device is powered down; it is non-volatile. Per unit, it is typically also two orders of magnitude less expensive than primary storage. Consequently, modern computer systems typically have two orders of magnitude more secondary storage than primary storage, and data is kept there for a longer time.

4. III. Tertiary Storage/tertiary memory : It provides a third level of storage. Typically it involves a robotic mechanism which will mount (insert) and dismount removable mass storage media into a storage device according to the system's demands; this data is often copied to secondary storage before use. It is primarily used for archiving rarely accessed information since it is much slower than secondary storage (e.g. 5–60 seconds vs. 1–10 milliseconds). This is primarily useful for extraordinarily large data stores, accessed without human operators. Typical examples include tape libraries and optical jukeboxes.

4. IV. Off-line storage : This is computer data storage on a medium or a device that is not under the control of a processing unit. The medium is recorded, usually in a secondary or tertiary storage device, and then physically removed or disconnected. It must be inserted or connected by a human operator before a computer can access it again. Unlike tertiary storage, it cannot be accessed without human interaction.

Storage technologies at all levels of the storage hierarchy can be differentiated by evaluating certain core characteristics, as well as by measuring characteristics specific to a particular implementation. These core characteristics are volatility, mutability, accessibility, and addressability. For any particular implementation of any storage technology, the characteristics worth measuring are capacity and performance. Energy use also matters: storage devices that reduce fan usage or automatically shut down during inactivity, and low-power hard drives, can reduce energy consumption by as much as 90 percent, and 2.5-inch hard disk drives often consume less power than larger ones.

5. Types Of Memory : In the following, we will investigate popular MOS memory arrays and discuss in detail their operations and design issues related to area, speed, and power consumption for each type. Semiconductor memory is generally classified according to the type of data storage and data access.

Characteristics      | Primary storage           | Secondary storage | Tertiary storage
---------------------|---------------------------|-------------------|-----------------
Accessible by CPU    | Directly                  | Not directly      | Not directly
Type                 | Volatile                  | Non-volatile      | Non-volatile
Application          | SRAM, DRAM, register file | Hard disk drives  | Tape libraries

Table 1: Comparative Study of Memory / Storage


Read/write memory, commonly called Random Access Memory (RAM), must permit the modification (writing) of data bits stored in the memory array, as well as their retrieval (reading) on demand. Unlike sequential-access memories such as magnetic tapes, any cell can be accessed with nearly equal access time. The stored data is volatile, i.e., it is lost when the power supply voltage is turned off. Based on the operation type of the individual data storage cells, RAMs are classified into two main categories: Dynamic RAMs (DRAMs) and Static RAMs (SRAMs).

5.I. SRAM : Static random access memory (SRAM) is a type of semiconductor memory where the word static indicates that it does not need to be periodically refreshed, as SRAM uses bistable latching circuitry to store each bit. SRAM exhibits data remanence, but is still volatile in the conventional sense that data is eventually lost when the memory is not powered. Each bit in an SRAM is stored on four transistors that form two cross-coupled inverters. This storage cell has two stable states, which are used to denote 0 and 1. Two additional access transistors serve to control access to the storage cell during read and write operations. A typical SRAM thus uses six MOSFETs (Fig. 4) to store each memory bit. In addition to such 6T SRAM, other kinds of SRAM chips use 8T, 10T or more transistors per bit to implement more than one (read and/or write) port, which may be useful for certain architectural requirements.

Fig 4: - A six-transistor CMOS SRAM cell.

Generally, the fewer transistors needed per cell, the smaller each cell can be. Since the cost of processing a silicon wafer is relatively fixed for a particular process node, using smaller cells and hence packing more bits on one wafer reduces the cost per bit of memory.

Access to the SRAM cell is enabled by the word line (WL in Fig. 4), which controls the two access transistors M5 and M6, which in turn control whether the cell should be connected to the bit lines BL and BL'. The bit lines are used to transfer data for both read and write operations. Although it is not strictly necessary to have two bit lines, both the signal and its inverse are typically provided in order to improve noise margins.

During read accesses, the bit lines are actively driven high and low by the inverters in the SRAM cell. The symmetric structure of SRAMs also allows for differential signaling, which makes small voltage swings more easily detectable.

5. I. (a) SRAM Operation :

An SRAM cell can be in three different states: standby (the circuit is idle), reading (the data has been requested) and writing (updating the contents). For the SRAM to operate in read mode and write mode, it should have "read stability" and "writability" respectively. The three states work as follows:

Standby : If the word line is not asserted, the access transistors M5 and M6 (Fig. 4) disconnect the cell from the bit lines. The two cross-coupled inverters formed by M1–M4 will continue to reinforce each other as long as they are connected to the supply.

Reading : Assume that the content of the memory is a 1, stored at Q. The read cycle is started by precharging both bit lines to a logical 1, then asserting the word line WL, enabling both access transistors. The second step occurs when the values stored in Q and Q' are transferred to the bit lines by leaving BL at its precharged value and discharging BL' through M1 and M5 to a logical 0. On the BL side, the transistors M4 and M6 pull the bit line toward VDD, a logical 1. If the content of the memory were a 0, the opposite would happen: BL' would be pulled toward 1 and BL toward 0. Once BL and BL' have a small voltage difference between them (say ~100 mV), a sense amplifier detects which line has the higher voltage and amplifies the difference to tell whether a 1 or a 0 was stored.

Fig. 5 : SRAM operation with read / write control circuit

Writing : A write cycle begins by applying the value to be written to the bit lines. If we wish to write a 0, we apply a 0 to the bit lines, i.e., setting BL' to 1 and BL to 0. WL is then asserted and the value to be stored is latched in. This works because the bit-line input drivers are designed to be much stronger than the relatively weak transistors in the cell itself, so that they can easily override the previous state of the cross-coupled inverters. Careful sizing of the transistors in an SRAM cell is needed to ensure proper operation.
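The three cell states above can be mimicked with a toy Python model of the cross-coupled latch. This is purely illustrative (real behaviour depends on transistor sizing and analog effects the model ignores):

```python
class SRAMCell:
    """Toy 6T SRAM cell: Q is the state held by the cross-coupled inverters."""

    def __init__(self):
        self.q = 0  # standby: the latch holds its state while powered

    def read(self):
        # Precharge both bit lines high, then assert WL; the cell pulls
        # one line low and a sense amplifier resolves the difference.
        bl, bl_bar = 1, 1
        if self.q:
            bl_bar = 0   # Q = 1: BL' discharges through the access path
        else:
            bl = 0       # Q = 0: BL discharges instead
        return 1 if bl > bl_bar else 0

    def write(self, value):
        # Strong bit-line drivers override the weak cell transistors.
        self.q = 1 if value else 0

cell = SRAMCell()
cell.write(1)
print(cell.read())  # 1
```

The `write` method simply overwriting `q` mirrors the text's point that the bit-line drivers are sized to overpower the latch.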

5. I. (b) SRAM Applications: The power consumption of SRAM varies widely depending on how frequently it is accessed; an idle SRAM draws very little power and can have nearly negligible power consumption, in the region of a few microwatts.

Static RAM exists primarily as general-purpose products: with an asynchronous interface, such as the 28-pin 32K×8 chips (usually named XXC256) and similar products up to 16 Mbit per chip; and with a synchronous interface, usually used for caches and other applications requiring burst transfers, up to 18 Mbit (256K×72) per chip.

SRAM is integrated on chip as RAM or cache memory in micro-controllers (usually from around 32 bytes up to 128 kilobytes) and as the primary caches in powerful microprocessors, such as the x86 family and many others (from 8 kB, up to several megabytes).

SRAM is used to store the registers and parts of the state-machines used in some microprocessors (see register file). It is also used in FPGAs, ASICs and CPLDs.

Many categories of industrial and scientific subsystems, automotive electronics, and similar, contain static RAM. Some amount (kilobytes or less) is also embedded in practically all modern appliances, toys, etc. that implement an electronic user interface.


SRAM is also used in personal computers, workstations, routers and peripheral equipment: internal CPU caches and external burst mode SRAM caches, hard disk buffers, router buffers, etc. LCD screens and printers also normally employ static RAM to hold the image displayed (or to be printed). Small SRAM buffers are also found in CDROM and CDRW drives.

Hobbyists, specifically homebuilt processor enthusiasts, often prefer SRAM due to the ease of interfacing.

5. II. DRAM : It stores each bit of data in a separate capacitor within an integrated circuit. The capacitor can be either charged or discharged; these two states are taken to represent the two values of a bit, conventionally called 0 and 1. Since capacitors leak charge, the information eventually fades unless the capacitor charge is refreshed periodically. Because of this refresh requirement, it is a dynamic memory.

The main memory (the "RAM") in personal computers is Dynamic RAM (DRAM), as is the RAM of laptop, notebook and workstation computers. A single-transistor DRAM cell is shown in Fig. 6.

Fig. 6: - 1 transistor DRAM cell

5. II. (a) DRAM Circuit Operation :

Row Address Select (RAS): presents the first half of the address to the DRAM chip; used to read a row from the memory array.

Column Address Select (CAS): presents the second half of the address to the DRAM chip; used to select bits from the row for read/write.

Cycle time: RAS + CAS + rewriting the data back to the array.

Refresh cycle: access to refresh the capacitors; needed every few milliseconds (say, 64 ms); varies with the chip.

The circuit shown in Fig. 7 is a very simple dynamic RAM with a capacity of four bits. There is only one column, with four rows. Select the bit you want with the two inputs labeled "row select". The output is on the right. To write a bit, specify the bit you want to write with the "data" input and then select the "write" input. This will charge (or discharge) the appropriate capacitor. The capacitors will slowly drain over time, so each row must be refreshed periodically. To do this, select the row and select the "refresh" input.

Fig. 7: - 4-bit DRAM array


The principle of the read operation of a DRAM circuit is shown in Fig. 8 for a simple 4 × 4 array. To read a bit from a column, the following operations take place:

1. Initially the sense amplifier is disabled. Then the bit lines are precharged to exactly equal voltages that are in between the high and low logic levels. The bit lines are physically symmetrical to keep their capacitances, and therefore their voltages, as equal as possible.

2. Next, the precharge circuit is switched off. Since the bit lines are relatively long, they have enough capacitance to maintain the pre-charged voltage for a brief time.

3. The desired row's word line is then driven high to connect a cell's storage capacitor to its bit line. This causes the transistor to conduct, transferring charge between the storage cell and the connected bit line. If the storage cell capacitor is discharged, it will slightly decrease the voltage on the bit line as the precharge is transferred to the storage capacitor. If the storage cell is charged, the bit-line voltage increases slightly. The change is small because the bit-line capacitance is much larger than that of an individual storage capacitor.

4. Next, the sense amplifier is turned on. The positive feedback takes over and amplifies the small voltage difference between bit-lines until one bit line is fully at the lowest voltage and the other is at the maximum high voltage. Once this has happened, the row is "open" (the desired cell data is available).

5. All columns are sensed simultaneously and the results are sampled into the data latch. The column address provided then selects which latch bit to connect to the external circuit.

Fig. 8: - DRAM read operation

6. While reading of all columns proceeds, current flows back up the bit lines from the sense amplifiers to the storage cells. This reinforces (i.e. "refreshes") the charge in the storage cell by increasing the voltage in the storage capacitor if it was charged to begin with, or by keeping it discharged if it was empty.

7. When reading of all the columns in the current row is done, the word line is switched off to disconnect the cell storage capacitors (the row is "closed"), the sense amplifier is switched off, and the bit lines are precharged again.
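The "slight" bit-line voltage change in step 3 follows from charge sharing between the small cell capacitor and the much larger bit-line capacitance. A back-of-the-envelope sketch (the capacitance and voltage values are illustrative assumptions, not figures from the text):

```python
def bitline_voltage(v_cell, v_precharge, c_cell_fF, c_bitline_fF):
    """Final bit-line voltage after the access transistor connects a
    storage capacitor to a precharged bit line (charge conservation)."""
    q_total = v_cell * c_cell_fF + v_precharge * c_bitline_fF
    return q_total / (c_cell_fF + c_bitline_fF)

# Cell at 1.8 V (stores '1'), bit line precharged to 0.9 V,
# assumed 30 fF cell vs 300 fF bit line: the line moves well
# under 100 mV, which is why a sense amplifier is needed.
v = bitline_voltage(1.8, 0.9, 30, 300)
print(round(v - 0.9, 4))  # 0.0818
```

With a 10:1 capacitance ratio the swing is roughly a tenth of the cell/precharge voltage difference, matching the text's point that the shift is small.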

The principle of the write operation of a DRAM circuit is shown in Fig. 9 for a simple 4 × 4 array. To write to memory, the row is opened and a given column's sense amplifier is temporarily forced to the desired high or low voltage state, so that it drives the bit line to charge or discharge the cell storage capacitor to the desired value. Due to positive feedback, the amplifier will then hold the bit line stable even after the forcing is removed. During a write to a particular cell, all the columns in the row are sensed simultaneously just as in reading; a single column's cell storage capacitor charge is changed, and then the entire row is written back, as illustrated in Fig. 9.

Fig. 9: - DRAM write

Typically, manufacturers specify that each row must have its storage cell capacitors refreshed every 64 ms or less, as defined by the JEDEC standard. Refresh logic is provided in a DRAM controller, which automates the periodic refresh so that no software or other hardware has to perform it. This makes the controller's logic circuit more complicated, but this drawback is outweighed by the fact that DRAM is much cheaper per storage cell and, because each storage cell is very simple, DRAM has much greater capacity per unit area than SRAM.

Some systems refresh every row in one burst of activity every 64 ms. Other systems refresh one row at a time, staggered throughout the 64 ms interval. For example, a system with 2^13 = 8192 rows would require a staggered refresh rate of one row every 7.8 µs (64 ms divided by 8192 rows). A few real-time systems refresh a portion of memory at a time determined by an external timer function that governs the operation of the rest of the system, such as the vertical blanking interval that occurs every 10–20 ms in video equipment. All methods require some sort of counter to keep track of which row is the next to be refreshed. Most DRAM chips include that counter; older types require external refresh logic to hold it. (Under some conditions, most of the data in DRAM can be recovered even if the DRAM has not been refreshed for several minutes.)
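The staggered-refresh arithmetic above is easy to verify:

```python
def staggered_refresh_interval_us(retention_ms=64, rows=8192):
    """Per-row refresh spacing when refreshes are spread evenly across
    the retention window (64 ms JEDEC figure, 2^13 = 8192 rows)."""
    return retention_ms * 1000 / rows

print(staggered_refresh_interval_us())  # 7.8125 -> the ~7.8 us in the text
```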


Characteristics       | SRAM             | DRAM
----------------------|------------------|------------------------------------
Transistors required  | Six transistors  | One pass transistor and a capacitor
Cost                  | More             | Less
Speed                 | Faster           | Slower
Use                   | Cache memory     | Main memory
Density / area        | Less             | Higher
Operation complexity  | Simple           | Refresh operations are necessary

Table 2: Comparative Study of SRAM & DRAM


In Table 2, comparative performance of SRAM vs DRAM is presented.

5. (III) ROM: Computers always contain a small amount of read-only memory (ROM) that holds instructions for starting up the computer. As the name implies, only retrieval of stored data is possible in ROM; ROM does not permit modification of the stored information contents during normal operation. It is non-volatile, which means that once the computer is turned off the information is still there, and no refresh operation is required.

Fig. 11: ROM

Depending on the type of data programming, various ROM-type memories are possible. One category is Mask ROM (MROM), in which data is written during chip fabrication by using a photomask. MROM can be programmed in the following ways:

Metal programming
Via programming
Diffusion programming

Fig.12: - Mask (Fuse) ROM

In the case of Programmable ROM (PROM), or one-time programmable ROM (OTP), data is written electrically after the chip is fabricated. To write data onto a PROM chip, you need a special device called a PROM programmer or PROM burner.

Depending on data erasing characteristics, PROM can be classified further. In a Fuse ROM, data is written by blowing fuses electrically and cannot be erased or modified. In an Erasable PROM (EPROM) (Fig. 13), data can be erased by exposing the chip to ultraviolet light (typically for 10 minutes or longer) through the quartz window on the package.

Fig.13: - EPROM

Electrically Erasable PROM (EEPROM) is similar to a PROM, but requires only electrical voltage to erase data (Fig.14). Like other types of PROM, EEPROM retains its contents even when the power is turned off.

Fig.14: - EEPROM

Flash memory (or simply flash) is a modern type of EEPROM invented in 1984. Flash memory can be erased and rewritten faster than ordinary EEPROM. Modern NAND flash makes efficient use of silicon chip area, resulting in individual ICs with a capacity as high as 32 GB as of 2007; this feature, along with its endurance and physical durability, has allowed NAND flash to replace magnetic disks in some applications (such as USB flash drives). Flash memory is sometimes called flash ROM or flash EEPROM when used as a replacement for older ROM types, but not in applications that take advantage of its ability to be modified quickly and frequently. Flash memory is similar to EEPROM; the principal difference is that EEPROM requires data to be written or erased one byte at a time, whereas flash memory allows data to be written or erased in blocks. This makes flash memory faster.
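The byte-at-a-time versus block-at-a-time distinction can be made concrete with a toy operation count (an illustrative model only; the 4 KB erase-block size is a hypothetical example, not a figure from the text):

```python
def erase_ops(total_bytes, block_size=None):
    """Number of erase operations needed: one per byte for an
    EEPROM-style device, one per block for a flash-style device."""
    if block_size is None:            # EEPROM: byte-granular erase
        return total_bytes
    # Flash: whole blocks are erased at once (ceiling division)
    return -(-total_bytes // block_size)

# Erasing 64 KB: 65,536 byte operations on EEPROM versus
# 16 block operations on flash with a hypothetical 4 KB block.
print(erase_ops(65536))        # 65536
print(erase_ops(65536, 4096))  # 16
```

The three-orders-of-magnitude gap in operation count is the source of flash's speed advantage for bulk erase.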

5. IV. Flash Memory : Flash memory technology is a mix of EPROM and EEPROM technologies. The term flash was chosen because a large chunk of memory could be erased at one time. The name, therefore, distinguishes flash devices from EEPROMs, where each byte is erased individually. Flash memory technology is today a mature technology. It is a strong competitor to other nonvolatile memories such as EPROMs and EEPROMs, and to some DRAM applications.

5. IV. (a). Flash Memory Cell: The more common elementary flash cell consists of one transistor with a floating gate, similar to an EPROM cell. However, technology and geometry differences between flash devices and EPROMs exist. In particular, the gate oxide between the silicon and the floating gate is thinner in flash technology. Source and drain diffusions are also different. These differences allow the flash device to be programmed and erased electrically. Fig. 15 shows a comparison between a flash memory cell and an EPROM cell from the same manufacturer (AMD) with the same technology complexity. The cells look similar, since the gate-oxide thickness and source/drain diffusion differences are not visible in the photographs.

Fig. 15: AMD EPROM Vs. AMD Flash Memory Cells

Other flash cell concepts are based upon EEPROM technology. Fig. 16 shows a split-gate cell and Fig. 17 shows a transistor with the tunnel oxide in only a part of the oxide under the floating gate. These cells are larger than the conventional one-transistor cell, but are far smaller than the conventional two-transistor EEPROM cell.

Fig. 16: Split Gate Flash Cell

The electrical functionality of the flash memory cell is similar to that of an EPROM or EEPROM. Electrons are trapped onto the floating gate during programming. These electrons modify the threshold voltage of the storage transistor. Electrons are trapped in the floating gate by Fowler-Nordheim tunneling (as with the EEPROM) or hot-electron injection (as with the EPROM). During erase, electrons are removed from the floating gate using Fowler-Nordheim tunneling, as with the EEPROM. Fig. 18 summarizes the different modes of flash programming.

5. IV. (b). Flash Architecture : Designers have developed multiple flash memory array architectures, yielding a trade-off between die size and speed. NOR, NAND, DINOR, and AND are the main architectures developed for flash memories.

Fig. 17: Tunnel Window Flash Cell

Table 3: Flash Chip and Cell Size Comparison

NOR Flash: The NOR architecture is currently the most popular flash architecture. It is commonly used in EPROM and EEPROM designs. Aside from the active transistors, the largest contributor to area in the cell array is the metal-to-diffusion contacts. The NOR architecture requires one contact per two cells, which consumes the most area of all the flash architecture alternatives. Electron trapping in the floating gate is done by hot-electron injection; electrons are removed by Fowler-Nordheim tunneling.

NAND Flash: To reduce cell area, the NAND configuration was developed. Fig. 19 shows the layouts of NOR and NAND configurations for the same feature size. The NAND structure is considerably more compact, as there is no metal-to-diffusion contact per pair of cells.

Fig. 19: Comparison of NOR and NAND Architectures

Fig. 18 : Comparison Between the different types of Flash Programming


A drawback of the NAND configuration is that when a cell is read, the sense amplifier sees a weaker signal than in a NOR configuration, since several transistors are in series. The weak signal slows down the read circuitry; this can be overcome by operating in serial access mode, but the memory will not be competitive for random-access applications. Fig. 20 and Table 4 describe the NAND architecture from Toshiba, and Table 5 shows a speed comparison of NOR and NAND devices.

DINOR Flash: DINOR (divided bit-line NOR) and AND are two other flash architectures that attempt to reduce die area compared to the conventional NOR configuration. Both architectures were co-developed by Hitachi and Mitsubishi.

Fig. 20: Toshiba Flash NAND Cell

Table 4: Toshiba’s 32Mbit Flash Characteristics

Table 5: NOR vs NAND Access times

The DINOR design uses sub-bit lines in polysilicon. Mitsubishi states that its device shows low power dissipation, sector erase, fast access time, high data transfer rate, and 3 V operation. Its device uses a complex manufacturing process involving a 0.5 µm CMOS triple well, triple-level polysilicon, tungsten plugs, and two layers of metal. Fig. 21 shows the DINOR architecture.

AND Flash: With the AND architecture, the metal bit line is replaced by an embedded diffusion line. This provides a reduction in cell size. The 32 Mbit AND-based flash memory device proposed by Hitachi needs a single 3 V power supply. In random access mode, the device is slower than a NOR-based device. Hitachi's device is specified to operate with a 50 ns high-speed serial access time.

Fig. 21: DINOR Architecture

Table 6: Flash and DRAM Cell size comparison

Fig. 22. a:NOR Architecture


Fig. 22.b: NAND Architecture

Fig. 22.c: DINOR Architecture

Fig. 22.d: AND Architecture

Fig. 22 presents a review of the different flash architectures. Table 6 shows a cell size comparison between DRAM, NAND, and NOR flash architectures. The NOR flash one-transistor cell has roughly the same size as a DRAM cell for the same process generation.

Several companies strongly support one type of flash architecture. However, to hedge their bets and to offer products for several different end uses, many firms have elected to build flash devices using more than one type of architecture. Table-7 shows vendors’ support of flash memory architectures.

Table 7: Vendors' Support of Flash Memory Architectures

5. IV. (c). Multi-Level Storage Cell (MLC): Most of the major flash companies are working to develop their version of a multi-level cell flash device. The goal of this device is to store information in several different levels inside the same memory cell. The most common developments are those that store information on four different levels in the same cell.

In a multi-level cell, there are two difficult issues that must be addressed by manufacturers. The first is to tightly control the program cycle so that it yields four distinct levels of charge. The second is to accurately recognize, during the read cycle, the four different threshold voltages of the programmed transistor.
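Storing two bits as four threshold-voltage levels can be sketched as follows (the voltage windows are illustrative assumptions, not vendor data):

```python
# Hypothetical threshold-voltage centers for a 4-level (2-bit) cell.
LEVELS = {0b11: 1.0, 0b10: 2.0, 0b01: 3.0, 0b00: 4.0}  # bits -> Vt (V)

def program(bits):
    """Program: place the cell at the Vt level that encodes these 2 bits."""
    return LEVELS[bits]

def read(vt, tolerance=0.4):
    """Read: recognise which of the four Vt windows the cell falls in."""
    for bits, center in LEVELS.items():
        if abs(vt - center) <= tolerance:
            return bits
    raise ValueError("Vt outside all windows: cell disturbed?")

vt = program(0b10)
print(bin(read(vt + 0.1)))  # 0b10 even after a small Vt shift
```

The two manufacturing difficulties map directly onto the model: `program` must hit its level precisely, and `read` must resolve levels whose windows shrink as more levels are packed into the same voltage range.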

Flash devices must be reliable even in worst case conditions. External parameters (power supplies, temperatures, etc.) may vary from the time the flash device is programmed to the time it is read.

During each of the past several years, papers have been presented by most of the major flash manufacturers regarding multi-level cell technology. Intel presented a paper on its four-level storage work at the 1995 ISSCC conference. Samsung presented a 128 Mbit four-level NAND flash cell and NEC presented a 64 Mbit four-level NOR flash cell at the 1996 ISSCC conference.

At the 1995 Symposium on VLSI Circuits, Toshiba presented a development for future high-density MLC NAND flash memories. At the December 1996 IEDM Conference, SGS-Thomson presented a study on MLC for the different flash architectures and their trade-offs. During the first half of 1997, Intel announced that it had sampled 64 Mbit MLC parts. SanDisk, along with manufacturing partner Matsushita, used the technology to boost single-chip capacity to 64 Mbit. It refers to its multi-level cell technology as "Double Density" or "D2".

Table 8: Trade-off of MLC using different flash architecture

SanDisk claims that the 64 Mbit die is only 10 percent larger than the company's 32 Mbit die. Meanwhile, the company is also working on a 256 Mbit Double Density flash device.

5. IV. (d). Flash Power Supply Requirements: Currently, flash power supplies range from 5 V/12 V down to 2 V. Flash memory power supplies vary widely from vendor to vendor, for two main reasons. First, flash cells need high voltage for programming. Different flash architectures and designs use different program/erase techniques (Fowler-Nordheim tunneling or hot-electron injection), and these do not share the same voltage requirements. For example, a high voltage with no current can be generated internally with a voltage pump, but the source/drain current of hot-electron injection requires an external power supply. The second reason for the wide power supply variation is that different applications require different power supply levels. Some applications may require low-voltage flash devices while others operate well using flash devices with high-voltage characteristics. Manufacturers can propose different types of power supplies that best fit a specific application.

SmartVoltage is an Intel concept, though other manufacturers including Sharp and Micron have signed on to license the technology. SmartVoltage parts can operate from several power supplies: the read voltage may be 2.7V, 3.3V or 5.5V, and the programming voltage may be 3.3V, 5V or 12V. Flash memories are used in a wide variety of applications, as illustrated in Figure 10-18. All these applications allow vendors to offer several flash solutions; using the NAND flash architecture for serial access applications is one example.

Table 9: Flash Diversity

5. IV. (e). Flash Reliability Concerns:

There are three primary reliability concerns of a flash memory IC: data retention, thin oxide stress, and over- or under-erasing/programming.


Fig. 23: Erased Threshold Voltage Shift for Flash Memory Cell

Regarding erase/program, flash ICs that use hot electron injection for trapping electrons in the floating gate are programmed (data equal to 0) by capturing electrons in the floating gate, as with an EPROM.

Flash ICs that use Fowler-Nordheim tunneling for trapping electrons in the floating gate will be programmed (data equal to 0) by removing the electrons from the floating gate, as with an EEPROM. The reliability concern is to either over-program or over-erase, as shown in Figure 23.

6. Conclusion:

As of 2011, the most commonly used data storage technologies are semiconductor, magnetic, and optical, while paper still sees some limited usage. Media is a common name for whatever actually holds the data in a storage device. Some other fundamental storage technologies have also been used in the past or are proposed for development, such as TRAM, ZRAM, TTRAM, CBRAM, SONOS, RRAM, Racetrack memory, NRAM and Millipede.

Table 11: Other Memory Details

Semiconductor (volatile and non-volatile): used in personal computers; basic component: semiconductor-based integrated circuits.

Magnetic (non-volatile): magnetic disk, floppy disk, hard disk drive, magnetic tape; stores information as different patterns of magnetization on a magnetically coated surface.

Optical (non-volatile): CD, CD-ROM, DVD, BD-ROM, CD-R, DVD-R, DVD+R, BD-R, CD-RW, DVD-RW, DVD+RW, DVD-RAM; stores information in deformities on the surface of a circular disc and reads this information by illuminating the surface with a laser diode and observing the reflection.

Paper (non-volatile): punch card; information was recorded by punching holes into the paper or cardboard medium and was read mechanically (or later optically) to determine whether a particular location on the medium was solid or contained a hole.

Fig. 24: Memory capacity vs. Access Speed & Cost per byte.

Characteristics        PROM      EPROM        EEPROM                             Flash
Writable               Once      Yes          Yes                                Yes
Erase size             n/a       Entire chip  Byte                               Sector
Maximum erase cycles   n/a       Limited      Limited                            Limited
Cost                   Moderate  Moderate     Expensive                          Moderate
Speed                  Fast      Fast         Fast to read, slow to erase/write  Fast to read, slow to erase/write

Table 10: Comparative Study of different ROM Chips

7. Acknowledgement: The help given by Professor Krishanu Datta, Microelectronics & VLSI Department, Heritage Institute of Technology, Kolkata, India is greatly appreciated. He reviewed the paper and


provided excellent suggestions and information.

8. References:

[1] Yuan Taur (University of California, San Diego) and Tak H. Ning (IBM T. J. Watson Research Center, New York), Fundamentals of Modern VLSI Devices, Cambridge University Press.

[2] Sung-Mo Kang and Yusuf Leblebici, CMOS Digital Integrated Circuits: Analysis & Design, Tata McGraw-Hill.

[3] Wikipedia, the free encyclopedia.


VLSI COMBINATIONAL CIRCUIT TOPOLOGIES Author: Madhurima Moitra & Jaya Bar M.Tech (VLSI), ECE-Student (2011-13)

Abstract: This paper presents a brief introduction to popular combinational logic families in VLSI circuits. The advantages and disadvantages of the different logic families are discussed.

Keywords: CMOS, pseudo nMOS, domino, transmission gate.

1. Introduction:

In an 1886 letter, Charles Sanders Peirce described how logical operations could be carried out by electrical switching circuits. Starting in 1898, Nikola Tesla filed for patents on devices containing electro-mechanical logic gate circuits. In 1907, Lee De Forest used the Fleming valve as an AND gate. In 1924, Walther Bothe invented the first modern electronic AND gate, and in 1937 Claude E. Shannon introduced the use of Boolean algebra in the analysis and design of switching circuits [6].

Logic gates are primarily implemented with transistors acting as electronic switches. Today these circuits are implemented in CMOS (Complementary Metal-Oxide-Semiconductor), also called static CMOS. CMOS is a technology for constructing integrated circuits; Frank Wanlass patented CMOS in 1967. Two important characteristics of CMOS devices are high noise immunity and low static power consumption: significant power is drawn only when the transistors in the CMOS device are switching between on and off states. Consequently, CMOS devices do not produce as much heat as other logic families, like transistor-transistor logic (TTL) or NMOS logic. CMOS also allows a high density of logic functions on a chip. It was primarily for this reason that CMOS became the most widely used technology for VLSI chips [6].

However, static CMOS circuits have some disadvantages. To avoid them, the concepts of ratioed circuits and domino logic were introduced.

2. Static CMOS:

In VLSI we can design any combinational logic circuit using complementary MOSFETs (CMOS). In this type of logic gate we use both PMOS and NMOS transistors, arranged in two networks: a Pull-Up Network (usually made of PMOS) and a Pull-Down Network (usually made of NMOS) (Fig. 1) [1].


Fig. 1: General Form

CMOS implementation of an inverter is shown in fig. 2.

Fig. 2: CMOS Implementation of an Inverter

For the CMOS inverter, when the input is ‘0’ the PMOS is ON and the NMOS is OFF, and the output is pulled high through the PMOS from VDD. When the input is ‘1’, the PMOS is OFF and the NMOS is ON, and the output is pulled low through the NMOS to ground [1].

Similarly CMOS implementation of NAND gate and NOR gate are shown in fig. 3 and fig. 4 respectively [1]. An important characteristic of a CMOS circuit is the duality that exists between its PMOS transistors and NMOS transistors. A CMOS circuit is created to allow a path always to exist from the output to either the power source or ground. To accomplish this, the set of all paths to the voltage source must be the complement of the set of all paths to ground. This can be easily accomplished by defining one in terms of the NOT of the other [6].
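The pull-up/pull-down duality can be sketched at the switch level: model the pull-down network as a Boolean condition over the inputs, and take the pull-up network as its complement, so the output is simply the inverse of "PDN conducts". This is an illustrative model, not a circuit simulator:

```python
from itertools import product

def static_cmos_output(pdn_conducts, inputs):
    """Evaluate a static CMOS gate from its pull-down network alone.

    pdn_conducts: function of the inputs, True when the NMOS network
    conducts (output pulled to ground). The PMOS network is its dual,
    so the output is simply the complement.
    """
    return not pdn_conducts(*inputs)

# 2-input NAND: NMOS in series -> PDN conducts when A AND B are 1.
nand = lambda a, b: static_cmos_output(lambda x, y: x and y, (a, b))
# 2-input NOR: NMOS in parallel -> PDN conducts when A OR B is 1.
nor = lambda a, b: static_cmos_output(lambda x, y: x or y, (a, b))

for a, b in product([False, True], repeat=2):
    assert nand(a, b) == (not (a and b))
    assert nor(a, b) == (not (a or b))
```

Series NMOS devices correspond to parallel PMOS devices and vice versa, which is exactly the duality described above.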

Fig. 3: CMOS Implementation of NAND Gate

Fig. 4: CMOS Implementation of NOR Gate


2. A. Implementing Boolean Expression by Static CMOS:

To design any combinational logic circuit using CMOS we follow the steps below:

1. Using a K-map, simplify the given Boolean expression to minimize the number of transistors.

2. With the help of De Morgan’s law, modify the Boolean expression to make it CMOS-implementable.

3. Implement the design using CMOS gates.

2. B. Example of Implementing Boolean Expression by Static CMOS: Let us assume a Boolean expression -

Step – 1: Using K-Map:

By simplifying the given expression we can say that:

Step – 2: With the help of De Morgan’s Law we can modify the equation suited for CMOS implementation:

Step – 3: The two options for the static CMOS implementation of the modified Boolean expression are shown in fig. 5a and fig. 5b –

Fig. 5a: Static CMOS Implementation of the given example (option 2)

Fig. 5b: Static CMOS Implementation of the given example (option 1)


2. C. Advantages of Static CMOS Circuits:

1. Standby power of this type of circuit is low.

2. This circuit has excellent noise immunity.

3. Since only one of the two networks (pull-up or pull-down) is ON at a time, there is no contention current in this type of circuit.

2. D. Disadvantages of Static CMOS Circuits:

1. As a large number of PMOS transistors are used in these circuits, the input capacitance is large, which limits the speed of the gates.

2. The required area for this circuit is large.

3. As the input capacitance is large, the dynamic power of a static CMOS circuit is large.

3. Ratioed Circuits:

To reduce the number of MOS transistors and the input capacitance, the concept of the pseudo-nMOS ratioed circuit was introduced [2]. In this type of circuit the entire Pull-Up Network is replaced by a single PMOS acting as an active resistive load [2].

3. A. Pseudo nMOS:

In this circuit the Pull-Down Network is an NMOS network. As the Pull-Up Network is replaced by a single PMOS with its gate terminal connected to ground, the PMOS is always ON, i.e., the PMOS acts as a resistor. The general form of a circuit implemented in pseudo-nMOS is shown in fig. 6. In this circuit the input capacitance is 60% lower than static CMOS. VOL depends on the W/L ratio of the P load transistor relative to the effective W/L ratio of the N Pull-Down Network. Since it is ratioed logic, the engineering effort to size the gate is greater than for static CMOS [1].
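The ratio dependence of VOL can be estimated by equating the linear-region current of the NMOS pull-down with the saturation current of the grounded-gate PMOS load. A rough sketch under square-law device assumptions (no channel-length modulation; kn and kp are the device transconductances, illustrative values only):

```python
import math

def pseudo_nmos_vol(kn, kp, vdd, vtn, vtp):
    """Estimate VOL by solving
        kn*((vdd - vtn)*VOL - VOL**2/2) = (kp/2)*(vdd - |vtp|)**2
    (NMOS linear current = PMOS load saturation current) for the
    smaller root of the quadratic."""
    vov = vdd - vtn
    disc = vov**2 - (kp / kn) * (vdd - abs(vtp))**2
    if disc < 0:
        raise ValueError("pull-down too weak: no valid low output level")
    return vov - math.sqrt(disc)

# A stronger pull-down (larger kn/kp) gives a lower VOL, which is
# why sizing a ratioed gate takes extra engineering effort.
print(pseudo_nmos_vol(kn=200e-6, kp=50e-6, vdd=5.0, vtn=1.0, vtp=-1.0))
```

Because VOL never reaches 0 V, the low-level noise margin shrinks, matching the noise-immunity disadvantage noted below.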

Fig. 6: General Form

Fig. 7: Pseudo nMOS Implementation of an Inverter

Fig. 8: Pseudo nMOS Implementation of NAND Gate


Fig. 9: Pseudo nMOS Implementation of NOR Gate

3. B. Advantages of Pseudo nMOS Circuits:

1. The circuit has fewer transistors than the static CMOS circuit, so the capacitance is lower and the circuit is faster than static CMOS [2].

2. Due to the lower capacitance, the dynamic power is also lower than static CMOS [2].

3. C. Disadvantages of Pseudo nMOS Circuits:

1. When the output is low, the Pull-Down Network is ON. Since the PMOS load transistor is always ON, a constant d.c. current flows from VDD to ground, known as contention current. The static power dissipation of this circuit is high due to this contention current.

2. The noise immunity of this circuit is low [1] due to the non-zero VOL.

4. Domino Logic:

To avoid the disadvantages of both static CMOS and ratioed circuits, a new logic family was introduced, known as domino logic. In this type of circuit one extra NMOS, known as the footed NMOS, is connected in series with the pull-down network. Here also the pull-up network is replaced by a single PMOS, called the precharge transistor. A single clock input is connected to this PMOS and NMOS pair. This type of circuit suffers from leakage and charge-sharing noise. To avoid these problems a keeper circuit, known as the domino keeper, is added.

In this keeper circuit a weak PMOS is added in parallel with the precharge PMOS transistor. When the domino output is low (dynamic node high), this keeper transistor is ON and holds the dynamic node, so we get a normal, non-floating output [2].
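The precharge/evaluate behavior can be sketched as a tiny two-phase model: clock low precharges the dynamic node high, clock high enables the foot transistor so the pull-down network may discharge it, and otherwise the (idealized) keeper holds the node. Names and structure here are illustrative:

```python
class DominoGate:
    """Footed domino gate: dynamic node precharged high, conditionally
    discharged through the PDN during evaluation; output is the
    inverted dynamic node."""

    def __init__(self, pdn_conducts):
        self.pdn_conducts = pdn_conducts  # NMOS pull-down network function
        self.dynamic_node = True          # assume precharged initially

    def step(self, clk, inputs):
        if not clk:                       # precharge: PMOS on, foot NMOS off
            self.dynamic_node = True
        elif self.pdn_conducts(*inputs):  # evaluate: foot NMOS on
            self.dynamic_node = False     # PDN discharges the dynamic node
        # else: keeper PMOS holds the high dynamic node against leakage
        return not self.dynamic_node      # output inverter

# Domino 2-input AND (dynamic NAND stage + output inverter).
g = DominoGate(lambda a, b: a and b)
g.step(0, (0, 0))             # precharge phase
out = g.step(1, (1, 1))       # evaluate phase: output rises
```

Note the monotonic behavior this models: during evaluation the output can only rise, never fall, which is why domino stages can be cascaded like falling dominoes.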

Fig. 10: General Form of Domino Logic

Fig. 11: Implementation of Inverter by Domino Logic


Fig. 12: Implementation of NAND Gate by Domino Logic

Fig. 13: Implementation of NOR Gate by Domino Logic

4. A. Advantages of Domino Logic:

1. As the input capacitance is low, this type of circuit is faster than the other families, i.e., the delay is lower.

2. Required area for this logic is less than Static CMOS.

3. By using precharge and evaluate transistor contention current is avoided [2].

4. B. Disadvantages of Domino Logic:

1. As the input capacitance is low, the dynamic power would be expected to be low. But due to the higher activity factor compared with a static CMOS circuit, the dynamic power becomes high for a domino gate.

2. In this type of circuit clock input is used. Hence additional design effort is required related to clock connection.

3. This circuit needs domino keeper design. Due to keeper design, more design effort is needed to design domino circuit [2].

5. Pass Transistor Logic:

Among the most popular applications of combinational circuits are MUX and EX-OR structures. Pass transistors provide a more efficient way of implementing these logic functions, which are widely used in the data paths and control paths of DSP processors. In addition, the pass transistor is key to implementing sequential logic.

5. A. Pass Transistor Logic Operation:

In this logic family, transistors are used as switches to pass logic levels between the nodes of a circuit. MOS transistors are essentially controlled switches in which the voltage at the gate controls the path from source to drain. The symbol and truth table of the pass transistor are shown in figs. 14 and 15 [1].

Fig. 14: Transistors symbols and switch –level models


Fig. 15: Transistors symbols and truth table

An NMOS transistor is an almost perfect switch when passing a ‘0’ (fig. 17), and thus we say it passes a strong ‘0’. The NMOS transistor is not good at passing a ‘1’: the high voltage level is less than VDD (fig. 16), and we say it passes a degraded or weak ‘1’. A PMOS transistor has the opposite behavior, passing a strong ‘1’ but a degraded ‘0’. When an NMOS or PMOS is used alone as a switch, we call it a pass transistor [1].
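The strong/weak passing behavior reduces to two one-line clipping rules (ideal devices, body effect ignored, only the single threshold drop modeled; the voltage values are illustrative):

```python
VDD, VTN, VTP = 5.0, 1.0, 1.0  # illustrative supply and thresholds (volts)

def nmos_pass(v):
    """An NMOS passes a strong '0' but clips a '1' at VDD - VTN (weak '1')."""
    return min(v, VDD - VTN)

def pmos_pass(v):
    """A PMOS passes a strong '1' but cannot pull below |VTP| (weak '0')."""
    return max(v, VTP)

assert nmos_pass(0.0) == 0.0        # strong '0'
assert nmos_pass(VDD) == VDD - VTN  # degraded '1'
assert pmos_pass(VDD) == VDD        # strong '1'
assert pmos_pass(0.0) == VTP        # degraded '0'
```

These clipping rules are what the Vx-versus-time plots in figs. 17 and 19 converge to.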

Logic "1" Transfer [3]:

Fig. 16: Circuit for the logic "1" transfer event.

Fig. 17: Variation of Vx as a function of time during logic "1" transfer.

N-MOS Pass Transistor – Logic ‘0’ Transfer [3]:

Fig. 18: Circuit for the logic "0" transfer event.

Fig. 19: Variation of Vx as a function of time during logic "0" transfer.

5. B. Cascaded Pass Transistors:

As shown in figure (20), with n-channel pass transistors the high voltage is degraded by the threshold voltage VT. Cascaded NMOS or PMOS transistors can therefore cause significant noise problems at the output, as shown in figure (20), where the actual value of a logic 1 output may look like a logic 0 [6].

Fig. 20: Cascaded Pass Transistors


To avoid the noise problems of PMOS and NMOS pass-transistor logic, the CMOS transmission gate was introduced.

5. C. The CMOS Transmission gate:

The CMOS transmission gate consists of one NMOS and one PMOS transistor fig (21), connected in parallel. The gate voltages applied to these two transistors are also set to be complementary signals. As such, the CMOS TG operates as a bidirectional switch between the nodes A and B which is controlled by signal C. If the control signal C is logic-high, i.e., equal to VDD, then both transistors are turned on and provide a low-resistance current path between the nodes A and B. If the control signal C is low, then both transistors will be off, and the path between the nodes A and B will be an open circuit. This condition is also called the high-impedance state.

The substrate terminal of the NMOS transistor is connected to ground and the substrate terminal of the PMOS transistor is connected to VDD. Both transistors are therefore subject to the substrate-bias effect, depending on the bias conditions. Figure (21) shows three other commonly used symbolic representations of the CMOS transmission gate [3].

Fig. 21: Four different representations of the CMOS transmission gate (TG).

Fig. 22: Equivalent resistance of the CMOS transmission gate, plotted as a function of the output voltage [5].

5. C. I. CMOS Transmission Gate Logic Design: 2-input MUX :

Fig. 23: 2-input MUX implementation by CMOS Transmission Gate

CMOS transmission gates can be used in logic design. Operation: there are three regions of operation when charging the output capacitor from 0 to VDD. Region 1 (0 < Vout < |Vtp|): both


transistors are saturated. Region 2 (|Vtp| < Vout < VDD − Vtn): N saturated, P linear. Region 3 (VDD − Vtn < Vout < VDD): N cut off, P linear. The resistance of a CMOS transmission gate remains relatively constant (when both transistors are turned on) over the operating voltage range 0 to VDD [1].

5. C. II. XOR Implementations using CMOS Transmission Gates:

Fig. 24: XOR implementation by CMOS Transmission Gate.

• The top circuit implements an XOR function with two CMOS transmission gates and two inverters: 8 transistors total (4 fewer than a complex CMOS implementation) [1].

Fig. 25: XOR implementation by CMOS Transmission Gate

• The XOR can be implemented with only 6 transistors with one transmission gate, one standard inverter, and one special inverter gate powered from B to B’ (instead of VDD and VSS) and inserted between A and the output F [1].
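At the switch level, the transmission-gate XOR is just a 2-to-1 MUX: when B = 0 a transmission gate passes A to the output, and when B = 1 the other path routes A' instead, so F = A XOR B. A truth-table check of this MUX view (ideal bidirectional switches assumed):

```python
def tg_xor(a, b):
    """Transmission-gate XOR modeled as a MUX: B selects between
    A (when B = 0) and NOT A (when B = 1). Each transmission gate
    is treated as an ideal switch controlled by B and its complement."""
    not_a = not a          # generated by the input inverter
    # TG1 (controlled by B'): passes A  when B is low.
    # TG2 (controlled by B):  passes A' when B is high.
    return not_a if b else a

for a in (False, True):
    for b in (False, True):
        assert tg_xor(a, b) == (a != b)
```

The same selector view explains the 6-transistor variant: the special inverter powered from B and B' merges the selection into the inversion stage.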

5. D. Pass Transistor Design Methodology:

Choose the proper number of inputs as MUX select inputs and data inputs.

Plot the variables on K-maps accordingly to optimize the logic.

Implement the logic using the minimum number of pass transistors [4].

Example: f = a b + b c d + a c d

Now plot the K-Map using d and d’ [3]

Boolean expression for f can be expressed as

f = 1.a b + b c d + a c d + a b.0 + 0.a b c


Fig. 26: Implementation of Boolean Expression by Pass Transistor Logic

Fig. 27: Implement the Logic Using Minimum Number of Pass Transistors

6. Conclusion:

Combinational circuits are a very important part of VLSI design, especially in pipelined architectures. Most CPUs and DSP processors have a pipelined architecture, and combinational circuit performance in

terms of power, delay and area is key to the overall performance of a processor. In this paper, the advantages and disadvantages of the most widely used combinational logic families are discussed in terms of their power, area and delay performance.

7. Acknowledgement:

We would like to thank Prof. Krishanu Datta, Microelectronics & VLSI Department, Heritage Institute of Technology, Kolkata, India, for his enduring guidance and encouragement throughout our studies. He gave us moral support and guided us in writing this paper. He was very kind and patient while suggesting the outline of this project and correcting our doubts. We are truly thankful to him for his overall support.

8. References:

[1] N. Weste and K. Eshraghian, Principles of CMOS VLSI Design: A Systems Perspective, Addison-Wesley.

[2] Neil Weste and David Harris, CMOS VLSI Design.

[3] Sung-Mo (Steve) Kang (University of Illinois at Urbana-Champaign) and Yusuf Leblebici (Worcester Polytechnic Institute, Swiss Federal Institute of Technology, Lausanne), CMOS Digital Integrated Circuits: Analysis and Design.

[4] Michael C. Wang, "Independent-Gate FinFET Circuit Design Methodology," IAENG International Journal of Computer Science, 37:1, IJCS_37_1_06.

[5] Pass Transistor and Transmission Gate Logics, Kuwait University, Electrical Engineering Department, EE434 Electronics.

[6] Wikipedia, the free encyclopedia.


PERFORMANCE ESTIMATION OF VLSI DESIGN

Author: Arindam Sadhu M.Tech (VLSI), ECE-Student (2011-13)

Abstract: This paper explores performance estimation of VLSI designs using a simple RC delay model based on the Elmore delay method. The pre-layout and post-layout VLSI design flows for delay convergence are also shown.

Keywords: Propagation delay, NMOS, PMOS, Rise Time, Fall Time, Elmore Delay, Layout.

1. Introduction: The idea of the metal-oxide-semiconductor field-effect transistor (MOSFET) was patented by J. E. Lilienfeld in the early 1930s, well before the invention of the bipolar junction transistor (BJT). MOS technologies became popular in the 1960s with the complementary MOS device, in which both N-type and P-type MOSFETs are fabricated on a single chip. The introduction of CMOS in the mid 1960s initiated a revolution in the semiconductor industry [1].

Fig. 1: CMOS Inverter

Fig. 2: CMOS characteristics curve.

Fig. 1 shows a complementary MOS circuit in which a PMOS and an NMOS are fabricated on a single chip. Fig. 2 shows the input vs. output characteristic curve of the CMOS inverter. The circuit topology is complementary push-pull: for a low input the PMOS drives the output node while the NMOS acts as a load, and for a high input the operation is the reverse. Consequently both devices contribute equally to the circuit’s operating characteristics. The advantages are that the steady-state power dissipation of a CMOS circuit is virtually negligible, except for a small dissipation due to leakage current, and that CMOS exhibits a full output voltage swing between 0V and Vdd, with excellent noise immunity.

CMOS technology rapidly captured the digital market. CMOS gates dissipate power only during switching. It was also soon discovered that the dimensions of MOS devices could be scaled down to obtain higher


package density and higher performance. CMOS scaling is also easier than for other transistor types. The main driving force of device scaling has been to improve the speed of the device. In this paper we discuss CMOS delay and its estimation [1].

2. Delay Modeling: A digital circuit is built from transistors, usually organized into logic gates. Typical approaches to timing analysis divide the design into stages, with each stage consisting of a gate output and the interconnect path it drives. Digital systems are often designed at the gate or cell level, making it possible to pre-characterize the gate or cell delay for timing analysis. The cell delays are generally expressed empirically as a function of load capacitance and input signal transition. A delay calculator computes the waveform at the fan-out as a function of the waveform at the switching input pin.

2.1 Delay: Delay is defined as the interval between the time when the input waveform crosses a specified threshold and the time when the output waveform crosses a given threshold. These two points are usually set as the points at which the waveforms reach half of their final value (the 50% point) while in transition.

Fig.3 DELAY

The transition time of a waveform says how long it takes to reach its final value. Fig. 3 clearly describes delay and transition time.

The time from the half-Vdd point of the input to the half-Vdd point of the output is defined as the delay tpHL, the propagation delay for high to low, in the discharging case when the NMOS is on. In the charging case, when the PMOS is on, the propagation delay low to high (tpLH) is defined in the same way. The average delay is simply the average of tpHL and tpLH.

Fig. 4 shows tpHL only.

Fig. 5 shows both tpHL and tpLH.

Two more parameters for timing analysis are rise time and fall time. Rise time (tr) is expressed as the time for the output to cross from 0.1 Vdd to 0.9 Vdd (or from 0.2 Vdd to 0.8 Vdd). Similarly, fall time (tf) is expressed as the time for the output to cross from 0.9 Vdd to 0.1 Vdd (or from 0.8 Vdd to 0.2 Vdd).
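These threshold-based definitions are easy to apply to simulated waveforms. A sketch that finds the 10%/90% crossings of a sampled edge by linear interpolation (the sample data below is an idealized ramp, not from a real simulation):

```python
def crossing_time(t, v, level):
    """Return the first time v crosses `level`, linearly interpolating
    between samples. t and v are equal-length sample lists."""
    for i in range(1, len(t)):
        lo, hi = sorted((v[i - 1], v[i]))
        if lo <= level <= hi and v[i] != v[i - 1]:
            frac = (level - v[i - 1]) / (v[i] - v[i - 1])
            return t[i - 1] + frac * (t[i] - t[i - 1])
    raise ValueError("level never crossed")

def rise_time(t, v, vdd):
    """tr: time for the output to go from 0.1*VDD to 0.9*VDD."""
    return crossing_time(t, v, 0.9 * vdd) - crossing_time(t, v, 0.1 * vdd)

# Ideal linear 0 -> VDD ramp over 1 ns: tr should come out to 0.8 ns.
vdd = 1.0
t = [i * 0.1 for i in range(11)]   # ns
v = [i * 0.1 for i in range(11)]   # ramp voltage
print(rise_time(t, v, vdd))
```

The same `crossing_time` helper at the 0.5·Vdd level of the input and output waveforms gives tpHL and tpLH directly.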


Fig.6 Rise time & Fall time Curve

Fig. 6 shows the fall time and rise time of a waveform. Here we take 0.9 Vdd to 0.1 Vdd for the fall time and 0.1 Vdd to 0.9 Vdd for the rise time.

2.2 CMOS Propagation Delay Summary: Consider the charging case, when the PMOS of the CMOS inverter is ON and the NMOS is OFF. The PMOS operates in two stages: first in the saturation region and then in the linear region. Fig. 7 indicates the saturation region and Fig. 8 the linear region of the PMOS.

So for the two regions we get two different delays. Summing these two delays we get the total rising propagation delay:

tpLH = [CL / (kp (VDD − |VT,p|))] × { 2|VT,p| / (VDD − |VT,p|) + ln[ 4(VDD − |VT,p|)/VDD − 1 ] }    (1)

Now if the threshold voltage |VT,p| is n times the supply voltage VDD (0 < n < 1/2), we can write the above equation as

tpLH = [CL / (kp VDD (1 − n))] × { 2n / (1 − n) + ln(3 − 4n) }    (2)

The load capacitor CL is discharged via the NMOS when the PMOS is off. Just like the PMOS, the NMOS also operates in two different regions; the following figures define the two regions of the NMOS. Summing the two delays of the two regions of the NMOS, we get the falling propagation delay:

tpHL = [CL / (kn (VDD − VT,n))] × { 2VT,n / (VDD − VT,n) + ln[ 4(VDD − VT,n)/VDD − 1 ] }    (3)

which, by defining n = VT,n/VDD for the falling output, reduces to

tpHL = [CL / (kn VDD (1 − n))] × { 2n / (1 − n) + ln(3 − 4n) }    (4)

In equations (2) and (4), if we put kN = kP = k and VT,n = |VT,p| (equal rise and fall times), we get a common delay equation for the CMOS inverter:

tp = [CL / (k VDD (1 − n))] × { 2n / (1 − n) + ln(3 − 4n) }
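Equations (2) and (4) are easy to evaluate numerically. A sketch with purely illustrative device values (the model is valid for n = VT/VDD < 0.5, so that the log argument stays positive):

```python
import math

def prop_delay(cl, k, vdd, vt):
    """CMOS propagation delay from the two-region (saturation + linear)
    analysis: tp = CL/(k*VDD*(1-n)) * [2n/(1-n) + ln(3-4n)], n = VT/VDD."""
    n = vt / vdd
    if not 0 < n < 0.5:
        raise ValueError("model assumes 0 < VT/VDD < 0.5")
    return cl / (k * vdd * (1 - n)) * (2 * n / (1 - n) + math.log(3 - 4 * n))

# Illustrative numbers: CL = 100 fF, k = 100 uA/V^2, VDD = 5 V, VT = 1 V.
tp = prop_delay(100e-15, 100e-6, 5.0, 1.0)
# Delay scales linearly with CL and inversely with the drive strength k.
```

The linear dependence on CL is what the simplified RC model of the next section exploits.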


3. Simplified Delay Analysis Model For Initial Sizing: The emphasis of this paper is on designing high-speed custom CMOS chips with as little effort as possible. A designer who can quickly predict the performance of a circuit, and how the performance will change when a parameter is tweaked, has a much better hope of getting meaningful results from more detailed SPICE simulation. An analytical model that agrees to within 3% of SPICE on inverter delay takes pages to express and is far too difficult to use without computer simulation anyway. Such models have value in understanding the precise shape of switching waveforms, in analyzing certain analog circuit structures, and in simulation, but are unmanageable for delay estimation in digital circuits [2]. Interestingly, we can predict delay fairly well by just modeling a transistor as a resistance from source to drain when ON, plus a few capacitances. Figure 9 shows the resistances of several geometries of NMOS and PMOS transistors.

As shown in fig. 9.1, for a process node we define a unit-size NMOS with a 4/2 W/L ratio, with minimum channel length, width and diffusion contacts, having resistance R. Since the carrier mobility of PMOS is lower than that of NMOS, the channel resistance of a PMOS is higher than that of an NMOS; the mobility ratio is typically 2-3, and for hand analysis it is convenient to assume that the channel resistance of a same-size PMOS is 2R. In fig. 9.2 the size of the NMOS is increased to 8/2. As resistance is inversely proportional to width, doubling the width halves the resistance: R/2 for the NMOS and R for the PMOS. In fig. 9.3 the channel length is doubled instead; since resistance is directly proportional to length, the resistance of an NMOS with the same width but doubled length is 2R.

Now we can estimate the value of the resistance R. If the drain-to-source voltage Vds is very small, the non-saturation current equation becomes Idsn = βN(Vin − Vth)Vdsn, so the channel resistance can be defined as R = Vdsn/Idsn = 1/[βN(Vin − Vth)], where βN is the gain of the transistor, Vin the input voltage and Vth the threshold voltage.

The primary capacitance is from gate to source. Parasitic diffusion capacitances on the source and drain are also large enough to be important. For a unit-size device, let the gate capacitance be C and the diffusion capacitance Cdiff. C typically does not change much between process generations because the linear shrinks of channel length and oxide thickness cancel each other. Cdiff depends on doping levels but is usually comparable to C for contacted diffusion regions in advanced process nodes. The value of C for a unit-size device can be obtained from the process team. The product of R and C gives the delay of the circuit, denoted τ (also called the intrinsic delay). The RC delay calculation is a hand-analysis method. Let us take the example of an inverter with a fanout of f [2].
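The geometry rules just described (unit 4/2 NMOS has resistance R; R scales with L/W; a same-size PMOS is taken as 2R) can be wrapped in a small helper. The 2x mobility penalty is the hand-analysis convention used here, not a measured value:

```python
def channel_resistance(r_unit, w, l, pmos=False):
    """Effective ON resistance relative to the unit 4/2 NMOS device.

    R scales proportionally to channel length L and inversely to
    width W; the PMOS carries a ~2x penalty for its lower mobility.
    """
    r = r_unit * (l / 2.0) * (4.0 / w)
    return 2 * r if pmos else r

R = 1.0
assert channel_resistance(R, w=4, l=2) == R               # unit NMOS: R
assert channel_resistance(R, w=8, l=2) == R / 2           # double width
assert channel_resistance(R, w=4, l=4) == 2 * R           # double length
assert channel_resistance(R, w=8, l=2, pmos=True) == R    # wide PMOS
```

With C taken from the process team, R·C from this helper gives the intrinsic delay τ used in the hand calculations that follow.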


The delay is computed by the Elmore delay model, which approximates the delay as the sum, over all circuit nodes, of the resistance between that node and VDD/GND times the total capacitance on the node. For an inverter, the rising and falling delays are identical: the resistance R multiplied by the total capacitance, (C + 2C + 3fC)R = 3RC(f + 1).

3.1 As a second example we consider a two-input NAND gate with a fanout of h. The CMOS circuit diagram is shown in fig. 12.
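The Elmore sum can be written directly for a ladder of (resistance, node capacitance) pairs; as a sanity check it reproduces the 3RC(f + 1) fanout-of-f inverter delay above (normalizing R = C = 1):

```python
def elmore_delay(stages):
    """Elmore delay of an RC ladder: for each node, multiply its
    capacitance by the total resistance between it and the supply.

    stages: list of (r, c) pairs ordered from the driver to the load.
    """
    delay, r_total = 0.0, 0.0
    for r, c in stages:
        r_total += r        # resistance shared with all downstream nodes
        delay += r_total * c
    return delay

R = C = 1.0
f = 4  # fanout
# Inverter: a single node whose cap is its own diffusion (C + 2C)
# plus f gate loads of 3C each.
d = elmore_delay([(R, (3 + 3 * f) * C)])
assert d == 3 * R * C * (f + 1)
```

For a single-node circuit this is trivially R times total C; the ladder form pays off for gates with internal nodes, such as the NAND analyzed next.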

For a 2-input NAND gate, there are four input transition possibilities: (1→0, 1→0), (1→0, 1), (1, 1→0) and (0→1, 0→1). For the (1→0, 1→0) input only the PMOS transistors are active and the NMOS are off. When the input is (0→1, 0→1), both NMOS are on and the PMOS are off; in this condition we can calculate the fall delay. Let us first calculate the rise time when the input is (1→0, 1→0). As the two PMOS are in parallel, the following figure describes the rise circuit.

From the above circuit and the RC delay definition, the rise time is

τr = (R/2)(6 + 4h)C = (3 + 2h)RC

Again, for the (0→1, 0→1) input both NMOS are on and we can draw the discharging circuit as follows.

From the above circuit we can calculate the fall time:

τf = (2C)(R/2) + (6 + 4h)C(R/2 + R/2) = (7 + 4h)RC

Now consider the input transition (1→0, 1): the upper NMOS turns off and one PMOS turns on, which again lets us calculate a rise delay. The circuits look like the following.


For this transition the rise time delay is

τr = (6 + 4h)RC

But if the input transition is (1, 1→0), then one PMOS and the upper NMOS are on. The figure looks like the following.

The above circuit also acts as a rise circuit, but the delay (rise time) is larger than in the previous input conditions. The rise time for this circuit is

τr = (6 + 4h)CR + (R + R/2)(2C) = (9 + 4h)RC

So this is the worst case rise delay of a 2-input NAND gate with a fanout of h and should be used for rise delay calculation.
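The three NAND cases fall out of the same Elmore sum over the internal node (2C) and the output node ((6 + 4h)C). A sketch with R = C = 1:

```python
def elmore_delay(stages):
    """Elmore delay of an RC ladder: sum over nodes of node
    capacitance times the total upstream resistance."""
    delay, r_total = 0.0, 0.0
    for r, c in stages:
        r_total += r
        delay += r_total * c
    return delay

R = C = 1.0
h = 4  # fanout
c_out, c_int = (6 + 4 * h) * C, 2 * C

# Fall (0->1, 0->1): two series NMOS; internal node first, then output.
t_fall = elmore_delay([(R / 2, c_int), (R / 2, c_out)])
# Best-case rise (1->0, 1->0): two parallel PMOS drive the output node.
t_rise_best = elmore_delay([(R / 2, c_out)])
# Worst-case rise (1, 1->0): one PMOS charges the output AND, through
# the still-ON upper NMOS (R/2), the internal node as well.
t_rise_worst = R * c_out + (R + R / 2) * c_int

assert t_fall == (7 + 4 * h) * R * C
assert t_rise_best == (3 + 2 * h) * R * C
assert t_rise_worst == (9 + 4 * h) * R * C
```

The worst-case branch is not a simple ladder (the internal node hangs off the output), so its Elmore sum is written out explicitly.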

All the RC delay calculations above are done without considering any wire delay. If we consider wire delay, then for rise time and fall time the figures will look like fig. 13.

3.2 Wire Delay: We can also calculate the rise time and fall time with a wire capacitance Cw, as described in fig. 13.

From the above circuit we can draw the rising circuit with its resistances and capacitances [for the transition (1→0, 1→0)], as in fig. 14.1. From fig. 14.1 the rise time is

τr = (6 + n + 4h)RC / 2, where Cw = nC

Again, for the falling circuit [transition (0→1, 0→1)] we can calculate the fall time:

τf = (7 + n + 4h)RC

If we compare these two results with the no-wire case, it is clear that the delay increases. So whenever we calculate the delay of a circuit, we must also consider the wire delay.


4. PRE-LAYOUT & POST-LAYOUT DESIGN:

Fig. 15


The block diagram above describes a typical VLSI design flow. It has three parts: pre-layout design, layout creation and post-layout design.

4.1 Pre-Layout Design: This part consists of logic design, circuit design and circuit simulation.

4.1.1 Logic Design: In this step the control flow, word widths, register allocation, and arithmetic and logic operations of the design that represent the functional design are created in terms of logic gates and verified. The behavioral description at the register level is called a Register Transfer Level (RTL) description. RTL can be expressed in a Hardware Description Language (HDL), such as VHDL or Verilog. This description can also be used for logic simulation and verification.

4.1.2 Circuit Design: The purpose of circuit design is to develop a circuit representation based on the logic design. The Boolean expressions of the logic gates are converted into a circuit representation, taking into consideration the speed and power requirements of the logic design. Circuit simulation is used to verify the timing of each component. The circuit design is usually expressed as a netlist. In many designs the netlist is created automatically from the logic (RTL) description using logic synthesis tools.

4.1.3 Circuit Simulation: The functional simulation process assumes negligible propagation delay through the logic gates, which is not true for practical circuits. Signals always need some time to propagate through logic gates, so propagation delays must be taken into account while designing a digital circuit. For this purpose circuit simulation is used, which simulates the actual propagation delays in the technology chosen for implementing the design. The model used for circuit simulation must account for the delays associated with the chip: the delay through a logic block if the target device is a CPLD, through a logic cell for an FPGA target device, and the delay through the interconnection wires.

4.2 Layout Or Physical Design: In this step the circuit representation (netlist) is converted into a geometric representation. This geometric representation of a circuit is called the layout, and it must conform to constraints imposed by the manufacturing process, the design flow, and the performance requirements[3]. In many cases, physical design can be completely or partially automated, and the layout can be generated directly from the netlist by layout synthesis tools.

4.3 Post-Layout Design: This part consists of design rule checking, parasitic extraction and post layout simulation.

4.3.1 Design Rule Checking(DRC): Design rule checking is a process which verifies that all geometric patterns meet the design rules imposed by fabrication process.


Layout Versus Schematic (LVS): The extracted description is compared with the circuit description to verify whether the layout is logically equivalent to the schematic.

4.3.2 Parasitic Extraction & Back Annotation: Once the layout is verified to be DRC and LVS clean, the parasitic resistances and capacitances are extracted from the layout geometry in the form of a parasitic netlist. That netlist is added to the pre-layout simulation netlist for accurate post-layout simulation. This technique is known as Back Annotation.

4.3.3 Post-Layout Simulation: In post-layout simulation the delay performance of the VLSI circuit is re-evaluated with the parasitics coming from layout extraction. Depending on the result, a solution should be sought with (i) a logic-only fix, (ii) circuit sizing, and (iii) logic optimization, in that order of priority, to reduce design effort (see Fig: 15).

5. Conclusion: We have shown that for quick and accurate delay estimation, circuit designers need simple delay-estimation techniques that are nearly as effective as detailed simulation. We have explored the RC delay model, which with practice allows rapid delay estimation and provides a simple analytic expression for delay. If we want to calculate delay accurately, we have to take the help of a SPICE simulator. We have also shown the pre-layout and post-layout VLSI design flow for timing convergence.

6. Acknowledgement: With the utmost gratitude, I would like to thank Prof. Krishanu Datta, Microelectronics & VLSI Design, Heritage Institute of Technology, Kolkata, India, for his keen insight, enduring guidance, and encouragement throughout my studies. His wisdom and knowledge, not only in the field of this paper, have greatly expanded my interest in and enriched my knowledge of VLSI.

7. References:

[1] B. Razavi, Design of Analog CMOS Integrated Circuits, University of California, Los Angeles.

[2] D. Harris, High Speed CMOS VLSI Design, Lecture 1: Gate Delay Model.

[3] www.ieee.org

[4] www.wikipedia.com

[5] D. Clein, CMOS IC Layout: Concepts, Methodologies, and Tools, with technical contributions by G. Shimokura.


VLSI TRANSISTOR & INTERCONNECT SCALING OVERVIEW

Authors: Pritam Bhattacharjee & Aniruddha Mukherjee, M.Tech (VLSI), ECE Students (2011-13)

Abstract: In this paper, the various types of device and interconnect scaling used for VLSI transistors are described. Advanced device scaling techniques using SOI and FinFET technology are discussed for nano-devices.

Keywords: Scaling factor 's'; technology or process node; short-channel effects; drain-induced barrier lowering; punch through; surface scattering; velocity saturation; impact ionization; hot electrons; SOI; floating body; FinFET.

1. Introduction: The period from about 1964 to the early 1970s saw an extraordinary revolution in integrated-circuit technology. Around 1970, bipolar junction transistor technologies were seriously challenged by MOSFETs in integrated-circuit applications where chip density and cost factors became of prime importance[1]. Numerous companies, such as Texas Instruments, Fairchild, RCA, and Motorola, prepared to exploit the emerging MOSFET technology for logic and memory applications. During this period IBM also pursued aggressive programs in integrated-circuit logic and memory using MOS transistors. An IBM Research team was searching for a technology to fill the cost/performance "file gap" between movable-head magnetic disks (which had low cost per bit but high latency) and random-access main memory (which had

high performance but high cost per bit) for transaction-based systems. The research team had Dale L. Critchlow, Bob Dennard, Fritz Gaensslen, and Larry Kuhn as members[1]. Several types of memory circuits were considered: shift registers; the bucket-brigade shift register; charge-coupled devices; and the one-transistor DRAM cell. As inventor of the one-transistor DRAM cell, Bob was eager to make it a viable candidate and soon proposed a preliminary design which utilized his cell. Several breakthroughs were required to achieve the technical and cost goals. These included:

• shrinking dimensions on the chip to about 1µm, which required advances both in lithography and silicon processes;

• dramatic improvements in yield to allow larger chips, and higher-resolution lithography which, unfortunately, would print smaller, much more numerous defects;

• a means of sensing the very small signals on the bit lines.

During the next several years, each of these problems was solved. The experts in advanced electron-beam and optical projection in the Research Division provided leadership for the 1-µm lithography. A 5x shrink of the existing technology was needed to achieve 1-µm dimensions. Bob and Critchlow decided that rather than designing the 1-µm technology from scratch, they would scale from some well-characterized devices which had channel


lengths of about 5µm and could be operated with voltages up to 20 V. They observed that if the electric fields were kept constant, the reliability of the scaled devices would not be compromised. In addition, if they could keep the fields in the silicon constant, they would expect fewer problems with short-channel effects and channel-length modulation. A few days later, Bob and Fritz Gaensslen had derived the constant-field scaling theory and its limitations. The scaling theory had remarkable implications for circuit performance, circuit power, and power density, as well as the more obvious chip density. A key to the scaling was that all dimensions, including wiring and depletion layers, and all voltages, including thresholds, were scaled in concert.

Figure-1: Scaled from 5 micrometers to 1 micrometer

2. Purpose of Scaling: Device scaling was pursued to achieve higher packing density, chip functionality, speed, and power improvements. It reduced the dimensions of components and interconnects in integrated circuits, which shrank considerably in size, thereby fitting more transistors onto the chip, as mentioned in the famous projection published by Gordon Moore in the 1975 IEDM (International Electron Devices Meeting) Digest, in which he stated that the number of transistors on a chip essentially doubles every 18 months[1]. This is now popularly known as "Moore's Law", described in Figure-2.

Figure-2: Resemblance with Intel’s processing

Scaling brings major MOSFET challenges, including simultaneously maintaining a satisfactory Ion (drive current) and Ileak (leakage current); high gate leakage current for very thin gate dielectrics; control of short-channel effects (SCEs) for very small transistors; and circuit power dissipation.

3. Types of Scaling:

i. Interconnect Scaling
ii. Device Scaling

3.i.) Interconnect Scaling: All linear dimensions (wire length, width, thickness, spacing, and insulator thickness) are scaled down by the same factor 's' as the device scaling factor. Wire lengths (Lw) are reduced by 's' because the linear dimensions of the devices and circuits that they connect are reduced by 's'. Both the wire and the


insulator thicknesses are scaled down along with the lateral dimensions, for otherwise the fringe capacitance and wire-to-wire coupling (crosstalk) would increase disproportionately[2]. The table below summarizes the rules of interconnect scaling:

All material parameters, such as the metal resistivity ρw and the dielectric constant εins, are assumed to remain the same. The wire capacitance then scales down by 's', the same as the device capacitance, while the wire capacitance per unit length, Cw, remains unchanged. The wire resistance, on the other hand, scales up by 's', in contrast to the device resistance, which does not change with scaling. The wire resistance per unit length, Rw, then scales up by s², as indicated in the above table. It is also noted that the current density of interconnects increases with 's', which implies that reliability issues such as electromigration may become more serious as the wire dimensions are scaled down. The wire delay also scales, since it is proportional to the product of the resistance per unit length, the capacitance per unit length, and the square of the total wire length.
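The interconnect scaling rules above (wire resistance up by s, wire capacitance down by s, their product constant) can be checked with a toy geometric model. This is a sketch under the stated assumptions only: resistivity and dielectric constant are held fixed, and the insulator thickness is assumed to scale like the conductor thickness; the dimensions are illustrative.

```python
# Toy model: R is proportional to L/(W*T); C is proportional to L*W/T_ins,
# with T_ins folded into T. Every linear dimension is divided by s.

def scale_wire(L, W, T, s):
    """Return (R, C) proxies for the original wire and the scaled wire."""
    R, C = L / (W * T), L * W / T
    Ls, Ws, Ts = L / s, W / s, T / s
    return (R, C), (Ls / (Ws * Ts), Ls * Ws / Ts)

(R0, C0), (R1, C1) = scale_wire(L=100.0, W=1.0, T=1.0, s=2.0)
print(R1 / R0)                 # wire resistance scales up by s
print(C1 / C0)                 # wire capacitance scales down by s
print((R1 * C1) / (R0 * C0))   # the wire RC product stays constant
```

The last ratio is the observation repeated later in the article: the RC of a scaled wire does not improve, while the devices it connects get faster.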

3.ii.) Device Scaling: In this scaling technique the various components of the device are shrunk by a scaling factor 's' (many denote it by different symbols). As the device is scaled down, the corresponding circuit-performance parameters change in proportion to the scaling factor. Figure-3 below depicts this briefly:

Figure-3: Scale down of a device

The scaling factor s is the factor by which dimensions are reduced. It is s > 1, typically s ≈ 1.4 for most process nodes.
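As a side note, s ≈ 1.4 is just √2, chosen so that the area of a minimum feature halves each generation. A quick check (our own illustration; the starting node and step count are arbitrary):

```python
# Dividing a feature size by sqrt(2) per generation halves its area and
# reproduces a familiar-looking sequence of process-node names.
import math

s = math.sqrt(2)               # ~1.414, the typical inter-node factor
print(round((1 / s) ** 2, 3))  # area of a scaled feature: one half

node, sequence = 180.0, []
for _ in range(5):
    node /= s
    sequence.append(round(node))
print(sequence)                # nm values resembling real node names
```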

4. Types of Device Scaling:
a. Constant-Voltage Scaling
b. Constant-Field Scaling

4.a.) Constant-Voltage Scaling: [2]Here, only the lateral dimensions of the MOSFET are scaled. Thus, constant-voltage scaling is a purely geometrical process, as the power-supply voltage is kept constant during scaling. Under constant-voltage scaling, the electric field scales up by s and the doping concentration scales up by s². As the dimensions shrink, the gate length and width are reduced by the scaling factor 's'. The lateral field increases because the drain-source voltage remains unchanged; it increases approximately by a factor 's', since the electric field is directly proportional to the ratio of the drain-source voltage to the gate length, at least in the ohmic regime. This eventually leads to very high fields in the channel, so that dielectric breakdown (avalanche breakdown) will occur. Moreover, higher fields also cause hot-electron and oxide


reliability problems. For this reason, constant-voltage scaling isn’t preferred in nano-devices.

4.b.) Constant-Field Scaling: The principle of constant-field scaling lies in scaling the device voltages and the device dimensions (both horizontal and vertical) by the same factor 's' (s > 1), so that the electric field remains unchanged. This assures that the reliability of the scaled device is not worse than that of the original device. Here, the perpendicular dimensions (gate-oxide thickness) and the voltages (VDS, VGS, Vth) of the MOSFET are scaled as well. Thus it is not a purely geometrical process.

Furthermore, the scaled-down parameters of constant-field scaling can be categorized as:

i. Process parameters
ii. Device parameters
iii. Circuit parameters

4.b.i.) Process Parameters: The technological processing reduces the supply voltage, gate length and width, gate-oxide thickness, and junction depth by the scaling factor 's'. The substrate doping is increased by the same scaling factor. With the advancement of process-parameter scaling, ever smaller transistors have evolved over the years.

4.b.ii.) Device Parameters: All capacitances (including wiring load) scale down by 's', since they are proportional to area and inversely proportional to thickness. The charge per device (~C x V) scales down by s², while the inversion-layer charge density (per unit gate area), Qi, remains unchanged after scaling. Since the electric field at any given point is unchanged, the carrier velocity (v = µξ) at any given point is also unchanged (the mobility is the same for the same vertical field). Therefore, any velocity-saturation effects are similar in the original and the scaled devices. A key implicit assumption is that the threshold voltage also scales down by 's'. The depletion-layer thickness and the drain current also scale down by the scaling factor 's', but the transconductance (gm) remains unchanged. The die area thereby reduces by s².

4.b.iii.) Circuit Parameters: With both the voltage and the current scaled down by the same factor, it follows that the active channel resistance of the scaled-down device remains unchanged. It is further assumed that the parasitic resistance is either negligible or unchanged in scaling. The circuit delay, which is proportional to RC or CV/I, then scales down by 's'. This is the most important conclusion of constant-field scaling: once the device dimensions and the power-supply voltage are scaled down, the circuit speeds up by the same factor. Moreover, the power dissipation per circuit, which is proportional to VI, is reduced by s². Since the circuit density has increased by s², the power density, i.e., the active power per chip area, remains unchanged in the scaled-down device. The power-delay product of the scaled CMOS circuit shows a dramatic improvement by a factor of s³. Table-1 below tabulates these results:
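The constant-field results above (delay 1/s, power 1/s², constant power density, power-delay product 1/s³) follow directly from C, V, and I each scaling by 1/s. A minimal sketch that reproduces these ratios; all quantities are expressed relative to the unscaled device, which is set to 1.

```python
# Ideal constant-field scaling: C, V, I divide by s; area divides by s^2.

def constant_field_scale(s):
    C, V, I, area = 1 / s, 1 / s, 1 / s, 1 / s**2
    delay = C * V / I              # CV/I  -> 1/s
    power = V * I                  # VI    -> 1/s^2
    return {
        "delay": delay,
        "power": power,
        "power_density": power / area,         # unchanged
        "power_delay_product": power * delay,  # 1/s^3
    }

print(constant_field_scale(s=2.0))
```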


5. Constant-Field Scaling Versus Constant-Voltage Scaling:

The essence of constant-field scaling is summarized in Table 1, in which ideal scaling is assumed and second-order effects are neglected. As noted above, all dimensions (including wiring) and voltages of a circuit are scaled in concert by a factor of 's'. The doping level of the substrate is increased by 's' so that the depletion-layer thickness scales down with 's'. The circuit gets faster by 's', the power per circuit is reduced by 's²', the power-delay product improves by 's³', and the power per unit area remains constant. As the number of circuits per chip increases, the space required for interconnections on the chip tends to increase. This can counteract the density, performance, and power improvements. However, in practice, putting more function on a chip has efficiencies which offset the effects of the extra wiring requirements. In addition, extra levels of wiring can be used.

The results are quite different when all of the dimensions, but not the voltages, are scaled. This is referred to in Table-1 as constant-voltage scaling. The circuits are faster by 's²' rather than 's', as was the case with constant-field scaling. However, this neglects the fact that velocity saturation and intrinsic device resistances limit the performance gain, particularly at very small device dimensions. Note that the power per circuit increases by 's' and the power-delay product improves only by '1/s'. Perhaps most important is that the power per unit area increases by 's³' rather than remaining constant as in constant-field scaling. Since the fields in the gate oxide and silicon increase by 's', critical questions arise regarding gate-oxide reliability, velocity saturation, and hot-electron damage. In addition, the sharpness of the MOSFET turn-on characteristics and the variation of threshold voltage with channel length and applied voltages become more problematic. In both types of scaling, the interconnection resistance effects become more pronounced as dimensions shrink, as shown in Table 2. The RwireCwire of the wiring stays constant while the circuits become faster. The IRwire drops in the ground and power-supply lines become larger fractions of the power-supply and threshold voltages.

6. Challenges of Scaling: The basic limitations of scaling are:
a.) Interconnect Challenges.
b.) Device Challenges.

6.a.) Interconnect Scaling Challenges: The mitigations adopted in practice are as follows:

• Wires have not been shrunk in the vertical direction as much as the scaling rules prescribe. This allows a tradeoff between the wiring resistances and capacitances.
• Silicides have been added to polysilicon gates and diffusions to lower the electrode resistances.
• Additional levels of metal have been added to contain the RwireCwire problems by


allowing wider, thicker wiring at the higher wiring levels. Similarly, very thick, wide wires are used at the upper levels to provide adequate ground and power distribution systems.
• Copper wiring has been introduced recently to reduce resistance.
• Design tools have been developed which allow optimization of wiring to minimize the effects of resistance.

6.b.) Device Scaling Challenges:

Short-Channel Devices: A MOSFET device is considered to be short when the channel length is of the same order of magnitude as the depletion-layer widths (xdD, xdS) of the source and drain junctions. As the channel length L is reduced to increase both the operating speed and the number of components per chip, the so-called short-channel effects arise[3].

Short-Channel Effects: The short-channel effects are attributed to two physical phenomena:

i. The limitation imposed on electron drift characteristics in the channel.
ii. The modification of the threshold voltage due to the shortening channel length.

In particular, five different short-channel effects can be distinguished:

A. Drain-induced barrier lowering and punch through.
B. Surface scattering.
C. Velocity saturation.
D. Impact ionization.
E. Hot electrons.

A. Drain-induced barrier lowering and punch through: When the depletion regions surrounding the drain extend to the source, so that the two depletion layers merge (i.e., when xdS + xdD = L, where xdS and xdD are the depletion widths at the source and drain sides), punch through occurs. Punch through can be minimized with thinner oxides, larger substrate doping, shallower junctions, and obviously with longer channels.

The current flow in the channel depends on creating and sustaining an inversion layer on the surface. If the gate bias voltage is not sufficient to invert the surface (VGS < VT0), the carriers (electrons) in the channel face a potential barrier that blocks the flow. Increasing the gate voltage reduces this potential barrier and, eventually, allows the flow of carriers under the influence of the channel electric field. In small-geometry MOSFETs, the potential barrier is controlled by both the gate-to-source voltage VGS and the drain-to-source voltage VDS. If the drain voltage is increased, the potential barrier in the channel decreases, leading to drain-induced barrier lowering (DIBL). The reduction of the potential barrier eventually allows electron flow between the source and the drain, even if the gate-to-source voltage is lower than the threshold voltage. The channel current that flows under these conditions (VGS < VT0) is called the sub-threshold current.

B. Surface scattering: As the channel length becomes smaller due to the lateral extension of the depletion layer into the channel region, the longitudinal electric-field component ξy increases and the surface


mobility becomes field-dependent. The carrier transport in a MOSFET is confined within the narrow inversion layer, and surface scattering (that is, the collisions suffered by electrons that are accelerated toward the interface by ξx) reduces the mobility; the electrons move with great difficulty parallel to the interface, so that the average surface mobility, even for small values of ξy, is only about half the bulk mobility.

Figure-4: SCE in MOS

C. Velocity saturation: The performance of short-channel devices is also affected by velocity saturation, which reduces the transconductance in the saturation mode. At low ξy, the electron drift velocity vde in the channel varies linearly with the electric-field intensity. However, as ξy increases above 10⁴ V/cm, the drift velocity increases more slowly and approaches a saturation value of vde(sat) = 10⁷ cm/s around ξy = 10⁵ V/cm at 300 K. Note that the drain current is then limited by velocity saturation instead of pinch-off. This occurs in short-channel devices when the dimensions are scaled without lowering the bias voltages. Using vde(sat), the maximum gain possible for a MOSFET can be defined as

gm(max) = W Cox vde(sat)

D. Impact ionization: Another undesirable short-channel effect, especially in NMOS, occurs due to the high velocity of electrons in the presence of high longitudinal fields, which can generate electron-hole (e-h) pairs by impact ionization, that is, by impacting on silicon atoms and ionizing them. It happens as follows: normally, most of the electrons are attracted by the drain, while the holes enter the substrate to form part of the parasitic substrate current. Moreover, the region between the source and the drain can act like the base of an n-p-n transistor, with the source playing the role of the emitter and the drain that of the collector. If the aforementioned holes are collected by the source and the corresponding hole current creates a voltage drop of the order of 0.6 V in the substrate material, the normally reverse-biased substrate-source p-n junction will conduct appreciably. Then electrons can be injected from the source into the substrate, similar to the injection of electrons from the emitter into the base. They can gain enough energy as they travel toward the drain to create new e-h pairs. The situation can worsen if some electrons generated by the high fields escape the drain field and travel into the substrate, thereby affecting other devices on the chip.

E. Hot electrons: Another problem, related to high electric fields, is caused by so-called hot electrons. These high-energy


electrons can enter the oxide, where they can be trapped, giving rise to oxide charging that can accumulate with time, degrading the device performance by increasing Vth and adversely affecting the gate's control over the drain current.

Figure-5: Hot electron effect in MOS

The magnitude of the threshold voltage does not scale well for voltages less than about 1.5 V. In addition, short-channel effects, where the threshold varies as the channel length and applied voltages change, cause degradation of circuit performance. Device designers have succeeded in improving the turn-on characteristics of devices by optimizing the doping profiles of the channel region and the source-drain electrodes using ion implantation. The move to CMOS has made circuits less sensitive to body (substrate) effects, allowing higher doping to be used. This approach essentially utilizes a compromise between constant-field scaling and constant-voltage scaling, in which the gate-oxide thickness and the channel length scale more rapidly than the supply voltage. Circuit designers have learned to design high-speed circuits with larger off-currents and larger short-channel effects. In addition, multiple device designs, for example, the use of different gate-oxide thicknesses, each tailored to a specific purpose, can be used on the same chip. SOI (Silicon On Insulator) technology shows promise for achieving improved characteristics. The increasing demand for high performance, low power, and low area in microelectronic devices is continuously pushing the fabrication process beyond the conventional bulk process, and SOI overcomes many short-channel effects.

7. NEW EMERGING TECHNOLOGIES: The two most popular approaches to overcome device scaling challenges are:

a. SOI
b. FinFET

7.a.) Basics of SOI (the floating-body concept): In Silicon-On-Insulator (SOI) fabrication technology, transistors are built on a silicon layer resting on an insulating layer of silicon dioxide (SiO2). The insulating layer is created by introducing oxygen into a plain silicon wafer and then heating the wafer to oxidize the silicon, thereby creating a uniform buried layer of silicon dioxide. Transistors are encapsulated in SiO2 on all sides. In a standard bulk CMOS process, the p-type body of the NMOS transistor is held at ground, while the PMOS transistor is fabricated in an N-well, with its body held at the Vdd supply voltage by means of a metal contact to the N-well. In SOI process technology, the source, body, and drain regions of the transistors are insulated from the substrate. The body of each transistor is typically left unconnected, and that results in a floating body. The floating body can get freely charged and discharged by switching transients, and this condition affects the threshold voltage (Vth) and many


other device characteristics. The transistor area in an SOI process is smaller because there is no need for metal contacts to the wells used for making MOS transistors[4].

Partially Depleted & Fully Depleted SOI: In an NMOS transistor, applying a positive voltage to the gate depletes the body of p-type carriers and induces an n-type inversion channel on the surface of the body. If the insulated silicon layer is made very thin, the depletion layer fills the full depth of the body. A technology designed to operate this way is called a "fully depleted" SOI technology; the thin body avoids a floating body voltage. On the other hand, if the insulated silicon layer is made thicker, the depletion region does not extend through the full depth of the body. A technology designed to operate this way is called a "partially depleted" SOI technology. The undepleted portion of the body is not connected to anything.

Figure-6: Partially & Fully Depleted SOI
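The fully/partially depleted distinction above reduces to comparing the maximum depletion width with the silicon film thickness. A toy classifier (our own illustration; the film thicknesses and depletion width used are arbitrary numbers, not process data):

```python
# Classify an SOI device as fully or partially depleted: it is fully
# depleted when the maximum depletion width fills the whole silicon film.

def soi_mode(si_film_nm, max_depletion_nm):
    if max_depletion_nm >= si_film_nm:
        return "fully depleted"      # depletion reaches the buried oxide
    return "partially depleted"      # an undepleted, floating region remains

print(soi_mode(si_film_nm=10, max_depletion_nm=50))   # thin film
print(soi_mode(si_film_nm=100, max_depletion_nm=50))  # thick film
```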

Advantages of SOI:

• Higher performance than bulk CMOS at equivalent Vdd
• Reduced temperature sensitivity
• Latchup eliminated
• Reduced antenna issues
• No body or well taps needed
• Smaller transistor area saves a lot of die area
• Power savings

Disadvantages of SOI:

• Vth-dependent switching (history effect)
• Bipolar currents
• Self-heating
• Modeling issues

SOI technology has been popular for nano-device structures, especially from the 90nm down to the 32nm technology node. With further reduction of the process node (22nm and beyond), new transistor structures of a 3D nature have become very popular, known as FinFETs.

7.b.) FinFET: The scaling of CMOS devices is approaching its physical limits using standard bulk and SOI technology. FinFETs are seen as the most likely successor to bulk CMOS from the 22 nm node onwards because of their compatibility with current CMOS technology. It is expected that sustained scaling during the next decade will see the evolution from the single-gate (SG) conventional device to multiple-gate MOSFETs (MuGFETs). The double-gate FinFET is a promising candidate because of its quasi-planar structure, excellent roll-off characteristics, and drive current, and because it is close


to its root, the conventional MOSFET, in terms of layout and fabrication. The FinFET structure shows fewer short-channel effects than the bulk MOSFET because of its self-aligned double-gate structure and hence good electrostatic integrity. FinFETs have been demonstrated with both overlap and underlap region structures. FinFETs with graded or abrupt gate overlaps give a higher Ioff as the technologies are scaled down into the deep sub-half-micron regime. The underlap structure with an optimized doping profile in the underlap regions has received considerable attention in recent years. Many different ICs with FinFETs have already been demonstrated, ranging from digital logic, SRAM, and DRAM to Flash memory. Furthermore, due to their superior subthreshold performance and excellent current saturation, they offer advantages for high-gain analog applications and are steadily reaching better results in RF applications[5].

Figure-7: Types of Multiple Gate Devices

Figure-7 shows the multiple-gate devices possible in FinFET technology.

7.b.i.) Scaling of the SG FinFET Structure: Since the single-gate FinFET structure is no different from the traditional MOS structure (SOI/planar), the scaling behavior of the single-gate FinFET is similar to that of a traditional MOS transistor.

7.b.ii.) Scaling limits of the DG FinFET structure: Figure-8 shows the effect of the ratio of gate length (L) to fin thickness (Tfin) on DIBL. This ratio limits the scaling of the DG FinFET structure. DIBL and subthreshold swing (SS) increase abruptly when the L/Tfin ratio falls below 1.5. This ratio is the most important factor deciding the short-channel effects. For the DG FinFET structure, the fin thickness can be the dominating factor that decides the scaling capability.

Figure-8: Short Channel Effects variation with (L/Tfin) ratio

7.b.iii.) Scaling limits of the TG FinFET structure: Figure-9 shows the effect of the ratio of effective gate length (Leff) to fin thickness (Tfin) on SCEs. DIBL and subthreshold swing (SS) increase as the (Leff/Tfin) ratio decreases. Unlike in the DG FinFET, this ratio can be reduced to less than 1.5 for the same SCEs. However, as


this ratio approaches 1, the SCEs can go beyond acceptable limits. So the scaling capability of the TG FinFET structure is greater than that of the DG FinFET structure.
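The L/Tfin rules of thumb above (about 1.5 for a DG FinFET, down to about 1 for a TG FinFET) can be captured in a small guard function. The thresholds are taken from the figures discussed in the text; the function itself and the sample dimensions are our own illustration.

```python
# Rule-of-thumb check for acceptable short-channel effects in a FinFET,
# based on the gate-length to fin-thickness ratio quoted in the text.

def scaling_ok(gate_length_nm, fin_thickness_nm, gates="DG"):
    ratio = gate_length_nm / fin_thickness_nm
    limit = 1.5 if gates == "DG" else 1.0   # DG vs TG rule of thumb
    return ratio >= limit

print(scaling_ok(30, 15, "DG"))   # ratio 2.0: acceptable SCEs
print(scaling_ok(20, 15, "DG"))   # ratio ~1.33: below the DG limit
print(scaling_ok(20, 15, "TG"))   # a TG fin tolerates the same ratio
```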

Figure-10 shows the effect of the ratio of effective gate length (Leff) to fin height (Hfin) on SCEs. DIBL and subthreshold swing (SS) increase as the (Leff/Hfin) ratio decreases; however, the increase is less than that with the (Leff/Tfin) ratio. The reason is that the fin thickness is more prone to SCEs than the fin height. The fin height can therefore be increased to achieve a higher on-current than by increasing the fin thickness, with an acceptable level of SCEs.

Figure-9: Effects of (Leff/Tfin) ratio variation on SCEs at 30nm of Hfin

Figure-10: Effects of (Leff/Hfin) ratio variation on SCEs at 30nm of Tfin

The concept of the wide-channel FinFET built on tall fin structures with a very high aspect ratio has been proven by demonstration MOS devices. High anisotropy of the silicon fin etching was achieved by using a TMAH solution on silicon wafers with (110) surface orientation, which exposes vertical (111) planes. Devices are isolated from the substrate with a thick oxide layer formed by deposition, planarization, and etch-back. The gate stack consists of an approximately 5 nm thick thermal oxide layer and an n+ polysilicon gate, which is later patterned by TMAH, followed by the source and drain formation, contacting, annealing, and metallization steps. The measured devices exhibit nearly perfect subthreshold performance, but the threshold voltage needs adjustment to reach the typical values demanded for CMOS ICs. Output characteristics reveal a higher current for the pFETs; this anomaly can be traced to the slightly higher substrate doping and the gate depletion observed for the nFETs. However, the devices show great potential with a very high output current per single fin, and can be further improved with more advanced processing of the gate stack and the source and drain regions, namely by reducing the gate length, having shorter annealing times, and using selective epitaxy in the source and drain regions to reduce the series resistance.

The (Leff/Tfin) ratio limits the scaling capability of the FinFET structure. For acceptable SCEs, this ratio is found to be smaller in the case of the TG FinFET structure; simulation results show that the TG FinFET structure is more scalable than the DG FinFET structure. The ratio of Hfin to Tfin should be maximized at a given Tfin and Leff to obtain the maximum on-current per unit width. However, increasing Hfin degrades fin stability, increases the difficulty of gate patterning, and degrades SCEs. Careful optimization of Tfin and Hfin is therefore essential to achieve good performance in a scaled FinFET structure at a given gate length.

So, the introduction of FinFET technology has opened new chapters in nanotechnology. Simulations show that the FinFET structure could be scaled down to 10 nm. Formation of an ultra-thin fin suppresses short-channel effects. The FinFET is an attractive successor to the single-gate MOSFET by virtue of its superior electrostatic properties and comparative ease of manufacturability [6].

8. Conclusions: Scaling has been a fundamental driving force in MOSFET circuits for several decades. Although the concepts underlying scaling are rather simple, the implications and results are profound. As MOSFET technology has matured and more limitations must be considered, the scaling approaches have become much more sophisticated. Following the scaling trend through the years, the devices' feature size (and the associated parametric coefficients) has improved greatly, as shown below:

Figure-11: Advance process node

The nanotechnology era started with channel lengths below the 100 nm process node. From 90 nm onwards, both bulk CMOS technology and SOI technology are used. From the 65 nm technology node, strained-silicon technology started, in which Ge is introduced into the Si lattice; the resulting strain improves carrier mobility, speeding up PMOS operation relative to NMOS. The high-k metal gate concept was introduced from the 45 nm process node onwards, to control gate leakage through the extremely thin gate insulator.

It is observed that the technology nodes from 90 nm down to 22 nm did not show a constant reduction in power supply as per the scaling factor 's', as was seen for previous generations. The reason is the risk of an exponential increase in channel leakage as the threshold voltage is reduced. In order to sustain Moore's Law below the 32 nm technology node, a new 3D transistor using FinFET technology has been introduced from the 22 nm process node onwards.
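The ideal constant-field (Dennard) scaling behavior that the text contrasts against can be illustrated numerically. The sketch below assumes ideal scaling by a factor s; the starting values (1.2 V at 90 nm, relative delay and power of 1) are illustrative assumptions, not data from the text.

```python
# Illustrative constant-field (Dennard) scaling: with factor s > 1,
# dimensions and supply voltage shrink by 1/s, gate delay falls by 1/s,
# and power per gate falls by 1/s^2 (so power density stays constant).
# The numeric values below are examples only.

def scale(params, s):
    """Apply one ideal constant-field scaling step by factor s."""
    return {
        "L_nm":  params["L_nm"] / s,      # channel length
        "Vdd_V": params["Vdd_V"] / s,     # supply voltage
        "delay": params["delay"] / s,     # gate delay (relative)
        "power": params["power"] / s**2,  # power per gate (relative)
    }

node_90nm = {"L_nm": 90.0, "Vdd_V": 1.2, "delay": 1.0, "power": 1.0}
node_65nm = scale(node_90nm, 90.0 / 65.0)  # s is about 1.38

print(node_65nm)
```

Under ideal scaling the 65 nm supply would drop to roughly 0.87 V; as the text notes, real nodes below 90 nm kept the supply higher than this to limit channel leakage.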


9. Acknowledgement: The help given by Professor Krishanu Datta, Microelectronics & VLSI Department, Heritage Institute of Technology, Kolkata, India, is greatly appreciated. He reviewed and scrutinized this paper, provided valuable suggestions and information, and was a great guide throughout the publication. We are grateful for his contribution.

The presentation by Professor E.F. Schubert, Rensselaer Polytechnic Institute, helped us a lot in initiating our paper work. Cordial thanks to Mr. Subir Maity, M.Tech (VLSI, Jadavpur University), for providing his valuable advice and a few of his documented works.

10. REFERENCES:

[1] Dale L. Critchlow, Fellow, IEEE: "MOSFET Scaling-The Driver of VLSI Technology".

[2] Yuan Taur, University of California, San Diego; Tak H. Ning, IBM T. J. Watson Research Center, New York.

[3] Fabio D'Agostino, Daniele Quercia: Project "Short-Channel Effects in MOSFETs", December 11, 2000.

[4] Narayana Murty Kodeti, Infotech Enterprises Ltd.: White Paper on Silicon On Insulator (SOI) Implementation, June 2009.

[5] V. Jovanović, T. Suligoj, P. Biljanović (University of Zagreb, Department of Electronics, Faculty of Electrical Engineering and Computing, Unska 3, Zagreb); L. K. Nanver (ECTM-DIMES, Delft University of Technology, P.O. Box 5053, 2600 GB Delft, The Netherlands): "FinFET Technology for Wide-Channel Devices with Ultra-Thin Silicon Body".

[6] Gaurav Saini, Ashwani K. Rana (Department of Electronics and Communication Engineering, National Institute of Technology Hamirpur, Hamirpur, India): "Physical Scaling Limits of FinFET Structure: A Simulation Study", International Journal of VLSI Design & Communication Systems (VLSICS), Vol. 2, No. 1, March 2011.


VLSI STANDARD CELL LAYOUT

Author: Sabnam Koley M.Tech(VLSI), ECE-Student (2011-13)

Abstract: This paper focuses on the guidelines and implementation details of the physical layout of standard cells used in VLSI design. The optimization of standard cell layouts using stick diagrams and the Euler-path algorithm is also discussed.

Keywords: Standard Cell Layout, Full Custom Layout, Semi Custom Layout, CAD tools, netlist, LVS, DRC.

1. Introduction: The process of creating an accurate physical representation of a structural netlist is called layout creation [1]. Before any logic gate can be constructed from a transistor schematic, it is necessary to develop a strategy for the cell's basic layout. A stick diagram is a way for the design engineer to visualize the cell routing and transistor placement. Standard cell layout synthesis presents new optimization problems: meeting a specified library height, optimal placement of input/output ports on a wiring grid, and satisfying cell boundary conditions to enable block-level design [2].

2. Types of Physical Design in VLSI: Physical design of a large VLSI system is a complicated process and cannot be solved in a single step. There are two main design styles:

1. Full Custom Layout
2. Semi-Custom Layout

2.1 Full Custom Layout: This layout style is predominantly manual effort. This type of layout is done on repeated datapath structures like muxes, adders and multipliers, which require tight control over area, signal noise and bit symmetry. We refer to this type of layout as "datapath layout." Full-custom layout is also popular for analog circuitry such as phase-locked loops (PLLs), digital-to-analog converters (DACs) and analog-to-digital converters (ADCs), electrostatic discharge (ESD) structures, regulators, etc. Another popular area for full-custom layout is VLSI memory, due to its regular, repetitive structure with highly dense area requirements. Full-custom layout is further needed for cell development. Cells are defined as logical building blocks that are part of a family of components sharing common abutment rules, performance characteristics or functionality. Examples include the cells within a standard cell library. This type of layout is known as "cell layout."

2.2 Semi-Custom Layout: This layout is mostly done by a layout synthesis tool, known as a Place & Route tool, where basic cells are taken from a standard cell library. Standard cells are pre-designed for general use, and the same cells are utilized across chip designs.

3. VLSI Standard Cell Topology

Figure-3.1: A rendering of a small standard cell with three metal layers (dielectric has been removed). The sand-colored structures are metal interconnect, with the vertical pillars being contacts, typically plugs of tungsten. The reddish structures are polysilicon gates, and the solid at the bottom is the crystalline bulk silicon.

A standard cell is a group of transistor and interconnect structures that provides a Boolean logic function (e.g., AND, OR, XOR, XNOR, inverter) or a storage function (flip-flop, latch). The simplest cells are direct representations of the elemental NAND, NOR, and XOR Boolean functions, although cells of much greater complexity are also used (such as a 2-bit full adder, or a muxed D-input flip-flop). The Boolean logic function of the cell is called its logical view. Functional behavior is captured in the form of a truth table or Boolean equation (for combinational logic) or a state transition table (for sequential logic).
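The logical view described above can be captured directly in code. The sketch below enumerates the truth table of a 2-input NAND cell; the cell choice and the function name `nand2` are illustrative assumptions, not part of any particular library.

```python
from itertools import product

# The "logical view" of a standard cell can be captured as a truth table.
# Here we enumerate the table of a 2-input NAND cell.

def nand2(a, b):
    """Boolean function of the NAND2 cell."""
    return int(not (a and b))

# Enumerate all input combinations to build the truth table.
truth_table = {(a, b): nand2(a, b) for a, b in product((0, 1), repeat=2)}

for (a, b), y in truth_table.items():
    print(a, b, "->", y)
# Only the (1, 1) input combination drives the output low.
```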

Usually, the initial design of a standard cell is developed at the transistor level, in the form of a transistor netlist or schematic view. The netlist is a nodal description of the transistors, their connections to each other, and their terminals (ports) to the external environment. A schematic view may be generated from the netlist.

Since the logical and netlist views are only useful for logic and circuit simulation, and not for device fabrication, a physical representation of the standard cell must also be designed; this is known as the layout view. It is the lowest level of design abstraction in common design practice. From a manufacturing perspective, the standard cell's VLSI layout is the most important view, as it is closest to an actual "manufacturing blueprint" of the standard cell. The layout is organized into base layers, which correspond to the different structures of the transistor devices, and interconnect wiring layers, which join together the terminals of the transistor formations. The interconnect wiring layers are usually numbered and have specific via layers representing the connections between adjacent interconnect layers.

Once a layout is created, additional CAD tools are often used to perform a number of common validations. A Design Rule Check (DRC) is run to verify that the design meets foundry and other layout requirements. The nodal connections of the schematic netlist are then compared to those of the layout with a Layout Vs Schematic (LVS) procedure, to verify that the connectivity models are equivalent.
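The LVS comparison of nodal connections can be sketched as a toy check. Real LVS tools solve a graph-matching problem so that net names need not agree between the two views; the simplified sketch below assumes matching net names, and the device records are invented purely for illustration.

```python
# A toy LVS-style check: represent each netlist as a set of devices with
# their type, size, and terminal-to-net connections, then compare the two
# sets. Real LVS tools perform graph matching so net names need not agree;
# this sketch assumes they do, for simplicity.

def canonical(netlist):
    """Order-independent, hashable form of a netlist for comparison."""
    return {
        (dev["type"], dev["w"], dev["l"], frozenset(dev["pins"].items()))
        for dev in netlist
    }

# Invented example: a CMOS inverter, as drawn in the schematic...
schematic = [
    {"type": "nmos", "w": 1.0, "l": 0.13,
     "pins": {"g": "in", "d": "out", "s": "gnd"}},
    {"type": "pmos", "w": 2.0, "l": 0.13,
     "pins": {"g": "in", "d": "out", "s": "vdd"}},
]
# ...and as extracted from the layout (same devices, different order).
extracted = list(reversed(schematic))

lvs_pass = canonical(schematic) == canonical(extracted)
print(lvs_pass)
```

Nodes, ports, and device sizes all participate in the comparison; changing any width, length, or pin connection in one view makes the check fail.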

Finally, powerful Place and Route (PNR) tools may be used to pull everything together and synthesize (generate) Very Large Scale Integration (VLSI) layouts in an automated way from higher level design netlists and floor-plans.


The designer's challenge is to minimize the manufacturing cost of the standard cell's layout (generally by minimizing the die area of the circuit), while still meeting the cell's speed and power performance requirements. Consequently, integrated circuit layout is a highly labor intensive job, despite the existence of design tools to aid this process[3].

4. Design Rule Check in Layout:

Layout design rules can be referred to as a prescription for preparing the photomasks used in the fabrication of integrated circuits.

The main objective of layout rules is to obtain a circuit with optimum yield in as small an area as possible without compromising the reliability of the circuit [4]. The design rules primarily address two issues:

i) The geometrical reproduction of features that can be reproduced by the mask-making and lithographical process.

ii) The interactions between different layers.

Two types of rules are present to create a Standard Cell Layout i.e.

4.1 Micron Rule: Micron design rules are usually given as a list of minimum feature size and spacing for all the masks required in a given process.

4.2 Lambda Rule: The lambda design rules are based on a single parameter λ, which characterizes the linear feature resolution of the complete wafer implementation process and permits first-order scaling [4].
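The single-parameter idea behind lambda rules can be sketched as a small rule deck. The rule values below are illustrative MOSIS-style multiples of λ, not an authoritative deck, and the function names are my own.

```python
# A lambda-rule deck expresses every minimum width and spacing as a
# multiple of a single parameter lambda, so the same deck scales across
# processes. The multiples below are illustrative MOSIS-style values,
# not a real foundry deck.

LAMBDA_RULES = {
    ("poly",   "min_width"):   2,  # 2 * lambda
    ("metal1", "min_width"):   3,
    ("poly",   "min_spacing"): 2,
    ("metal1", "min_spacing"): 3,
}

def min_dimension_nm(layer, rule, lambda_nm):
    """Resolve a lambda rule to nanometers for a given process lambda."""
    return LAMBDA_RULES[(layer, rule)] * lambda_nm

def spacing_ok(layer, measured_nm, lambda_nm):
    """DRC-style check: is a measured spacing legal on this layer?"""
    return measured_nm >= min_dimension_nm(layer, "min_spacing", lambda_nm)

# For a 0.25 um process, lambda is typically half the feature size,
# i.e. 125 nm, so minimum poly width resolves to 2 * 125 = 250 nm.
print(min_dimension_nm("poly", "min_width", 125))  # 250
print(spacing_ok("metal1", 300, 125))              # False: needs 375 nm
```

Retargeting the same cell to another process only requires changing `lambda_nm`, which is exactly the first-order scaling the lambda rules permit.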

5. VLSI Standard Cell Layout Guidelines[5]: Before creating the cell layout we have to do wire planning:

a. Assign a preferred direction to each layer
b. Group p's and n's
c. Determine input/output port locations
d. Power, ground, and clock wires must be wide

Determine cell pitch:

a. Height of tallest cell
b. Number of over-the-cell tracks and wire lengths

Metal usage for wiring:

a. Use poly for gate wiring only
b. Use diffusion for connection to transistors only
c. Use lowest metal layers only

Create stick diagram.

The distribution of Vdd and ground must be in wide metal:

a. Vdd runs near pMOS groups
b. Ground runs near nMOS groups

WIRE Properties[5]:

Layer   Resistance   Capacitance   Connects To
M2      Low          Low           M1
M1      Low          Low           Diff, poly, M2
Poly    Medium       Low           Gate, M1
ndiff   Medium       High          S/D, M1
pdiff   Medium       High          S/D, M1


WIRING Tracks:

a. A wiring track is the space required for a wire.
b. Transistors also consume one poly track.

Figure-5.1: Space required for wiring tracks as per MOSIS lambda rule [6]

WELL Spacing: Wells must surround transistors by 6 λ:

a. This implies 12 λ between opposite transistor flavors.
b. This leaves room for one wire track.

Figure-5.2: Showing the space between the wells as per MOSIS lambda rule [6]

AREA Estimation: Area estimation can be done in terms of wire tracks.

Figure-5.3: Showing how area is estimated by the wiring tracks as per MOSIS lambda rule [6]
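Track-based area estimation reduces to a small multiplication. In the sketch below, the track counts and the 8 λ track pitch are illustrative assumptions, not values taken from the MOSIS rules.

```python
# Cell area can be estimated by counting wiring tracks: the cell width is
# the number of vertical tracks times the track pitch, and the height is
# the number of horizontal tracks times the pitch. An 8-lambda pitch is
# an assumption here, not a rule from the text.

def estimated_area(h_tracks, v_tracks, track_pitch):
    """Estimate cell area (in pitch-units squared) from track counts."""
    width = v_tracks * track_pitch
    height = h_tracks * track_pitch
    return width * height

# Example: a small cell occupying 3 vertical and 5 horizontal tracks
# with an 8-lambda track pitch.
area = estimated_area(h_tracks=5, v_tracks=3, track_pitch=8)
print(area)  # 24 * 40 = 960 (in lambda^2)
```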

Transistor Layout[7]:

Transistor Folding: In the standard-cell methodology, the height of the cell is fixed. To obtain the specified cell height, wide transistors are folded. Transistor folding is the process of splitting a transistor into multiple physical transistors of smaller widths connected in parallel. Intelligent transistor folding is crucial to standard cell layout synthesis because it automatically generates area-optimized cells. In the absence of automated folding, a user must experiment with each cell layout in order to find the best foldings for the PMOS and NMOS transistors [7].

Figure-5.4: Depicting the Transistor Folding (also called Fingering) concept[7]
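The folding arithmetic above can be sketched directly; the 10 µm transistor width and 3 µm maximum finger width are illustrative assumptions, not values from a real library.

```python
import math

# Transistor folding: a transistor whose width W exceeds the maximum
# width allowed by the fixed cell height is split into n parallel
# "fingers" of equal, smaller width. The parallel combination preserves
# the total W, and hence the drive strength.

def fold(width_um, max_finger_um):
    """Return (number_of_fingers, finger_width_um) for a folded device."""
    fingers = math.ceil(width_um / max_finger_um)
    return fingers, width_um / fingers

# A 10 um transistor in a cell that allows at most 3 um per finger:
n, w = fold(10.0, 3.0)
print(n, w)  # 4 fingers of 2.5 um each
```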

6. CMOS Inverter Layout Design:

The layout design of a CMOS inverter will be examined step-by-step. The circuit consists of one nMOS and one pMOS transistor.

First, we need to create the individual transistors according to the design rules. Assume that we attempt to design the inverter with minimum-size transistors. The width of the active area is then determined by the minimum diffusion contact size (which is necessary for the source and drain connections) and the minimum separation from the diffusion contact to both active-area edges. The width of the polysilicon line over the active area (which is the gate of the transistor) is typically taken as the minimum poly width (Fig. 6.1). The overall length of the active area is then simply determined by the following sum: (minimum poly width) + 2 x (minimum poly-to-contact spacing) + 2 x (minimum spacing from contact to active-area edge). The pMOS transistor must be placed in an n-well region, and the minimum size of the n-well is dictated by the pMOS active area and the minimum n-well overlap over n+. The distance between the nMOS and the pMOS transistor is determined by the minimum separation between the n+ active area and the n-well (Fig. 6.2). The polysilicon gates of the nMOS and the pMOS transistors are usually aligned. The final step in the mask layout is the local interconnections in metal, for the output node and for the VDD and GND contacts (Fig. 6.3). Notice that in order to be biased properly, the n-well region must also have a VDD contact.
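The active-area length sum above is a plain addition of design-rule values and can be checked numerically; the λ values plugged in below are illustrative, not from a real rule deck.

```python
# Overall active-area length of a minimum-size transistor, following the
# sum given in the text. The rule values passed in (in units of lambda)
# are illustrative examples only.

def active_area_length(min_poly_width, min_poly_to_contact,
                       min_contact_to_edge):
    """(min poly width) + 2*(poly-to-contact) + 2*(contact-to-edge)."""
    return min_poly_width + 2 * min_poly_to_contact + 2 * min_contact_to_edge

print(active_area_length(2, 2, 4))  # 2 + 4 + 8 = 14 (in lambda)
```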

Figure-6.1: Design rule constraints which determine the dimensions of a minimum-size transistor.

Figure-6.2: Placement of one nMOS and one pMOS transistor.


Figure-6.3: Complete Layout of the CMOS inverter.

By the use of stick diagrams (shown in Fig. 6.4), or so-called symbolic layouts, we can simplify the initial phase of layout design. Here, the detailed layout design rules are simply neglected and the main features (active areas, polysilicon lines, metal lines) are represented by constant-width rectangles or simple sticks. The purpose of the stick diagram is to give the designer a good understanding of the topological constraints, and to quickly test several possibilities for the optimum layout without actually drawing a complete physical diagram [8].

Figure-6.4: Stick diagram Of the Inverter Circuit

Figure-6.5: Layout of the CMOS Inverter Circuit

7. Layout of CMOS NOR & NAND Gates in VLSI:

The physical designs of CMOS NAND and NOR gates follow the general principles discussed earlier for the CMOS inverter layout. Figures 7.1-7.6 show sample layouts of a two-input NOR gate and a two-input NAND gate, using single-layer polysilicon and single-layer metal. Here, the p-type diffusion area for the pMOS transistors and the n-type diffusion area for the nMOS transistors are aligned in parallel to allow simple routing of the gate signals with two parallel polysilicon lines running vertically. Also notice that the two mask layouts show a very strong symmetry, due to the fact that the NAND and the NOR gates have symmetrical circuit topologies [8].


NOR 2 :

Figure-7.1: Schematic Diagram of the NOR2

Figure-7.2: Sample Stick Diagram of the NOR 2 Gate

Figure-7.3: Sample Layout of the NOR 2 Gate

NAND 2 :

Figure-7.4: Schematic Diagram of the NAND 2

Figure-7.5: Sample Stick Diagram of the NAND 2

Figure-7.6: Sample Layout of the NAND 2 Gate


8. Method to Create Complex Standard Cell Layout:

The realization of complex Boolean functions (which may include several input variables and several product terms) typically requires a series-parallel network of nMOS transistors which constitute the so-called pull-down net, and a corresponding dual network of pMOS transistors which constitute the pull-up net. Figure 8.1 shows the circuit diagram and the corresponding network graphs of a complex CMOS logic gate. Once the network topology of the nMOS pull-down network is known, the pull-up network of pMOS transistors can easily be constructed by using the dual-graph concept.

Figure-8.1: A complex CMOS logic gate realizing a Boolean function with 5 input variables.

Now, we will investigate the problem of constructing a minimum-area layout for the complex CMOS logic gate. Figure 8.2 shows the stick-diagram layout of a first attempt, using an arbitrary ordering of the polysilicon gate columns. Note that in this case, the separation between the polysilicon columns must be sufficiently wide to allow for two metal-diffusion contacts on both sides and one diffusion-diffusion separation. This certainly consumes a considerable amount of extra silicon area.

If we can minimize the number of active-area breaks both for the nMOS and for the pMOS transistors, the separation between the polysilicon gate columns can be made smaller. This, in turn, will reduce the overall horizontal dimension and the overall circuit layout area. The number of active-area breaks can be minimized by changing the ordering of the polysilicon columns, i.e., by changing the ordering of the transistors.

Figure-8.2: Stick diagram layout of the complex CMOS logic gate, with an arbitrary ordering of the polysilicon gate columns.

A simple method for finding the optimum gate ordering is the Euler-path method: simply find an Euler path in the pull-down network graph and an Euler path in the pull-up network graph with an identical ordering of input labels, i.e., find a common Euler path for both graphs. An Euler path is defined as an uninterrupted path that traverses each edge (branch) of the graph exactly once. Figure 8.3 shows the construction of a common Euler path for both graphs in our example.


Figure-8.3: Finding a common Euler path in both graphs for the pull-down and pull-up net provides a gate ordering that minimizes the number of active-area breaks. In both cases, the Euler path starts at (x) and ends at (y).

It is seen that there is a common sequence (E-D-A-B-C) in both graphs. The polysilicon gate columns can be arranged according to this sequence, which results in uninterrupted active areas for the nMOS as well as the pMOS transistors. The stick diagram of the new layout is shown in Fig. 8.4. In this case, the separation between two neighboring poly columns must allow for only one metal-diffusion contact. The advantages of this new layout are a more compact (smaller) layout area, simple routing of signals and, correspondingly, smaller parasitic capacitance.

Figure-8.4: Optimized stick diagram layout of the complex CMOS logic gate.

It may not always be possible to construct a complete Euler path both in the pull-down and in the pull-up network. In that case, the best strategy is to find sub-Euler-paths in both graphs, which should be as long as possible. This approach attempts to maximize the number of transistors which can be placed in a single, uninterrupted active area.
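The Euler-path search described above can be sketched as a small backtracking procedure over both network graphs. The example gate F = NOT((A AND B) OR C) and its two graphs below are my own illustration, not the five-input gate of Figure 8.1; the search is exhaustive and only suitable for small cells.

```python
# Euler-path method as a backtracking search. Each transistor network is
# an undirected multigraph whose edges are labeled with the gate input
# driving that transistor. A label ordering that is an Euler path in both
# the pull-down and pull-up graphs gives a polysilicon gate ordering with
# uninterrupted active areas.

def euler_label_sequences(edges):
    """All Euler-path label sequences of an undirected multigraph."""
    results = set()

    def walk(node, remaining, labels):
        if not remaining:
            results.add(tuple(labels))
            return
        for i, (u, v, lab) in enumerate(remaining):
            if node in (u, v):
                nxt = v if node == u else u
                walk(nxt, remaining[:i] + remaining[i + 1:], labels + [lab])

    nodes = {n for u, v, _ in edges for n in (u, v)}
    for start in nodes:
        walk(start, edges, [])
    return results

# Illustrative gate F = NOT((A AND B) OR C): A and B in series in the
# pull-down net, with C in parallel; the pull-up net is the dual graph
# (A parallel B, in series with C).
pull_down = [("out", "mid", "A"), ("mid", "gnd", "B"), ("out", "gnd", "C")]
pull_up = [("vdd", "n1", "A"), ("vdd", "n1", "B"), ("n1", "out", "C")]

common = euler_label_sequences(pull_down) & euler_label_sequences(pull_up)
print(sorted(common))  # any of these orderings avoids active-area breaks
```

For this small example several common orderings exist (A-B-C among them); any one of them yields uninterrupted active areas in both transistor rows.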

Finally, Fig. 8.5 shows the circuit diagram of a CMOS one-bit full adder. The circuit has three inputs and two outputs, sum and carry_out. The corresponding mask layout of this circuit is given in Fig. 8.6. All input and output signals have been arranged in vertical polysilicon columns. Notice that both the sum-circuit and the carry-circuit have been realized using one uninterrupted active area each [8].

Figure-8.5: Circuit diagram of the CMOS one-bit full adder.

Figure-8.6: Layout of the CMOS full adder circuit.


9. Conclusion:

Strictly speaking, a 2-input NAND or NOR gate alone is sufficient to implement any arbitrary Boolean function. But in modern ASIC design, the standard-cell methodology is practiced with a sizable library (or libraries) of cells. The library usually contains multiple implementations of the same logic function, differing in area and speed. This variety enhances the efficiency of automated synthesis, place, and route (SPR) tools. Indirectly, it also gives the designer greater freedom to perform implementation trade-offs (area vs. speed vs. power consumption). A complete group of standard-cell descriptions is commonly called a technology library.

Commercially available Electronic Design Automation (EDA) tools use the technology libraries to automate layout using synthesis, placement and routing of a digital ASIC. The technology library is developed and distributed by the foundry operator. The library (along with a design netlist format) is the basis for exchanging design information between different phases of the SPR process.

Design Rule Check (DRC) and Layout Versus Schematic (LVS) are key layout verification processes. Reliable device fabrication at modern deep-submicrometer nodes (0.13 µm and below) requires strict observance of transistor spacing, metal layer thickness, and power density rules. DRC exhaustively compares the physical netlist against a set of "foundry design rules" (from the foundry operator), then flags any observed violations. The LVS process confirms that the layout has the same structure as the associated schematic; this is typically the final step in the layout process. The LVS tool takes as input a schematic diagram and the extracted view from a layout. It then generates a netlist from each one and compares them. Nodes, ports, and device sizing are all compared. If they are the same, LVS passes and the designer can continue.

10. Acknowledgement:

With the utmost gratitude, I would like to thank Prof. Krishanu Datta, Microelectronics & VLSI Department, Heritage Institute of Technology, Kolkata, India, for his keen insight, enduring guidance, and encouragement throughout my studies. His wisdom and knowledge have greatly expanded my interest in and enriched my knowledge of VLSI.

11. References:

[1] Dan Clein (technical contributor: Gregg Shimokura): "CMOS IC Layout: Concepts, Methodologies, and Tools", Boston/Oxford/Auckland/Johannesburg/Melbourne/New Delhi.

[2] "A Fully Automatic Layout Synthesis System for Standard Cell Libraries", Unified Design System Laboratory, Motorola, Inc., Austin, Texas.

[3] "Standard cell", Wikipedia.

[4] Neil H. E. Weste, Kamran Eshraghian: "Principles of CMOS VLSI Design".

[5] Kenneth Yun: "Cell Design and Layout", UC San Diego.

[6] David Harris: "Introduction to CMOS VLSI Design", Harvey Mudd College, Spring 2004.

[7] Kenneth Yun: "Cell Design and Layout", UC San Diego.

[8] Y. Leblebici: lecture material, a joint production of the Integrated Systems Center & Microelectronics group.


VLSI FABRICATION OVERVIEW Author: Abhishek Roy

M.Tech (VLSI), ECE-Student (2011-13)

Abstract: Fabrication is the process used to create the integrated circuits present in everyday electronic devices. It is a multi-step sequence of lithography and chemical or mechanical processing steps during which electronic circuits are gradually created on a silicon wafer.

Keywords: Wafers, fabrication, lithography, etching, oxide growth, annealing.

1. INTRODUCTION: When the device feature size was about 5-10 micrometers, purity was not as big an issue as it is today in device manufacturing. As devices became smaller and more integrated, clean rooms needed to be even cleaner. Today, fabs are pressurized with filtered air to remove even the smallest particles, which could come to rest on the wafers and contribute to defects. The workers in a semiconductor fabrication facility are required to wear clean-room suits to protect the devices from human contamination.

Integrated circuits were first demonstrated on silicon independently by Jack Kilby at Texas Instruments and Robert Noyce at Fairchild Semiconductor in the late 1950s. In an effort to increase profits, semiconductor device manufacturing has spread from Texas and California in the 1960s to the rest of the world, such as Europe, the Middle East, and Asia. The leading semiconductor manufacturers typically have fabrication facilities all over the world. Intel, the world's largest chip manufacturer, has facilities in Europe and Asia as well as the U.S. Other top manufacturers include Taiwan Semiconductor Manufacturing Company (Taiwan), STMicroelectronics (Europe), Analog Devices (US), Integrated Device Technology (US), Atmel (US/Europe), Freescale Semiconductor (US), Samsung (Korea), Texas Instruments (US), IBM (US), Toshiba (Japan), NEC Electronics (Japan), Infineon (Europe), Renesas (Japan), Fujitsu (Japan/US), and NXP Semiconductors (Europe and US).

2. Fabrication Material: Materials used for integrated circuit fabrication can be classified into three main categories (Fig. 1) according to their electrical conduction properties [3]: insulators, conductors, and semiconductors.


FIG1: CATEGORIES OF MATERIAL

3. Mask Views: Six key types of masks are needed for IC fabrication [3]: n-well, polysilicon, n+ diffusion, p+ diffusion, contact, and metal.

4. Semiconductor Processing Basics:

4.a. Integrated Circuit Process Flow Chart: A few critical steps of the process flow are listed below, in manufacturing order (courtesy: Prof. Krishanu Datta):

1. Material preparation: purify silicon, grow ingot, prepare wafer
2. Deposit or grow film: epitaxy, CVD, metallization, oxidation
3. Photolithography: apply photoresist, expose, develop
4. Etch film: wet, dry
5. Other processes: diffusion, ion implantation, plasma processing
6. Multi-probe test: continuity, functional, parametric
7. Assembly: sawing, mounting, bonding, encapsulating, symbolizing
8. Final test: data book specification, reliability, shipping and packing


4.b. Wafer Preparation:

FIG2: CZ TECHNIQUE

All mainstream semiconductor integrated-circuit processes start with a thin slice of silicon, known as a wafer, which is sliced from an ingot of silicon (Fig. 3) created using the Czochralski process (Fig. 2). A silicon wafer is circular, ranges from 4 to 12 inches in diameter, and is approximately 1 mm thick [2]. Each wafer is polished to its final thickness with atomic smoothness (Fig. 2(a)).

FIG2(a): A single 4-inch silicon wafer. Note the mirror-like surface of the wafer.

An integrated circuit design fits into a few square centimeters of silicon area, known as a die. After fabrication, the wafer is cut to produce independent, rectangular dies, which are then packaged to produce the final IC chips.

FIG3: INGOT PREPARATION

4c. Deposit or grow film:

i. Grow film: Oxidation: SiO2 is grown on top of the Si wafer at 900 to 1200 °C with H2O or O2 in an oxidation furnace.

Insulating dielectric layers are a key element in semiconductor fabrication, providing isolation between conductive layers on the surface of the wafer [4]. In fact, one of the most important reasons silicon has become such a successful medium for integrated microelectronics is that it has a good native oxide: silicon combines with elemental oxygen to form a dielectric oxide called silicon dioxide, SiO2. SiO2 is a good insulating layer and can be created by exposing Si to an O2 environment. At elevated temperatures (~1000 °C) the oxide grows quickly, which is another characteristic that makes it useful in semiconductor fabrication. Native oxides grown at elevated temperatures are referred to as thermal oxides. An advantage of native thermal oxides such as SiO2 is that they have material properties (e.g. thermal expansion coefficient, lattice size, etc.) similar to those of the native material. This means that oxides can be grown without creating significant stresses in the material, which could otherwise lead to serious fabrication problems including circuit failure. Thermal oxide grown on Si can be masked by PR, although better results can be obtained if SiN is used to mask thermal oxidation.

Since material (O2) is being added to the wafer, the wafer grows in thickness; roughly 50% of the oxide grows beneath the original silicon surface and the other half on top of it.

Native oxide growth is used in MOS fabrication to grow the field oxide (the region outside of the active region) and to create the gate oxide layer, the thickness of which can be well controlled.
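The roughly 50/50 split mentioned above can be made slightly more precise: for thermal SiO2 on Si, the silicon consumed is widely quoted as about 0.46 times the final oxide thickness, a consequence of the molar volumes of Si and SiO2. A minimal sketch:

```python
# The roughly 50/50 split follows from molar volumes: growing an oxide of
# thickness t_ox consumes a silicon thickness of about 0.46 * t_ox, so
# about 46% of the final oxide lies below the original wafer surface and
# about 54% above it.

SI_CONSUMED_PER_OXIDE = 0.46  # widely quoted ratio for thermal SiO2 on Si

def oxide_split(t_ox_nm):
    """Split a thermal-oxide thickness into below/above-surface parts."""
    below = SI_CONSUMED_PER_OXIDE * t_ox_nm  # Si consumed (below surface)
    above = t_ox_nm - below                  # oxide above original surface
    return below, above

below, above = oxide_split(100.0)  # a 100 nm thermal oxide
print(round(below, 2), round(above, 2))  # 46.0 54.0
```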

ii. Grow film: Epitaxy: Epitaxial growth describes the process of creating single-crystal Silicon from a thick layer of deposited Silicon (polysilicon) [4]. When the deposited Silicon is uniformly doped, the resulting single-crystal epitaxial layer will have a constant doping profile (unlike that created by diffusion or implantation) which is necessary for the formation of some semiconductor devices. An "epi" layer can be formed after initial diffusion processes to leave a "buried layer" often used in making bipolar transistors (BJTs). The process for creating a single-crystal epitaxial layer from the deposited material is somewhat complex and involves the use of a "seed crystal" that allows an annealing process to align the crystals of the deposited material.

iii. Oxide Deposition: The addition of material to the top of the wafer, to form interconnects and insulators between interconnect layers, requires a process step generally referred to as deposition. A variety of materials can be deposited, including conductors, metallization, insulators, and semiconductor materials. Several different techniques are employed for deposition, but the process can generally be thought of as a uniform sprinkling of material onto the surface of the wafer, either over the entire surface or through a masking layer (for deposition on selected areas). A common technique of film deposition is Chemical Vapor Deposition (CVD), in which the material to be deposited is chemically vaporized. The vaporized material then floats down to the wafer surface, where it solidifies to form the deposited layer.

4d. Lithography: The method by which we transfer the mask pattern to silicon is known as lithography [1]. Lithography has evolved greatly over the last 40 years and will continue to do so. Modern lithography uses complex computation, specialized materials, and optical devices to achieve the very high resolutions required to reach modern feature sizes. Fig.5 shows a photolithography exposure tool, in which dies are exposed for the lithography process.

Fig.5: Photolithography exposure tool

Conceptually, lithography is simply a stencil process. In an old-fashioned stencil process, a plastic sheet with cut-out letters or numbers is laid on a flat surface and painted; only the cut-out areas receive paint. Once the stencil is removed, the design left behind consists of only the painted areas, with clean edges and a uniform surface. Given a flat wafer, we first apply a thin coating of a liquid polymer known as photoresist (PR) (Fig.7).


This layer is usually several hundred nanometers thick and is applied by placing a drop in the center of the wafer and then spinning the wafer very fast (1000 to 5000 rpm) so that the drop spreads out evenly over the surface. Once coated, the PR is heated (usually to between 60 and 100 °C), a step known as baking; it allows the PR to solidify slightly, to a plastic-like consistency. Once baked, exposure to ultraviolet (UV) light "chops up" the bonds that hold the PR molecules together, making it easy to wash away the UV-exposed areas. This type of PR is known as positive PR. (Some varieties of PR, called negative PR, behave in exactly the opposite manner: UV light makes the PR very strong, or cross-linked.) In lithography, UV light is focused through a glass plate carrying the mask patterns; this step is known as exposure. The patterns act as a "light stencil" for the PR. Wherever UV light hits the PR, that area can subsequently be washed away in a step called development. After development, the PR film remains behind, with holes in certain areas.
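The positive/negative resist behavior described above amounts to a simple boolean rule, which this illustrative sketch captures (the function name and the mask encoding are invented for the example):

```python
# Conceptual model of exposure through a mask: '1' in the mask string means
# clear glass (UV passes), '0' means opaque chrome. Positive resist is
# removed where exposed; negative resist remains only where exposed.
# Purely illustrative; names and encoding are invented for this sketch.

def develop(mask, polarity):
    """Return a string marking where resist remains ('R') or is gone ('.')."""
    out = []
    for clear in mask:
        exposed = (clear == '1')
        removed = exposed if polarity == 'positive' else not exposed
        out.append('.' if removed else 'R')
    return ''.join(out)

mask = '0011001'
print(develop(mask, 'positive'))  # resist survives under opaque areas
print(develop(mask, 'negative'))  # resist survives under clear areas
```

The two outputs are exact complements of each other, which is why a negative resist transfers the negative image of the same mask.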

FIG.6: SILICON FOUNDRY

How is this helpful? Let's look at how the modifications presented earlier can be masked with PR to produce patterned effects (Fig.12). In each case, we first use lithography to pattern areas onto the wafer (Fig.12(a)), then we perform one of our three processes (Fig.12(b)), and finally we use a strong solvent such as acetone (nail-polish remover) to wash the PR away completely (Fig.12(c)). The PR allows us to implant, deposit, or etch only in defined areas.

4e. Etching: Etching refers to the removal of material from the surface of a wafer using a chemical or mechanical process, or a combination of both [4]. Etching steps are needed to pattern deposited layers and to form contact openings in dielectric layers. Chemical etching attacks (etches) some materials more quickly than others, while mechanical etching removes all materials equally. Both processes require a masking layer (generally either PR or oxide) to block regions where etching is not desired. The different types of etching are shown in Fig.8.

FIG8: TYPES OF ETCH


i. Chemical Etching: Most materials, including Si, SiO2, PR, polysilicon, and metal, can be removed selectively by chemical processes that attack only the desired material. This allows the etching process to stop once the desired material has been removed (rather than continuing all the way through the wafer!). Chemical etchants can typically be blocked by masking layers of PR or oxide of appropriate thickness. The primary disadvantage of chemical etching is that it is isotropic (it etches in all directions, not just vertically) and will undercut the masking layer, which is undesirable in many cases.

ii. Mechanical Etching: Purely mechanical etching processes such as Ion Milling, are not material selective; they bombard the wafer surface and remove any material they strike. However, a thick high-strength mask material can be used to block the etching process from the surface so that only specific areas on the wafer are etched. Because there is no chemical undercutting, mechanical etching creates a straight vertical etching profile.

iii. Chemical-Mechanical Etching: Reactive ion etching (RIE) is the primary technique of chemical-mechanical etching in which ions in plasma bombard the surface and etch away material. The plasma can be chosen to selectively etch one material more than another. RIE is a very common process in modern semiconductor fabrication and is typically used for contact openings through dielectrics.

Fig.9: AFTER ETCHING

4f. Doping: The operation of semiconductor devices requires that specific regions of the substrate be doped n-type or p-type with specific dopant concentrations. There are two primary methods used to add impurities to the substrate: (i) diffusion, (ii) implantation.

i. Diffusion: A masking layer (e.g. an insulator) is used to block the wafer surface except where the dopants are desired (Fig.10). The wafer is placed in a high-temperature furnace (~1000 °C) whose atmosphere contains the desired impurity in gaseous form. Through the process of diffusion, impurity atoms, which are at high concentration in the atmosphere, diffuse into the substrate, where their concentration is low (initially zero). After some time (~0.5-10 hours) the impurity atoms are distributed into the exposed wafer surface at a shallow depth (0.5-5 µm) at a concentration that can be reliably controlled (~10^12-10^19 cm^-3).
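For a constant-source diffusion, textbooks give the depth profile C(x, t) = Cs · erfc(x / (2·sqrt(D·t))). A small sketch of that relation follows; the diffusivity and surface concentration used here are illustrative, not tied to any particular dopant or temperature.

```python
import math

# Constant-source diffusion profile: C(x, t) = Cs * erfc(x / (2*sqrt(D*t))).
# Standard textbook result; the D and Cs defaults below are illustrative.

def diffusion_profile(x_cm, t_s, Cs=1e19, D=1e-13):
    """Dopant concentration (cm^-3) at depth x (cm) after time t (s)."""
    return Cs * math.erfc(x_cm / (2.0 * math.sqrt(D * t_s)))

# Concentration is highest at the surface and falls off with depth:
surface = diffusion_profile(0.0, 3600)     # at the surface, after 1 hour
deep = diffusion_profile(1e-4, 3600)       # 1 um deep, after 1 hour
print(f"surface: {surface:.2e} cm^-3, 1 um deep: {deep:.2e} cm^-3")
```

This monotonic fall-off from the surface is the key qualitative difference from an implanted profile, whose peak can sit below the surface.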

FIG.10: DIFFUSION


ii. Implantation: Implantation is functionally similar to diffusion, but here the impurity atoms are "shot" into the wafer at high velocity (Fig.11), accelerated across a very strong electric field, and embed themselves in the wafer surface. A short (~10 min) annealing step at elevated temperature (~800 °C) is used to fit the new atoms into the substrate crystal lattice. Implantation is more uniform across the wafer than diffusion and allows very precise control of where the impurities will be. In addition, its peak concentration can lie beneath the wafer surface, and it does not require a long period of time at high temperature (which can be harmful). However, an implanted junction must remain near the surface of the wafer (~0.1-2 µm) and cannot go as deep as a diffused junction. The impurity concentration profile (concentration vs. depth) differs between diffusion and implantation.
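The implanted profile mentioned above is commonly approximated by a Gaussian centered at the projected range Rp, which is why its peak can sit below the surface. A sketch under that standard first-order model (the Rp and straggle values are illustrative, not taken from any range table):

```python
import math

# Implanted impurity profile, approximated by a Gaussian centred at the
# projected range Rp with straggle dRp (standard first-order model).
# The default Rp/dRp values are illustrative placeholders.

def implant_profile(x_um, peak=1e18, Rp=0.3, dRp=0.05):
    """Concentration (cm^-3) at depth x (um) below the surface."""
    return peak * math.exp(-((x_um - Rp) ** 2) / (2.0 * dRp ** 2))

# The peak sits *below* the surface, unlike a diffused profile:
print(implant_profile(0.3))   # at Rp: the peak concentration
print(implant_profile(0.0))   # at the surface: much lower
```

Comparing this with the erfc-shaped diffusion profile makes the last sentence of the paragraph concrete: diffusion peaks at the surface, implantation peaks at Rp.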

FIG.11: ION IMPLANTATION

5. Fabricating a Diode: Conceptually, the simplest diode is made from two slabs of silicon, each implanted with different atoms, pressed together so that they share a boundary (Fig.13).

The n and p areas are pieces of silicon that have been implanted with atoms (known as impurities) that increase or decrease the number of electrons able to flow freely through the silicon [2]. This changes the semiconducting properties of the silicon and creates an electrically active boundary (called a junction) between the n and p areas of silicon. If both the n and p pieces of silicon are connected to metal wires, this two-terminal device exhibits the diode i-v curve. The process for making a single diode is shown in Fig.14. During oxidation, the silicon wafer is heated to over 1000 °C in an oxygen atmosphere. At this temperature, the oxygen atoms react with the silicon to form a layer of SiO2 on the surface (this layer is often called an oxide layer). SiO2 is a type of glass and is used as an insulator. Wires are made by depositing metal layers on top of the device; these are called interconnects. Modern ICs have ~10 such interconnect layers (Fig.15). These layers are used to make electrical connections between all of the various components in the IC, in the same way that macroscopic


wires are used to link components on a breadboard.

6. CONCLUSION: The integrated circuits present in everyday electronic devices are manufactured by the fabrication process. Fabrication is carried out on a silicon wafer through a lithography process similar to that of a printing press. At each fabrication step, different materials are deposited or etched.

7. ACKNOWLEDGEMENT: The help given by Professor Krishanu Datta, Microelectronics & VLSI Department, Heritage Institute of Technology, Kolkata, India is greatly appreciated. He reviewed the paper and provided excellent suggestions and information.

8. REFERENCES:

1) Principles of CMOS VLSI Design by Weste and Eshraghian.

2) Wikipedia

3) www-micro.deis.unibo.it/~ masetti /dida01/teccmos.pdf

4) http://www.engr.uky.edu/~ee461g/index.html


FABRICATION OF CMOS INVERTER USED IN VLSI

Author: Tamoghna Purkayastha

M.Tech(VLSI), ECE-Student (2011-13)

Abstract- This paper covers the basic fabrication steps used to create the CMOS inverter in a VLSI chip.

Keywords- Photolithography, Oxidation, Etching, CVD, Epitaxy

1) Introduction: By fabrication we mean the process by which different circuits are created on a silicon wafer. Before going into the details of the CMOS (integrated circuit) fabrication steps, let us discuss the basic steps involved in IC processing, known as the "IC process flow". The basic diagram representing the IC process flow is shown below:

1. MATERIAL PREPARATION

2. DEPOSIT OR GROW FILM

3. PHOTOLITHOGRAPHY

4. ETCHING FILM

5. DIFFUSION/ION IMPLANTATION

6. MULTIPROBE TEST

7. ASSEMBLY

8. FINAL TEST

Here, steps 2 to 5 constitute the basic fabrication process. These steps are repeated for each mask layer (about 10-12 times) until the circuit is completely fabricated.

In this paper we will discuss the fabrication process of a CMOS inverter. Before going into the details, however, let us get familiar with a few fabrication steps that we will come across quite often.

2) Oxidation: Many of the structures and manufacturing techniques used to make silicon integrated circuits rely on the properties of the oxide of silicon, namely silicon dioxide (SiO2). Oxidation of silicon is achieved by heating silicon wafers in an oxidizing atmosphere such as oxygen or water vapor. There are two types of oxidation: 2.a) Wet oxidation, in which the oxidizing atmosphere contains water vapor; the temperature is usually between 900 and 1000 °C. 2.b) Dry oxidation, in which the oxidizing atmosphere is pure oxygen; the temperature is in the region of approximately 1200 °C. [1]

3) Epitaxy and CVD: Epitaxy is the process of growing a single-crystal film on the silicon surface by subjecting the silicon


wafer surface to elevated temperature and a source of dopant material. There are two types of epitaxy process: 3.a) Homoepitaxy, in which a crystalline film is grown on a substrate or film of the same material; and 3.b) Heteroepitaxy, in which a crystalline film is grown on a crystalline substrate or film of a different material. [1]

4) Chemical Vapor Deposition (CVD): Another process by which an amorphous or polycrystalline film is deposited is known as Chemical Vapor Deposition (CVD). It is a chemical process used to produce high-purity, high-performance solid materials, and is often used in the semiconductor industry to produce thin films. There are three types of CVD process: 4.a) Atmospheric Pressure CVD (APCVD): CVD at atmospheric pressure, commonly used to deposit oxide films. 4.b) Low Pressure CVD (LPCVD): CVD at sub-atmospheric pressure; most modern CVD processes are LPCVD. It is used to deposit polysilicon film from SiH4 and is carried out at 600-800 °C. 4.c) Plasma Enhanced CVD (PECVD): CVD that uses a plasma to enhance the chemical reaction rates of the precursors, allowing deposition at lower temperatures, about 350-450 °C. It is used to deposit oxide or nitride films. [1]

5) Photolithography: This is a process that uses light to transfer a geometric pattern from a photomask to a Si wafer

through a light-sensitive chemical called photoresist. Mask plate: a glass plate coated with an opaque material on its surface; the opaque material carries the pattern of the layout layers. Photoresist: a light-sensitive material containing a polymer resin, a photoactive sensitizer, and a solvent. There are two types of photoresist, positive and negative. In positive photoresist the exposed portion becomes soluble, so it transfers the exact pattern of the mask. In negative photoresist the exposed portion is hardened, so it transfers the negative image of the mask pattern.

Figure 1- Process of photolithography

There are three types of photolithography: 5.a) Contact: the mask plate and the resist are kept in contact. 5.b) Proximity: there is a small gap between the mask plate and the resist. 5.c) Projection: unlike contact or proximity masks, which cover an entire wafer, projection exposes only one die or an array of dies (known as a "field") at a time. Projection exposure systems (steppers or scanners) project the mask onto the wafer many times to create the complete pattern. [1]
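The resolution trade-off between these printing modes is often summarized by two textbook estimates: R ≈ sqrt(λ·g) for proximity printing and R = k1·λ/NA for projection. A quick comparison with typical illustrative numbers (i-line 365 nm source, 10 µm gap, k1 = 0.4, NA = 0.6; none of these are tied to a specific tool):

```python
import math

# Two standard lithography resolution estimates:
#   proximity printing:  R ~ sqrt(lambda * gap)
#   projection printing: R = k1 * lambda / NA
# All numeric values below are typical illustrative choices.

wavelength_nm = 365.0       # i-line mercury source
gap_nm = 10_000.0           # 10 um mask-to-wafer gap (proximity)
k1, NA = 0.4, 0.6           # projection process factor and numerical aperture

proximity_res = math.sqrt(wavelength_nm * gap_nm)   # roughly micron-scale
projection_res = k1 * wavelength_nm / NA            # deep sub-micron

print(f"proximity: {proximity_res:.0f} nm, projection: {projection_res:.0f} nm")
```

The order-of-magnitude gap between the two results is why steppers and scanners, not proximity printers, are used for modern feature sizes.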


Figure 2- Difference between the three types of printing

6) Ion Implantation: This process is used to create n-type or p-type regions in a silicon substrate. It is the process by which ions of a doping material are accelerated in an electric field and impacted into the wafer. [2]

Figure 3- Device used for ion implantation

7) Etching: Etching refers to the physical or chemical removal of insulating or metallic films, using the resist pattern as a mask. Etching with liquid chemicals is called "wet etching", and etching using a plasma gas is called "dry etching". [2]

Figure 4- Dry etching

8) Fabrication of a CMOS Inverter: The basic raw material used for fabrication in modern semiconductor plants is the Si wafer, also known as the "disc of silicon". It varies from 75 mm to 230 mm in diameter and is less than 1 mm thick.

Figure 5- Silicon wafer

The common approach to the CMOS fabrication process is to start with a lightly doped p-type substrate on the wafer, create the n-type well for


p-channel devices, and build the n-channel transistor in the native p-substrate. [1]

Figure 6- Oxidation of the substrate and application of photoresist

The first step of CMOS fabrication is oxidation. The figure above shows the bare silicon wafer ready for oxidation. After oxidation, a SiO2 layer is formed over the whole wafer surface. Now the wafer is ready to undergo photolithography in order to create the N-well, so a photoresist is applied to the oxidized wafer; the resist is spun onto the wafer by a high-speed spinner.

Figure 7- Photoresist application

Afterwards the wafer is baked at about 80-100 °C to drive the solvent out of the resist and to harden it, improving adhesion. [4]

Figure 8- Photolithography

In the photolithography process the photoresist is exposed to ultraviolet (UV) light; the exposed areas become soluble, so they are no longer resistant to the etching solvents. To expose the photoresist selectively, we cover some areas of the surface with a mask during exposure. When the structure with the mask on top is exposed to UV light, the areas covered by the opaque features on the mask are shielded, while in the areas where the UV light strikes the photoresist, it is "exposed" and becomes soluble in certain solvents. [1] After photolithography the wafer is baked again for 20 minutes, to enhance adhesion and improve resistance to the subsequent etching process. The unprotected SiO2 surface is then removed by etching, using buffered hydrofluoric acid (HF). Lastly the remaining resist is stripped away by a chemical solution or an oxygen plasma system. Now the wafer is ready to undergo ion implantation of n-type dopant in the N-well region.


Figure 9- MASK 1: N-well diffusion

In the ion-implantation process, phosphorus ions are first implanted to form the N-well region. A mask is used at this stage to define the N-well.

Figure 10- MASK 2: Active region

The next mask is used to create the active regions where the PMOS and NMOS will be built. The thick oxide (shown in the figure above) provides isolation between the MOSFETs. This thick oxide film is grown using a technique known as Local Oxidation of Silicon (LOCOS). [3]

Figure 11- MASK 3: Polysilicon gate

Our next task is to form the gate region of the transistor, for which we require polysilicon. First a high-quality thin oxide is grown in the active area; its thickness is about 10-30 Å (angstroms). Then a polysilicon film is deposited by CVD, and Mask 3 is used to pattern the polysilicon gate as shown in Figure 11. After the polysilicon gate is formed, photolithography and ion implantation are performed again on the wafer, this time to create the n+ regions, using the 4th mask, known as the n+ mask. Mask 4 controls the heavy arsenic implant and creates the source and drain regions of the n-channel devices.
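A thin gate oxide matters because the gate capacitance per unit area is Cox = ε0·εr/tox, with εr ≈ 3.9 for SiO2. A worked example (the constants are standard; the 3 nm thickness is an illustrative thin-oxide value, not from this process):

```python
# Gate oxide capacitance per unit area: Cox = eps0 * eps_r / t_ox.
# eps_r = 3.9 for SiO2 and eps0 = 8.854e-12 F/m are standard constants;
# the 3 nm thickness below is an illustrative value.

EPS0 = 8.854e-12        # vacuum permittivity, F/m
EPS_R_SIO2 = 3.9        # relative permittivity of SiO2

def cox_per_area(t_ox_m):
    """Gate capacitance per unit area in F/m^2."""
    return EPS0 * EPS_R_SIO2 / t_ox_m

c = cox_per_area(3e-9)                  # 3 nm gate oxide
print(f"Cox = {c * 1e3:.2f} fF/um^2")   # 1 F/m^2 = 1000 fF/um^2
```

Halving the oxide thickness doubles Cox, which is the basic reason gate oxides are made as thin as reliability allows.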

Figure 12- MASK 4: n+ diffusion, cross-sectional and top-level view


After that, boron implantation is done to create the p+ diffusions. Mask 5 is used to control the heavy boron implant and to define the p+ regions.

Figure 13- MASK 5: p+ diffusion, cross-sectional and top-level view

The above structure is popularly known as a "self-aligned" structure. After the diffusion (n+ and p+) process is complete, a thick oxide layer is deposited (by CVD) over the entire wafer.

Figure 14- Deposition of thick oxide over the entire surface

Now the circuit is ready, but connections must be made through contact holes. Before that, the wafer undergoes another process known as "annealing", in which the wafer is heated to a high temperature (about 600 °C). This step repairs some of the internal damage in the wafer and also pushes the n+ and p+ regions deeper into it.

After annealing, the source/drain contact cuts are defined. This involves etching any SiO2 down to the diffusion surface to be contacted. At this step, Mask 6 is used to pattern the contact holes.

Figure 15- Mask 6 defines the contact holes

The contact holes thus created are now filled with a thin layer of aluminium, which is either evaporated or sputtered onto the wafer. The aluminium provides the circuit connections; this step is known as metallization. In this step the 7th mask is required to pattern the interconnections.

Figure 16- Mask 7: metallization, cross-sectional and top-level view

Finally the wafer is passivated, and openings to the bond pads are etched to allow wire bonding. Passivation protects the silicon surface against the ingress of contaminants that could modify circuit behavior in deleterious ways.


Figure 17- Cross-section of a CMOS inverter

9) Conclusion: This paper has covered the key IC fabrication process steps and shown the steps for the fabrication of a CMOS inverter.

10) Acknowledgement: I would like to express my heartfelt gratitude to Prof. Krishanu Datta for his valuable guidance and insight while writing this paper.

11) References:
[1] Principles of CMOS VLSI Design by Neil H. E. Weste and Kamran Eshraghian.
[2] http://www.wikipedia.org/
[3] http://nptel.iitm.ac.in/
[4] Fundamentals of Semiconductor Fabrication by May and Sze.


INTRODUCTION TO VHDL FOR VLSI DESIGN: A TUTORIAL WITH FPGA SYNTHESIS

Author: Abhirup Banik and Judhajit Chowdhury, M.Tech(VLSI), ECE-Student (2011-13)

Abstract: In this paper, the structure of the VHDL language for VLSI front-end design is described using the behavioral design and test-bench verification of a ripple carry adder. The Xilinx ISE simulation software and FPGA flow are also demonstrated using the same example.

Keywords: Entity, behavioral description, VHDL simulator, XILINX ISE, FPGA.

1. Introduction : VHDL is a hardware description language used to describe the hardware associated with a digital system. Writing VHDL is similar to writing an ordinary program, such as a C program. A C program is compiled to make an executable file, whereas a VHDL design is simulated to test the validity of the hardware design (Figure 1). Why bother learning how to write a VHDL specification for a digital circuit? The behavior or design specification of a VLSI system needs to be logically verified before the actual logic/circuit implementation using a custom or semi-custom design methodology, and VHDL is one such globally accepted hardware language for capturing a VLSI design specification.

2. Design Hierarchy :

VHDL designs may contain one or more levels, as indicated by the 4-bit ripple adder example (Figure 2). The design method of starting with the topmost level and adding new levels with implementation details is called top-down design. Let’s walk through the design of a 4-bit ripple adder, beginning with the topmost level.


3. Interface Of VHDL Coding :

Design entities, which contain an interface and a body, are the objects used to specify what hardware components are used and how they are connected. The interface describes the number and type of signals used by the entity, and the body describes how the signals are connected and related. The interface for the 4-bit ripple adder is shown in Figure 3. The name of the entity is RIP4. The 4-bit ripple adder adds two groups of four bits together along with a carry input bit (CI) and generates a 4-bit sum and a carry output bit (CO). Inputs and outputs to the entity are identified by the port keyword. A port can be an input, output, or bidirectional signal. The A and B inputs to RIP4 are specified as 4-bit vectors, groups of bits that share common properties. The “3 downto 0” portion of the statement indicates that four input bits are needed for A and B, numbered 3, 2, 1, and 0 from left to right. Bit 3 is the MSB and bit 0 is the LSB. Note that there is nothing in the entity interface to indicate how the ripple adder does its job; this is left for the second part of the design entity, the body portion.

4. Body Of VHDL Coding :

The body portion of a design entity describes how the function of the entity is performed. The body portion of the RIP4 design entity is responsible for specifying the number and type of components used by the entity (see Listing 1). The component portion of RIP4 indicates that the Full Adder [1] design entity will be used to implement the ripple adder (Figure 4). The port information of the Full Adder entity is required to make the necessary connections, as indicated by Figure 4. The signal keyword is used to define internal signals that allow the four full adders to be cascaded by connecting the carry output signals (CYO) to the carry inputs (CYI). We are able to reuse the single Full Adder component by instantiating four copies of it (FA0 through FA3). Each copy is instantiated differently in the four port-mapping statements, with actual signal names being substituted for the desired input/output connections.


4.1. Full Adder Component : As previously mentioned, more detail is added to the design as we proceed to the lower levels. Now that we have an idea of how the full adders are used inside the ripple adder, it's time to work on the Full Adder entity construction (see Listing 2). As usual, the entity interface indicates only the associated input and output signals; the implementation details are left for the body statements. In Figure 5 we can see the internal structure of the full adder. Two half adders (HA) and an OR gate are required to implement a full adder, and three internal signals complete the wiring scheme. The last statement before the end of the code snippet (where the CYO output is generated) is called a signal assignment statement. Here only a single logic gate is needed to combine signals, so a simple Boolean expression can be used. The logical operations available in VHDL are NOT, AND, NAND, OR, NOR, XOR, and XNOR (XNOR in VHDL-93 only). All operations have the same precedence except NOT, which has the highest precedence. This means that parentheses must be used to enforce a particular order of operations.

4.2. Half Adder Component : The last piece of the sample design is the Half Adder entity (Figure 6), which is instantiated twice in each full Adder entity. The interface and body of the half adder are shown in Listing 3. Again, signal assignment statements are used to generate the required logic function [3].

The hierarchical ripple carry adder design is coded and simulated using the Xilinx ISE simulator for FPGA implementation (Listing 4).

5. Introduction To FPGA

A Field Programmable Gate Array (FPGA) is reconfigurable hardware. "Field programmable" means that the FPGA's function is defined by a user's program rather than by the manufacturer of the device. A typical integrated circuit performs a particular function defined at the time of manufacture; in contrast, an FPGA's function is defined by a program written by someone other than the device manufacturer. Depending on the particular device, the program is either 'burned in' permanently or semi-permanently as part of a board assembly process, or is loaded from an external memory each time the device is powered up. This user programmability gives access to complex integrated designs without the high engineering costs associated with application-specific integrated circuits.

An FPGA is an integrated circuit that contains many (64 to over 10,000) identical logic cells that can be viewed as standard components. The individual cells are interconnected by a matrix of wires and programmable switches. A user's design is implemented by specifying the simple logic function for each cell and selectively closing the switches in the interconnect matrix. The array of logic cells and interconnect forms a fabric of basic building blocks for logic circuits; complex designs are created by combining these basic blocks to create the desired circuit [5].

5.1. FPGA Architectures

Each FPGA vendor has its own FPGA architecture, but in general terms they are all variations of that shown in Fig 8. The architecture consists of configurable logic blocks, configurable I/O blocks, and programmable interconnect. Also, there will be clock circuitry for driving the clock signals to each logic block. Additional logic resources such as ALUs, memory, and decoders may also be available. The three basic types of programmable elements for an FPGA are static RAM, anti-fuses, and flash EPROM.

Figure 8: Generic FPGA architecture


Configurable Logic Blocks (CLBs)

These blocks contain the logic for the FPGA. In the large-grain architecture used by all FPGA vendors today, these CLBs contain enough logic to create a small state machine as illustrated in Figure 9. The block contains RAM for creating arbitrary combinatorial logic functions, also known as lookup tables (LUTs) [6]. It also contains flip-flops for clocked storage elements, along with multiplexers in order to route the logic within the block and to and from external resources. The multiplexers also allow polarity selection and reset and clear input selection.
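A k-input LUT is, in effect, just 2^k configuration bits addressed by the input values. This toy model (the class and method names are invented for the sketch) shows how one small table can implement any 4-input logic function:

```python
# Toy model of an FPGA lookup table (LUT): a k-input LUT is 2^k stored
# output bits, indexed by the input values. Here the table is configured
# to behave as a 4-input AND gate (only entry 0b1111 holds a 1).
# Names are invented for this sketch; real CLBs add flip-flops and muxes.

class LUT:
    def __init__(self, truth_table):
        self.table = truth_table          # list of 2^k output bits

    def evaluate(self, *inputs):
        index = 0
        for bit in inputs:                # pack input bits into a table index
            index = (index << 1) | bit
        return self.table[index]

and4 = LUT([0] * 15 + [1])                # 16 entries; only index 15 -> 1
print(and4.evaluate(1, 1, 1, 1))          # prints 1
print(and4.evaluate(1, 0, 1, 1))          # prints 0
```

Reprogramming the FPGA amounts to loading different bits into these tables (and setting the interconnect switches), which is why the same silicon can implement arbitrary combinational logic.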

FPGA PROGRAMS :

FPGA software (e.g., Xilinx ISE) synthesizes textual hardware-description-language code and then completes placement and routing of the translated design (Figure 10). These software packages have hooks that allow the user to fine-tune implementation, placement, and routing to obtain better performance and utilization of the device. Libraries of more complex function macros further simplify the design process by providing common circuits that are already optimized for speed or area. FPGAs have gained rapid acceptance and growth over the past decade because they can be applied to a very wide range of applications. Typical applications include random logic, integrating multiple SPLDs, device controllers, communication encoding and filtering, small to medium sized systems with SRAM blocks, and many more. Other interesting applications of FPGAs are prototyping designs later to be implemented in ASICs, and emulation of entire large hardware systems.

Figure 9: FPGA Configurable logic


6. FPGA SYNTHESIS OF A 4-BIT RIPPLE CARRY ADDER

As discussed before, a 4-bit ripple carry adder can be created by simply stringing four 1-bit adders together with a carry path; thus the most logical starting place is the design of a single bit and its expansion to 4 bits (Figure 11) [2].

Figure 10: FPGA Design Flow

Figure 11: Block Diagram of RIPPLE CARRY ADDER


Listing 4: Working behavioral flow of 4-bit RIPPLE CARRY ADDER

library IEEE;
use IEEE.STD_LOGIC_1164.ALL;

-- 1-bit full adder, declared first so that ripplec can instantiate it
entity adder is
  port (a, b, cin : in std_logic;
        sum, carry : out std_logic);
end adder;

architecture Behavioral of adder is
begin
  sum   <= a xor b xor cin;
  carry <= (a and b) or (b and cin) or (cin and a);
end Behavioral;

library IEEE;
use IEEE.STD_LOGIC_1164.ALL;

entity ripplec is
  port (num1, num2 : in std_logic_vector(3 downto 0);
        cin  : in std_logic;
        sum  : out std_logic_vector(3 downto 0);
        cout : out std_logic);
end ripplec;

architecture Behavioral of ripplec is
  signal c1, c2, c3 : std_logic;
begin
  -- four full adders chained through the carry signals, LSB first
  FA1: entity work.adder port map (num1(0), num2(0), cin, sum(0), c1);
  FA2: entity work.adder port map (num1(1), num2(1), c1, sum(1), c2);
  FA3: entity work.adder port map (num1(2), num2(2), c2, sum(2), c3);
  FA4: entity work.adder port map (num1(3), num2(3), c3, sum(3), cout);
end Behavioral;
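The Boolean equations in Listing 4 can be cross-checked with a small software model: the same sum/carry expressions, chained LSB-first like the FA1-FA4 port maps (Python here purely as a checking aid, not part of the design flow):

```python
# Software model of the Boolean equations in Listing 4:
#   sum   = a xor b xor cin
#   carry = (a and b) or (b and cin) or (cin and a)
# chained four times, LSB first, mirroring the FA1..FA4 port maps.

def full_adder(a, b, cin):
    s = a ^ b ^ cin
    carry = (a & b) | (b & cin) | (cin & a)
    return s, carry

def ripple_add4(num1, num2, cin):
    """num1/num2 are 4-bit lists, index 0 = LSB. Returns (sum bits, cout)."""
    total, carry = [], cin
    for a, b in zip(num1, num2):
        s, carry = full_adder(a, b, carry)
        total.append(s)
    return total, carry

# 2 + 9 with carry-in 1 = 12 -> sum bits [0,0,1,1] (LSB first), cout 0
s, cout = ripple_add4([0, 1, 0, 0], [1, 0, 0, 1], 1)
print(s, cout)
```

Running all 512 input combinations against plain integer addition confirms that the majority-gate carry expression is correct.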


Listing 5: Test bench code of 4-bit RIPPLE CARRY ADDER

LIBRARY IEEE;
USE IEEE.STD_LOGIC_1164.ALL;

ENTITY ripplec_tbw_vhd IS

END ripplec_tbw_vhd;

ARCHITECTURE behavior OF ripplec_tbw_vhd IS

COMPONENT ripplec

PORT( num1 : IN std_logic_vector(3 downto 0); num2 : IN std_logic_vector(3 downto 0);

cin : IN std_logic;

sum : OUT std_logic_vector(3 downto 0); cout : OUT std_logic);

END COMPONENT;

SIGNAL cin : std_logic;

SIGNAL num1 : std_logic_vector(3 downto 0) := (others=>'0');

SIGNAL num2 : std_logic_vector(3 downto 0) := (others=>'0');

SIGNAL sum : std_logic_vector(3 downto 0);

SIGNAL cout : std_logic;

BEGIN

uut: ripplec PORT MAP(num1 => num1, num2 => num2, cin => cin, sum => sum, cout => cout);

tb : process

begin

num1<="0010"; --num1 =2

num2<="1001"; --num2 =9

cin<='1';

wait for 200 ns;

num1<="1010"; --num1 =10

num2<="0110"; --num2 =6

cin<='0';

wait for 200 ns;

end process tb;

END;


Figure 12: Testbench using XILINX ISE

Figure 13: Schematic View


The VHDL code for a specific circuit (e.g., the ripple carry adder) is entered in Xilinx ISE, and the code is then simulated by creating a testbench, as shown in Figure 12. Figure 13 shows the RTL schematic of the circuit. The testbench lets the user feed inputs and check for the desired outputs. We also have to prepare a netlist (pin assignment), which acts as a user guide and shows which pins carry the inputs and outputs. The code is then downloaded into the FPGA kit through a programming interface such as a JTAG or RS-232 cable. If the tool reports 'PROGRAM SUCCEEDED' while downloading, the code has been downloaded successfully. Hence the FPGA (hardware) is configured by the VHDL code through Xilinx ISE (software).

7. Conclusion: In this tutorial, we have shown the various structures of the VHDL language used for VLSI front-end design and testbench verification, using a 4-bit ripple carry adder as the example. We have also demonstrated the Xilinx ISE simulation software and FPGA implementation using the same example.

8. Acknowledgement: The help given by Professor Krishanu Datta, Microelectronics & VLSI Department, Heritage Institute of Technology, Kolkata, India is greatly appreciated. He reviewed the paper and provided excellent suggestions and information.

9. References:

[1] Y. Khalifa, "Introduction to VHDL," Electrical and Computer Engineering Department, State University of New York at New Paltz.

[2] R. Tocci, N. Widmer, and G. Moss, Digital Systems: Principles and Applications, 10th ed.

[3] S. Yalamanchili, VHDL: A Starter's Guide, 2nd ed., 2005.

[4] P. J. Ashenden, The VHDL Cookbook, 1st ed., 1990-1997.

[5] B. Zeidman, Designing with FPGAs and CPLDs, 2002.

[6] R. Salemi, FPGA Simulation: A Complete Step-by-Step Guide, 2009.

[7] V. A. Pedroni, Circuit Design with VHDL, 2004.


Embedded real time VLSI systems

Authors: Arghya Kusum Mukherjee & Samaresh Garai, M.Tech (VLSI), ECE Students (2011-13)

Abstract: This paper describes scheduling periodic real-time tasks on an FPGA, a reconfigurable hardware device. The preemptive scheduling algorithm "Earliest Deadline First - Next Fit" (EDF-NF), adapted to the FPGA model, is discussed.

Keywords: Real-time embedded system, scheduler, EDF-NF algorithm, FPGA.

1. Introduction: An embedded system uses a microprocessor or microcontroller to perform a dedicated task. Many embedded systems must meet real-time constraints. A system is called a real-time system when quantitative expressions of time (i.e., real time), measured using a physical clock, are needed to describe its behavior.

2. Hardware of Embedded Systems:

Figure 1: Hardware blocks of an Embedded System.

2.1 Memory block: The memory holds the software that controls the embedded system [1].

2.2 Analog-to-digital converter block: The ADC provides the interface to analog sensors; the embedded system receives sensor input through the ADC.

2.3 Digital-to-analog converter block: The DAC block provides the interface to actuators. An embedded system situated in an environment interfaces with its analog surroundings through actuators [1].

2.4 FPGA/ASIC block: This block is needed because in many cases the CPU may not be able to execute software fast enough to satisfy real-time constraints; under those circumstances special hardware interfaced with the CPU may be required [1].

2.5 Human interface block: The CPU needs to implement a human interface, for example when a control function is to be altered by an operator; hence this is an important component [1].


2.6 Diagnostic tools block: An embedded system is expected to work for a very long time, but failures can still occur; for repair we need diagnostic tools that check whether all parts are functioning properly, so the system is expected to perform some self-tests. Hence diagnostic tools are important [1].

2.7 Auxiliary system block: This block deals with power. Where power dissipation is significant, cooling becomes an essential component [1].

2.8 Casing block: The whole system should be properly packaged, according to the requirements of the external environment in which the embedded system will be placed. If the packaging is not done well, even a good design may fail [1].

3. Characteristics of Real-time systems: The characteristics of real-time systems are as follows:

3.1 Time constraints: Every real-time task is associated with some time constraints. One very common form of time constraint is the deadline associated with a task. It is the responsibility of the real-time operating system (RTOS) to ensure that all tasks meet their respective time constraints [2].

3.2 New Correctness Criterion: In real-time systems, correctness implies not only logical correctness of the results, but also the time at which the results are produced [2].

3.3 Safety-Criticality: A safe system is one that does not cause any damage even when it fails. A reliable system, on the other hand, is one that can operate for long durations of time without exhibiting any failures. A safety-critical system is required to be highly reliable [2].

3.4 Concurrency: A real-time system needs to respond to several independent events within very short and strict time bounds[2].

3.5 Stability: Under overload conditions, real-time systems need to continue to meet the deadlines of the most critical tasks, though the deadlines of non-critical tasks may not be met[2].

4. Types of Real-Time Tasks:

Real-time tasks can be grouped into the following types:

4.1 Hard Real-Time Tasks: A hard real-time task is one that is constrained to produce its results within certain predefined time bounds. The system is considered to have failed whenever any of its hard real-time tasks does not produce its required results before the specified time bound[2].

4.2 Firm Real-Time Tasks: A firm real-time task is associated with some predefined deadline before which it is required to produce its results. However, unlike a hard real-time task, even when a firm real-time task does not complete within its deadline, the system does not fail. The following are two examples of firm real-time tasks: video conferencing and satellite-based tracking of enemy movements [2].


4.3 Soft Real-Time Tasks: Soft real-time tasks also have time bounds associated with them. An example of a soft real-time task is web browsing. Normally, after a URL (Uniform Resource Locator) is clicked, the corresponding web page is fetched and displayed within a couple of seconds on average. However, when it takes several minutes to display a requested page, we still do not consider the system to have failed, but merely say that the performance of the system has degraded [2].

4.4 Non-Real-Time Tasks: A non-real-time task is not associated with any time bounds. A few examples of non-real-time tasks are: batch processing jobs, e-mail[2].

5. Real-time scheduling: Scheduling involves the allocation of resources and time to tasks in such a way that certain performance requirements are met. The basic requirement in real-time systems is to ensure that tasks meet their time constraints.

The scheduler is the part of the operating system that decides which process/task to run next. It uses a scheduling algorithm that enforces a policy designed to meet certain criteria.

5.1 Types of Scheduling: There are two categories of scheduling: offline and online.

In offline scheduling, the scheduling decisions are taken at compile time. A table stores the start time of each task, and the program follows this table to select the task to be executed.

Online scheduling makes scheduling decisions at runtime. Whenever a task's status changes, scheduling decisions are made by comparing task priorities.

Online scheduling is divided into static-priority and dynamic-priority scheduling.

In static-priority scheduling, task priorities are fixed and do not change. An example is RMA (Rate Monotonic Algorithm), which assigns higher priority to tasks with shorter periods.

In dynamic-priority scheduling, task priorities may change at run time. An example is EDF (Earliest Deadline First), which schedules the task with the earliest absolute deadline.
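The difference between the two policies can be sketched in a few lines of Python (an illustrative sketch; the task names, periods, and deadlines below are hypothetical, not from any paper):

```python
# Each ready task: (name, period, absolute_deadline) -- hypothetical values.
ready = [("T1", 50, 120), ("T2", 30, 90), ("T3", 40, 70)]

# RMA (static priority): the task with the shortest period runs next.
rma_choice = min(ready, key=lambda t: t[1])

# EDF (dynamic priority): the task with the earliest absolute deadline runs next.
edf_choice = min(ready, key=lambda t: t[2])

print(rma_choice[0])  # T2 (shortest period, 30)
print(edf_choice[0])  # T3 (earliest deadline, 70)
```

Note how the two policies can pick different tasks from the same ready set.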

6. Adaptation of a scheduling algorithm to FPGA: The FPGA is a popular reconfigurable device that can be reprogrammed after fabrication.

In this paper we devise a scheduling algorithm for periodic real-time tasks on an FPGA. Here the preemptive scheduling algorithm Earliest Deadline First - Next Fit is adapted to the FPGA model.

The code for the Earliest Deadline First - Next Fit algorithm is written in VHDL (VHSIC Hardware Description Language, where VHSIC stands for Very High Speed Integrated Circuit). The code is then logically verified, a program file (.bit) is generated, and this bitstream is programmed into the FPGA.

6.1 Earliest Deadline First - Next Fit scheduling: EDF-NF is an offline scheduling algorithm: the schedule is computed ahead of time and dispatched at runtime. It requires a high number of FPGA configurations.

It keeps all released tasks in a ready queue, sorted by increasing absolute deadline. To determine the set R of running tasks, EDF-NF scans through the ready list. A task instance Ti is added to the set of running tasks R as long as the sum of the areas of all running tasks remains <= 1.

When the next task cannot be added, EDF-NF moves on in the ready queue and tries to add tasks with longer deadlines, improving device utilization.

If no more tasks can be added, the running set of tasks is closed and compiled to an FPGA configuration.

When a new task instance is released, or a running task instance terminates, the FPGA configuration can then be changed [3].

6.2 Assumptions for the scheduling algorithm: We consider a set T of periodic tasks; each task Ti belongs to T.

The instances Ti,j of each task Ti are released with period Pi; Ri,j denotes the release time of instance Ti,j.

Ci is the worst-case computation time of all instances of task Ti.

The finishing time of task instance Ti,j is Fi,j.

We assume real-time tasks with deadlines equal to periods, so the deadline of a task instance Ti,j is Ri,j+1, the release time of the next instance.

The amount of reconfigurable logic resources that a task Ti requires is Ai.

We assume no single task requires more resources than are available. The configurable logic of the FPGA is referred to as the area of the device, normalized to 1.

The device can execute any subset R of T of tasks simultaneously, as long as the amount of resources required by the task set does not exceed the available area; R is the set of running tasks.

A running instance of task Ti can be preempted by another task Tj before its completion and resumed later.

The runtime system has to interrupt the execution of R and save the contexts of all tasks Ti belonging to R. The FPGA is then fully reconfigured with a new configuration including all tasks Tj belonging to the new running set R^.

When R is scheduled for execution again, the previously saved contexts of the tasks Ti belonging to R are restored and R is restarted.

In the scheduling analysis, the preemption and context-restore overheads are neglected.

The schedule can easily be proven feasible, because every task instance meets its deadline over the entire hyperperiod of the task set. The hyperperiod is the LCM of all task periods in the task set [3].
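The hyperperiod can be computed directly as the LCM of the periods; a small Python sketch (the periods below are hypothetical):

```python
from math import gcd
from functools import reduce

def hyperperiod(periods):
    """LCM of all task periods: the window over which the schedule repeats."""
    return reduce(lambda a, b: a * b // gcd(a, b), periods)

# Hypothetical task periods (deadlines equal periods, as assumed above).
print(hyperperiod([4, 6, 10]))  # 60
```

Checking feasibility over one hyperperiod suffices, since the release pattern repeats after it.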


6.3 Algorithm of Earliest Deadline First - Next Fit:

ALG. Earliest Deadline First - Next Fit
Require: list Q of ready tasks, sorted by increasing absolute deadlines.
1: procedure EDF-NF(Q)
2:     R ← ∅
3:     Arunning ← 0
4:     for i ← 1 to |Q| do
5:         if Arunning + Ai <= 1 then
6:             R ← R ∪ {Ti}
7:             Arunning ← Arunning + Ai
8:         end if
9:     end for
10:    return R
11: end procedure [3]
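A minimal Python sketch of this procedure (the task tuples and area values are hypothetical, not from the cited reference):

```python
# EDF-NF: scan the deadline-sorted ready queue and pack tasks into the
# running set R while the normalized FPGA area budget (1.0) allows.
def edf_nf(ready_queue):
    """ready_queue: list of (name, area, abs_deadline), sorted by deadline."""
    running, area_used = [], 0.0
    for name, area, deadline in ready_queue:
        if area_used + area <= 1.0:   # the test on line 5 of the pseudocode
            running.append(name)
            area_used += area
    return running

# Hypothetical task set, already sorted by increasing absolute deadline.
queue = [("T1", 0.5, 10), ("T2", 0.4, 15), ("T3", 0.3, 20), ("T4", 0.1, 25)]
print(edf_nf(queue))  # ['T1', 'T2', 'T4'] -- T3 does not fit, but T4 still does
```

The example shows the "next fit" behavior: when T3 would overflow the area budget, the scan continues and still admits the smaller T4, improving device utilization.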

7. Acknowledgement: We would like to express our heartfelt gratitude to Prof. Krishanu Datta for his valuable guidance and insight while we were writing this paper. We also acknowledge and appreciate the help rendered by Miss Ananya Mukherjee, Assistant Professor, TIT University, Agartala, Tripura. Thanks are also due to our friends for their constant support and encouragement in finishing this paper.

8. Conclusion: We have discussed real-time scheduling on the reconfigurable hardware device called the FPGA and have presented the scheduling algorithm Earliest Deadline First - Next Fit. The EDF-NF algorithm is efficient because it can generate feasible schedules for task sets with high system utilization.

The drawback of the EDF-NF algorithm is that it requires a high number of FPGA configurations.

9. References:

[1] NPTEL video lectures on Embedded Systems by Prof. Shantanu Chaudhury.

[2] NPTEL web course on Real-Time Scheduling, http://nptel.iitm.ac.in/courses.php?disciplineId=108

[3] http://wenku.baidu.com/view