
International Journal of Advanced Information Science and Technology (IJAIST) ISSN: 2319:2682

Vol.23, No.23, March 2014

310

Energy Efficient NANO Ram Cell for the Energy Efficient Operation

X. ARUN BROMIO, PG SCHOLAR

Dr. A. RAJARAM, ASSOCIATE PROFESSOR

DEPARTMENT OF ELECTRONICS AND COMMUNICATION ENGINEERING

KARPAGAM UNIVERSITY, COIMBATORE

Abstract— Earlier, we presented a hybrid CMOS/nanotechnology reconfigurable architecture (NATURE) based on CMOS logic and nano RAMs. It used the concept of temporal logic folding and fine-grain (e.g., cycle-level) dynamic reconfiguration to increase logic density by an order of magnitude; this dynamic reconfiguration is performed intra-circuit rather than inter-circuit. However, the previous design of NATURE required a fine-grained distribution of nano RAMs throughout the field-programmable gate array (FPGA) architecture. Because the fabrication process of nano RAMs is not yet mature, this requirement prevents immediate exploitation of NATURE. In this paper, we present a NATURE architecture based on CMOS logic and CMOS SRAMs, which are used for on-chip dynamic reconfiguration. We use fast, low-power SRAM blocks based on 10T SRAM cells. We have also laid out the various FPGA components in a 65-nm technology to evaluate the FPGA performance. We hide the dynamic reconfiguration delay behind the computation delay through the use of shadow SRAM cells. Experimental results show more than an order of magnitude improvement in logic density, as well as an improvement in the area-delay product, relative to a traditional baseline FPGA architecture that does not use logic folding.

Index Terms—Field-programmable gate arrays (FPGAs), integrated circuits, logic folding, nanotechnology reconfigurable architecture (NATURE).

INTRODUCTION

With every new technology node, the share of leakage in the total power consumption of caches and other SRAM-based memories increases considerably, since technology scaling reduces dynamic power but increases leakage. Among the major leakage components, subthreshold leakage is dominant in caches and other SRAM-based memories in nanometer technologies [1]. One naïve solution for reducing SRAM leakage is to increase the transistor threshold voltage (Vth) to exponentially reduce subthreshold leakage power, but this negatively affects SRAM access delay. Traditionally, the access delays of all SRAM cells of a memory module on a chip have been the same; hence, increasing Vth would have caused almost all cells to violate the original timing. Due to within-die variations in sub-90-nm technologies, however, similar SRAM cells in a single cell array show different delays: the delay of some of them is close to the delay specified for the entire cell array, while most of them are farther away. Consequently, leakage can be effectively reduced without a delay penalty if reverse body bias (RBB) is applied to the transistors of the SRAM cells and the resulting few delay-violating (slow) cells are replaced by redundancies; the cost is a negligible area, delay, and dynamic power penalty due to the redundant rows/columns.

Process variation causes changes in circuit parameters (such as transistor gate length, threshold voltage, circuit delay, and the like) during manufacturing, such that the manufactured device may differ from the designed device in some features. Within-die variation refers to variations in circuit parameters within a single die, which means that similar circuits in different parts of the die may have different features. Die-to-die variations, on the other hand, are the variations that occur from one die to another, from wafer to wafer, and from wafer lot to wafer lot. Our focus is on uncorrelated random within-die variation of SRAM cell delay. While die-to-die variations have been observed for several years, within-die variations have more recently started to intensify [2]. Empirical studies in [3] show that 3.54% random within-die delay variation is observed for a single logic element (approximately equivalent to a single SRAM cell) of 180-nm field-programmable gate arrays (FPGAs). Furthermore, this within-die variation is predicted to rise with every new technology node as device dimensions approach atomic sizes [2], [3].

Within-die variations are commonly modeled by a Gaussian distribution [4], which also matches the empirical results in [3] well. With the increasing within-die variation in current and future technology nodes, this delay distribution widens (its standard deviation increases). Consequently, to ensure that only a few SRAM cells violate the timing, the delay for the entire cell array has to be set farther away from the nominal cell delay. In other words, the cell array delay is set near the tail end of the delay distribution (say, several standard deviations σ above the mean μ, where μ and σ are, respectively, the mean and standard deviation of the cell delay distribution), and hence most cells show substantially less delay than the target delay.
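The tail-of-distribution argument above can be made concrete with a short calculation. The Python sketch below is a minimal illustration, assuming (as the text does) that the within-die cell delay is Gaussian with mean μ and standard deviation σ; the function names, the 100 ps mean delay, and the reading of the 3.54% figure from [3] as σ/μ are illustrative assumptions, not values from this design.

```python
import math


def slow_cell_fraction(target_delay: float, mean: float, sigma: float) -> float:
    """Fraction of cells slower than target_delay, assuming the within-die
    cell delay follows a Gaussian distribution N(mean, sigma**2)."""
    if sigma <= 0.0:
        raise ValueError("sigma must be positive")
    z = (target_delay - mean) / sigma
    # Upper-tail probability of the standard normal: Q(z) = erfc(z / sqrt(2)) / 2
    return 0.5 * math.erfc(z / math.sqrt(2.0))


def array_delay_target(mean: float, sigma: float, k: float) -> float:
    """Array delay specification placed k standard deviations above the mean."""
    return mean + k * sigma


if __name__ == "__main__":
    mean_ps = 100.0                 # hypothetical nominal cell delay
    sigma_ps = 0.0354 * mean_ps     # assumes the 3.54% figure is sigma/mean
    target_ps = array_delay_target(mean_ps, sigma_ps, k=3.0)
    print(f"target = {target_ps:.2f} ps, "
          f"slow fraction = {slow_cell_fraction(target_ps, mean_ps, sigma_ps):.2e}")
    # A wider distribution (larger sigma) at the same target leaves more slow cells.
    print(f"with 1.5x sigma: {slow_cell_fraction(target_ps, mean_ps, 1.5 * sigma_ps):.2e}")
```

With the target three standard deviations above the mean, about 0.135% of cells are slower than the target; if σ grows by 1.5× while the target stays fixed, that fraction rises to about 2.3%, which is why the array delay must move further into the tail as variation increases.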

Row/column redundancy has long been used to repair faults in RAM memories [5] and has more recently become inevitable in high-density SRAM memories to obtain acceptable manufacturing yield [6], [7]. We use redundancy differently, however: instead of replacing unpredictable manufacturing faults, we replace the expected delay-violating cells caused by the RBB intentionally chosen at design time.

In cache memories, several previous works address improving timing yield in the presence of process variation by proposing process-tolerant cache architectures [4], [8] and code-placement compiler techniques [9], but they actually reduce the useful capacity of the cache by marking and avoiding too-slow cache lines; furthermore, they do not address power reduction. The authors of [10] use variable-latency access to different parts of the cache and also turn off too-slow as well as too-leaky parts of the cache; the latter reduces power, but, in contrast, we do not turn off leaky parts to save power: we use a higher Vth by applying RBB and then replace the slow parts. Meng and Joseph [11] consider the within-die variation in leakage of different cache ways when turning off unused cache ways, as proposed in [12]. Other highly cited works reduce cache static power [12]–[14], but they do not consider process variation. None of the aforementioned techniques reduces the leakage of the portions of the cache that are being used.

In this paper, we propose an optimization technique for SRAM-based memory design that is applied at design time of the SRAM array and reduces leakage power by RBB at the cost of extra area for redundant rows/columns of SRAM. We 1) keep the supply voltage untouched (to avoid its quadratic impact on dynamic power) and 2) based on the known within-die delay distribution and the given yield and speed of the memory module, choose the number of redundant rows/columns such that the leakage is minimized (by RBB) without reducing the useful memory size. A negligible delay overhead is imposed by the programmability needed in the row/column decoders to remap slow rows/columns to spare ones. Also, longer word lines or bit lines (corresponding to column or row redundancies, respectively) result in a marginally higher dynamic energy per access. Results of our SRAM design on a commercial 90-nm process show that less than 4% redundancy (ten columns added to a 256 × 256 memory) reduces the leakage by 40%, while dynamic power, delay, and area increase by only 1.3%, 0.6%, and 3.4%, respectively; timing yield and memory capacity are kept intact. Parts of this archival work were published in our previous work [15].
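The design-time choice of the number of redundant columns for a given yield can be illustrated with a simplified counting model. The Python sketch below is not the authors' optimization; it assumes that each cell is independently slow with probability p_slow once RBB is applied (for example, taken from the Gaussian delay tail), that a column is unusable if it contains at least one slow cell, that spare columns follow the same statistics, and that only column redundancy is used. All function names and example probabilities are hypothetical.

```python
from math import comb


def column_fail_prob(p_slow: float, rows: int) -> float:
    """Probability that a column of `rows` independent cells has >= 1 slow cell."""
    return 1.0 - (1.0 - p_slow) ** rows


def repair_yield(p_slow: float, rows: int, cols: int, spares: int) -> float:
    """Probability that at most `spares` of the cols + spares physical columns
    fail, so every failing column can be remapped and all `cols` logical
    columns (the useful capacity) remain available."""
    q = column_fail_prob(p_slow, rows)
    n = cols + spares
    return sum(comb(n, k) * q**k * (1.0 - q) ** (n - k) for k in range(spares + 1))


def min_spare_columns(p_slow: float, rows: int, cols: int,
                      target_yield: float, max_spares: int = 256) -> int:
    """Smallest number of spare columns whose repair yield meets target_yield."""
    for spares in range(max_spares + 1):
        if repair_yield(p_slow, rows, cols, spares) >= target_yield:
            return spares
    raise ValueError("target yield not reachable within max_spares")


if __name__ == "__main__":
    # Hypothetical per-cell slow probabilities after RBB for a 256 x 256 array.
    for p in (1e-5, 1e-4):
        print(p, min_spare_columns(p, rows=256, cols=256, target_yield=0.99))
```

In a fuller flow, one could sweep the RBB level, derive p_slow from the shifted delay distribution at each level, and keep the strongest bias whose spare-column count still fits the area budget; this sketch covers only the counting step.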

Fig. 1. Conventional SRAM Cell Layout

Fig. 3 shows a SEM photograph and the expected layout of a 0.614-µm² SRAM cell using a conventional layout for the poly-Si and diffusion layers. Generally, we call the ratio of the driver transistor channel width to the access transistor channel width the “ratio” of the layout. A conventional SRAM cell has a hammer-head poly-Si layer and a convex diffusion layer shape.


Fig. 2. Conventional SRAM module

There are two reasons for this. The first is to maintain the poly-Si-contact extension margin under the contact layer; in this way, the poly-Si layer under the contact layer is wider than in the channel area [4], [5]. The second is to maintain a sufficient ratio. In the conventional SRAM cell layout, the ratio is around 1.2–1.5 to obtain a sufficient static noise margin (SNM). Fig. 3 shows the reverse narrow width effect of SRAM NMOS transistors in 180-nm technology. In the conventional SRAM layout, the Vth of the access transistor is lower than that of the driver transistor because of this reverse narrow width effect.

Fig. 2 shows the definition of the electric ratio (the on-current ratio of the driver transistor to the access transistor). At low Vdd, the on-current of the access transistor is larger than that of the driver transistor because Vdd − Vth(access) is larger than Vdd − Vth(driver). Therefore, the electric ratio is less than 1.0 at low Vdd. To obtain a large electric ratio using the conventional layout, we would have to adopt a wider driver transistor channel width, which would result in a larger cell size. Fig. 2 also shows the occurrence of asymmetric access transistor sizes in case of misalignment using the conventional SRAM cell layout. In our 180-nm technology, the poly-Si layer is aligned to the diffusion layer and the contact layer is aligned to the poly-Si layer. This figure shows the poly-Si layer misaligned to the diffusion layer. With misalignment in the X direction, one access transistor’s channel length is shorter than the other’s; with misalignment in the Y direction, one access transistor’s channel width is wider than the other’s. Therefore, when manufacturing misalignment occurs, the electrical characteristics of the two access transistors become asymmetric. In 180-nm technology with a large local Vth variability, this asymmetry has a significant impact on the stable operation of the SRAM. Furthermore, from the design-for-manufacturability (DFM) point of view, the complicated transistor shapes of the conventional layout enlarge the global Vth variability and cause asymmetric transistor electrical characteristics.
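The low-Vdd behavior of the electric ratio can be illustrated with the alpha-power-law MOSFET model, in which the saturation on-current scales as W·(Vdd − Vth)^α. The Python sketch below treats the electric ratio as the driver-to-access on-current ratio with equal channel lengths; α = 1.3 and the threshold voltages in the example are hypothetical values chosen only to show the trend, not data from this 180-nm process.

```python
def electric_ratio(width_ratio: float, vth_driver: float, vth_access: float,
                   vdd: float, alpha: float = 1.3) -> float:
    """Driver-to-access on-current ratio under the alpha-power law
    I_on ~ W * (Vdd - Vth)**alpha, assuming equal channel lengths.

    width_ratio is the layout "ratio" (driver width / access width)."""
    if vdd <= max(vth_driver, vth_access):
        raise ValueError("Vdd must exceed both threshold voltages")
    return width_ratio * ((vdd - vth_driver) / (vdd - vth_access)) ** alpha


if __name__ == "__main__":
    # Conventional cell: ratio 1.2, access Vth lowered by the reverse narrow
    # width effect (hypothetical values).
    for vdd in (0.6, 1.0, 1.8):
        print(vdd, round(electric_ratio(1.2, 0.45, 0.30, vdd), 3))
    # Straight cell: ratio 1.0 and matched Vth give an electric ratio of 1.0.
    print(electric_ratio(1.0, 0.40, 0.40, 1.0))
```

With these assumed numbers, the conventional cell's electric ratio falls well below 1.0 at 0.6 V even though its geometric ratio is 1.2, and exceeds 1.0 only at the highest supply, while the matched cell stays at exactly 1.0.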

B. Straight Diffusion and Poly-Si Layers SRAM Cell Layout

Fig. 2 shows a conventional SRAM module and the expected layout of a 0.494-µm² SRAM cell using straight diffusion and poly-Si layers. Below, we call this 0.494-µm² cell “the straight cell” and the 0.614-µm² SRAM cell shown in Fig. 3 “the conventional cell.” The sizes of each SRAM transistor are summarized in Table I. In this cell layout, the diffusion and poly-Si layers are straighter than in the conventional layout [7], [17]. Because the straight cell layout has a simple, lithography-friendly transistor shape, the global Vth variability of this cell is relatively small. The driver transistor’s channel width is equal to the access transistor’s channel width; therefore, the ratio is 1.0, and the electric ratio is also 1.0 [17]. In this layout, a wider access transistor channel width can be used, so a large access transistor on-current can be expected. Using straight diffusion and poly-Si layers, we can realize symmetrical electrical characteristics even if a photomask misalignment occurs. Fig. 2 shows the SNM distributions of the conventional and straight cells. The SNM of the straight cell at high Vdd is smaller than that of the conventional cell (see Fig. 2). However, the ratio of the median SNM to its standard deviation (μ/σ) is 6.7 for the conventional cell, while it is 7.4 for the straight cell. A large μ/σ means that the tolerance to SNM variability is large. Therefore, the straight diffusion and poly-Si layer SRAM cell layout is variability-tolerant, because the straight cell uses a simple layout and a larger access transistor with a smaller local Vth variability sigma. Fig. 3 shows the Vdd dependence of the SNM of the conventional and straight cells.
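To see why a larger median-to-sigma ratio matters, assume (as a simplification not stated in the text) that the cell SNM is Gaussian and that the figures 6.7 and 7.4 are its mean-to-sigma ratios μ/σ for the conventional and straight cells. The probability that a cell has no positive SNM is then the Gaussian upper tail Q(μ/σ), sketched below in Python; the array size in the example is hypothetical.

```python
import math


def snm_fail_prob(mu_over_sigma: float) -> float:
    """P(SNM <= 0) for a Gaussian SNM with the given mean/sigma ratio."""
    return 0.5 * math.erfc(mu_over_sigma / math.sqrt(2.0))


def expected_failing_cells(mu_over_sigma: float, n_cells: int) -> float:
    """Expected number of cells with non-positive SNM in an array of n_cells."""
    return n_cells * snm_fail_prob(mu_over_sigma)


p_conv = snm_fail_prob(6.7)      # conventional cell
p_straight = snm_fail_prob(7.4)  # straight cell

if __name__ == "__main__":
    n = 64 * 1024 * 1024 * 8     # hypothetical 64-MB array, in bits
    print(f"conventional: {p_conv:.2e}, expected fails {expected_failing_cells(6.7, n):.3f}")
    print(f"straight:     {p_straight:.2e}, expected fails {expected_failing_cells(7.4, n):.5f}")
```

Under these assumptions, the per-cell failure probability drops by more than two orders of magnitude (on the order of 1e-11 for the conventional cell versus 1e-13 for the straight cell), even though the straight cell's nominal SNM at high Vdd is smaller.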

Fig. 3. Proposed SRAM structure


The SNM of the conventional cell without a read assist circuit (R-AC, see Section III) increases continuously as Vdd increases. On the other hand, the SNM of the straight cell without R-AC peaks at around 1.2 V. At low Vdd, the straight cell’s SNM without R-AC is larger than that of the conventional cell without R-AC, because of its larger electric ratio. However, at high Vdd, the straight cell’s SNM without R-AC is smaller than that of the conventional cell without R-AC because its electric ratio is smaller.

Classic Techniques for SRAM Leakage Reduction

Traditional techniques do not consider within-die variations, mainly because such variations were not a major issue at the time. Most of these techniques rely on turning off the SRAM cells or putting them into a low-power mode when they are not accessed. Cache memories are among the most widely used SRAM-based memories. Turning off unused parts of the cache [12], [13], [17], or putting them in a low-energy “drowsy” mode using two different supply voltages [14], are well-known techniques to reduce cache leakage, but they target unused parts of the cache (i.e., parts not accessed for a long time) and cannot handle the parts being accessed. In sleep mode, source biasing by a virtual ground [18] and RBB [19] can effectively reduce leakage when the processor is idle, but they are not used in the active mode because they increase cell delay and result in lower performance and/or reduced timing yield. We also use RBB to reduce leakage, but we apply it in the active mode in addition to the sleep mode; owing to the within-die delay distribution, only a few of the SRAM cells actually violate the timing, and these are replaced by redundancies. Dual-Vth and dual-Tox (oxide thickness) values are used in [20] to reduce leakage by setting an appropriate Vth and Tox for each individual SRAM cell based on its closeness to the sense amplifier. We instead take advantage of the finer control possible with RBB and apply it uniformly to all cells in the cell array.
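The leakage benefit of RBB comes from the body effect raising Vth, combined with the exponential dependence of subthreshold current on Vth. The Python sketch below uses the textbook long-channel body-effect equation and an ideal subthreshold model; the body-effect coefficient, surface-potential term, subthreshold slope factor, and bias values are hypothetical, and the model ignores drain-induced barrier lowering as well as the junction (band-to-band tunneling) leakage that strong RBB can increase.

```python
import math

K_B_OVER_Q = 8.617333262e-5  # Boltzmann constant / electron charge, in V/K


def vth_shift_rbb(v_sb: float, gamma: float = 0.4, two_phi_f: float = 0.8) -> float:
    """Vth increase (V) for a reverse source-body bias v_sb >= 0, from the
    long-channel body effect: gamma * (sqrt(2phiF + Vsb) - sqrt(2phiF))."""
    if v_sb < 0.0:
        raise ValueError("reverse body bias must be non-negative")
    return gamma * (math.sqrt(two_phi_f + v_sb) - math.sqrt(two_phi_f))


def thermal_voltage(temp_k: float = 300.0) -> float:
    """kT/q in volts."""
    return K_B_OVER_Q * temp_k


def swing_per_decade(n: float = 1.5, temp_k: float = 300.0) -> float:
    """Vth increase (V) that cuts subthreshold leakage by 10x."""
    return n * thermal_voltage(temp_k) * math.log(10.0)


def leakage_ratio(delta_vth: float, n: float = 1.5, temp_k: float = 300.0) -> float:
    """Subthreshold leakage after/before a Vth increase, I_sub ~ exp(-Vth/(n kT/q))."""
    return math.exp(-delta_vth / (n * thermal_voltage(temp_k)))


if __name__ == "__main__":
    for v_sb in (0.0, 0.2, 0.4):
        dv = vth_shift_rbb(v_sb)
        print(f"Vsb={v_sb:.1f} V  dVth={dv * 1e3:.1f} mV  leakage x{leakage_ratio(dv):.2f}")
```

Because the exponent is divided by n·kT/q (about 39 mV with these assumed values), even a few tens of millivolts of Vth increase cuts subthreshold leakage substantially; the same Vth increase also reduces on-current, which is why some cells become too slow and need the redundancy discussed above.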

LARGE-SCALE SRAM DESIGN METRICS

The major drawback of the conventional DC read/write margin metrics presented in the previous section is the inability to measure them in dense functional SRAM arrays, because of the metal spacing constraints for routing out internal storage nodes and the significant area overhead associated with the switch array. This results in an insufficient number of data points for failure analysis of large cache memories. To increase the sample size, the SRAM array must stay intact; in this case, SRAM read stability and write ability must be characterized by accessing only the bit-lines, the word-line, and the cell supply voltage. As an example, bit-line access has previously been applied to detect and isolate faulty SRAM cells in memory arrays [12]. Similarly, the large-scale performance of SRAM cells has been characterized through distributions of per-cell minimum operating voltages [5], [6]. In this section, we introduce a method for characterizing SRAM cell read stability and write ability in functional SRAM arrays by taking advantage of direct bit-line measurements while adjusting the bit-line, word-line, and cell supply voltages. Furthermore, we show how direct bit-line measurements can be adopted to characterize the SRAM minimum operating voltage (Vmin) during standby, read, and write cycles.

A. Read Stability Measurement

1) Supply Read Retention Voltage (SRRV):

During the read cycle, both bit-lines float around Vdd while the word-line is driven high, and the cell state is retained by keeping the cell supply sufficiently high. SRAM read stability in functional SRAM arrays can therefore be gauged by the lowest cell supply voltage that retains data during a read cycle, denoted as the supply read retention voltage (SRRV) [3]. Fig. 2 shows the measurement setup for SRRV along with the analysis to be performed on the system. To capture the SRAM read stability, both BL and BLC are precharged and WL is driven at the operating voltage Vdd to emulate a read cycle. The BL current at the ‘0’ storage node is monitored while the SRAM cell supply is ramped down. When the cell supply drops sufficiently low, the SRAM cell loses its ability to retain data: the pass-gate transistor dominates the pull-down transistor, so that CL, originally holding ‘0’, rises above the trip point of the inverter it drives. The measured transfer curves of the BL current versus the cell supply voltage are plotted in Fig. 2. The difference between Vdd and the cell supply voltage at which the BL current suddenly drops represents the SRRV of the SRAM cell. A positive SRRV, referenced to the nominal read bias with WL, BL, BLC, and the cell supply all at Vdd, indicates that the cell supply can be dropped below Vdd without disturbing the data.

Therefore, SRRV represents the maximum tolerable DC noise voltage at the cell supply before a read upset occurs. Intrinsic mismatch of transistors within an SRAM cell typically results in an asymmetry in the cell’s robustness to read upset between storing a ‘1’ (when CL holds ‘0’) and storing a ‘0’ (when CH holds ‘0’). Depending on the degree of asymmetry, a data disturbance in the form of a bit flip occurs either on both sides of the SRAM cell or only on the less read-stable side as the cell supply is dropped. Fig. 3 shows the SRRV transfer curves for SRAM cells with worse read stability when CL holds ‘0’, i.e., with a lower cell β-ratio (defined as the strength ratio of the pull-down to the pass-gate transistor) at the CL node. Fig. 3(a) shows that when ‘0’ is stored at the less read-stable CL node, all curves exhibit a sharp fall-off in the BL current, indicating a clear data disturbance in the form of a bit flip. However, when ‘0’ is stored at the more read-stable CH node, only some of the curves exhibit a sharp fall-off in the BLC current; these correspond to SRAM cells with a smaller asymmetry between the two halves. The other curves show a smooth bending in the measured BLC current (Fig. 3), corresponding to SRAM cells with a larger asymmetry between the two halves. In the latter case, because the read stability is heavily skewed in favor of node CH, no clear bit flip occurs when the cell supply is dropped below the retention level; instead, the SRAM cell enters a metastable state. Fig. 3 also shows similar SRRV transfer curves for SRAM cells with worse read stability when CH holds ‘0’, i.e., a lower cell β-ratio at the CH node. To gauge SRAM read stability, the SRRV value extracted from the less read-stable storage node is used.
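The SRRV extraction described above can be sketched as a small post-processing routine over measured transfer curves. The Python below is a hypothetical helper, not the authors' tooling: it treats a "sudden drop" as a single-step fall of more than a chosen fraction of the previous sample (the 50% default is an assumption), returns None for curves that only bend smoothly (the metastable case), and reports the cell SRRV from the less read-stable node, i.e., the smaller per-side value. Currents are in arbitrary units.

```python
from typing import Optional, Sequence


def find_sudden_drop(v: Sequence[float], i: Sequence[float],
                     step_fraction: float = 0.5) -> Optional[float]:
    """Swept voltage at which the current first falls by more than
    step_fraction of the previous sample, or None if it never does."""
    if len(v) != len(i):
        raise ValueError("voltage and current sequences must have equal length")
    for k in range(1, len(i)):
        if i[k] < (1.0 - step_fraction) * i[k - 1]:
            return v[k]
    return None


def srrv_side(vdd: float, v_cell: Sequence[float], i_bl: Sequence[float]) -> Optional[float]:
    """SRRV of one storage node: Vdd minus the cell supply at the sudden drop."""
    v_drop = find_sudden_drop(v_cell, i_bl)
    return None if v_drop is None else vdd - v_drop


def cell_srrv(vdd: float, sides) -> Optional[float]:
    """Cell SRRV from the less read-stable node (the smallest per-side SRRV).
    `sides` holds (v_cell, i_bl) sweeps; sides without a clear bit flip are skipped."""
    values = [s for s in (srrv_side(vdd, v, i) for v, i in sides) if s is not None]
    return min(values) if values else None


if __name__ == "__main__":
    v_cell = [1.1, 1.0, 0.9, 0.8, 0.7, 0.6]          # cell supply ramped down (V)
    i_cl = [50.0, 48.0, 45.0, 41.0, 5.0, 4.0]        # sharp fall-off: bit flip
    i_ch = [50.0, 49.0, 47.0, 44.0, 40.0, 36.0]      # smooth bending only
    print(cell_srrv(1.1, [(v_cell, i_cl), (v_cell, i_ch)]))
```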

2) Word-Line Read Retention Voltage (WRRV):

When the word-line is driven high during a read/write cycle, both the SRAM cell under direct read access and all unaccessed SRAM cells driven by the asserted word-line undergo a read stress. This read stress can be exacerbated by boosting the word-line voltage beyond Vdd. Therefore, the read stability of an SRAM cell can also be gauged by the largest word-line boost that does not upset data retention, denoted as the word-line read retention voltage (WRRV). Fig. 4 shows the measurement setup for WRRV. To capture the WRRV of the SRAM cell, the cell supply is biased at Vdd with both BL and BLC precharged. The WL voltage is ramped above Vdd, while being kept below the gate-oxide breakdown voltage set by the technology, and the BL current at the ‘0’ storage node is monitored.

When the WL voltage is boosted sufficiently high above Vdd, the SRAM cell state is disturbed by the exacerbated read stress, as the pass-gate transistor dominates the pull-down transistor and pulls the ‘0’ storage node above the trip point of the inverter it drives. The cell disturbance is captured as a sudden drop in the measured BL current. The measured transfer curves of the BL current versus WL voltage are plotted in Fig. 4. The WRRV of an SRAM cell is quantified as the difference between the WL voltage at which the BL current suddenly drops and Vdd. Similar to SRRV, a positive WRRV, referenced to the nominal read bias with WL, BL, BLC, and the cell supply all at Vdd, indicates that WL can be boosted above Vdd without disturbing the data. Therefore, WRRV represents the maximum tolerable DC voltage rise on the WL before a read upset occurs. When the read stability of the SRAM cell is heavily skewed toward either storage node CH or CL, data stored in the more robust node is preserved even under very high WL boost, and the measured current never drops significantly. Fig. 5 plots the WRRV transfer curves for SRAM cells with worse read stability when CL holds ‘0’, i.e., a lower cell β-ratio at the CL node. Fig. 5(a) shows that when ‘0’ is stored at the less read-stable CL node, all curves exhibit a sharp drop in the BL current, indicating a clear SRAM cell data disturbance. However, when ‘0’ is stored at the more read-stable CH node, only some curves exhibit a sharp drop in the BLC current, corresponding to SRAM cells with a smaller asymmetry between the two halves, while other curves show either a continued increase or a slight bending in the measured BLC current (Fig. 5), corresponding to SRAM cells with a larger asymmetry. Because the read stability of these cells is heavily skewed in favor of node CH, their state is not disturbed by the overdriven WL. Fig. 5 also plots similar WRRV transfer curves for SRAM cells with worse read stability when CH holds ‘0’, i.e., a lower cell β-ratio at the CH node. To gauge SRAM read stability, the WRRV value extracted from the less read-stable storage node is used.
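A matching sketch for WRRV differs in one respect suggested by the text: for strongly skewed cells, the BL current never drops before the WL ramp reaches its oxide-reliability limit, so only a lower bound on WRRV is known. The hypothetical Python helper below returns the value together with a flag marking such censored results; the 50% step criterion and the example sweep are assumptions, and both sides are assumed to share one WL sweep.

```python
from typing import Sequence, Tuple


def wrrv_side(vdd: float, v_wl: Sequence[float], i_bl: Sequence[float],
              step_fraction: float = 0.5) -> Tuple[float, bool]:
    """Return (wrrv, censored) for one storage node.

    wrrv is the WL voltage at the first sudden current drop minus Vdd. If no
    drop occurs before the end of the sweep (kept below oxide breakdown), the
    cell survived every boost level, so v_wl[-1] - vdd is only a lower bound
    and censored is True."""
    if len(v_wl) != len(i_bl):
        raise ValueError("voltage and current sequences must have equal length")
    for k in range(1, len(i_bl)):
        if i_bl[k] < (1.0 - step_fraction) * i_bl[k - 1]:
            return v_wl[k] - vdd, False
    return v_wl[-1] - vdd, True


def cell_wrrv(vdd: float, sides) -> Tuple[float, bool]:
    """Cell WRRV from the less read-stable node: the smallest per-side value.
    With a shared sweep, a censored bound is never below an observed drop."""
    return min((wrrv_side(vdd, v, i) for v, i in sides), key=lambda r: r[0])


if __name__ == "__main__":
    v_wl = [1.1, 1.2, 1.3, 1.4, 1.5]                 # WL boosted above Vdd (V)
    i_cl = [50.0, 55.0, 58.0, 6.0, 5.0]              # sharp drop: cell upset
    i_ch = [50.0, 56.0, 60.0, 63.0, 65.0]            # keeps rising: skewed cell
    print(cell_wrrv(1.1, [(v_wl, i_cl), (v_wl, i_ch)]))
```

Keeping the censored flag avoids silently treating "no upset within the safe WL range" as a measured WRRV when the values are later aggregated into distributions.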

B. Write Ability Measurement

1) Bit-Line Write Trip Voltage (BWTV):

During the write cycle, the bit-lines are configured according to the new data and the word-line is driven high. The write ability of an SRAM cell in a functional SRAM array can be gauged by the maximum bit-line voltage, at the ‘1’ storage node, that is able to flip the cell state during a write cycle [3], [4], [13], [14], denoted as the bit-line write trip voltage (BWTV). Fig. 6 shows the measurement setup for BWTV. To capture the BWTV of an SRAM cell, the cell supply, WL, and BL (at the ‘0’ storage node) are biased at Vdd. The BL current at the ‘0’ storage node is monitored while the BLC voltage is ramped down. As BLC is ramped low, the pass-gate transistor overcomes the pull-up transistor and the ‘1’ storage node is pulled below the inverter trip point, resulting in a successful write, signified by a sudden drop in the BL current. Fig. 6 plots the measured transfer curves of the BL current versus BLC voltage. The BWTV is quantified as the BLC voltage that induces a sudden change in the BL current. A positive BWTV, referenced to the nominal write bias (WL, BL (or BLC), and the cell supply at Vdd, with BLC (or BL) at 0 V), indicates that a successful write can take place even with a BLC (or BL) voltage higher than 0 V. Therefore, BWTV represents the maximum bit-line voltage slack that can be tolerated while still successfully writing the cell.

Since the measurement setup for BWTV first exerts a read stress on the SRAM cell under test (CUT), a read disturb may occur in the CUT before the BWTV can be observed when testing at lower supply voltages. Due to intrinsic mismatch of transistors within an SRAM cell, a read disturbance at a sufficiently high supply voltage typically happens on only one side of the CUT, while a read disturbance on the other side may happen at a lower supply voltage or not at all (Fig. 3). In this case, BWTV can still be characterized for the more read-stable side of the CUT, which typically corresponds to the less writeable side. Fig. 7 plots the measured transfer curves at a reduced cell supply for SRAM cells undergoing read disturbance. For the particular chip tested, no SRAM cells undergoing read disturbance on both sides were identified with a cell supply as low as 0.4 V. Over 10× variation in the measured BL current before the stored data of the CUT flips was observed, indicating transistor operation at the onset of weak inversion. Further reduction of the supply voltage may result in SRAM cells undergoing read disturbance on both sides, in which case the BWTV can no longer be characterized for either side.
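BWTV extraction follows the same pattern as the read metrics, with two differences: the swept quantity is the BLC voltage itself, and a side that was read-disturbed before the write could be observed has to be excluded. In the hypothetical Python helper below, the read-disturb flag is supplied by the caller (how it is detected is left open), the 50% step criterion is an assumption, and the cell value is taken from the less writeable side, i.e., the smaller BWTV.

```python
from typing import Iterable, Optional, Sequence, Tuple


def bwtv_side(v_blc: Sequence[float], i_bl: Sequence[float],
              step_fraction: float = 0.5) -> Optional[float]:
    """BLC voltage at which the BL current first drops suddenly (write
    succeeded), or None if no write is observed during the sweep."""
    if len(v_blc) != len(i_bl):
        raise ValueError("voltage and current sequences must have equal length")
    for k in range(1, len(i_bl)):
        if i_bl[k] < (1.0 - step_fraction) * i_bl[k - 1]:
            return v_blc[k]
    return None


def cell_bwtv(sides: Iterable[Tuple[Sequence[float], Sequence[float], bool]]) -> Optional[float]:
    """sides holds (v_blc, i_bl, read_disturbed) per storage node. Read-disturbed
    sides are skipped; the smallest remaining BWTV (less writeable side) is
    returned, or None when no side can be characterized."""
    values = []
    for v_blc, i_bl, read_disturbed in sides:
        if read_disturbed:
            continue
        value = bwtv_side(v_blc, i_bl)
        if value is not None:
            values.append(value)
    return min(values) if values else None


if __name__ == "__main__":
    v_blc = [1.1, 0.9, 0.7, 0.5, 0.3, 0.1]        # BLC ramped down from Vdd (V)
    i_bl = [40.0, 40.0, 39.0, 38.0, 3.0, 2.0]     # sudden drop: successful write
    print(cell_bwtv([(v_blc, i_bl, False), (v_blc, [40.0] * 6, True)]))
```

Returning None rather than a number when both sides are read-disturbed mirrors the text's point that BWTV then cannot be characterized for either side.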

2) Word-Line Write Trip Voltage (WWTV):

The write ability of an SRAM cell can also be captured by first configuring the bit-lines to write the data and then ramping up the word-line [3], [14]. The minimum word-line voltage able to flip the cell state during a write cycle, denoted as the word-line write trip voltage (WWTV), can be used to gauge SRAM write ability. Fig. 8 presents the measurement setup for WWTV. The cell supply and BL are biased at Vdd while BLC is biased at 0 V. As the WL voltage is ramped high, the measured current initially follows the characteristic of the pass-gate transistor. Fig. 8 plots the measured transfer curves of the BL current versus WL voltage. WWTV is quantified as Vdd − VWL, where VWL is the minimum WL voltage causing the sudden drop in the BL current. Similar to BWTV, a positive WWTV, referenced to the nominal write bias (WL, BL (or BLC), and the cell supply at Vdd, with BLC (or BL) at 0 V), indicates that a successful write can take place with a WL voltage below Vdd. Therefore, WWTV represents the maximum WL voltage slack that can be tolerated while still successfully writing the cell. The most notable advantage of the WWTV measurement is that, unlike the BWTV measurement, the SRAM CUT is not put under read stress at the onset of the measurement (with WL initially at 0 V). Therefore, WWTV can continue to be characterized for SRAM cells under aggressively scaled supply voltages.
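The word-line write trip voltage (abbreviated WWTV in this sketch) translates directly into code. In the hypothetical Python helper below, WL is ramped up from 0 V, the current first rises as the pass-gate turns on, and WWTV is Vdd minus the first WL voltage at which the current then drops suddenly; the noise-floor guard (so that tiny currents near WL = 0 V cannot trigger a false drop) and the 50% step criterion are assumptions.

```python
from typing import Optional, Sequence


def wwtv(vdd: float, v_wl: Sequence[float], i_bl: Sequence[float],
         step_fraction: float = 0.5, noise_floor: float = 0.0) -> Optional[float]:
    """Return Vdd - V_WL at the first sudden drop in BL current, or None if
    the cell never flips within the WL sweep (write failure)."""
    if len(v_wl) != len(i_bl):
        raise ValueError("voltage and current sequences must have equal length")
    for k in range(1, len(i_bl)):
        prev = i_bl[k - 1]
        # Only count drops from a current clearly above the noise floor.
        if prev > noise_floor and i_bl[k] < (1.0 - step_fraction) * prev:
            return vdd - v_wl[k]
    return None


if __name__ == "__main__":
    v_wl = [0.0, 0.2, 0.4, 0.6, 0.8, 1.0]        # WL ramped up (V)
    i_bl = [0.0, 2.0, 10.0, 25.0, 1.0, 1.0]      # pass-gate turn-on, then flip
    print(wwtv(1.1, v_wl, i_bl))
```

A positive result means the write still succeeds with WL held that far below Vdd; None flags a cell that could not be written within the sweep.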

C. Vmin Characterization

In addition to read stability and write ability characterization, the direct bit-line access scheme can be adopted to characterize the minimum DC operating voltage (Vmin) of each SRAM cell during standby, read, and write cycles. Fig. 6 shows the flowchart for measuring the SRAM Vmin during a static read operation. Each iteration of this measurement starts with data initialization under the nominal supply voltage, 1.1 V in this technology. The SRAM cell is then configured for a low-voltage read operation with the word-line, bit-lines, and cell supply all set to a lower supply voltage, which is gradually reduced in each iteration of the measurement process. Finally, the supply voltage is raised back to the nominal value for a read operation, and the BL current at the ‘0’-initialized storage node is measured. The measured current should be high if no data disturbance took place during the low-voltage read. Therefore, the read Vmin can be characterized as the lowest operating voltage reached before the measured current drops (Fig. 5). The standby Vmin can be characterized using the same procedure as in Fig. 9(a) by keeping WL at 0 V during each low-voltage read operation, to emulate a low-voltage hold operation.

To eliminate accidental data disturbance, WL is turned off between the low-voltage and high-voltage operations. Furthermore, the word-line voltage is reduced by 200 mV during the high-voltage read operation.

Fig. 5 shows the flow chart for measuring the minimum operating voltage of the SRAM during a static write operation. The procedure is very similar to that for the read characterization, but instead of a low-voltage read operation, each iteration performs a low-voltage write operation in which , at the ‘0’-initialized storage node, and are set to a lower supply voltage, and at the ‘1’-initialized storage node is set at . Each low-voltage write is immediately followed by a high-voltage read in which the BL current at the ‘0’-initialized storage node is measured. The measured current should be low if the data was successfully written during the low-voltage write. The minimum write operating voltage can therefore be characterized as the maximum operating voltage while the BL current remains low. Note that direct bit-line characterization is slower than a typical on-chip digital SRAM tester running the similar read-after-read and read-after-write sequences described above, because the BL current must be monitored. However, since direct bit-line characterization can be performed alongside the large-scale read stability and write ability measurements with no additional hardware overhead, it is used in this work to establish correlations between cell failure and the cell read and write characteristics.
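For the write flow, the pass criterion is inverted: a successful write leaves a low BL current at the ‘0’-initialized node. The sketch below assumes a hypothetical measurement hook `measure_i_bl(v)` that performs one full iteration (initialize, low-voltage write at `v`, high-voltage read) and returns the BL current. The current limit and the toy cell in the usage example are illustrative assumptions, not values from this work.

```python
from typing import Callable, Optional


def sweep_write_vmin(measure_i_bl: Callable[[float], float], i_low_limit: float,
                     v_nominal: float = 1.1, v_stop: float = 0.3,
                     step: float = 0.01) -> Optional[float]:
    """Scan the write supply downward; return the first level at which the write fails.

    measure_i_bl(v) is a hypothetical hook for one iteration of the flow: initialize the
    cell at the nominal supply, write the opposite value at supply v, then perform a
    high-voltage read and return the BL current at the '0'-initialized node.
    """
    n_steps = round((v_nominal - v_stop) / step)
    for n in range(1, n_steps + 1):
        v = round(v_nominal - n * step, 6)
        # A successful write flips the '0'-initialized node, so its BL current should be low.
        if measure_i_bl(v) >= i_low_limit:
            return v
    return None  # every write down to v_stop succeeded


def toy_measure(v: float) -> float:
    """Hypothetical cell whose low-voltage write succeeds only at or above 0.55 V."""
    return 1e-6 if v >= 0.55 else 40e-6


print(sweep_write_vmin(toy_measure, i_low_limit=5e-6))  # 0.54
```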

Vdd AND Vth SCALING FOR SRAM

A six-transistor static RAM cell, which forms a flip-flop with two stable states, is widely used. The bistability of the SRAM cell can be observed using


its eye property. To hold data, the static noise margin (SNM), defined by the size of the eye, should be kept large. SNM is specified so that a reliable eye is maintained despite process fluctuations, variations in operating conditions such as temperature and voltage, and bit-line noise. Because SNM shrinks as the supply voltage is reduced, the cell becomes more vulnerable to threshold-voltage variation. To obtain a high SNM, a higher threshold voltage and a high beta ratio are beneficial, as shown in Fig. 7. Increasing the beta ratio makes the slope of the transfer curve steeper and the eye larger, but it also increases the cell area. A higher threshold voltage likewise makes the eye larger, though Vth must be kept below half of Vdd: if Vth becomes much larger than half of Vdd, the eye disappears and the SRAM does not work properly.

Moreover, in a memory array (for example, 512 cells connected to a single bit line), the OFF-state current and the gate-leakage current of the transfer gate of every cell on the bit line flow into the bit line even though their word lines are off. When the sum of these OFF-state and gate-leakage currents becomes comparable to the current of the cell being read, which is turned on by its word line, the read operation fails. Therefore, both a small transfer-gate leakage and a large cell current are required. A longer gate length for the transfer transistor and a wider driver transistor improve stability but reduce density.

In this study, a high-density SRAM with a cell size of 0.56 μm², developed for 65-nm generation system LSIs [11], is investigated. The symmetric cells have simple, straight active-region and gate patterns to suppress critical-dimension variation. Figs. 8 and 9 show the yield of 2-MB SRAM arrays as Vdd and Vth are varied, both at room temperature and at a high temperature of 125 °C. Four types of SRAM with different beta ratios and cell sizes are investigated: 0.56 μm² (beta ratio 1), 0.598 μm² (beta ratios 1.5 and 1.67), and 0.6292 μm² (beta ratio 2.17). Vth can be tuned by changing the channel doping. The SRAM yield at low-voltage operation improves when the threshold voltage is lowered to between 0.15 and 0.25 V. Note, however, that the yield degrades at high-Vdd operation for Vth = 0.15 V: in the high-Vdd region, the OFF-current of each cell increases and becomes comparable with the cell current, causing a failure. On the other hand, SRAM with a low beta ratio suffers a significantly degraded yield in low-Vdd operation; hence, a low Vth is disadvantageous from the cell-size viewpoint. This is caused by SNM degradation through the narrow-channel effect in the driver and transfer transistors. For Vth = 0.35 V, beta = 1 gives the best yield. For a higher Vth (Vth = 0.45 V), the yield degrades because the threshold voltage is close to half of the supply voltage, and the eye collapses in low-Vdd operation because of threshold-voltage fluctuation.
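The link between transfer-curve slope and eye size can be illustrated numerically. The sketch below assumes a toy logistic inverter transfer curve (not a transistor model; the steepness parameter `k` merely stands in for the effect of a higher beta ratio), builds the butterfly plot of two identical cross-coupled inverters, and takes the SNM as the side of the largest square that fits in one lobe of the eye. All function names and parameter values are hypothetical and chosen only for illustration.

```python
import math


def inverter_vtc(vin: float, vdd: float, vm: float, k: float) -> float:
    """Toy logistic inverter transfer curve: the output falls from vdd to 0 around vm.
    A smaller k gives a steeper (higher-gain) transition."""
    return vdd / (1.0 + math.exp((vin - vm) / k))


def inverse_vtc(vout: float, vdd: float, vm: float, k: float) -> float:
    """Input voltage that produces vout on the toy curve (valid for 0 < vout < vdd)."""
    return vm + k * math.log(vdd / vout - 1.0)


def hold_snm(vdd: float, vm: float, k: float, steps: int = 1000) -> float:
    """Side of the largest square that fits in the upper-left lobe of the butterfly plot
    of two identical cross-coupled toy inverters (both lobes are equal by symmetry)."""
    best = 0.0
    for i in range(1, steps):
        x = vdd * i / steps                       # left edge of a candidate square
        y = max(inverse_vtc(x, vdd, vm, k), 0.0)  # lowest bottom edge allowed by the mirrored curve
        s_max = min(vdd - x, vdd - y)
        if s_max <= 0.0 or inverter_vtc(x, vdd, vm, k) < y:
            continue                              # this x is outside the upper-left lobe
        lo, hi = 0.0, s_max
        for _ in range(50):                       # bisection on the square's side length
            mid = 0.5 * (lo + hi)
            if inverter_vtc(x + mid, vdd, vm, k) >= y + mid:
                lo = mid                          # top-right corner still under the curve
            else:
                hi = mid
        best = max(best, lo)
    return best


steep = hold_snm(vdd=1.0, vm=0.5, k=0.05)    # steeper transfer curve
shallow = hold_snm(vdd=1.0, vm=0.5, k=0.10)  # shallower transfer curve
print(f"SNM with steep VTC: {steep:.3f} V, with shallow VTC: {shallow:.3f} V")
```

Because both curves decrease monotonically, a square lies inside the upper-left lobe exactly when its bottom-left corner is on or above the mirrored curve and its top-right corner is on or below the direct curve, so the search only needs to check those two corners.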

From these considerations, applying Vth = 0.3–0.4 V makes the minimum cell size achievable while maintaining stable operation even at 0.7 V. At high temperatures, the increase in OFF-current significantly degrades the yield of the low-Vth (0.15 and 0.25 V) wafers, and this cannot be recovered by increasing the cell ratio. In contrast, the yield of the Vth = 0.35 V wafer is stable. These results indicate that the lower limit of Vdd for SRAM is 0.7 V and that the lower boundary for the threshold voltage is around 0.3 V because of the increase in Ioff; Vdd and Vth scaling will stop at these voltages. Therefore, it is necessary to form the SRAM using L transistors that do not require supply-voltage scaling.
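The leakage-limited read condition discussed above (summed OFF-state and gate leakage of the unaccessed cells becoming comparable to the cell current) can be quantified with simple arithmetic. The sketch below reuses the 512-cells-per-bit-line example from the text; the current values, the worst-case assumption that every unaccessed cell leaks onto the same bit line, and the tenfold Ioff increase standing in for high temperature are all hypothetical. The cell current is held fixed for simplicity.

```python
def bitline_read_ratio(i_cell: float, i_off: float, cells_per_bitline: int,
                       i_gate: float = 0.0) -> float:
    """Read current of the accessed cell divided by the summed leakage of the other cells
    on the same bit line (worst case: every unaccessed cell leaks onto this bit line)."""
    leaking_cells = cells_per_bitline - 1
    return i_cell / (leaking_cells * (i_off + i_gate))


# Hypothetical currents for a 512-cell bit line; not measured values from this work.
room = bitline_read_ratio(i_cell=30e-6, i_off=10e-9, cells_per_bitline=512)
hot = bitline_read_ratio(i_cell=30e-6, i_off=100e-9, cells_per_bitline=512)
print(f"room temperature: {room:.2f}, high temperature: {hot:.2f}")
# room temperature: 5.87, high temperature: 0.59
```

A ratio near or below 1 means the leakage alone can look like a read signal, which is the failure mechanism that sets the lower Vth boundary above.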

RESULTS

A. Read/Write Margin Metric Comparison

Read/Write Metric Correlations:

To investigate the correlations among the various read stability metrics and among the various write ability metrics, scatter plots are generated for WRRV versus RSNM [Fig. 5], SRRV versus WRRV [Fig. 15], versus [Fig. 6], and BWTV versus [Fig. 6]. Each pair of metrics in the scatter plots is measured for the same set of SRAM cells, first at and then at to expose low read stability and pairs [Fig. 6] are measured from SRAM macros with externally accessible storage nodes; SRRV-WRRV pairs [Fig. 15] and BWTV- pairs [Fig. 6] are measured from functional SRAM arrays using direct bit-line access. A 200 mV N-well bias is applied when measuring the write ability in the SRAM macros (using and [Fig. 6]); this further reduces write ability and exposes cell failures by decreasing the of the PMOS pull-up transistors. For , each measured metric sits comfortably above and a slight dispersion is observed in the measured data of each metric pair. This dispersion is generally smaller at lower measured values and larger at higher measured values. However, when the supply is dropped to 0.5 V and the SRAM cell is pushed to the edge of stability, excellent agreement is established within each metric pair, especially near the zero crossing (the origin). This demonstrates that RSNM, SRRV, and WRRV (BWTV, and) share the same point of failure, the zero crossing, and agree closely near failure, suggesting that all of these metrics can be used for SRAM failure estimation. Furthermore, this


also suggests that the value, especially when taken at higher supply voltages, is highly dependent on the read/write metric used and is therefore unsuitable for estimating yield.
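Since the metrics are reported to share the zero crossing as their failure point, one simple way to check that on paired data is to count how often two metrics give the same pass/fail verdict for the same cell. The function below is a hypothetical sketch of such a check (it treats a margin at or below zero as a failure), and the sample margins are invented for illustration.

```python
from typing import Sequence, Tuple


def failure_agreement(pairs: Sequence[Tuple[float, float]]) -> float:
    """Fraction of cells on which two margin metrics give the same pass/fail verdict.

    Each pair holds two metrics measured on the same cell; a margin <= 0 (at or below
    the zero crossing) is treated as a failure.
    """
    if not pairs:
        raise ValueError("no measurements")
    agree = sum((a > 0) == (b > 0) for a, b in pairs)
    return agree / len(pairs)


# Hypothetical (metric A, metric B) margins in volts for four cells.
pairs = [(0.12, 0.10), (0.03, 0.02), (-0.01, -0.02), (0.01, -0.005)]
print(failure_agreement(pairs))  # 0.75
```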

B. Read Current Measurements

Fig. 4 shows the normal probability plot and DC characteristics of the read current measured at the nominal voltage. At the nominal supply of 1.1 V, the read current has a nearly linear dependence on the threshold voltage of the pass-gate transistor, which operates in velocity saturation, and of the pull-down transistor, which operates in the linear mode; therefore, its measurement exhibits good normality up to more than [4]. As the supply is dropped to 0.7 V, some pass-gate transistors are no longer velocity saturated, while some pull-down transistors also enter the saturation mode (due to a rise in the ‘0’ storage node voltage and a high transistor threshold voltage), so the read current no longer depends linearly on the threshold voltage of either the pass-gate or the pull-down transistors. Thus, the lower tail of the distribution at 0.7 V exhibits a significant right-skewed deviation from a normal distribution [Fig. 19].

Fig. 4. DC characteristics

When the supply is further dropped to 0.5 V, the read-current distribution exhibits even greater right-skewed deviations from a normal distribution as the transistors approach the onset of weak inversion.
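The departure from normality described above can be summarized by the sample skewness of the measured read-current distribution, which is zero for a symmetric (for example, normal) distribution and positive for a longer right tail. The sketch below is a plain moment-based estimator; it is not necessarily how the authors quantified the skew, and the sample values are invented.

```python
import statistics
from typing import Sequence


def sample_skewness(samples: Sequence[float]) -> float:
    """Moment-based skewness: about 0 for symmetric data, positive for a longer right tail."""
    if len(samples) < 3:
        raise ValueError("need at least three samples")
    mean = statistics.fmean(samples)
    n = len(samples)
    m2 = sum((x - mean) ** 2 for x in samples) / n
    m3 = sum((x - mean) ** 3 for x in samples) / n
    if m2 == 0:
        raise ValueError("samples have zero variance")
    return m3 / m2 ** 1.5


# Hypothetical read currents in microamperes for a handful of cells.
print(sample_skewness([1.0, 2.0, 3.0, 4.0, 5.0]))       # symmetric: 0.0
print(sample_skewness([1.0, 2.0, 3.0, 4.0, 10.0]) > 0)  # long right tail: True
```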

Fig. 5. Transient response

Fig. 6. Power analysis

Fig. 7. Energy calculated for the proposed SRAM cell

C. Within-Cell Mismatch and Cell Orientation


SRAM cells are typically mirrored both horizontally and vertically to maximize array density, yielding four different cell orientations. In this four-cell cluster, orientations A and D share the same layout with reversed storage nodes; likewise, orientations B and C share the same layout with reversed storage nodes. 8 summarizes the effect of within-cell mismatch and cell orientation on SRAM stability and performance, highlighting the measurement results from two test chips scattered across the same wafer. Since within-cell mismatch causes (at least) one side of the SRAM cell to flip when is dropped or the WL is overdriven, the frequency of read disturbance as a function of cell storage node and cell orientation is plotted, and Fig. 22 plots the normalized of the measured and as a function of cell storage node and orientation. The measurement data reveal up to a 4X difference in read disturb frequency, a 4% shift in the measured , and an 8% shift in the of the measured when the data polarity of the cell is changed. The shift is consistent throughout the chip, suggesting a systematic mismatch between the two halves of the SRAM cell, which may be attributed to a difference in the direction of the notches in the NMOS active region [24]–[26]. The direction of this systematic mismatch can be the same or opposite between two test chips taken from the same wafer, as highlighted in Fig. 22. The measurement data also indicate that the directions of the shifts in read disturb frequency, , and are correlated; that is, a higher read disturb frequency typically corresponds to a higher write ability and . As the SRAM cells are mirrored across the direction of the poly (from A/D to B/C), a slight change in the degree of within-cell mismatch is observed, suggesting a slight misalignment between the poly gate and the active source/drain [24], [25].

Since the NMOS active region is continuous throughout the array whereas the PMOS active region is segmented, the channel width of the PMOS transistors is expected to fluctuate more with poly gate misalignment than that of the NMOS transistors, due to corner-rounding effects. As the poly lines shift to the right or left, the NMOS pull-down and pass-gate transistors on the same storage side of the SRAM cell experience either a common increase or a common decrease in channel width, due to rounding at the corners of the active region, depending on the cell orientation. As long as the degree of poly gate misalignment stays fairly uniform throughout the SRAM array, the fluctuations in the cell ratio (i.e., the strength ratio of pull-down to pass-gate transistors) should be small. Therefore, the observed change in the degree of within-cell mismatch (between orientations A/D and orientations B/C) is smallest for the read disturb frequency. The change is slightly greater for , which depends more directly on the pass-gate drive strength than on the pull-down drive strength, and it is most pronounced in the cell write ability, which depends directly on the strength of the PMOS pull-up transistor.
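To make the cell-ratio argument concrete: when both NMOS devices on one storage side gain or lose width together, their strength ratio moves much less than when only one of them changes. The sketch assumes drive strength scales with W/L (a first-order approximation that ignores velocity saturation and narrow-width effects) and uses hypothetical dimensions, not those of the measured cells.

```python
def cell_ratio(w_pd: float, l_pd: float, w_pg: float, l_pg: float) -> float:
    """Cell (beta) ratio approximated to first order as (W/L)_pull-down / (W/L)_pass-gate."""
    return (w_pd / l_pd) / (w_pg / l_pg)


# Hypothetical NMOS geometry in nm (not the dimensions of the measured cells).
nominal = cell_ratio(w_pd=150, l_pd=50, w_pg=100, l_pg=50)
# Poly misalignment that narrows both NMOS devices on one storage side by 5 nm.
common = cell_ratio(w_pd=145, l_pd=50, w_pg=95, l_pg=50)
# For contrast: the same 5 nm loss on the pull-down device alone.
pd_only = cell_ratio(w_pd=145, l_pd=50, w_pg=100, l_pg=50)
print(f"{nominal:.3f} {common:.3f} {pd_only:.3f}")  # 1.500 1.526 1.450
```

The common-mode shift moves the ratio by under 2%, versus about 3% when only the pull-down changes, which is the intuition behind the small cell-ratio fluctuation described above.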

CONCLUSION

In this paper, a CMOS SRAM-based fine-grain

dynamically reconfigurable architecture was

presented. The architecture uses low-power 10T

SRAM as storage for the configuration bits. The

peripheral circuits of the 10T SRAM were simplified

to reduce power consumption. Efficient dynamic

reconfiguration was achieved by hiding the

reconfiguration delay with a shadow SRAM scheme.

Experimental results show that the area-delay product

is improved with competitive power consumption.

The architecture also allows various tradeoffs among

area, delay, and power consumption, providing good

design flexibility.

REFERENCE

[1] A. Garg and T. T.-H. Kim, “SRAM array structures for energy efficiency enhancement,” IEEE Trans. Circuits Syst. II, Exp. Briefs, vol. 60, no. 6, pp. 351–355, Jun. 2013.

[2] S. Cserveny, L. Sumanen, J. M. Masgonty, and C.

Piguet, “Locally switched and limited source-body

bias and other leakage reduction techniques for a

low-power embedded SRAM,” IEEE Trans. Circuits

Syst. II, Exp. Briefs, vol. 52, no. 10, pp. 636–640,

Oct. 2005.

[3] B. H. Calhoun and A. Chandrakasan, “A 256kB

subthreshold SRAM using 65nm CMOS,” in Proc.

Int. Solid-State Circuits Conf., Feb. 2006, pp. 2592–

2601.

[4] B. H. Calhoun and A. P. Chandrakasan, “A 256-

kb 65-nm sub-threshold SRAM design for ultra-low-

voltage operation,” IEEE J. Solid-State Circuits, vol.

42, no. 3, pp. 680–688, Mar. 2007.

[5] M. Yamaoka, N. Maeda, Y. Shinozaki, Y.

Shimazaki, K. Nii, S. Shimada, K. Yanagisawa, and

T. Kawahara, “90-nm process-variation adaptive

embedded SRAM modules with power-line-floating

write technique,” IEEE J. Solid-State Circuits, vol.

41, no. 3, pp. 705–711, Mar. 2006.


[6] T. H. Kim, J. Liu, J. Keane, and C. H. Kim, “A

0.2 V, 480 kb subthreshold SRAM with 1 k cells per

bitline for ultra-low-voltage computing,” IEEE J.

Solid-State Circuits, vol. 43, no. 2, pp. 518–529, Feb.

2008.

[7] R. J. Evans and P. D. Franzon, “Energy

consumption modeling and optimization for

SRAM’s,” IEEE J. Solid-State Circuits, vol. 30, no.

5, pp. 571–579, May 1995.

[8] L. Chang, R. K. Montoye, Y. Nakamura, K. A.

Batson, R. J. Eickemeyer, R. H. Dennard, W.

Haensch, and D. Jamsek, “An 8T-SRAM for

variability tolerance and low-voltage operation in

high-performance caches,” IEEE J. Solid-State

Circuits, vol. 43, no. 4, pp. 956–963, Apr. 2008.

[9] V. Joshi, R. Kanj, and V. Ramadurai, “A novel

column-decoupled 8T cell for low-power differential

and domino-based SRAM design,” IEEE Trans. Very

Large Scale Integr. (VLSI) Syst., vol. 19, no. 5, pp.

869–882, May 2011.

[10] B. Zhai, L. Nazhandali, J. Olson, A. Reeves, M.

Minuth, R. Helfand, S. Pant, D. Blaauw, and T.

Austin, “A 2.60pJ/Inst subthreshold sensor processor

for optimal energy efficiency,” in VLSI Symp. Tech.

Dig., 2006, pp. 154–155.

[11] A. Wang and A. Chandrakasan, “A 180-mV

subthreshold FFT processor using a minimum energy

design methodology,” IEEE J. Solid-State Circuits,

vol. 40, no. 1, pp. 310–319, Jan. 2005.

[12] A. T. Do, J. Y. S. Low, J. Y. L. Low, Z. H.

Kong, X. Tan, and K. S. Yeo, “An 8T differential

SRAM with improved noise margin for bit-

interleaving in 65 nm CMOS,” IEEE Trans. Circuits

Syst. I, Reg. Papers, vol. 58, no. 6, pp. 1252–1263,

Jun. 2011.

[13] N. Verma and A. P. Chandrakasan, “A 65nm 8T

sub-Vt SRAM employing sense-amplifier

redundancy,” in Proc. IEEE ISSCC Dig. Tech.

Papers, Feb. 11–15, 2007, pp. 328–606.

[14] T. H. Kim, J. Liu, J. Keane, and C. H. Kim, “A

high-density subthreshold SRAM with data-

independent bitline leakage and virtual ground

replica scheme,” in Proc. IEEE ISSCC Dig. Tech.

Papers, Feb. 11–15, 2007, pp. 330–606.

X. Arun Bromio is pursuing the M.E. degree in Electronics and Communication Engineering at Karpagam University, Coimbatore, and received the B.E. degree in Electronics and Communication Engineering from Government College of Technology, Coimbatore (Anna University, Chennai), and the DECE in Electronics and Communication from Christ the King Polytechnic College, Coimbatore. Teaching experience: Park College of Technology, Coimbatore, from June 2012 to date. Areas of interest include VLSI (low-power VLSI).

A. Rajaram received the B.E. degree in electronics and communication engineering from the Government College of Technology, Coimbatore, Anna University, Chennai, India, in 2006, the M.E. degree in electronics and communication engineering (Applied Electronics) from the Government College of Technology, Anna University, Chennai, India, in 2008, and the Ph.D. degree in electronics and communication engineering from Anna University of Technology, Coimbatore, India, in March 2011. He is currently working as an Associate Professor in the ECE Department, Karpagam University, Coimbatore, India. His research interests include mobile ad hoc networks; wireless communication networks (Wi-Fi, WiMAX, HighSlot GSM); novel VLSI NoC design approaches addressing low power, crosstalk, and hardware acceleration; design issues in OFDM, MIMO, and noise suppression in MAI systems; ASIC design; control systems; fuzzy logic and networks; AI; and sensor networks.