International Journal of Advanced Information Science and Technology (IJAIST) ISSN: 2319:2682
Vol.23, No.23, March 2014
310
Energy Efficient NANO Ram Cell for the
Energy Efficient Operation
X. ARUN BROMIO Dr. A.RAJARAM
PG SCHOLAR ASSOCIATE PROFESSOR
DEPARTMENT OF ELECTRONICS AND COMMUNICATION ENGINEERING
KARPAGAM UNIVERSITY, COIMBATORE
Abstract— We presented a hybrid
CMOS/nanotechnology reconfigurable
architecture, called NATURE, earlier. It was based on CMOS logic
and nano RAMs. It used the concept of temporal
logic folding and fine-grain (e.g., cycle-level)
dynamic reconfiguration to increase logic density
by an order of magnitude. This dynamic
reconfiguration is done intra-circuit rather than
inter-circuit. However, the previous design of
NATURE required fine-grained distribution of
nano RAMs throughout the field-programmable
gate array (FPGA) architecture. Since the
fabrication process of nano RAMs is not mature
yet, this prevents immediate exploitation of
NATURE. In this paper, we present a NATURE
architecture that is based on CMOS logic and
CMOS SRAMs that are used for on-chip dynamic
reconfiguration. We use fast and low-power
SRAM blocks that are based on 10T SRAM cells.
We have also laid out the various FPGA
components in a 65-nm technology to evaluate the
FPGA performance. We hide the dynamic
reconfiguration delay behind the computation
delay through the use of shadow SRAM cells.
Experimental results show more than an order of
magnitude improvement in logic density and
improvement in the area-delay product relative to
a traditional baseline FPGA architecture that does
not use the concept of logic folding.
Index Terms—Field-programmable gate arrays
(FPGAs), integrated circuits, logic folding,
nanotechnology reconfigurable architecture
(NATURE).
INTRODUCTION
With every new technology node, the share of
leakage in total power consumption of cache and
other SRAM-based memories considerably increases
since technology scaling reduces dynamic power but
increases leakage. Among major leakage
components, subthreshold leakage is dominant in
cache and other SRAM-based memories in
nanometer technologies [1]. One naïve solution for
reducing SRAM leakage is to increase transistor
threshold voltage to exponentially reduce
subthreshold leakage power, but this negatively
affects SRAM access delay. Traditionally, access
delays of all SRAM cells of a memory module on a
chip have been the same, and hence, increasing the
threshold voltage would have resulted in almost all
cells violating the original timing. Due to within-die
variations in sub-90-nm technologies, however, similar
SRAM cells in a single cell array show different delays;
the delay of some of them is close to the delay specified
for the entire cell array, while most of them are farther
away. Consequently, leakage can be effectively
reduced without delay penalty if reverse body bias
(RBB) is applied to transistors of SRAM cells, and
the resulting few delay-violating (slow) cells are
replaced by redundancies; the cost is a negligible area,
delay, and dynamic power penalty due to redundant
rows/columns. Process variation results in changes in
circuit parameters (such as transistor gate length,
threshold voltage, circuit delay, and the like) during
manufacturing such that the manufactured device
may differ from the designed device in some features.
Within-die variation refers to variations in circuit
parameters within a single die, which means that
similar circuits in different parts of the die may have
different features. Die-to-die variations, on the other
hand, are the variations that happen from one die to
the other, from wafer to wafer, and from wafer lot to
wafer lot. Our focus is on uncorrelated random
within-die variation of SRAM cell delay. While die-to-die
variations have been observed for several years,
within-die variations have more recently started to
intensify [2]. Empirical studies in [3] show that 3.54%
random within-die delay variation is observed for a
single logic element (approximately equivalent to a
single SRAM cell) of 180-nm field-programmable
gate arrays (FPGAs). Furthermore, this within-die
variation is predicted to rise when further
approaching atomic sizes with every new technology
node [2], [3]. Within-die variations are commonly
modeled by a Gaussian distribution [4], which also
matches the empirical results in [3] well. With the
increasing within-die variation in current and future
technology nodes, this delay distribution widens (the
standard deviation increases). Consequently, in order
to ensure that only a few SRAM cells violate the
timing, the delay for the entire cell array has to be set
farther away from the nominal cell delay. In other
words, the cell array delay is set near the tail end of
the delay distribution diagram (say, several σ above μ,
where μ and σ are, respectively, the mean and standard
deviation of the cell delay distribution), and hence,
most cells show substantially less delay than the target
delay.
Row/column redundancy has long been used to repair
faults in RAM memories [5] and has more recently
become inevitable in high-density SRAM memories
to obtain acceptable manufacturing yield [6], [7], but
we use the redundancy differently: instead of
replacing unpredictable manufacturing faults, we
replace expected delay-violating cells caused by the
RBB intentionally chosen at design time. In cache
memories, several previous works address
improving timing yield in the presence of process
variation by proposing process-tolerant cache
architectures [4], [8] and code-placement compiler
techniques [9], but they actually reduce the useful
capacity of the cache by marking and avoiding
too-slow cache lines; furthermore, they do not
address power reduction. The authors in [10] use
variable latency access to different parts of the cache,
and also turn off too-slow as well as too-leaky parts of
the cache; the latter reduces power, but in
contrast, we do not turn off leaky parts to save
power: we use a higher threshold voltage by applying
RBB, and then replace slow parts. Meng and Joseph [11]
consider the within-die variation in leakage of different
cache ways when turning off unused cache ways, as
proposed in [12]. Other highly cited work exists to
reduce cache static power [12]–[14], but it does not
consider process variation. None of the
aforementioned techniques improves leakage of the
portions of the cache that are being used. In this
paper, we propose an optimization technique for
SRAM-based memory design that is applied at design
time of the SRAM array, and reduces leakage power
by RBB at the cost of extra area for redundant
rows/columns of SRAM. We 1) keep the supply voltage
untouched (to avoid its quadratic impact on dynamic
power) and 2) based on the known within-die delay
distribution, and given the yield and speed of the
memory module, choose the number of redundant
rows/columns such that the leakage is minimized (by
RBB) without reducing useful memory size. A
negligible delay overhead is imposed due to the
programmability needed in row/column decoders to
remap slow rows/columns to spare ones. Also, longer
word lines or bit lines (corresponding to column or
row redundancies, respectively) result in a marginally
higher dynamic energy per access. Results of our
SRAM design on a commercial 90-nm process show
that less than 4% redundancy (ten columns added to
a 256 × 256 memory) reduces the leakage by 40% while
dynamic power, delay, and area increase by only
1.3%, 0.6%, and 3.4%, respectively; timing yield and
memory capacity are kept intact. Parts of this archival
work are published in our previous work [15].
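The trade-off between the target array delay and the number of spare columns can be illustrated with a small sketch. It is not the optimization procedure of this work: it assumes a Gaussian cell delay, statistically independent cells (consistent with the uncorrelated random variation considered here), and a simple rule that the array is usable when the number of slow columns does not exceed the number of spares. The function names and the choice of a target delay 4σ above the mean are hypothetical.

```python
import math


def slow_cell_probability(k: float) -> float:
    """P(delay > mu + k*sigma) for a Gaussian cell delay distribution."""
    return 0.5 * math.erfc(k / math.sqrt(2.0))


def slow_column_probability(p_cell: float, rows: int) -> float:
    """A column counts as slow if at least one of its cells is slow."""
    return 1.0 - (1.0 - p_cell) ** rows


def array_yield(p_col: float, columns: int, spares: int) -> float:
    """Probability that at most `spares` of the columns+spares columns are slow.

    Assumption: slow columns are independent, so their count is binomial.
    """
    n = columns + spares
    return sum(
        math.comb(n, i) * p_col**i * (1.0 - p_col) ** (n - i)
        for i in range(spares + 1)
    )


# Hypothetical operating point: target delay set 4 sigma above the mean.
p_cell = slow_cell_probability(4.0)
p_col = slow_column_probability(p_cell, rows=256)

print("slow-cell probability:", p_cell)
print("yield without spares :", array_yield(p_col, columns=256, spares=0))
print("yield with 10 spares :", array_yield(p_col, columns=256, spares=10))
print("area redundancy      :", 10 / 256)  # ten columns on a 256-column array
```

Under these assumptions, moving the target delay closer to the mean (smaller k) raises the slow-cell probability, and the spare columns are what recover the timing yield.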
Fig.1. Conventional SRAM Cell Layout
Fig. 3 shows a SEM photograph and the
expected layout of a 0.614-μm² SRAM cell using a
conventional layout for the poly-Si and diffusion
layers. Generally, we call the ratio of the driver
transistor channel width to the access transistor
channel width the “ratio” of the layout. A conventional
SRAM cell has a hammer-head poly-Si layer shape and
a convex diffusion layer shape.
Fig.2. Conventional SRAM module
There are two reasons for this. The first is to
maintain the poly-Si-contact extension margin under
the contact layer. In this way, the poly-Si layer under
the contact layer is wider than that of the channel
area [4], [5]. The second is to maintain a sufficient
ratio. In the conventional SRAM cell layout, the ratio
is around 1.2–1.5 to obtain a sufficient static noise
margin (SNM). Fig. 3 shows the reverse narrow width
effect of SRAM NMOS transistors in 180-nm
technology. In the conventional SRAM layout, the
Vth of the access transistor is lower than that of the
driver transistor because of this reverse narrow
width effect.
Fig. 2 shows the definition of the electric
ratio. At low Vdd, the on-current of the access
transistor is larger than that of the driver transistor
because Vdd − Vth of the access transistor is larger
than Vdd − Vth of the driver transistor.
Therefore, the electric ratio is less than 1.0 at low
Vdd. To obtain a large electric ratio using the
conventional layout, we would have to adopt a wider
driver transistor channel width, which would result in
a larger cell size. Fig. 2 shows the occurrence of
asymmetric access transistor sizes in case of
misalignments using the conventional SRAM cell
layout. In our 180-nm technology, the poly-Si layer is
aligned to the diffusion layer and the contact layer is
aligned to the poly-Si layer. This figure shows the
poly-Si layer misaligned to the diffusion layer. With a
misalignment in the X direction, one access
transistor’s gate length is shorter than the other’s. With a
misalignment in the Y direction, one access
transistor’s channel width is wider than the other’s.
Therefore, when manufacturing misalignment occurs, the
access transistors’ electrical characteristics are asymmetric.
In 180-nm technology with a large local Vth
variability, the asymmetric electrical characteristics of
the access transistors have a significant impact on
the stable operation of the SRAM. Furthermore, from
the DFM point of view, the complicated transistor
shape of the conventional layout enlarges the global
Vth variability and causes asymmetric transistor
electrical characteristics.
B. Straight Diffusion and Poly-Si Layers SRAM
Cell Layout
Fig. 2 shows a conventional SRAM module and the
expected layout of a 0.494-μm² SRAM cell using
straight diffusion and poly-Si layers. We will call this
0.494-μm² SRAM cell using straight diffusion and
poly-Si layers “the straight cell” below and the
0.614-μm² SRAM cell shown in Fig. 3 “the conventional cell.”
The sizes of each SRAM transistor are summarized
in Table I. In this cell layout, the diffusion and poly-Si
layers are straighter than in the conventional layout [7],
[17]. Because the straight cell layout has a simple and
lithography-friendly transistor shape, the global Vth
variability of this cell is relatively small. The driver
transistor’s channel width is equal to the access
transistor’s channel width. Therefore, the ratio is
equal to 1.0. The electric ratio is also equal to 1.0
[17]. In this layout, a wider access transistor channel
width can be used. Therefore, a large access
transistor on-current can be expected. Using
straight diffusion and poly-Si layers, we can realize
symmetrical electric characteristics even if a
photo-mask misalignment occurs. Fig. 2 shows the SNM
distribution of the conventional and straight cells.
The SNM of the straight cell at high Vdd is smaller
than that of the conventional cell (see Fig. 2).
However, the μ/σ of the SNM (with μ the median) is 6.7
for the conventional cell, while it is 7.4 for the straight
cell. A large μ/σ means the tolerance to the variability
of the SNM is large. Therefore, the straight diffusion
and poly-Si layers SRAM cell layout is variability-tolerant.
This is because the straight cell uses a simple layout and a
larger access transistor that has a smaller local Vth
variability sigma. Fig. 3 shows the Vdd dependence of
the SNM of the conventional and straight cells.
Fig.3. Proposed SRAM structure
The SNM of the conventional cell without a read
assist circuit (R-AC, see Section III) increases
continuously as Vdd increases. On the other hand, the
SNM of the straight cell without R-AC peaks at
around 1.2 V. At low Vdd, the straight cell’s SNM
without R-AC is larger than that of the conventional
cell without R-AC, because of its larger electric ratio.
However, at high Vdd, the straight cell’s SNM
without R-AC is smaller than that of the conventional
cell without R-AC because the electric ratio is
smaller.
Classic Techniques for SRAM Leakage Reduction
Traditional techniques do not consider within-die
variations, mainly because such variations were not
a major issue at the time. Most of these
techniques rely on turning off the SRAM cells or
putting them into a low-power mode when not
accessed. Cache memories are among the most
widely used SRAM-based memories. Turning off
unused parts of the cache [12], [13], [17], or putting
them in a low-energy “drowsy” mode using two
different supply voltages [14], are well-known
techniques to reduce cache leakage, but they target
unused parts (i.e., parts not accessed for a long time)
of the cache and cannot handle the cache parts being
accessed. In sleep mode, source biasing by a virtual
ground [18] and also RBB [19] can effectively reduce
leakage when the processor is idle, but they are not used
in the active mode since they increase cell delay and
result in lower performance and/or reduced timing
yield. We also use RBB to reduce leakage, and
furthermore, we apply it in the active mode in
addition to sleep mode; however, due to the within-die
delay distribution, only a few of the SRAM cells
actually violate the timing and are then replaced by
redundancies. Dual-valued device parameters are used
in [20] to reduce leakage by setting appropriate values
for each individual SRAM cell based on its closeness to
the sense amplifier. We take advantage of the finer control
made possible by RBB and apply it uniformly to all cells in
the cell array.
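The leverage that RBB has on leakage follows from the exponential dependence of subthreshold current on threshold voltage: leakage falls by one decade for every subthreshold swing S of added Vth. The sketch below is a first-order illustration only. The swing of 100 mV/decade is an assumed value, the function names are hypothetical, and gate and junction leakage are ignored, so the numbers are not the results of this work.

```python
import math


def leakage_ratio(delta_vth_mv: float, swing_mv_per_decade: float = 100.0) -> float:
    """Subthreshold leakage after raising Vth by delta_vth_mv, relative to before.

    First-order model: one decade of leakage per `swing_mv_per_decade` of Vth.
    """
    return 10.0 ** (-delta_vth_mv / swing_mv_per_decade)


def vth_shift_for_reduction(reduction: float, swing_mv_per_decade: float = 100.0) -> float:
    """Vth increase (mV) needed to cut subthreshold leakage by `reduction` (0..1)."""
    if not 0.0 <= reduction < 1.0:
        raise ValueError("reduction must be in [0, 1)")
    return -swing_mv_per_decade * math.log10(1.0 - reduction)


# Example: Vth shift that would cut subthreshold leakage by 40% in this model.
shift_mv = vth_shift_for_reduction(0.40)
print("required Vth shift (mV):", round(shift_mv, 2))
print("resulting leakage ratio:", round(leakage_ratio(shift_mv), 3))
```

In this simplified model a Vth shift of a few tens of millivolts already gives a substantial leakage reduction, which is why only a small number of cells are expected to become timing-critical.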
LARGE-SCALE SRAM DESIGN METRICS
The major drawback of the conventional DC
read/write margin metrics presented in the previous
section is the inability to measure them in dense
functional SRAM arrays because of the metal spacing
constraints for routing out internal storage nodes and
the significant area overhead associated with the
switch array. This results in an insufficient number of
data points for failure analysis of large cache
memories. To increase the sample size, the SRAM
array must stay intact; in this case, SRAM read
stability and write ability must be characterized by
accessing only the bit-lines, the word-line, and the
cell supply voltages. As an example, bit-line access
has been previously applied to detect and isolate
faulty SRAM cells in memory arrays [12]. Similarly,
large-scale performance of the SRAM cells has been
characterized through distributions of per-cell
minimum operating voltages [5], [6]. In this section,
we introduce a method for characterizing the SRAM
cell read stability and write ability in functional
SRAM arrays by taking advantage of direct bit-line
measurements while adjusting bit-line, word-line, and
cell supply voltages. Furthermore, we show how
direct bit-line measurements can be adopted to
characterize the minimum operating voltage (Vmin) of
the SRAM during standby, read, and write cycles.
A. Read Stability Measurement
1) Supply Read Retention Voltage (SRRV):
During the read cycle, both bit-lines float around
their precharged level while the word-line is driven high,
and the cell state is retained by keeping the cell supply
sufficiently high. The SRAM read stability in functional
SRAM arrays can be gauged by the lowest cell supply
voltage for data retention during a read cycle,
denoted as the supply read retention voltage (SRRV)
[3]. Fig. 2 shows the measurement setup for the SRAM
along with the analysis to be performed on the
system. To capture the SRAM read stability, both BL
and BLC are precharged and WL is driven by the
operating voltage to emulate a read cycle. The BL
current at the ‘0’ storage node is monitored while
ramping down the SRAM cell supply. When the cell
supply is dropped sufficiently low, the SRAM cell
loses its ability to retain data: the pass-gate transistor
dominates the pull-down transistor so that CL, originally
holding ‘0’, rises above the trip point of the inverter.
The measured transfer curves of the BL current
versus the cell supply voltage are plotted in Fig. 2. The
difference between Vdd and the cell supply voltage that
causes the BL current to suddenly drop represents
the SRRV of the SRAM cell. When SRRV is zero, the SRAM
cell is biased for a nominal read operation with WL,
BL, BLC, and the cell supply all biased at Vdd; a positive
SRRV indicates that the cell supply can be
dropped below Vdd without disturbing the data.
Therefore, SRRV represents the maximum tolerable
DC noise voltage at the cell supply before causing a
read upset. Intrinsic mismatch of transistors within an
SRAM cell typically results in an asymmetry in the
cell robustness to read upset between storing a ‘1’
(when CL holds ‘0’) and storing a ‘0’ (when CH holds
‘0’). Depending on the degree of asymmetry in the
SRAM cell, a data disturbance, in the form of a bit
flip, either occurs on both sides of the SRAM cell or
only on the less read-stable side as the cell supply is
dropped. Fig. 3 shows the SRRV transfer curves
for SRAM cells with worse read stability when CL
holds ‘0’, i.e., a lower cell β-ratio (defined as the
strength ratio of pull-down to pass-gate transistors) at
the CL node. Fig. 3(a) shows that when ‘0’ is stored
at the less read-stable CL node, all curves exhibit a
sharp fall-off in the BL current, suggesting a clear
SRAM cell data disturbance in the form of a bit flip.
However, when ‘0’ is stored at the more read-stable
CH node, only some of the curves exhibit a sharp
fall-off in the BLC current, corresponding to SRAM
cells with a smaller asymmetry between the two
halves, while other curves show a smooth bending in
the measured current [Fig. 3], corresponding to SRAM
cells with a larger asymmetry between the two halves. In
the latter case, due to a heavily skewed read stability
favoring the node CH, a clear data disturbance, in the
form of a bit flip, does not occur when the cell supply
is dropped beyond data retention and the SRAM cell
enters a metastable state. Fig. 3 shows similar SRRV
transfer curves for SRAM cells with worse read
stability when CH holds ‘0’, i.e., a lower cell β-ratio at
the CH node. To gauge SRAM read stability, the
SRRV value extracted from the less read-stable
storage node is used.
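As an illustration of how an SRRV value could be extracted from one measured transfer curve, the sketch below locates the first sudden drop in the BL current while the cell supply is ramped down and reports the distance of that supply value from the nominal supply. The drop criterion (more than 50% loss between adjacent samples), the function names, and the sample data are assumptions made for this example, not part of the measurement method described above.

```python
from typing import Optional, Sequence


def trip_index(currents: Sequence[float], drop_fraction: float = 0.5) -> Optional[int]:
    """Index of the first sample whose current fell by more than `drop_fraction`
    relative to the previous sample, or None if no sudden drop is found."""
    for i in range(1, len(currents)):
        prev = currents[i - 1]
        if prev > 0 and (prev - currents[i]) / prev > drop_fraction:
            return i
    return None


def srrv(vdd: float, cell_supply: Sequence[float], bl_current: Sequence[float]) -> Optional[float]:
    """SRRV = Vdd minus the cell supply at which the BL current suddenly drops.

    Returns None when the curve only bends smoothly (no clear bit flip).
    """
    i = trip_index(bl_current)
    return None if i is None else vdd - cell_supply[i]


# Synthetic sweep (hypothetical data): supply ramped down from a 1.1 V nominal.
supply = [1.1, 1.0, 0.9, 0.8, 0.7, 0.6]
current = [20e-6, 19e-6, 18e-6, 17e-6, 2e-6, 1e-6]
print("SRRV (V):", round(srrv(1.1, supply, current), 3))
```

Returning None for curves without a sharp fall-off mirrors the case of strongly asymmetric cells, where the more read-stable side never shows a clear bit flip.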
2) Word-Line Read Retention Voltage (WRRV):
When the word-line is driven high during a
read/write cycle, both the SRAM cell under direct
read access and all unaccessed SRAM cells driven
by the asserted word-line undergo a read stress. This
read stress can be exacerbated by boosting the word-line
voltage beyond Vdd. Therefore, the read stability of
an SRAM cell can also be gauged by the largest
word-line boost without upsetting cell data retention,
denoted as the word-line read retention voltage
(WRRV). Fig. 4 shows the measurement setup for
WRRV. To capture the WRRV of the SRAM cell, the
SRAM cell supply is biased at Vdd with both BL and
BLC precharged. The WL voltage is ramped above Vdd, and
kept below the gate-oxide breakdown voltage set by
the technology, while the BL current at the ‘0’
storage node is monitored.
When the WL voltage is boosted sufficiently
high above Vdd, the SRAM cell state is disturbed due to
an exacerbated read stress as the pass-gate transistor
dominates the pull-down transistor and pulls the ‘0’
storage node above the trip point of the inverter. The
cell disturbance is captured as a sudden drop in the
measured current. The measured transfer curves of the
BL current versus WL voltage are plotted in Fig. 4. The
WRRV of an SRAM cell is quantified as the difference
between the WL voltage causing the BL current to
suddenly drop and Vdd. Similar to SRRV,
when WRRV is zero, the SRAM cell is biased for a nominal
read operation with WL, BL, BLC, and the cell supply all
biased at Vdd; a positive WRRV indicates that WL can be
boosted above Vdd without
disturbing the data. Therefore, WRRV represents the
maximum tolerable DC voltage rise on the WL
before causing a read upset. When the read stability
of the SRAM cell becomes heavily skewed towards
either storage node CH or CL, data stored in the more
robust node will be preserved even under very high
WL boost, and the measured current never drops
significantly. Fig. 5 plots the WRRV transfer
curves for SRAM cells with worse read stability when
CL holds ‘0’, i.e., a lower cell β-ratio at the CL node. Fig.
5(a) shows that when ‘0’ is stored at the less read-stable
CL node, all curves exhibit a sharp drop in the BL current,
suggesting a clear SRAM cell data disturbance.
However, when ‘0’ is stored at the more read-stable
CH node, only some curves exhibit a sharp drop in the
measured current, corresponding to SRAM cells with a smaller
asymmetry between the two halves, while other
curves show either a continued increase or a slight
bending in the measured current [Fig. 5], corresponding to
SRAM cells with a larger asymmetry between the two
halves. In the latter case, due to a heavily skewed read stability
favoring the node CH, the cell state is not disturbed
by the overdriven WL. Fig. 5 plots similar
WRRV transfer curves for SRAM cells with worse
read stability when CH holds ‘0’, i.e., a lower cell
β-ratio at the CH node. To gauge SRAM read stability, the
WRRV value extracted from the less read-stable
storage node is used.
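The rule of reporting the value from the less read-stable storage node can be written down compactly. The sketch below assumes that a per-side margin (SRRV or WRRV) has already been extracted for the CL and CH sides, with None standing for a side on which no clear disturbance was ever observed; the function name and this encoding are hypothetical.

```python
from typing import Optional


def cell_read_margin(margin_cl: Optional[float], margin_ch: Optional[float]) -> Optional[float]:
    """Read margin of a cell: the smaller of the two per-side margins.

    A side given as None never showed a clear bit flip, so it cannot be the
    less read-stable side; if both sides are None, no margin can be reported.
    """
    observed = [m for m in (margin_cl, margin_ch) if m is not None]
    return min(observed) if observed else None


print(cell_read_margin(0.30, 0.45))  # both sides flipped: smaller margin is used
print(cell_read_margin(0.30, None))  # heavily skewed cell: only CL side flipped
```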
B. Write Ability Measurement
1) Bit-Line Write Trip Voltage (BWTV):
During the write cycle, bit-lines are configured
according to the new data and the word-line is driven
high. The write ability of an SRAM cell in a
functional SRAM array can be gauged by the
maximum bit-line voltage, at the ‘1’ storage node,
able to flip the cell state during a write cycle [3], [4],
[13], [14], denoted as the bit-line write trip voltage
(BWTV). Fig. 6 shows the measurement setup for
BWTV. To capture the BWTV of an SRAM cell, the
cell supply, WL, and BL (at the ‘0’ storage node) are
biased at Vdd. The BL current at the ‘0’ storage node is
monitored while ramping down the BLC voltage. As
BLC is ramped low, the pass-gate overcomes the pull-up
transistor and the ‘1’ storage node is dropped below the
inverter trip point, resulting in a successful write,
signified by a sudden drop in the BL current. Fig. 6 plots
the measured transfer curves of the BL current
versus BLC voltage. The BWTV is quantified as the
BLC voltage that induces a sudden change in the BL
current. When BWTV is zero, the SRAM cell is biased for
a nominal write operation with WL, BL (or BLC), and the
cell supply biased at Vdd and BLC (or BL) biased at 0 V.
A positive BWTV indicates that a successful
write operation can take place even with a BLC (or
BL) voltage higher than 0 V. Therefore, BWTV
represents the maximum bit-line voltage slack that
can be tolerated to successfully write the cell. Since
the measurement setup for BWTV requires first
exerting a read stress on the SRAM cell under test
(CUT), a read disturb may occur in the CUT before
the BWTV can be observed when testing at lower
supply voltages. Due to intrinsic mismatch of
transistors within an SRAM cell, a read disturbance
at a high enough supply voltage typically happens to
only one side of the CUT while a read disturbance at
the other side either may happen at a lower supply
voltage or not at all (Fig. 3). In this case, BWTV can
still be characterized for the more read-stable side of
the CUT, which typically corresponds to the less
writeable side. Fig. 7 plots the measured transfer
curves at reduced supply voltage for SRAM cells
undergoing read disturbance. SRAM cells undergoing read
disturbance on both sides were not identified with the
supply voltage as low as 0.4 V for the particular chip
tested. Over 10× variation in the measured BL current
before the stored data of the CUT flips was observed,
indicating transistor operation at the onset of weak inversion.
Further reduction of the supply voltage may result in
SRAM cells undergoing read disturbance on both
sides, in which case the BWTV can no longer be
characterized for either side.
2) Word-Line Write Trip Voltage (WWTV):
The write ability of an SRAM cell can also be
captured by first configuring the bit-lines to write the
data and then ramping up the word-line [3], [14]. The
minimum word-line voltage able to flip the cell state
during a write cycle, denoted as the word-line write
trip voltage (WWTV), can be used to gauge the SRAM
write ability. Fig. 8 presents the measurement setup
for WWTV. The cell supply and BL are biased at
Vdd while BLC is biased at 0 V. As WL is ramped high, the
measured current initially resembles the curve of the
pass-gate. Fig. 8 plots the measured transfer curves of
the measured current versus WL voltage. The WWTV is
quantified as Vdd minus the minimum WL voltage causing
the sudden drop in the measured current. Similar to BWTV,
when WWTV is zero, the SRAM cell is biased
for a nominal write operation with WL, BL (or BLC),
and the cell supply biased at Vdd and BLC (or BL) biased
at 0 V. Therefore, WWTV
represents the maximum WL voltage slack that can
be tolerated to successfully write the cell. The most
notable advantage of the WWTV measurement is that, unlike
during the BWTV measurement, the SRAM CUT is
not put under read stress at the onset of the
measurement (with WL low). Therefore, WWTV can
continue to be characterized for SRAM cells under
aggressively scaled supply voltages.
C. Vmin Characterization
In addition to read stability and write ability
characterization, the direct bit-line access scheme can
be adopted to characterize the minimum DC
operating voltage (Vmin) of each SRAM cell during standby,
read, and write cycles. Fig. 6 shows the flow-chart
diagram for measuring SRAM Vmin during a static read
operation. Each iteration of this measurement
starts with a data initialization under the nominal
supply voltage (1.1 V in this technology). The
SRAM cell is then configured for a low-voltage read
operation with all terminals set to a lower supply voltage,
which is gradually reduced for each iteration of the
measurement process. Finally, the supply voltage is
raised for a read operation at the nominal supply voltage
and the BL current at the ‘0’-initialized storage node is
measured. The measured current should be high (equal to
the undisturbed read current) if no
data disturbance took place during the low-voltage
read. Therefore, the read Vmin can be characterized as the
maximum operation voltage before the measured current
drops [Fig. 5]. SRAM Vmin during standby can be
characterized using the same procedure as in Fig. 9(a) by
keeping WL at 0 V during each low-voltage read operation,
to emulate a low-voltage hold operation.
To eliminate accidental data disturbance,
WL is turned off between low-voltage and high-voltage
operations. Furthermore, the WL voltage is reduced by 200
mV during the high-voltage read operation.
Fig. 5 shows the flow chart diagram for measuring
SRAM during a static write operation. The procedure
is very similar to that for the read characterization—
but instead of a low voltage read operation, each
iteration performs a low voltage write operation
where , at the ‘0’-initializedstorage node, and are set
to a lower supply voltage and at the ‘1’-initialized
storage node is set at . Each low voltage write is
immediately followed by a high voltage read where
the BL current at the ‘0’-initialized storage node is
measured. The measured current should be low if
data is successfully written during the low voltage
write. Can be characterized as the maximum
operation voltage while remains low. Note that
characterization using direct bit-line measurements is
slower than the typical on-chip digital SRAM tester,
using similar read-after read and read-after-write
sequences described above, because of the need to
monitor the BL current. However, since the direct
bit-line characterization can be performed alongside
the large-scale read stability and write ability
measurements with no additional hardware overhead,
it is used, in this work, to establish correlations
between cell failure and the cell read and write
characteristics.
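The write characterization can be sketched in the same spirit. The snippet below takes the stated definition literally: among all swept voltages, it returns the largest one for which the read-back bit-line current stays low. The function name, the current threshold, and the toy cell response are assumptions made for illustration and are not taken from the measurement setup.

```python
from typing import Callable, Iterable, Optional


def write_limit_voltage(
    sweep: Iterable[float],
    write_then_read_current: Callable[[float], float],
    low_threshold: float,
) -> Optional[float]:
    """Largest swept voltage at which the post-write read current is low.

    write_then_read_current(v) is assumed to perform one low voltage write
    at setting v, followed immediately by a high voltage read, and to
    return the measured bit-line current.
    """
    passing = [v for v in sweep if write_then_read_current(v) < low_threshold]
    return max(passing) if passing else None


def toy_cell_response(v: float) -> float:
    """Hypothetical cell: the write succeeds only at or below 0.45 V."""
    return 1e-6 if v <= 0.45 else 20e-6


# Assumed sweep: 1.0 V down to 0.1 V in 100 mV steps.
sweep_voltages = [i / 10 for i in range(10, 0, -1)]
limit = write_limit_voltage(sweep_voltages, toy_cell_response, low_threshold=5e-6)
```

Collecting all passing voltages before taking the maximum avoids assuming that the cell response is monotonic in the swept voltage.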
Vdd AND Vth SCALING FOR SRAM
A static RAM with six transistors, making a flip-flop
circuit with bistable states is widely used. The
bistability of the SRAM cell can be observed using
its eye property. In order to hold data, the static noise
margin (SNM), defined by the size of the eye, should
be kept large. The SNM is specified such that a
reliable eye property is maintained despite process
fluctuations, variations in operating conditions such
as temperature and voltage, and bit-line noise. Since
the SNM shrinks as the supply voltage is reduced, the
cell becomes more sensitive to threshold-voltage
variation. To obtain a high SNM, a higher threshold
voltage and a high beta ratio are beneficial, as shown
in Fig. 7. Increasing the beta ratio makes the slope
steeper and the eye larger, at the cost of increased
area. A higher threshold voltage also makes the eye
larger, though it must be kept lower than half of Vdd.
If Vth becomes much larger than half of Vdd, the eyes
disappear and the SRAM does not work properly.
Moreover, in a memory array (for example, 512 cells
connected together on a single bit line), the OFF-state
current and the gate-leakage current of the transfer
gate flow from each bit on the bit line even though the
word line is off. When the sum of these OFF-state and
gate-leakage currents becomes comparable to the cell
current of the cell that is turned on by the word line,
the read operation fails. Therefore, both small
leakage of the transfer gate and large cell current are
required. A longer gate length for the transfer
transistor and a wider driver are more stable but
reduce density. In this study, a high-density SRAM
with a cell size of 0.56 μm² developed for 65-nm
generation system LSIs [11] is investigated.
Symmetric cells have simple straight patterns of
active region and gate to suppress the variation of the
critical dimension. Figs. 8 and 9 indicate the yield of
2-MB SRAM arrays while varying Vdd and Vth. The
data are shown at room temperature as well as at a
high temperature of 125 °C. Four types of SRAM
having different beta ratios and cell sizes are
investigated: 0.56 μm² (beta ratio: 1), 0.598 μm²
(beta ratios: 1.5 and 1.67), and 0.6292 μm² (beta
ratio: 2.17). Vth can be tuned by changing the
channel doping. The SRAM yield at low-voltage
operation improves when the threshold voltage is
lowered to between 0.15 and 0.25 V. It should be
noted that a degradation in the yield is found at
high-Vdd operation for Vth = 0.15 V. In the high-Vdd
region, the OFF-current of each cell increases and
becomes comparable with the cell current, thus
causing a failure. On the other hand, SRAM with a
low beta ratio shows significantly degraded yield in
low-Vdd operation. From this, a low Vth is
disadvantageous from the cell-size viewpoint. This is
caused by degradation in SNM through the
narrow-channel effect in the driver and the transfer
transistors. For the Vth = 0.35 V case, beta = 1 gives
the best yield. For a higher Vth (Vth = 0.45 V), yield
degrades because the threshold voltage is close to half
of the supply voltage, and the eye collapses in
low-Vdd operation because of the fluctuation of the
threshold voltage.
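The two constraints discussed above (Vth below half of Vdd, and total bit-line leakage well below the cell current) can be written as a simple screening check. This is a minimal sketch: the function names, the 10x safety margin, and the example current values are assumptions chosen for illustration and are not measured data from this work.

```python
def vth_below_half_vdd(vth: float, vdd: float) -> bool:
    """Eye condition from the text: Vth must stay below half of Vdd."""
    return vth < 0.5 * vdd


def bitline_read_ok(
    cells_per_bitline: int,
    i_off_per_cell: float,
    i_gate_per_cell: float,
    i_cell: float,
    margin: float = 10.0,  # assumed safety factor, not from the paper
) -> bool:
    """Check that the selected cell current dominates the summed leakage.

    Leakage is summed over the unselected cells on the bit line
    (OFF-state plus gate leakage of each transfer gate).
    """
    unselected = cells_per_bitline - 1
    total_leakage = unselected * (i_off_per_cell + i_gate_per_cell)
    return i_cell >= margin * total_leakage


# Illustrative numbers only (amperes); 512 cells per bit line as in the text.
low_leak = bitline_read_ok(512, 1e-9, 0.0, 20e-6)    # about 0.5 uA of leakage
high_leak = bitline_read_ok(512, 10e-9, 0.0, 20e-6)  # about 5 uA of leakage
```

With these example values, the first case passes and the second fails, which mirrors the qualitative argument that a higher OFF-current per cell eventually masks the cell current on a long bit line.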
From these considerations, by applying Vth
= 0.3-0.4 V, the minimum cell size is achievable
while keeping stable operation even at 0.7 V. At high
temperatures, the increase in the OFF-current
significantly affects the yield of the low-Vth (0.15
and 0.25 V) wafers. This cannot be recovered by
increasing the cell ratio. On the other hand, the yield
for the Vth = 0.35 V wafer is stable. From these
results, the lower limit of Vdd for SRAM is 0.7 V,
and the lower boundary for the threshold voltage is
around 0.3 V because of the Ioff increase.
Vdd and Vth scaling will stop at these voltages.
Therefore, it is necessary
to form SRAM using L transistors that do not require
supply voltage scaling.
RESULTS
A. Read/Write Margin Metric Comparison
Read/Write Metric Correlations:
To investigate the correlations between the various
read stability metrics and between the various write
ability metrics, scatter plots are generated for WRRV
versus RSNM [Fig. 5], SRRV versus WRRV [Fig.
15], versus [Fig. 6], and BWTV versus [Fig. 6].
Each pair of metrics in the scatter plots is measured
for the same set of SRAM cells first at and then at to
expose low read stability and pairs [Fig. 6] are
measured from SRAM macros with externally
accessible storage nodes; SRRV-WRRV pairs [Fig.
15] and BWTV- pairs [Fig. 6] are measured from
functional SRAM arrays using direct bit-line access.
A 200 mV N-well bias is applied for the case of when
measuring the write ability in the SRAM macros,
using and [Fig. 6]; this is done to further reduce
write ability and expose cell failures by decreasing
the of the PMOS pull-up transistors. For , each
measured metric sits comfortably above and a slight
dispersion is observed in the measured data of each
metric pair. This dispersion is generally smaller at
lower measured values and larger at higher measured
values. However, when the supply is dropped to 0.5
V and the SRAM cell is pushed to the edge of
stability, excellent agreement is established within
each metric pair, especially near the zero crossing (the
origin). This demonstrates that RSNM, SRRV, and
WRRV (BWTV, and) share the same point of
failure—zero crossing—and have excellent
agreement near failure, suggesting all metrics can be
used for SRAM failure estimation. Furthermore, this
also suggests that the value, especially taken at
higher supply voltages, is highly dependent on the
read/write metric used and is therefore unsuitable for
estimating yield.
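The comparison between metric pairs can be quantified with a correlation coefficient and a check of whether two metrics give the same pass/fail verdict at the zero crossing. The sketch below is one possible way to do this; the function names are hypothetical and the sample values are made-up placeholders, not measurements from this work.

```python
import math
from typing import Sequence


def pearson(xs: Sequence[float], ys: Sequence[float]) -> float:
    """Pearson correlation coefficient of two equally long samples."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


def failure_agreement(xs: Sequence[float], ys: Sequence[float]) -> float:
    """Fraction of cells for which both metrics agree on pass/fail.

    A cell is counted as failing when its metric value is at or below
    zero, i.e. at or past the zero crossing.
    """
    same = sum((x <= 0) == (y <= 0) for x, y in zip(xs, ys))
    return same / len(xs)


# Placeholder values (volts) for two metrics measured on the same cells.
metric_a = [-0.02, 0.00, 0.03, 0.05]
metric_b = [-0.01, 0.00, 0.02, 0.06]
agreement = failure_agreement(metric_a, metric_b)
```

Two metrics can disagree in magnitude at high supply voltage and still agree fully on which cells fail, which is the property the scatter plots are used to demonstrate.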
B. Read Current Measurements
Fig. 4 shows the normal probability and DC
characteristics plot for measured at nominal voltage.
At the nominal supply of 1.1 V, has a nearly linear
dependence on the of the pass-gate transistor
operating in velocity saturation and the pull-down
transistor operating in the linear mode; therefore its
measurement exhibits good normality up to more
than [4]. As the supply is dropped to 0.7 V, some
pass-gate transistors are no longer velocity saturated
while some pull-down transistors also enter the
saturation mode (due to a rise in the ‘0’ storage node
and high transistor ), causing to no longer
linearly depend on the of either pass-gate or pull-
down transistors. Thus, the lower tail of the
distribution at 0.7 V exhibits a significant right-
skewed deviation from a normal distribution [Fig.
19].
Fig. 4. DC characteristics
When the supply is further dropped to 0.5 V,
distribution exhibits even greater right-skewed
deviations from a normal distribution as transistors
enter the brink of weak inversion.
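A normal probability plot and a skewness estimate are two simple ways to check how far a measured distribution departs from normality. The following sketch uses only the Python standard library; the plotting position (i + 0.5)/n is one common convention assumed here, and the function names are illustrative.

```python
from statistics import NormalDist, mean, pstdev
from typing import List, Sequence, Tuple


def normal_probability_points(samples: Sequence[float]) -> List[Tuple[float, float]]:
    """Return (theoretical z-score, ordered sample) pairs.

    Points from a normally distributed sample fall close to a straight
    line; curvature in the tails indicates skew.
    """
    n = len(samples)
    std_normal = NormalDist()  # mean 0, standard deviation 1
    ordered = sorted(samples)
    return [(std_normal.inv_cdf((i + 0.5) / n), x) for i, x in enumerate(ordered)]


def skewness(samples: Sequence[float]) -> float:
    """Population skewness: zero for symmetric data, positive if right-skewed."""
    m = mean(samples)
    s = pstdev(samples)
    return sum(((x - m) / s) ** 3 for x in samples) / len(samples)
```

Applying both functions to the read-current samples at each supply voltage would give a numeric counterpart to the visual deviation seen in the probability plots.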
Fig. 5. Transient response
Fig. 6. Power taken on analysis
Fig. 7. Energy calculated in the proposed SRAM cell
C. Within-Cell Mismatch and Cell Orientation
SRAM cells are typically mirrored both horizontally
and vertically to maximize the array density, yielding
4 different cell orientations. In this 4-cell cluster,
orientations A and D share the same layout, with
reversed storage nodes; likewise, orientations B and
C share the same layout, with reversed storage
nodes. 8 summarizes the effect of within-cell
mismatch and cell orientation on SRAM stability and
performance. The measurement results from two test
chips scattered across the same wafer are highlighted.
Since within-cell mismatch causes (at least) one side
of the SRAM cell to flip when is dropped or WL is
overdriven, the frequency of read disturbance as a
function of cell storage node and cell orientation is
plotted in Fig. 22, which also plots the normalized of the
measured and as a function of cell storage node and
orientation. Measurement data reveals up to 4X
difference in read disturb frequency, 4% shift in the
measured , and 8% shift in the of measured when the
data polarity of the cell is changed. The shift is
consistent throughout the chip, suggesting a
systematic mismatch between the two halves of the
SRAM cell which may be attributed to a difference in
the direction of the notches in the NMOS active
region [24]–[26]. The direction of this systematic
mismatch can be the same or opposite between two
test chips taken from the same wafer, as highlighted
in Fig. 22. The measurement data also indicates that
the directions of the shifts in read disturb frequency, ,
and are correlated—i.e., a higher read disturb
frequency typically corresponds to a higher write
ability and . As the SRAM cells are mirrored across
the direction of the poly (from A/D to B/C), a slight
alteration in the degree of the within-cell mismatch is
observed, suggestive of a slight poly gate to active
source/drain misalignment [24], [25]. Since the
NMOS active region is continuous throughout the
array whereas the PMOS active region is segmented,
the channel width of the PMOS transistors is
expected to fluctuate more with poly gate
misalignment than NMOS transistors, due to corner
rounding effects. As the poly lines are shifted to the
right or left, both NMOS pull-down and pass-gate
transistors on the same storage side of the SRAM cell
experience either a common increase or a common
decrease in channel width, due to rounding at the
corners of the active region, depending on the cell
orientation. As long as the degree of poly gate
misalignment stays fairly uniform throughout the
SRAM array, the fluctuations in the cell-ratio (i.e.,
the strength ratio of pull-down to pass-gate
transistors) should be small. Therefore, the observed
alteration in the degree of within-cell mismatch
(between orientations A/D and orientations B/C) is
the smallest for the read disturb frequency. This
alteration is slightly greater for , which has a more
direct dependence on the pass-gate drive strength
than the pull-down drive strength; and it is most
pronounced in the cell write ability, which directly
depends on the strength of the PMOS pull-up
transistor.
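The dependence of read disturbance on cell orientation and stored data polarity can be tabulated by grouping per-cell test outcomes. The sketch below assumes a hypothetical record format of (orientation, stored value, disturbed flag); the function names and the example counts are placeholders for illustration, not data from the test chips.

```python
from collections import defaultdict
from typing import Dict, Iterable, Tuple

Record = Tuple[str, int, bool]  # (orientation "A".."D", stored value, disturbed?)


def disturb_frequency(records: Iterable[Record]) -> Dict[Tuple[str, int], float]:
    """Read disturb frequency for each (orientation, stored value) group."""
    counts = defaultdict(lambda: [0, 0])  # key -> [disturbed, total]
    for orientation, stored, disturbed in records:
        entry = counts[(orientation, stored)]
        entry[0] += int(disturbed)
        entry[1] += 1
    return {key: d / t for key, (d, t) in counts.items()}


def polarity_ratio(freq: Dict[Tuple[str, int], float], orientation: str) -> float:
    """Ratio of the larger to the smaller disturb frequency across polarity."""
    lo, hi = sorted((freq[(orientation, 0)], freq[(orientation, 1)]))
    return float("inf") if lo == 0 else hi / lo


# Placeholder outcomes for orientation A only.
example = (
    [("A", 0, True)] * 4 + [("A", 0, False)] * 6
    + [("A", 1, True)] * 1 + [("A", 1, False)] * 9
)
freq = disturb_frequency(example)
ratio = polarity_ratio(freq, "A")
```

A ratio well above one for the same orientation indicates a systematic mismatch between the two halves of the cell rather than purely random variation.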
CONCLUSION
In this paper, a CMOS SRAM-based fine-grain
dynamically reconfigurable architecture was
presented. The architecture uses low-power 10T
SRAM as storage for the configuration bits. The
peripheral circuits of the 10T SRAM were simplified
to reduce power consumption. Efficient dynamic
reconfiguration was achieved by hiding the
reconfiguration delay with a shadow SRAM scheme.
Experimental results show that the area-delay product
is improved with competitive power consumption.
The architecture also allows various tradeoffs among
area, delay, and power consumption, providing good
design flexibility.
REFERENCES
[1] Achiranshu Garg and Tony Tae-Hyoung Kim,
“SRAM array structures for energy efficiency
enhancement,” IEEE Trans. Circuits Syst. II, Exp.
Briefs, vol. 60, no. 6, pp. 351–355, Jun. 2013.
[2] S. Cserveny, L. Sumanen, J. M. Masgonty, and C.
Piguet, “Locally switched and limited source-body
bias and other leakage reduction techniques for a
low-power embedded SRAM,” IEEE Trans. Circuits
Syst. II, Exp. Briefs, vol. 52, no. 10, pp. 636–640,
Oct. 2005.
[3] B. H. Calhoun and A. Chandrakasan, “A 256kB
subthreshold SRAM using 65nm CMOS,” in Proc.
Int. Solid-State Circuits Conf., Feb. 2006, pp. 2592–
2601.
[4] B. H. Calhoun and A. P. Chandrakasan, “A 256-
kb 65-nm sub-threshold SRAM design for ultra-low-
voltage operation,” IEEE J. Solid-State Circuits, vol.
42, no. 3, pp. 680–688, Mar. 2007.
[5] M. Yamaoka, N. Maeda, Y. Shinozaki, Y.
Shimazaki, K. Nii, S. Shimada, K. Yanagisawa, and
T. Kawahara, “90-nm process-variation adaptive
embedded SRAM modules with power-line-floating
write technique,” IEEE J. Solid-State Circuits, vol.
41, no. 3, pp. 705–711, Mar. 2006.
[6] T. H. Kim, J. Liu, J. Keane, and C. H. Kim, “A
0.2 V, 480 kb subthreshold SRAM with 1 k cells per
bitline for ultra-low-voltage computing,” IEEE J.
Solid-State Circuits, vol. 43, no. 2, pp. 518–529, Feb.
2008.
[7] R. J. Evans and P. D. Franzon, “Energy
consumption modeling and optimization for
SRAM’s,” IEEE J. Solid-State Circuits, vol. 30, no.
5, pp. 571–579, May 1995.
[8] L. Chang, R. K. Montoye, Y. Nakamura, K. A.
Batson, R. J. Eickemeyer, R. H. Dennard, W.
Haensch, and D. Jamsek, “An 8T-SRAM for
variability tolerance and low-voltage operation in
high-performance caches,” IEEE J. Solid-State
Circuits, vol. 43, no. 4, pp. 956–963, Apr. 2008.
[9] V. Joshi, R. Kanj, and V. Ramadurai, “A novel
column-decoupled 8T cell for low-power differential
and domino-based SRAM design,” IEEE Trans. Very
Large Scale Integr. (VLSI) Syst., vol. 19, no. 5, pp.
869–882, May 2011.
[10] B. Zhai, L. Nazhandali, J. Olson, A. Reeves, M.