A read-decoupled gated-ground SRAM architecture for low-power embedded memories


INTEGRATION, the VLSI journal 45 (2012) 229–236

Contents lists available at SciVerse ScienceDirect

journal homepage: www.elsevier.com/locate/vlsi

Wasim Hussain a,*, Shah M. Jahinuzzaman b

a Electrical and Computer Engineering, Concordia University, 1515 St. Catherine Street West, Montreal, QC, Canada H3G 2W1
b Logic Technology Development Q&R, Intel Corporation, Hillsboro, OR 97124, USA

* Corresponding author. E-mail address: [email protected] (W. Hussain).

0167-9260/$ - see front matter © 2011 Elsevier B.V. All rights reserved.
doi:10.1016/j.vlsi.2011.11.016

Article info

Available online 6 December 2011

Keywords: SRAM bit-cell; Read noise margin; Leakage power; Soft error

Abstract

In this work, a gated-ground SRAM architecture based on a seven-transistor (7T) bit-cell is proposed. The proposed cell shows higher data stability and yield under varying process, voltage, and temperature (PVT) conditions than the conventional 6T cell. A single-ended sense amplifier is also presented to read from the proposed cell, while a unique write mechanism reduces the write power to less than half that of the 6T cell. The proposed cell consumes similar silicon area and leakage power as the 6T cell when laid out and simulated in a commercial 65-nm CMOS technology. The ground gating is done by selectively controlling the column virtual ground (CVG) of the accessed word in a row. This significantly reduces the leakage power consumption and enables implementing multiple words per row, which lowers multiple-bit data upsets in the event of a radiation-induced single event upset or soft error. In addition, the proposed cell inherently has a 30% larger soft error critical charge, making its soft error rate (SER) less than half that of the 6T cell.

© 2011 Elsevier B.V. All rights reserved.

1. Introduction

Advances in semiconductor technology have driven the rapid growth of very large scale integrated (VLSI) systems. Microprocessors are now used for all types of applications in every aspect of our lives. To meet the increasing performance demands of these applications, a large amount of memory is required on the same die as the microprocessor itself. Static random access memory (SRAM) is primarily used to realize these embedded memories because of its compatibility with the logic process, high speed of operation, and low leakage power consumption. The bit-cell that stores the data in an SRAM typically consists of a back-to-back inverter latch and two access NMOS transistors [1], as shown in Fig. 1. For more than four decades, this six-transistor (6T) SRAM cell has been the workhorse for embedded memory solutions. However, as CMOS technology continues to scale in the sub-100 nm regime, increasing process variations and shrinking voltage headroom pose significant design challenges in terms of data stability, leakage power consumption, and yield.

The 6T SRAM cell suffers from a problem that is inherent in its structure: the more stable the cell is during a read operation, the more difficult it is to write data into the cell. During a read operation, the bit lines are typically precharged to the supply voltage (VDD). If Q = ‘0’ (and QB = ‘1’), asserting WL pulls up the voltage level of Q (VQ) to ΔV, which depends on the relative strength of MAL and MNL. If ΔV exceeds the tripping voltage (VTRIP) of the inverter, the data bit flips (QB becomes ‘0’). In order to keep ΔV below VTRIP, MAL is made weaker than MNL, and MAR weaker than MNR to keep the cell symmetric. On the other hand, the write operation is accomplished by the voltage division between the access transistor and the pull-up transistor (MAL and MPL, or MAR and MPR). If Q = ‘0’ and we want to make Q = ‘1’, BLB is discharged to ‘0’, BL is kept at VDD, and WL is asserted. Consequently, MAR conducts a current through itself and MPR, lowering the voltage level of QB (VQB). However, VQ remains at only ΔV because of the design constraint of the read operation. Thus, to successfully write into the cell, VQB must fall below VTRIP, and MAR has to be stronger than MPR (and MAL stronger than MPL). This delicate balance of transistor strength ratios is severely impacted by device variations: threshold voltage (Vt) variations due to random dopant fluctuations, and channel length and width variations due to line edge roughness (LER) in scaled technologies.

Usually, in order to meet high density requirements, an SRAM cell uses the minimum possible transistor sizes, which are affected by the variations even more. The situation is aggravated by the lower supply voltages of scaled technologies, as the Vt variation consumes a larger fraction of the voltage margins. Variations thus limit the minimum operating voltage of 6T-cell-based SRAMs. The operating voltage of SRAMs is often reduced to lower the leakage power consumption, which is a growing concern in scaled CMOS technologies. In order to address these concerns, we propose an SRAM architecture based on a 7T SRAM bit-cell (Fig. 2). Since the read operation of the cell is single-ended, a single-ended sense amplifier is also proposed. The column virtual ground (CVG) technique has also been applied to the proposed bit-cell in the array implementation to accommodate multiple words per row.

Fig. 1. Conventional 6T SRAM cell.

Fig. 2. Proposed 7T SRAM cell.

2. Background

Previously, a number of design techniques have been proposed to enable low-voltage operation of 6T cells. Refs. [2,3] proposed techniques in which the read bit lines are decoupled from the write bit lines. However, those designs incur a significant area penalty (30% in [2] and 37.8% in [3]). In addition, these designs are basically extensions of the 6T cell, and their write operation is similar to that of the 6T cell. Thus, they consume comparatively high power during a write operation, since the bit lines have to be discharged by a large amount to ensure successful writing.

Techniques have also been proposed in which a higher supply voltage dedicated to the SRAM array is used to ensure sufficient margins as the logic supply voltage scales. Instead of being tied to a fixed higher supply, SRAM arrays can also use dynamically modulated supplies that are pulsed to different levels when a read or write event occurs [4,5]. Though such a technique adds design complexity, it decouples read and write events from the standby condition so that the optimum bias condition can be used in each case.

Fig. 3. Worst-case read SNM for the 7T SRAM is simply that for two cross-coupled inverters.

3. Proposed 7T SRAM bit-cell

As shown in Fig. 2, the proposed cell is based on the portless five-transistor SRAM cell proposed in [6]. However, using transistors RAX1 and RAX2, we have decoupled the read bit line from the write bit lines. Transistor RAX1 is controlled by a read word line (RWL) while transistor WAX is controlled by a write word line (WWL) in read and write operations, respectively. In contrast, a single transistor similar to WAX was used in [6] for both read and write operations. As a result, the sizing of that transistor in [6] was very critical. It had to be strong enough to ensure a successful write in all corners, yet weak enough for data retention during the read operation. In addition, if the transistor was weak, the write operation would have required the bit lines to be discharged by a significant amount. This would have resulted in higher power consumption due to the subsequent precharge of the bit lines. In our proposed 7T cell, the write access transistor (WAX) is used only for the write operation and hence can be made as strong as required. In fact, by making WAX strong, we have limited the required bit line discharge during the write operation, thus making the write power consumption less than half of that consumed by the 6T cell.

On the other hand, the read operation does not affect the contents of the proposed 7T cell because the read bit line is decoupled. The worst-case static noise margin (SNM), as defined in [7], for the cell is simply that of two cross-coupled inverters (Fig. 3), as the logic ‘0’ node does not form a voltage divider with the read bit line. In addition, because the cell does not use multiple Vt, which is often employed to improve cell stability or reduce cell leakage [8], it can be realized in a standard CMOS process without additional process steps such as extra implant masks or gate oxides.

3.1. Principle of cell operation

The write operation is done by asserting the WWL signal (Fig. 2) and discharging BL (for a ‘0’ write) or BLB (for a ‘1’ write). Assuming Q = ‘0’ and we want to make Q = ‘1’, we first assert WWL. This pulls up the voltage level of Q from 0 V and pulls down the voltage level of QB from VDD, but the pulled-down level of QB is still above the pulled-up level of Q. BLB then begins to discharge, and as a result the level of QB decreases further. When the level of QB falls below the pulled-up level of Q, WWL is turned off. Subsequently, Q latches to VDD while QB latches to 0 V, and the write operation is complete. The stronger the write-access transistor is, the weaker the cell becomes when WWL is asserted and the easier it is to write data into the cell. ‘Easier’ means that less discharge (of BL or BLB) is required for a successful write operation. This fact is utilized in our cell to make it low-power relative to other cells.

The read operation is initiated by asserting RWL. If QB = ‘1’ (Q = ‘0’), the RBL discharges, indicating a ‘0’ read. If Q = ‘1’, RBL does not discharge, indicating a ‘1’ read. The read discharge path is similar to that of a 6T SRAM cell, since both consist of two minimum-sized NMOS transistors. Thus, in terms of discharge speed, the 7T cell has similar performance. Unlike the 6T cell, the read mechanism is single-ended and thus incurs some noise sensitivity. This can be mitigated by using slightly larger NMOS transistors for RAX1 and RAX2 (Fig. 2), ensuring a larger discharge than is usual for differential sensing.

3.2. Principle of array operation

The array implementation of the proposed 7T SRAM cell requires a second set of WL drivers. However, this does not add to the area, since these word lines run horizontally, and the height of the cell did not need to be increased to accommodate the two word lines.

Since the 7T cell reduces the write power by intentionally weakening the cell during the write time window, the 7T cell by itself cannot support multiple words in a row. If multiple words are implemented in a row, then during the write operation of one word, the bit-cells belonging to the other words are exposed to the “half-selected state” (the state in which WAX is ON but BL and BLB are held at VDD). Because the cell is extremely vulnerable in the half-selected state, the data may be destroyed. As a result, modifications to the array organization are required to implement multiple words per row.

Multiple words per row can be used to improve array efficiency by multiplexing adjacent columns into shared sense amplifiers. This allows the banks to be larger, which reduces the required number of banks and, in turn, the required decoding circuitry. It also enables protection from multi-bit soft error events, because bits from different words can be physically interleaved (Fig. 4) to ensure that a multi-bit error affects at most one bit per word, an error that can be easily detected or corrected with simple parity checking or error correcting codes (ECC).
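The interleaving argument can be illustrated with a small sketch. The mapping below (bit b of word w placed in physical column b × words_per_row + w) is an assumed, commonly used interleaving scheme and the function names are hypothetical; the source only states that bits from different words are physically interleaved.

```python
from __future__ import annotations


def interleaved_column(word: int, bit: int, words_per_row: int) -> int:
    """Physical column of a bit under the assumed bit-interleaved layout."""
    return bit * words_per_row + word


def owner_of_column(col: int, words_per_row: int) -> tuple[int, int]:
    """Inverse mapping: (word, bit) stored in a given physical column."""
    return col % words_per_row, col // words_per_row


def bits_hit_per_word(first_col: int, width: int, words_per_row: int) -> dict[int, int]:
    """Count upset bits per word when `width` adjacent columns are struck."""
    hits: dict[int, int] = {}
    for col in range(first_col, first_col + width):
        word, _ = owner_of_column(col, words_per_row)
        hits[word] = hits.get(word, 0) + 1
    return hits


if __name__ == "__main__":
    # Interleaved, 4 words per row: a 3-column strike hits 3 different words.
    print(bits_hit_per_word(first_col=5, width=3, words_per_row=4))
    # One word per row (no interleaving): the same strike hits one word 3 times.
    print(bits_hit_per_word(first_col=5, width=3, words_per_row=1))
```

Under this mapping, any strike no wider than the number of words per row upsets at most one bit in each word, which is the case that single-bit parity or ECC can handle.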

The 7T SRAM cell can support multiple words in a row if the array is implemented with the column virtual grounding (CVG) technique proposed in [9], because CVG removes the vulnerability of the “half-selected” state. The principle of the CVG technique is that all the cells in a column share a common VGND line (Fig. 5). During hold mode, the VGND lines of all the columns are kept at a non-zero value, namely VBIAS. During a write operation (as well as a read operation), the VGND lines of only the columns containing the targeted word are pulled down from VBIAS to 0 V, and the respective BL and BLB are discharged according to the data to be written. The activated WWL signal also turns on the WAX of the other cells in the same row, but their VGND remains at VBIAS. Even though reverse body bias is applied to MNL, MNR and WAX, WAX becomes comparatively weaker than MNL and MNR because a higher body bias is applied to it. As a result, the cells belonging to the other words in the same row do not flip. For the half-selected cells (those in columns whose VGND remains at VBIAS), the situation is tantamount to using a WAX with a longer channel length.
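The column selection described above can be sketched as a purely behavioral model. The column-to-word mapping (column index modulo the number of words per row), the function name, and the 0.2 V value used for VBIAS are assumptions made for illustration; the source does not give a VBIAS value.

```python
from __future__ import annotations


def vgnd_levels(n_cols: int, words_per_row: int, selected_word: int,
                v_bias: float, accessing: bool = True) -> list[float]:
    """Return the VGND level of every column for one access.

    Columns of the targeted word are pulled to 0 V during a read or write;
    all other columns (and every column in hold mode) stay at v_bias.
    Assumes interleaved words: column c belongs to word c % words_per_row.
    """
    levels = []
    for col in range(n_cols):
        selected = accessing and (col % words_per_row == selected_word)
        levels.append(0.0 if selected else v_bias)
    return levels


if __name__ == "__main__":
    V_BIAS = 0.2  # hypothetical value, not taken from the source
    print(vgnd_levels(8, 4, selected_word=1, v_bias=V_BIAS))
    print(vgnd_levels(8, 4, selected_word=1, v_bias=V_BIAS, accessing=False))
```

In this model, the half-selected cells are exactly the columns that still report `v_bias` while an access is in progress.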

Fig. 4. (a) Floor plan, in which column select allows physical interleaving of bits from different words. (b) Floor plan, in which all bits of each word are spatially adjacent. Additional parity or ECC bits are needed to prevent multi-bit error.

For a read operation, similarly, the VGND lines of only the columns containing the targeted word are pulled down to 0 V, and the respective read bit line discharges (or not) according to the stored data. The cells belonging to the unselected columns do not have sufficient overdrive in RAX1 and RAX2, since their VGND is kept at VBIAS. Thus, their read bit lines discharge by only a small amount, which saves subsequent precharge energy.

4. The proposed single ended sense-amplifier

An inherent problem of the sense amplifier is the “memory” of the previous evaluation. Assume that in the previous evaluation period the sense amplifier evaluated to OUT+ = ‘1’ (OUT− = ‘0’), as shown in Fig. 6(a), and that in the next evaluation period it should evaluate to OUT+ = ‘0’ (OUT− = ‘1’). This means that the latching mechanism inside the sense amplifier has to be flipped. However, due to mismatch between the transistors, the latching mechanism can be biased towards OUT+ = ‘1’, or the voltage differential generated between the bit lines can be too small for a “successful” evaluation. To remove the sense amplifier’s memory, all nodes in the sense amplifier are driven to a known voltage. None of the nodes is kept floating or dynamically charged, because a floating node can retain charge from the previous evaluation. In other words, the two nodes OUT+ and OUT− of the sense amplifier are precharged to VDD before the evaluation period begins, and during the evaluation period one of the two nodes is driven to zero potential based on the discharging of one of the bit lines. If neither bit line discharges, a race condition occurs and the latching mechanism of the sense amplifier can latch in either direction.

This gives rise to the sensing problem in single-ended sensing. In single-ended sensing there is only one bit line, and it either discharges or does not. If it discharges, there is no problem in the evaluation phase. If it does not discharge, a race condition arises, and with it the chance of a wrong evaluation. Thus, a differential sense amplifier cannot be used for single-ended sensing.

The proposed sense amplifier is shown in Fig. 6(b). It is based on the proposed 7T SRAM cell. The proposed sense amplifier utilizes the “memory of a previous evaluation” to circumvent the race condition. Instead of precharging both Q_SA and QB_SA to VDD, the read operation is initiated by setting Q_SA = ‘1’ (and QB_SA = ‘0’) with a reset operation. If the read bit line discharges, the sense amplifier flips to Q_SA = ‘0’ (and QB_SA = ‘1’). If the read bit line does not discharge, the sense amplifier continues to store Q_SA = ‘1’. Thus, there is no race condition in the sensing mechanism.

Another advantage of this sense amplifier, for the proposed 7T SRAM cell array, is its similarity to the cell itself. The sense amplifier can therefore be laid out with the same pitch as the SRAM cell column, which is very important for the overall area efficiency of the SRAM array. In 6T SRAM arrays, multiple columns are shared by a single sense amplifier, so the space allowed for a sense amplifier is large. However, as explained earlier, the 7T cell by itself cannot support multiple words per row. In that case, multiple columns cannot be shared by a single sense amplifier, and the sense amplifier must be no wider than the column. Since the latching component of the sense amplifier is similar to the cell, that pitch equality can be maintained even under different design rules.

4.1. Principle of operation of the proposed single-ended sense amplifier

Before the initiation of the read operation, RST is asserted, which ensures that Q_SA = ‘1’ (and QB_SA = ‘0’). Since MRST1 has one end physically connected to GND and MRST2 has one end physically connected to VDD, a very short pulse is enough to set Q_SA = ‘1’. Then SAE (Fig. 6(b)) is asserted. As a result, VQ_SA is pulled down and VQB_SA is pulled up to an intermediate level. If the RBL (read bit line) discharges, the pulled-down level of VQ_SA drops below the elevated level of VQB_SA and the sense amplifier flips, indicating that the cell being read stores Q = ‘0’. If the RBL does not discharge, the pulled-down level of VQ_SA does not drop below the elevated level of VQB_SA and the sense amplifier does not flip, indicating that the cell being read stores Q = ‘1’.

Fig. 5. Memory array using column virtual grounding (CVG). (a) A single column with CVG. (b) An array with CVG.

Fig. 6. (a) A basic clocked sense amplifier. (b) The proposed single-ended sense amplifier.
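The reset-then-evaluate sequence can be summarized in a purely logical sketch. It models only the digital outcome (no voltages, timing, or mismatch), and the class and function names are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class SingleEndedSenseAmp:
    """Logical model of the latch-based sense amplifier (Q_SA only)."""
    q_sa: int = 1

    def reset(self) -> None:
        # RST pulse: force Q_SA = '1' (QB_SA = '0'), whatever was stored before.
        self.q_sa = 1

    def evaluate(self, rbl_discharged: bool) -> int:
        # SAE asserted: the latch flips only if the read bit line discharged.
        if rbl_discharged:
            self.q_sa = 0
        return self.q_sa


def rbl_discharges(cell_q: int) -> bool:
    """RBL discharges when QB = '1', i.e. when the cell stores Q = '0'."""
    return cell_q == 0


def read_cell(cell_q: int) -> int:
    """Reset the sense amplifier, then evaluate it against the cell's RBL."""
    sa = SingleEndedSenseAmp()
    sa.reset()
    return sa.evaluate(rbl_discharges(cell_q))


if __name__ == "__main__":
    for stored in (0, 1):
        print(stored, read_cell(stored))
```

Because the latch always starts from the known state Q_SA = ‘1’, the no-discharge case has a defined outcome rather than a race, which is the point made in the text.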

5. Performance analysis of the 7T bit-cell

The layouts of the 7T SRAM cell and the 6T SRAM cell were made in a TSMC 65 nm process. The extracted layouts were used to simulate the behavior of the cells under various process corners for read and write operations.

5.1. Write power consumption

In the proposed 7T cell, when WWL is asserted, the WAX transistor turns ON and weakens the cell from inside. As a result, a small disturbance (a discharge of either bit line, BL or BLB), which costs little power, ensures flipping of the cell in the desired direction. For 6T cells the bit lines need to be discharged by a large amount (from VDD to 0 V), and as a result the subsequent precharge consumes a large amount of power. In the 7T cell, the bit lines need only a small discharge, and as a result the subsequent precharge power is significantly smaller. A comparison of the subsequent precharge energy after a write operation is given in Table 1. The energy includes the bit line precharge energy and the write driver energy.

Table 1. Energy consumption for successful write operation.

VDD (V)    Energy consumption in a column for successful write (fJ)
           7T cell    6T cell
1.0        43.0       97.0
0.9        31.5       76.5
0.8        23.8       60.0
0.7        14.0       46.9
0.6        16.2       34.8

Table 2. Energy consumption per column in a read operation.

Cell    Energy consumption in a column for read operation (fJ)
        ‘0’ read    ‘1’ read
7T      20.73       12.08
6T      20.00       20.00

Table 3. Cell leakage current.

Cell    Leakage current (nA)           Average (nA)
        Storing ‘0’    Storing ‘1’
7T      6.40           4.38            5.39
6T      5.63           5.63            5.63
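As a quick arithmetic check of the "less than half" claim, the ratio of 7T to 6T write energy can be computed directly from the Table 1 values. The dictionary and function names are hypothetical; the numbers are copied from Table 1.

```python
# Write energy per column in fJ, copied from Table 1: VDD -> (7T, 6T).
WRITE_ENERGY_FJ = {
    1.0: (43.0, 97.0),
    0.9: (31.5, 76.5),
    0.8: (23.8, 60.0),
    0.7: (14.0, 46.9),
    0.6: (16.2, 34.8),
}


def write_energy_ratio(vdd: float) -> float:
    """Return the 7T write energy as a fraction of the 6T write energy."""
    e_7t, e_6t = WRITE_ENERGY_FJ[vdd]
    return e_7t / e_6t


if __name__ == "__main__":
    for vdd in sorted(WRITE_ENERGY_FJ, reverse=True):
        print(f"VDD = {vdd:.1f} V: 7T/6T = {write_energy_ratio(vdd):.3f}")
```

The ratio stays below 0.5 at every listed supply voltage, consistent with the claim made for the write mechanism.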

It is important to note that the different method of writing used in the proposed design introduces a dependency of the bit line capacitance on the cell data, an effect not seen in conventional SRAM architectures. This relationship results from the direct connection of the cell PMOS transistors to the bit lines. The PMOS connected to the HIGH data node operates in the triode region, while the PMOS connected to the LOW data node is effectively off. Thus, the parasitic capacitance of the HIGH data node is added to the connecting bit line, and the HIGH-side bit line experiences a much higher effective capacitance than the LOW-side one. In the extreme case where all the cells in a column store the same data, the bit line connected to the HIGH side has about 3 times the effective capacitance of the bit line connected to the LOW side. As a result, the write driver should be strong enough to sufficiently discharge the bit line with the maximum effective capacitance (connected to the HIGH side) to ensure a successful write operation. However, if the stored data in all the cells are reversed, the maximum-effective-capacitance bit line becomes the minimum-effective-capacitance bit line, and the “strong” write driver discharges that bit line by a larger amount.

WAX was sized at W = 150 nm and L = 90 nm (the minimum permissible length is 60 nm). A first-order analysis would indicate that an optimized write operation requires WAX to be as strong as possible, because a stronger WAX brings the voltage levels of Q and QB closer to each other, making the cell easier to flip by discharging BL/BLB. However, due to process variation and mismatch, the VTRIP of the two inverters is not always the same. Assuming Q = ‘0’ (and QB = ‘1’) and we want to make Q = ‘1’, it is not enough that the pulled-down voltage level of QB falls just below the elevated level of Q when BLB is discharged. For a successful write operation in all variation corners, VQB should fall below VQ by a certain amount to ensure that VQB itself indeed becomes less than the extreme cases of VTRIP. Though a stronger WAX brings VQ and VQB closer, it also hinders the subsequent fall of VQB (or VQ) caused by the discharge of BLB (or BL). Thus, there is an optimum sizing for WAX that results in the minimum discharge of BL (or BLB) for a successful write operation in all variation corners. Extensive Monte-Carlo simulations were run with different sizings of WAX, and it was found that W = 150 nm and L = 90 nm results in the minimum BL/BLB discharge, 100 mV, for a successful write operation in all corners.

Ensuring 100 mV of discharge for the case of maximum effective bit line capacitance translates into a discharge of 290 mV for the case of minimum effective bit line capacitance. A discharge of 290 mV does not have any destructive effect on the other cells in the same column. It has been observed that as long as the “discharged state” lasts less than 500 ps (the bit line is precharged for the next write operation within that period), a discharge of up to 700 mV (i.e., the bit line voltage drops to 300 mV for a VDD of 1 V) does not have any destructive effect on the other cells. This gives a safety margin of about 410 mV.
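The numbers in this paragraph can be tied together with a short calculation. It assumes, as a simplification not stated in the source, that the write driver removes the same charge from either bit line, so the voltage drop scales inversely with the effective capacitance; the function names are hypothetical.

```python
def scaled_discharge_mv(ref_discharge_mv: float, cap_ratio: float) -> float:
    """Discharge on the low-capacitance bit line for the same removed charge.

    With Q = C * dV held constant (assumption), dV_low = dV_high * C_high / C_low.
    """
    return ref_discharge_mv * cap_ratio


def safety_margin_mv(tolerated_mv: int, worst_case_mv: int) -> int:
    """Margin between the largest tolerated and the worst-case discharge."""
    return tolerated_mv - worst_case_mv


if __name__ == "__main__":
    # Capacitance ratio implied by the 100 mV and 290 mV figures in the text.
    implied_ratio = 290 / 100
    print(f"implied capacitance ratio: {implied_ratio:.1f}")
    print(f"low-capacitance discharge: {scaled_discharge_mv(100, implied_ratio):.0f} mV")
    print(f"safety margin: {safety_margin_mv(700, 290)} mV")
```

Under that assumption, the implied capacitance ratio of 2.9 agrees with the "about 3 times" figure quoted for the effective bit line capacitance, and the margin follows as 700 mV minus 290 mV.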

5.2. Read performance

Read operation was performed satisfactorily with a pulse width of 150 ps at RWL for VDD = 1 V. For a pulse width of 150 ps, the RBL discharges by 130 mV, which is sufficient to ensure proper sensing by the proposed sense amplifier, as verified by Monte-Carlo simulation under various mismatch corners. The energy consumed in a column during a read operation is given in Table 2. Since the cell is single-ended, the energy consumption for ‘0’ and ‘1’ reads is not equal. The energy includes the read bit line precharge energy and the dynamic energy of the sense amplifier. Also, the dimensions of the transistors in the read discharge paths of the 6T and 7T cells result in similar discharge speeds.

5.3. Cell leakage

Since the 7T SRAM cell is asymmetric, the leakage current depends on the stored data. When the stored value is ‘0’ (Q = ‘0’), one of the NMOS transistors in the read current path is ON and the other is OFF, whereas when the stored value is ‘1’ (Q = ‘1’), both NMOS transistors in the read path are OFF. Thus, the leakage current for Q = ‘0’ is higher (the rest of the cell remains the same in both situations). The leakage current of the 7T cell is taken to be the average of the Q = ‘1’ and Q = ‘0’ values.

The cell leakage current for VDD = 1 V is shown in Table 3. A comparison of the leakage currents of the 6T cell and the proposed 7T cell as a function of VDD is shown in Fig. 7.
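The averaging used for the 7T leakage figure can be reproduced from the Table 3 values. The function names are hypothetical, and equal weighting of the two data states is the averaging described in the text.

```python
def average_leakage_na(storing_0_na: float, storing_1_na: float) -> float:
    """Data-averaged cell leakage, weighting '0' and '1' equally."""
    return (storing_0_na + storing_1_na) / 2


def relative_difference(cell_na: float, reference_na: float) -> float:
    """Fractional leakage difference of a cell relative to a reference cell."""
    return (cell_na - reference_na) / reference_na


if __name__ == "__main__":
    avg_7t = average_leakage_na(6.40, 4.38)   # Table 3, 7T row
    avg_6t = average_leakage_na(5.63, 5.63)   # Table 3, 6T row
    print(f"7T average: {avg_7t:.2f} nA, 6T average: {avg_6t:.2f} nA")
    print(f"7T relative to 6T: {relative_difference(avg_7t, avg_6t):+.1%}")
```

The averaged 7T value comes out within a few percent of the 6T value, which matches the "similar leakage power" statement in the abstract.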

5.4. Soft-error tolerance

Radiation-induced single event transients (SETs) have emerged as a critical reliability concern for integrated circuits in sub-100 nanometer CMOS technologies [10]. When a sensitive node of a memory circuit is struck by an alpha particle or a high-energy neutron, a voltage transient is induced at that node. The transient is referred to as an SET, which can flip the stored data (‘0’ to ‘1’ or vice versa) if its amplitude and duration are large enough. Such data flipping is referred to as a single event upset (SEU) or ‘soft error’, as it does not permanently damage the memory circuit. However, SEUs cause computational errors, which can lead to system failure. Accordingly, state-of-the-art microprocessors require SEU protection [11]. Since a microprocessor or an SOC contains a large number of SRAM cells, making the SRAM cells SEU-robust is vital to ensure the overall reliability of the system.

Typically, an SRAM cell experiences a SEU by having a SET at asensitive node of the back-to-back inverter inside the cell. Thevulnerability of SRAM to soft error is assessed by its critical

Page 6: A read-decoupled gated-ground SRAM architecture for low-power embedded memories

Fig. 8. Comparison of critical charge for 6T and 7T SRAM cells.

Fig. 7. A comparison of leakage currents of 6T cell and the proposed 7T cell as a

function of rail-to-rail voltage.

W. Hussain, S.M. Jahinuzzaman / INTEGRATION, the VLSI journal 45 (2012) 229–236234

charge (Qcrit) [12]. Qcrit is the minimum amount of charge that canflip the data bit stored in an SRAM cell. It exhibits an exponentialrelationship with the soft error rate (SER) [13]. It should be ashigh as possible in order to limit the SER. The various criticalcharge models which have been reported to date agree in thequalitative definition. However, they differ in quantitativedescription. For example, in [12,14], Qcrit has been modeled bythe following equation,

$$Q_{crit} = C_N V_{DD} + I_{DP} T_F \qquad (1)$$

where CN is the equivalent capacitance of the struck node, VDD is the supply voltage, IDP is the maximum current of the ON PMOS transistor, and TF is the cell flipping time. In a conventional 6T cell, the width of the driver NMOS is 1.5 to 1.7 times that of the PMOS for sufficient write margin. The mobility of an n-channel device is usually 2 to 3 times that of a p-channel device, and as a result the driver NMOS is several times stronger than the PMOS. In a back-to-back inverter pair, data is retained by two nodes holding complementary values, namely ‘0’ and ‘1’. The ‘0’ is retained by the connected NMOS and the ‘1’ is retained by the connected PMOS. If an SET hits the ‘0’ node and tries to change its voltage level, the NMOS holds the node more successfully than the PMOS does when an SET hits the ‘1’ node, because the NMOS is stronger than the PMOS. Since vulnerability is assessed by the worse of the two possible flipping scenarios, Qcrit of an SRAM cell is measured from the ‘1 to 0’ flipping scenario. As a result, the recovering current used in (1) is the PMOS current.
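As a rough numerical illustration of Eq. (1), the sketch below evaluates Qcrit for one set of device values. All numbers are hypothetical and chosen only to make the arithmetic easy to follow; they are not taken from the simulated 65-nm cells. The second call doubles the PMOS restoring current while holding the flipping time fixed, which is a simplifying assumption, since in a real cell TF also changes with transistor sizing.

```python
def critical_charge(c_node_f, vdd_v, i_dp_a, t_flip_s):
    """Eq. (1): Qcrit = C_N * V_DD + I_DP * T_F, returned in coulombs."""
    return c_node_f * vdd_v + i_dp_a * t_flip_s


# Hypothetical values (assumptions for illustration, not from the paper).
C_N = 1.0e-15   # equivalent node capacitance: 1 fF
V_DD = 1.0      # supply voltage: 1 V
I_DP = 50e-6    # maximum ON-PMOS current: 50 uA
T_F = 20e-12    # cell flipping time: 20 ps

q_base = critical_charge(C_N, V_DD, I_DP, T_F)
# Doubling the PMOS current with T_F held fixed (simplifying assumption).
q_strong_pmos = critical_charge(C_N, V_DD, 2 * I_DP, T_F)

print(f"baseline Qcrit:        {q_base * 1e15:.2f} fC")
print(f"Qcrit with 2x PMOS:    {q_strong_pmos * 1e15:.2f} fC")
```

The capacitive term is fixed by the node, so under these assumptions any gain in Qcrit comes from the restoring-current term, which is why the strength of the pull-up PMOS matters in the discussion that follows.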

A dilemma in the 6T SRAM cell is that the PMOS cannot be upsized arbitrarily, since that would require strengthening the access transistor (to maintain writability) and subsequently the driver NMOS (to ensure read stability). In the 7T cell there is no such restriction. In fact, to maintain equal critical charge for both the ‘0 to 1’ flip and the ‘1 to 0’ flip, the aspect ratio of the PMOS should be at least twice that of the driver NMOS, which is not possible in the 6T cell. Even in the 8T cell [2], where the read bit line is decoupled and the driver NMOS therefore need not be stronger than the access transistor, the PMOS cannot be made too strong, because that would make the write margin too small and writability may disappear entirely under worst-case variation. The 7T cell, however, can accommodate such a design. A comparison of critical charge for the 6T and the proposed 7T SRAM cell is given in Fig. 8. More importantly, if leakage power consumption is not the main concern, the width of the inverter pull-up transistor can be increased for higher critical charge without sacrificing the write margin.

The soft error rate (SER) per bit in an SRAM has been described and experimentally verified by the following empirical model by Hazucha and Svensson [13],

$$SER_{bit} \propto F A \exp\left(-\frac{Q_{crit}}{Q_S}\right), \quad \text{or}$$

$$SER_{bit} = K F A \exp\left(-\frac{Q_{crit}}{Q_S}\right) \qquad (2)$$

Here, K is a proportionality constant; F is the neutron flux with energy greater than 1 MeV, in particles/(cm²·s); A is the sensitive area of the circuit, in cm²; and Qs is the charge collection efficiency of the cell, in fC. Typically, Qs depends on the magnitude of the particle-induced charge, substrate doping, carrier mobility, and the voltage of the collecting node and neighboring nodes. Since different cells have different charge collection volumes, they may have different charge collection efficiencies for a single particle strike. However, to first order, if we assume that the charge collection efficiency of the sensitive node is the same in each case, we can estimate the normalized SER of the cells by assuming KFA = 1. From [15], an experimental value of Qs is taken to be 1.187 fC. Based on that, the SER for two test cases, Qs = 0.5 fC and Qs = 1.187 fC, is shown in Fig. 9.
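The normalized form of Eq. (2) with KFA = 1 can be sketched numerically as follows. The 6T critical charge used here is a hypothetical placeholder, not a value from the simulations; the 30% increase for the 7T cell is the figure stated in this paper, and the two Qs values are the test cases named above.

```python
import math


def normalized_ser(q_crit_fc, q_s_fc):
    """Eq. (2) with K*F*A = 1: SER = exp(-Qcrit / Qs)."""
    return math.exp(-q_crit_fc / q_s_fc)


def ser_ratio_7t_to_6t(q_crit_6t_fc, q_s_fc, gain=1.3):
    """Ratio of 7T to 6T normalized SER when the 7T Qcrit is `gain` times larger."""
    return normalized_ser(gain * q_crit_6t_fc, q_s_fc) / normalized_ser(q_crit_6t_fc, q_s_fc)


# Hypothetical 6T critical charge in fC (assumption for illustration only).
Q_CRIT_6T_FC = 3.0

for q_s in (0.5, 1.187):
    ratio = ser_ratio_7t_to_6t(Q_CRIT_6T_FC, q_s)
    print(f"Qs = {q_s} fC: SER(7T) / SER(6T) = {ratio:.3f}")
```

Because the model is exponential, the ratio reduces to exp(-0.3 Qcrit,6T / Qs), so the benefit of a fixed percentage gain in Qcrit grows as Qs shrinks.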

5.5. Cell area

Silicon die area is a very expensive resource, and since memory accounts for as much as 80% of the total area of an SOC, cell area is a very important factor in memory design. Although the 7T cell has one more transistor than the 6T cell, the area does not increase because the seventh transistor, an NMOS, is accommodated between the two driver NMOS transistors of the inverters. The area of the 7T SRAM cell is therefore the same as that of the 6T SRAM cell.

Three metal layers were used in the layout, which is the minimum even in conventional 6T SRAM designs. Metal1 is used for interconnections inside the cell, Metal2 for the bit lines and VSS, and Metal3 for the word lines. The layout is shown in Fig. 10.

5.6. Array implementation with column virtual grounding

A Monte Carlo simulation of 1000 runs was performed with VBIAS = 300 mV and 400 mV (Fig. 5(a)), and no instance of flipping was observed when the cell WWL was asserted and BL/BLB were kept at VDD. However, when the same simulation was performed for VBIAS = 0 V, which is equivalent to no virtual grounding, more than 200 instances of flipping were observed. A leakage comparison of the proposed 7T cell with and without virtual grounding is shown in Table 4. A comparison of the leakage currents of the 6T cell and the proposed 7T cell as a function of rail-to-rail voltage is shown in Fig. 11 (the rail-to-rail voltage is VDD − 0 V for the 6T cell and VDD − VGND for the 7T cell).
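To make the leakage figures in Table 4 easier to compare, the short sketch below recomputes the per-cell averages and expresses the savings at each VGND level as a percentage of the ungated case. The current values are copied from Table 4; the helper names are illustrative only.

```python
# Leakage current per cell in nA from Table 4, keyed by VGND in mV:
# (storing '0', storing '1').
LEAKAGE_NA = {
    0: (6.4, 4.38),
    300: (1.76, 1.58),
    400: (1.27, 1.18),
}


def average_leakage_na(vgnd_mv):
    """Mean of the storing-'0' and storing-'1' leakage currents."""
    storing_zero, storing_one = LEAKAGE_NA[vgnd_mv]
    return (storing_zero + storing_one) / 2


def reduction_percent(vgnd_mv):
    """Average leakage reduction relative to VGND = 0 V (no virtual grounding)."""
    baseline = average_leakage_na(0)
    return 100 * (baseline - average_leakage_na(vgnd_mv)) / baseline


for vgnd in (300, 400):
    print(f"VGND = {vgnd} mV: average {average_leakage_na(vgnd):.2f} nA, "
          f"reduction {reduction_percent(vgnd):.1f}%")
```

On these numbers, raising the virtual ground to 300 mV removes roughly two thirds of the average cell leakage, and the step from 300 mV to 400 mV adds a smaller further gain.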

The power savings from any type of virtual grounding technique (or virtual VDD) depend on the switching activity factor (the minimum average time between two consecutive accesses), because whenever data is accessed, the VGND (or VVDD) lines have to be activated, which consumes some dynamic power. If the switching activity factor is high, the dynamic power consumed for activation may offset the leakage power savings. The power efficiency of the column virtual grounding technique also depends on the number of words implemented in a row. In fact, the CVG technique is more power-efficient when the number of words implemented in a row is large. Based on a first-order analysis, Table 5 gives an estimate of the minimum average time between two consecutive accesses, for different numbers of words implemented in a row, at which the leakage power savings offset the dynamic power consumption.

Fig. 9. Comparison of SER for 6T and 7T SRAM cells.

Fig. 10. 7T cell layout (the area inside the dotted boundary belongs to one cell).

Fig. 11. A comparison of leakage currents of 6T cell and the proposed 7T cell as a function of rail-to-rail voltage.

Table 4. Leakage comparison of the proposed 7T cell with and without virtual grounding.

| VGND | Leakage current per cell, storing ‘0’ (nA) | Leakage current per cell, storing ‘1’ (nA) | Average (nA) |
|---|---|---|---|
| 0 V | 6.4 | 4.38 | 5.39 |
| 300 mV | 1.76 | 1.58 | 1.67 |
| 400 mV | 1.27 | 1.18 | 1.23 |

Table 5. Minimum average time between two consecutive accesses with CVG so that the leakage power savings offset the dynamic power needed for each access.

| Number of words implemented in a row | Minimum average time between two consecutive accesses (ns) |
|---|---|
| 4 | 41 |
| 8 | 16 |
| 16 | 4 |
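The break-even argument can be sketched with a simple energy balance: the leakage energy saved between two accesses must at least equal the dynamic energy spent activating the virtual ground line. The model and every number below are assumptions made for illustration. They are not the first-order analysis behind Table 5 and are not expected to reproduce its values, only the trend that more words per row shortens the break-even interval.

```python
def break_even_time_ns(e_access_fj, p_saved_per_cell_nw, bits_per_word, words_per_row):
    """Access interval (ns) at which saved leakage energy equals activation energy.

    Assumed model (hypothetical, not from the paper): each access spends
    e_access_fj to activate the CVG line of one word, while the other
    (words_per_row - 1) words in the row stay gated and every one of their
    cells saves p_saved_per_cell_nw of leakage power.
    """
    if words_per_row < 2:
        raise ValueError("at least two words per row are needed for any column to stay gated")
    gated_cells = (words_per_row - 1) * bits_per_word
    p_saved_nw = gated_cells * p_saved_per_cell_nw
    # Unit check: 1 fJ / 1 nW = 1e-6 s = 1000 ns.
    return e_access_fj / p_saved_nw * 1e3


# Hypothetical parameters (assumptions for illustration only).
E_ACCESS_FJ = 100.0        # dynamic energy per CVG activation
P_SAVED_PER_CELL_NW = 4.0  # leakage power saved per gated cell
BITS_PER_WORD = 32

times = {
    n: break_even_time_ns(E_ACCESS_FJ, P_SAVED_PER_CELL_NW, BITS_PER_WORD, n)
    for n in (4, 8, 16)
}
for n, t in times.items():
    print(f"{n:2d} words per row: break-even interval of about {t:.1f} ns")
```

If accesses arrive more often than the break-even interval, the activation energy dominates and gating costs more than it saves; if they arrive less often, the leakage savings win.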

6. Conclusion

The proposed 7T SRAM cell can be used as an alternative to the traditional 6T SRAM cell in cache memories to save power without compromising read stability. The decoupled read/write bit lines of the proposed cell prevent read noise-induced data upsets. The area of the cell layout is the same as that of the 6T SRAM cell, whereas other read-noise-robust cells can incur a large area overhead (30% in [2], 37% in [3]). The leakage power consumption of the proposed cell is also similar to that of the 6T cell, while it has 30% higher soft error critical charge. By utilizing column virtual grounding, the proposed cell not only reduces the leakage power but also supports multiple words per row. Multiple words per row enable bit interleaving and the use of simple error correction codes to reduce the soft error rate.

The proposed sense amplifier is particularly suitable for use with the proposed 7T SRAM cell. The sense amplifier, being of similar structure to the bit-cell, can be laid out with similar dimensions to the bit-cell itself. Thus, it can be pitch-matched with the cell array, even if one word per row is implemented.


Acknowledgment

This research was supported in part by the Discovery Grant of the Natural Sciences and Engineering Research Council (NSERC) of Canada and the Faculty Start-up Grant of the Concordia University Faculty of Engineering and Computer Science (ENCS).

References

[1] J.D. Schmidt, Integrated MOS random-access memory, Solid-State Design (1965) 21–25.

[2] L. Chang, R.K. Montoye, Y. Nakamura, K.A. Batson, An 8T-SRAM for variability tolerance and low-voltage operation in high-performance caches, IEEE Journal of Solid-State Circuits 43 (4) (2008) 956–963.

[3] Z. Liu, V. Kursun, Characterization of a novel nine-transistor SRAM cell, IEEE Transactions on Very Large Scale Integration (VLSI) Systems 16 (4) (2008) 488–492.

[4] A.J. Bhavnagarwala, S.V. Kosonocky, S.P. Kowalczyk, R.V. Joshi, Y.H. Chan, U. Srinivasan, et al., A transregional CMOS SRAM with single, logic VDD and dynamic power rails, Symposium on VLSI Circuits Digest of Technical Papers (2004) 292–293.

[5] K. Zhang, U. Bhattacharya, Z. Chen, F. Hamzaoglu, D. Murray, N. Vallepalli, et al., A 3-GHz 70-Mb SRAM in 65-nm CMOS technology with integrated column-based dynamic power supply, IEEE Journal of Solid-State Circuits 41 (1) (2006) 146–151.

[6] M. Wieckowski, S. Patil, M. Margala, Portless SRAM—a high-performance alternative to the 6T methodology, IEEE Journal of Solid-State Circuits 42 (11) (2007) 2600–2610.

[7] E. Seevinck, Static-noise margin analysis of MOS SRAM cells, IEEE Journal of Solid-State Circuits 22 (5) (1987) 748–754.

[8] G. Torrens, B. Alorda, S. Barcelo, J.L. Rossello, S.A. Bota, J. Segura, Design hardening of nanometer SRAMs through transistor width modulation and multi-Vt combination, IEEE Transactions on Circuits and Systems—II: Express Briefs 57 (4) (2010) 280–284.

[9] N. Shibata, A switched virtual-GND level technique for fast and low power SRAM’s, IEICE Trans. Electron. E80-C (1997) 1598–1607.

[10] R.C. Baumann, Soft errors in advanced computer systems, IEEE Design & Test of Computers 22 (3) (2005) 258–266.

[11] D. Krueger, E. Francom, J. Langsdorf, Circuit design for voltage scaling and SER immunity on a quad-core Itanium® processor, IEEE International Solid-State Circuits Conference (ISSCC) Digest of Technical Papers (2008) 94–95.

[12] P. Roche, J.M. Palau, C. Tavernier, G. Bruguier, R. Ecoffet, J. Gasiot, Determination of key parameters for SEU occurrence using 3-D full cell SRAM simulations, IEEE Transactions on Nuclear Science 46 (6) 1354–1362.

[13] P. Hazucha, C. Svensson, Impact of CMOS technology scaling on the atmospheric neutron soft error rate, IEEE Transactions on Nuclear Science 47 (6) (2000) 2586–2594.

[14] J.M. Palau, G. Hubert, K. Coulie, B. Sagnes, M.C. Calvet, S. Fourtine, Device simulation study of the SEU sensitivity of SRAMs to internal ion tracks generated by nuclear reactions, IEEE Transactions on Nuclear Science 48 (2) (2001) 225–231.

[15] S.M. Jahinuzzaman, J.S. Shah, D.J. Rennie, M. Sachdev, Design and analysis of a 5.3-pJ 64-kb gated ground SRAM with multiword ECC, IEEE Journal of Solid-State Circuits 44 (9) (2009) 2543–2553.

Wasim Hussain was born in Dhaka, Bangladesh, in 1983. He received the B.Sc. from Bangladesh University of Engineering and Technology (BUET) in 2008. He is currently pursuing an M.A.Sc. at Concordia University, Canada. He worked as a lecturer at BUET from August 2008 to December 2009. Since January 2010, he has been working as a research assistant at Concordia University. His research interests include low-power/high-speed SRAM design.

Shah M. Jahinuzzaman (S’03–M’08) received the B.Sc. degree (with honors) in electrical and electronic engineering from Bangladesh University of Engineering and Technology, Dhaka, Bangladesh, in 2002, and the M.A.Sc. and Ph.D. degrees in electrical and computer engineering from the University of Waterloo, Waterloo, ON, Canada, in 2004 and 2008, respectively.

He was an Assistant Professor in the Department of Electrical and Computer Engineering, Concordia University, Montreal, QC, Canada. Since December 2010, he has been with Logic Technology Development Q&R at Intel Corporation in Hillsboro, Oregon. His research interests include soft error tolerant low-power/high-speed digital circuits and embedded memories.