Semiconductor Memories Mohammad Sharifkhani. Outline Introduction Non-volatile memories.

Post on 13-Jan-2016


Transcript

Semiconductor Memories

Mohammad Sharifkhani

Outline

• Introduction

• Non-volatile memories

Semiconductor Memory Classification

• Read-Write Memory
  – Random Access: SRAM, DRAM
  – Non-Random Access: FIFO, LIFO, Shift Register, CAM
• Non-Volatile Read-Write Memory: EPROM, E2PROM, FLASH
• Read-Only Memory: Mask-Programmed, Programmable (PROM)

Memory Timing: Definitions

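[Figure: timing diagram of the READ, WRITE, and DATA signals, marking the read access time, read cycle, write access time, and write cycle, together with the points where data is valid and data is written.]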

Memory Architecture: Decoders

[Figure: two organizations of an N-word memory with M bits per word. Left: Word 0 to Word N-1, each a row of storage cells, driven directly by select signals S0 to SN-1. Right: the same array with a decoder that generates the select signals from address bits A0 to AK-1, where K = log2 N. Both connect to the Input-Output (M bits).]

Intuitive architecture for an N x M memory: too many select signals, since N words need N select signals.

A decoder reduces the number of select signals to K = log2 N address bits.
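The relation K = log2 N and the one-hot select signals S0 to SN-1 can be illustrated with a minimal Python sketch. The function names `address_bits` and `decode` are hypothetical and only model the logical behaviour of a decoder, not any circuit from the slides.

```python
def address_bits(n_words: int) -> int:
    """Number of address bits K needed to select one of n_words words."""
    if n_words < 1:
        raise ValueError("n_words must be positive")
    # (n - 1).bit_length() equals ceil(log2(n)) for n >= 1
    return (n_words - 1).bit_length()


def decode(address: int, n_words: int) -> list[int]:
    """Return the one-hot select signals S0..S(N-1) for a given address."""
    if not 0 <= address < n_words:
        raise ValueError("address out of range")
    return [1 if i == address else 0 for i in range(n_words)]


if __name__ == "__main__":
    print(address_bits(1024))  # address bits for a 1024-word memory
    print(decode(2, 8))        # exactly one select line is active
```

With a decoder, a 1024-word memory needs 10 address lines instead of 1024 externally supplied select signals.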

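Array-Structured Memory Architecture

Problem: ASPECT RATIO, or HEIGHT >> WIDTH

[Figure: array-structured memory. A row decoder driven by address bits An to An+m activates one of 2^m word lines. Each row holds M·2^n storage cells on bit lines. Sense amplifiers amplify the bit-line swing to rail-to-rail amplitude, and a column decoder driven by A0 to An-1 selects the appropriate word, which is passed to the Input-Output (M bits).]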
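As a rough illustration of the array organization, the following Python sketch splits a word address into a row part (for the row decoder) and a column part (for the column decoder). It assumes, as the figure labels suggest, that the low-order bits A0 to An-1 go to the column decoder and the remaining bits to the row decoder; the function name `split_address` is hypothetical.

```python
def split_address(address: int, column_bits: int, row_bits: int) -> tuple[int, int]:
    """Split a word address into (row, column) indices.

    Assumption: the low column_bits bits select the word within a row,
    the upper row_bits bits select the word line.
    """
    if not 0 <= address < (1 << (column_bits + row_bits)):
        raise ValueError("address out of range")
    column = address & ((1 << column_bits) - 1)
    row = address >> column_bits
    return row, column


if __name__ == "__main__":
    # 6-bit address, 4 row bits and 2 column bits: 16 rows of 4 words each
    print(split_address(0b101101, column_bits=2, row_bits=4))
```

Folding 2^n words into each row makes the array closer to square, which is the point of the array-structured organization.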

Hierarchical Memory Architecture

Advantages:
1. Shorter wires within blocks
2. Block address activates only 1 block => power savings

[Figure: hierarchical memory built from blocks Block 0 ... Block i ... Block P-1, addressed by row, column, and block addresses, with a block selector, control circuitry, global amplifier/driver, global data bus, and I/O.]

Block Diagram of 4 Mbit SRAM

[Figure: block diagram showing the clock generator; CS, WE buffer; I/O buffer; X-, Y-, and Z-address buffers; x1/x4 controller; predecoder and block selector; bit-line load; transfer gate; column decoder; sense amplifier and write driver.]

[Hirose90]

Contents-Addressable Memory

[Figure: CAM block diagram. A CAM array of 2^9 words x 64 bits with an address decoder, I/O buffers carrying the data (64 bits), comparand and mask registers, control logic receiving commands and the R/W address (9 bits), 2^9 validity bits, and a priority encoder.]
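The roles of the comparand, mask, validity bits, and priority encoder can be sketched behaviourally in Python. This is an illustrative model under stated assumptions, not the circuit from the slide: a mask bit of 1 is assumed to mean "compare this bit", and the priority encoder is assumed to return the lowest matching address. The name `cam_search` is hypothetical.

```python
from typing import Optional


def cam_search(words: list[int], valid: list[bool],
               comparand: int, mask: int) -> Optional[int]:
    """Return the lowest address whose valid word matches the comparand.

    Only bit positions where mask has a 1 take part in the comparison
    (assumed convention). Returns None when nothing matches.
    """
    for address, (word, is_valid) in enumerate(zip(words, valid)):
        # XOR exposes differing bits; the mask drops the don't-care positions
        if is_valid and ((word ^ comparand) & mask) == 0:
            return address
    return None


if __name__ == "__main__":
    words = [0xA5, 0x3C, 0xA7, 0xA5]
    valid = [False, True, True, True]
    print(cam_search(words, valid, 0xA5, 0xFF))  # full-width compare
    print(cam_search(words, valid, 0xA5, 0xF0))  # compare upper nibble only
```

In hardware all words are compared in parallel; the loop here only models the result the priority encoder would report.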

Memory Timing: Approaches

DRAM timing: multiplexed addressing. The row address and the column address share one address bus and are latched by RAS and CAS respectively (RAS-CAS timing).

SRAM timing: self-timed. An address transition on the address bus initiates the memory operation.

• Introduction

• Non-volatile memories

Non-Volatile Memories: The Floating-Gate Transistor (FAMOS)

[Figure: device cross-section (n+ source and drain in a p substrate, with a floating gate and a gate separated by oxide layers of thickness tox) and schematic symbol (G, S, D).]

Floating-Gate Transistor Programming

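[Figure: three panels, each showing the floating-gate transistor with its gate, floating-gate, source (S), and drain (D) voltages.]

Avalanche injection: 20 V on the gate and 20 V on the drain, with the floating gate going from 10 V to 5 V.

Removing programming voltage leaves charge trapped: 0 V on the gate and drain, -5 V on the floating gate.

Programming results in higher VT: 5 V on the gate and drain, -2.5 V on the floating gate.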

A “Programmable-Threshold” Transistor

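[Figure: drain current versus VGS for the “0”-state and the “1”-state, whose curves are separated by a threshold shift ΔVT. At VGS = VWL the “0”-state transistor is “ON” and the “1”-state transistor is “OFF”.]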

FLOTOX EEPROM

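[Figure: FLOTOX transistor cross-section (gate, floating gate, n+ source and drain, p substrate, oxide thicknesses of 20–30 nm and 10 nm) and Fowler-Nordheim I-V characteristic (current I versus VGD, with conduction setting in around -10 V and 10 V).]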

EEPROM Cell

[Figure: EEPROM cell connected to WL, BL, and VDD.]

Absolute threshold control is hard. An unprogrammed transistor might be depletion (always on), hence the 2-transistor cell.

Flash EEPROM

[Figure: flash EEPROM cell cross-section with control gate, floating gate, thin tunneling oxide, n+ source (erasure), n+ drain (programming), and p-substrate.]

Many other options …

Cross-sections of NVM cells

[Figure: cross-sections of an EPROM cell and a Flash cell. Courtesy Intel.]

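Basic Operations in a NOR Flash Memory: Erase

[Figure: cell array during erase. The source (S) is at 12 V, bit lines BL0 and BL1 are open, and word lines WL0 and WL1 are at 0 V.]

Basic Operations in a NOR Flash Memory: Write

[Figure: cell array during write. The selected cell has 12 V on its gate (G) and 6 V on its drain (D): WL0 = 12 V, WL1 = 0 V, BL0 = 6 V, BL1 = 0 V.]

Basic Operations in a NOR Flash Memory: Read

[Figure: cell array during read. The selected cell has 5 V on its gate (G) and 1 V on its drain (D): WL0 = 5 V, WL1 = 0 V, BL0 = 1 V, BL1 = 0 V.]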

NAND Flash Memory

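[Figure: NAND unit cell. A string of cells on word lines (poly) between two select lines, connected to the bit line (BL) at one end and the source line (diffusion layer) at the other. The cell gate stack consists of gate, ONO, floating gate (FG), and gate oxide. Courtesy Toshiba.]

NAND Flash Memory

[Figure: NAND array micrograph showing word lines, select transistor, bit-line contact, source-line contact, active area, and STI. Courtesy Toshiba.]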

Characteristics of State-of-the-art NVM

Outline

• Introduction

• Non-volatile memories

• RAM

Read-Write Memories (RAM)

STATIC (SRAM)
• Data stored as long as supply is applied
• Large (6 transistors/cell)
• Fast
• Differential

DYNAMIC (DRAM)
• Periodic refresh required
• Small (1-3 transistors/cell)
• Slower
• Single ended

6-transistor CMOS SRAM Cell

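[Figure: 6-transistor CMOS SRAM cell. Transistors M1-M4 form two cross-coupled inverters between VDD and ground that hold Q and its complement; access transistors M5 and M6, controlled by WL, connect the two storage nodes to BL and its complement.]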

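CMOS SRAM Analysis (Read)

[Figure: read operation. The cell stores Q = 1 with the complementary node at 0; both bit lines, each loaded by Cbit, are precharged to VDD and WL is raised to VDD. Transistors M1, M4, M5, and M6 are shown.]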

CMOS SRAM Analysis (Read)

[Figure: voltage rise (V), from 0 to 1.2, plotted against cell ratio (CR), from 0 to 3.]

CMOS SRAM Analysis (Write)

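[Figure: write operation. The bit lines are driven to complementary values (1 and 0) while the storage nodes hold 0 and 1; WL is at VDD. Transistors M1, M4, M5, and M6 are shown.]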

CMOS SRAM Analysis (Write)

6T-SRAM — Layout

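[Figure: 6T-SRAM cell layout showing the VDD and GND rails, WL, the bit-line pair, storage nodes Q and its complement, and transistors M1-M6.]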

Decreasing Word Line Delay

(a) Driving the word line from both sides: a driver at each end of the polysilicon word line.

(b) Using a metal bypass: a metal word line runs in parallel with the polysilicon word line and contacts it every K cells.

(c) Use silicides.

Resistance-load SRAM Cell

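• Static power dissipation: want RL large
• Bit lines precharged to VDD to address the tp problem

[Figure: resistance-load SRAM cell with load resistors RL to VDD, pull-down transistors M1 and M2, access transistors M3 and M4 controlled by WL, storage nodes Q and its complement, and the bit-line pair.]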

SRAM Characteristics

• Introduction

• Non-volatile memories

• RAM
  – SRAM
  – DRAM

3-Transistor DRAM Cell

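• No constraints on device ratios
• Reads are non-destructive
• Value stored at node X when writing a “1” = VWWL - VTn

[Figure: 3T DRAM cell (M1, M2, M3, storage capacitance CS at node X, write word line WWL, read word line RWL, bit lines BL1 and BL2) with waveforms for WWL, RWL, X (rising to VDD - VT), BL1, and BL2 (at VDD - VT, dropping by ΔV during a read).]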

3T-DRAM — Layout

[Figure: 3T-DRAM cell layout showing BL2, BL1, GND, RWL, WWL, and transistors M1, M2, M3.]

1-Transistor DRAM Cell

Write: CS is charged or discharged by asserting WL and BL.

Read: charge redistribution takes place between the bit line and the storage capacitance:

$$\Delta V = V_{BL} - V_{PRE} = (V_{BIT} - V_{PRE}) \frac{C_S}{C_S + C_{BL}}$$

Voltage swing is small; typically around 250 mV.

[Figure: 1T DRAM cell (M1, CS, WL, BL, CBL) with waveforms for "Write 1" and "Read 1": WL, node X (reaching VDD - VT), and BL (precharged to VDD/2 and shifting by ΔV before sensing).]
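The charge-redistribution read-out can be checked numerically with a short Python sketch of ΔV = (VBIT - VPRE) · CS / (CS + CBL). The capacitance and voltage values below are assumed for illustration only and are not taken from the slides; `bitline_swing` is a hypothetical helper name.

```python
def bitline_swing(v_bit: float, v_pre: float, c_s: float, c_bl: float) -> float:
    """Bit-line voltage change after the cell shares charge with the bit line.

    v_bit: voltage stored on the cell capacitor CS
    v_pre: bit-line precharge voltage
    c_s, c_bl: storage and bit-line capacitances (same unit)
    """
    return (v_bit - v_pre) * c_s / (c_s + c_bl)


if __name__ == "__main__":
    # Assumed example: CS = 30 fF, CBL = 270 fF, precharge at 1.25 V
    print(bitline_swing(2.5, 1.25, 30e-15, 270e-15))  # reading a stored 1
    print(bitline_swing(0.0, 1.25, 30e-15, 270e-15))  # reading a stored 0
```

Because CBL is usually much larger than CS, the swing is only a small fraction of the stored voltage difference, which is why a sense amplifier is needed.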

DRAM Cell Observations

• 1T DRAM requires a sense amplifier for each bit line, due to charge redistribution read-out.
• DRAM memory cells are single ended, in contrast to SRAM cells.
• The read-out of the 1T DRAM cell is destructive; read and refresh operations are necessary for correct operation.
• Unlike the 3T cell, the 1T cell requires the presence of an extra capacitance that must be explicitly included in the design.
• When writing a “1” into a DRAM cell, a threshold voltage is lost. This charge loss can be circumvented by bootstrapping the word lines to a higher value than VDD.

Sense Amp Operation

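[Figure: bit-line voltage VBL versus time t. Starting from VPRE, the bit line shifts by ΔV(1) after the word line is activated, then is driven to V(1) or V(0) after the sense amp is activated.]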

1-T DRAM Cell

• Uses polysilicon-diffusion capacitance
• Expensive in area

[Figure: cross-section (metal word line, polysilicon gate and polysilicon plate over SiO2, n+ regions, field oxide, inversion layer induced by plate bias) and layout (M1 word line, diffused bit line, polysilicon gate, polysilicon plate, capacitor).]

[Figure: SEM of poly-diffusion capacitor 1T-DRAM.]

Advanced 1T DRAM Cells

[Figure: trench cell (cell plate Si, capacitor insulator, storage node poly, 2nd field oxide, refilling poly, Si substrate) and stacked-capacitor cell (capacitor dielectric layer, cell plate, word line, insulating layer, isolation, transfer gate, storage electrode).]

Static CAM Memory Cell

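[Figure: array of CAM cells sharing a Bit line pair per column and a Word line per row, with a wired-NOR match line (Match) per word. The cell schematic shows transistors M1-M9, internal node int, and storage node S with its complement.]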

CAM in Cache Memory

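[Figure: cache organization. The address tag goes through input drivers into a CAM array; a hit selects a row in the SRAM array, which connects to the data lines through sense amps / input drivers under R/W control.]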

• Introduction

• Non-volatile memories

• RAM

• Periphery circuits

Periphery

• Decoders
• Sense Amplifiers
• Input/Output Buffers
• Control / Timing Circuitry

Row Decoders

Collection of 2^M complex logic gates, organized in regular and dense fashion.

(N)AND Decoder

NOR Decoder

Hierarchical Decoders

Multi-stage implementation improves performance.

[Figure: NAND decoder using 2-input pre-decoders. The address pairs (A0, A1) and (A2, A3) and their complements are predecoded into four lines each, and combinations of these lines drive WL0, WL1, ...]

Dynamic Decoders

[Figure: 2-input NOR decoder and 2-input NAND decoder, both dynamic, with precharge devices to VDD, inputs A0 and A1 and their complements, and outputs WL0 to WL3.]

Active low inputs (all are high except for the selected WL which is low)

4-input pass-transistor based column decoder

Advantages: speed (tpd does not add to overall memory access time); only one extra transistor in signal path

Disadvantage: large transistor count

[Figure: a predecoder driven by A0 and A1 generates S0 to S3, which gate pass transistors connecting BL0 to BL3 to the data line D.]

4-to-1 tree based column decoder

• Number of devices drastically reduced
• Delay increases quadratically with # of sections; prohibitive for large decoders

Solutions:
• buffers
• progressive sizing
• combination of tree and pass transistor approaches

[Figure: BL0 to BL3 routed to D through a binary tree of pass transistors controlled by A0, A1 and their complements.]
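A behavioural Python sketch of the tree-based column decoder may help: each level halves the number of candidate bit lines using one address bit, so a 2^K-input tree places K pass transistors in series and uses 2(2^K - 1) pass transistors in total. The function names are hypothetical, and using A0 at the level nearest the bit lines is an assumption.

```python
def tree_select(bit_lines: list, address: int):
    """Route one of 2**K bit lines to the output D through K tree levels."""
    n = len(bit_lines)
    if n == 0 or n & (n - 1):
        raise ValueError("number of bit lines must be a power of two")
    if not 0 <= address < n:
        raise ValueError("address out of range")
    level = list(bit_lines)
    bit = 0
    while len(level) > 1:
        a = (address >> bit) & 1
        # In each pair, the transistor gated by A_bit (or its complement) conducts
        level = [level[i + a] for i in range(0, len(level), 2)]
        bit += 1
    return level[0]


def tree_transistor_count(k: int) -> int:
    """Pass transistors in a 2**k-input tree: 2**k + 2**(k-1) + ... + 2."""
    return 2 * (2 ** k - 1)


if __name__ == "__main__":
    print(tree_select(["BL0", "BL1", "BL2", "BL3"], 2))
    print(tree_transistor_count(2))
```

The loop count equals the number of series transistors in the signal path, which is the source of the delay penalty noted above.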

Decoder for circular shift-register

[Figure: circular shift register used as a decoder. Successive stages drive WL0, WL1, WL2, ...; each stage has a reset input R and is clocked by φ and its complement.]

Sense Amplifiers

$$t_p = \frac{C \cdot \Delta V}{I_{av}}$$

C is large and Iav is small, so make ΔV as small as possible.

Idea: use a sense amplifier (s.a.), which turns a small transition at its input into a full swing at its output.
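The effect of reducing ΔV in tp = C · ΔV / Iav can be seen with a quick calculation. The values below (1 pF bit-line capacitance, 100 µA average cell current) are assumed for illustration and do not come from the slides; `discharge_time` is a hypothetical helper.

```python
def discharge_time(c: float, delta_v: float, i_av: float) -> float:
    """Time to move a capacitance c by delta_v with average current i_av."""
    return c * delta_v / i_av


if __name__ == "__main__":
    c_bl = 1e-12    # assumed bit-line capacitance, 1 pF
    i_cell = 100e-6  # assumed average cell current, 100 uA
    full = discharge_time(c_bl, 2.5, i_cell)    # full-rail swing
    small = discharge_time(c_bl, 0.25, i_cell)  # swing resolved by a sense amp
    print(full, small, full / small)
```

Under these assumptions, sensing a swing ten times smaller shortens the bit-line delay by the same factor.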

Differential Sense Amplifier

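Directly applicable to SRAMs.

[Figure: differential sense amplifier with transistors M1-M5 between VDD and ground, inputs bit and its complement, enable SE, and output Out.]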

Differential Sensing: SRAM

[Figure: (a) SRAM sensing scheme: bit-line pair with precharge (PC) and equalization (EQ) devices to VDD, SRAM cell i on WLi, and a differential sense amp producing the output. (b) Two-stage differential amplifier built from transistors M1-M5, enabled by SE.]

Latch-Based Sense Amplifier (DRAM)

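• Initialized in its meta-stable point with EQ
• Once an adequate voltage gap is created, the sense amp is enabled with SE
• Positive feedback quickly forces the output to a stable operating point

[Figure: cross-coupled latch between the bit-line pair, with equalization device EQ and enable devices SE connecting to VDD and ground.]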

Charge-Redistribution Amplifier

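[Figure: (a) Concept: transistor M1, with its gate at Vref, separates a large capacitance Clarge at node VL from a small capacitance Csmall at node VS; M2 and M3 are also shown. (b) Transient response: Vin, VL, and VS (0 to 2.5 V) versus time (0 to 3 nsec), with Vref = 3 V.]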

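Charge-Redistribution Amplifier: EPROM

[Figure: EPROM read path. An EPROM array cell M1 on WL and BL (capacitance CBL), column decoder M2 on WLC (capacitance Ccol), cascode device M3 biased by Vcasc, and load M4 enabled by SE from VDD, with output Out loaded by Cout.]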

Single-to-Differential Conversion

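[Figure: a cell on bit line BL, selected by WL, drives one input of a differential sense amplifier whose other input is Vref; the amplifier produces the output.]

How to make a good Vref?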

Open bitline architecture with dummy cells

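[Figure: open bit-line architecture. Cells with storage capacitance CS on word lines L, L1, L0 (left) and R0, R1 (right) connect to bit lines BLL and BLR on either side of a latch-type sense amplifier with SE and EQ; each side also has a dummy cell.]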

DRAM Read Process with Dummy Cell

[Figure: three plots of voltage (0 to 3 V) versus t (0 to 3 ns): the bit-line pair when reading 0, the bit-line pair when reading 1, and the control signals EQ, WL, and SE.]

Voltage Regulator

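[Figure: voltage regulator. An amplifier compares VREF with the regulated output VDL and sets Vbias on the gate of Mdrive, which supplies VDL from VDD. An equivalent model is shown alongside.]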

Charge Pump

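[Figure: charge pump with clock CLK, nodes A and B, transistors M1 and M2, pump capacitor Cpump, and load (Vload, Cload). Waveforms: VB swings between VDD - VT and 2VDD - VT; Vload rises from 0 V.]

Q = Cpump (VDD - VT)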

DRAM Timing

SDRAM Timing

A chunk of data is processed at the same time; this is effective when data is written in large sequential blocks.

RDRAM Architecture

[Figure: RDRAM architecture with a memory array, row and column demultiplexers with packet decoders, clocks, data bus, and bus interface.]

Rambus DRAM, to reduce the access time:
• Synchronous DRAM
• Operates at µP clock speed
• Up to 1.6 GB/sec bandwidth
• Highly parallel: a large number of bits can be read or written at the same time
• Interface: fast and synchronous

Address Transition Detection

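[Figure: address transition detection. Each address input A0, A1, ..., AN-1 passes through a DELAY td element, and the resulting per-input pulses are combined on a shared node pulled up to VDD to generate the ATD signal.]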

• Introduction

• Non-volatile memories

• RAM

• Periphery

• Reliability

Reliability and Yield

Sensing Parameters in DRAM

[Figure: CD (fF), CS (fF), QS (fC), Vsmax (mV), and VDD (V) on a log scale from 10 to 1000, versus memory capacity (bits/chip) from 4K to 64G. From [Itoh01].]

$$Q_S = C_S V_{DD} / 2, \qquad V_{smax} = \frac{Q_S}{C_S + C_D}$$
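The two sensing relations, QS = CS · VDD / 2 and Vsmax = QS / (CS + CD), can be evaluated with a small Python sketch. The component values are assumed for illustration and are not read off the [Itoh01] plot; the function names are hypothetical.

```python
def stored_charge(c_s: float, v_dd: float) -> float:
    """Signal charge QS = CS * VDD / 2 for a half-VDD precharge scheme."""
    return c_s * v_dd / 2


def max_signal(c_s: float, c_d: float, v_dd: float) -> float:
    """Maximum sense signal Vsmax = QS / (CS + CD)."""
    return stored_charge(c_s, v_dd) / (c_s + c_d)


if __name__ == "__main__":
    # Assumed values: CS = 30 fF, CD = 270 fF, VDD = 2 V
    print(stored_charge(30e-15, 2.0))       # charge in coulombs
    print(max_signal(30e-15, 270e-15, 2.0))  # signal in volts
```

Lowering VDD reduces both QS and Vsmax unless CS is kept up or CD is reduced, which is the scaling pressure the plot describes.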

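Noise Sources in 1T DRAM

[Figure: 1T DRAM cell (WL, BL, CS) with its noise sources: word-line-to-bit-line coupling CWBL, cross coupling Ccross to the adjacent BL, the substrate, the electrode, leakage, and α-particles.]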

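Open Bit-line Architecture: Cross Coupling

[Figure: sense amplifier with one bit line on each side, each loaded by CBL and equalized by EQ. Cells (C) on word lines WL0 and WL1 and dummy word lines WLD couple into the bit lines through CWBL.]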

Folded-Bitline Architecture

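[Figure: folded-bitline architecture. Both bit lines of a pair, each loaded by CBL, run side by side into the sense amplifier and are equalized by EQ. Cells (C) on word lines WL0 and WL1 and dummy word lines WLD couple into both bit lines through CWBL; nodes x and y are marked.]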

Transposed-Bitline Architecture

[Figure: (a) straightforward bit-line routing and (b) transposed bit-line architecture. In both, the bit-line pair entering the sense amplifier (SA) couples through Ccross to the neighbouring bit lines BL' and BL''.]

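Alpha-particles (or Neutrons)

[Figure: an α-particle passing through the SiO2 and the n+ storage node of a cell (WL, BL, VDD), generating positive and negative charge carriers along its track.]

1 Particle ~ 1 Million Carriers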

Yield

Yield curves at different stages of process maturity (from [Veendrick92])

Redundancy

[Figure: memory array with redundant rows and redundant columns, a column decoder, row and column address inputs, and a fuse bank.]

Error-Correcting Codes

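Example: Hamming codes

e.g. B3 wrong: the parity checks evaluate to 1, 1, 0 (least-significant first), i.e. the value 3, which identifies bit 3 as the faulty bit.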
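The "B3 wrong gives 3" example can be reproduced with a small Python sketch of a (7,4) Hamming code. The parity equations on the slide did not survive in the transcript, so the sketch assumes the standard ones: P1 covers B3, B5, B7; P2 covers B3, B6, B7; P4 covers B5, B6, B7. All function names are hypothetical.

```python
def encode(b3: int, b5: int, b6: int, b7: int) -> list[int]:
    """Return the codeword [P1, P2, B3, P4, B5, B6, B7] (bit positions 1..7)."""
    p1 = b3 ^ b5 ^ b7
    p2 = b3 ^ b6 ^ b7
    p4 = b5 ^ b6 ^ b7
    return [p1, p2, b3, p4, b5, b6, b7]


def syndrome(word: list[int]) -> int:
    """Position (1..7) of a single-bit error, or 0 when all checks pass."""
    p1, p2, b3, p4, b5, b6, b7 = word
    s1 = p1 ^ b3 ^ b5 ^ b7
    s2 = p2 ^ b3 ^ b6 ^ b7
    s4 = p4 ^ b5 ^ b6 ^ b7
    return s1 + 2 * s2 + 4 * s4


def correct(word: list[int]) -> list[int]:
    """Flip the bit indicated by the syndrome (single-error correction)."""
    fixed = list(word)
    position = syndrome(word)
    if position:
        fixed[position - 1] ^= 1
    return fixed


if __name__ == "__main__":
    good = encode(1, 0, 1, 1)
    bad = list(good)
    bad[2] ^= 1  # corrupt B3 (position 3)
    print(syndrome(bad), correct(bad) == good)
```

Each single-bit error produces a syndrome equal to its own bit position, so correction is a single bit flip.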

Redundancy and Error Correction

Sources of Power Dissipation in Memories

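[Figure: sources of power dissipation in a memory chip with an array of m columns and n rows, supplied between VDD and VSS. Array: the m selected cells draw m·iact and the non-selected cells draw m(n-1)·ihld. Row decoder: n·CDE·VINT·f. Column decoder: m·CDE·VINT·f. Periphery: CPT·VINT·f plus the static current IDCP.]

$$I_{DD} = \sum C_i \, \Delta V_i \, f + \sum I_{DCP}$$

From [Itoh00]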
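The supply-current expression IDD = Σ Ci ΔVi f + Σ IDCP can be evaluated with a few lines of Python. The capacitances, swings, frequency, and static current below are assumed numbers chosen only to show the arithmetic; `supply_current` is a hypothetical helper.

```python
def supply_current(switched: list[tuple[float, float]],
                   dc_currents: list[float], f: float) -> float:
    """IDD = sum(C_i * dV_i) * f + sum(I_DCP).

    switched: (capacitance in F, voltage swing in V) per switching node group
    dc_currents: static currents in A
    f: operating frequency in Hz
    """
    dynamic = sum(c * dv for c, dv in switched) * f
    return dynamic + sum(dc_currents)


if __name__ == "__main__":
    # Assumed: 100 pF swinging 2 V, 50 pF swinging 1 V, 1 mA static, 100 MHz
    print(supply_current([(100e-12, 2.0), (50e-12, 1.0)], [1e-3], 100e6))
```

The dynamic term scales with frequency, while the static term remains even when the memory is idle.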

Data Retention in SRAM

[Figure: leakage current Ileakage (100n to 1.30u) versus VDD (0 to 1.80 V) for 0.13 µm CMOS and 0.18 µm CMOS, with a factor of 7 between the two curves.]

SRAM leakage increases with technology scaling

Suppressing Leakage in SRAM

[Figure: two columns of SRAM cells on internal rails VDD,int and VSS,int, controlled by sleep signals. One scheme places a low-threshold transistor between the supply and the internal rail; the other switches the internal supply between VDD and VDDL.]

Two approaches: reducing the supply voltage, and inserting extra resistance.

Data Retention in DRAM

[Figure: current (A), log scale from 10^1 down to 10^-6, versus capacity (bit) from 16M to 64G. Secondary axes give the operating voltage (3.3, 2.5, 2.0, 1.5, 1.2, 1.0, 0.8 V) and the extrapolated threshold voltage at 25 °C (0.53, 0.40, 0.32, 0.24, 0.19, 0.16, 0.13 V). Curves: IACT, IAC, IDC. Conditions: cycle time 150 ns, T = 75 °C.]

From [Itoh00]

Case Studies

• SRAM

• Flash Memory

4 Mbit SRAM: Hierarchical Word-line Architecture

[Figure: a global word line feeds sub-global word lines through block group select; block select signals then drive the local word lines and memory cells of Block 0, Block 1, Block 2, ...]

Bit-line Circuitry

[Figure: bit-line circuitry with bit-line load, block select, ATD, BEQ, local WL, memory cell, B/T, CD, the I/O line pair, and the sense amplifier.]

Sense Amplifier (and Waveforms)

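[Figure: sense amplifier connected to the I/O line pair, with block select (BS), ATD, SEQ, the SA output pair, Dei, and DATA. Waveforms, each between Vdd and GND: address, ATD, BEQ, SEQ, I/O lines, SA output pair, DATA, and data-cut.]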

1 Gbit Flash Memory

[Figure: chip architecture with two 512Mb memory arrays, each holding Block0 to Block1023 and flanked by word line drivers, with sense latches (1024 + 32) x 8 and data caches (1024 + 32) x 8 per array. Bit lines run BL0 to BL16895 and BL16896 to BL33791. Each block spans SGD, WL31 to WL0, and SGS. A bit line control circuit (BLT0, BLT1) connects to the I/O.]

From [Nakamura02]

Writing Flash Memory

[Figure: (a) Evolution of thresholds: number of memory cells versus Vt of memory cells (0 V to 4 V), showing the result of 4 times program, with verify level = 0.8 V and word-line level = 4.5 V. (b) Final distribution: number of memory cells (10^0 to 10^8) versus Vt of memory cells (0 V to 4 V).]

From [Nakamura02]

125mm2 1Gbit NAND Flash Memory

[Figure: die photograph, 11.7 mm x 10.7 mm, showing the 2kB page buffer & cache, the charge pump, 16896 bit lines, and 32 word lines x 1024 blocks.]

From [Nakamura02]

125mm2 1Gbit NAND Flash Memory

• Technology: 0.13 µm p-sub CMOS triple-well, 1 poly, 1 polycide, 1 W, 2 Al
• Cell size: 0.077 µm²
• Chip size: 125.2 mm²
• Organization: 2112 x 8b x 64 page x 1k block
• Power supply: 2.7 V-3.6 V
• Cycle time: 50 ns
• Read time: 25 µs
• Program time: 200 µs / page
• Erase time: 2 ms / block
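The organization figure of 2112 x 8b x 64 page x 1k block can be cross-checked against the 16896 bit lines quoted for this chip with a short Python calculation. The constant names are hypothetical, and the program-throughput line simply divides one page by the quoted 200 µs page program time, ignoring data transfer overhead.

```python
BYTES_PER_PAGE = 2112
BITS_PER_BYTE = 8
PAGES_PER_BLOCK = 64
BLOCKS = 1024

# One page is read or programmed in parallel, one bit per bit line
BIT_LINES = BYTES_PER_PAGE * BITS_PER_BYTE

# Total number of cells (bits) on the chip
TOTAL_BITS = BIT_LINES * PAGES_PER_BLOCK * BLOCKS

# Program throughput if a full page is written every 200 us (overhead ignored)
PROGRAM_BYTES_PER_SECOND = BYTES_PER_PAGE / 200e-6

if __name__ == "__main__":
    print(BIT_LINES)                 # bit lines per page
    print(TOTAL_BITS)                # total bits
    print(TOTAL_BITS / 2 ** 30)      # capacity in Gbit (2**30 bits)
    print(PROGRAM_BYTES_PER_SECOND)  # bytes per second
```

The total comes out slightly above 2^30 bits because each page is 2112 rather than 2048 bytes.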

From [Nakamura02]

Semiconductor Memory Trends (up to the 90’s)

Memory Size as a function of time: x 4 every three years

Semiconductor Memory Trends (updated)

From [Itoh01]

Trends in Memory Cell Area

From [Itoh01]

Future generations

• Very specialized technologies for stand-alone memories are expensive

• Reliability (SER) is going to be a very important issue, particularly for SRAMs and DRAMs

• Power is going to be the limiting factor, particularly when it comes to standby currents

• Embedded memories are the prominent market thrust, driven by all mobile/SoC applications
