Reliability as a challenge & opportunity to technology scaling
Jose Maiz
Fellow, Technology and Manufacturing Group
Director of Logic Technology Quality & Reliability
2007 Salishan HPC Conference, April 26, 2007
Salishan HPC 2007, J. Maiz 2

Key Messages

Technology scaling continues according to Moore’s Law
– 2X increase in functionality every 2 years
– In the form of cores, integrated functionality or both
– 65nm in 2005, 45nm in 2007, 32nm in 2009

Technology & reliability challenges are many, but so are the opportunities
– Many new device types and materials
– A challenge as well as an opportunity

High RAS will require global fault management strategies along with robust circuit design
– Better understanding needed of RAS requirements
– Research and cost effectiveness of proposed options


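Technology scaling on a 2-year cycle

– 180 nm (1999): 200 mm wafers; transistor: 100 nm LG, CoSi2; interconnect: 6 Al layers, SiOF
– 130 nm (2001): 300 mm wafers; transistor: 70 nm LG, CoSi2; interconnect: 6 Cu layers, SiOF
– 90 nm (2003): 300 mm wafers; transistor: 50 nm LG, NiSi, strained Si (SiGe); interconnect: 7 Cu layers, low-k
– 65 nm (2005): 300 mm wafers; transistor: 35 nm LG, NiSi, 2nd-generation strain; interconnect: 8 Cu layers, low-k
– 45 nm (2007): details coming!

Courtesy: Intel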


Moore’s Law Delivers Value to the End User

With permission from IC Knowledge LLC

Twice the functionality at the same cost every 2 years


Performance/Watt

Performance/watt improvement for both integer and floating point

[Figure: Trend in performance/watt relative to the i386, 1985 to 2006. Y-axis: MIPS/W relative to i386 (log scale, 1 to 1000); series: Int MIPS/W and FP MIPS/W; processors marked: i386, i486, P5, P6, PII, PIII, P4, Core® 2]


Lead in 45nm Technology and Products

153 Mbit SRAM, 0.346 μm² cell
119 mm² chip size, >1B transistors
Functional in Jan 2006

Intel® Penryn Core™2 family processor
410/820 M transistors (2C/4C)
World’s first working 45 nm CPU
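As a rough check on the SRAM figures above, the bit-cell array alone can be sized by multiplying bit count by cell area. This is a sketch, not an Intel figure. It ignores array periphery such as decoders, sense amplifiers and redundancy. The slide also does not say whether "Mbit" means 10^6 or 2^20 bits, so the code computes both.

```python
def bitcell_area_mm2(n_bits: int, cell_area_um2: float) -> float:
    """Total area of the bit cells alone, in mm^2 (1 mm^2 = 1e6 um^2)."""
    return n_bits * cell_area_um2 / 1e6

if __name__ == "__main__":
    die_mm2 = 119.0    # chip size quoted on the slide
    cell_um2 = 0.346   # 45nm 6T SRAM cell area quoted on the slide
    # "153 Mbit" interpreted both ways, since the slide does not specify
    for label, n_bits in (("10^6", 153 * 10**6), ("2^20", 153 * 2**20)):
        area = bitcell_area_mm2(n_bits, cell_um2)
        print(f"Mbit = {label}: {area:.1f} mm^2 of cells, {area / die_mm2:.0%} of the die")
```

Under either reading the cells occupy about 53 to 56 mm², a little under half of the 119 mm² die. The remainder holds everything else on the chip.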


45 nm Yield Improvement Trend

Excellent yield learning and good reliability too
On track for production ramp in 2H ’07

[Figure: Defect density (log scale) vs. year, 2000 to 2008, for 0.13 µm, 90 nm, 65 nm and 45 nm; successive nodes about 2 years apart]


SRAM Cell Size Trend

The trend continues (source: M. Bohr, Intel)

45nm 6-transistor SRAM cell area of 0.346 μm²


Gate Leakage scaling

[Figure: Gate leakage (A/cm², log scale, 0.01 to 1000) vs. electrical Tox (nm, 1.0 to 2.2) for SiO2 and high-k (HK) dielectrics]

High-k + Metal Gate Transistors

Gate stack: low resistance layer; work function metal (different for NMOS and PMOS); hafnium-based high-k dielectric; silicon substrate

Integrated 45 nm CMOS process:
– High performance
– Low leakage
– Meets reliability requirements
– Manufacturable in high volume

Benefit of high-k vs. SiO2:
– Faster transistors: 60% greater capacitance
– Cooler chips: >100x reduction in gate leakage
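The "60% greater capacitance" benefit follows from the parallel-plate relation C/A = ε0·k/t. A film with higher k can be physically thicker, which suppresses tunneling leakage, and still deliver more capacitance than SiO2. The sketch below uses k_SiO2 = 3.9. The other inputs are assumed, illustrative values rather than numbers from the slide: k = 20 for the hafnium-based film and a 1.2 nm SiO2 reference thickness.

```python
# Parallel-plate view of a gate dielectric (ignores quantum and depletion effects).
EPS0 = 8.854e-12   # vacuum permittivity, F/m
K_SIO2 = 3.9       # relative permittivity of SiO2

def cap_per_area(t_phys_nm: float, k: float) -> float:
    """Gate capacitance per unit area (F/m^2) for a film of physical thickness t_phys_nm."""
    return EPS0 * k / (t_phys_nm * 1e-9)

def eot_nm(t_phys_nm: float, k: float) -> float:
    """Equivalent oxide thickness: the SiO2 thickness giving the same capacitance."""
    return t_phys_nm * K_SIO2 / k

def hk_thickness_for_gain(t_sio2_nm: float, k: float, cap_gain: float) -> float:
    """Physical high-k thickness giving cap_gain times the capacitance of t_sio2_nm of SiO2."""
    return (k / K_SIO2) * t_sio2_nm / cap_gain

if __name__ == "__main__":
    t_sio2 = 1.2   # nm, illustrative SiO2 thickness (assumption)
    k_hf = 20.0    # assumed relative permittivity of a hafnium-based film
    t_hk = hk_thickness_for_gain(t_sio2, k_hf, 1.6)
    ratio = cap_per_area(t_hk, k_hf) / cap_per_area(t_sio2, K_SIO2)
    print(f"high-k film: {t_hk:.2f} nm physical, EOT {eot_nm(t_hk, k_hf):.2f} nm, C ratio {ratio:.2f}")
```

With these assumptions the high-k film is roughly three times thicker physically than the SiO2 it replaces, yet it still provides 1.6x the capacitance. Tunneling current falls steeply with physical thickness, which is the mechanism behind the leakage reduction.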


Very High Innovation Rate

Platform integration
– Power & form factor optimization

Chip architecture
– Efficient performance/power with Core™2
– Multi-core
– Monolithic integration of graphics, memory controller, etc.

Transistor architecture
– Novel transistor architectures for high-k/metal gate
– Tri-gate transistors and III-V integration in the future

Materials
– High-k/metal gate transistors for performance & low power
– Low-k ILDs for interconnect
– Novel materials for strain and electrical performance


Many Reliability Challenges
• Increased electric fields
• The shrinking Vmax-Vmin window
• The development of robust high-k/metal gate transistors
• Dimensional scaling of interconnects and their liners
• Thermo-mechanical limitations of very low-k ILDs
• Soft errors
• Defectivity with scaled technology
• Transient and intermittent errors
• Fault tolerance

Innovation needed more than ever


Gate Dielectric Field Trend

[Figure: Gate dielectric electric field (MV/cm, 0 to 10) vs. technology node (1000 nm down to 45 nm) for SiON, SiON with HK, and HK dielectrics]

Substantial increases in E-field enabled by HK/MG

[Figure: HK+MG transistor cross-section: source (S) and drain (D) in the silicon substrate, high-k dielectric, metal gate, low R layer]
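To first order, the fields on the chart are voltage divided by dielectric thickness. When thickness scales faster than supply voltage, the field climbs. The sketch below shows that arithmetic. The voltage and thickness pairs are hypothetical. For high-k stacks the thickness would be taken as the SiO2-equivalent (EOT), which is an assumption about how such charts normalize HK points.

```python
def field_mv_per_cm(v_volts: float, t_nm: float) -> float:
    """Average field (MV/cm) across a dielectric of thickness t_nm (nm) at v_volts (V)."""
    t_cm = t_nm * 1e-7            # 1 nm = 1e-7 cm
    return v_volts / t_cm / 1e6   # V/cm -> MV/cm

if __name__ == "__main__":
    # Hypothetical (voltage, thickness) pairs for an older and a newer node
    for v, t in ((1.5, 3.0), (1.1, 1.2)):
        print(f"{v} V across {t} nm -> {field_mv_per_cm(v, t):.1f} MV/cm")
```

In this example, a roughly 27% voltage reduction set against a 60% thickness reduction nearly doubles the field.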


Vmax-Vmin window challenge

Vmax: scaling for density, performance & power
Vmin: transistor variability & increased bit count

[Figure: Vmax-Vmin trend, voltage (V) vs. technology node (180 to 25 nm), showing Vmax together with the Vmin of SRAM cell types 1 and 2: a shrinking operating window]


Transistor variability impacts Vmin

[Figure, left: σ(VT) at time 0 (mV, 0 to 40) vs. 1/sqrt(gate oxide area) (µm⁻¹, 0 to 20) for P1264 NMOS/PMOS, P1266 PMOS and P1264/P1266/P1268 LV devices (S. Pae, Dec-19-2006). Right: VT standard deviation vs. 1/sqrt(gate oxide area) for 65nm, 45nm NMOS and 45nm PMOS, with technology scaling indicated]

• Variability is due to random dopant fluctuation and other process parameters
• Develop design techniques that can handle variability
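Plotting σ(VT) against 1/sqrt(gate area) is the usual way to test Pelgrom's area law, σ = A_VT / sqrt(W·L). Under that law the data fall on a line through the origin, and the slope A_VT is a per-technology matching coefficient. The sketch below assumes the law holds. Pelgrom's law is usually stated for the mismatch of a device pair, and the slides do not say whether single-device or pair sigma is plotted. The A_VT value in the example is hypothetical.

```python
import math
from typing import Sequence

def pelgrom_sigma_mv(a_vt_mv_um: float, w_um: float, l_um: float) -> float:
    """Random VT sigma (mV) from the area law sigma = A_VT / sqrt(W * L)."""
    return a_vt_mv_um / math.sqrt(w_um * l_um)

def fit_a_vt(inv_sqrt_area: Sequence[float], sigma_mv: Sequence[float]) -> float:
    """Least-squares slope through the origin of sigma vs 1/sqrt(area): an A_VT estimate."""
    num = sum(x * y for x, y in zip(inv_sqrt_area, sigma_mv))
    den = sum(x * x for x in inv_sqrt_area)
    return num / den

if __name__ == "__main__":
    a_vt = 2.5  # mV*um, hypothetical matching coefficient
    # Shrinking W and L by 0.7x cuts area to 0.49x and raises sigma by 1/0.7 (about 1.43x)
    print(pelgrom_sigma_mv(a_vt, 0.10, 0.05), pelgrom_sigma_mv(a_vt, 0.07, 0.035))
```

Under this model, each linear shrink raises random VT spread unless the process lowers A_VT. That is one reason the slide calls for design techniques that tolerate variability.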


Vmin impacted by bit count

Impacted by variability and defectivity
Improved process and cell upsizing help
Need robust manufacturing process now and fault tolerance techniques in the future

[Figure: Defective bits vs. voltage: number of defective bits (log scale, 1 to 1000) in a 6 MByte cache vs. Vcc. Increased bit count raises Vmin; upsizing & improved technology pull it back; a 100 mV interval is marked]
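Why does Vmin rise with bit count? If each cell's voltage margin is roughly Gaussian and cells fail independently, an array of N bits is limited by its worst cell. The margin needed therefore grows with N. The sketch below finds how many sigma of margin keep the expected number of failing bits at or below one. It assumes independent Gaussian margins, so it ignores the defect-driven tails the slide also mentions. It also assumes 1 MByte = 2^20 bytes. Converting sigma to millivolts would need a per-technology mV-per-sigma figure, which is not given here.

```python
import math

def gaussian_tail(z: float) -> float:
    """P(X > z) for a standard normal X."""
    return 0.5 * math.erfc(z / math.sqrt(2.0))

def sigma_needed(n_bits: int, max_expected_fails: float = 1.0) -> float:
    """Smallest margin z (in sigma) with n_bits * P(X > z) <= max_expected_fails, by bisection."""
    target = max_expected_fails / n_bits
    lo, hi = 0.0, 40.0          # tail(0) = 0.5; tail(40) is effectively 0
    for _ in range(200):
        mid = 0.5 * (lo + hi)
        if gaussian_tail(mid) > target:
            lo = mid            # too many expected fails: more margin needed
        else:
            hi = mid
    return hi

if __name__ == "__main__":
    bits_6mb = 6 * 2**20 * 8    # 6 MByte cache, assuming 1 MByte = 2^20 bytes
    for scale in (1, 2, 4):
        print(f"{scale}x bits: {sigma_needed(scale * bits_6mb):.2f} sigma")
```

For the 6 MByte case the required margin comes out near 5.5 sigma. Each doubling of the bit count adds roughly a tenth of a sigma, which becomes a Vmin shift once multiplied by the cell's mV-per-sigma spread.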


Transistor degrades during use

Slow but continuous process
Addressed by variation-tolerant design and frequency guardbands at test

[Figure: PMOS NBTI ΔVT (mV, log scale, 10 to 100) vs. log(time), power-law slope n ~ 0.20]
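The straight line on the log-log plot corresponds to a power law, ΔVT = A·t^n with n ≈ 0.20. Under that assumption, a shift measured after a short stress can be extrapolated to a longer time at the same voltage and temperature. Real qualification also applies voltage and temperature acceleration and must account for NBTI recovery; none of that is modeled here. The 168-hour stress, the 20 mV shift and the 7-year life are hypothetical example values.

```python
def nbti_shift_mv(dvt_ref_mv: float, t_ref_h: float, t_h: float, n: float = 0.20) -> float:
    """Extrapolate a VT shift measured at t_ref_h to t_h, assuming dVT = A * t**n at fixed stress."""
    return dvt_ref_mv * (t_h / t_ref_h) ** n

if __name__ == "__main__":
    t_life_h = 7 * 365 * 24   # hypothetical 7-year lifetime, in hours
    print(f"{nbti_shift_mv(20.0, 168.0, t_life_h):.0f} mV after {t_life_h} h")
```

With n = 0.2, a stress 365x longer raises the shift by only about 3.3x. This is consistent with the slide's description of a slow but continuous process.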


… and so does product operating frequency

Test guardbands used to eliminate customer impact

[Figure: Fmax degradation in burn-in: % Fmax shift vs. burn-in time (18, 48, 168, 336 and 500 hr), projected to end of life]


Transistor degradation

Process improvements are a must to counter the effect of increased E-fields

[Figure: PMOS NBTI degradation: Vt shift (-mV, 0 to 80) vs. E-field; the E-field increases with scaling, while process improvements reduce the shift]


Transistor architecture & materials are changing

Many new materials
– High-k/metal gates for gate leakage control and performance scaling: integration and reliability challenges
– Low-k ILDs: thermomechanical risk may slow their introduction
– Lead-free bumps

Clever changes in planar transistors
– Strain, epitaxial source/drain layers

Novel transistor architectures like tri-gate

Exotic options explored: from carbon nanotubes and semiconductor nanowires to III-V compounds


Tri-Gate Transistors

[Figure: Tri-gate transistor with gate, gate oxide, source, drain and channel labeled]

• Transistor gate wraps around 3 sides of Si channel (tri-gate)
• Transistor channel is “fully depleted”, unlike normal bulk CMOS
• Fully depleted operation reduces leakage current by up to 10x


Increasing Electron Mobility

Increased electron mobility leads to higher performance and less energy consumption

Compound semiconductors: electron (n) mobility relative to Si
Si      1
GaAs    8
InAs   33
InSb   50

The challenge is integrating them with silicon and improving hole mobility


Scaling of the interconnect

[Figure: Cu resistivity vs. line width: resistivity (ohm-µm, 0.015 to 0.040) vs. copper CD (µm, 0.05 to 0.65) for several process generations (P1260 to P1266), with the 65nm point marked. Source: Intel]

Effective resistivity increase due to:
– Cross-section reduction due to barriers
– Increased scattering from grain boundaries and surfaces

Tough but manageable challenges
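The first mechanism listed, loss of cross-section to the barrier, can be estimated geometrically. Assume a barrier of thickness t lines both sidewalls and the bottom and carries negligible current. The copper core is then (w - 2t)(h - t), and the line behaves as if its resistivity were ρ_Cu·wh/((w - 2t)(h - t)). This sketch does not model the second mechanism, grain-boundary and surface scattering. The line dimensions and the 5 nm barrier in the example are hypothetical.

```python
RHO_CU_BULK = 0.0172  # ohm-um, bulk copper at room temperature (about 1.72 uOhm-cm)

def effective_resistivity(w_um: float, h_um: float, t_barrier_um: float,
                          rho_cu: float = RHO_CU_BULK) -> float:
    """Resistivity referred to the drawn w x h cross-section when a non-conducting
    barrier of thickness t_barrier_um lines both sidewalls and the bottom.
    Size-dependent scattering in the copper itself is not modeled."""
    if w_um <= 2 * t_barrier_um or h_um <= t_barrier_um:
        raise ValueError("barrier consumes the whole line")
    cu_area = (w_um - 2 * t_barrier_um) * (h_um - t_barrier_um)
    return rho_cu * (w_um * h_um) / cu_area

if __name__ == "__main__":
    # Hypothetical lines with aspect ratio 2 and a 5 nm barrier
    for w in (0.5, 0.2, 0.1, 0.05):
        print(f"w = {w} um: {effective_resistivity(w, 2 * w, 0.005):.4f} ohm-um")
```

The penalty grows as the line narrows because the assumed barrier thickness is held fixed while the line width shrinks.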


Single Event Upsets
Transient errors that corrupt data but do not produce permanent damage (at limited doses)
– Charge burst that overwhelms a storage node

α particles in materials and atmospheric neutrons in terrestrial systems
Cosmic rays and heavy nuclei in space
– Orders of magnitude higher fluxes than at sea level

This is just one class of transient errors
– Others are noise-related fails in the interconnect fabric, etc.


Single Event Upsets: Cache cell

SEU errors due to neutrons from cosmic rays and α particles from residual impurities
Reduction in charge collection dominates over reduction in critical charge

[Figure: Cache cell SEU trend (@900m): SEU per bit (a.u.) vs. technology (180 to 45 nm), showing α FIT, terrestrial neutron FIT and total FIT]


Single event upsets: Multi-bit fails

Multi-bit errors are increasing as a proportion of fails
Expected consequence of increased charge sharing

[Figure: Multi-bit upset probability (log scale, 1E-5 to 1E-1) vs. cell-to-cell distance (µm, 0.1 to 10) for 130 nm, 90 nm and 65 nm cache cells]


Single Event Upsets: Logic latches

Similar trend starting for latches
45 nm results still preliminary

[Figure: Latch SEU trend (@900m): SEU per latch (a.u.) vs. technology (180 to 32 nm), showing α FIT, terrestrial neutron FIT and total FIT]


Single Event Upset: Chip impact

Saturation for cache arrays
Getting there for logic, perhaps in 32nm

[Figure: Chip-level SEU trend, normalized to 130nm (log scale), vs. technology (180 to 32 nm) for cache arrays and logic, assuming a 2X increase in RAM cell & latch count per generation; annotation: add features to reduce error rates]
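The chip-level chart combines two trends. One is the per-element SER of cache cells and latches. The other is the element count, which the slide assumes doubles each generation. Chip SER is the product of the two, so it stays flat only if per-element SER halves every generation. FIT counts failures per 10^9 device-hours, so a component's MTBF in hours is 10^9/FIT. FIT rates of independent components also add, which matters for large HPC systems. The per-bit trend, node count and FIT values below are hypothetical.

```python
from typing import List, Sequence

def relative_chip_ser(per_element_ser: Sequence[float], growth: float = 2.0) -> List[float]:
    """Chip SER per generation, given normalized per-element SER per generation
    and an element count that grows by `growth` each generation."""
    return [s * growth ** g for g, s in enumerate(per_element_ser)]

def mtbf_hours(fit: float) -> float:
    """FIT = failures per 1e9 device-hours, so MTBF (hours) = 1e9 / FIT."""
    return 1e9 / fit

def system_fit(n_components: int, fit_per_component: float) -> float:
    """FIT rates of independent components add."""
    return n_components * fit_per_component

if __name__ == "__main__":
    # Hypothetical per-bit SER that stops improving after one generation
    print(relative_chip_ser([1.0, 0.5, 0.5, 0.5]))
    # Hypothetical 10,000-node system at 1,000 FIT per node
    print(f"system MTBF: {mtbf_hours(system_fit(10_000, 1_000.0)):.0f} h")
```

The first call returns [1.0, 1.0, 2.0, 4.0]: once per-bit SER saturates, chip SER doubles every generation. The second call shows how quickly per-node rates compound to a system MTBF of about 100 hours.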


Circuit contributions to SEU in a typical microprocessor

[Figure: Breakdown of SEU contributions: static combinational logic, flip-flops, residual unprotected memory]


Many protection options proposed

• Adding parity/ECC to logic arrays and register files
• Replication of functional units or cores
• Lock-step for cores or complete chips
• Residue checking
• Redundant multithreading
• Fingerprinting
• Modified scan latches
• Hardening of worst contributing latches

What is the goal that we are trying to meet? What is the value proposition for HPC?
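Two of the listed options are easy to sketch in software.

Parity stores one extra bit per word, so any single-bit flip is detected but not corrected.

Residue checking predicts the residue of an arithmetic result from the operand residues (here modulo 3) and compares it with the residue of the actual result. No power of two is divisible by 3, so any single-bit error in the product changes its residue and is caught.

The injected faults below are hypothetical, and real designs perform these checks in dedicated hardware rather than software.

```python
from typing import Callable, Tuple

def even_parity(word: int) -> int:
    """Parity bit that makes the total count of 1s even."""
    return bin(word).count("1") % 2

def residue_checked_multiply(a: int, b: int,
                             mult: Callable[[int, int], int] = lambda x, y: x * y,
                             m: int = 3) -> Tuple[int, bool]:
    """Multiply non-negative ints with `mult`, then check the result mod m against
    ((a mod m) * (b mod m)) mod m. Returns (result, check_passed)."""
    product = mult(a, b)
    predicted = ((a % m) * (b % m)) % m
    return product, product % m == predicted

if __name__ == "__main__":
    word = 0b1011_0010
    stored_parity = even_parity(word)
    upset = word ^ (1 << 3)                            # one bit flipped by an SEU
    print("parity mismatch:", even_parity(upset) != stored_parity)

    faulty = lambda x, y: (x * y) ^ (1 << 4)           # hypothetical multiplier fault on bit 4
    print(residue_checked_multiply(7, 9))              # (63, True)
    print(residue_checked_multiply(7, 9, mult=faulty)) # (47, False)
```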


Recap of fail types & trends

Fail type | Trend | Solution space
Hard fails | Flat | Continued process improvements; architectural fault tolerance (2)
Parametric degradation | Increasing | Continued process improvements; guard-bands; architectural fault tolerance
Intermittent | ?? | Continued process improvement; architectural fault tolerance (2)
Transient (noise) | Increasing | Continued process improvements; guard-bands; architectural fault tolerance (1)
Transient (ionizing radiation) | Increasing to flat | Improved estimation methodology; hardening of critical elements; architectural fault tolerance (1)

(1) Such as EDAC (local circuit, component or system level)
(2) Requires self-diagnostics and redundancy
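Footnote (1) points to EDAC (error detection and correction). One concrete instance is the single-error-correcting Hamming(7,4) code, sketched minimally below. Three parity bits protect four data bits, and the syndrome computed on read gives the position of a single flipped bit. This is a textbook sketch, not a description of any Intel design. A single-error-correcting code mis-corrects double errors, which is why memories commonly add an overall parity bit (SECDED). That limitation matters given the growing share of multi-bit upsets.

```python
from typing import List, Sequence, Tuple

def hamming74_encode(d: Sequence[int]) -> List[int]:
    """Encode data bits [d1, d2, d3, d4] as codeword positions 1..7: p1 p2 d1 p3 d2 d3 d4."""
    d1, d2, d3, d4 = d
    p1 = d1 ^ d2 ^ d4   # covers positions 1, 3, 5, 7
    p2 = d1 ^ d3 ^ d4   # covers positions 2, 3, 6, 7
    p3 = d2 ^ d3 ^ d4   # covers positions 4, 5, 6, 7
    return [p1, p2, d1, p3, d2, d3, d4]

def hamming74_decode(c: Sequence[int]) -> Tuple[List[int], int]:
    """Correct at most one flipped bit. Returns (data_bits, error_position); 0 means no error."""
    bits = list(c)
    s1 = bits[0] ^ bits[2] ^ bits[4] ^ bits[6]
    s2 = bits[1] ^ bits[2] ^ bits[5] ^ bits[6]
    s3 = bits[3] ^ bits[4] ^ bits[5] ^ bits[6]
    pos = s1 + 2 * s2 + 4 * s3   # syndrome = 1-based position of the flipped bit
    if pos:
        bits[pos - 1] ^= 1
    return [bits[2], bits[4], bits[5], bits[6]], pos

if __name__ == "__main__":
    data = [1, 0, 1, 1]
    code = hamming74_encode(data)
    code[5] ^= 1                   # simulate a single-event upset at position 6
    print(hamming74_decode(code))  # ([1, 0, 1, 1], 6)
```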


Fault Tolerance trends/needs

Conservatism in technology to eliminate errors to many sigma will cut into performance

Fault Tolerant schemes will allow few errors to occur by providing the means to detect and correct

– Minimal to no impact to the customer

The continuation of Moore’s Law makes transistor availability plentiful and enables a much broader thinking in Fault Tolerance

– Local circuit and functional circuit block level
– Multi-/many-core availability
– Complement hardware/chip strategies with platform/system strategies
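Plentiful cores make replication practical. One classic form, triple modular redundancy (TMR), runs the same computation three times and takes a bitwise majority vote. The vote masks any error confined to one copy and flags the disagreeing copy for diagnosis. The minimal sketch below assumes the three results arrive as integers and does not model how the copies are scheduled across cores.

```python
from typing import List, Tuple

def tmr_vote(a: int, b: int, c: int) -> Tuple[int, List[int]]:
    """Bitwise 2-of-3 majority of three redundant results.
    Returns (voted_value, indices of copies that disagree with it)."""
    voted = (a & b) | (a & c) | (b & c)
    dissent = [i for i, v in enumerate((a, b, c)) if v != voted]
    return voted, dissent

if __name__ == "__main__":
    good = 0b1010
    print(tmr_vote(good, good, good ^ 0b0100))   # copy 2 upset: (10, [2])
```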


Help needed from HPC experts

What are the RAS requirements for various categories and uses of HPC?
– Are there agreed targets that can guide us?
– Can a $ value be assigned to them?

How can system architecture and software help and complement the effort at the component level?
