Architecture and Circuits for Dependable 3D-VLSI Mitsumasa Koyanagi New Industry Creation Hatchery Center Tohoku University Sendai, Japan JST CREST/DVLSI
Page 1: Architecture and Circuits for Dependable 3D-VLSI · 2012-06-05  · Dependability Related Concerns in 3D VLSI Heat accumulation and heat removal Influences of mechanical stress Metal

Architecture and Circuits for Dependable 3D-VLSI

Mitsumasa Koyanagi

New Industry Creation Hatchery Center

Tohoku University

Sendai, Japan

JST CREST/DVLSI

Page 2

Why 3D?

Source: Ivo Bolsens (Xilinx CTO), 3D-Architectures for Semiconductor Integration and Packaging, San Francisco, December 12-14, 2011

Page 3

3D-eDRAM for L3 Cache (IBM)

Hybrid Memory Cube (HMC) (Micron+Samsung+IBM)

2.5D and 3D FPGA (Xilinx)

3D System Integration (IBM)

3D DRAM (Samsung)

3D DRAM (Elpida)

Page 4

Dependability Related Concerns in 3D VLSI

Heat accumulation and heat removal

Influences of mechanical stress

Metal impurity contamination

Reliability of TSVs and metal microbumps

Design methodology and design tools

Testing and test design

Dependability is a key issue in 3D VLSI!

Architecture and Circuits for Dependable 3D-VLSI

Page 5

Application of 3D VLSI to Image Processing VLSI for Automobile

Target: ISO 26262 ASIL=C

[Figure: Relation between Requirements by Application and Various Technologies. Recoverable labels follow.]

Requirements by Application: Performance = Marketability, Safety

Performance Requirement: 1 TFLOPS / 5 W

Dependability Requirement: Dangerous Failure, ASIL=C, 100 FIT

Hazard-causing failures: Random Hardware Failures; Systematic Failure (Management of Design Process, etc.)

Failure Rate Allocation to Components Considering System Structure: 80 FIT; Test Coverage for Single-Point Failures: 97%; Test Coverage for Combined Failures: 80%

Technologies: Device Scaling; 3D Integration; 3D VLSI

Other labels: Limitation in increasing Test Coverage; Increase of Dynamic Failures; Failure; Fault Detection by Multi-Modular Redundancy; Random Test; Self-Repair; Redundancy

Arrow labels: Demand, Utilization, Realization, Necessity, Cons. Demand

After T. Kamada (Denso)

Page 6

Architecture of 3D DVLSI System

[Figure: block diagram repeated across several panels. Recoverable labels: Interconnect; Wide SIMD (Vector) CORE; GPP CORE; Many-Banked LM; Cache; Global Memory; System Level SVP; Hardware Level SVP; Checkpointing; Error Recovery.]

After Prof. H. Kobayashi (Tohoku Univ.)

Page 7

The Measure for a Reliability Target

Classification of error factors: static failure; dynamic failure; insufficient verification conditions; soft error; migration; hot spot; lack of inspection vectors

Targets: memory, logic, TSV

Classification of inspection techniques: MBIST, LBIST, JTAG; multiplexing with majority voting; (New) SVP

Dynamic error cases in a car:

- A power supply variation reaches a point where a connection has deteriorated, and the circuit malfunctions.

- A timing error occurs at a hot spot.

The control cycle of a display system is usually 16 ms. If an operation support system follows the same cycle, recovery within 16 ms causes no problem. This is handled by multiplexing with majority voting, which depends on:

- Throughput within 16 ms

- Reliability of the majority circuit

Majority voting is assisted by run-time execution.

・Fault coverage 97%

・80 FIT

Cases where a failure is overlooked:

- A double failure that induces the same result occurs.

- The majority circuit fails toward the dangerous side.

Cases where a failure cannot be relieved:

- Insufficient throughput

- Insufficient spare circuits

Quantitative analysis is needed. The problem comes back to the reliability of the majority circuit and to throughput/parallelism.

After T. Kamada (Denso)
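The majority voting discussed on this slide can be illustrated with a minimal Python sketch. It is not the circuit from the talk: the function name `majority_vote` and the use of plain integers as module outputs are assumptions made for illustration. It also reproduces the "overlooked failure" case, where two modules fail in the same way and outvote the correct one.

```python
from collections import Counter
from typing import List, Optional, Sequence, Tuple


def majority_vote(outputs: Sequence[int]) -> Tuple[Optional[int], List[int]]:
    """Vote over redundant module outputs (hypothetical helper).

    Returns (voted value, indices of dissenting modules). The voted value
    is None when no value is held by a strict majority of the modules.
    """
    value, votes = Counter(outputs).most_common(1)[0]
    if votes * 2 <= len(outputs):
        # No strict majority: every module is suspect.
        return None, list(range(len(outputs)))
    dissent = [i for i, out in enumerate(outputs) if out != value]
    return value, dissent


# Single failure: module 1 is outvoted and flagged.
print(majority_vote([5, 7, 5]))
# Double failure with the same wrong result (9): the correct module 2
# is outvoted, so the failure is overlooked.
print(majority_vote([9, 9, 5]))
```

The second call shows why the slide asks for quantitative analysis: the voter cannot tell a correct majority from two identical faults.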

Page 8

Test Architecture for 3D VLSI with Redundant Tiers

[Figure: stacked tiers sharing a Vertical Common Bus. Recoverable labels follow.]

Computing Core 1 ... Computing Core m

Redundant Core 1 ... Redundant Core (n-m-1)

Hardware-Level SVP (FPGA): System Configuration, Test

System-Level SVP: Task Allocation, Online Self-Test and Repair Control

Health Info., Logging

Test: BIST controlled through TAP (JTAG) I/F

Repair: Replacement by another Redundant Core

The tiers are vertically stacked and electrically connected by Through-Silicon Vias (TSVs) using 3D integration technology.

Repair is done by replacing the failed core with a redundant core.
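The test-and-repair flow on this slide (run BIST on each core, then replace a failed core with a redundant one) can be sketched as below. This is an illustrative model only: `CorePool`, `self_test_and_repair`, and the `run_bist` callback are hypothetical names, and the real control runs in the SVP over the TAP interface.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Optional, Tuple


@dataclass
class CorePool:
    """Hypothetical bookkeeping for computing and redundant cores."""
    active: List[str]                 # cores currently assigned tasks
    spares: List[str]                 # unused redundant cores
    retired: List[str] = field(default_factory=list)

    def repair(self, failed: str) -> Optional[str]:
        """Replace a failed active core with a spare; None if no spare is left."""
        slot = self.active.index(failed)   # raises ValueError if not active
        self.retired.append(failed)
        if not self.spares:
            del self.active[slot]          # degrade: run with fewer cores
            return None
        spare = self.spares.pop(0)
        self.active[slot] = spare
        return spare


def self_test_and_repair(
    pool: CorePool, run_bist: Callable[[str], bool]
) -> List[Tuple[str, Optional[str]]]:
    """Run BIST on each active core (True = pass) and repair any failures.

    Returns a log of (failed core, replacement or None).
    """
    log = []
    for core in list(pool.active):         # snapshot: the pool changes on repair
        if not run_bist(core):
            log.append((core, pool.repair(core)))
    return log


pool = CorePool(active=["core1", "core2"], spares=["red1"])
print(self_test_and_repair(pool, lambda core: core != "core2"))
print(pool.active)
```

When no spare is left, the sketch simply drops the failed core; this corresponds to the "insufficient spare circuits" case and would need a system-level policy.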

Page 9

Block Diagram of 3D DVLSI (e.g. 4tiers)

[Figure: four stacked tiers (Tier 3, Tier 2, Tier 1, Tier 0) connected by a Vertical Bus using TSVs. Each tier shows the same blocks: System Bus, PB Bridge, Processor Core, On-Line Self-Test Controller, Stacked Shared Memory, Vertical Bus Bridge, and Memory Controller. The Memory Controller is marked "disabled" in three of the four tiers. External Memory is drawn next to Tier 0.]

3D Test Access Port (3D TAP)

Page 10

Self-Test Control by System-Level SVP in 3D Dependable VLSI System

• SVP (Supervisor Processor) controls the TAP and scan chain in the stacked 3D dependable LSI

  – Drives TCK, TMS, TRST, TDI

  – Reads TDO to get test data registers in the stacked dice

The TAP signals are assumed to be connected by quadruple TSVs, which have much higher reliability than a single TSV.

[Figure: tiers from Bottom Tier through Tier i and Tier i+1 to Top Tier. Each tier shows a Tier Core and a Tier TAP with an intra-tier scan chain and a chain return path. TDI, TDO, TCK, TMS, and TRST run through the stack. Legend: TAP Control Line, JTAG Scan Chain.]
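How an SVP could steer the tier TAPs with TMS and shift data out on TDO can be sketched with the standard IEEE 1149.1 TAP controller state machine. The transition table follows the standard; the helper names (`walk`, `shift_dr`) and the flat list-of-bits model of the scan chain are assumptions for illustration, not the implementation from the talk.

```python
from typing import Dict, List, Sequence, Tuple

# IEEE 1149.1 TAP controller: state -> (next state if TMS=0, next state if TMS=1)
TAP_NEXT: Dict[str, Tuple[str, str]] = {
    "Test-Logic-Reset": ("Run-Test/Idle", "Test-Logic-Reset"),
    "Run-Test/Idle": ("Run-Test/Idle", "Select-DR-Scan"),
    "Select-DR-Scan": ("Capture-DR", "Select-IR-Scan"),
    "Capture-DR": ("Shift-DR", "Exit1-DR"),
    "Shift-DR": ("Shift-DR", "Exit1-DR"),
    "Exit1-DR": ("Pause-DR", "Update-DR"),
    "Pause-DR": ("Pause-DR", "Exit2-DR"),
    "Exit2-DR": ("Shift-DR", "Update-DR"),
    "Update-DR": ("Run-Test/Idle", "Select-DR-Scan"),
    "Select-IR-Scan": ("Capture-IR", "Test-Logic-Reset"),
    "Capture-IR": ("Shift-IR", "Exit1-IR"),
    "Shift-IR": ("Shift-IR", "Exit1-IR"),
    "Exit1-IR": ("Pause-IR", "Update-IR"),
    "Pause-IR": ("Pause-IR", "Exit2-IR"),
    "Exit2-IR": ("Shift-IR", "Update-IR"),
    "Update-IR": ("Run-Test/Idle", "Select-DR-Scan"),
}


def walk(state: str, tms_bits: Sequence[int]) -> str:
    """Apply one TMS value per TCK rising edge and return the final state."""
    for tms in tms_bits:
        state = TAP_NEXT[state][tms]
    return state


def shift_dr(chain: List[int], tdi_bits: Sequence[int]) -> Tuple[List[int], List[int]]:
    """Model Shift-DR on a scan chain held as a list of bits.

    On each TCK the last bit leaves on TDO and the TDI bit enters at the
    front. Returns (bits seen on TDO, new chain contents).
    """
    chain = list(chain)
    tdo = []
    for tdi in tdi_bits:
        tdo.append(chain[-1])
        chain = [tdi] + chain[:-1]
    return tdo, chain


# Holding TMS high for five clocks resets the TAP from any state.
assert all(walk(s, [1] * 5) == "Test-Logic-Reset" for s in TAP_NEXT)
# From reset, TMS = 0,1,0,0 reaches Shift-DR.
print(walk("Test-Logic-Reset", [0, 1, 0, 0]))
print(shift_dr([1, 0, 1, 1], [0, 0, 0, 0]))
```

In a stacked design the chain passed to `shift_dr` would be the concatenation of the per-tier data registers, which is why one TDO stream can return test data from every die.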

Page 11

3D DfT Architecture

Functional Design

• Stacked Dies, Core-Based

• Inter-Connect: TSVs

• Extra-Connect: Pins

Existing Design-for-Test

• Core: Internal Scan, TDC, LBIST, MBIST; IEEE 1149.1 wrappers, TAPC

• Stack Product: IEEE Std 1149.1

3D-DfT Architecture - Test Wrapper per Die

• Based on IEEE 1149.1

• Two Entry/Exit Points per Die:

  - Pre-Bond: Extra Probe Pads

  - Post-Bond: Extra TSVs

Page 12

New 3D VLSI DfT Architecture for Online Self-Test of Dies Based on IEEE 1149.1

[Figure: dies stacked from Die 1 (Bottom) to Die n (Top). Recoverable labels: Pseudo-Random Pattern Generator and MISR/Comparator (repeated per die); TDI and TDO between dies; Sys. SVP; HW SVP.]
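The pairing of a pseudo-random pattern generator with a MISR/comparator can be sketched in a few lines of Python. This is a generic logic-BIST illustration, not the circuit in the figure: the 16-bit width, the feedback mask `0xB400` (a commonly used 16-bit Galois LFSR mask), the seed, and the stand-in `circuit_under_test` are all assumptions.

```python
from typing import Iterable, List

MASK16 = 0xFFFF
POLY = 0xB400  # assumed feedback mask for a 16-bit Galois LFSR


def lfsr_step(state: int) -> int:
    """Advance a right-shifting Galois LFSR by one clock."""
    lsb = state & 1
    state >>= 1
    if lsb:
        state ^= POLY
    return state


def prpg(seed: int, count: int) -> List[int]:
    """Pseudo-random pattern generator: `count` patterns from a nonzero seed."""
    state = seed & MASK16
    if state == 0:
        raise ValueError("an LFSR seed must be nonzero")
    patterns = []
    for _ in range(count):
        state = lfsr_step(state)
        patterns.append(state)
    return patterns


def misr_signature(responses: Iterable[int], seed: int = 0) -> int:
    """Compact a response stream: each clock steps the LFSR and XORs in a word."""
    signature = seed & MASK16
    for word in responses:
        signature = lfsr_step(signature) ^ (word & MASK16)
    return signature


def circuit_under_test(pattern: int) -> int:
    """Hypothetical stand-in for the logic being tested."""
    return (pattern * 3 + 1) & MASK16


patterns = prpg(seed=0xACE1, count=64)
good = [circuit_under_test(p) for p in patterns]
golden = misr_signature(good)

faulty = list(good)
faulty[10] ^= 0x0001          # a single wrong response bit
print(misr_signature(faulty) == golden)   # comparator result: False (fault detected)
```

The comparator only needs the golden signature, so the amount of data crossing the TDI/TDO path per test stays small. Multiple errors can still alias to the golden signature with low probability, which is a general property of signature analysis.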

Page 13

Tier BIST Dynamically Controlled by System-Level SVP

[Figure: block diagram of one tier. Japanese labels are translated. Recoverable labels follow.]

LBIST/PRPG, LBIST/MISR, LBIST/Comparator

Memory + MBIST Controller (six Memory groups, each with a collar and a controller)

TAP Controller (TAPC) with IR and DR

Scan Chains: 32 scan chains, max chain length (number of F/F stages) = 1500

TSV Test/Repair Circuit (TSV-scan)

General Purpose Processor; Peripherals; Soft-IP (core); Soft-IP (ext)

Clock Controller; CPG/PLL; STC-DIV; STC-OCC (note on the slide: supplies the clock to each module)

TDI, TMS, TCK from/to Sys-SVP

Instance names: hph05shxe100 (core), hph05shxeva2n0, hph05shxeva2n0 (SVP)

Page 14

Redundancy for Through Si Vias (TSVs)

Schemes: No Repair; Multiplexed TSV (m: multiplicity); With TSV Repair (n signals : r redundant TSVs)

                No Repair        m = 2     m = 4     4:2       16:4
Area            +0%              +100%     +300%     +50%      +25%
Capacity        +0%              +100%     +300%     +0%       +0%
Switches/Sig    0                0         0         3         5

TSV Group Yield (n TSVs):

No Repair: $R_{TSV}^{n}$

Multiplexed TSV: $\left(1 - (1 - R_{TSV})^{m}\right)^{n}$

With TSV Repair: $\sum_{i=n}^{n+r} \binom{n+r}{i} R_{TSV}^{i} (1 - R_{TSV})^{n+r-i}$

Assembly Yield* by number of TSVs:

TSVs            No Repair        m = 2     m = 4     4:2       16:4
2,000           1.9 × 10^-7%     81.87%    99.99%    99.03%    99.98%
5,000           1.5 × 10^-20%    60.65%    99.99%    97.59%    99.96%
10,000          2.2 × 10^-42%    36.79%    99.99%    95.23%    99.91%
20,000          5.1 × 10^-86%    13.53%    99.98%    90.69%    99.83%

*assumed $R_{TSV}$ = 0.99

Column annotations on the slide: Samsung (ISSCC 2009); This Work
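The yield figures can be checked with a short Python sketch of the three group-yield expressions. The function names are hypothetical, and one assumption goes beyond the slide: for the repair schemes, the assembly yield is taken as the group yield raised to the number of groups (total signal TSVs divided by n). Under that assumption the sketch reproduces the tabulated values to the digits shown.

```python
from math import comb


def yield_no_repair(r_tsv: float, n: int) -> float:
    """All n TSVs must be good."""
    return r_tsv ** n


def yield_multiplexed(r_tsv: float, n: int, m: int) -> float:
    """Each of n signals uses m parallel TSVs; a signal fails only if all m fail."""
    return (1.0 - (1.0 - r_tsv) ** m) ** n


def yield_repair_group(r_tsv: float, n: int, r: int) -> float:
    """A group of n signals with r redundant TSVs works if at least n of n+r are good."""
    total = n + r
    return sum(
        comb(total, i) * r_tsv ** i * (1.0 - r_tsv) ** (total - i)
        for i in range(n, total + 1)
    )


def assembly_yield_repair(r_tsv: float, signals: int, n: int, r: int) -> float:
    """Assumption: the assembly consists of signals // n independent groups."""
    return yield_repair_group(r_tsv, n, r) ** (signals // n)


R_TSV = 0.99  # value assumed on the slide
for signals in (2_000, 5_000, 10_000, 20_000):
    print(
        signals,
        f"{yield_no_repair(R_TSV, signals):.1e}",
        f"{yield_multiplexed(R_TSV, signals, 2):.4f}",
        f"{yield_multiplexed(R_TSV, signals, 4):.4f}",
        f"{assembly_yield_repair(R_TSV, signals, 4, 2):.4f}",
        f"{assembly_yield_repair(R_TSV, signals, 16, 4):.4f}",
    )
```

The printed values are fractions rather than percentages. The comparison makes the trade-off visible: 16:4 repair approaches the yield of quadruple TSVs with +25% area instead of +300%.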

Page 15

Approach for HW-SVP

Establishing a self-repair scheme for HW-SVP

• Soft-error recovery using dynamic reconfiguration

  – Designing recovery controller and scrubbing controller

  – Designing fault-tolerant system using TMR scheme

• Hard-error avoidance using partial reconfiguration

  – Relocating partial reconfiguration bitstream (PRB)

  – Designing TMR scheme with Spare

• Self-repair scheme and evaluation system

  – Developing evaluation system to evaluate soft-error tolerability

TMR: Triple Modular Redundancy

After Prof. T. Sueyoshi (Kumamoto Univ.)

Page 16

System Configuration of HW-SVP

• Triplicating processor core and peripheral modules

• Implementing RM, RC and Spare

  – RM and RC control the recovery sequence

  – Spare is used for hard-error avoidance

[Figure: block diagram. Japanese labels are translated. Recoverable labels: Plasma (x3); Spare; Selector + Voter + Detector; Memory (ECC protected); Memory controller (x3); UART (x3); RC; RM; ICAP; FrameECC; Memory.]

RM: Recovery Module

RC: Recovery Controller

ICAP: Internal Configuration Access Port

Implemented on: Xilinx Virtex-6 XC6VLX240T

After Prof. T. Sueyoshi (Kumamoto Univ.)

Page 17

Soft-Error Recovery in HW-SVP

Readback and overwrite reconfiguration (scrubbing)

Readback and error detection:

(1) Read back configuration data (ICAP)

(2) Create syndrome (Frame ECC)

If an error is detected, reconfigure to correct it:

(3) Repair readback data

(4) Overwrite the same frame

Apply this sequence to all frames.

ICAP: Internal Configuration Access Port

* Frame: minimum unit of reconfiguration (1 frame = 2,592 bits on Virtex-6)

After Prof. T. Sueyoshi (Kumamoto Univ.)
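The four scrubbing steps can be sketched with a toy model in Python. It is only an illustration of the sequence: the configuration memory is a list of bit lists, frames are 15 bits rather than 2,592, and a plain single-error-correcting Hamming code stands in for the Frame ECC logic, whose actual code and the ICAP access are not modeled. All function names are hypothetical.

```python
from functools import reduce
from operator import xor
from typing import List, Sequence


def syndrome(frame: Sequence[int]) -> int:
    """XOR of the 1-based positions of all set bits; 0 for a valid codeword."""
    return reduce(xor, (pos for pos, bit in enumerate(frame, start=1) if bit), 0)


def encode(data_bits: Sequence[int], length: int) -> List[int]:
    """Build a Hamming codeword: data at non-power-of-two positions, parity elsewhere."""
    frame = [0] * length
    data = iter(data_bits)
    for pos in range(1, length + 1):
        if pos & (pos - 1):                 # not a power of two: data position
            frame[pos - 1] = next(data, 0)
    s = syndrome(frame)
    parity_pos = 1
    while parity_pos <= length:             # set parity bits so the syndrome becomes 0
        if s & parity_pos:
            frame[parity_pos - 1] = 1
        parity_pos <<= 1
    return frame


def scrub(config_memory: List[List[int]]) -> List[int]:
    """Apply steps (1)-(4) to every frame; return the indices of repaired frames."""
    repaired = []
    for index, stored in enumerate(config_memory):
        frame = list(stored)                # (1) read back configuration data
        s = syndrome(frame)                 # (2) create syndrome
        if s == 0:
            continue                        # no error detected in this frame
        if s > len(frame):
            raise ValueError(f"frame {index}: uncorrectable error")
        frame[s - 1] ^= 1                   # (3) repair readback data
        config_memory[index] = frame        # (4) overwrite the same frame
        repaired.append(index)
    return repaired


golden = [encode([1, 0, 1, 1, 0, 0, 1, 0, 1, 1, 1], 15) for _ in range(4)]
memory = [list(frame) for frame in golden]
memory[2][6] ^= 1                           # inject a soft error (single bit flip)
print(scrub(memory), memory == golden)
```

This toy code corrects one flipped bit per frame and can miscorrect multi-bit upsets, so it should not be read as a statement about the error-detection capability of the real Frame ECC.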

Page 18

Hard-Error Recovery in HW-SVP

Relocate the PRB and separate a broken module

[Figure: the FPGA holds Module_0, Module_1, Module_2, and a Spare, connected to a Selector and a Voter. One module is marked "Hard error". Operations through the ICAP: Readback, PRB relocation*, Reconfiguration.]

A copy of the module is implemented on the Spare to reconstruct the TMR configuration.

* This is realized by making the internal configuration of the PR regions uniform (reported in Dec. 2011).

After Prof. T. Sueyoshi (Kumamoto Univ.)
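The selector and voter behavior after a hard error can be modeled in a few lines of Python. This is a behavioral sketch under assumptions: modules are plain Python callables, "relocating the PRB" is reduced to assigning the fault-free logic to the Spare region, and the class and method names are hypothetical.

```python
from collections import Counter
from typing import Callable, Dict, List, Tuple


class TmrWithSpare:
    """Behavioral model of TMR with one spare region (illustrative only)."""

    def __init__(self, logic: Callable[[int], int]) -> None:
        self.logic = logic                      # stand-in for the module's PRB
        self.regions: Dict[str, Callable[[int], int]] = {
            name: logic for name in ("Module_0", "Module_1", "Module_2")
        }
        self.selected: List[str] = ["Module_0", "Module_1", "Module_2"]
        self.spare_free = True

    def inject_hard_error(self, region: str, faulty: Callable[[int], int]) -> None:
        """Model a permanent fault by replacing the region's behavior."""
        self.regions[region] = faulty

    def run(self, x: int) -> Tuple[int, List[str]]:
        """Selector + voter: return (voted output, dissenting regions)."""
        outputs = {name: self.regions[name](x) for name in self.selected}
        value, votes = Counter(outputs.values()).most_common(1)[0]
        if votes < 2:
            raise RuntimeError("no majority among the selected regions")
        return value, [name for name, out in outputs.items() if out != value]

    def relocate_to_spare(self, broken: str) -> None:
        """Implement a copy of the module on the Spare and deselect the broken region."""
        if not self.spare_free:
            raise RuntimeError("spare region already in use")
        self.regions["Spare"] = self.logic
        self.selected[self.selected.index(broken)] = "Spare"
        self.spare_free = False


tmr = TmrWithSpare(lambda x: x + 1)
tmr.inject_hard_error("Module_1", lambda x: 0)
print(tmr.run(4))                 # voter masks the fault and names Module_1
tmr.relocate_to_spare("Module_1")
print(tmr.run(4), tmr.selected)   # TMR restored with the Spare in the voting set
```

How a mismatch is classified as a hard error rather than a soft error is not modeled here; one plausible criterion, stated only as an assumption, is a mismatch that persists after scrubbing.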