Page 1
Integrated Test Data Compression and Core Wrapper Design for Low-Cost System-on-a-Chip Testing
Paul Theo Gonciari, Bashir Al-Hashimi
Electronic Systems Design Group, University of Southampton, UK
Nicola Nicolici
Electrical and Computer Engineering, McMaster University, Canada
Page 2
Overview
• Low-cost system-on-a-chip test
• Single vs. multiple scan chains compression
• Proposed add-on architecture
– TAM add-on architecture
  • Core wrapper design
  • Reduce control and area overhead
– Design flow integration
• Experimental results
• Conclusion
Page 3
Low-cost SOC test
• Problems
– High volume of test data
– Increased chip/ATE frequency ratio
– Increased chip/ATE pin number ratio
– Increased scan-power dissipation
High ATE costs and yield loss
Page 4
Low-cost SOC test
• Solutions
– Test data reduction
– Reuse existing ATE technology
– Exploit chip/ATE frequency ratio
– Reduce pin count testing (RPCT)
– Scan chain partitioning
Page 5
TAM add-on architecture
[Figure: SOC containing cores and the TAM add-on]
Low-cost solution for core based SOC test
Page 6
Overview
• Low-cost system-on-a-chip test
• Single vs. multiple scan chains compression
• Proposed add-on architecture
– TAM add-on architecture
  • Core wrapper design
  • Reduce control and area overhead
– Design flow integration
• Experimental results
• Conclusion
Page 7
Single scan chain TDC
[Figure: single scan chain TDC. Labels: ATE Head, sync, decoder, counter, 5 FF, Core, si, so, SISR, SOC]
Page 8
Single scan chain TDC (cont)
• Exploit test set regularities (e.g., runs of 0s)
• Based on coding schemes
• Exploit frequency ratio
• Synchronization overhead
– Temporal deserialization [Gonciari, ETW02]
– External clock synchronization
– FIFO-like structures
• High scan power due to the long scan chain
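The coding idea in the first bullets above can be made concrete with a small run-length coder. This is a minimal sketch, not the scheme used in the talk: it assumes a Golomb-style code with a power-of-two group size `m`, and the names `golomb_encode_runs` and `golomb_decode_runs` are hypothetical. Each run of 0s that ends in a 1 becomes a unary prefix plus a fixed-width tail. A trailing run without a closing 1 is encoded as if it had one, so the decoder needs the original length.

```python
def golomb_encode_runs(bits, m=4):
    """Encode runs of 0s (each ended by a 1) with a Golomb-style code.

    Assumption: m is a power of two, so the remainder fits in log2(m) bits.
    """
    if m < 2 or m & (m - 1):
        raise ValueError("m must be a power of two >= 2")
    tail_bits = m.bit_length() - 1

    def codeword(run):
        # Unary quotient (q ones, then a 0) followed by the binary remainder.
        q, r = divmod(run, m)
        return "1" * q + "0" + format(r, "0{}b".format(tail_bits))

    out, run = [], 0
    for b in bits:
        if b == "0":
            run += 1
        else:
            out.append(codeword(run))
            run = 0
    if run:
        # Trailing 0s with no closing 1: encode as if a 1 followed.
        out.append(codeword(run))
    return "".join(out)


def golomb_decode_runs(code, m, length):
    """Invert golomb_encode_runs; `length` trims the implied final 1."""
    tail_bits = m.bit_length() - 1
    out, i = [], 0
    while i < len(code):
        q = 0
        while code[i] == "1":
            q += 1
            i += 1
        i += 1  # skip the 0 that ends the unary prefix
        r = int(code[i:i + tail_bits], 2)
        i += tail_bits
        out.append("0" * (q * m + r) + "1")
    return "".join(out)[:length]
```

For the 13-bit stream `0000000100001` and `m = 4`, the sketch emits 8 bits. Streams dominated by long runs of 0s shrink the most, which is the regularity the slide refers to.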
Page 9
Multiple scan chain TDC
[Figure: multiple scan chain TDC. Labels: data in, ctrl, XOR network, WSC, scan chains, Core, SISR]
Page 10
Multiple scan chain TDC (cont)
• Exploit care bits sparseness
• Uses XOR based spreading networks
• Temporal pattern lockout
– Extra control line
– Doubles the volume of test data
– Influences test application time
• Structural pattern lockout
– Can influence fault coverage
• High scan power due to driving of all scan chains
Extend single scan chain TDC to multiple scan chains
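To illustrate why care-bit sparseness matters for an XOR based spreading network, the sketch below expands a few input bits onto more scan chains and checks whether a given slice of care bits can be produced at all. It is an assumption-laden toy, not the network from the talk: the tap lists, the brute-force search, and the names `xor_expand` and `find_inputs` are all hypothetical. A slice for which no input exists is the kind of situation the lockout bullets above are concerned with.

```python
from itertools import product


def xor_expand(inputs, network):
    """Drive each scan chain with the XOR of the input bits it taps.

    network[j] lists the input indices XORed together for scan chain j.
    """
    return [sum(inputs[i] for i in taps) % 2 for taps in network]


def find_inputs(cube_slice, network, width):
    """Return input bits that reproduce all care bits, or None.

    cube_slice holds one entry per scan chain: 0, 1, or None (don't care).
    Brute force over 2**width inputs, so only suitable for small widths.
    """
    for inputs in product((0, 1), repeat=width):
        outputs = xor_expand(inputs, network)
        if all(c is None or c == o for c, o in zip(cube_slice, outputs)):
            return inputs
    return None


# Hypothetical network: 2 inputs spread onto 3 scan chains.
NETWORK = [[0], [1], [0, 1]]

sparse = find_inputs([1, None, 0], NETWORK, 2)   # one don't-care bit
dense = find_inputs([1, 1, 1], NETWORK, 2)       # fully specified slice
```

With these taps the sparse slice is reachable with inputs `(1, 1)`, while the fully specified slice `[1, 1, 1]` is not reachable by any input, since the third chain is forced to the XOR of the first two.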
Page 11
Extend single scan chain TDC …
Use one decoder and shift register [Chandra, DATE02]
[Figure: decoder feeding a shift register that drives the scan chains of the core]
Page 12
Use one decoder and shift register
• Loosened the ATE timing constraint
– Exploitation of frequency ratio
• Reduced peak scan-power
– Shift register buffering
• Synchronization overhead
• Decrease in compression ratio
– Unbalanced scan chains
– Test set rotation
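One plausible reading of the last two sub-bullets is sketched below; the slides do not spell out the mechanism, so treat every detail as an assumption. Here a single shift register feeds all scan chains one slice at a time, so shorter chains are padded to the longest length and the test data is reordered ("rotated") into slice order. The names `pad_chains`, `rotate`, and `padding_overhead`, the left padding, and the fill value `0` are hypothetical.

```python
def pad_chains(chains, fill="0"):
    """Pad every chain on the left to the length of the longest chain."""
    longest = max(len(c) for c in chains)
    return [c.rjust(longest, fill) for c in chains]


def rotate(chains):
    """Reorder per-chain data into slice order: one bit per chain per cycle."""
    padded = pad_chains(chains)
    return "".join("".join(column) for column in zip(*padded))


def padding_overhead(chains):
    """Extra bits introduced by balancing the chains."""
    longest = max(len(c) for c in chains)
    return len(chains) * longest - sum(len(c) for c in chains)


chains = ["0001", "01", "000011"]      # three unbalanced scan chains
stream = rotate(chains)                # "000000000000001111"
extra = padding_overhead(chains)       # 6 padding bits on top of 12 data bits
```

In this toy case the stream grows from 12 to 18 bits before any coding is applied, and the run structure seen by the decoder differs from that of the per-chain data.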
Page 13
Extend single scan chain TDC … (cont)
Use one decoder per scan chain
[Chandra, TCAD01] [Gonciari, ETW02]
[Figure: distr feeding dec1, dec2, and dec3, each with its own ctrl, each driving one scan chain of the core]
Page 14
Use one decoder per scan chain
• Loosened the ATE timing constraint
– Exploitation of frequency ratio
• Reduced scan-power
– Scan chain partitioning
• Good compression ratio
– No test set rotation
• Reduced synchronization overhead
Increased area and control overhead
Large number of scan chains
Unbalanced scan chains
Page 15
Low-cost SOC test
• Solutions
– Test data reduction
– Reuse existing ATE technology
– Exploit chip/ATE frequency ratio
– Reduce pin count testing (RPCT)
– Scan chain partitioning
Use one decoder per scan chain
Increased area and control overhead
Large number of scan chains
Unbalanced scan chains
Page 16
Overview
• Low-cost system-on-a-chip test
• Single vs. multiple scan chains compression
• Proposed add-on architecture
– TAM add-on architecture
  • Core wrapper design
  • Reduce control and area overhead
– Design flow integration
• Experimental results
• Conclusion
Page 17
TAM add-on architecture
[Figure: SOC containing cores and the TAM add-on]
Low-cost solution for core based SOC test
Page 18
Core wrapper design
[Figure: core with wrapper scan chains WSC1 to WSC4, labelled tb1 to tb4]
Why core wrapper design?
• WSC partitioning [Gonciari, VTS02]
– Useless memory reduction
– Easy control
Page 19
Reducing control and area overhead
Instead of:
[Figure: distr feeding dec1 to dec4, each with its own ctrl, each driving one WSC of the core]
Page 20
Reducing control and area overhead …
[Figure: core with four WSCs]
• WSC partitioning
– 2 partitions
– 1 control unit per partition
– 1 decoder per partition
Exploit WSC partitioning for area and control reduction
Page 21
Reducing control and area overhead …
[Figure: WSC partitions annotated with length, diff, and no WSCs]
• Control
– Length of max scan chain
– No of scan chains
– Diff of partitions length
Easy control per partition
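The three control values named on this slide can be computed from the WSC lengths of each partition. The sketch below is an interpretation, not the authors' hardware: it assumes "diff" means the difference between the longest-chain lengths of two partitions, and the names `PartitionControl`, `control_values`, and `partition_diff` are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PartitionControl:
    length: int    # length of the longest WSC in the partition
    no_wscs: int   # number of WSCs in the partition


def control_values(partition):
    """Derive the per-partition control values from a list of WSC lengths."""
    return PartitionControl(length=max(partition), no_wscs=len(partition))


def partition_diff(a, b):
    """Assumed meaning of 'diff': gap between the two longest-chain lengths."""
    return abs(control_values(a).length - control_values(b).length)


# Hypothetical core with four WSCs split into two partitions.
part1 = [12, 12]
part2 = [8, 7]

ctrl1 = control_values(part1)          # PartitionControl(length=12, no_wscs=2)
ctrl2 = control_values(part2)          # PartitionControl(length=8, no_wscs=2)
diff = partition_diff(part1, part2)    # 4
```

Under this reading each partition needs only two small values plus one shared difference, which is consistent with the "easy control per partition" remark.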
Page 22
Extended decoder (xDec) – input
[Figure: xDec input side. Labels: dec, dec1, scan clk, data, length, no WSCs, diff, WSC]
Page 23
Extended decoder (xDec) – output
[Figure: xDec output side. Labels: dec, no WSCs, mux, SISR, WSC]
Page 24
Extended distribution architecture
[Figure: xDistr. Labels: distr, xDec1, xDec2, mux, SISR, four WSCs, Core]
Page 25
Extended distribution architecture …
[Figure: two cores, each with four WSCs]
Unequal partition size for some cores!
Page 26
Extended distribution architecture
[Figure: add-on-xDistr. Labels: xDec1, xDec2, mux, two cores with four WSCs each]
Page 27
Multiple TAM SOC test
[Figure: SOC with four cores, two add-on blocks, and two 2xSISR blocks]
Page 28
Design flow integration
[Flowchart: boxes TAM Design (TAM width), Extend CW, Simulate, Change CW, 1:k add-on-xDistr; decision "Constr?" on VTD constraints with YES and NO branches]
Page 29
Overview
• Low-cost system-on-a-chip test
– Test data reduction
– Synchronization overhead
• Single vs. multiple scan chains compression
• Proposed add-on architecture
– TAM add-on architecture
  • Core wrapper design
  • Reduce control and area overhead
– Design flow integration
• Experimental results
• Conclusion
Page 30
Minimum VTD vs. equal partitions
[Bar chart: volume of test data (axis 0 to 120000) for cores s13207, s15850, s35932, s38584; series: Minimum VTD, Equal partitions]
Test bus = 16, Frequency ratio 2
Page 31
Minimum VTD vs. equal partitions
[Bar chart: volume of test data (axis 0 to 100000) for cores s13207, s35932, s38417, s38584; series: Minimum VTD, Equal partitions]
Test bus = 16, Frequency ratio 4
Page 32
add-on-xDistr vs. SSC
[Chart: volume of test data (axis 0 to 25000) at x-axis values 4, 8, 16, 24, 32; series: add-on-xDistr, Single scan chain]
Core s35932, Frequency ratio 2
Page 33
add-on-xDistr vs. SSC
[Chart: volume of test data (axis 13000 to 17000) at x-axis values 4, 8, 16, 24, 32; series: add-on-xDistr, Single scan chain]
Core s35932, Frequency ratio 4
Page 34
add-on-xDistr vs. SSC
System 1, Frequency ratio 2, Test bus 24, Reduction 19.29%
[Bar chart: volume of test data (axis 0 to 160000); series: Single scan chain, add-on-xDistr]
Page 35
add-on-xDistr vs. SSC
System 2, Frequency ratio 2, Test bus 24, Reduction 26.88%
[Bar chart: volume of test data (axis 0 to 160000); series: Single scan chain, add-on-xDistr]
Page 36
Conclusion
• Low-cost solution for core based SOC test
• TAM add-on architecture
• Design flow integration
• Exploited core wrapper design features
– Reduced control overhead
– Reduced area overhead
• Reduced scan power through partitioning
• Small area overhead (3-4%) for System 1 and System 2
Page 37
Test data reduction
[Figure: labels: ATE, Head, DIB, SOC, dec, SO, CUT]
• Aims
– Volume of test data
– Area overhead
– Test application time
Page 38
Generic on-chip decoder
[Figure: generic on-chip decoder. Labels: ATE, CI, PG, sync, data in, data out, ate clk, scan clk]
• Serial decoder
– PG and CI cannot work independently
– Implicit communication between PG and CI
• Parallel decoder
– PG and CI can work independently
– Explicit communication between PG and CI
Page 39
Synchronization overhead
• Extensions to the DIB
– Multiple ATE channels
– Deserialization units
– Latency FIFOs
– Clock synchronization
Page 40
Synchronization overhead (cont)
[Figure: labels: ATE, DIB, SOC, dec, CUT, SO]
• New ATEs
– Source synchronous buses
– Require programming
Page 41
Synchronization overhead (cont)
[Figure: labels: ATE, DIB, SOC, dec, CUT, SO]
Page 42
Synchronization overhead (cont)
• Low-cost test through ATE reuse
– Small area overhead increase
– Solution for entire chip test
– Test application time reduction
[Figure: labels: ATE, DIB, SOC, dec, CUT, SO]
Page 43
Synchronization overhead
• Old ATEs
– Latency FIFO
– Clock synchronization
[Timing diagram: ATE clk and Chip clk over cycles 0 to 7, showing PG and CI activity with a STOP interval]
Page 44
On-chip SO solution
[Timing diagram: ATE clk and Chip clk over cycles 0 to 7, showing PG and CI activity with a STOP interval]
Page 45
On-chip SO solution (cont)
Increased VTD and TAT
Exploit DUMMY bits and reduce VTD and TAT
[Timing diagram: ATE clk and Chip clk over cycles 0 to 7, showing PG and CI activity with a DUMMY interval]
Page 46
On-chip SO solution (cont)
• Distribution unit
– Any number of cores
– Self synchronous architecture
[Timing diagram: ATE clk and Chip clk over cycles 0 to 7, showing PG1, PG2, CI1, and CI2 activity]
[Figure: distr feeding dec1 and dec2]
Page 47
XOR-network %tpl
[Chart: % temporal pattern lockouts (axis 0 to 90) against care bit density (axis 0 to 25); series: W=8, W=16, W=24, W=32]
Page 48
S38417: VTD / TAT for w = 32
[Chart: VTD/TAT (axis 0 to 1000000) against care bit density (axis 0 to 25); series: add-on-xDistr(32,6), XOR-network(32) VTD, XOR-network(32) TAT]
Page 49
S35932: VTD / TAT for w = 32
[Chart: VTD/TAT (axis 0 to 120000) against care bit density (axis 0 to 80); series: add-on-xDistr(32,6), XOR-network(32) VTD, XOR-network(32) TAT]