Built-In Self-Test of DSPs in Virtex-4 FPGAs
Charles Stroud
Dept. of Electrical & Computer Engineering
Auburn University
(Funded by NSA)
Outline of Presentation
  History of DSP Architectures in FPGAs
  Overview of Virtex-4 DSP
  Prior Testing R&D vs. Our Analysis for:
    Literature on DSP test not applicable
    No papers published on DSPs in FPGAs
    Literature on Multipliers and Adders
    Application to Virtex-4 DSPs
  BIST for DSPs in Virtex-4
    Architecture, Operation, and Implementation
    Timing and Fault Injection Analysis
  Summary and Conclusions
    Plans for application to Virtex-5
NxN array of unit cells
  Unit cell = CLB + routing
  Fast carry logic in CLBs for adders
Virtex/Spartan-2
  MxN array of unit cells
  Carry logic + AND gate for array multipliers
  4K block RAMs at edges
Virtex-2/Spartan-3
  18K block RAMs in array
  18x18-bit multipliers with each RAM
    "based on modified Booth architecture"
Pattern expansion required for 16x16-bit to 18x18-bit
  Potential for mistakes if patterns not expanded properly
Modified Booth multiplier results
  ≈ 62% with carry-save adder
  ≈ 37% with CLA
Conclusion: array multiplier test vectors do not adequately test modified Booth multiplier
[Figure: Chris Erickson's results. Note difference in FC wrt adder implementation.]
Modified Booth Test Algorithms
  Two test algorithms using 8-bit counter (256 vectors)
  "Low Power BIST for Wallace Tree-based Fast Multipliers"
    Bakalis, Kalligeros, Nikolos, Vergos & Alexiou
    Proc. Int. Symp. on Quality of Electronic Design, pp. 433-438, 2000
    5x3 connections with 5 inputs to Booth encoding
    But which side is Booth encoding?
    Our approach: run both 5x3 and 3x5 algorithms
  "Effective Built-In Self-Test for Booth Multipliers"
    Gizopoulos, Paschalis & Zorian
    IEEE Design & Test of Computers, pp. 105-111, 1998
    4x4 connections to multiplier inputs
    Our approach: also include 4x4 if fault coverage improves
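[Figure: 8-bit counter (MSB to LSB) driving an n×n multiplier with Booth encoding and a 2n-bit product. The counter bits are split 4/4 for the 4×4 algorithm, 5/3 for the 5×3 algorithm, and 3/5 for the 3×5 algorithm.]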
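
The counter-based schemes above can be sketched in a few lines of Python. The sketch splits an 8-bit counter value into two fields (4/4, 5/3, or 3/5) and expands each field to an 18-bit operand by repeating it across the operand width. The repetition rule, the choice of which field feeds which operand, and the names `expand` and `tpg_vectors` are assumptions made for illustration; the slides state only that the counter fields connect to the multiplier inputs and that patterns must be expanded to 18x18 bits.

```python
def expand(field: int, field_bits: int, width: int = 18) -> int:
    """Repeat a field_bits-wide value across a width-bit operand (assumed rule)."""
    operand = 0
    for bit in range(width):
        # Operand bit i takes field bit (i mod field_bits).
        operand |= ((field >> (bit % field_bits)) & 1) << bit
    return operand


def tpg_vectors(a_bits: int, b_bits: int, width: int = 18):
    """Yield (A, B) operand pairs for an 8-bit counter split a_bits/b_bits."""
    assert a_bits + b_bits == 8
    for count in range(256):
        a_field = count >> b_bits               # most significant a_bits
        b_field = count & ((1 << b_bits) - 1)   # least significant b_bits
        yield expand(a_field, a_bits, width), expand(b_field, b_bits, width)


vectors_5x3 = list(tpg_vectors(5, 3))
vectors_3x5 = list(tpg_vectors(3, 5))
vectors_4x4 = list(tpg_vectors(4, 4))
```

Running both the 5/3 and 3/5 splits mirrors the stated approach of not needing to know which operand is Booth encoded; each split produces 256 vectors.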
Algorithm used in Srinivas Garimella's MS thesis for Virtex-2 multipliers
  ≈ 90% with ripple-carry adder
  ≈ 90% with carry-save adder
  ≈ 70% with CLA
Conclusion: modified Booth multiplier test vectors do test array multiplier
But Modified-Booth/Wallace-Tree appears to be most likely candidate for Virtex-4 DSP multiplier implementation
  Also for Virtex-5 and Altera
[Figure: Chris Erickson's results. Note difference in FC wrt adder implementation.]
Other Multiplier Results
  4x4-bit implementations
  Exhaustive test patterns
    Undetected faults are undetectable
    Same as 4x4, 5x3, & 3x5 algorithm for 4x4-bit multiplier
  Simulation results discrepancy for array multiplier
    4 undetected faults in 4x4-bit implementation
    1 undetected fault in 18x18 multiplier w/ 4x4 algorithm in Chris Erickson's results
Carry-Look-Ahead Adder
  Recall CLA was more difficult to test
  Basic CLA is 4-bits
    4-bit CLAs then combined to form larger adders
  Ripple CLAs
  2 types based on Lookahead Carry Unit (LCU):
CLA Test Algorithms
  "On the Adders with Minimum Tests"
    Kajihara and Sasao
    Proc. VLSI Test Symp., pp. 10-15, 1997 (VTS'97)
    10 vectors detect all single and multiple faults
    In any size ripple CLA (not an LCU implementation)
  "Scalable Test Generators for High-Speed Datapath Circuits"
    Al-Asaad, Hayes, and Murray
    J. Electronic Testing, vol. 12, pp. 111-125, 1998 (JETTA'98)
    2×(N+1) vector sequence (for an N-bit adder)
    TPG implementation requires:
      N+1-bit shift register
      N XOR gates, N XNOR gates, and 1 inverter
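
One structure that matches the sequence length quoted above is an (N+1)-bit twisted-ring (Johnson) counter: a shift register with a single inverter in its feedback path steps through exactly 2×(N+1) states. The Python sketch below models only that state sequence. Treating the JETTA'98 generator as a twisted-ring counter is an assumption here, and the XOR/XNOR network that would map each state onto the adder operands is not modeled; `twisted_ring_states` is a hypothetical name.

```python
def twisted_ring_states(n_adder_bits: int) -> list:
    """Return one full cycle of an (N+1)-bit twisted-ring counter, starting at 0."""
    length = n_adder_bits + 1
    mask = (1 << length) - 1
    state = 0
    states = []
    while True:
        states.append(state)
        msb = (state >> (length - 1)) & 1
        # Shift left and feed back the inverted MSB (the single inverter).
        state = ((state << 1) | (msb ^ 1)) & mask
        if state == 0:
            break
    return states


# A 48-bit adder would need a 49-bit register and 2 x 49 = 98 vectors.
states_48 = twisted_ring_states(48)
```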
Fault Simulation Results
  JETTA'98 approach gives best overall fault coverage regardless of adder implementation
  Undetected faults in JETTA'98 approach can be detected
    Results in "New BIST" column for 2×(N+2) vector sequence
  JETTA'98 also claims similar BIST approach for Modified-Booth multiplier
    But description of test algorithm is very sketchy
Adder in Virtex-4 DSP
  Adder has 3 input ports
    P = Z ± (X + Y + Cin)
  We interpret this as a 2-stage CLA adder/subtractor implementation
  Apply test patterns to each stage in turn
    2 clock cycles per vector
    OPMODE control
[Figure: adder model with two 48-bit CLAs. Labels: A port (X MUX), B port (Y MUX), C port (Z MUX), CIN, Subtract. Annotations: clock cycle #1, X test vector; clock cycle #2, Y test vector; clock cycle #2, Z test vector.]
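
The adder function P = Z ± (X + Y + Cin) can be written as a small behavioral model in Python. The 48-bit width comes from the figure labels and the two-step evaluation follows the 2-stage interpretation stated above; the wrap-around (modulo 2^48) arithmetic and the name `dsp_adder` are assumptions of this sketch, not a description of the silicon.

```python
MASK48 = (1 << 48) - 1


def dsp_adder(x: int, y: int, z: int, cin: int = 0, subtract: bool = False) -> int:
    """Behavioral model of P = Z +/- (X + Y + Cin), truncated to 48 bits."""
    # Stage 1: three-input sum of X, Y, and the carry-in.
    first_stage = (x + y + cin) & MASK48
    # Stage 2: add to or subtract from Z, selected by the Subtract control.
    if subtract:
        return (z - first_stage) & MASK48
    return (z + first_stage) & MASK48
```

For example, `dsp_adder(1, 2, 10, cin=1)` evaluates 10 + (1 + 2 + 1), and the same call with `subtract=True` evaluates 10 - (1 + 2 + 1).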
Four groups of 256 clock cycles (ccs) each
  Allows control of operational modes (OPMODEs) of DSP
Test mode controlled by 4-bit shift register
  Bits include: Test Mode (2), Invert Control Signals, Reset
  Contents loaded via Boundary Scan interface
  Reduces the number of downloads to FPGA
[Figure labels: Pseudo-Random Control Signals; Constant Control Signals]
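
A rough sketch of how the 4-bit test-mode register could be decoded is given below in Python. The field list (2-bit test mode, invert control signals, reset) is taken from the slide, but the bit positions, the `BistControl` type, and the `decode_control` function are hypothetical; the slides do not give the register layout.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class BistControl:
    test_mode: int          # 2-bit test mode select
    invert_controls: bool   # invert control signals
    reset: bool             # reset


def decode_control(reg: int) -> BistControl:
    """Decode a 4-bit register value.

    Assumed layout (not given in the slides): bits [3:2] = test mode,
    bit 1 = invert control signals, bit 0 = reset.
    """
    return BistControl(
        test_mode=(reg >> 2) & 0b11,
        invert_controls=bool((reg >> 1) & 1),
        reset=bool(reg & 1),
    )
```

Because such a register is reloaded through Boundary Scan, one downloaded configuration can be rerun with different settings, which is how the slides explain the reduced number of downloads.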
TPG Architecture
  Counter ⇒ 5×3 and 3×5 multiplier test to ports A & B
  Shift register ⇒ 2×(N+2) vector adder test to port C
  FSM ⇒ OPMODE control for 4 group sequences
  LFSR ⇒ pseudo-random patterns to other control inputs during last two groups of 256 clock cycles
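
The control-signal side of this TPG can be illustrated with a short Python sketch: four groups of 256 clock cycles, with an LFSR supplying the control inputs during the last two groups. That the first two groups use constant control values is inferred from the "Constant Control Signals" figure label. The 8-bit LFSR width, its tap positions (8, 6, 5, 4), the seed, and the function names are assumptions; the slides do not specify the LFSR.

```python
GROUP_LENGTH = 256
NUM_GROUPS = 4


def lfsr_step(state: int, taps=(8, 6, 5, 4), width: int = 8) -> int:
    """Advance a Fibonacci LFSR by one clock (assumed 8-bit, maximal length)."""
    feedback = 0
    for tap in taps:
        feedback ^= (state >> (tap - 1)) & 1
    return ((state << 1) | feedback) & ((1 << width) - 1)


def control_schedule(seed: int = 0x01):
    """Yield (clock_cycle, group, control_source, lfsr_value) for one BIST sequence."""
    lfsr = seed
    for cc in range(GROUP_LENGTH * NUM_GROUPS):
        group = cc // GROUP_LENGTH
        if group < 2:
            # First two groups: control inputs held at constant values.
            yield cc, group, "constant", None
        else:
            # Last two groups: control inputs driven by the LFSR.
            yield cc, group, "pseudo-random", lfsr
            lfsr = lfsr_step(lfsr)
```

The LFSR must be seeded with a nonzero value, since the all-zero state never changes under XOR feedback.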
ORA Implementation
  Old comparison-based ORA
    Logic 1 latched in FF due to mismatches
    Configuration memory readback used to get results
  CLBs have dedicated carry chain for fast adders and counters
  New ORA latches logic 0 due to mismatch
    Carry chain performs iterative OR function
    Single pass/fail indication at end of BIST sequence
    Only read configuration memory to get failing results for diagnosis
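
The behavior of the new ORA can be sketched in Python as follows. Each ORA flip-flop starts at logic 1 and latches logic 0 on the first mismatch, and an OR taken ORA by ORA along the chain reduces all flip-flops to one pass/fail bit. How the active-low flip-flop values are fed into the OR chain is an assumption of this sketch (each value is inverted before the OR), and `run_ora` and `chain_fail` are hypothetical names.

```python
def run_ora(expected, observed) -> int:
    """Return the ORA flip-flop value: 1 = no mismatch seen, 0 = mismatch latched."""
    ff = 1
    for exp, obs in zip(expected, observed):
        if exp != obs:
            ff = 0  # once cleared, the flip-flop stays at 0
    return ff


def chain_fail(ora_ffs) -> int:
    """Iterative OR over the ORAs: returns 1 if any ORA latched a mismatch."""
    fail = 0
    for ff in ora_ffs:
        fail |= ff ^ 1  # a latched 0 contributes a 1 to the OR chain
    return fail
```

Only when `chain_fail` reports a failure would the individual flip-flops need to be read back for diagnosis, which matches the readback saving described above.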
BIST Configurations
  5 downloads to FPGA
    1 compressed download (<50% of full config)
    + 4 partial reconfigurations (<0.5% of full config)
      only change DSP configuration bits
  7 BIST sequences
    BIST configurations #2 & #3 ran twice
      different control register values for multiplier/adder test algorithms
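
Summing the stated upper bounds gives the overall download budget, which is consistent with the 52% figure quoted in the summary. The short Python calculation below assumes that the 0.5% bound applies to each partial reconfiguration and that BIST run time is negligible next to download time; both are assumptions, and the constant names are hypothetical.

```python
# All sizes in units of 0.1% of one full configuration download.
COMPRESSED_DOWNLOAD_MAX = 500   # compressed download: < 50%
PARTIAL_RECONFIG_MAX = 5        # each partial reconfiguration: < 0.5%
NUM_PARTIAL_RECONFIGS = 4

total_tenths = COMPRESSED_DOWNLOAD_MAX + NUM_PARTIAL_RECONFIGS * PARTIAL_RECONFIG_MAX
upper_bound_percent = total_tenths / 10  # bound on total download volume
```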
[Figure: configuration generation flow. .NCD to .BIT with BitGen, then download into FPGA. .NCD to .XDL, then modification program for generating remaining 4 BIST configurations. System or BIST configuration file downloaded as a bitstream to the FPGA.]
Physical Fault Injection
  Faulty FPGAs are difficult to find
    1 ORCA with faulty PLB & 2 ORCAs with faulty routing
  Physical fault insertion
    Etch package down to bare die and "zap"
  We use fault injection emulation
    Modify configuration bits before or after download (RMW)
    Can inject single and/or multiple faults
      Stuck-at faults & bridging faults
    Faults limited to effects of configuration bits
Fault Injection Emulation Results
  1) Download BIST configuration
  2) Manipulate configuration bit via read-modify-write
  3) Run BIST sequence
  4) Get BIST results
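
The four steps above can be sketched in Python against a toy configuration memory. The `bytearray` stands in for the downloaded configuration, `run_bist` is a caller-supplied placeholder for running the BIST sequence and collecting its result, and restoring the original bit afterwards is an addition of this sketch; none of these names correspond to a real tool interface.

```python
def inject_fault(config: bytearray, bit_index: int, stuck_value: int) -> int:
    """Read-modify-write one configuration bit; return its original value."""
    byte_index, bit_pos = divmod(bit_index, 8)
    original = (config[byte_index] >> bit_pos) & 1   # read
    if stuck_value:
        config[byte_index] |= 1 << bit_pos           # modify: force to 1
    else:
        config[byte_index] &= ~(1 << bit_pos) & 0xFF # modify: force to 0
    return original                                  # write is done in place


def emulate(config: bytearray, bit_index: int, stuck_value: int, run_bist):
    """Inject one fault, run BIST, restore the bit, and return the BIST result."""
    original = inject_fault(config, bit_index, stuck_value)
    result = run_bist(config)
    inject_fault(config, bit_index, original)  # restore for the next fault
    return result
```

A fault whose stuck value equals the bit's current value leaves the configuration unchanged, so a loop over candidate faults would normally inject the opposite value.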
Summary
  Investigated known test algorithms for multipliers and adders
  Looked for architecture independent tests with highest fault coverage
  JETTA'98 approach easy to implement
    Needs modification for 100% FC
  7 DSP BIST sequences with 5 downloads
    New ORA eliminates config memory readback
  Total testing time < 52% of 1 full download
    Using compressed and partial reconfiguration
    Only DSP configuration bits need to be changed
  Application to Virtex-5 DSPs