Leveraging Low-Cost FPGA Prototyping for Validation of Highly Threaded Server-on-Chip DV Club - July 2009 Jai Kumar, Verification Technologist Sun Microsystems Inc. [email protected] http://sun.com
Jul 16, 2015
Slide 2 Jai Kumar DV Club
Outline
• Verification Challenges
• Emulation alternatives
• FPGA Prototyping Basics
• Prototyping Challenges
• Guidelines
• Results
• Summary
What's in it for you
Managers:
- Requirements – effort, $$, time, tools
Engineers:
- Challenges
- Avoid pitfalls
Vendors:
- Enhancements to simplify adoption
Design Challenges Impacting Verification
[Figure: four bar charts by system generation; bar labels as extracted]
System   Threads   Design Size   Performance   Memory
T1000    32        41M           1X            64G
T5220    64        80M           2.5X          128G
T5240    128       120M          4X            256G
T5440    256       160M          8X            512G
[Figure: Simulation Speed (cycles/sec, log scale from 1 to 1,000,000) vs. Design Size (M gates) for SW Sim, Emulation, and FPGA Prototyping]
Server-on-Chip: Verification Complexity
• 2x+ performance over UltraSPARC T1, within the same power envelope
• Up to 8 cores @1.4GHz
• 2x the threads
> Up to 64 threads per CPU
• 2x the memory
> Up to 128GB memory
> Up to 16 fully buffered DIMMs
> 2.5x memory BW = 60+GB/s
• 8x FPUs, 1 fully pipelined floating point unit/core
• 4MB L2$ (8 banks), 16-way set associative
• Security co-processor per core
> DES, 3DES, AES, RC4, SHA1, SHA256, MD5, RSA to 2096 key, ECC
• Powers SunFire T5120, T5220, T6320 Servers
[Figure: block diagram. Eight cores (C1-C8), each with 16 KB I$, 8 KB D$, an FPU and an SPU, connect through a crossbar to eight L2$ banks. Four memory controllers, each with a dual-channel FB-DIMM interface. I/O: NIU (10 Gb Ethernet), PCIe (x8 @ 2.5 GHz, 2 GB/s each direction), Sys I/F buffer switch core, SSI, JTAG debug port.]
FPGA Roadmap
Source: MPSOC Keynote 2006, Xilinx
FPGAs are getting bigger, cheaper and faster!
Solution: Supplement Emulation with cheaper FPGA prototyping alternatives
• Why use FPGA prototyping?
> Not enough $$ for HW emulators (big iron) – R&D dollars
> Need to run at close to real-time speed
> New advancements in FPGA technology create opportunity for leverage
• Benefits
> Availability of standard off-the-shelf, mix-n-match FPGA HW/SW tools (small iron)
> Allows you to stretch your R&D dollars
> Deploy many replicas – multiple systems in parallel
> Supplements your emulators (big iron) – does not replace them
Think Small, Fast and Many
FPGA Prototyping 101
What is Prototyping:
• Process of mapping RTL functionality to FPGAs
Hardware:
• Multiple latest, largest FPGAs on a board
• Two major vendors: Altera & Xilinx
• Capacity: 3-150M gates
• Performance: 5 to 50MHz
Software:
• Synthesis, Design Partition, FPGA P&R
• Debug Tools
Big Picture
[Figure: verification platforms plotted by Modeling Effort against Simulation Speed (Hz), on a log axis from 1 to 1G+. Platforms: Simulation, Acceleration, Emulation, FPGA Prototyping, Silicon. Other labels: HW verification, System-level (HW/SW) verification, SW Development, Productivity, Debug Productivity.]
Solaris Boot Time (labels as extracted): 15 years, 1 day 18 hrs, 6 hours, 38 mins
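The boot-time labels in the Big Picture figure follow from simple arithmetic: run time is the number of target cycles divided by the platform speed. The sketch below illustrates that relation only. The cycle count and the platform speeds are hypothetical placeholders chosen for illustration, not figures from the talk.

```python
# Run time = cycles / speed. All numbers below are hypothetical placeholders.
SECONDS_PER = (
    ("years", 365 * 24 * 3600),
    ("days", 24 * 3600),
    ("hours", 3600),
    ("minutes", 60),
)


def run_time_seconds(cycles, speed_hz):
    """Wall-clock seconds needed to run `cycles` at `speed_hz` cycles/sec."""
    if speed_hz <= 0:
        raise ValueError("speed_hz must be positive")
    return cycles / speed_hz


def humanize(seconds):
    """Render a duration in the largest unit that it fills at least once."""
    for unit, size in SECONDS_PER:
        if seconds >= size:
            return f"{seconds / size:.1f} {unit}"
    return f"{seconds:.1f} seconds"


BOOT_CYCLES = 2e10  # hypothetical cycle count for an OS boot
PLATFORM_SPEED_HZ = {  # hypothetical speeds, one per platform class
    "simulation": 100,
    "emulation": 1e6,
    "fpga prototype": 1e7,
}

if __name__ == "__main__":
    for name, hz in PLATFORM_SPEED_HZ.items():
        print(f"{name:15s} {humanize(run_time_seconds(BOOT_CYCLES, hz))}")
```

Each factor of ten in platform speed removes a factor of ten from the wait, which is why the slowest and fastest platforms in the figure differ by years versus minutes.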
FPGA Prototyping vs. Emulation
Features                                 FPGA Prototype   Emulation
General:
  Capacity Expandability                 Good             Very Good
  Memory Capacity                        Very Good        Good
  Ease of use                            Low              Very Good
  Cost                                   Low              High
Model Build Efficiency:
  Compile Time                           OK               Very Good
  Model Size                             Smaller          Bigger
  RTL Flexibility                        OK               Good
  Test bench support                     OK               Very Good
Simulation Efficiency:
  Simulation Speed                       Very Good        Good
  Save/Restore                           No               Very Good
  IO Expandability (PCIe, Ethernet etc)  Very Good        Good
Debug Efficiency:
  Signal Visibility                      Limited          Very Good
  Waveforms w/o re-run                   No               Very Good
FPGA Tools
[Figure: Off-the-Shelf, Mix-n-Match FPGA Emulation HW/SW Tools. Flow stages: RTL Design, RTL Synthesis, Design Partition, Place & Route (Altera, Xilinx), FPGA (Altera Stratix3, Xilinx Virtex5), HW Boards (Gidel HW, DINI HW, Vendor X), Debug (Altera SignalTap, Xilinx Chipscope), Advanced Debug Tools. Tool and vendor names shown: Auspy, Synopsys Certify, Altera Quartus, Synopsys Synplify, Xilinx ISE, ALDEC, DAFCA, Synopsys Identify Pro, DINI, Synopsys.]
Deployment Strategy
• Understand platform capabilities and limitations
> Build your use model
> Set management, user expectations
• Identify Applicable Model Configurations
> Size limited to small capacity (<16M gates)
• Identify Workload
> Primary platform for SW development
> Secondary platform for RTL/IO verification
• Design Mapping
> Automated FPGA RTL coding enforcements
• Leverage simulators/emulators for debug
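Checking whether a model configuration fits the platform can be sketched as a quick capacity estimate. The per-FPGA capacity, the utilization ceiling, and the board limit used below are assumptions for illustration (loosely motivated by the Results slide, where a 3.8M-gate model runs on 6M gates of hardware across 2 FPGAs); real limits depend on the device, the partitioner, and routing.

```python
import math


def fpgas_needed(design_gates, gates_per_fpga, max_utilization=0.65):
    """Estimate how many FPGAs a design needs.

    max_utilization is an assumed ceiling on how full each device may be
    before place & route becomes impractical; it is not a vendor figure.
    """
    if not 0 < max_utilization <= 1:
        raise ValueError("max_utilization must be in (0, 1]")
    usable_gates = gates_per_fpga * max_utilization
    return math.ceil(design_gates / usable_gates)


def fits_platform(design_gates, gates_per_fpga, max_fpgas=6,
                  max_utilization=0.65):
    """True if the estimate stays within the assumed board limit."""
    return fpgas_needed(design_gates, gates_per_fpga,
                        max_utilization) <= max_fpgas


if __name__ == "__main__":
    # Hypothetical inputs: 3M gates per FPGA, 3.8M-gate design.
    print(fpgas_needed(3_800_000, 3_000_000))
    print(fits_platform(12_000_000, 3_000_000))
```

An estimate like this only screens configurations early; partitioning and pin limits can still rule out a design that fits by gate count.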
Prototyping Challenges
• Design Mapping – Size, Style
> Limit to 4-6 FPGAs (~16M gates)
• Memory Mapping
> RTL arrays (custom logic) – block RAM inferencing
> Multi-ported arrays – overclocking
> Large system memory – mapping to DDR
• Verification Infrastructure
> TestBench – synthesizable, self-checking
> Initialization – use back-door access to download/upload big memories
> Monitors, SVA, $display are not supported – use LA triggers
• Mapping Transformation Verification
> Gate-level simulation at every stage
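The "multi-ported arrays – overclocking" item can be illustrated with a small behavioral model: a single-port RAM is clocked several times per design cycle, serving one logical port per fast-clock slot. This is a sketch of the general idea, not the flow used in the talk. The class name and request format are hypothetical, and the fixed port order (so a write on an earlier port is visible to a later port in the same design cycle) is an assumption that a real mapping must match against the original array's semantics.

```python
class TimeMultiplexedRam:
    """Behavioral model of an N-port array built on one single-port RAM."""

    def __init__(self, depth, ports):
        self.mem = [0] * depth
        self.ports = ports
        self.fast_cycles = 0  # fast-clock slots consumed so far

    def design_cycle(self, requests):
        """Serve one request per port during a single design-clock cycle.

        Each request is None, ("r", addr) or ("w", addr, data).
        Returns a list with the read data per port (None where no read).
        """
        if len(requests) != self.ports:
            raise ValueError("need exactly one request slot per port")
        results = [None] * self.ports
        for slot, req in enumerate(requests):
            self.fast_cycles += 1  # one fast-clock slot per logical port
            if req is None:
                continue
            if req[0] == "w":
                _, addr, data = req
                self.mem[addr] = data
            else:
                _, addr = req
                results[slot] = self.mem[addr]
        return results


if __name__ == "__main__":
    ram = TimeMultiplexedRam(depth=16, ports=3)
    print(ram.design_cycle([("w", 3, 42), ("r", 3), None]))
    print(ram.fast_cycles)
```

The model makes the cost visible: a 3-port array needs the RAM clock to run at least 3 times faster than the design clock, which caps the achievable design frequency.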
Guidelines
• RTL Coding Guidelines for FPGAs
> No XMRs, no force/release, avoid latches, clock gating
> No initializations (constant inits result in undesired synth optimizations)
> Perform FPGA RTL linting check
• Stand-alone synthesis & verification of custom logic
> Check for RAM utilization & reduced CLK domains
> Mixed-mode RTL-gate simulations
• Perform full-chip gate simulations at different stages
> After synthesis, after partitioning, after insertion of signal multiplexing logic
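Some of the coding guidelines above can be screened with a simple text scan before a real lint tool is run. The sketch below is a toy, assumed for illustration: the rule names and regular expressions are hypothetical, it only strips `//` comments, and it cannot detect latches or clock gating, which need synthesis-aware analysis.

```python
import re

# Hypothetical rules; each is (label, pattern applied to one source line).
RULES = [
    ("force/release", re.compile(r"\b(force|release)\b")),
    ("initial block", re.compile(r"\binitial\b")),
    ("hierarchical reference (possible XMR)",
     re.compile(r"\b[A-Za-z_]\w*\.[A-Za-z_]\w*")),
    ("unsupported system task", re.compile(r"\$(display|monitor)\b")),
]


def lint(source):
    """Return (line_number, rule_label) pairs for every rule hit."""
    findings = []
    for lineno, line in enumerate(source.splitlines(), start=1):
        code = line.split("//", 1)[0]  # ignore single-line comments
        for label, pattern in RULES:
            if pattern.search(code):
                findings.append((lineno, label))
    return findings


SAMPLE = """module t;
  initial cnt = 0;
  always @(posedge clk) $display("%d", cnt);
  initial force top.core.pc = 1'b0;
endmodule
"""

if __name__ == "__main__":
    for lineno, label in lint(SAMPLE):
        print(f"line {lineno}: {label}")
```

A scan like this is cheap enough to run on every check-in, leaving the vendor lint and gate-level simulations for the issues it cannot see.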
FPGA Flow
[Figure: flow diagram. Stage labels: Emulation RTL Model, Modular Synthesis, Parallel Synthesis, Netlist Qualification, Design Partition, Design Visibility, FPGA Place & Route, C-API Compile, FPGA Platform. Check labels: RTL Simulation (verify latch, clk-gate conversions); Gate-level Simulation (FPGA partitioning, pin multiplexing).]
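The pin multiplexing step in the flow exists because a partition cut usually carries more signals than the FPGAs have pins between them. A rough sketch of the trade-off, under a deliberately simplified model: signals share pins in time slots, and the design clock is bounded by the multiplexing clock divided by the number of slots. The function names, the `overhead_slots` parameter, and the example numbers are all hypothetical; real tools also account for synchronization and board delays.

```python
import math


def tdm_ratio(cut_signals, available_pins):
    """Signals that must share each pin (1 means no multiplexing needed)."""
    if available_pins <= 0:
        raise ValueError("available_pins must be positive")
    return max(1, math.ceil(cut_signals / available_pins))


def max_design_clock_hz(mux_clock_hz, ratio, overhead_slots=0):
    """Upper bound on the design clock under the simplified slot model.

    overhead_slots is an assumed allowance for synchronization slots.
    """
    return mux_clock_hz / (ratio + overhead_slots)


if __name__ == "__main__":
    ratio = tdm_ratio(cut_signals=4000, available_pins=500)
    print(ratio)
    print(max_design_clock_hz(200e6, ratio, overhead_slots=2))
```

Under this model, reducing the number of signals crossing a partition boundary is the most direct way to raise prototype speed.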
FPGA Prototyping Results
• OpenSPARC T2 Model
> 3.8M gates, runs @8MHz
> Being open-sourced soon – opensparc.net
• Hardware:
> 6M gates
> 2 Altera Stratix III SL340 FPGAs
• Software:
> RTL Partitioner, bundled FPGA tools
• Effort:
> 1 engineer; 3 months
• Applications:
> Verify Core, SOC, IO
> Verify Firmware (HV/OBP), Solaris, Application
[Figure: block diagram of the design. Cores C1-C8, each with 16 KB I$, 8 KB D$, an FPU and an SPU; crossbar; eight L2$ banks; four memory controllers; NIU; PCIe; Sys I/F buffer switch core.]
Platform improvements – to ease adoption
• Bridge gap between Emulator and FPGA Prototyping
> Learn from advances in the emulator space
> Ease of model build
> Support for RTL, SVA, TB constructs
> Seamless RTL partitioning
> Eliminate need for gate-simulations
• Support for Verification infrastructure
> XMRs, preserve net names, ports
• Enhance Debug experience
> Improve debug tools, offload to simulators
Summary
• Low cost FPGA prototyping supplements expensive emulators
• Collaborate with vendors to implement feature-set for your use models
• FPGA Prototyping is effort-intensive, but will pay off in cost savings & higher performance
• Benefit:
> Higher HW & SW coverage (fewer silicon respins)
> Debug Bringup Tools before TO (faster bringup; productization time savings)