Leveraging Low-Cost FPGA Prototyping for Validation of Highly Threaded Server-on-Chip DV Club - July 2009 Jai Kumar, Verification Technologist Sun Microsystems Inc. [email protected] http://sun.com

Jul 16, 2015

Slide 1

Leveraging Low-Cost FPGA Prototyping for Validation of Highly Threaded Server-on-Chip

DV Club - July 2009

Jai Kumar, Verification Technologist
Sun Microsystems Inc.
[email protected]
http://sun.com

Slide 2

Outline

• Verification Challenges
• Emulation alternatives
• FPGA Prototyping Basics
• Prototyping Challenges
• Guidelines
• Results
• Summary

What's in it for you
Managers:
- Requirements: effort, $$, time, tools
Engineers:
- Challenges
- Avoid pitfalls
Vendors:
- Enhancements to simplify adoption

Slide 3

Design Challenges Impacting Verification

[Figure: four bar charts (Threads, Design Size, Performance, Memory) across the T1000, T5220, T5240 and T5440 servers. Data labels from the bars:]

Server   Threads   Design Size   Performance   Memory
T1000    32        41M           1X            64G
T5220    64        80M           2.5X          128G
T5240    128       120M          4X            256G
T5440    256       160M          8X            512G

[Figure: Simulation Speed (cycles/sec, log scale from 1 to 1,000,000) versus Design Size (M gates) for SW Sim, Emulation and FPGA Prototyping]

Slide 4

Server-on-Chip: Verification Complexity

• 2x+ performance over UltraSPARC T1, within the same power envelope
• Up to 8 cores @ 1.4GHz
• 2x the threads
  > Up to 64 threads per CPU
• 2x the memory
  > Up to 128GB memory
  > Up to 16 fully buffered DIMMs
  > 2.5x memory BW = 60+ GB/s
• 8x FPUs, 1 fully pipelined floating point unit/core
• 4MB L2$ (8 banks), 16-way set associative
• Security co-processor per core
  > DES, 3DES, AES, RC4, SHA1, SHA256, MD5, RSA to 2096 key, ECC
• Powers SunFire T5120, T5220, T6320 servers

[Figure: chip block diagram. Eight cores (C1-C8), each with 16 KB I$, 8 KB D$, an FPU and an SPU; a crossbar; eight L2$ banks; four memory controllers, each driving dual-channel FB-DIMM; NIU (10 Gb Ethernet); PCIe (x8 @ 2.5 GHz, 2 GB/s each direction); Sys I/F buffer switch core; SSI, JTAG debug port]

Slide 5

Problem: cost of Emulation going up

[Images: Gulfstream jet; Emulator HW (big iron)]

Slide 6

FPGA Roadmap

Source: MPSOC Keynote 2006, Xilinx

FPGAs are getting bigger, cheaper and faster!

Slide 7

Solution: Supplement Emulation with cheaper FPGA prototyping alternatives

• Why use FPGA prototyping?
  > Not enough $$ (R&D dollars) for HW emulators (big iron)
  > Need to run at close to real-time speed
  > New advancements in FPGA technology create an opportunity for leverage
• Benefits
  > Availability of standard off-the-shelf, mix-n-match FPGA HW/SW tools (small iron)
  > Allows you to stretch your R&D dollars
  > Deploy many replicas: multiple systems in parallel
  > Supplements your emulators (big iron); does not replace them

Think Small, Fast and Many

Slide 8

FPGA Prototyping 101

What is Prototyping:
• Process of mapping RTL functionality to FPGAs

Hardware:
• Multiple latest, largest FPGAs on a board
• Two major vendors: Altera & Xilinx
• Capacity: 3-150M gates
• Performance: 5 to 50MHz

Software:
• Synthesis, design partition, FPGA P&R
• Debug tools

Slide 9

Big Picture

[Figure: verification platforms (Simulation, Acceleration, Emulation, FPGA Prototyping, Silicon) placed along a Simulation Speed (Hz) axis running from 1 to 1G+, with Modeling Effort as the other axis. Usage bands: HW verification, System-level (HW/SW) verification, SW Development. Other labels: Productivity, Debug Productivity. Solaris boot time annotations, in the order shown: 15 years, 1 day 18 hrs, 6 hours, 38 mins]
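
The Solaris boot times annotated on this slide reflect simple arithmetic: wall-clock time is the workload's cycle count divided by the platform speed in cycles per second. A minimal Python sketch of that relationship follows. The workload size and the simulation and emulation speeds are placeholder assumptions for illustration, not figures from the slides; only the 8 MHz value is taken from the results slide.

```python
def wall_clock_seconds(cycles: float, speed_hz: float) -> float:
    """Time needed to run `cycles` design clock cycles at `speed_hz` cycles/sec."""
    if speed_hz <= 0:
        raise ValueError("speed_hz must be positive")
    return cycles / speed_hz


def humanize(seconds: float) -> str:
    """Render a duration in the largest unit that keeps the value at or above 1."""
    units = [
        ("years", 365 * 24 * 3600),
        ("days", 24 * 3600),
        ("hours", 3600),
        ("mins", 60),
    ]
    for name, size in units:
        if seconds >= size:
            return f"{seconds / size:.1f} {name}"
    return f"{seconds:.1f} secs"


# Hypothetical workload size (assumption, not from the deck).
WORKLOAD_CYCLES = 2e10

# Platform speeds in Hz; the first two are assumptions for illustration.
PLATFORM_SPEED_HZ = {
    "simulation": 50,
    "emulation": 1e6,
    "fpga prototype": 8e6,  # speed reported on the results slide
}

for platform, hz in PLATFORM_SPEED_HZ.items():
    seconds = wall_clock_seconds(WORKLOAD_CYCLES, hz)
    print(f"{platform:15s} {humanize(seconds)}")
```

Because time scales inversely with speed, each 10x gain in platform speed cuts the boot time by 10x, which is why the platforms span minutes to years on the same workload.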

Slide 10

FPGA Prototyping vs. Emulation

Features                                FPGA Prototype   Emulation

General:
  Capacity Expandability                Good             Very Good
  Memory Capacity                       Very Good        Good
  Ease of use                           Low              Very Good
  Cost                                  Low              High

Model Build Efficiency:
  Compile Time                          OK               Very Good
  Model Size                            Smaller          Bigger
  RTL Flexibility                       OK               Good
  Test bench support                    OK               Very Good

Simulation Efficiency:
  Simulation Speed                      Very Good        Good
  Save/Restore                          No               Very Good
  IO Expandability (PCIE, Ethernet etc) Very Good        Good

Debug Efficiency:
  Signal Visibility                     Limited          Very Good
  Waveforms w/o re-run                  No               Very Good

Slide 11

FPGA Tools

[Figure: tool flow from RTL Design to HW Boards, with the labels below]
• Flow stages: RTL Design, Design Partition, RTL Synthesis, Altera Place & Route / Xilinx Place & Route, Altera Stratix3 FPGA / Xilinx Virtex5 FPGA, Altera SignalTap Debug / Xilinx Chipscope Debug
• HW Boards: Gidel HW, DINI HW, Vendor X
• Tools and vendors named: DINI, Auspy, Synopsys Certify, Altera Quartus, Synopsys Synplify, Xilinx ISE
• Advanced Debug Tools: ALDEC, DAFCA, Synopsys Identify Pro

Off-the-Shelf, Mix-n-Match FPGA Emulation HW/SW Tools

Slide 12

Deployment Strategy

• Understand platform capabilities and limitations
  > Build your use model
  > Set management, user expectations
• Identify Applicable Model Configurations
  > Size limited to small capacity (<16M gates)
• Identify Workload
  > Primary platform for SW development
  > Secondary platform for RTL/IO verification
• Design Mapping
  > Automated FPGA RTL coding enforcements
• Leverage simulators/emulators for debug

Slide 13

Prototyping Challenges

• Design Mapping: size, style
  > Limit to 4-6 FPGAs (~16M gates)
• Memory Mapping
  > RTL arrays (custom logic): BLK RAM inferencing
  > Multi-ported arrays: overclocking
  > Large system memory: mapping to DDR
• Verification Infrastructure
  > TestBench: synthesizable, self-checking
  > Initialization: use back-door access to download/upload big memories
  > Monitors, SVA and $display are not supported; use LA triggers
• Mapping Transformation Verification
  > Gate-level simulation at every stage
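
One common reading of "multi-ported arrays: overclocking" is time-slicing: a single-port block RAM is clocked several times per design clock so that each port gets its own access slot. The Python behavioral model below sketches that idea. The class names, the one-slot-per-port ratio, and the read-before-write ordering are all assumptions made for illustration; the slides do not describe the actual implementation.

```python
class SinglePortRam:
    """Behavioral model of a RAM allowing one read or one write per fast clock."""

    def __init__(self, depth: int):
        self.data = [0] * depth
        self.accesses = 0  # number of fast-clock slots consumed so far

    def access(self, addr: int, write: bool = False, value: int = 0) -> int:
        self.accesses += 1
        if write:
            self.data[addr] = value
        return self.data[addr]


class OverclockedMultiPort:
    """Emulates a multi-ported array by time-slicing one single-port RAM.

    The RAM clock must run at least `slots` times faster than the design clock.
    Assumed ordering within a design cycle: all reads first, then all writes,
    so a read returns the value stored before this cycle's writes.
    """

    def __init__(self, depth: int, read_ports: int, write_ports: int):
        self.ram = SinglePortRam(depth)
        self.read_ports = read_ports
        self.write_ports = write_ports
        self.slots = read_ports + write_ports  # required overclocking ratio

    def design_cycle(self, read_addrs, writes):
        """Serve one design clock: `read_addrs` is a list of addresses,
        `writes` is a list of (addr, value) pairs."""
        if len(read_addrs) > self.read_ports or len(writes) > self.write_ports:
            raise ValueError("more requests than ports")
        results = [self.ram.access(addr) for addr in read_addrs]
        for addr, value in writes:
            self.ram.access(addr, write=True, value=value)
        return results


mem = OverclockedMultiPort(depth=16, read_ports=2, write_ports=1)
mem.design_cycle([], [(3, 42)])
print(mem.design_cycle([3, 3], [(3, 7)]))  # prints [42, 42]: reads precede the write
```

The cost of this approach is visible in `slots`: a 2-read, 1-write array needs the RAM clocked at three times the design clock, which is one reason such arrays can limit the achievable prototype speed.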

Slide 14

Guidelines

• RTL Coding Guidelines for FPGAs
  > No XMRs, no force/release, avoid latches, clock gating
  > No initializations (constant inits result in undesired synth optimizations)
  > Perform FPGA RTL linting check
• Stand-alone Synthesis & Verif of custom logic
  > Check for RAM utilization & reduced CLK domains
  > Mixed-mode RTL-gate simulations
• Perform full-chip gate simulations at different stages
  > After synthesis, after partitioning, after insertion of signal multiplexing logic
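
Several of the coding guidelines above can be screened automatically. The following is a rough Python sketch of such a lint pass over plain Verilog source text. The rule names and regular expressions are illustrative assumptions, not the lint tool used in this work, and they will miss cases and raise false alarms on real code. Latches and gated clocks in particular cannot be found by text matching and need a real lint or synthesis tool.

```python
import re

# Each rule is (name, pattern). Patterns are deliberately simple.
RULES = [
    ("force/release", re.compile(r"\b(force|release)\b")),
    ("initial block", re.compile(r"\binitial\b")),
    ("$display", re.compile(r"\$display\b")),
    # Dotted identifier such as top.core0.q, treated as a cross-module reference.
    ("possible XMR", re.compile(r"\b[A-Za-z_]\w*\.[A-Za-z_]\w*")),
]


def lint_rtl(source: str):
    """Return a list of (line_number, rule_name) for every rule hit."""
    findings = []
    for lineno, line in enumerate(source.splitlines(), start=1):
        code = line.split("//", 1)[0]  # ignore text after a line comment
        for name, pattern in RULES:
            if pattern.search(code):
                findings.append((lineno, name))
    return findings


# Hypothetical RTL snippet used only to exercise the rules.
SAMPLE = """module m(input clk);
  reg q;
  initial q = 0;
  always @(posedge clk) $display("q=%b", q);
  wire probe = top.core0.q;
endmodule
"""

for lineno, rule in lint_rtl(SAMPLE):
    print(f"line {lineno}: {rule}")
```

Running a check like this on every RTL check-in is one way to read the "automated FPGA RTL coding enforcements" item from the deployment strategy slide.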

Slide 15

FPGA Flow

[Figure: flow diagram. Boxes: Emulation RTL Model, Modular Synthesis, Parallel Synthesis, Netlist Qualification, Design Partition, Design Visibility, FPGA Place & Route, C-API Compile, FPGA Platform. Verification steps: RTL Simulation, Gate-level Simulation. Annotations: verify latch, clk-gate conversions; fpga partitioning; pin multiplexing]

Slide 16

FPGA Prototyping Results

• OpenSPARC T2 Model
  > 3.8M gates, runs @ 8MHz
  > Being open-sourced soon: opensparc.net
• Hardware:
  > 6M gates
  > 2 Altera Stratix III SL340 FPGAs
• Software:
  > RTL partitioner, bundled FPGA tools
• Effort:
  > 1 engineer; 3 months
• Applications:
  > Verify core, SOC, IO
  > Verify firmware (HV/OBP), Solaris, application

[Figure: chip block diagram. Eight cores (C1-C8), each with 16 KB I$, 8 KB D$, an FPU and an SPU; a crossbar; eight L2$ banks; four memory controllers; NIU; PCIe; Sys I/F buffer switch core]
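
The sizing figures on this slide (a 3.8M-gate model on a 6M-gate, two-FPGA board) suggest a quick feasibility check before committing to a platform. The Python sketch below estimates how many FPGAs a design needs. It assumes the board capacity splits evenly across the devices (3M gates each) and uses a 70% usable-capacity target; both are assumptions for illustration, not figures from the slides.

```python
import math


def fpgas_needed(design_gates: float, gates_per_fpga: float,
                 max_utilization: float = 0.7) -> int:
    """Smallest number of FPGAs whose usable capacity covers the design."""
    if design_gates <= 0 or gates_per_fpga <= 0:
        raise ValueError("gate counts must be positive")
    if not 0 < max_utilization <= 1:
        raise ValueError("max_utilization must be in (0, 1]")
    usable_per_fpga = gates_per_fpga * max_utilization
    return math.ceil(design_gates / usable_per_fpga)


def fits_platform(design_gates: float, gates_per_fpga: float,
                  max_fpgas: int, max_utilization: float = 0.7) -> bool:
    """True if the design fits within `max_fpgas` devices at the target utilization."""
    return fpgas_needed(design_gates, gates_per_fpga, max_utilization) <= max_fpgas


# 3.8M-gate model on devices assumed to hold 3M gates each.
print(fpgas_needed(3.8e6, 3e6))
# A 16M-gate design against a six-FPGA limit, same assumed device size.
print(fits_platform(16e6, 3e6, max_fpgas=6))
```

Leaving headroom below full utilization matters in practice because partitioning, debug instrumentation and signal multiplexing logic all consume resources beyond the raw design size.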

Slide 17

Platform improvements – to ease adoption

• Bridge gap between emulator and FPGA prototyping
  > Learn from advances in the emulator space
  > Ease of model build
  > Support for RTL, SVA, TB constructs
  > Seamless RTL partitioning
  > Eliminate need for gate simulations
• Support for verification infrastructure
  > XMRs, preserve net names, ports
• Enhance debug experience
  > Improve debug tools, offload to simulators

Slide 18

Summary

• Low-cost FPGA prototyping supplements expensive emulators
• Collaborate with vendors to implement feature-set for your use models
• FPGA prototyping is effort-intensive, but will pay off in cost savings & higher performance
• Benefit:
  > Higher HW & SW coverage (fewer silicon respins)
  > Debug bringup tools before TO (faster bringup; productization time savings)
