Stratix® 10: 14nm FPGA Delivering 1GHz
Mike Hutton, Product Architect, Altera IC Design
HotChips 2015


Apr 02, 2018




Acknowledgements

The Stratix 10 Architecture and Definition Teams
− Herman Schmit, Dana How, Gordon Chiu, Carl Ebeling, Bruce Pedersen, Andy Lee, Martin Langhammer, Ben Gamsa
− Dave Lewis, Valavan Manohararajah, David Galloway, Jeff Chromczak, Tim Vanderhoek, Ian Milton
− Sean Atsatt, David Shippy, Arif Rahman, Mark Chan, Jeff Schultz, Richard Grenier, Steven Perry, Jiefan Zhang, Rita Chu, Ting Lu
− KS Foo, Chee Hak Teh, Lai Guan Tang
− Bernhard Friebe, Eleena Ong, Jordon Inkeles, Allan Davidson, Lux Joshi, Martin Won


Stratix 10 Architectural Big Rocks

2X the achievable performance of Stratix V
At up to 70% lower power
Heterogeneous 3D System In Package (SiP) integration
Adoption of Intel 14nm tri-Gate process
Hierarchical configuration and security


Stratix 10 Innovations: 2.5D

[Figure: device floorplan. An FPGA core die flanked by six transceiver tiles (each with four 6-channel XCVR banks and a PCIe x1 - x16 block), plus memory blocks.]

Multi-Die via EMIB
− Separate core / transceiver
− Embedded Multi-Die Interconnect Bridge


Stratix 10 Innovations: Configuration & Clocking

[Figure: device floorplan with transceiver tiles (four 6-channel XCVR banks and a PCIe x1 - x16 block per tile), UIB blocks, and memory.]

Scalable Sector Architecture
− Software Configuration
− Configuration NoC
− Routable Clocks


Stratix 10 Innovations: HyperFlex

[Figure: device floorplan with transceiver tiles (four 6-channel XCVR banks and a PCIe x1 - x16 block per tile), UIB blocks, and memory.]

Core Performance
− HyperFlex Fabric, Tri-Gate
− 1GHz M20K and DSP MAC
− 750 MHz Floating Point


Stratix 10 Innovations: SoC & Memory

[Figure: device floorplan with transceiver tiles (four 6-channel XCVR banks and a PCIe x1 - x16 block per tile), UIB blocks, and memory.]

SoC
− 1.5GHz Performance ARM® Cortex A53
− DDR I/O Banks with Hard Memory Controller


System-in-Package Construction

[Figure, not to scale: top view and cross-section of the package. Top view: core fabric flanked by six XCVR tiles, each with 24 XCVRs (four 6-channel transceiver banks) and a PCIe x16 hard IP block; other labels: PCIe HIP, 3V GPIO. Cross-section: XCVR tiles and core fabric under the package lid on the package substrate, with AIB (HyperConnect-AIB) interfaces at the die edges and Intel EMIB beneath them.]


System-in-Package EMIB Technology

Embedded Multi-Die Interconnect Bridge (EMIB)

Many benefits
− Reduced complexity vs. full interposer, and no reticle limit
− De-couple analog (transceiver) development from digital FPGA fabric
− Transceiver reliability & yield enhancement
   Don’t need rectangular “die”
   Matching transceiver speed-grades
− Tick mixed with tock for added derivatives
   E.g. 56G PAM4 transceiver tile
   E.g. new hardened I/O interface IP
   E.g. SiP Memory or ASIC tiles


Sectors and Configuration Sub-System

Config-System manages CRAM

Historically just a shift register
− AR/DR to Configuration RAM Array
− FSM controlled

Modern configuration adds significant system functionality
− Encryption, decryption, bitstream compression, redundancy
− Security: authentication, side-channel, firewall, PUF
− SEU, scrubbing and partial-configuration management
− Debug

Our solution: move it to software
− More robust, upgradable, and risk-averse
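The "historically just a shift register" scheme can be illustrated with a toy model. This is a minimal sketch, not the actual hardware: it reads AR/DR as an address register and a data register (an assumption), and the class name, frame sizes, and bit ordering are all hypothetical.

```python
from typing import List


class ShiftRegisterConfig:
    """Toy model of a frame-based configuration path (hypothetical, not real hardware)."""

    def __init__(self, num_frames: int, frame_bits: int) -> None:
        # Configuration RAM (CRAM) modelled as a list of frames, all zero at power-up.
        self.cram: List[List[int]] = [[0] * frame_bits for _ in range(num_frames)]
        self.data_register: List[int] = [0] * frame_bits  # DR: filled serially
        self.address_register = 0                          # AR: selects the target frame

    def shift_in(self, bit: int) -> None:
        # Shift one bit into the DR; the oldest bit falls off the far end.
        self.data_register = [bit] + self.data_register[:-1]

    def write_frame(self) -> None:
        # Copy the DR into the frame selected by the AR, then advance the AR.
        self.cram[self.address_register] = list(self.data_register)
        self.address_register += 1


def configure(bitstream: List[int], num_frames: int, frame_bits: int) -> List[List[int]]:
    """FSM-like loop: shift frame_bits bits, write one frame, repeat."""
    if len(bitstream) != num_frames * frame_bits:
        raise ValueError("bitstream length must equal num_frames * frame_bits")
    cfg = ShiftRegisterConfig(num_frames, frame_bits)
    for frame in range(num_frames):
        for bit in bitstream[frame * frame_bits:(frame + 1) * frame_bits]:
            cfg.shift_in(bit)
        cfg.write_frame()
    return cfg.cram


# The first bit shifted in ends up at the far end of each frame.
print(configure([1, 0, 0, 0, 0, 1, 1, 1], num_frames=2, frame_bits=4))
```

Everything in this loop is fixed function. Features such as decryption, compression, or scrubbing would each need more dedicated logic around it, which is the motivation the slide gives for moving configuration control to software.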


Stratix 10 Configuration Sub-system Overview

Secure Device Manager (SDM)
− Config and Re-Config, compression
− Security: authentication, encryption, PUF
− Maintenance (power, T/V, SEU, debug)

Local Sector Manager (LSM)
− Sector configuration manager

Config Network-on-Chip (CNoC)
− SDM/LSM Communication

[Figure: physical and logical views of the SDM and the CNoC]
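The division of labor between SDM, CNoC, and LSM might be pictured as follows. This is a hedged sketch of the message flow only: every class, method, and payload here is hypothetical, and the real subsystem also handles authentication, decryption, and compression, which are omitted.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class LocalSectorManager:
    """Hypothetical LSM: owns the configuration memory of one sector."""
    sector_id: int
    cram: bytes = b""

    def load(self, image: bytes) -> None:
        self.cram = image  # a real LSM would write the sector's CRAM


@dataclass
class ConfigNoC:
    """Hypothetical CNoC: delivers a payload to the LSM with a given sector id."""
    lsms: Dict[int, LocalSectorManager] = field(default_factory=dict)

    def attach(self, lsm: LocalSectorManager) -> None:
        self.lsms[lsm.sector_id] = lsm

    def send(self, sector_id: int, image: bytes) -> None:
        self.lsms[sector_id].load(image)


class SecureDeviceManager:
    """Hypothetical SDM: software that fans configuration data out to sectors."""

    def __init__(self, noc: ConfigNoC) -> None:
        self.noc = noc

    def configure(self, sector_images: Dict[int, bytes]) -> List[int]:
        # Full or partial configuration: only the listed sectors are touched.
        for sector_id, image in sorted(sector_images.items()):
            self.noc.send(sector_id, image)
        return sorted(sector_images)


noc = ConfigNoC()
for sid in range(4):
    noc.attach(LocalSectorManager(sid))
sdm = SecureDeviceManager(noc)
sdm.configure({0: b"\x01", 1: b"\x02", 2: b"\x03", 3: b"\x04"})  # full configuration
sdm.configure({2: b"\xff"})                                      # reconfigure one sector
```

Because each sector has its own manager, reconfiguring sector 2 in the last line leaves the other sectors' contents untouched.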


SoC Application Processor

1.5 GHz Quad-Core ARM® Cortex™ A53
− CCU: Cache-coherency between FPGA accelerators and processors
− Integrated with configuration subsystem (SDM), sharing peripherals


Routable Clocking

SW-routed clocks in sector “seams”
More efficient use of globals
Active skew management

[Figure: (Legacy) Global Spines vs. Stratix 10 Routed Clocks]


Core Fabric Building Blocks

Adaptive Logic Module
LAB clusters, staggered H/V routing
1GHz DSP MAC, 10 TFLOPs, IEEE 754
1GHz RAM (20Kb), Data Forwarding
LIM/LEIM/DIM fabric


Fabric Performance and Power

Performance:
− A complete “re-think” on the philosophy of FPGA Fabric Architecture
− Registers are not just logic resources, they are routing resources
− Goal is to enable seamless movement and addition (pipelining) of registers
− Target: 2X the performance, without making the wires “2x faster”

Power
− 14nm Tri-Gate process (FinFET) provides process benefit for power
− Expanded use of VID and power management adds more
   High-Performance 800 mV to 940 mV
   Low-Power options from 850 mV down to 800 mV
− HyperFlex for power reduction
   Combine performance from HyperFlex with low-power options
− Target: 50% to 70% lower power per function, without slowing down
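The slide lists supply voltages but not how they translate into power. As a rough illustration only, the textbook first-order CMOS relation (dynamic power proportional to V² × f, with capacitance held fixed) gives a feel for the voltage term alone. The function name is hypothetical, and this is not how the 50% to 70% target was derived; that target also includes process and architecture effects.

```python
def relative_dynamic_power(v_new: float, v_ref: float, f_ratio: float = 1.0) -> float:
    """First-order model: dynamic power scales with V^2 * f (capacitance held fixed)."""
    return (v_new / v_ref) ** 2 * f_ratio


# Lowest vs. highest voltage quoted on the slide, same clock frequency.
ratio = relative_dynamic_power(0.80, 0.94)
print(f"{ratio:.3f}")  # prints 0.724
```

In this simplified model, dropping from 940 mV to 800 mV at the same frequency removes a bit more than a quarter of the dynamic power.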


Re-timing and Pipelining in Conventional FPGAs

Re-Timing
- Balance flops
- 16% fmax gain
- Added area

Pipelining
- Add flops
- 40% fmax gain
- Added clock tick
- Added area

Raw Logic
- Unbalanced paths
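The difference between the three cases above can be sketched with an idealized timing model in which fmax is set by the slowest register-to-register stage. The stage delays are hypothetical and the balancing is perfect, so the resulting gains are illustrative and are not the 16% and 40% figures quoted on the slide.

```python
from typing import List


def fmax_mhz(stage_delays_ns: List[float]) -> float:
    """Clock rate is limited by the slowest register-to-register stage."""
    return 1000.0 / max(stage_delays_ns)


def retime(stage_delays_ns: List[float]) -> List[float]:
    """Ideal re-timing: same number of stages, total delay spread evenly."""
    n = len(stage_delays_ns)
    return [sum(stage_delays_ns) / n] * n


def pipeline(stage_delays_ns: List[float], extra_stages: int = 1) -> List[float]:
    """Ideal pipelining: add stages and balance; costs extra clock ticks of latency."""
    n = len(stage_delays_ns) + extra_stages
    return [sum(stage_delays_ns) / n] * n


raw = [4.0, 2.0]  # hypothetical unbalanced path delays, in ns
print(fmax_mhz(raw))            # 250.0
print(fmax_mhz(retime(raw)))    # about 333.3
print(fmax_mhz(pipeline(raw)))  # 500.0
```

A real fabric falls short of these ideal numbers when registers exist only at a few fixed locations, because the tool cannot place a register exactly at the balance point.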


HyperFlex: Pipeline Registers by Design

Routing muxes (all H/V wires) have optional registers
− Including LAB, M20K and DSP block inputs, CC, SCLR/CE

Architectural Goals:
− Perfect balance: P&R chooses the right register (of many) to turn on
− Simple Software: Re-timing is a simple push/pull along the path
− No wasted LEs: Designs with high FF:LUT ratios no longer an issue
− No wasted routing: Don’t have to route to find an available FF


Moving a Register in the HyperFlex fabric

Disable in ALM, add to routing
Moving a register is a push/pull operation on the route
There is always a register on the routing mux
Quartus® II chooses the most appropriate FF for path balancing
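One way to picture "choosing the most appropriate FF" is as a search over the optional registers along a route. The sketch below is an assumption about the general idea, not Quartus's actual algorithm: the function name and hop delays are hypothetical, and it handles only a single register on a single path.

```python
from typing import List, Tuple


def best_register_position(hop_delays_ps: List[int]) -> Tuple[int, int]:
    """Pick which routing-mux register to enable on a path.

    hop_delays_ps[i] is the delay of hop i between the upstream and downstream
    flip-flops. Position i means the register after hop i is turned on.
    Returns (position, worst segment delay in ps).
    """
    if len(hop_delays_ps) < 2:
        raise ValueError("need at least two hops to place a register between them")
    total = sum(hop_delays_ps)
    best_pos, best_worst = 0, total
    before = 0
    for i, delay in enumerate(hop_delays_ps[:-1]):
        before += delay                       # push the register one hop further
        worst = max(before, total - before)   # slower of the two resulting segments
        if worst < best_worst:
            best_pos, best_worst = i, worst
    return best_pos, best_worst


# Hypothetical route of five hops: the register after hop 2 balances the path.
print(best_register_position([300, 500, 200, 600, 400]))  # (2, 1000)
```

Since every hop already has a register available, moving it only changes which one is enabled. No extra routing or logic element is consumed, which matches the "no wasted LEs, no wasted routing" goals stated for the architecture.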


Re-timing and Pipelining in Stratix 10

Re-Timing
- Balance flops
- 40% fmax gain
- Same resources

Pipelining
- Add flop
- Add clock tick
- 2X fmax gain
- Same resources

Raw Logic
- Unbalanced paths


Software and Designer Use-Model


Software adds a new step

Designer/SW concentrate on critical domains/chains, not volatile reg-reg paths

HyperPipeline the data-path, optimize control logic


Power: Half the power per function

14nm Tri-Gate provides a good chunk of this
− Allows us to take more of the process benefit as performance

Expanded use of VID and power management
− M20K and DSP block power gating

Added registers help:
− Reduced footprint for register-heavy designs

At 2X the speed, reduced size
− Half the width means half the area
− Which means half the static power on the same device.

[Figure: Bus vs. Half-width Bus]
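The "half the width at 2X the speed" argument is simple arithmetic, sketched below. The bus widths and clock rates are hypothetical, and the linear scaling of area and static power with datapath width is the slide's own simplification.

```python
def throughput_gbps(width_bits: int, fmax_mhz: float) -> float:
    """Raw datapath throughput: bits per cycle times cycles per second."""
    return width_bits * fmax_mhz / 1000.0


def relative_static_power(width_bits: int, ref_width_bits: int) -> float:
    """Assumes area, and hence static power, scales linearly with datapath width."""
    return width_bits / ref_width_bits


wide = throughput_gbps(512, 250.0)    # hypothetical original design
narrow = throughput_gbps(256, 500.0)  # half the width at twice the clock
print(wide, narrow)                     # 128.0 128.0
print(relative_static_power(256, 512))  # 0.5
```

The throughput is unchanged while the narrower datapath occupies half the area, which is where the claimed static-power saving comes from.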


Area/Delay/Power Tradeoffs with Stratix 10

[Figure: power chart with I/O, AC, and DC components. Labels: Stratix V; Migrate to Stratix 10; ½ Width, 2x fmax; 2X Throughput.]


Summary

3D integration isn’t just integration, it is
− De-risking, process matching, derivative proliferation and tick/tock

Device floorplanning and configuration get an upgrade
− Software control allows for security and feature-up of devices

SoC integration is mainstream
− Processor cost is a small subset of the die, coherent-accelerators

Pipelining unlocks optimizations in FPGA architecture
− Using wires efficiently, not brute-forcing them faster
− Faster == lower power when you can get designs to a more efficient place

Process is still giving us power benefits
− 14nm Tri-Gate reduces power, enabling higher performance circuit-design


Thank You