Serial Memories Fill a Need - MemCon
Post on 29-Nov-2018
Agenda
v Michael Sporer – Director of Marketing
§ The future of parallel versus serial interface for memory
v Mark Baumann – Director of Applications Engineering
§ Based on experience at MoSys developing and introducing the GigaChip Interface and the 1st, 2nd and 3rd generations of Bandwidth Engine ICs, we will describe several options for future memory interface solutions.
Copyright ©MoSys, Inc. 2015. All rights reserved. 2 MemCon 2015 - October 12th
Discrete DRAM doesn’t do Serial… yet
v Memory is the last holdout that still hasn’t gone serial
Challenges of Implementing DDR
Source: Agilent
DRAM bus trace length matching requirements
Design, Development & Qualification
Tradeoffs: Serial vs. Parallel
v On the Chip
§ SerDes adds costs on chip
• MUX / deMUX
• 2.5 GHz chip with 25 Gbps IO
v IO Bandwidth / Chip Area
§ Roughly the same on chip
§ Depends on the range
v IO Bandwidth / Power
§ It depends on reach
v On the Board
§ Fewer lanes
• 25 GHz is more challenging, but is solvable
§ Longer reach than parallel
• Easier board floor planning
• Distributed thermal loads
§ Greater noise immunity
v Is it a balanced tradeoff?
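As a rough illustration of the MUX / deMUX cost named above: a 2.5 GHz core feeding a 25 Gbps lane has to present 10 bits per core clock to a parallel-in/serial-out stage, and the receiver regroups them. The sketch below assumes MSB-first bit order and uses hypothetical function names; it models only the bit movement, not a real SerDes.

```python
def mux_ratio(line_rate_gbps: float, core_clock_ghz: float) -> float:
    """Parallel bits the core must supply per clock to keep one serial lane full."""
    return line_rate_gbps / core_clock_ghz


def serialize(words, width):
    """PISO: flatten parallel words into a bit stream (MSB first, an assumed order)."""
    return [(word >> (width - 1 - i)) & 1 for word in words for i in range(width)]


def deserialize(bits, width):
    """SIPO: regroup a bit stream into parallel words of the same width."""
    words = []
    for start in range(0, len(bits), width):
        word = 0
        for bit in bits[start:start + width]:
            word = (word << 1) | bit
        words.append(word)
    return words


ratio = int(mux_ratio(25.0, 2.5))        # figures taken from the slide
words = [0b1011001110, 0b0000000001]     # two arbitrary 10-bit words
stream = serialize(words, ratio)
print(ratio, len(stream), deserialize(stream, ratio) == words)
```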
HMC gives them the bandwidth they need
v “DDR has run out of pins on the package”
Source: Xilinx Technology Outlook - Liam Madden, FPL, Sept-2014
TSV Based DRAM Stacks
v The performance potential of TSV based DRAM stacks can be realized with two very different interface and packaging solutions.
v High Bandwidth Memory (HBM)
§ Evolutionary
§ Wide, parallel interface
v Hybrid Memory Cube (HMC)
§ High performance serial interface
v Both solutions have their place in new systems design and there are advancements in both options on the horizon.
and HBM is coming …
v Just look at what AMD and NVIDIA have planned
HBM Gen1 shipping now
HBM Gen2 coming soon
Interposer based MCM
v Xilinx highlighted that the technology wasn’t the critical element; the supply chain was.
Source: Xilinx Technology Outlook - Liam Madden, FPL, Sept-2014
Economics of Direct Attach HBM
v @Customer: Can customer afford Direct Attach HBM?
§ Interposer development costs
§ Fixed memory footprint
§ Special Supply Chain
§ What is the volume required to recoup incremental costs?
v @Manufacturer: Can DA-HBM exist in a low volume, high mix manufacturing environment?
Serial HBM Solution
v Serial HBM Reduces Risk at the Customer
§ Lower Technology Risk
• Pin count advantage for host device
• Ease of routing a serial interface
• Standard CEI interface
• Scalable and versatile
§ Component type Supply Chain
• Inventories
• Test and Burn-In
§ Cost Advantages
• Standard board assembly
v Serial HBM Markets
§ Networking
• Packet Buffering and high capacity tables
§ Embedded
• Supports a range of capacity and speeds with long product lifecycles
• Protects customers from changing HBM memory interface on host
v All the Bandwidth but none of the headaches of DA-HBM
[Figure: Serial Interface HBM. Labels: shim, GCI]
Flexible Capacity Expansion : Serial
v One host port of 16 lanes can connect to 1, 2 or 4 devices
v No additional bus loading or pin count
v No throughput degradation
§ Expansion example shows MoSys Bandwidth Engine
[Figure: one 16-lane host port connected to 1 device (1x, 16 lanes), 2 devices (2x, 8 lanes each) or 4 devices (4x, 4 lanes each)]
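The 1x / 2x / 4x expansion above amounts to dividing a fixed set of host lanes among point-to-point devices, so the host pin count never changes. A minimal sketch of that bookkeeping follows; the function name and the requirement that lanes divide evenly are assumptions made for illustration, not part of any MoSys tool.

```python
def split_lanes(host_lanes: int, devices: int) -> list[range]:
    """Divide a host port's lanes evenly among point-to-point devices."""
    if devices <= 0 or host_lanes % devices != 0:
        raise ValueError("lanes must divide evenly among devices")
    per_device = host_lanes // devices
    return [range(i * per_device, (i + 1) * per_device) for i in range(devices)]


for n in (1, 2, 4):
    groups = split_lanes(16, n)
    # The total lane count stays at 16 in every configuration.
    print(n, "device(s):", [len(g) for g in groups], "lanes each,",
          sum(len(g) for g in groups), "host lanes total")
```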
HBM Memory Solutions
v Direct Attach HBM – 4 HBM
§ MCM Yield
§ Single Sourced
§ Interface support longevity
§ Memory controller complexity and power added to ASIC
v Serial HBM Package on Package
§ Tested and optional burn in of component HBM before MCM assembly
§ shim features optimized for application
§ Incremental power for additional shim ASIC
§ USR SerDes for MCM
v Serial HBM On Motherboard:
§ VSR SerDes for Motherboard
§ Lowest Cost, highest yield solution
§ 30% board area increase
§ Easiest thermal solution
[Figure: package diagrams for the three options: an ASIC (labeled 55 um) with four HBM stacks; an ASIC (labeled 180 um) with four HBM-plus-shim stacks; and an ASIC (labeled 180 um) with four HBM-plus-shim components]
Serial vs. Direct Attach Value Comparison
Attribute | Serial HBM | Direct Attach HBM
Technical Risk | + Smaller Interposer; + Discrete Component BI & Test | - MCM Yield; - HBM Repair
Cost | + Lower yielded cost; + Supply Chain Inventory | - MCM Development Cost; - MCM Yield
Power | - Incremental power /BW | + Lower power
Thermal | + Distributed sources | - Higher Thermal Density
Time to Market | + Proven Standard SerDes; + Discrete Component Design | - HBM Interface IP Availability; - MCM Complexity
Flexibility | + On or Off substrate; + Memory expansion; + Fungible SerDes | - Depopulate or not; - Single purpose HBM IO Block
Reliability | + Burn-In Option; + Field Repair managed in Serial HBM | - JEDEC Field Repair in host ASIC
Supply Chain Ownership | + Single Point; + Discrete component; + Multi-sourced | - Multiple or Single Points; - MCM Model; - Single Sourced
Board Area | - 0% to 30% larger | + baseline
Normalized Yielded Cost of HBM
Assembly yield expected to be 95%
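One way to read the 95% assembly yield figure is as a divisor on the cost of everything placed on the module. The sketch below assumes a failed assembly scraps all of its components, and the normalized part costs are hypothetical placeholders, not data from the slide.

```python
def yielded_cost(component_costs, assembly_yield):
    """Cost per good assembly when every failed assembly is scrapped whole."""
    if not 0.0 < assembly_yield <= 1.0:
        raise ValueError("yield must be in (0, 1]")
    return sum(component_costs) / assembly_yield


ASSEMBLY_YIELD = 0.95            # from the slide
parts = [1.0] + [0.2] * 4        # hypothetical: ASIC = 1.0, each of four HBMs = 0.2
cost = yielded_cost(parts, ASSEMBLY_YIELD)
print(f"yielded cost {cost:.3f}, scrap overhead {cost / sum(parts) - 1:.1%}")
```

Under this assumption the overhead depends only on the yield, so the more expensive the parts committed before assembly test, the more each failed module costs in absolute terms.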
HMC – Hybrid Memory Cube
v Breakthrough in power due to TSV based construction
§ 5 pJ/b DRAM only
v Combined with Logic die resulting in 24.5W per 1Tbps
§ 3 links @ 12.5G
§ 24.5 pJ/b total (vs. 39 for DDR4)
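The energy-per-bit and watts-per-Tbps figures above are the same quantity in different units, since 1 pJ/b times 1 Tb/s is exactly 1 W. A short check of that arithmetic, with a hypothetical helper name; attributing the remainder to the logic die and links is an inference from the two figures, not a number stated on the slide:

```python
def power_watts(pj_per_bit: float, tbps: float) -> float:
    """Power in watts: (pJ/b * 1e-12 J/pJ) * (Tb/s * 1e12 b/Tb) = pJ/b * Tb/s."""
    return pj_per_bit * tbps


total = power_watts(24.5, 1.0)       # HMC total, from the slide
dram_only = power_watts(5.0, 1.0)    # DRAM portion, from the slide
ddr4 = power_watts(39.0, 1.0)        # DDR4 comparison point, from the slide
print(f"HMC total {total} W, DRAM only {dram_only} W, "
      f"remainder {total - dram_only} W, DDR4 {ddr4} W at 1 Tbps")
```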
Serial vs. Parallel Memory Comparison
Attribute | Bandwidth Engine BE-2 | Bandwidth Engine BE-3 | Hybrid Memory Cube (HMC) | High Bandwidth Memory (JEDEC) | DDR4 (JEDEC)
Physical Interface | Serial CEI Standard | Serial CEI Standard | Serial CEI Std | JEDEC HBM IO | JEDEC DDR4 IO
Protocol | GigaChip™ Interface | GigaChip™ Interface | HMC Consortium | RAS/CAS | RAS/CAS
Source of Supply | Dual-Sourced | Dual-Sourced | Single Sourced | Multi-Sourced | Multi-Sourced
Access | TDM Scheduler | TDM Scheduler | Sched./Switch | Banked RAM | Banked RAM
Capacity | 576 Mb | 1152 Mb | 16~32 Gb | 32-64 Gb | 4-8 Gb
Buffer Bandwidth | 400 Gbps | 800 Gbps | 1280 Gbps | 2048 Gbps | 38 Gbps
Transaction Rate | >4.5 Bt/s | >10 Bt/s | 2.6~2.9 Bt/s | TBD | 0.2 Bt/s
Signal Pins | 66 | 66 | 272 | ~1600 | 42
Package | BGA 19x19 | BGA 25x25 | BGA 31x31 | KGSD | BGA 8x12
Power | 7-11W | TBA | ~28W | 8W estimated | 0.7W
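Dividing the Buffer Bandwidth row by the Signal Pins row gives one way to compare the interfaces on pin efficiency. The sketch below only re-uses the figures from the table above; the HBM pin count is the table's approximate "~1600", and the derived ratio is not a number stated in the presentation.

```python
# (buffer bandwidth in Gbps, signal pins), copied from the comparison table
devices = {
    "BE-2": (400, 66),
    "BE-3": (800, 66),
    "HMC": (1280, 272),
    "HBM": (2048, 1600),   # table lists "~1600" pins
    "DDR4": (38, 42),
}


def gbps_per_pin(bandwidth_gbps: float, pins: int) -> float:
    """Buffer bandwidth carried per signal pin."""
    return bandwidth_gbps / pins


for name, (bandwidth, pins) in devices.items():
    print(f"{name:5s} {gbps_per_pin(bandwidth, pins):6.2f} Gbps per signal pin")
```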
[Figure: block diagrams of the compared devices. Labels: Serial IO (16 16 16 16) with Switch; Serial IO (8 8) with TDM / Scheduler; HBM with Channel 0, Channel 1, 8 channels & 128 banks, ~1600 pins, Si Interposer]
Future TSV DRAM Comparison
Attribute | Direct Attach HBM | Serial HBM concept | HMC
Bandwidth | equal
Interposer / Yield cost | CPU | Memory | Memory
Power | 1x | <2x | >3x
Latency | Lowest | Low | ?
Deterministic | Yes | Yes | No
Longevity of Interface | 5 years | indefinitely
Field Repair | Host based | Serial HBM based | HMC based
Host IO (PHY & pins) | Single Purpose | General Purpose and LP SerDes
Test or Burn-In | Not possible | Possible
Supply Chain | MCM-type | Component
Application Performance | none | Optimized for application | Generic HMC Specification
Source | Multi-sourced | Single Source
The Ultimate Network Processor’s Memory Implementation
v At MemCon 2014, MoSys presented on extreme memories for networking and showed the relative position and value of different memories for a 1.2 Tbps network processor.
v HBM for buffering
v Serial memories for header processing and search
v Off chip PHY to optimize datapath
v This is a great point solution for 1.2 Tbps datapath
v What about less extreme systems?
Example 400G Line Card w/ EZchip NPS Z30 Adds 50% System Memory Bandwidth
[Figure: line card block diagram between Front Panel and Backplane. Labels: MoSys Framer/Gear Box; Packet Forwarding Engine with Embedded Memory, uP cores ("C" Programmable Processors + L2-L7 Accelerators, Flexibility + Performance) and Hardware Accelerators; Packet Buffer of 24 x DDR4 devices; Memory I/O (memory bandwidth for Packet Buffering, cores and HW Accelerators); MoSys MSRZ30 Intelligent Offload over 8-16 serial lanes (Flexible Feature & Performance Expansion); FIC]
800GE Using Serial HBM & BE3
[Figure: 800GE block diagram. Labels: two 400G PFE (ASIC/FPGA) blocks, each with 4 x 100G to an Optics Module through a LineSpeed Gearbox, Retimer (GB/RT); Bandwidth Engine Gen 3, shared: FIB Tables, Statistics, Metering, Semaphores, Packet Buffers; Serial HBM (shim, GCI)]
Conclusion
v Serial memory offers advantages over Direct Attach HBM
§ Economics driven by Supply Chain
§ Flexible and adaptable
§ Scalable performance
§ Quality and reliability
§ Simplifying board design and cooling
v Pick your memory for your application
§ Memory core performance and capacity (DRAM vs. others)
§ Architecture (Point to Point versus Chainable)
§ IO serial vs. parallel
v DDR DRAM is the de facto standard based on decades of evolution and optimization.
§ If DDR doesn’t meet your needs there are other options available.
Topics
v Parallel Interface evolution – faster, wider → How long can this last?
v Serial Interface evolution – NRZ → PAM4 → emerging
v Interface efficiency – HMC vs. GCI vs. ILA
v Standards based solutions vs. proprietary
v Interface for offload (abstracted)
§ serial is better (variable size transfers)
§ Splitting transaction layer from transport layer
v Purpose built vs. Fungible IO
NPU Interface Options Today
[Figure: NPU interface block diagram. Memory & CoProcessor side (DDR Style): DDR-3 SDRAM, RLDRAM, QDR SRAM and KBP/TCAM attached through SSTL/HSTL and SerDes blocks. Network & Backplane Interfaces side (Serial Style): XAUI, 10G KR, Interlaken and PCIex attached through SerDes]
NPU Interfaces Using Serial
[Figure: NPU interface block diagram with SerDes on both sides (Serial Style, Serial Style). Memory & CoProcessor side: DDR-3 SDRAM behind a DDR-3 Bridge (SSTL/HSTL) over GCI, BE over GCI, KBP/TCAM over Interlaken, Serial SRAM?. Network & Backplane Interfaces side: XAUI, 10G KR, Interlaken, PCIex. Annotations: Enabled by 10G KR GCI enabled SerDes; 3x to 4x Bandwidth Density per mm2]
NPU Interfaces Using Serial
[Figure: NPU interface block diagram with SerDes on both sides (Serial Style, Serial Style). Memory & CoProcessor side: HMC or Ser. HBM, BE over GCI, KBP/TCAM over Interlaken, Serial SRAM?; SSTL/HSTL. Network & Backplane Interfaces side: XAUI, 10G KR, Interlaken, PCIex. Annotations: Enabled by 10G KR GCI enabled SerDes; 3x to 4x Bandwidth Density per mm2]
GigaChip Interface Layers & Frame Format
GigaChip Interface layers
Transaction: Application Specific (BE, QDR, TCAM…)
Data Link: Reliable transport of Frames via CRC & Positive Ack
Physical Coding Sublayer (PCS): Link initialization, Lane Deskew, Scrambling
Physical Media Access / Electrical: CEI Compatible SerDes, PC Board Trace
(The Data Link and PCS layers are labeled GigaChip Interface Protocol.)
Data Link Layer Frame Format
Payload | DLL | Rx Ack | CRC
72b | 1b | 1b | 6b
v Frame striped across SerDes lanes (1, 2, 4, 8, 16)
§ Modulo 10 UI, Fixed size
§ Sized to meet needs of application
§ >90% bandwidth efficiency at 80b
v Data Link Layer operations
§ DLL indicates if payload is Transaction Link Layer operation or Data Payload
§ Data Link Layer operations: Replay, Pause (no-op)
v Data Payload format up to application
§ Op codes, address, data…formatting left to higher level
§ For memory transactions: 1 frame = transaction
§ For packets: variable number of frames can be used
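The field widths above (72b payload, 1b DLL, 1b Rx Ack, 6b CRC) add up to an 80-bit frame. The sketch below packs and unpacks such a frame and deals its bits across lanes. The field order within the integer, the MSB-first round-robin striping, and all function names are assumptions for illustration; the GCI specification defines the real layout, and the CRC value here is just a supplied number, not a computed one.

```python
PAYLOAD_BITS, DLL_BITS, ACK_BITS, CRC_BITS = 72, 1, 1, 6    # widths from the slide
FRAME_BITS = PAYLOAD_BITS + DLL_BITS + ACK_BITS + CRC_BITS  # 80


def pack_frame(payload: int, dll: int, rx_ack: int, crc: int) -> int:
    """Pack the four fields into one 80-bit integer (field order is an assumption)."""
    for value, width in ((payload, PAYLOAD_BITS), (dll, DLL_BITS),
                         (rx_ack, ACK_BITS), (crc, CRC_BITS)):
        if not 0 <= value < (1 << width):
            raise ValueError("field does not fit its width")
    return (payload << 8) | (dll << 7) | (rx_ack << 6) | crc


def unpack_frame(frame: int):
    """Inverse of pack_frame: returns (payload, dll, rx_ack, crc)."""
    return frame >> 8, (frame >> 7) & 1, (frame >> 6) & 1, frame & 0x3F


def stripe(frame: int, lanes: int):
    """Deal the frame bits round-robin across lanes, MSB first (assumed order)."""
    bits = [(frame >> (FRAME_BITS - 1 - i)) & 1 for i in range(FRAME_BITS)]
    return [bits[lane::lanes] for lane in range(lanes)]


frame = pack_frame(payload=0x123456789ABCDEF012, dll=0, rx_ack=1, crc=0x2A)
assert unpack_frame(frame) == (0x123456789ABCDEF012, 0, 1, 0x2A)
print(f"payload fraction of the frame: {PAYLOAD_BITS / FRAME_BITS:.0%}")
for lanes in (1, 2, 4, 8, 16):
    print(lanes, "lanes ->", len(stripe(frame, lanes)[0]), "bits per lane per frame")
```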
CRC Error Handling w/Positive Ack
[Figure: link between Device A (CSI Tx) and Device B (CSI Rx). Tx side: Tx Request Transactor Queue, CRC Gen, Tx Replay Queue, Tx SerDes (PISO), Prev Rx Ack Count. Rx side: Rx SerDes (SIPO), CRC Error Check, Rx Ack Counter, Rx Target Transactor Queue. Data paths carry 72 payload bits, 72 + 6 with CRC, plus a 1-bit Ack]
Receiver: Post if CRC OK, Freeze if not OK, Resume posting on Replay Frame. Freeze Ack if CRC Error, Resume on Replay Frame.
Transmitter: Compare Ack Count with Prev Rx Ack Count, Set Tx Replay if "stuck".
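The freeze-and-replay behavior in this diagram can be sketched as a small simulation. Everything below is a simplified model under stated assumptions: the link has zero latency, the sender sees the receiver's ack count directly, `stuck_limit` is a hypothetical threshold, and the CRC-6 polynomial (x^6 + x + 1) is chosen only for illustration and is not taken from the GCI specification.

```python
from collections import deque


def crc6(value: int, nbits: int = 72, poly: int = 0x43) -> int:
    """Bitwise CRC-6 of an nbits-wide value, MSB first (illustrative polynomial)."""
    reg = 0
    for i in range(nbits - 1, -1, -1):
        reg = (reg << 1) | ((value >> i) & 1)
        if reg & 0x40:
            reg ^= poly
    for _ in range(6):                   # flush six zero bits
        reg <<= 1
        if reg & 0x40:
            reg ^= poly
    return reg & 0x3F


class Receiver:
    def __init__(self):
        self.posted = []                 # payloads delivered to the target queue
        self.ack_count = 0               # positive-ack counter seen by the sender
        self.frozen = False

    def receive(self, payload, crc, is_replay):
        if self.frozen and is_replay:
            self.frozen = False          # replay frame: resume posting
        if self.frozen:
            return                       # drop frames until the replay starts
        if crc6(payload) != crc:
            self.frozen = True           # CRC error: freeze posting and ack
            return
        self.posted.append(payload)
        self.ack_count += 1


def run_link(payloads, corrupt_at, stuck_limit=3):
    """Send payloads; transmissions whose index is in corrupt_at get one bit flipped."""
    rx = Receiver()
    to_send = deque((p, False) for p in payloads)   # (payload, is_replay)
    unacked = deque()                               # Tx replay queue
    prev_ack = stalled = tx_index = 0
    while to_send or unacked:
        if to_send:
            payload, is_replay = to_send.popleft()
            unacked.append(payload)
            wire = payload ^ 1 if tx_index in corrupt_at else payload
            rx.receive(wire, crc6(payload), is_replay)
            tx_index += 1
        newly_acked = rx.ack_count - prev_ack
        if newly_acked:
            for _ in range(newly_acked):
                unacked.popleft()
            prev_ack, stalled = rx.ack_count, 0
        else:
            stalled += 1                 # ack count did not move this step
        if unacked and stalled >= stuck_limit:
            # Ack is "stuck": resend everything unacknowledged, first frame flagged.
            replay = [(p, i == 0) for i, p in enumerate(unacked)]
            unacked.clear()
            to_send.extendleft(reversed(replay))
            stalled = 0
    return rx.posted, tx_index


posted, sent = run_link([10, 11, 12, 13, 14], corrupt_at={1})
print(posted, "delivered using", sent, "transmissions")
```

In this model a single corrupted frame costs the frames sent while the ack count was stalled, all of which are retransmitted in order, so the receiver still posts every payload exactly once.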
Multi Core => Multi-Partition & Multi-bank
[Figure: Packet Processor cores 0, 1, … n-1, n connected over Serial Links (ingress and egress) to a Bandwidth Engine containing a Multi-cycle Scheduler, ALU and BIST Self-repair. Labels: 10 GA, 800 Gb/s]
Multi-bank, multi-partition memory allows for high access availability
Multi-threaded multi-cores allow for high processing throughput
Multiple links allow for concurrent transport operations
ALU for functional acceleration: local processing minimizes intra-chip traffic
Allows Extended Carrier Class & In-package Repair
Protocol Transfer Efficiency Comparison: Range of Payload Sizes and Applications
Transfer Efficiency = Data / (CMD + Address + Data + Transport Protocol)
[Figure: Read-Only Data Efficiency, 0% to 100%, vs. Payload Size (0 to 40 B). Series: BE, ILA, HMC]
[Figure: Read/Write Data Transfer Efficiency, 0% to 100%, vs. Payload Size (0 to 180 B). Series: BE 50:50, HMC 50:50, with HMC 32B, 64B and 128B Block Size markers]
Application labels: Packet Header Processing Application, Packet Buffering Applications
Efficiency includes Transaction & Transport protocol. Note GCI: GCI + TL 2.0
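The transfer efficiency definition above, Data / (CMD + Address + Data + Transport Protocol), can be evaluated directly. The overhead sizes in this sketch are hypothetical placeholders chosen only to show the shape of the curve; they are not the BE, ILA or HMC figures behind the charts.

```python
def transfer_efficiency(data_bits, cmd_bits, address_bits, transport_bits):
    """Data / (CMD + Address + Data + Transport Protocol), all in bits."""
    total = cmd_bits + address_bits + data_bits + transport_bits
    return data_bits / total


# Hypothetical fixed per-transfer overhead, for illustration only.
CMD_BITS, ADDRESS_BITS, TRANSPORT_BITS = 8, 32, 16

for payload_bytes in (8, 16, 32, 64, 128):
    eff = transfer_efficiency(payload_bytes * 8, CMD_BITS, ADDRESS_BITS, TRANSPORT_BITS)
    print(f"{payload_bytes:4d} B payload -> {eff:.0%} efficient")
```

With a fixed overhead per transfer, efficiency climbs as payloads grow, which is why small transfers such as header processing are the most sensitive to protocol overhead.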
Protocol Transport Efficiency Comparison: GCI Optimized For Smaller Transfers
[Figure: transport efficiency, 0% to 100%, vs. Frame Size (0 to 80 Bytes). Series: ILA, Interlaken, GCI 2.0, GCI + TL 2.0. Annotations: GCI ≈ Interlaken; GCI ~ 2x Interlaken; Packet Transfers; Header Processing]
Serial Link Rate Road Map
v Xilinx UltraScale+ 2016 33G GTY SerDes
v BE3 2016 Q1 31G SerDes
v 56G PAM4 is being demonstrated now
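The step from NRZ to PAM4 on this road map matters because PAM4 carries two bits per symbol, so the same bit rate needs half the symbol rate. A quick calculation of nominal symbol rates, ignoring any line-coding or FEC overhead (the helper name is hypothetical):

```python
import math


def symbol_rate_gbaud(bit_rate_gbps: float, levels: int) -> float:
    """Nominal symbol rate for a given bit rate and number of signal levels."""
    bits_per_symbol = math.log2(levels)
    return bit_rate_gbps / bits_per_symbol


print("56G NRZ  (2 levels):", symbol_rate_gbaud(56, 2), "GBaud")
print("56G PAM4 (4 levels):", symbol_rate_gbaud(56, 4), "GBaud")
```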
CEI-56G Will Address Chip to Chip, Module, +
Summary
v GCI is a proven chip to chip reliable transport protocol
§ Multiple designs in FPGA, ASIC and ASSP in production systems
v GCI Specification is freely available without restriction on use
§ Same as Interlaken model
v GCI protocol is designed to evolve as the CEI standard evolves
v The inherent performance efficiency of GCI naturally equates to improved energy efficiency