7/30/2019 Async2006
A 24-port 10G Ethernet Switch (with asynchronous circuitry)
Andrew Lines
Agenda
Product Information
Technical Details
Photos
Tahoe: First FocalPoint Family Member
10G Ethernet switch
- 24 Ports
Line rate performance
- 240Gb/s bandwidth
- 360M frames/s
- Full-speed multicast
Fully-integrated single chip
- 1MB frame memory
- 16K MAC addresses
Lowest latency Ethernet
- 200ns with copper cables
Rich Feature Set
- Extensive layer 2 features
Flexible SERDES interfaces
- 10G XAUI (CX-4)
- 1G SGMII
The lowest-latency feature-rich 10GE switch chip
[Block diagram: Tahoe. Legend: Asynchronous Blocks. Labels: Frame Processor, SPI, XAUI (CX-4), Nexus, RapidArray (packet storage), Scheduler, LED, CPU, JTAG]
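The line-rate figures on this slide follow from standard Ethernet framing arithmetic. A minimal sketch, assuming the frame rate is quoted for minimum-size 64-byte frames plus the 8-byte preamble and 12-byte inter-frame gap of IEEE 802.3 (the slide does not say how the 360M figure was derived, so it is taken here as a rounded value):

```python
# Line-rate arithmetic for a 24-port 10G Ethernet switch.
# Assumption: the worst-case frame rate uses minimum-size frames.
PORTS = 24
PORT_RATE_BPS = 10_000_000_000  # 10 Gb/s per port

MIN_FRAME_BYTES = 64  # minimum Ethernet frame, including FCS
PREAMBLE_BYTES = 8    # preamble plus start-of-frame delimiter
IFG_BYTES = 12        # minimum inter-frame gap


def aggregate_bandwidth_bps(ports=PORTS, rate=PORT_RATE_BPS):
    """Total bandwidth with every port running at line rate."""
    return ports * rate


def max_frames_per_second(ports=PORTS, rate=PORT_RATE_BPS):
    """Worst-case frame rate: back-to-back minimum-size frames on all ports."""
    bits_per_frame_slot = (MIN_FRAME_BYTES + PREAMBLE_BYTES + IFG_BYTES) * 8
    return ports * rate / bits_per_frame_slot


if __name__ == "__main__":
    print(f"{aggregate_bandwidth_bps() / 1e9:.0f} Gb/s")
    print(f"{max_frames_per_second() / 1e6:.1f} M frames/s")
```

Under these assumptions the per-port rate is about 14.88M frames/s, which across 24 ports is just under the 360M frames/s quoted.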
Tahoe Chip Plot
Ethernet Port Logic
- SerDes
- PCS
- MAC
Nexus Crossbars
- 1.5Tb/s total
- 3ns latency
MAC Table
- 16K addresses
RapidArray Memory
- 1MB shared
Scheduler
- Highly optimized
- High event rate
Management
- CPU interface
- JTAG
- EEPROM interface
- LEDs
Frame Control
- Frame handler
- Lookup
- Statistics
Fabricated in TSMC 0.13um
Bridge Features
General Bridge Features
- 16K MAC entries
- STP: multiple, rapid, standard
- Learning and Ageing
- Multicast GMRP and IGMPv3
VLAN Tag (IEEE 802.1Q-2003)
- Add / Remove tags
- Per port association default
- 4K-entry VLAN-ID table
- Per VLAN, per-port STP
Scheduling, Pause, Congestion
- 16 traffic classes for WRED
- 4 queues per port scheduling
- WRR or strict priority
- Pause support
Security
- 802.1x; MAC Address Security
Monitoring
- Rich monitoring terms
  logical combination of terms: Src Port, Dst Port, VLAN, Traffic Type, Priority, Src MA, Dst MA, etc.
- Monitoring action
  Drop, Mirror, Redirect, Count, Change Priority
- 16 rules per frame
Statistics
- RFC 2819 compliant
- All counters are 64 bits
- 13 counter groups
  RMON and SMON Fulcrum extensions
Robust set of layer-2 features
Link Aggregation and Fat Tree Support
[Figure: fat tree. Labels: Fabric Chip, Line Chip, Intra-switch Link (ISL), MAC A, MAC B]
Ingress to fabric hop uses Link Aggregation hardware to load balance
True IEEE-compliant Link Aggregation used to group links between line and fabric switches
Symmetric hashing guarantees a conversation resolves to the same fabric switch
Link Aggregation chip features
Configuration
- 12 trunk groups
- Any ports in a group
- Up to 12 members
Hash: Ethernet CRC
- Programmable Input
- SA, DA, Type, VLAN-ID, Priority, Source port
- SA-DA hash symmetry forcing
- Group renumbering
Other HW hooks
- Slow protocol traps
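Symmetric hashing can be illustrated in a few lines. This is a sketch only: the slide does not describe how the chip forces SA-DA symmetry or how the CRC input is assembled, so sorting the address pair before hashing is an assumed technique, and the function name and port numbers are hypothetical. `zlib.crc32` is used because it implements the same CRC-32 polynomial as the Ethernet FCS.

```python
import zlib


def symmetric_trunk_member(sa: bytes, da: bytes, members: list) -> int:
    """Pick a trunk member so that A->B and B->A select the same one."""
    # Assumed symmetry trick: order the pair so the hash input does not
    # depend on the direction of travel.
    lo, hi = sorted((sa, da))
    h = zlib.crc32(lo + hi)
    return members[h % len(members)]


# Hypothetical addresses and fabric-facing port numbers, for illustration only.
mac_a = bytes.fromhex("0002b3000001")
mac_b = bytes.fromhex("0002b3000002")
fabric_links = [20, 21, 22, 23]

forward = symmetric_trunk_member(mac_a, mac_b, fabric_links)
reverse = symmetric_trunk_member(mac_b, mac_a, fabric_links)
print(forward == reverse)
```

Because both directions hash the same byte string, both halves of a conversation land on the same fabric switch, which is the property the slide calls out.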
Two Versions Sampling in Q1 2006
FM2224
- 24 10GE Interfaces
- 1433-ball BGA
- 40mm
- $450
FM2112
- 8 10GE Interfaces and 16 1-2.5GE Interfaces
- 897-ball BGA
- 32mm
- $265
Announced pricing at SC|05
First company to break through $20/port for 10GE
24-Port Reference Design (Now Shipping)
[Figure: Evaluation Platform. Port labels: 1 to 24, ETH, CSL]
Agenda
Product Information
Technical Details
Photos
Tahoe Hardware Features
Multiple Frequency Requirements
- 3.125GHz serial links (licensed from RAMBUS)
- 312.5MHz 32-bit datapaths (sync and async)
- 750MHz MAC Table, Scheduler, Main Memory, Statistics, cross-chip interconnect (async)
- 360MHz Frame Processing (sync)
- 66MHz Management (sync)
Mixed design styles
- 3 synchronous blocks: synthesize, place, and route
- Many custom async blocks (most of the transistors)
- Licensed cores: SERDES, PLL, TTL pads, fusebox
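Several of these frequencies are tied together by simple rate arithmetic. The sketch below assumes the standard XAUI arrangement of 4 lanes with 8b/10b coding, and assumes the frame processor handles one frame per clock cycle; neither assumption is stated on the slide.

```python
# Rate relationships implied by the frequency list (integer arithmetic).
XAUI_LANES = 4                 # assumption: standard XAUI lane count
LANE_BAUD = 3_125_000_000      # 3.125 GHz serial links
DATAPATH_HZ = 312_500_000      # 312.5 MHz
DATAPATH_BITS = 32
FRAME_PROC_HZ = 360_000_000    # 360 MHz frame processing
PORTS = 24
SLOT_BITS = (64 + 8 + 12) * 8  # min frame + preamble + inter-frame gap


def xaui_payload_bps():
    """Payload rate of one port after 8b/10b coding (8 data bits per 10 line bits)."""
    return XAUI_LANES * LANE_BAUD * 8 // 10


def datapath_bps():
    """Throughput of one 32-bit datapath at 312.5 MHz."""
    return DATAPATH_HZ * DATAPATH_BITS


def frame_processor_keeps_up():
    """Assumes one frame per cycle; compares against the worst-case frame rate."""
    worst_case_fps = PORTS * 10_000_000_000 // SLOT_BITS
    return FRAME_PROC_HZ >= worst_case_fps


print(xaui_payload_bps(), datapath_bps(), frame_processor_keeps_up())
```

Both the serial side and the 32-bit datapath work out to 10 Gb/s per port, and a 360MHz pipeline at one frame per cycle covers the worst-case minimum-size frame rate for 24 ports.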
Tahoe Chip Statistics
TSMC 0.13um LVOD FSG 1.2V
105M transistors
Over 3000 unique cells
1.5MB total SRAM (all asynchronous)
0.5-1.5W per port depending on activity (36W peak)
Flip-chip BGA package
Sync and Async together?
Use existing 3rd party IP cores for synchronous I/O,
such as high-speed SERDES from RAMBUS.
Use standard synchronous synthesis, place, and
route flow to implement logically complex units with
lower speed requirements.
Use async flow only where it has the biggest
advantages: SRAMs, crossbars, chip-wide
interconnect, FIFOs, and high-speed blocks.
Must partition the problem in Architecture.
Some day everything will be Async, but not yet!
Simple Sync-to-Async Conversion
Synchronous Request / Grant FIFO protocol
[Figure: S2A converter. Synchronous Datapath (Request, Grant, clock) to Asynchronous Datapath]
[Figure: A2S converter. Asynchronous Datapath to Synchronous Datapath (Request, Grant, clock)]
Seamlessly Bridges Different Clock Domains
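A behavioral sketch of the request/grant idea on the S2A side follows. The slide gives no implementation details, so everything here is an assumption for illustration: the class and method names are hypothetical, the converter is modeled as a small FIFO, a word transfers on a clock edge only when request and grant are both high, and the asynchronous side drains at its own pace.

```python
from collections import deque


class S2A:
    """Hypothetical behavioral model of a sync-to-async converter."""

    def __init__(self, depth=2):
        self.fifo = deque()
        self.depth = depth

    @property
    def grant(self):
        # Grant is high whenever the converter has room for another word.
        return len(self.fifo) < self.depth

    def clock_edge(self, request, data):
        """Sample the synchronous side on a clock edge; True if a word moved."""
        if request and self.grant:
            self.fifo.append(data)
            return True
        return False

    def async_take(self):
        """Asynchronous consumer completes a handshake if a word is waiting."""
        return self.fifo.popleft() if self.fifo else None


def run(words, consumer_ready):
    """Drive one clock cycle per entry in consumer_ready."""
    s2a, pending, received = S2A(), deque(words), []
    for ready in consumer_ready:
        if ready:
            token = s2a.async_take()
            if token is not None:
                received.append(token)
        # The sync side holds request high (and the word stable) until granted.
        if pending and s2a.clock_edge(True, pending[0]):
            pending.popleft()
    return received, list(pending)


print(run([1, 2, 3, 4], [False, False, False, True, True, True, True, True]))
```

In this model the synchronous side simply stalls while grant is low, so a slow asynchronous consumer applies back-pressure without losing or reordering words.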
Digital Verification
Often overlooked in Academia, but crucial in Industry!
There are nearly as many engineers in verification as there are in design.
Use industry-standard approach of a full-chip simulation with test-bench, test suite, regression engine.
Try to get full line and conjunct coverage.
Convert CSP/PRS into Verilog for chip-level simulation combined with synchronous blocks.
Also use simple closed-environment self-tests to check that different levels of async decomposition match, but this is not sufficient.
Design For Test
Must be able to check for manufacturing defects in
async blocks.
Introduce special scan-buffers which integrate a
serial shift register into an async buffer.
Connect the scan-buffers into 16 serial scan-chains.
Can issue an inject, drain, or skip command to each
scan-buffer on a scan-chain.
External clocked interface to standard testers.
Commercial fault-grading tool (ZOIX).
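One way to picture the per-buffer commands is the toy model below. The slide names the commands but not their semantics, so the meanings used here are assumptions: inject places a scan-supplied value into the buffer, drain removes the buffer's value and returns it to the scan output, and skip leaves the buffer alone. All names are hypothetical.

```python
class ScanBuffer:
    """Toy scan-buffer: holds at most one token (None means empty)."""

    def __init__(self):
        self.token = None


def apply_commands(chain, commands):
    """Apply one command per scan-buffer; return drained values in chain order."""
    drained = []
    for buf, cmd in zip(chain, commands):
        op = cmd[0]
        if op == "inject":
            buf.token = cmd[1]         # assumed: load a value from the scan chain
        elif op == "drain":
            drained.append(buf.token)  # assumed: unload the value to the scan chain
            buf.token = None
        elif op == "skip":
            pass                       # assumed: buffer is left untouched
        else:
            raise ValueError(f"unknown scan command: {op}")
    return drained


chain = [ScanBuffer() for _ in range(3)]
apply_commands(chain, [("inject", 1), ("skip",), ("inject", 0)])
observed = apply_commands(chain, [("drain",), ("skip",), ("drain",)])
print(observed)
```

A tester-facing flow under these assumptions would inject a pattern, let the async logic run, then drain and compare against expected values.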
Async SRAM in FocalPoint
Use TSMC 6T state bit layout
Multi-bank design connected with async crossbars and busses
Supports up to 32 write ports and 32 read ports in parallel
Bank runs at 600MHz, but interconnect sustains 750MHz
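The gap between the 600MHz bank rate and the 750MHz interconnect rate is bridged by spreading accesses over multiple banks. A minimal timing sketch, assuming in-order requests striped round-robin across banks (the slide does not give the bank count or the address mapping):

```python
from fractions import Fraction


def sustained_rate_mhz(num_banks, num_requests=6000, issue_mhz=750, bank_mhz=600):
    """Achieved request rate in MHz for round-robin striping over num_banks."""
    issue_period = Fraction(1, issue_mhz)  # microseconds between interconnect slots
    bank_period = Fraction(1, bank_mhz)    # microseconds a bank is busy per access
    bank_free = [Fraction(0)] * num_banks
    t = Fraction(0)                        # earliest time the next request may issue
    for i in range(num_requests):
        bank = i % num_banks               # assumed round-robin address striping
        start = max(t, bank_free[bank])    # stall if the target bank is still busy
        bank_free[bank] = start + bank_period
        t = start + issue_period
    return float(num_requests / t)


for banks in (1, 2, 8):
    print(banks, round(sustained_rate_mhz(banks), 1))
```

With a single bank the model is limited to roughly the bank rate, while two or more interleaved banks sustain the full interconnect rate. Real traffic with bank conflicts would fall somewhere in between.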
SRAM Test and Repair
Scan-buffers integrated into most SRAM banks.
On-chip accelerated testing for largest SRAM.
Tester produces a defect map.
Burn fusebox to use spare addresses to repair bit or
address-line errors.
In many SRAMs, can simply remove a block of bad
segments of storage from the free memory pool.
This can repair many more types of errors.
Yield looks quite good so far, as expected.
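Repair by exclusion from the free pool can be sketched as below. Segment granularity, the shape of the defect map, and the function name are all hypothetical; the slide only says that bad segments are removed from the free memory pool.

```python
def build_free_pool(num_segments, defect_map):
    """Return usable segment indices, leaving out segments flagged by the tester."""
    bad = set(defect_map)
    return [seg for seg in range(num_segments) if seg not in bad]


# Hypothetical example: 8 segments, tester flagged segments 2 and 5.
pool = build_free_pool(8, defect_map=[2, 5])
print(pool)
```

Since the allocator never hands out a flagged segment, this approach tolerates any defect confined to those segments, at the cost of slightly less usable memory.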
Agenda
Product Information
Technical Details
Photos
FocalPoint Test Platform
FocalPoint EP Board
FocalPoint EP Rack
Wishlist
CSP vs CSP formal verification
CSP vs PRS formal verification
ATPG tools for async circuits
Static timing for async circuits
Async synthesis from CSP
65nm advice
If you're working on any of these, talk to me!