Async2006

Apr 14, 2018

  • 7/30/2019 Async2006

    1/23

    1

    A 24-port 10G Ethernet Switch (with asynchronous circuitry)

    Andrew Lines

    Agenda

    Product Information

    Technical Details

    Photos

    Tahoe: First FocalPoint Family Member

    10G Ethernet switch
    - 24 Ports

    Line rate performance
    - 240Gb/s bandwidth
    - 360M frames/s
    - Full-speed multicast

    Fully-integrated single chip
    - 1MB frame memory
    - 16K MAC addresses

    Lowest latency Ethernet
    - 200ns with copper cables

    Rich Feature Set
    - Extensive layer 2 features

    Flexible SERDES interfaces
    - 10G XAUI (CX-4)
    - 1G SGMII

    The lowest-latency feature-rich 10GE switch chip

    [Block diagram: Tahoe. Labels: Asynchronous Blocks, Frame Processor, SPI, XAUI (CX-4) on both sides, Nexus (two instances), RapidArray (packet storage), Scheduler, LED, CPU, JTAG]
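
The line-rate figures on this slide can be sanity-checked with a little arithmetic. The sketch below assumes standard Ethernet framing, which the slide does not spell out: a 64-byte minimum frame plus 8 bytes of preamble and a 12-byte inter-frame gap on the wire. Under that assumption 24 ports at 10Gb/s give 240Gb/s and roughly 357M minimum-size frames per second, which is presumably what the slide rounds to 360M.

```python
# Back-of-the-envelope check of the slide's line-rate numbers.
# Framing constants are standard Ethernet values, assumed here; they
# do not appear on the slide itself.
PORTS = 24
PORT_RATE_BPS = 10e9          # 10G Ethernet per port
MIN_FRAME_BYTES = 64          # minimum Ethernet frame
WIRE_OVERHEAD_BYTES = 8 + 12  # preamble/SFD + inter-frame gap


def aggregate_bandwidth_bps(ports=PORTS, rate=PORT_RATE_BPS):
    """Total one-direction bandwidth across all ports."""
    return ports * rate


def max_frames_per_second(ports=PORTS, rate=PORT_RATE_BPS):
    """Worst-case frame rate: every port sends minimum-size frames."""
    bits_per_frame = (MIN_FRAME_BYTES + WIRE_OVERHEAD_BYTES) * 8
    return ports * rate / bits_per_frame


print(f"{aggregate_bandwidth_bps() / 1e9:.0f} Gb/s")
print(f"{max_frames_per_second() / 1e6:.1f} M frames/s")
```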

    Tahoe Chip Plot

    Ethernet Port Logic
    - SerDes
    - PCS
    - MAC

    Nexus Crossbars
    - 1.5Tb/s total
    - 3ns latency

    MAC Table
    - 16K addresses

    RapidArray Memory
    - 1MB shared

    Scheduler
    - Highly optimized
    - High event rate

    Management
    - CPU interface
    - JTAG
    - EEPROM interface
    - LEDs

    Frame Control
    - Frame handler
    - Lookup
    - Statistics

    Fabricated in TSMC 0.13um

    Bridge Features

    General Bridge Features
    - 16K MAC entries
    - STP: multiple, rapid, standard
    - Learning and Ageing
    - Multicast GMRP and IGMPv3

    VLAN Tag (IEEE 802.1Q-2003)
    - Add / Remove tags
    - Per port association default
    - 4K-entry VLAN-ID table
    - Per VLAN, per-port STP

    Scheduling, Pause, Congestion
    - 16 traffic classes for WRED
    - 4 queues per port scheduling
    - WRR or strict priority
    - Pause support

    Security
    - 802.1x; MAC Address Security

    Monitoring
    - Rich monitoring terms: logical combination of terms Src Port, Dst Port, VLAN, Traffic Type, Priority, Src MA, Dst MA, etc.
    - Monitoring action: Drop, Mirror, Redirect, Count, Change Priority
    - 16 rules per frame

    Statistics
    - RFC 2819 compliant
    - All counters are 64 bits
    - 13 counter groups
    - RMON and SMON Fulcrum extensions

    Robust set of layer-2 features

    Link Aggregation and Fat Tree Support

    [Diagram: three Fabric Chips and five Line Chips connected by Intra-switch Links (ISL), with labels MAC A and MAC B]

    Ingress-to-fabric hop uses Link Aggregation hardware to load balance

    True IEEE-compliant Link Aggregation used to group links between line and fabric switches

    Symmetric hashing guarantees a conversation resolves to the same fabric switch

    Link Aggregation chip features

    Configuration
    - 12 trunk groups
    - Any ports in a group
    - Up to 12 members

    Hash: Ethernet CRC
    - Programmable Input
    - SA, DA, Type, VLAN-ID, Priority, Source port
    - SA-DA hash symmetry forcing
    - Group renumbering

    Other HW hooks
    - Slow protocol traps
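
The idea of symmetric hashing can be illustrated with a short sketch. It is not the chip's actual hash: the slide says only that the hash is an Ethernet CRC over programmable inputs with SA-DA symmetry forcing. Here `trunk_member` is a hypothetical helper, symmetry is forced by sorting the two addresses before hashing (one possible way to do it), and `zlib.crc32` stands in for the hardware CRC.

```python
import zlib


def trunk_member(sa: bytes, da: bytes, members: int, symmetric: bool = True) -> int:
    """Pick a link-aggregation group member for a frame (illustrative only).

    sa/da are the source and destination MAC addresses. With symmetric=True
    the pair is put in a canonical order first, so A->B and B->A hash alike.
    """
    if symmetric:
        sa, da = sorted((sa, da))  # assumed symmetry-forcing method
    return zlib.crc32(sa + da) % members


mac_a = bytes.fromhex("001122334455")
mac_b = bytes.fromhex("66778899aabb")

forward = trunk_member(mac_a, mac_b, members=12)
reverse = trunk_member(mac_b, mac_a, members=12)
print(forward == reverse)  # both directions of the conversation pick the same member
```

Because both directions select the same member, a line chip that hashes this way sends a whole conversation through one fabric switch, which is the property the slide calls out.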

    Two Versions Sampling in Q1 2006

    FM2224
    - 24 10GE Interfaces
    - 1433-ball BGA
    - 40mm
    - $450

    FM2112
    - 8 10GE Interfaces and 16 1-2.5GE Interfaces
    - 897-ball BGA
    - 32mm
    - $265

    Announced pricing at SC|05

    First company to break through $20/port for 10GE
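
The $20/port claim follows directly from the FM2224 figures on this slide. A minimal check, where `price_per_port` is just an illustrative helper:

```python
def price_per_port(chip_price_usd: float, ports: int) -> float:
    """Chip price divided evenly across its ports."""
    return chip_price_usd / ports


# FM2224: $450 for 24 10GE interfaces (figures from the slide)
fm2224 = price_per_port(450, 24)
print(f"FM2224: ${fm2224:.2f} per 10GE port")
```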

    24-Port Reference Design (Now Shipping)

    [Photo: front panel with ports 1-12 in the top row and 13-24 in the bottom row, plus connectors labeled ETH and CSL]

    Evaluation Platform

    Agenda

    Product Information

    Technical Details

    Photos

    Tahoe Hardware Features

    Multiple Frequency Requirements
    - 3.125GHz serial links (licensed from RAMBUS)
    - 312.5MHz 32-bit datapaths (sync and async)
    - 750MHz MAC Table, Scheduler, Main Memory, Statistics, cross-chip interconnect (async)
    - 360MHz Frame Processing (sync)
    - 66MHz Management (sync)

    Mixed design styles
    - 3 synchronous blocks: synthesize, place, and route
    - Many custom async blocks (most of the transistors)
    - Licensed cores: SERDES, PLL, TTL pads, fusebox
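
The first two frequencies above fit together, as a quick calculation shows. The sketch assumes standard XAUI parameters that the slide does not state: four serial lanes per port and 8b/10b line coding. With those assumptions the 3.125GHz links and the 312.5MHz 32-bit datapath both carry exactly 10Gb/s.

```python
# Integer arithmetic keeps the comparison exact.
LANES = 4                    # XAUI lanes per port (assumed; standard XAUI)
LANE_BAUD = 3_125_000_000    # 3.125GHz serial links (from the slide)
DATAPATH_BITS = 32           # from the slide
DATAPATH_HZ = 312_500_000    # 312.5MHz (from the slide)

# 8b/10b coding carries 8 payload bits in every 10 line bits (assumed; standard XAUI)
serdes_payload_bps = LANES * LANE_BAUD * 8 // 10
datapath_bps = DATAPATH_BITS * DATAPATH_HZ

print(serdes_payload_bps, datapath_bps)
```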

    Tahoe Chip Statistics

    TSMC 0.13um LVOD FSG 1.2V

    105M transistors

    Over 3000 unique cells

    1.5MB total SRAM (all asynchronous)

    0.5-1.5W per port depending on activity (36W peak)

    Flip-chip BGA package

    Sync and Async together?

    Use existing 3rd party IP cores for synchronous I/O, such as high-speed SERDES from RAMBUS.

    Use standard synchronous synthesis, place, and route flow to implement logically complex units with lower speed requirements.

    Use async flow only where it has the biggest advantages: SRAMs, crossbars, chip-wide interconnect, FIFOs, and high-speed blocks.

    Must partition the problem in Architecture.

    Some day everything will be Async, but not yet!

    Simple Sync-to-Async Conversion

    Synchronous Request / Grant FIFO protocol

    [Diagram: S2A block connecting a synchronous datapath (Request, Grant, clock) to an asynchronous datapath]

    [Diagram: A2S block connecting an asynchronous datapath to a synchronous datapath (Request, Grant, clock)]

    Seamlessly Bridges Different Clock Domains
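
A toy model may help picture the request/grant FIFO protocol on the synchronous side. The slide names only the signals, so the semantics here are an assumption: a word transfers on a clock edge when Request and Grant are both high, and Grant is high while the FIFO has room. `S2AConverter` and its methods are hypothetical names, and the model ignores metastability and all circuit-level detail.

```python
from collections import deque


class S2AConverter:
    """Behavioral sketch of a sync-to-async FIFO boundary (assumed semantics)."""

    def __init__(self, depth: int):
        self.depth = depth
        self.fifo = deque()

    @property
    def grant(self) -> bool:
        # Grant is asserted while there is room for another word.
        return len(self.fifo) < self.depth

    def clock_edge(self, request: bool, data=None) -> bool:
        """Synchronous side: returns True if a word transferred on this edge."""
        if request and self.grant:
            self.fifo.append(data)
            return True
        return False

    def async_take(self):
        """Asynchronous side: removes a word whenever it is ready, with no clock."""
        return self.fifo.popleft() if self.fifo else None


conv = S2AConverter(depth=2)
results = [conv.clock_edge(True, w) for w in ("a", "b", "c")]  # third word stalls
taken = conv.async_take()               # async side frees one slot
retry = conv.clock_edge(True, "c")      # stalled word now transfers
print(results, taken, retry)
```

The synchronous logic only ever sees Request and Grant sampled on its own clock, which is why the same interface can sit between blocks running at unrelated frequencies.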

    Digital Verification

    Often overlooked in Academia, but crucial in Industry!

    There are nearly as many engineers in verification as there are in design.

    Use industry-standard approach of a full-chip simulation with test-bench, test suite, regression engine.

    Try to get full line and conjunct coverage.

    Convert CSP/PRS into Verilog for chip-level simulation combined with synchronous blocks.

    Also use simple closed-environment self-tests to check that different levels of async decomposition match, but this is not sufficient.

    Design For Test

    Must be able to check for manufacturing defects in async blocks.

    Introduce special scan-buffers which integrate a serial shift register into an async buffer.

    Connect the scan-buffers into 16 serial scan-chains.

    Can issue an inject, drain, or skip command to each scan-buffer on a scan-chain.

    External clocked interface to standard testers.

    Commercial fault-grading tool (ZOIX).

    Async SRAM in FocalPoint

    Use TSMC 6T state bit layout

    Multi-bank design connected with async crossbars and busses

    Supports up to 32 write ports and 32 read ports in parallel

    Bank runs at 600MHz, but interconnect sustains 750MHz

    SRAM Test and Repair

    Scan-buffers integrated into most SRAM banks.

    On-chip accelerated testing for largest SRAM.

    Tester produces a defect map.

    Burn fusebox to use spare addresses to repair bit or address-line errors.

    In many SRAMs, can simply remove a block of bad segments of storage from the free memory pool.

    This can repair many more types of errors.

    Yield looks quite good so far, as expected.
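
The free-pool repair described above can be sketched in a few lines. This is an illustration of the idea only: `build_free_pool`, the segment numbering, and the representation of the defect map as a set of segment indices are all assumptions, not details from the slide.

```python
def build_free_pool(total_segments: int, defect_map: set) -> list:
    """Return the segment indices that may be handed out for frame storage.

    Segments listed in defect_map (as produced by the tester) are simply
    never placed in the pool, so they are never allocated.
    """
    return [seg for seg in range(total_segments) if seg not in defect_map]


# Hypothetical example: 8 segments, tester flagged segments 2 and 5
pool = build_free_pool(8, {2, 5})
print(pool)
```

Because the defective storage is never allocated, this works regardless of what is wrong inside the segment, which is why it covers more error types than spare-address repair.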

    Agenda

    Product Information

    Technical Details

    Photos

    FocalPoint Test Platform

    FocalPoint EP Board

    FocalPoint EP Rack

    Wishlist

    CSP vs CSP formal verification

    CSP vs PRS formal verification

    ATPG tools for async circuits

    Static timing for async circuits

    Async synthesis from CSP

    65nm advice

    If you're working on any of these, talk to me!