Silicon Nanophotonic Network-On-Chip Using TDM Arbitration
Gilbert Hendry – Columbia University
Johnnie Chan, Shoaib Kamil, Lenny Oliker, John Shalf, Luca P. Carloni, Keren Bergman
Jan 14, 2016
Why Photonics?
ELECTRONICS: Buffer, receive and re-transmit at every router.
Each bus lane routed independently (P ∝ N_LANES).
Off-chip BW is pin-limited and power hungry.
Photonics changes the rules for Bandwidth, Energy, and Distance.
OPTICS: Modulate/receive high bandwidth data stream once per communication event.
Broadband switch routes entire multi-wavelength stream.
Off-chip BW = On-chip BW for nearly same power.
Silicon Photonic Integration
Cornell, 2005
Sandia, 2008
Ghent, 2007
Columbia, 2008
Cornell, 2009
Photonic Networks-on-Chip
Corona [U. of Wisconsin, HP]
Photonic Clos [MIT]
Photonic Torus [Columbia]
Ring Resonators
Modulator/filter
Broadband
[Figure: ring resonator with n-region and p-region. Electronic Control: 0V / 1V. Thermal Control: Ohmic Heater, 0V / 1V. Plot of Transmission versus Injected Wavelengths, showing the Off-resonance profile and the On-resonance profile.]
Circuit-switched P-NoCs
Pros:
Energy-efficient end-to-end transmission
High bandwidth through WDM
Electronic network still available for small control messages*
Network-level support for secure regions
Cons:
Path setup latency
Path setup contention (no fairness)
Longer paths block more
Head-of-line blocking at gateways
* [G. Hendry et al. Analysis of Photonic Networks for a Chip Multiprocessor Using Scientific Applications. In NOCS, 2009]
Head of Line Blocking
External Concentration*
[Figure labels: Core (x4), Tx/Rx, Network IF, Bidirectional Waveguide, Bidirectional Electronic Channel, Control Router, Electronic Crossbar, 5-port photonic switch, To/From Control plane, To/From Data plane, Serialization, Drivers, Deserialization, Receivers]
* [P. Kumar et al. Exploring concentration and channel slicing in on-chip network router. In NOCS, 2009]
TDM Arbitration
[Figure: timeline divided into Time slot 0, Time slot 1, …, Time slot T, with clock ticks t0, t1, t2, t3, t4, …, tC-3, tC-2, tC-1]
Synchronous Gateway/Control
Time slot ~ 10 ns
TDM sync clock ~ 100 MHz
Nonblocking Network Scheduling
Time slot 0
Time slot 1
Time slot 2
Required time slots = N-1
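The N-1 figure can be illustrated with a minimal sketch. It assumes a cyclic-shift pattern, where in each slot every node sends to the node a fixed offset away. The slides do not state which pattern is used, and the function name `nonblocking_schedule` is hypothetical.

```python
def nonblocking_schedule(n):
    """Return n-1 time slots; each slot is a list of (src, dst) pairs.

    Assumed pattern: in the slot with offset `shift`, node `src` sends to
    (src + shift) % n. Each slot is a permutation, so no source or
    destination is used twice within a slot.
    """
    return [[(src, (src + shift) % n) for src in range(n)]
            for shift in range(1, n)]


# 4 nodes need 3 slots to cover all 12 ordered source/destination pairs.
schedule = nonblocking_schedule(4)
for t, slot in enumerate(schedule):
    print(f"Time slot {t}: {slot}")
```

This only holds when the network can carry any permutation at once, which is what nonblocking means here.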
However…
[Figure: Insertion Loss (dB, axis ticks 0 to 50) versus Topology Size (nodes) for the Non-Blocking Torus Topology. Data labels: 18.7, 25.3, 31.5, 38.0, 44.1, 50.6, 56.8, 63.2]
[M. Petracca et al. IEEE Micro, 2008]
Nonblocking topology difficult to implement because of Insertion Loss
* [J. Chan et al. Architectural Exploration of Chip-Scale Photonic Interconnection Network Designs Using Physical-Layer Analysis. JLT, May 2010]
Scheduling Time Slots
Problem:
Blocking Network
Full coverage
Minimize Time Slots (most comm. per slot)
Constraints:
Source contention
Destination contention
Topology contention
Solution: Genetic Search
[Figure: population of candidate schedules, each drawn as S]
Population (size P)
Selection (down to size ps x P)
Reproduction (back to P)
Mutation (still P)
Each schedule S:
Slot 0: c0, c5, c7, c8
Slot 1: c23, c6, c58
…
Slot T: c42, c65, c1
Initialization
S:
Slot 0: c0
Slot 1: c1
…
Slot N^2: cN^2
Fitness = 1/(number of time slots)
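The three constraints, the one-communication-per-slot initialization, and the fitness function can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation. It assumes a square mesh with X-then-Y routing and models topology contention as two paths sharing a directed mesh link. It enumerates the N(N-1) ordered pairs with distinct endpoints, where the slide indexes slots up to N^2. All function names are hypothetical.

```python
from itertools import permutations


def xy_links(src, dst, width):
    """Directed links used by X-then-Y routing on a mesh `width` nodes wide."""
    x, y = src % width, src // width
    dx, dy = dst % width, dst // width
    links = set()
    while x != dx:                       # travel along X first
        nx = x + (1 if dx > x else -1)
        links.add(((x, y), (nx, y)))
        x = nx
    while y != dy:                       # then along Y
        ny = y + (1 if dy > y else -1)
        links.add(((x, y), (x, ny)))
        y = ny
    return links


def conflicts(a, b, width):
    """True if communications a and b (each a (src, dst) pair) cannot share a slot."""
    if a[0] == b[0]:                     # source contention
        return True
    if a[1] == b[1]:                     # destination contention
        return True
    # topology contention (assumed model: a shared directed link)
    return bool(xy_links(*a, width) & xy_links(*b, width))


def initial_schedule(n_nodes):
    """One communication per slot: full coverage, but the most slots possible."""
    return [[pair] for pair in permutations(range(n_nodes), 2)]


def fitness(schedule):
    """Fitness = 1 / (number of time slots)."""
    return 1.0 / len(schedule)


sched = initial_schedule(4)              # 4 nodes arranged as a 2x2 mesh
print(len(sched), fitness(sched))
```

Selection, reproduction, and mutation then search for schedules that pack more non-conflicting communications into each slot, which raises the fitness.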
Reproduction: Birds and Bees
S0:
c0, c3, c60, c19
c27, c4
c100, c71, c9
c1, c17, c23
…
S1:
c12, c2, c1, c60
c100, c82, c9
c0
c89, c56, c16, c63
…
C:
c0, c3, c60, c19
c12, c2, c1, c60
Mutation: Secret of the Ooze
S (before):
c0, c3, c60, c19
c27, c4
c100, c71, c9
c1, c17, c23
…
Moved: c100, c71, c9
S (after):
c0, c3, c60, c19, c9
c27, c4, c100
c1, c17, c23, c71
…
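The mutation shown above, where one slot's communications are moved into other slots, can be sketched roughly as below. This is an illustration under assumptions. It picks the slot at random, places each communication in the first slot that accepts it, and by default checks only source and destination contention. Topology contention can be supplied through the hypothetical `conflict` parameter. The names `mutate` and `endpoint_conflict` are not from the slides.

```python
import random


def endpoint_conflict(a, b):
    """Source or destination contention between two (src, dst) pairs."""
    return a[0] == b[0] or a[1] == b[1]


def mutate(schedule, conflict=endpoint_conflict, rng=random):
    """Dissolve one random slot and try to fit its communications elsewhere."""
    slots = [list(s) for s in schedule]          # work on a copy
    if len(slots) < 2:
        return slots
    victim = slots.pop(rng.randrange(len(slots)))
    leftover = []
    for comm in victim:
        for slot in slots:
            if not any(conflict(comm, other) for other in slot):
                slot.append(comm)                # first slot with no contention
                break
        else:
            leftover.append(comm)                # no slot could take it
    if leftover:
        slots.append(leftover)                   # keep full coverage
    return slots


before = [[(0, 1), (2, 3)], [(1, 2)], [(3, 0)]]
after = mutate(before, rng=random.Random(0))
print(len(before), "->", len(after), "slots")
```

Communications that cannot be placed stay together in one slot, so the schedule never loses coverage and never grows.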
Schedule Results
Pop size = 50
Mutation prob = 0.8
[Figure: Execution Time (s) and Solution (Number of slots) versus Network size for 16-node, 36-node, and 64-node networks. Network size axis ticks: 10 to 70; vertical axis ticks: 1 to 10000.]
Implementation: Photonic Switch
200 µm rings
Total switch size = 1.4 mm x 1.4 mm
No S->W, S->E, N->W, N->E (X-then-Y routing)
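A short sketch shows why X-then-Y routing never needs those four turns. It assumes that "S->W" means a message entering a switch on its South port and leaving on its West port, and that North is the direction of increasing y. Both are assumptions, since the slide does not define the notation, and the function names are hypothetical.

```python
def xy_route(src, dst):
    """Headings ('E', 'W', 'N', 'S') for X-then-Y routing between (x, y) nodes."""
    (x, y), (dx, dy) = src, dst
    hops = ['E' if dx > x else 'W'] * abs(dx - x)     # all X hops first
    hops += ['N' if dy > y else 'S'] * abs(dy - y)    # then all Y hops
    return hops


def port_turns(hops):
    """(input port, output port) at each intermediate switch along the route."""
    opposite = {'E': 'W', 'W': 'E', 'N': 'S', 'S': 'N'}
    # A message heading East enters the next switch on its West port.
    return [(opposite[a], b) for a, b in zip(hops, hops[1:])]


# Turns the switch does not implement, written as (input port, output port).
FORBIDDEN = {('S', 'W'), ('S', 'E'), ('N', 'W'), ('N', 'E')}

example = xy_route((0, 0), (2, 1))
print(example, port_turns(example))
```

Once a message is travelling in Y it never turns back into X, so every turn from a North or South input to an East or West output goes unused.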
Implementation: Switch Control
Width of LUT = 12 (number of rings)
Length of LUT = T (number of time slots)
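The LUT dimensions suggest a table with one row per time slot and one bit per ring. The sketch below assumes that encoding and assumes the table repeats after T slots. Neither the bit layout nor the ring numbering comes from the slides, and the names are hypothetical.

```python
NUM_RINGS = 12   # width of the LUT (number of rings in the switch)


def build_lut(slot_rings, num_slots):
    """Build a LUT of `num_slots` rows; bit r of a row is 1 if ring r is on.

    `slot_rings` maps a slot index to the ring indices to turn on in that slot.
    """
    lut = []
    for t in range(num_slots):
        row = 0
        for ring in slot_rings.get(t, ()):
            if not 0 <= ring < NUM_RINGS:
                raise ValueError(f"ring index {ring} out of range")
            row |= 1 << ring
        lut.append(row)
    return lut


def ring_states(lut, slot):
    """On/off state of each ring in a given slot; the schedule wraps after T slots."""
    row = lut[slot % len(lut)]
    return [bool((row >> r) & 1) for r in range(NUM_RINGS)]


# Hypothetical 3-slot schedule: rings 0 and 5 on in slot 0, ring 11 on in slot 2.
lut = build_lut({0: {0, 5}, 2: {11}}, num_slots=3)
print([format(row, "012b") for row in lut])
```

Because the schedule is static, the switch only needs to step through the table in time with the TDM clock.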
Implementation: Network Gateway
1. Send request
2. Grant, set x-bar and transmit to serializer
3. Receive, deserialize
4. Store in temp buffer, request to core
Simulation Setup
PhoenixSim* – Photonic and Electronic network simulator
64 cores
E-mesh, P-mesh, P-TDM
Traffic
Random – 32B, 1kB, 32kB messages
Scientific application traces
* [Chan et al. PhoenixSim: A Simulator for Physical-Layer Analysis of Chip-Scale Photonic Interconnection Networks. In DATE 2010]
Results – Random Traffic
[Figure: Avg. Latency (µs) versus Measured Bandwidth (GB/s) for E-Mesh, P-Mesh, and P-TDM, shown for 32B, 1kB, and 32kB random messages. Bandwidth axis ticks: 1 to 1000; latency axis ticks: 0.01 to 1000.]
Results – Scientific Applications
[Figure: Execution Time (s) for Cactus, GTC, MADbench, and PARATEC on E-Mesh, P-Mesh, and P-TDM. Axis ticks: 0.00001 to 0.01.]
[Figure: Energy (J) for Cactus, GTC, MADbench, and PARATEC on E-Mesh, P-Mesh, and P-TDM. Axis ticks: 0.00001 to 0.1.]
Benchmark   Num Phases   Num Messages   Total Size (MB)   Avg Msg Size (B)
Cactus      2            285            7.3               25600
GTC         2            63             8.1               129796
MADbench    195          15414          86.5              5613
PARATEC     34           126059         5.4               43.3
Conclusion
TDM implements fairness
TDM improves network utilization
Genetic Search useful for finding full-coverage static schedule
Future Work:
Scaling gracefully*
Reducing time slots*
Dynamic scheduling
Contact: [email protected]
* [Hendry et al. Time-Division-Multiplexed Arbitration in Silicon Nanophotonic Networks-on-Chip for High Perf. CMPs. In JPDC, Jan 2011]