CDC/CRA CHiPs Mentoring Workshop

High Performance Interconnects

Timothy M. Pinkston
Professor, USC

July 25-27, 2009
My Background

• Education:
  – BSEE (minor in CS): The Ohio State Univ., ’85
  – MSEE (Computer Engineering): Stanford U., ’86
  – PhD EE (Computer Engineering, Comp Arch): Stanford U., ’93
• Experience:
  – Industry: AT&T Bell Labs, ’85-’86; IBM Intern, ’89-’90 (summers); Hughes Research Labs (HRL) Doctoral Fellow, ’90-’93
  – Academia: University of Southern California, ’93-present
  – Government: NSF, Jan. ’06 – Dec. ’08
• Research Interests:
  – Computer systems architecture: interconnection networks, on-chip networks for multicore and multiprocessor systems
• Recent Activities:
  – “Interconnection Networks” with Jose Duato, book chapter in Computer Architecture: A Quantitative Approach, 4th edition, J. L. Hennessy and D. A. Patterson (2006)
  – Lead Program Director for Expeditions in Computing program: NSF CISE, $40M award portfolio in inaugural year (2008)
Interconnection Networks

• The subsystem that connects individual devices together into a community of communicating devices
• Device (End Node):
  – A component in a computer
  – A computer
  – A system of computers
• Interconnection Network:
  – Interfaces and links
  – Communication protocol
  – Routers (switches)
• Goal: transfer the maximum amount of data reliably in the least amount of time (and energy, cost) so as not to bottleneck overall system performance
[Figure: an interconnection network — devices (end nodes) attach through SW interfaces, HW interfaces, and links to routers, which form the network fabric]
Different Networks for Different Scales

[Figure: distance (meters, 5×10⁻³ to 5×10⁶) versus number of devices interconnected (1 to >100,000), showing the domains of On-Chip Networks (OCNs), System-Area Networks (SANs), Local Area Networks (LANs), and Wide-Area Networks (WANs), with internetworking spanning the LAN/WAN boundary]
Increasing Parallelism on Chips

[Figure: minimum feature size (µm) versus year of technology availability, 1980-2015, annotated with representative chip floorplans at each node — 2µm: CPU and FPU, with L1 and multiple processors (P P P P) on a multi-chip module (MCM); 1µm: CPU, FPU, L1, and MC integrated; 0.35µm: adds on-chip L2; 0.18µm: adds on-chip L3 and MC; 0.09µm: two CPUs, each with FPU, L1, and L2, sharing L3 + MC — the multicore era!; 0.045µm: four CPUs, each with L1 and L2, sharing L3 + MC — toward many-core chips. (Adapted from Nhon Quach, Intel)]
Increasing Parallelism in Systems

• Blue Gene/L 3D Torus Network
  – 3-dimensional (XYZ) torus interconnection network
  – 10s to 100s of thousands of devices

[Image: IBM Blue Gene (www.ibm.com)]
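To make the topology concrete, here is a minimal sketch (illustrative, not Blue Gene/L's actual routing hardware) of how each node's six neighbors follow from the XYZ torus structure, with wraparound links closing each dimension:

# Minimal sketch: neighbors of a node in a 3D (XYZ) torus.
# Dimensions and coordinates are illustrative, not Blue Gene/L's configuration.

def torus_neighbors(node, dims):
    """Return the six neighbors of `node` in a 3D torus of size `dims`."""
    x, y, z = node
    X, Y, Z = dims
    return [
        ((x + 1) % X, y, z), ((x - 1) % X, y, z),  # +/- X, with wraparound
        (x, (y + 1) % Y, z), (x, (y - 1) % Y, z),  # +/- Y
        (x, y, (z + 1) % Z), (x, y, (z - 1) % Z),  # +/- Z
    ]

print(torus_neighbors((0, 0, 0), (8, 8, 8)))
# [(1, 0, 0), (7, 0, 0), (0, 1, 0), (0, 7, 0), (0, 0, 1), (0, 0, 7)]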
Defects, Faults, Chip Yield and Lifetime

• Trends in chip (system) failure rate

[Figure: “bathtub” curve of failure rate versus time, spanning the infant mortality period, useful lifetime period, and aging period; technology scaling shifts the curve upward, while fault-resilient designs push it back down]

[Figure: defect data volume for years of production 2004-2016, normalized to year 2004 (ITRS’04)]

– Technology scaling adversely impacts chip yield and chip/system failure rate (manufacturing defects, soft and hard faults, wear-out lifetime)
– Intel predicts at least 5-10% of chip resources will be used for ensuring reliability (Source: “Platform 2015,” www.intel.com/go/platform2015)
– Adaptive, self-correcting, self-repairable architectures are needed to combat decreasing chip reliability with successive technology generations
Transporting Packets within a Network

• Goal: transfer the maximum amount of data reliably in the least amount of time (and energy, cost) so as not to bottleneck overall system performance
• Network structure and functions for transporting data packets:
  – Topology: What network paths are possible for packets?
  – Routing: Which of the possible paths are allowable for packets?
  – Flow Control & Arbitration: When are paths available for packets?
  – Switching: How are paths allocated to packets?
  – Router Microarchitecture: Implementation of the router's internal paths
Flow Control of Data Packets

• Poor flow control can reduce link efficiency
  – “Handshaking” flow control
  – Simple, but low throughput and high latency

[Figure: sender and receiver routers joined by a data link and a control link (non-pipelined transfer); the receiver transmits a handshake when it is ready for the next packet, and the sender can transmit only after receiving the handshake signal; when the receiver's buffer queue is not serviced, queued packets back up and the handshake is withheld]
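The handshake rhythm is easy to see in code. Below is a minimal sketch (class and method names are illustrative, not from the slides) in which the sender may transmit only while the receiver's handshake is raised, so at most one exchange is outstanding — hence the non-pipelined transfer:

# Minimal sketch of "handshaking" flow control: the sender transmits a
# packet, then must wait until the receiver signals it is ready for the next.

from collections import deque

class HandshakeLink:
    def __init__(self, queue_capacity):
        self.queue = deque()
        self.capacity = queue_capacity

    def ready(self):
        # The receiver raises the handshake only when it can accept a packet.
        return len(self.queue) < self.capacity

    def send(self, packet):
        if not self.ready():
            return False           # no handshake received: sender must wait
        self.queue.append(packet)  # one packet per handshake -> non-pipelined
        return True

    def service(self):
        # The receiver drains a packet, implicitly re-enabling the handshake.
        return self.queue.popleft() if self.queue else None

link = HandshakeLink(queue_capacity=1)
print(link.send("pkt0"))  # True
print(link.send("pkt1"))  # False: queue not serviced, handshake withheld
link.service()
print(link.send("pkt1"))  # True: handshake raised again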
Flow Control of Data Packets

• Poor flow control can reduce link efficiency
  – “Stop & Go” flow control
  – Improved throughput and latency if buffer queues are large enough

[Figure: sender and receiver routers with data and control links (pipelined transfer); the receiver's buffer queue has Stop and Go thresholds; the sender injects a packet only while the control bit is “Go”; when the unserviced queue fills to the Stop threshold, a Stop notification is signaled and the sender cannot inject packets; when the queue drains back to the Go threshold, a “Go” notification is sent and injection resumes]
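A minimal sketch of the Stop & Go mechanism follows (the thresholds and class names are illustrative): the receiver's queue occupancy drives a single control bit with hysteresis, so notifications are signaled only when the Stop or Go threshold is crossed:

# Minimal sketch of "Stop & Go" flow control.

from collections import deque

class StopGoLink:
    def __init__(self, capacity, stop_threshold, go_threshold):
        assert go_threshold < stop_threshold <= capacity
        self.queue = deque()
        self.stop_t = stop_threshold
        self.go_t = go_threshold
        self.control_bit = "Go"

    def _update_control(self):
        # Signal Stop when occupancy reaches the Stop threshold;
        # signal Go again only once it drains to the Go threshold.
        if len(self.queue) >= self.stop_t:
            self.control_bit = "Stop"
        elif len(self.queue) <= self.go_t:
            self.control_bit = "Go"

    def send(self, packet):
        if self.control_bit == "Stop":
            return False                # sender cannot inject while in Stop
        self.queue.append(packet)
        self._update_control()
        return True

    def service(self):
        pkt = self.queue.popleft() if self.queue else None
        self._update_control()
        return pkt

link = StopGoLink(capacity=8, stop_threshold=6, go_threshold=2)
for i in range(8):
    print(i, link.send(f"pkt{i}"), link.control_bit)
# The queue is not serviced, so after six packets the bit flips to Stop
# and the remaining sends are refused.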
Flow Control of Data Packets

• Poor flow control can reduce link efficiency
  – “Credit-based” flow control
  – Improved throughput and latency with smaller buffer queues

[Figure: sender and receiver routers with data and control links (pipelined transfer); the sender keeps a credit counter (10, 9, 8, … in the example) and sends packets whenever the counter is not zero; if the receiver's queue is not serviced, the counter reaches zero and the sender stops injecting; as buffer slots become available the receiver sends credits back (e.g., +5), and the sender resumes injecting when its credit counter > 0]
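The credit mechanism can be sketched as follows (buffer sizes and names are illustrative): the sender's counter mirrors the free slots in the receiver's queue, so the sender throttles itself without any explicit Stop signal:

# Minimal sketch of credit-based flow control: the sender's credit counter
# tracks free buffer slots at the receiver; draining a slot returns a credit.

from collections import deque

class CreditLink:
    def __init__(self, buffer_slots):
        self.credits = buffer_slots   # e.g., 10 as in the slide's example
        self.queue = deque()

    def send(self, packet):
        if self.credits == 0:
            return False              # out of credits: sender stops injecting
        self.credits -= 1             # one credit consumed per packet sent
        self.queue.append(packet)
        return True

    def service(self, n=1):
        # The receiver frees n buffer slots and returns n credits (e.g., "+5").
        freed = 0
        while self.queue and freed < n:
            self.queue.popleft()
            freed += 1
        self.credits += freed
        return freed

link = CreditLink(buffer_slots=10)
sent = sum(link.send(f"pkt{i}") for i in range(12))
print(sent)            # 10: the counter hit zero, the last two were refused
link.service(n=5)      # receiver returns +5 credits
print(link.send("x"))  # True: counter > 0, sender resumes injecting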
Flow Control of Data Packets

• Improving flow control with split buffer organizations
  – A router with a single buffer queue per input port

[Figure: a K×K router in a 2-dimensional mesh network with dimension-order routing; at input port i, packets destined for different output ports (X+, X-, Y+, Y-) share one buffer queue on the physical channel]
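The dimension-order routing referenced in the figure can be sketched in a few lines: route fully in X first, then in Y, which makes each path deterministic (port names match the slide's X+/X-/Y+/Y- labels; this is a sketch, not a hardware description):

# Minimal sketch of dimension-order (XY) routing in a 2-D mesh.

def xy_route(current, dest):
    """Return the output port a packet should take at node `current`."""
    cx, cy = current
    dx, dy = dest
    if cx != dx:
        return "X+" if dx > cx else "X-"   # correct the X dimension first
    if cy != dy:
        return "Y+" if dy > cy else "Y-"   # then correct the Y dimension
    return "EJECT"                          # arrived at the destination

# Hop-by-hop route from (0, 0) to (2, 1):
print(xy_route((0, 0), (2, 1)))  # X+
print(xy_route((1, 0), (2, 1)))  # X+
print(xy_route((2, 0), (2, 1)))  # Y+
print(xy_route((2, 1), (2, 1)))  # EJECT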
Flow Control of Data Packets

• Improving flow control with split buffer organizations
  – Head-of-line (HoL) blocking in a router with a single queue per input port

[Figure: with one queue, a blocked packet at the head stalls the packets queued behind it even when their output ports (Y+, X+, Y-, X-) are free]
Flow Control of Data Packets

• Improving flow control with split buffer organizations
  – A router with two queues per input port → two virtual channels

[Figure: a K×K router in a 2-dimensional mesh network with dimension-order routing (two virtual channels per physical channel); at input port i, a DEMUX splits arriving packets between the virtual channel 0 and virtual channel 1 queues, which feed output ports X+, X-, Y+, Y-]
Flow Control of Data Packets

• Improving flow control with split buffer organizations
  – HoL blocking reduced in a router with two queues per input port
  – HoL blocking not eliminated: packets still stall when no VCs are available (see the sketch below)

[Figure: 2-dimensional mesh with dimension-order routing and two virtual channels per physical channel; blocked packets (X) in one virtual channel no longer stall the other, but transfers still stop when no VCs are available]
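A small simulation makes the “reduced but not eliminated” point concrete. In this sketch (the arrival pattern and output availability are invented for illustration), a blocked head-of-line packet stalls an entire single queue, while splitting the same packets across two virtual channels lets the other VC keep advancing — although a packet can still be stuck behind a blocked head within its own VC:

# Minimal sketch: HoL blocking with one queue versus two virtual channels.

from collections import deque

def deliverable(queues, output_free):
    """Count packets that can advance when only each queue's head may move."""
    moved = 0
    for q in queues:
        while q and output_free[q[0]]:   # a blocked head stalls all behind it
            q.popleft()
            moved += 1
    return moved

arrivals = ["X+", "Y-", "Y+", "X-"]          # packet -> desired output port
output_free = {"X+": False, "X-": True, "Y+": True, "Y-": True}

single = [deque(arrivals)]                    # one queue per input port
print(deliverable(single, output_free))      # 0: blocked X+ head stalls all

vc0, vc1 = deque(), deque()                   # two virtual channels
for i, p in enumerate(arrivals):              # simple round-robin VC assignment
    (vc0 if i % 2 == 0 else vc1).append(p)
print(deliverable([vc0, vc1], output_free))  # 2: vc1 (Y-, X-) advances,
                                             # but Y+ is still stuck behind X+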
Flow Control of Data Packets

• Improving flow control with split buffer organizations
  – A router with virtual output queuing (VOQ) requires k queues per input port
  – HoL blocking eliminated at the router with VOQ, but not at the neighboring router (see the sketch below)

[Figure: a K×K router in a 2-dimensional mesh with dimension-order routing; at input port i, a DEMUX sorts arriving packets into a separate split buffer queue per output port (X+, X-, Y+, Y-), so no packet waits behind one destined for a different output; HoL blocking can still occur at the neighboring router!]
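Extending the same sketch to virtual output queuing, the DEMUX sorts packets into one queue per output port, so no packet ever waits behind one bound for a different output; as the slide notes, HoL blocking can still arise at the neighboring router:

# Minimal sketch of virtual output queuing (VOQ): k queues per input port,
# one per output port. Same illustrative arrivals as the previous sketch.

from collections import deque

arrivals = ["X+", "Y-", "Y+", "X-"]
output_free = {"X+": False, "X-": True, "Y+": True, "Y-": True}

voq = {port: deque() for port in ("X+", "X-", "Y+", "Y-")}
for p in arrivals:
    voq[p].append(p)          # DEMUX: each packet goes to its output's queue

moved = 0
for port, q in voq.items():
    while q and output_free[port]:
        q.popleft()
        moved += 1
print(moved)  # 3: only the X+ packet waits, and it blocks nobody else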
Resilient Interconnection Networks

• Reduce chip-kill in the presence of permanent faults with dynamic reconfiguration of on-chip networks (legality rule sketched below)

[Figure: a 2-D mesh network with XY routing (deadlock-free); a faulty core router and its links cause five failed links; the network can be dynamically reconfigured to up*/down* (u*/d*) routing, remaining deadlock-free, with only the u*/d* link directions within the root's “skyline region” affected by the fault; later, if the u*/d* root itself fails, causing four more links to fail, the network is reconfigured again around a new root to regain connectivity → no chip-kill!]
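The deadlock freedom of up*/down* routing rests on one rule: a legal path never traverses an “up” link after a “down” link. A minimal sketch follows (the graph, root choice, and BFS-level orientation are illustrative; practical u*/d* implementations also break ties between same-level nodes):

# Minimal sketch of up*/down* (u*/d*) routing legality: orient each link "up"
# toward a root (by BFS level) and forbid any down->up transition on a path,
# which breaks cyclic channel dependencies and thus avoids deadlock.

from collections import deque

def bfs_levels(adj, root):
    level = {root: 0}
    frontier = deque([root])
    while frontier:
        u = frontier.popleft()
        for v in adj[u]:
            if v not in level:
                level[v] = level[u] + 1
                frontier.append(v)
    return level

def legal_ud(path, level):
    """A u*/d* path goes up zero or more times, then down: never down->up."""
    gone_down = False
    for a, b in zip(path, path[1:]):
        going_up = level[b] < level[a]   # moving closer to the root
        if going_up and gone_down:
            return False                 # up after down: illegal
        if not going_up:
            gone_down = True
    return True

adj = {0: [1, 2], 1: [0, 3], 2: [0, 3], 3: [1, 2]}  # small example network
level = bfs_levels(adj, root=0)
print(legal_ud([1, 0, 2], level))     # True: up to the root, then down
print(legal_ud([1, 3, 2, 0], level))  # False: down to 3, then up at 2->0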
Many such fascinating problems in need of innovative solutions!
In Conclusion

• Interconnection networks are key to exploiting parallelism
  – On-chip networks between cores within a chip
  – Off-chip networks between chips and boards across a system
• Many open research questions remain:
  – Network topology, routing, arbitration, switching, and flow control designs that maximize throughput and minimize latency
  – Innovative resource management techniques that enable adaptive, power-aware, fault-resilient, reliable interprocessor communication
  – The list goes on …
• High performance interconnection network design is an exciting area of computer systems architecture research
• The future awaits!
Interconnect Media & Form Factors

[Figure: media types versus distance (meters, 0.01 to >1,000) across OCNs, SANs, LANs, and WANs — metal layers and printed circuit boards at the OCN/SAN end; InfiniBand and Myrinet connectors for SANs; Cat5E twisted pair (Ethernet) and coaxial cables for LANs; fiber optics at the WAN end]
Flow Control of Data Packets

• Comparison of “Stop & Go” with “Credit-based” flow control

[Figure: timelines of the two schemes —
Stop & Go: Stop signal returned by receiver → sender stops transmission → last packet reaches receiver buffer → packets in buffer get processed → Go signal returned to sender → sender resumes transmission → first packet reaches buffer; the gap until packets arrive again is the flow control latency observed by the receiver buffer.
Credit-based: sender uses its last credit → last packet reaches receiver buffer → packets get processed and credits returned to sender → sender transmits packets → first packet reaches buffer]
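One practical consequence of the credit timeline is buffer sizing: to keep the link busy, the sender needs enough credits to cover the round trip from sending a packet to getting its credit back. The numbers below are assumptions for illustration, not figures from the slides:

# Back-of-the-envelope sketch (assumed numbers): for credit-based flow
# control to sustain full link bandwidth, the credit count (receiver buffer)
# must cover the credit round-trip time.

link_bandwidth = 10e9   # bits/s: assumed link rate
packet_size = 512       # bits: assumed packet size
round_trip = 100e-9     # s: packet downstream + credit back (assumed)

packet_time = packet_size / link_bandwidth
min_credits = round_trip / packet_time
print(f"packet time: {packet_time * 1e9:.1f} ns, "
      f"min credits for full throughput: {min_credits:.1f}")
# -> packet time: 51.2 ns, min credits for full throughput: 2.0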