Transport Layer Enhancements for Unified Ethernet in Data Centers
K. Kant, Raj Ramanujan
Intel Corp
Exploratory work only, not a committed Intel position.
*Third party marks and brands are the property of their respective owners
Context
Data center is evolving; fabric should too.
Last talk:
  – Enhancements to Ethernet, already on track.
This talk:
  – Enhancements to Transport Layer.
  – Exploratory, not in any standards track.
Outline
  – Data Center evolution & transport impact
  – Transport deficiencies & remedies
    – Many areas of deficiencies …
    – Only Congestion Control and QoS addressed in detail
  – Summary & Call to Action
Ethernet dominant, but convergence really on IP.
  – New layer 2: PCI-Exp, Optical, WLAN, UWB, …
Most ULPs run over transport over IP
=> Need to comprehend transport implications
[Figure: example traffic types: business transactions, client request/response, iSCSI storage, database query]
New models => New fabric requirements
Fabric Impact
More types of traffic, more demanding needs.
Protocol impact at all levels
DC evolution => Transport evolution
Transport Issues & Enhancements
Transport (TCP) enhancement areas
  – Better congestion control and QoS
  – Support media evolution
  – Support for high availability
  – Many others
    – Message based & unordered data delivery.
    – Connection migration in virtual clusters.
    – Transport layer multicasting.
How do we enhance transport?
  – New TCP compatible protocol?
  – Use an existing protocol (SCTP)?
  – Evolutionary changes to TCP from DC perspective.
What's wrong with TCP Congestion Control
TCP congestion control (CC) works independently for each connection
  – By default TCP equalizes throughput => undesirable
  – Sophisticated QoS can change this, but …
Lower level CC => Backpressure on transport
  – Transport layer congestion control is crucial
[Figure: two end hosts (App / transport / IP / MAC stacks) connected through a switch, a router, and a switch; congestion feedback (ECN/ICMP) returns to transport-layer congestion control]
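To make the per-connection equalization concrete, here is a minimal sketch of the arithmetic, assuming an idealized bottleneck where every connection ends up with an equal slice. The function name, the application names, and the numbers are all hypothetical and only illustrate why equal per-flow throughput is undesirable from an application point of view.

```python
def per_flow_shares(link_capacity, flows_per_app):
    """Idealized per-connection fairness at one bottleneck.

    Every connection gets the same slice, so an application's total
    share simply scales with how many connections it opens.
    """
    total_flows = sum(flows_per_app.values())
    per_flow = link_capacity / total_flows
    return {app: n * per_flow for app, n in flows_per_app.items()}


# Hypothetical example: a 10 Gb/s link shared by two applications.
shares = per_flow_shares(10, {"backup": 8, "database": 2})
```

In this example the backup job receives four times the bandwidth of the database only because it opened four times as many connections, regardless of which application matters more.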
What's wrong with QoS?
Elaborate mechanisms …
… but a nightmare to use
  – App knowledge, many parameters, sensitivity, …
What do we need?
  – Simple/intuitive parameters
    – e.g., streaming or not, normal vs. premium, etc.
  – Automatic estimation of BW needs.
  – Application focus, not flow focus!
QoS relevant primarily under congestion
=> Fix TCP congestion control, use IP QoS sparingly.
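One way to picture "simple parameters plus automatic estimation" is a per-application record with two intuitive flags and a bandwidth estimate that is learned rather than configured. The sketch below is an assumption, not something the slides specify: the class name `AppProfile`, the exponentially weighted moving average, and the smoothing factor 0.25 are all illustrative choices.

```python
from dataclasses import dataclass


@dataclass
class AppProfile:
    """Hypothetical per-application QoS descriptor (application focus, not flow focus)."""
    name: str
    streaming: bool = False        # intuitive knob 1: streaming or not
    premium: bool = False          # intuitive knob 2: normal vs. premium
    estimated_mbps: float = 0.0    # learned from traffic, never configured by hand
    alpha: float = 0.25            # EWMA smoothing factor (assumed value)

    def observe(self, throughput_mbps: float, congested: bool) -> None:
        """Fold one aggregate throughput sample into the estimate.

        Samples taken under congestion are skipped: they show what the
        network allowed, not what the application wanted.
        """
        if congested:
            return
        if self.estimated_mbps == 0.0:
            self.estimated_mbps = throughput_mbps
        else:
            self.estimated_mbps += self.alpha * (throughput_mbps - self.estimated_mbps)


profile = AppProfile("database", premium=True)
profile.observe(100.0, congested=False)
profile.observe(200.0, congested=False)
profile.observe(10.0, congested=True)   # ignored
```

The administrator sets only the two flags; the estimate follows the application's aggregate traffic across all of its connections.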
TCP Congestion Control Enhancements
1) Collective control of all flows of an app
  – Applicable to both TCP & UDP
  – Ensures proportional fairness of multiple inter-related flows
  – Tagging of connections to identify related flows.
2) Packet loss highly undesirable in DC
  – Move towards a delay based TCP variant.
3) Multilevel Coordination
  – Socket vs. RDMA apps, TCP vs. UDP, …
  – A layer above transport for coordination
Collective Congestion Control
Control connections thru a congested device together (control set)
Determining control set is challenging
BW requirement estimated automatically during non-congested periods
[Figure: example topology with nodes S11, S13, S21, S23, CL1, CL2 and switches SW0, SW1, SW2; congestion control point marked]
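Once a control set is known, its members can be throttled together rather than one connection at a time. The sketch below assumes the hard part (finding which connections cross the congested device) is already done and that per-application demands come from estimates made during uncongested periods. It uses a weighted max-min (water-filling) split, which is a stand-in chosen for illustration; the slides do not describe the actual allocation rule, and all names here are hypothetical.

```python
def allocate(capacity, demands, weights):
    """Weighted max-min (water-filling) split of one bottleneck's capacity.

    demands -- app -> estimated bandwidth need (same unit as capacity)
    weights -- app -> relative share when the link is oversubscribed
    Returns app -> allocation, never more than the app's demand.
    """
    alloc = {app: 0.0 for app in demands}
    active = set(demands)
    remaining = float(capacity)
    while active and remaining > 1e-9:
        total_w = sum(weights[a] for a in active)
        # Apps whose whole demand fits inside their weighted share are fully served.
        satisfied = {a for a in active
                     if demands[a] <= remaining * weights[a] / total_w}
        if not satisfied:
            # Everyone left is bottlenecked: split what remains by weight.
            for a in active:
                alloc[a] = remaining * weights[a] / total_w
            break
        for a in satisfied:
            alloc[a] = float(demands[a])
            remaining -= demands[a]
        active -= satisfied
    return alloc


# Hypothetical control set behind one congested switch port (units: Gb/s).
result = allocate(
    capacity=9,
    demands={"web": 2, "db": 10, "backup": 10},
    weights={"web": 1, "db": 2, "backup": 1},
)
```

Here the lightly loaded application gets everything it asked for, and the two bottlenecked applications divide the rest in proportion to their weights, independent of how many connections each one has open.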
Sample Results
[Figure: congestion control results]
Collective control highly desirable within a DC
Modified TCP can maintain 2:1 throughput ratio
  – Also yields lower losses & smaller RTT.
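The slides do not show how the modified TCP holds the 2:1 ratio. One simple mechanism that produces such a ratio is weighted AIMD, sketched below as a toy simulation under strong assumptions: two flows, one bottleneck, synchronized losses, and no delays. It is not the authors' implementation and reproduces none of their measurements.

```python
def simulate_weighted_aimd(capacity=100.0, weights=(2.0, 1.0), rounds=2000):
    """Toy model: flows share one bottleneck and see losses at the same time.

    Each round, flow i adds weights[i] to its rate (additive increase);
    when the total exceeds capacity, every flow halves (multiplicative decrease).
    Returns the average rate of each flow over the whole run.
    """
    rates = [1.0 for _ in weights]
    totals = [0.0 for _ in weights]
    for _ in range(rounds):
        if sum(rates) > capacity:
            rates = [r / 2 for r in rates]
        else:
            rates = [r + w for r, w in zip(rates, weights)]
        totals = [t + r for t, r in zip(totals, rates)]
    return [t / rounds for t in totals]


averages = simulate_weighted_aimd()
ratio = averages[0] / averages[1]
```

Additive increase leaves the gap between the first rate and twice the second unchanged, and every halving cuts that gap in two, so in this model the rates drift toward the ratio of the weights even though both flows start equal.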
Adaptation to Media
Problem: TCP assumes loss => congestion, and designed for WAN (high loss/delay)
Effects:
  – Wireless (e.g. UWB) attractive in DC (wiring reduction, mobility, self configuration).
  – … but TCP is not a suitable transport.
  – Overkill for communications within a DC.
Solution: A self-adjusting transport
  – Support multiple congestion/flow-control regimes.
  – Automatically selected during connection setup.
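A minimal sketch of what regime selection at connection setup could look like, assuming the endpoints can measure handshake RTT and know whether the first hop is wireless. The regime names, the inputs, and the thresholds are all hypothetical; the slides only state that several regimes exist and that one is chosen automatically.

```python
from enum import Enum


class Regime(Enum):
    """Hypothetical congestion/flow-control regimes a transport might offer."""
    LOW_LATENCY_DC = "delay-based control, small windows"
    LOSSY_WIRELESS = "loss not interpreted as congestion"
    WAN = "conventional loss-based control"


def select_regime(rtt_ms: float, loss_rate: float, wireless: bool) -> Regime:
    """Pick a regime once, from what is known when the connection is set up.

    Thresholds are illustrative assumptions, not measured values.
    """
    if wireless:
        return Regime.LOSSY_WIRELESS
    if rtt_ms < 1.0 and loss_rate < 1e-4:
        return Regime.LOW_LATENCY_DC
    return Regime.WAN
```

For example, a wired connection with a sub-millisecond handshake RTT would be placed in the data center regime, while a UWB link would avoid treating every loss as a congestion signal.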
High Availability Issues
Problem: Single failure => broken connection, weak robustness check, …
Effect: Difficult to achieve high availability.
[Figure: endpoints A and B connected by Path 1 and Path 2]
Solution:
  – Multi-homed connections w/ load sharing among paths.
  – Ideally, controlled diversity & path management
  – Difficult: need topology awareness, spanning tree problem, …
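The multi-homing idea can be sketched as a connection object that spreads traffic over every path that is currently up and keeps working when one fails. This is an illustration under simple assumptions (round-robin sharing, failure detection handled elsewhere); the class and method names are hypothetical and no topology awareness is modeled.

```python
class MultiPathConnection:
    """Hypothetical multi-homed connection: one logical connection, several paths."""

    def __init__(self, paths):
        self._order = list(paths)
        self.up = {p: True for p in self._order}
        self._next = 0

    def mark_failed(self, path):
        self.up[path] = False

    def mark_recovered(self, path):
        self.up[path] = True

    def pick_path(self):
        """Round-robin over the paths that are up; raise only if none is left."""
        for _ in range(len(self._order)):
            path = self._order[self._next]
            self._next = (self._next + 1) % len(self._order)
            if self.up[path]:
                return path
        raise ConnectionError("no usable path between endpoints")


conn = MultiPathConnection(["Path 1", "Path 2"])
before = [conn.pick_path() for _ in range(4)]   # load shared across both paths
conn.mark_failed("Path 1")
after = [conn.pick_path() for _ in range(3)]    # connection survives on Path 2
```

A single path failure changes where packets go but does not break the connection, which is the property the slide asks for.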
Summary & Call to Action
Data Centers are evolving
  – Transport must evolve too, but a difficult proposition
  – TCP is heavily entrenched, change needs an industry wide effort
Call to Action
  – Need to get an industry effort going to define
    – New features & their implementation
Multicast connections to other nodes via leaders
  – Ack consolidation at leaders (multicast)
  – Msg consolidation at leaders (reverse multicast)
Done by a layer above? (layer 4.5?)
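Ack consolidation at a leader can be sketched as follows, assuming cumulative sequence-number acks and a fixed group membership. The leader forwards a single ack upstream only when every member has acknowledged that much data. The class name and the min-over-members rule are illustrative assumptions; the slides do not define the mechanism.

```python
class AckConsolidator:
    """Leader-side state for one multicast group (hypothetical design)."""

    def __init__(self, members):
        self.acked = {m: 0 for m in members}   # highest cumulative seq acked per member
        self.reported = 0                      # last consolidated ack sent upstream

    def on_member_ack(self, member, seq):
        """Record a member's cumulative ack.

        Returns the consolidated ack to forward to the sender, or None if the
        group-wide minimum did not advance.
        """
        self.acked[member] = max(self.acked[member], seq)
        group_ack = min(self.acked.values())
        if group_ack > self.reported:
            self.reported = group_ack
            return group_ack
        return None


leader = AckConsolidator(["n1", "n2", "n3"])
forwarded = [
    leader.on_member_ack("n1", 5),
    leader.on_member_ack("n2", 3),
    leader.on_member_ack("n3", 4),
    leader.on_member_ack("n2", 7),
]
```

The sender sees two acks instead of four, and each one is safe to act on because the slowest member has already confirmed that data.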