Overview: IEEE 802 Nendica Report on The Lossless Network for Data Centers
Roger Marks (Huawei)
+1 802 capable
10 November 2018
Mentor DCN 802.1-18-0068-00-ICne



Disclaimer

• All speakers presenting information on IEEE standards speak as individuals, and their views should be considered the personal views of that individual rather than the formal position, explanation, or interpretation of the IEEE.


Nendica

• Nendica: IEEE 802 “Network Enhancements for the Next Decade” Industry Connections Activity
  ▫ An IEEE Industry Connections Activity
• Organized under the IEEE 802.1 Working Group
• Chartered March 2017 - March 2019
  ▫ may be extended
• Chair (until March 2018): Glenn Parsons
• Chair (from March 2018): Roger Marks


IEEE Industry Connections Activity

• Under IEEE-SA, but not standardization.
• “Industry Connections activities provide an efficient environment for building consensus and developing many different types of shared results. Such activities may complement, supplement, or be precursors of IEEE Standards projects, but they do not themselves develop IEEE Standards.”

• IEEE 802.3 manages another Industry Connections Activity (“New Ethernet Applications”).


Nendica Motivation and Goals

• “The goal of this activity is to assess… emerging requirements for IEEE 802 wireless and higher-layer communication infrastructures, identify commonalities, gaps, and trends not currently addressed by IEEE 802 standards and projects, and facilitate building industry consensus towards proposals to initiate new standards development efforts.

• Encouraged topics include enhancements of IEEE 802 communication networks and vertical networks as well as enhanced cooperative functionality among existing IEEE standards in support of network integration.

• Findings related to existing IEEE 802 standards and projects are forwarded to the responsible working groups for further considerations.”


Nendica Work Items

• The Lossless Network for Data Centers
  ▫ Published Nendica Report, 2018-08-17
    ▪ IEEE 802.1-18-0042-00
    ▪ [Circulated to IETF New Work during development]
  ▫ Published report invites further comments
  ▫ Stimulated new standardization project IEEE P802.1Qcz (Congestion Isolation)
• Flexible Factory IOT
  ▫ Draft report 802.1-18-0025-06
  ▫ Significant focus on wireless
  ▫ Comment resolution underway


Nendica Report: The Lossless Network for Data Centers

• Paul Congdon, Editor
• Key messages regarding the data center:
  ▫ Packet loss leads to large delays.
  ▫ Congestion leads to packet loss.
  ▫ Conventional methods are problematic.
  ▫ Even in a Layer 3 network, we can take action at Layer 2 to reduce congestion and thereby loss.
  ▫ The paper is not specifying a “lossless” network but describing a few prospective methods to progress towards a lossless data center network in the future.

• The report is open to comment and may be revised.


Use Cases: The Lossless Network for Data Centers

• Online Data Intensive (OLDI) Services
• Deep Learning and Model Training
• Non-Volatile Memory Express (NVMe) over Fabrics
• Cloudification of the Central Office

• Overall theme is dependence of parallel computation on the network


Data Center Applications are distributed and latency-sensitive



experience is highly dependent upon the system responsiveness, and even moderate delays of less than a second can have a measurable impact on individual queries and their associated advertising revenue. A large chunk of unavoidable delay, due to the speed of light, is inherently built into a system that uses the remote cloud as the source of decision and information. This puts even more pressure on the deadlines within the data center itself. To address these latency concerns, OLDI services deploy individual requests across thousands of servers simultaneously. The responses from these servers are coordinated and aggregated to form the best recommendations or answers. Delays in obtaining these answers are compounded by delayed or ‘straggler’ communication flows between the servers. This creates a long tail latency distribution in the data center for highly parallel applications. To combat tail latency, servers are often arranged in a hierarchy, as shown in Figure 1, with strict deadlines given to each tier to produce an answer. If valuable data arrives late because of latency in the network, the data is simply discarded, and a sub-optimal answer may be returned. Studies have shown that the network becomes a significant component of overall data center latency when congestion occurs in the network [2].

 

Figure 1 – Parallel Application Hierarchy 

The long tail of latency distribution in OLDI data centers can be caused by various factors [3]. One is simply related to the mix of traffic between control messages (mice) and data messages (elephants). While most of the flows in the data center are mice, most of the bytes transferred across the network are due to elephants. Therefore, a small number of elephant flows can delay the set-up of control channels established by mice flows. Since OLDI data centers are processing requests over thousands of servers simultaneously, the mix and interplay of mice and elephant flows is highly uncoordinated. An additional complexity is that flows can change behavior over time; what was once an elephant can transform into a mouse after an application has reached steady state. Another cause of latency is due to incast at the lower tiers of the node hierarchy. Leaf worker nodes return their answers to a common parent in the tree at nearly the same time. This can cause buffer over-runs and packet loss within an individual switch. It may invoke congestion management schemes such as flow-control or congestion notification, which have little effect on mice flows and tail latency.
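The excerpt's point that fanning a request out across many servers magnifies straggler delays can be illustrated with a small calculation. This is only a sketch: it assumes each worker is slow independently with the same probability, and the 1% figure is an illustrative value, not a number from the report.

```python
def prob_any_straggler(fanout: int, p_slow: float) -> float:
    """Probability that at least one of `fanout` workers is slow.

    Assumes (hypothetically) that workers are slow independently,
    each with probability `p_slow`.
    """
    if fanout < 0 or not 0.0 <= p_slow <= 1.0:
        raise ValueError("fanout must be >= 0 and p_slow in [0, 1]")
    return 1.0 - (1.0 - p_slow) ** fanout


if __name__ == "__main__":
    # Illustrative fan-out sizes; the request waits for the slowest worker.
    for n in (1, 100, 1000):
        print(f"fan-out {n:5d}: P(at least one straggler) = "
              f"{prob_any_straggler(n, 0.01):.4f}")
```

Under these assumptions a rare per-server delay becomes a near-certain per-request delay at large fan-out, which is one way to read the excerpt's emphasis on strict per-tier deadlines.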

 


• Tend toward congestion; e.g. due to incast
• Packet loss leads to retransmission, more congestion, more delay


Folded-Clos Network: Many Paths from Server to Server

[Figure: folded-Clos network diagram; labels: server, spine, rack]


Equal-Cost Multi-Path (ECMP): Path assigned per flow (~random)

[Figure: folded-Clos network diagram; labels: server, spine, rack, ECMP]
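Per-flow path assignment can be sketched as a hash of the flow's identifying fields taken modulo the number of equal-cost paths. The `FlowKey` type, the choice of a 5-tuple, and the use of SHA-256 are assumptions made for illustration; real switches typically use a hardware hash over configurable header fields.

```python
import hashlib
from typing import NamedTuple


class FlowKey(NamedTuple):
    """Hypothetical flow identifier (a conventional 5-tuple)."""
    src_ip: str
    dst_ip: str
    src_port: int
    dst_port: int
    proto: int


def ecmp_path(flow: FlowKey, num_paths: int) -> int:
    """Map a flow to one of `num_paths` equal-cost paths.

    Every packet of the same flow hashes to the same path, so packets
    stay in order; different flows land on effectively random paths.
    """
    if num_paths <= 0:
        raise ValueError("num_paths must be positive")
    digest = hashlib.sha256(repr(tuple(flow)).encode()).digest()
    return int.from_bytes(digest[:4], "big") % num_paths


if __name__ == "__main__":
    flow = FlowKey("10.0.0.1", "10.0.1.2", 40000, 4791, 17)
    print("path index:", ecmp_path(flow, num_paths=4))
```

Because the choice depends only on the header fields, it is blind to flow size and to current load on each path.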


ECMP may still lead to congestion; e.g. large flows may collide

[Figure: folded-Clos network diagram; labels: server, spine, rack, ECMP, congestion]
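How likely such a collision is can be estimated with a birthday-style calculation. The sketch below assumes each large flow picks one of the equal-cost paths uniformly and independently, which is an idealization of hash-based ECMP rather than a claim from the slides.

```python
def prob_collision(num_flows: int, num_paths: int) -> float:
    """Probability that at least two flows share a path.

    Assumes each flow is assigned to one of `num_paths` paths uniformly
    at random and independently of the others.
    """
    if num_flows < 0 or num_paths <= 0:
        raise ValueError("num_flows must be >= 0 and num_paths positive")
    if num_flows > num_paths:
        return 1.0  # pigeonhole: some path must carry two flows
    p_all_distinct = 1.0
    for i in range(num_flows):
        p_all_distinct *= (num_paths - i) / num_paths
    return 1.0 - p_all_distinct


if __name__ == "__main__":
    for k in (2, 3, 4):
        print(f"{k} large flows over 4 paths: "
              f"P(collision) = {prob_collision(k, 4):.4f}")
```

Even with as many paths as large flows, a collision is the expected outcome under these assumptions.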


Incast fills output queue (note: ECMP cannot help)

[Figure: folded-Clos network diagram; labels: server, spine, rack, ECMP, incast]
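A back-of-the-envelope model shows why the output queue fills: all senders converge on the same final egress port, so spreading traffic over different paths does not reduce the arrival rate there. The link speeds and buffer size below are illustrative assumptions, not figures from the report.

```python
from typing import Optional


def time_to_overflow(num_senders: int, sender_gbps: float,
                     egress_gbps: float, buffer_bytes: int) -> Optional[float]:
    """Seconds until an egress buffer overflows during incast.

    Simple fluid model: the queue grows at (total arrival rate minus
    egress drain rate). Returns None if the port can keep up.
    """
    excess_gbps = num_senders * sender_gbps - egress_gbps
    if excess_gbps <= 0:
        return None
    return (buffer_bytes * 8) / (excess_gbps * 1e9)


if __name__ == "__main__":
    # Hypothetical case: 9 workers answer one parent at 10 Gb/s each,
    # through a 10 Gb/s port with 1 MB of buffer.
    t = time_to_overflow(9, 10.0, 10.0, 1_000_000)
    print(f"buffer overflows after {t * 1e6:.0f} microseconds")
```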


Priority flow control (PFC)

[Figure: folded-Clos network diagram; labels: server, spine, rack, PFC, incast]

• Output backup fills ingress queue
• PFC can be used to pause input per QoS class
• IEEE 802.1Q (originally in 802.1Qbb)
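The per-class pause behaviour can be sketched as a pair of thresholds on each priority's ingress queue. The class name `PfcIngress` and the XOFF/XON threshold names are assumptions for illustration; the sketch only records the pause state and does not build actual PFC frames.

```python
class PfcIngress:
    """Toy model of per-priority pause state on one ingress port."""

    def __init__(self, xoff: int, xon: int, num_priorities: int = 8):
        if not 0 <= xon < xoff:
            raise ValueError("need 0 <= xon < xoff")
        self.xoff = xoff  # depth at which the upstream sender is paused
        self.xon = xon    # depth at which the upstream sender may resume
        self.depth = [0] * num_priorities
        self.paused = [False] * num_priorities

    def enqueue(self, prio: int, nbytes: int) -> None:
        self.depth[prio] += nbytes
        if not self.paused[prio] and self.depth[prio] >= self.xoff:
            # A real port would send a pause for this priority upstream.
            self.paused[prio] = True

    def dequeue(self, prio: int, nbytes: int) -> None:
        self.depth[prio] = max(0, self.depth[prio] - nbytes)
        if self.paused[prio] and self.depth[prio] <= self.xon:
            # A real port would tell the upstream sender to resume.
            self.paused[prio] = False


if __name__ == "__main__":
    port = PfcIngress(xoff=100, xon=40)
    port.enqueue(3, 120)
    print("priority 3 paused:", port.paused[3])
    print("priority 0 paused:", port.paused[0])
```

The pause state is tracked per priority, not per flow, so every flow mapped to a paused priority stops together.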


PFC pauses all flows of the class including “victim” flows

[Figure: folded-Clos network diagram; labels: server, spine, rack, incast, “PFC stops both flows”]


Explicit Congestion Notification (ECN) pauses flows at source

[Figure: folded-Clos network diagram; labels: server, spine, rack, incast, ECN mark, ECN Congestion Feedback]
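The mark-and-feedback loop in the figure can be sketched in two small functions: the switch marks packets when its queue is deep, and the source slows down when the mark is echoed back. The threshold-based marking and the halve-or-increment window rule are assumed, TCP-like choices for illustration; the slides do not specify the source's reaction.

```python
def mark_ce(queue_depth: int, threshold: int) -> bool:
    """Switch side: mark the packet if the queue exceeds `threshold`."""
    return queue_depth > threshold


def sender_on_feedback(cwnd: float, congestion_echoed: bool,
                       min_cwnd: float = 1.0) -> float:
    """Source side: shrink the window on congestion feedback.

    Assumed rule: halve on feedback, otherwise grow by one packet.
    """
    if congestion_echoed:
        return max(min_cwnd, cwnd / 2.0)
    return cwnd + 1.0


if __name__ == "__main__":
    cwnd = 16.0
    for depth in (20, 80, 150, 150, 60):  # hypothetical queue depths
        echoed = mark_ce(depth, threshold=100)
        cwnd = sender_on_feedback(cwnd, echoed)
        print(f"queue={depth:3d} marked={echoed} cwnd={cwnd}")
```

Unlike a hop-by-hop pause, this reaction takes at least one round trip to reach the source.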


Dynamic Virtual Lanes (DVL)

[Figure: upstream and downstream switches, each with an ingress port (virtual queues) and an egress port; congested and non-congested flows; labels: CIP, PFC, Eliminate HoL Blocking; steps 1-4 marked]

1. Identify the flow causing congestion and isolate locally
2. Signal to neighbor when congested queue fills (CIP)
3. Upstream isolates the flow too, eliminating head-of-line blocking
4. If congested queue continues to fill, invoke PFC for lossless
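The four steps can be walked through with a toy two-switch model. Everything named here (`DvlSwitch`, the two thresholds, the log strings) is hypothetical, and the sketch assumes the caller has already identified which flow is causing congestion; it is not the mechanism as specified in the report.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Set


@dataclass
class DvlSwitch:
    name: str
    cip_threshold: int   # congested-queue depth that triggers step 2
    pfc_threshold: int   # congested-queue depth that triggers step 4
    upstream: Optional["DvlSwitch"] = None
    isolated: Set[str] = field(default_factory=set)
    congested_depth: int = 0
    log: List[str] = field(default_factory=list)

    def isolate(self, flow: str) -> None:
        # Step 1 locally, or step 3 when asked by the downstream neighbor.
        self.isolated.add(flow)
        self.log.append(f"{self.name}: isolate {flow}")

    def enqueue_congested(self, flow: str, nbytes: int) -> None:
        if flow not in self.isolated:
            self.isolate(flow)
        self.congested_depth += nbytes
        if self.upstream is None:
            return
        if (self.congested_depth >= self.cip_threshold
                and flow not in self.upstream.isolated):
            self.log.append(f"{self.name}: CIP for {flow}")  # step 2
            self.upstream.isolate(flow)                      # step 3
        if self.congested_depth >= self.pfc_threshold:
            self.log.append(f"{self.name}: PFC on congested queue")  # step 4


if __name__ == "__main__":
    up = DvlSwitch("up", cip_threshold=10, pfc_threshold=20)
    down = DvlSwitch("down", cip_threshold=10, pfc_threshold=20, upstream=up)
    for size in (5, 5, 10):
        down.enqueue_congested("f1", size)
    print("\n".join(down.log + up.log))
```

In this sketch PFC is a last resort applied only to the congested queue, so non-congested flows keep moving.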


Load-Aware Packet Spraying (LPS)

LPS = Packet Spraying + Endpoint Reordering + Load-Aware

[Figure: leaf-spine fabric with six leaves and four spines; packets 1-8 sprayed over Path 1 to Path 4 and put back in order at the destination; labels: Distributed, Finer Granularity, In-Ordering, Congestion-Aware, Reordering @ Leaf, Path-Congestion Feedback; steps 1-3 marked]

According to path-congestion degree, spray packets over paths
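The two halves of the idea, load-aware spraying at the source side and reordering at the destination leaf, can be sketched as follows. The inverse-congestion weighting and the sequence-number reorder buffer are illustrative assumptions; the slides do not give the actual weighting or reordering rules.

```python
import random
from typing import Dict, List, Sequence


def pick_path(congestion: Sequence[float], rng: random.Random) -> int:
    """Choose a path per packet, favoring less congested paths.

    `congestion` holds a non-negative congestion degree per path
    (assumed to come from path-congestion feedback).
    """
    weights = [1.0 / (1.0 + c) for c in congestion]
    return rng.choices(range(len(congestion)), weights=weights, k=1)[0]


class ReorderBuffer:
    """Release packets in sequence order at the destination leaf."""

    def __init__(self) -> None:
        self.next_seq = 0
        self.pending: Dict[int, str] = {}

    def receive(self, seq: int, payload: str) -> List[str]:
        self.pending[seq] = payload
        released = []
        while self.next_seq in self.pending:
            released.append(self.pending.pop(self.next_seq))
            self.next_seq += 1
        return released


if __name__ == "__main__":
    rng = random.Random(0)
    paths = [pick_path([0.0, 9.0, 1.0, 1.0], rng) for _ in range(8)]
    print("paths chosen for packets 0-7:", paths)
    buf = ReorderBuffer()
    for seq in (1, 0, 3, 2):  # hypothetical out-of-order arrival
        print(f"arrived {seq}, released {buf.receive(seq, f'pkt{seq}')}")
```

Spraying per packet rather than per flow is what makes the reorder buffer necessary.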


Push & Pull Hybrid Scheduling (PPH)

PPH = Congestion aware edge switch scheduling

• Push when load is light
• Pull when load is high

[Figure: leaf-spine fabric with six leaves and four spines; two sources and one destination exchange Request, Grant, and Data messages; labels: Push Data, Pulled Data, Request (Pull), Grant (Pull), Short RTT, Long RTT; steps 1-3 marked]

Light load: All Push. Acquire low latency.

Light congestion: Open Pull for part of the congested path

Heavy load: All Pull. Reduce queuing delay, improve throughput.
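The three regimes can be sketched as a per-path choice between two message exchanges: push sends data immediately, pull waits for a grant first. The load values, the single `pull_threshold`, and the function name are assumptions for illustration only.

```python
from typing import Dict, Tuple

# Message sequences seen by a source under each discipline.
PUSH: Tuple[str, ...] = ("Data",)
PULL: Tuple[str, ...] = ("Request", "Grant", "Data")


def plan_transfers(path_load: Dict[str, float],
                   pull_threshold: float = 0.7) -> Dict[str, Tuple[str, ...]]:
    """Pick push or pull per path from its (assumed) load estimate.

    All paths light    -> all push (lowest latency).
    Some paths loaded  -> pull only on the congested paths.
    All paths loaded   -> all pull (bounded queuing).
    """
    return {path: (PULL if load >= pull_threshold else PUSH)
            for path, load in path_load.items()}


if __name__ == "__main__":
    for loads in ({"p1": 0.1, "p2": 0.2},
                  {"p1": 0.1, "p2": 0.9},
                  {"p1": 0.8, "p2": 0.9}):
        print(loads, "->", plan_transfers(loads))
```

The extra request and grant cost a round trip, which is why the sketch reserves pull for paths that are already loaded.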


Key Issues: Nendica Report on Lossless Network for Data Centers

Isolate Congestion
• Congestion Cause: Priority-based Flow Control is coarse. Victim flows paused due to congested flows.
• Mitigation: Allow time for end-to-end congestion control. Move congested flows out of the way. Eliminate victim blocking.
• Innovation: Dynamic Virtual Lane

Schedule Appropriately
• Congestion Cause: Unscheduled incast without awareness of network resources leads to packet loss.
• Mitigation: Schedule using integrated information from source, network, and destination.
• Innovation: Push & Pull Hybrid Scheduling

Spread the Load
• Congestion Cause: Unbalanced load sharing. Multiple elephant flows congest and block mice flows.
• Mitigation: Load-balance flows at higher granularity. Use congestion awareness to avoid collisions.
• Innovation: Load-aware Packet Spraying


Bibliography

• IEEE 802 “Network Enhancements for the Next Decade” Industry Connections Activity (Nendica)
  ▫ https://1.ieee802.org/802-nendica/
• IEEE 802 Nendica Report: “The Lossless Network for Data Centers” (18 August 2018)
  ▫ https://mentor.ieee.org/802.1/dcn/18/1-18-0042-00.pdf
• Paul Congdon, “The Lossless Network in the Data Center,” IEEE 802.1-17-0007-01, 7 November 2017
  ▫ https://mentor.ieee.org/802.1/dcn/17/1-17-0007-01.pdf


Next Steps

• IEEE 802 Nendica Report: “The Lossless Network for Data Centers” (18 August 2018) is published but open to further comment.

• Would a useful revision document point to complementary directions in 802 and IETF?

• Is it time to open a revision activity?
