Reconstructing network states in cloud using NIC and system timestamps: A case study of Cloudlab Shiyu Liu, Balaji Prabhakar, Mendel Rosenblum Stanford University Feb 7, 2018

Jul 14, 2020

Page 1

Reconstructing network states in cloud using NIC and system timestamps:
A case study of Cloudlab

Shiyu Liu, Balaji Prabhakar, Mendel Rosenblum
Stanford University
Feb 7, 2018

Page 2

SIMON: A Simple and Scalable Method for Sensing, Inference and Measurement in Data Center Networks (NSDI’19)

• Using edge-based measurement to reconstruct key network state variables
  • Packet queuing times at switches
  • Link utilizations
  • Queue and link compositions at the flow-level

• SIMON enables:
  • Sensitive A/B tests
  • Network troubleshooting & diagnosis
  • Network performance monitoring

Page 3

HW (NIC) or SW (system) timestamps?

• HW (NIC) timestamps:
  • Accurate inputs for estimating queueing delays. SIMON’s default.
  • Not available in many cases: e.g. cloud
• SW (system) timestamps:
  • Widely available
  • Could we use SW timestamps to still get fairly good reconstructions?
  • Will improve the deployability of SIMON

[Figure: sender and receiver hosts, each showing CPU+RAM (APP, Kernel, Driver), PCIe and NIC, with the timestamp points Tx SW, Tx HW, Rx HW and Rx SW marked]

$t(\text{Rx HW}) - t(\text{Tx HW})$ vs. $t(\text{Rx SW}) - t(\text{Tx SW})$

• Software processing delays in driver, interrupt handling, interrupt coalescing
• PCI-E delays
• NIC queueing & hardware processing delays
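As a rough illustration of the comparison on this slide, the sketch below computes the HW-measured and SW-measured one-way delays of a single probe and the gap between them. The `ProbeTimestamps` class, its field names and the nanosecond values are made up for this example, and it assumes the system clocks and the NIC clocks have each already been synchronized across hosts.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ProbeTimestamps:
    """Four timestamps of one probe packet, in nanoseconds (hypothetical layout)."""
    tx_sw_ns: int  # sender, system clock
    tx_hw_ns: int  # sender, NIC clock
    rx_hw_ns: int  # receiver, NIC clock
    rx_sw_ns: int  # receiver, system clock


def hw_one_way_delay_us(p: ProbeTimestamps) -> float:
    """One-way delay measured between the two NIC timestamps."""
    return (p.rx_hw_ns - p.tx_hw_ns) / 1000.0


def sw_one_way_delay_us(p: ProbeTimestamps) -> float:
    """One-way delay measured between the two system timestamps."""
    return (p.rx_sw_ns - p.tx_sw_ns) / 1000.0


def host_overhead_us(p: ProbeTimestamps) -> float:
    """Extra delay seen by SW timestamps (driver, interrupts, PCIe, NIC).

    System clocks and NIC clocks are separate clock domains, so only
    differences taken within one domain are compared here.
    """
    return sw_one_way_delay_us(p) - hw_one_way_delay_us(p)


# Made-up values for illustration only.
probe = ProbeTimestamps(
    tx_sw_ns=1_000_000, tx_hw_ns=1_004_000,
    rx_hw_ns=1_029_000, rx_sw_ns=1_041_000,
)
print(hw_one_way_delay_us(probe), sw_one_way_delay_us(probe), host_overhead_us(probe))
```

The overhead term is what the later filtering has to account for when SW timestamps are used in place of HW timestamps.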

Page 4

Contents

• Overview of Cloudlab environment
• Performance of SIMON w/ HW timestamps
• Study of the difference between SW & HW measured one-way delays
• Performance of SIMON w/ SW timestamps

Page 5

A case study of Cloudlab

• Cloudlab: 2-stage switching fabric, 10G links
• Use Huygens to sync SW (system) and HW (NIC) clocks respectively among all servers.

[Figure: Topology of Cloudlab experiment]
• OS: Linux v4.15
• NIC: Mellanox ConnectX-4
• ToR: Dell S4048-ON, 12MB shared pkt buffer
• Spine: Mellanox MSN2410, 16MB shared pkt buffer

Page 6

Contents

• Overview of Cloudlab environment
• Performance of SIMON w/ HW timestamps
• Study of the difference between SW & HW measured one-way delays
• Performance of SIMON w/ SW timestamps

Page 7

Estimate queue recon errors without ground truth

• Send two independent probe meshes, i.e. two independent sets of measurements.

[Diagram]
Measure 1 → SIMON → $Q_1 = Q + e_1$
Measure 2 → SIMON → $Q_2 = Q + e_2$

$\mathrm{Var}(e_1 - e_2) = \mathrm{Var}(e_1) + \mathrm{Var}(e_2)$

The diff between two independent reconstructions ($Q_1$ and $Q_2$) bounds the diff between these reconstructions and the ground truth (i.e. $e_1$ and $e_2$).
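The reasoning behind this slide can be checked numerically. Since both reconstructions share the same ground truth, their difference equals the difference of the two errors, and for independent errors the variances add, so the spread of the difference is at least as large as the spread of either error alone. The sketch below is a simulation under made-up assumptions (exponentially distributed true queueing delays, zero-mean Gaussian errors, the function name `simulate_cross_validation`); none of these come from the slides.

```python
import numpy as np


def rms(x):
    """Root-mean-square of an array."""
    return float(np.sqrt(np.mean(np.square(x))))


def simulate_cross_validation(n=200_000, sigma1_us=20.0, sigma2_us=25.0, seed=0):
    """Simulate two independent noisy reconstructions of the same queue.

    All distributions and parameters are assumptions for illustration.
    """
    rng = np.random.default_rng(seed)
    q_true = rng.exponential(scale=50.0, size=n)  # hypothetical ground truth (us)
    e1 = rng.normal(0.0, sigma1_us, size=n)       # error of reconstruction 1
    e2 = rng.normal(0.0, sigma2_us, size=n)       # error of reconstruction 2
    q1 = q_true + e1
    q2 = q_true + e2
    return {
        "var_diff": float(np.var(q1 - q2)),         # observable without ground truth
        "var_sum": float(np.var(e1) + np.var(e2)),  # needs ground truth
        "rms_diff": rms(q1 - q2),
        "rms_e1": rms(e1),
        "rms_e2": rms(e2),
    }


result = simulate_cross_validation()
print(result)
```

In this simulation `var_diff` and `var_sum` agree up to sampling noise, and `rms_diff` is larger than both `rms_e1` and `rms_e2`, which is the sense in which the cross-validation difference acts as an upper bound.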

Page 8

SIMON w/ HW timestamps in Cloudlab

Cross-validation by 2 independent meshes of probes. Recon interval = 1ms.

                   All queues   Queues > 100us
RMS(blue-red)      29.33 us     108.54 us
Relative error     7.2%         6.9%

Relative error = RMS(blue-red) / RMS(…)
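A minimal sketch of how an RMS(blue-red) figure of this kind could be computed from two reconstructed queueing-delay series follows. The function name `cross_validation_rms` and the rule used for the "Queues > 100us" subset (keep intervals where either reconstruction exceeds the threshold) are assumptions; the slides do not state how the subset is selected. The relative error is left out because its denominator is not recoverable from the transcript.

```python
import numpy as np


def cross_validation_rms(recon_a_us, recon_b_us, min_queue_us=None):
    """RMS difference between two reconstructions of the same queues.

    If min_queue_us is given, only intervals where at least one of the two
    reconstructions exceeds it are kept (an assumed selection rule).
    """
    a = np.asarray(recon_a_us, dtype=float)
    b = np.asarray(recon_b_us, dtype=float)
    if a.shape != b.shape:
        raise ValueError("reconstructions must have the same shape")
    diff = a - b
    if min_queue_us is not None:
        diff = diff[np.maximum(a, b) > min_queue_us]
        if diff.size == 0:
            raise ValueError("no interval exceeds min_queue_us")
    return float(np.sqrt(np.mean(np.square(diff))))


# Made-up reconstructions (us), one value per reconstruction interval.
blue = [0.0, 10.0, 200.0, 300.0]
red = [0.0, 20.0, 180.0, 330.0]
print(cross_validation_rms(blue, red))
print(cross_validation_rms(blue, red, min_queue_us=100.0))
```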

Page 9

Contents

• Overview of Cloudlab environment
• Performance of SIMON w/ HW timestamps
• Study of the difference between SW & HW measured one-way delays
• Performance of SIMON w/ SW timestamps

Page 10

SW & HW one-way delay in Cloudlab

• Our goal is to use SW one-way delay (red line) to estimate the HW one-way delay (blue line)
• The noise is instantaneous, but the switch queueing delays are prolonged

[Figure: SW (red line) and HW (blue line) one-way delays, annotated "DC bias" and "High-freq noise"]

Page 11

Remove the noise in SW one-way delays

[Figure: filtering steps applied to the SW one-way delays]
• DC bias: high-pass filter to remove the DC bias
• Peak noises: remove samples > threshold
• Remaining noise: LASSO will take avg
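The slides do not give the filter design, so the following is only a sketch of the first two steps under stated assumptions: the DC bias is treated as a constant offset and estimated as a low percentile of the trace, and a sample counts as a peak when it exceeds the median of a short surrounding window by more than a threshold. The function names, window length, percentile and threshold are all hypothetical choices, not the authors' parameters.

```python
import numpy as np


def remove_dc_bias(delays_us, percentile=5.0):
    """Subtract a constant offset, estimated as a low percentile of the trace."""
    delays = np.asarray(delays_us, dtype=float)
    return delays - np.percentile(delays, percentile)


def remove_peaks(delays_us, window=5, threshold_us=20.0):
    """Replace isolated spikes by the local median.

    A sample is flagged when it exceeds the median of the surrounding
    `window` samples by more than `threshold_us`. Returns the cleaned
    trace and the boolean peak mask.
    """
    if window % 2 == 0:
        raise ValueError("window must be odd")
    delays = np.asarray(delays_us, dtype=float)
    half = window // 2
    padded = np.pad(delays, half, mode="edge")
    windows = np.lib.stride_tricks.sliding_window_view(padded, window)
    local_median = np.median(windows, axis=1)
    is_peak = delays - local_median > threshold_us
    return np.where(is_peak, local_median, delays), is_peak


# Made-up trace: 30 us constant bias, a prolonged 100 us queueing episode
# (samples 20..29) and one instantaneous 200 us spike (sample 10).
trace = np.full(50, 30.0)
trace[20:30] += 100.0
trace[10] += 200.0

cleaned, is_peak = remove_peaks(remove_dc_bias(trace))
print(np.flatnonzero(is_peak), cleaned[25])
```

Because the window is shorter than the queueing episode, the prolonged queueing delay passes through while the single-sample spike is removed, which mirrors the distinction between instantaneous noise and prolonged switch queueing.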

Page 12

Contents

• Overview of Cloudlab environment
• Performance of SIMON w/ HW timestamps
• Study of the difference between SW & HW measured one-way delays
• Performance of SIMON w/ SW timestamps

Page 13

Filtering improves the approximation of SW recon results to HW results

[Diagram]
SW one-way delays (w/o or w/ filtering) → SIMON → SW recon results
HW one-way delays → SIMON → HW recon results
SW recon results approximate HW recon results

[Chart: RMS(SW recon - HW recon) (us)]

               All queues   Queues > 100us
w/o filter     23.04        29.08
w/ filter      12.67        24.80

Page 14

Recon errors using SW & HW timestamps

[Chart: RMSE (us)]

                 All queues   Queues > 100us
HW               29.33        108.54
SW w/ filter     32.37        109.79

[Chart: Relative error]

                 All queues   Queues > 100us
HW               7.23%        6.89%
SW w/ filter     7.98%        6.97%

• SW recon errors close to HW, esp. for large queues
• SW (system) timestamps are good replacements of HW (NIC) timestamps for SIMON

Page 15

Conclusion

• With proper filtering, SW (system) timestamps become good replacements for HW (NIC) timestamps for reconstructing network states.
• This improves the deployability of SIMON, e.g. in cloud environments.

Please visit our poster for more details and Q&A.