1 Internet Performance Monitoring Update Les Cottrell & Warren Matthews – SLAC www.slac.stanford.edu/grp/scs/net/talk/ mon-escc-apr00/ Presented at the ESCC meeting Pleasanton April 26, 2000 Partially funded by DOE/MICS Field Work Proposal on Internet End-to-end

Internet Performance Monitoring Update

Les Cottrell & Warren Matthews – SLAC

www.slac.stanford.edu/grp/scs/net/talk/mon-escc-apr00/

Presented at the ESCC meeting, Pleasanton, April 26, 2000

Partially funded by DOE/MICS Field Work Proposal on Internet End-to-end Performance Monitoring (IEPM), also supported by IUPAP


Overview

• PingER

• Validations

• Results

• Quality of Service

• Coming soon

• Summary


PingER

• Measurements from

– 30 monitors in 15 countries

– Over 500 remote hosts

– Over 70 countries

– Over 2100 monitor-remote site pairs

• Recent monitor additions: ANL, UWisc, NSK, ITEP, RIKEN, KAIST, ILAN, Brazil, Melbourne; working on: Caltech, SDSC

• Over 50% of HENP collaborator sites are explicitly monitored as remote sites by the PingER project

– Atlas (37%), BaBar (68%), Belle (23%), CDF (73%), CMS (31%), D0 (60%), LEP (44%), Zeus (35%), PPDG (100%), RHIC (64%)

• Remainder covered by Beacons

– Currently 56, extending to 76


Beacons & UK seen from ESnet

Sites in UK track one another, so can represent with single site

2 Beacons in UK

Indicates common source of congestion

Increased capacity by 155 times in 5 years

Effect of ACLs

Direct peering between JANet and ESnet


PingER Deployment Jan-00


RIPE vs Surveyor 1/2

Little short term correlation, even for time differences of < 2 secs

Little structure; outliers don’t match


RIPE vs Surveyor 2/2

Optimum agreement if RIPE is displaced by ~0.2 ms (packet size difference)


PingER vs AMP

Little obvious short term agreement (R² < 0.1)

Same if compare ping vs. ping

Avg Ping distribution agrees with AMP

Both show >= 95% of samples are 58-59 msec

R² > 0.95 for min & avg

Time series
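The R² figures quoted on this slide can be illustrated with a minimal sketch. The slide does not say how R² was computed; the code below assumes it is the squared Pearson correlation of two time-aligned series (for example PingER and AMP round-trip times sampled in the same intervals). The function name and the sample values are hypothetical.

```python
def r_squared(xs, ys):
    """Squared Pearson correlation between two equal-length series.

    Assumption: this is the R^2 meant on the slide; the slide itself
    does not define it.
    """
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two equal-length series with at least 2 points")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    if sxx == 0 or syy == 0:
        # A constant series carries no correlation information.
        return 0.0
    return (sxy * sxy) / (sxx * syy)


# Hypothetical RTT samples in msec, made up for illustration only.
pinger_rtt = [58.2, 58.9, 58.4, 59.0, 58.1]
amp_rtt = [58.7, 58.3, 58.8, 58.2, 58.6]
short_term_r2 = r_squared(pinger_rtt, amp_rtt)
```

Applied point by point, a value near 0 corresponds to the "little short term agreement" case; applied to aggregated minima or averages, a value near 1 corresponds to the long term agreement case.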


Rate Limiting 1/3 (Mit Shah)

“Tail-drop” behavior

• Rate-limiting kicks in after the first few packets and hence later packets are more likely to be dropped

Calculate slope and histogram slope frequency for all nodes, look at outliers (8)

Added as PingER metric; still validating: some sites are consistent, others vary from month to month
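The slope calculation mentioned above can be sketched as follows. The slide does not give the exact definition, so this assumes the slope is a least-squares fit of per-position loss rate against packet position within a ping round; a clearly positive slope would then suggest tail-drop. The data layout (one list of booleans per round, True meaning a reply was received) and the function names are hypothetical.

```python
def loss_by_position(rounds):
    """Fraction of rounds in which the packet at each position was lost."""
    if not rounds:
        raise ValueError("no rounds supplied")
    size = len(rounds[0])
    if any(len(r) != size for r in rounds):
        raise ValueError("all rounds must contain the same number of packets")
    return [
        sum(1 for r in rounds if not r[i]) / len(rounds)
        for i in range(size)
    ]


def tail_drop_slope(rounds):
    """Least-squares slope of loss rate versus packet position.

    Assumption: this is the 'slope' referred to on the slide.
    """
    losses = loss_by_position(rounds)
    n = len(losses)
    if n < 2:
        raise ValueError("need at least 2 packets per round")
    mean_x = (n - 1) / 2
    mean_y = sum(losses) / n
    sxx = sum((i - mean_x) ** 2 for i in range(n))
    sxy = sum((i - mean_x) * (y - mean_y) for i, y in enumerate(losses))
    return sxy / sxx


# Hypothetical host that answers the first 5 packets of each round
# and drops the last 5.
limited = [[True] * 5 + [False] * 5 for _ in range(4)]
# Hypothetical host that answers everything.
clean = [[True] * 10 for _ in range(4)]
```

Computing this slope per host and histogramming the values gives a distribution whose outliers are the candidates for rate limiting.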


Rate Limiting 2/3

Asymmetry of Ping vs Sting losses

[Figure: bar chart of Asym = (p-s)/(p+s) per host; y-axis from -0.60 to 1.00]

Hosts (x-axis): clan2.fit.unimas.my, ultra.hepi.edu.ge, www.dolphinics.no, ns.ucr.ac.cr, cab.cnea.gov.ar, tjev.tel.fer.hr, lhr.comsats.net.pk, pknt.utm.my, tnp.saha.ernet.in, www.jinr.dubna.su, ns.itep.ru, gamma.carnet.hr, intrans.baku.az, ns1b.itb.ac.id, daimon.uniandes.edu.co, groa.uct.ac.za, tifr.res.in, www.bu.ac.th, www.usm.my, moon.atomki.hu, cni.md, sun.ihep.ac.cn

Measured 4/22/00 for hosts seen from SLAC with high tail-drop

Hosts selected with > 0.7% loss and no sting pathologies

Hosts mainly in former E. bloc, S. Asia, Latin America & S. Africa

Large asymmetry means ping loss >> sting loss, so the host may be rate limiting
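As a small worked sketch of the asymmetry measure plotted on this slide, with p taken as the ping loss and s as the sting loss: values near +1 mean ping loss dominates. Returning 0.0 when both losses are zero is an assumption made here, since the formula is undefined in that case; the function name is hypothetical.

```python
def loss_asymmetry(ping_loss, sting_loss):
    """Asym = (p - s) / (p + s), with p = ping loss and s = sting loss.

    Both losses must be in the same units (fractions or percentages).
    Assumption: return 0.0 when both are zero, where the formula is
    undefined.
    """
    if ping_loss < 0 or sting_loss < 0:
        raise ValueError("losses must be non-negative")
    total = ping_loss + sting_loss
    if total == 0:
        return 0.0
    return (ping_loss - sting_loss) / total


# Hypothetical host: 10% ping loss, no sting loss.
suspect = loss_asymmetry(0.10, 0.0)
# Hypothetical host: equal loss on both probes.
balanced = loss_asymmetry(0.05, 0.05)
```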


Rate Limiting 3/3

• Have identified about 2% of sites possibly limiting

• Using Sting (Stefan Savage) & SynAck (SLAC) tools to identify loss(sting or synack probes) << loss(ping)

• www.vincy.bg.ac.yu – blocked 884 rounds of 10 ICMP packets each, out of 903

• islamabad-server2.comsats.net.pk – blocked 554 out of 903

• leonis.nus.edu.sg – blocked all non-56-byte packets

• All low loss with sting or synack


Results: How are the U.S. Nets doing?

In general performance is good (i.e. <= 1% loss)

ESnet holding steady

Edu (vBNS/Abilene) improving, got bad recently

XIWT (70% .com) 5-10 times worse than ESnet


How are DoE funded Edu sites doing

Edu seen from ESnet Labs

[Figure: time series, Jun-97 to Oct-00. Left axis: median % loss (0 to 0.8). Right axis: % sites with > 1% loss (0% to 45%). Series: Median loss; % sites with > 1% loss; exponential fit of each]

V. poor (> 5% & < 12%): PVAMU, VTech (vBNS)

Acceptable (> 1% & < 2.5%): Brandeis, Rice (vBNS), UCR (vBNS), UIUC (vBNS) (2 bad days in March), TAMU (I2)

Pairs = 137

Fraction NOT good: reduced by 2 in 1.5 yrs
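The loss bands used on these slides can be written as a small classifier. The talk itself states only three bands: good (<= 1%), acceptable (> 1% & < 2.5%) and v. poor (> 5% & < 12%). The labels "poor" for the gap between 2.5% and 5% and "bad" for 12% and above, and the treatment of exact boundary values, are assumptions made for this sketch.

```python
def classify_loss(loss_percent):
    """Map a packet loss percentage to a quality label.

    'good', 'acceptable' and 'very poor' follow the bands quoted in the
    talk; 'poor' and 'bad' are assumed labels for the remaining ranges.
    """
    if not 0.0 <= loss_percent <= 100.0:
        raise ValueError("loss must be a percentage between 0 and 100")
    if loss_percent <= 1.0:
        return "good"
    if loss_percent < 2.5:
        return "acceptable"
    if loss_percent <= 5.0:
        return "poor"  # assumed label, not stated in the talk
    if loss_percent < 12.0:
        return "very poor"
    return "bad"  # assumed label, not stated in the talk


# Hypothetical monthly median losses (percent) for three site pairs.
labels = [classify_loss(x) for x in (0.4, 2.0, 8.0)]
```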


Europe seen from U.S.

[Map: Europe as seen from the U.S. RTT annotations: 650 ms, 200 ms. Loss annotations: 7% loss, 10% loss, 1% loss. Legend: Monitor site; Beacon site (~10% sites); HENP country; Not HENP; Not HENP & not monitored]


Asia seen from U.S.

[Map: Asia as seen from the U.S. Loss annotations: 3.6% loss, 10% loss, 0.1% loss. RTT annotations: 640 ms, 450 ms, 250 ms]


Latin America, Africa & Australasia

[Map: Loss annotations: 4% loss, 2% loss. RTT annotations: 350 ms, 700 ms, 170 ms, 220 ms]


Quality of Service: How to improve

• More bandwidth

– Keep network load low (< 30%)

– Costs (at least in the West) are coming down dramatically, but it is non-trivial to keep up

• Reserved/managed bandwidth generally on ATM via PVCs today

• Differentiated services


Effect of more & managed bandwidth

German Universities as good as DESY after Oct-99 upgrade

DFN closes Perryman POP, loses direct ESnet peering

Peering re-established via Dante @ 60 Hudson

[Figure panels: RTT; Loss]


RTT from ESnet to Groups of Sites

ITU G.114 300 ms RTT limit for voice


Loss seen from ESnet to groups of Sites

ITU limit for loss


Bulk transfer - Performance Trends

TCP bandwidth < 1460/(RTT * sqrt(loss))

Note: E. Europe not catching up

ESnet flattening out
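The bandwidth bound on this slide can be evaluated directly. The sketch below assumes RTT is given in seconds and loss as a fraction between 0 and 1, so the result is in bytes per second; the slide does not state units, and 1460 is taken to be the TCP payload size in bytes. The function name and the sample path are hypothetical.

```python
from math import sqrt


def tcp_bandwidth_bound(rtt_seconds, loss_fraction, mss_bytes=1460):
    """Upper bound on TCP throughput in bytes/s: MSS / (RTT * sqrt(loss)).

    Assumptions: RTT in seconds, loss as a fraction in (0, 1]; the bound
    is undefined at zero loss, so that case is rejected.
    """
    if rtt_seconds <= 0:
        raise ValueError("RTT must be positive")
    if not 0 < loss_fraction <= 1:
        raise ValueError("loss must be a fraction in (0, 1]")
    return mss_bytes / (rtt_seconds * sqrt(loss_fraction))


# Hypothetical path: 200 ms RTT and 1% loss.
bound_bytes_per_s = tcp_bandwidth_bound(0.200, 0.01)
bound_kbits_per_s = bound_bytes_per_s * 8 / 1000
```

For the hypothetical 200 ms, 1% loss path the bound works out to about 73,000 bytes/s, or roughly 584 kbit/s, which shows how strongly loss and RTT together constrain bulk transfer.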


Interactive apps - Jitter

SLAC<=>CERN two-way instantaneous packet delay variation

[Figure: histogram. x-axis: Ping inter packet delay difference in msec (-100 to 100). y-axis: Frequency (0 to 90). Series: Frequency; Gaussian]

Average = -0.03 msec

Std dev = 35 msec

Median = 0 msec

IQR = 29 msec

Loss = 0.3%

1000 samples

Gaussian-prob = 79*exp(-x**2/(2*(IQR/2)**2))

IPDD(i) = RTT(i) - RTT(i-1)
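The two formulas on this slide can be sketched as below: IPDD(i) = RTT(i) - RTT(i-1), its inter-quartile range, and the Gaussian overlay with peak 79 and width IQR/2 as written on the slide. The quartile method (Python's default in `statistics.quantiles`) is an assumption, since the talk does not say how the IQR was computed; the RTT values are made up.

```python
from math import exp
from statistics import quantiles


def ipdd(rtts):
    """Inter packet delay differences: IPDD(i) = RTT(i) - RTT(i-1)."""
    return [later - earlier for earlier, later in zip(rtts, rtts[1:])]


def iqr(values):
    """Inter-quartile range (assumed quartile method: library default)."""
    q1, _median, q3 = quantiles(values, n=4)
    return q3 - q1


def gaussian_model(x, iqr_msec, peak=79.0):
    """Gaussian overlay from the slide: peak * exp(-x**2 / (2*(IQR/2)**2))."""
    width = iqr_msec / 2
    return peak * exp(-(x ** 2) / (2 * width ** 2))


# Hypothetical RTT samples in msec.
rtts = [100, 102, 101, 105, 100]
diffs = ipdd(rtts)
spread = iqr(diffs)
```

The IQR is used rather than the standard deviation because a few large delay spikes inflate the standard deviation (35 msec on the slide) much more than the IQR (29 msec).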


SLAC-CERN Jitter

[Figure: IQR(ipdv) between CERN & SLAC from Surveyor measurements (12/15/98 & medians for Dec-98). x-axis: Time since midnight (GMT), 0 to 25. y-axis: IQR(IPDV) in msec, log scale 0.1 to 100. Series: IQR(ipdv) CERN>SLAC; IQR(ipdv) SLAC>CERN; Monthly IQR(ipdv) CERN>SLAC; Monthly IQR(ipdv) SLAC>CERN. Reference line: ITU/TIPHON delay jitter threshold (75 ms)]


Voice over IP: Reachability

Within N. America & W. Europe, loss, RTT and jitter are acceptable for VoIP

But what about reachability?


Differentiated services & VoIP

• SLAC & LBNL have a DS testbed with a 3.5Mbps ATM PVC carved out of 43Mbps

• Made measurements with Becca Nitzan @ ESnet

– Make phone call

– < 50% load: call OK

– Inject 4Mbps UDP load

• No WFQ: can’t make call

– If make call then terrible quality

• Apply WFQ & policing (via CAR)

• With WFQ call sounds fine

– Next use ping to characterize:

• Mark ping TOS bits with CAR, & use WFQ in routers and see how it affects loss, RTT, jitter etc.

[Diagram labels: PBX; VoIP; ESnet; ATM; Bottleneck 3.5Mbps; Prod; Edge; WFQ; CAR marking; 4Mbps; 24kbps]


Plans 1/2

• HEPNRC now rejoined at 50% person

• Monitoring

– next 2 weeks: select packet sizes, number in stream - need for

• better statistics for high performance links (e.g. PPDG)

• lower impact on low capacity links

– select scheduling, what is logged, mechanism (synack, ping, sting)

• Beacons extend from 50 => 70 (requires new mon)


Plans 2/2

• With XIWT/DARPA

– Anomaly detection and alerting

– NIMI integration

• More graphical reports

– Maps, Java servlet graphs of more metrics and more selectability

– Health watch – upper level displays

– Near realtime for SC2000 – possible interest from ESnet NOC

• Maps with colored links with playback

• 3D bar charts

• Extended PPDG support

– Higher statistics, better coverage


Summary

• Long term agreement between AMP, PingER, Surveyor, & RIPE

– need persistent structure (e.g. congestion or route changes) for short term point by point agreement

• Rate limiting still a minor effect, but could become a problem, trying to get good signature, have alternates

• International performance from US to sites outside W. Europe, JP, KR, SG, TW is generally poor to bad

• Managed bandwidth can be big help.

• ESnet & Internet 2 doing well, even for VoIP, except reachability has a way to go


More Information

• This talk:

– www.slac.stanford.edu/grp/scs/net/talk/mon-escc-apr00/

• IEPM/PingER home site

– www-iepm.slac.stanford.edu/

• Comparison of Surveyor & RIPE & PingER

– www.slac.stanford.edu/comp/net/wan-mon/surveyor-vs-ripe.html

– www.slac.stanford.edu/comp/net/wan-mon/surveyor-vs-pinger.html

• Detecting ICMP Rate Limiting

– www.slac.stanford.edu/grp/scs/net/talk/limiting-feb00/