1 Internet Performance Monitoring Update Les Cottrell & Warren Matthews – SLAC www.slac.stanford.edu/grp/scs/net/talk/ mon-escc-apr00/ Presented at the ESCC meeting Pleasanton April 26, 2000 Partially funded by DOE/MICS Field Work Proposal on Internet End-to-end
Internet Performance Monitoring Update
Les Cottrell & Warren Matthews – SLAC
www.slac.stanford.edu/grp/scs/net/talk/mon-escc-apr00/
Presented at the ESCC meeting, Pleasanton, April 26, 2000
Partially funded by DOE/MICS Field Work Proposal on Internet End-to-end Performance Monitoring (IEPM), also supported by IUPAP
2
Overview
• PingER
• Validations
• Results
• Quality of Service
• Coming soon
• Summary
3
PingER
• Measurements from
– 30 monitors in 15 countries
– Over 500 remote hosts
– Over 70 countries
– Over 2100 monitor-remote site pairs
• Recent monitor additions: ANL, UWisc, NSK, ITEP, RIKEN, KAIST, ILAN, Brazil, Melbourne; working on: Caltech, SDSC
• Over 50% of HENP collaborator sites are explicitly monitored as remote sites by the PingER project
– Atlas (37%), BaBar (68%), Belle (23%), CDF (73%), CMS (31%), D0 (60%), LEP (44%), Zeus (35%), PPDG (100%), RHIC (64%)
• Remainder covered by Beacons
– Currently 56, extending to 76
4
Beacons & UK seen from ESnet
Sites in UK track one another, so can be represented by a single site
2 Beacons in UK
Indicates common source of congestion
Increased capacity by 155 times in 5 years
Little short-term correlation, even for time differences of < 2 secs
Little structure; outliers don’t match
8
RIPE vs Surveyor 2/2
Optimum agreement if RIPE is displaced by ~0.2 ms (packet size difference)
9
PingER vs AMP
Little obvious short-term agreement (R² < 0.1)
Same if compare ping vs. ping
Avg Ping distribution agrees with AMP
Both show >= 95% of samples are 58-59 msec
R² > 0.95 for min & avg
Time series
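The R² values quoted on this slide can be illustrated with a minimal sketch. It assumes that R² is the squared Pearson correlation between two equal-length series of measurements (for example, paired PingER and AMP RTT samples); the slide does not say how the series were aligned, and the function name `r_squared` is hypothetical.

```python
def r_squared(xs, ys):
    """Squared Pearson correlation between two equal-length series.

    Assumption: the slide's R2 is this quantity; the pairing of
    samples from the two tools is left to the caller.
    """
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two series of equal length >= 2")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    if sxx == 0 or syy == 0:
        # A constant series carries no correlation information.
        return 0.0
    return (sxy * sxy) / (sxx * syy)


if __name__ == "__main__":
    # Illustrative values only, not measured data.
    tool_a = [58.2, 58.9, 58.4, 59.0, 58.1]
    tool_b = [58.7, 58.3, 58.8, 58.2, 58.6]
    print(round(r_squared(tool_a, tool_b), 3))
```

A low R² on individual samples alongside a high R² on minima and averages is consistent with two tools that see the same path but sample it at different instants.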
10
Rate Limiting 1/3 (Mit Shah)
“Tail-drop” behavior
• Rate-limiting kicks in after the first few packets and hence later packets are more likely to be dropped
Calculate slope and histogram slope frequency for all nodes, look at outliers (8)
Added as PingER metric. Still validating: some sites are consistent, others vary from month to month
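A minimal sketch of the slope calculation, under an assumption the slide does not spell out: that the slope is the least-squares slope of loss fraction against packet position within a round of pings. The function names and the input layout (one list of booleans per round, True meaning the packet was lost) are hypothetical.

```python
def loss_by_position(rounds):
    """Fraction of rounds in which the packet at each position was lost.

    rounds: list of rounds, each a list of booleans (True = lost).
    All rounds are assumed to have the same number of packets.
    """
    n_positions = len(rounds[0])
    return [sum(r[i] for r in rounds) / len(rounds) for i in range(n_positions)]


def least_squares_slope(ys):
    """Slope of the best-fit line through (0, ys[0]), (1, ys[1]), ..."""
    n = len(ys)
    mean_x = (n - 1) / 2
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in range(n))
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in enumerate(ys))
    return sxy / sxx


def tail_drop_slope(rounds):
    """Positive values mean later packets in a round are lost more often."""
    return least_squares_slope(loss_by_position(rounds))


if __name__ == "__main__":
    # Illustrative host: the first 5 packets always pass, the last 5 are dropped.
    limited = [[False] * 5 + [True] * 5 for _ in range(4)]
    print(tail_drop_slope(limited) > 0)
```

Computing this per node and histogramming the slopes would then make strongly positive outliers stand out, which is the screening step the slide describes.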
11
Rate Limiting 2/3
Asymmetry of Ping vs Sting losses
[Bar chart: Asym = (p-s)/(p+s) per host (p = ping loss, s = sting loss), vertical axis from -0.60 to 1.00. Hosts: clan2.fit.unimas.my, ultra.hepi.edu.ge, www.dolphinics.no, ns.ucr.ac.cr, cab.cnea.gov.ar, tjev.tel.fer.hr, lhr.comsats.net.pk, pknt.utm.my, tnp.saha.ernet.in, www.jinr.dubna.su, ns.itep.ru, gamma.carnet.hr, intrans.baku.az, ns1b.itb.ac.id, daimon.uniandes.edu.co, groa.uct.ac.za, tifr.res.in, www.bu.ac.th, www.usm.my, moon.atomki.hu, cni.md, sun.ihep.ac.cn]
Measured 4/22/00 for hosts seen from SLAC with high tail-drop.
Hosts selected with > 0.7% loss and no sting pathologies
Hosts mainly in former E. bloc, S. Asia, Latin America & S. Africa
Large asymmetry means ping loss >> sting loss, maybe limiting
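The asymmetry metric Asym = (p-s)/(p+s) from this slide can be sketched as follows, taking p as the ping loss and s as the sting loss for one host. The function name and the handling of the zero-loss case are assumptions, not part of the slide.

```python
def asymmetry(ping_loss, sting_loss):
    """Asym = (p - s) / (p + s).

    Ranges from -1 (all loss seen by sting) to +1 (all loss seen by ping).
    Returning 0.0 when both losses are zero is a choice made here,
    since the ratio is undefined in that case.
    """
    total = ping_loss + sting_loss
    if total == 0:
        return 0.0
    return (ping_loss - sting_loss) / total


if __name__ == "__main__":
    # Illustrative values only: 10% ping loss, no sting loss.
    print(asymmetry(0.10, 0.0))
```

A value near +1 means ICMP probes are lost while TCP-based probes get through, which is the signature of possible ICMP rate limiting.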
12
Rate Limiting 3/3
• Have identified about 2% of sites as possibly limiting
• Using Sting (Stefan Savage) & SynAck (SLAC) tools to identify loss(sting or synack probes) << loss(ping)
• www.vincy.bg.ac.yu blocked 884 rounds of 10 ICMP packets each, out of 903
• islamabad-server2.comsats.net.pk blocked 554 out of 903
• leonis.nus.edu.sg blocked all non-56-byte packets
• All low loss with sting or synack
13
Results: How are the U.S. Nets doing?
In general performance is good (i.e. loss <= 1%)
ESnet holding steady
Edu (vBNS/Abilene) improving, got bad recently
XIWT (70% .com) 5-10 times worse than ESnet
14
How are DoE funded Edu sites doing?
Edu seen from ESnet Labs
[Chart: axis "Mar-00 median % loss" (0 to 0.8); time axis Jun-97 to Oct-00; percentage axis 0% to 45%; series: Median loss, % sites with > 1% loss, with an exponential fit to each]
V. poor (> 5% & < 12%): PVAMU, VTechvBNS,
Acceptable (> 1% & < 2.5%): Brandeis, RicevBNS, UCRvBNS, UIUCvBNS (2 bad days in March), TAMUI2
Pairs = 137
Fraction NOT good: reduced by 2 in 1.5 yrs
15
Europe seen from U.S.
[Map annotations: 650 ms, 200 ms; 7% loss, 10% loss, 1% loss]
Legend: Monitor site; Beacon site (~10% sites); HENP country; Not HENP; Not HENP & not monitored
16
Asia seen from U.S.
[Map annotations: 3.6% loss, 10% loss, 0.1% loss; 640 ms, 450 ms, 250 ms]
17
Latin America, Africa & Australasia
[Map annotations: 4% loss, 2% loss; 350 ms, 700 ms, 170 ms, 220 ms]
18
Quality of Service: How to improve
• More bandwidth
– Keep network load low (< 30%)
– Costs (at least in the West) are coming down dramatically, but non-trivial to keep up
• Reserved/managed bandwidth generally on ATM via PVCs today
• Differentiated services
19
Effect of more & managed bandwidth
German Universities as good as DESY after Oct-99 upgrade
DFN closes Perryman POP, loses direct ESnet peering
Peering re-established via Dante @ 60 Hudson
[Plots: RTT, Loss]
20
RTT from ESnet to Groups of Sites
ITU G.114 300 ms RTT limit for voice
21
Loss seen from ESnet to groups of Sites
ITU limit for loss
22
Bulk transfer - Performance Trends
Bandwidth TCP < 1460/(RTT * sqrt(loss))
Note: E. Europe not catching up
ESnet flattening out
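The bound Bandwidth TCP < 1460/(RTT * sqrt(loss)) can be evaluated with a short sketch. The units are an assumption, since the slide does not state them: 1460 is taken as the TCP maximum segment size in bytes, RTT is in seconds, loss is a fraction between 0 and 1, and the result is in bytes per second. The function name is hypothetical.

```python
import math


def tcp_bandwidth_bound(rtt_s, loss, mss_bytes=1460):
    """Upper bound on TCP throughput: MSS / (RTT * sqrt(loss)).

    Assumed units: rtt_s in seconds, loss as a fraction (0 < loss <= 1),
    mss_bytes in bytes; the result is then in bytes per second.
    """
    if rtt_s <= 0:
        raise ValueError("RTT must be positive")
    if not 0 < loss <= 1:
        raise ValueError("loss must be a fraction in (0, 1]")
    return mss_bytes / (rtt_s * math.sqrt(loss))


if __name__ == "__main__":
    # Illustrative path: 200 ms RTT and 1% loss.
    bound = tcp_bandwidth_bound(0.200, 0.01)
    print(f"{bound:.0f} bytes/s, {bound * 8 / 1000:.0f} kbit/s")
```

For the illustrative path this gives roughly 73,000 bytes/s, about 584 kbit/s. Because loss enters under a square root, halving RTT helps more than halving loss.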
23
Interactive apps - Jitter
SLAC<=>CERN two-way instantaneous packet delay variation
[Histogram: frequency (0 to 90) vs. ping inter-packet delay difference in msec (-100 to 100); series: Frequency, Gaussian]
Average = -0.03 msec.
Std dev = 35 msec.
Median = 0 msec.
IQR = 29 msec
Loss = 0.3%
1000 samples
Gaussian-prob = 79*exp(-x**2/(2*(IQR/2)**2))
IPDD(i) = RTT(i) - RTT(i-1)
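The jitter quantities on this slide can be sketched as follows: IPDD(i) = RTT(i) - RTT(i-1), the interquartile range of those differences, and the Gaussian curve 79*exp(-x**2/(2*(IQR/2)**2)). The function names are hypothetical, and the quartile method (the default of Python's `statistics.quantiles`) is an assumption; the slide does not say how the IQR was computed.

```python
import math
import statistics


def ipdd(rtts):
    """Inter-packet delay differences: IPDD(i) = RTT(i) - RTT(i-1)."""
    return [later - earlier for earlier, later in zip(rtts, rtts[1:])]


def iqr(values):
    """Interquartile range (75th minus 25th percentile)."""
    q1, _median, q3 = statistics.quantiles(values, n=4)
    return q3 - q1


def gaussian_model(x, peak, iqr_value):
    """The slide's curve: peak * exp(-x**2 / (2 * (IQR/2)**2))."""
    return peak * math.exp(-x ** 2 / (2 * (iqr_value / 2) ** 2))


if __name__ == "__main__":
    # Illustrative RTTs in msec, not measured data.
    rtts = [170.0, 182.0, 165.0, 171.0, 169.0, 190.0, 168.0, 172.0]
    diffs = ipdd(rtts)
    print(diffs)
    print(iqr(diffs))
    # Slide values: peak of 79 and IQR of 29 msec.
    print(gaussian_model(0.0, 79, 29))
```

Using the IQR rather than the standard deviation keeps the width estimate from being dominated by a few large outliers, which fits the slide's figures (std dev 35 msec against IQR 29 msec).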
24
SLAC-CERN Jitter
IQR(ipdv) between CERN & SLAC from Surveyor measurements (12/15/98 & medians for Dec-98)