SCinet Caltech-SLAC experiments
netlab.caltech.edu/FAST
SC2002 Baltimore, Nov 2002
Acknowledgments
Prototype: C. Jin, D. Wei
Theory: D. Choe (Postech/Caltech), J. Doyle, S. Low, F. Paganini (UCLA), J. Wang, Z. Wang (UCLA)
Experiment/facilities
  Caltech: J. Bunn, C. Chapman, C. Hu (Williams/Caltech), H. Newman, J. Pool, S. Ravot (Caltech/CERN), S. Singh
  CERN: O. Martin, P. Moroni
  Cisco: B. Aiken, V. Doraiswami, R. Sepulveda, M. Turzanski, D. Walsten, S. Yip
  DataTAG: E. Martelli, J. P. Martin-Flatin
  Internet2: G. Almes, S. Corbato
  Level(3): P. Fernes, R. Struble
  SCinet: G. Goddard, J. Patton
  SLAC: G. Buhrmaster, R. Les Cottrell, C. Logg, I. Mei, W. Matthews, R. Mount, J. Navratil, J. Williams
  StarLight: T. deFanti, L. Winkler
Major sponsors: ARO, CACR, Cisco, DataTAG, DoE, Lee Center, NSF
FAST Protocols for Ultrascale Networks
netlab.caltech.edu/FAST
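Theory
  Internet: distributed feedback control system
  TCP: adapts sending rate to congestion
  AQM: feeds back congestion information
  [Figure: feedback loop between TCP and AQM. Source rates x map through forward routing Rf(s) to link rates y; link prices p map through backward routing Rb'(s) to path prices q. The slide also shows a source rate equation (containing a tan^-1 term) and a link price equation; both are garbled in this transcript.]
Experiment
  Research & production networks
  Multi-Gbps, 50-200 ms delay
  [Figure: network map. Labels: Calren2/Abilene, Chicago, Amsterdam, CERN, Geneva, SURFNet, StarLight, WAN in Lab (Caltech).]
Implementation
  155 Mb/s to 10 Gb/s
  [Figure: state diagram. States: slow start, equilibrium, FAST recovery, FAST retransmit, timeout.]
Students: Choe (Postech/CIT), Hu (Williams), J. Wang (CDS), Z. Wang (UCLA), Wei (CS)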
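Stability: Reno/RED
[Figure: feedback loop. TCP sources F1 ... FN produce rates x, mapped by forward routing Rf(s) to link rates y; AQM links G1 ... GL produce prices p, mapped by backward routing Rb'(s) to path prices q.]
Theorem (Low et al., Infocom'02): Reno/RED is stable if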
TCP: Small Small c Large N
RED: Small Large delay
Stability: scalable control
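[Figure: feedback loop. TCP sources F1 ... FN produce rates x, mapped by forward routing Rf(s) to link rates y; AQM links G1 ... GL produce prices p, mapped by backward routing Rb'(s) to path prices q.]
TCP: x_i(t) = \bar{x}_i \, e^{-\alpha_i q_i(t) / (m_i \tau_i)}
AQM: \dot{p}_l(t) = \frac{1}{c_l} \left( y_l(t) - c_l \right)
Theorem (Paganini, Doyle, Low, CDC'01): Provided R is full rank, the feedback loop is locally stable for arbitrary delay, capacity, load and topology.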
Stability: Vegas
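[Figure: feedback loop. TCP sources F1 ... FN produce rates x, mapped by forward routing Rf(s) to link rates y; AQM links G1 ... GL produce prices p, mapped by backward routing Rb'(s) to path prices q.]
TCP: \dot{x}_i(t) = \frac{1}{T_i^2(t)} \, \mathrm{sgn}\!\left( 1 - \frac{x_i(t) \, q_i(t)}{\alpha_i d_i} \right)
AQM: \dot{p}_l(t) = \frac{1}{c_l} \left( y_l(t) - c_l \right)
Theorem (Choe & Low, Infocom'03): Provided R is full rank, the feedback loop is locally stable if
[Stability condition garbled in this transcript; surviving fragments: max, x_i, T_i, k, M.]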
Stability: Stabilized Vegas
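[Figure: feedback loop. TCP sources F1 ... FN produce rates x, mapped by forward routing Rf(s) to link rates y; AQM links G1 ... GL produce prices p, mapped by backward routing Rb'(s) to path prices q.]
TCP: [Source rate equation garbled in this transcript; surviving fragments: \dot{x}_i, T_i^2(t), tan^-1, x_i(t) q_i(t), d_i.]
AQM: \dot{p}_l(t) = \frac{1}{c_l} \left( y_l(t) - c_l \right)
Theorem (Choe & Low, Infocom'03): Provided R is full rank, the feedback loop is locally stable if
[Stability condition garbled in this transcript; surviving fragments: max, x_i, T_i, a.]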
Stability: Stabilized Vegas
Application
  Stabilized TCP with current routers
  Queueing delay as congestion measure has right scaling
  Incremental deployment with ECN
Fast AQM Scalable TCP
Equilibrium properties
  Uses end-to-end delay and loss
  Achieves any desired fairness, expressed by utility function
  Very high utilization (99% in theory)
Stability properties
  Stability for arbitrary delay, capacity, routing & load
  Robust to heterogeneity, evolution, ...
Good performance
  Negligible queueing delay & loss (with ECN)
  Fast response
Implementation
Sender-side kernel modification
Build on
  Reno, NewReno, SACK, Vegas
  New insights
Difficulties due to
  Effects ignored in theory
  Large window size
First demonstration at the SuperComputing Conference, Nov 2002
Developers: Cheng Jin & David Wei
FAST Team & Partners
[Figure: bar chart of utilization averaged over 1 hr, standard MTU, at 1G and 2G, comparing Linux TCP (txq=100), Linux TCP (txq=10000) and FAST. Values shown: 95%, 92%, 16%, 48%.]
Effect of MTU
(Sylvain Ravot, Caltech/CERN)
Linux TCP
Linux TCP handshake differs from the TCP specification
Is 64 KB too small for ssthresh? 1 Gbps x 100 ms = 12.5 MB!
What about pacing?
Gamma parameter in Vegas
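The ssthresh question above rests on a bandwidth-delay product calculation. A minimal sketch of that arithmetic follows; the function name and the constant are illustrative, not taken from the FAST or Linux sources.

```python
def bandwidth_delay_product_bytes(rate_bps: float, rtt_s: float) -> float:
    """Bytes that must be in flight to keep a path of this rate and RTT full."""
    return rate_bps * rtt_s / 8.0


# The 64 KB ssthresh value questioned on the slide.
SSTHRESH_BYTES = 64 * 1024

# 1 Gbps path with 100 ms of delay, as on the slide.
bdp = bandwidth_delay_product_bytes(1e9, 0.100)

print(f"BDP = {bdp / 1e6:.1f} MB")
print(f"BDP / ssthresh = {bdp / SSTHRESH_BYTES:.0f}x")
```

With these numbers slow start would hand over to congestion avoidance at well under 1% of the window the path can hold.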
TCP Congestion States
[State diagram. States: Established, Slow Start, High Throughput, FAST's Retransmit, Time-out *. Transition labels: 3 dup acks; retransmission timer fired.]
High Throughput
Update cwnd as follows:
  +1 if pkts in queue < + kq'
  -1 otherwise
Packet reordering may be frequent
Disabling delayed ack can generate many dup acks
Is THREE the right number for Gbps?
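The cwnd rule on this slide can be sketched as below. This is an illustration under stated assumptions, not the FAST kernel code: the backlog estimate is the usual Vegas-style one, and because part of the slide's threshold expression is lost in the transcript, the whole right-hand side is passed in as a single hypothetical `threshold` parameter.

```python
def packets_in_queue(cwnd: float, base_rtt: float, rtt: float) -> float:
    """Vegas-style backlog estimate: cwnd * (rtt - base_rtt) / rtt.

    base_rtt is the smallest RTT seen (propagation delay); the excess over it
    is attributed to queueing. This estimator is an assumption of the sketch.
    """
    return cwnd * (rtt - base_rtt) / rtt


def update_cwnd(cwnd: float, base_rtt: float, rtt: float,
                threshold: float, min_cwnd: float = 2.0) -> float:
    """Add one packet while the backlog is below the threshold, else remove one.

    `threshold` stands in for the right-hand side of the slide's inequality.
    `min_cwnd` is a hypothetical floor, not taken from the slide.
    """
    if packets_in_queue(cwnd, base_rtt, rtt) < threshold:
        return cwnd + 1.0
    return max(min_cwnd, cwnd - 1.0)


# No queueing delay yet: the window grows.
grown = update_cwnd(cwnd=100.0, base_rtt=0.100, rtt=0.100, threshold=10.0)
# 25 ms of queueing puts about 20 packets in the queue: the window shrinks.
shrunk = update_cwnd(cwnd=100.0, base_rtt=0.100, rtt=0.125, threshold=10.0)
print(grown, shrunk)
```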
TCP Congestion States
[State diagram. States: Established, Slow Start, High Throughput, FAST's Recovery, FAST's Retransmit. Transition labels: 3 dup acks; retransmit packet, record snd_nxt, reduce cwnd/ssthresh; snd_una > recorded snd_nxt; send packet if in_flight < cwnd.]
When Loss Happens
Reduce cwnd/ssthresh only when loss is due to congestion
Maintain in_flight and send data when in_flight < cwnd
Do FAST’s Recovery until snd_una >= recorded snd_nxt
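The three rules above can be sketched as a small sender model. This is a hedged illustration, not the prototype's code: sequence numbers count packets, halving cwnd is an assumed reduction policy, and how a loss is classified as congestion-related is left to the caller because the slide does not say.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Sender:
    cwnd: int
    snd_una: int = 0                 # oldest unacknowledged packet
    snd_nxt: int = 0                 # next new packet to send
    in_flight: int = 0               # packets currently in the network
    recover: Optional[int] = None    # snd_nxt recorded when loss was detected

    @property
    def in_recovery(self) -> bool:
        return self.recover is not None

    def send_new(self) -> bool:
        """Send one new packet only if in_flight < cwnd."""
        if self.in_flight >= self.cwnd:
            return False
        self.snd_nxt += 1
        self.in_flight += 1
        return True

    def on_loss_detected(self, due_to_congestion: bool) -> None:
        """Record snd_nxt; reduce cwnd only for congestion losses."""
        self.recover = self.snd_nxt
        if due_to_congestion:
            self.cwnd = max(2, self.cwnd // 2)   # assumed policy: halve
        # The lost packet leaves the network and its retransmission enters,
        # so in_flight is unchanged.

    def on_ack(self, ack_seq: int) -> None:
        """Cumulative ack: everything below ack_seq has been received."""
        newly_acked = max(0, ack_seq - self.snd_una)
        self.snd_una = max(self.snd_una, ack_seq)
        self.in_flight = max(0, self.in_flight - newly_acked)
        if self.in_recovery and self.snd_una >= self.recover:
            self.recover = None                  # leave recovery


s = Sender(cwnd=4)
while s.send_new():
    pass                                  # fill the window: 4 packets out
s.on_loss_detected(due_to_congestion=True)
s.on_ack(3)                               # partial ack: still in recovery
still_recovering = s.in_recovery
s.on_ack(4)                               # reaches the recorded snd_nxt
print(s.cwnd, still_recovering, s.in_recovery)
```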
TCP Congestion States
[State diagram. States: Established, Slow Start, High Throughput, FAST's Recovery, FAST's Retransmit, Time-out *. Transition labels: 3 dup acks; retransmit packet, record snd_nxt, reduce cwnd/ssthresh; retransmission timer fired.]
When Time-out Happens
Very bad for throughput
Mark all unacknowledged pkts as lost and do slow start
Dup acks cause false retransmits since receiver's state is unknown
Floyd has a "fix" (RFC 2582).
TCP Congestion States
[State diagram. States: Established, Slow Start, High Throughput, FAST's Recovery, FAST's Retransmit, Time-out *. Transition labels: ack for syn/ack; cwnd > ssthresh; 3 dup acks; retransmit packet, record snd_nxt, reduce cwnd/ssthresh; snd_una > recorded snd_nxt; retransmission timer fired.]
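One way to read the diagram is as a transition table. The sketch below is an interpretation, not the prototype's code: the transcript preserves the state names and edge labels but not which edge joins which states, so every pairing here is an assumption, as is treating the timer as firing from any state and returning to slow start afterwards (the latter taken from the "When Time-out Happens" slide).

```python
from enum import Enum


class State(Enum):
    ESTABLISHED = "Established"
    SLOW_START = "Slow Start"
    HIGH_THROUGHPUT = "High Throughput"
    FAST_RETRANSMIT = "FAST's Retransmit"
    FAST_RECOVERY = "FAST's Recovery"
    TIMEOUT = "Time-out"


TIMER_FIRED = "retransmission timer fired"

# (current state, event) -> next state. Edge placement is assumed.
TRANSITIONS = {
    (State.ESTABLISHED, "ack for syn/ack"): State.SLOW_START,
    (State.SLOW_START, "cwnd > ssthresh"): State.HIGH_THROUGHPUT,
    (State.HIGH_THROUGHPUT, "3 dup acks"): State.FAST_RETRANSMIT,
    (State.FAST_RETRANSMIT, "retransmit packet"): State.FAST_RECOVERY,
    (State.FAST_RECOVERY, "snd_una > recorded snd_nxt"): State.HIGH_THROUGHPUT,
    (State.TIMEOUT, "slow start"): State.SLOW_START,
}


def next_state(state: State, event: str) -> State:
    """Return the next state; events with no matching edge leave it unchanged."""
    if event == TIMER_FIRED:
        return State.TIMEOUT
    return TRANSITIONS.get((state, event), state)


events = ["ack for syn/ack", "cwnd > ssthresh", "3 dup acks",
          "retransmit packet", "snd_una > recorded snd_nxt"]
state = State.ESTABLISHED
for event in events:
    state = next_state(state, event)
print(state.value)
```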
Individual Packet States
[State diagram. Packet states: Birth, Sending, In Flight, Received, Queued, Dropped, Buffered, Freed, Delivered. Edge labels: queueing; out of order; queue and no memory; ack'd.]
SCinet Bandwidth Challenge
netlab.caltech.edu/FAST
SC2002 Baltimore, Nov 2002
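Experiment
[Figure: network map linking Sunnyvale, Chicago, Baltimore and Geneva. Link lengths shown: 3000 km, 1000 km, 7000 km.]
C. Jin, D. Wei, S. Low
FAST Team and Partners
Theory
Internet: distributed feedback system
[Figure: TCP/AQM feedback loop with routing blocks Rf(s) and Rb'(s) and signals x and p.]
[Figure: chart comparing I2 LSR entries (29.3.00 multiple, 9.4.02 1 flow, 22.8.02 IPv6) with SC2002 runs (1 flow, 2 flows, 10 flows). Path labels: Baltimore-Geneva, Baltimore-Sunnyvale, Sunnyvale-Geneva.]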
Highlights
FAST TCP
  Standard MTU
  Peak window = 14,100 pkts
  940 Mbps single flow/GE card: 9.4 petabit-meter/sec, 1.9 times LSR
  9.4 Gbps with 10 flows: 37.0 petabit-meter/sec, 6.9 times LSR
  16 TB in 6 hours with 7 flows
Implementation
  Sender-side modification
  Delay based
  Stabilized Vegas
FAST BMPS
[Figure: bar chart of Bmps for I2 LSR entries (29.3.2000 multiple, 22.8.2002 IPv6, 9.4.2002 1 flow) and FAST at SC2002 (1 flow, 10 flows). Path labels: Baltimore-Sunnyvale, Sunnyvale-Geneva.]
Bmps (petabit-m/s)   Thruput      Duration
37.0                 9.40 Gbps    min
9.42                 940 Mbps     19 min
5.38                 1.02 Gbps    82 sec
4.93                 402 Mbps     13 sec
0.03                 8 Mbps       60 min
FAST: 7 flows
Statistics
  Data: 2.857 TB
  Distance: 3,936 km
  Delay: 85 ms
Average
  Duration: 60 mins
  Thruput: 6.35 Gbps
  Bmps: 24.99 petab-m/s
Peak
  Duration: 3.0 mins
  Thruput: 6.58 Gbps
  Bmps: 25.90 petab-m/s
Network
  SC2002 (Baltimore) - SLAC (Sunnyvale), GE, Standard MTU
cwnd = 6,658 pkts per flow
[Figure: throughput trace, 17 Nov 2002 (Sun) to 18 Nov 2002 (Mon).]
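The statistics on this slide are tied together by simple arithmetic: Bmps is throughput times distance, and the data volume is average throughput times duration. The sketch below checks that with the slide's own numbers; the function names are illustrative and decimal units (1 TB = 10^12 bytes) are assumed.

```python
def bmps(throughput_bps: float, distance_km: float) -> float:
    """Bandwidth-distance product in petabit-meters per second."""
    return throughput_bps * (distance_km * 1e3) / 1e15


def data_volume_tb(throughput_bps: float, duration_s: float) -> float:
    """Data moved at a constant rate, in terabytes (10**12 bytes)."""
    return throughput_bps * duration_s / 8 / 1e12


DISTANCE_KM = 3936                      # Baltimore to Sunnyvale, from the slide

avg_bmps = bmps(6.35e9, DISTANCE_KM)    # average throughput over 60 mins
peak_bmps = bmps(6.58e9, DISTANCE_KM)   # peak throughput over 3.0 mins
volume = data_volume_tb(6.35e9, 60 * 60)

print(f"average: {avg_bmps:.2f} petabit-m/s")
print(f"peak:    {peak_bmps:.2f} petabit-m/s")
print(f"data:    {volume:.2f} TB")
```

The results agree with the slide's 24.99 and 25.90 petab-m/s, and with its 2.857 TB to within rounding.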
FAST: single flow
Statistics
  Data: 273 GB
  Distance: 10,025 km
  Delay: 180 ms
Average
  Duration: 43 mins
  Thruput: 847 Mbps
  Bmps: 8.49 petab-m/s
Peak
  Duration: 19.2 mins
  Thruput: 940 Mbps
  Bmps: 9.42 petab-m/s
Network
  CERN (Geneva) - SLAC (Sunnyvale), GE, Standard MTU
cwnd = 14,100 pkts
[Figure: throughput trace, 17 Nov 2002 (Sun).]
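The reported window and peak throughput are consistent with the relation throughput = cwnd x packet size / RTT. The sketch below checks this under two assumptions that the slides do not state: "Standard MTU" is taken as 1500-byte packets, and the quoted delay is taken as the round-trip time. Header overhead is ignored.

```python
def window_limited_throughput_bps(cwnd_pkts: int, rtt_s: float,
                                  packet_bytes: int = 1500) -> float:
    """Rate sustained when exactly cwnd packets are delivered per round trip."""
    return cwnd_pkts * packet_bytes * 8 / rtt_s


# Single flow, Geneva to Sunnyvale: cwnd = 14,100 pkts, delay 180 ms.
single = window_limited_throughput_bps(14_100, 0.180)

# Seven flows, Baltimore to Sunnyvale: cwnd = 6,658 pkts per flow, delay 85 ms.
per_flow = window_limited_throughput_bps(6_658, 0.085)
aggregate = 7 * per_flow

print(f"single flow: {single / 1e6:.0f} Mbps")
print(f"seven flows: {aggregate / 1e9:.2f} Gbps")
```

Under these assumptions the single flow comes out at the reported 940 Mbps peak, and the seven flows at the reported 6.58 Gbps peak.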
SCinet Bandwidth Challenge
netlab.caltech.edu/FAST
SC2002 Baltimore, Nov 2002
Prototype: C. Jin, D. Wei
Theory: D. Choe (Postech/Caltech), J. Doyle, S. Low, F. Paganini (UCLA), J. Wang, Z. Wang (UCLA)
Experiment/facilities
  Caltech: J. Bunn, S. Bunn, C. Chapman, C. Hu (Williams/Caltech), H. Newman, J. Pool, S. Ravot (Caltech/CERN), S. Singh
  CERN: O. Martin, P. Moroni
  Cisco: B. Aiken, V. Doraiswami, M. Turzanski, D. Walsten, S. Yip
  DataTAG: E. Martelli, J. P. Martin-Flatin
  Internet2: G. Almes, S. Corbato
  SCinet: G. Goddard, J. Patton
  SLAC: G. Buhrmaster, L. Cottrell, C. Logg, W. Matthews, R. Mount, J. Navratil
  StarLight: T. deFanti, L. Winkler
Major sponsors/partners: ARO, CACR, Cisco, DataTAG, DoE, Lee Center, Level3, NSF