1 End-host Route Selection in the CHEETAH Networking Solution Zhanxiang Huang 05/01/2006 Advisor: Malathi Veeraraghavan Master’s Project Presentation Acknowledgement: This work was carried out under the sponsorship of NSF ITR-0312376, NSF ANI-0335190, NSF ANI- 0087487, and DOE DE-FG02-04ER25640 grants.
2
Outline
• CHEETAH project overview
• End-host route selection problem
• Model-based solution
• Measurement-based solution
• Conclusion and future work
3
Circuit-switched High-speed End-to-End Transport ArcHitecture (CHEETAH)
[Figure: the connectionless, best-effort Internet]
Goal: high-speed rate-guaranteed end-to-end circuits with call-by-call-based bandwidth sharing
– Internet:
  • Round trip time
  • Bottleneck link rate
  • Packet loss rate
• At end-hosts:
  – Transport layer protocol and parameter settings
  – OS process scheduling
  – Hard disk throughput
9
How to Estimate Data Transfer Delays?
• Model-based solution
– Construct mathematical models for computing file transfer delays over the circuit and Internet paths.
• Measurement-based solution
– Estimate file transfer delays based on delay measurements of past file transfers.
10
Model-based Solution
• Modeling TCP delay over Internet path
– TCP Reno delay model [UMass98]
• Modeling delay over CHEETAH circuit
– Let Pb be the call blocking probability
– Average delay over circuit is
$$(1 - P_b)\,(\text{setup\_delay} + \text{transfer\_delay\_over\_circuit}) + P_b\,(\text{average\_setup\_failure\_delay} + \text{delay\_over\_Internet})$$
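The average circuit delay is a two-outcome expectation: the call is admitted with probability 1 - Pb, or it is blocked with probability Pb and the transfer falls back to the Internet path. A minimal sketch of that calculation follows; the function name, parameter names and example numbers are illustrative assumptions, not part of the CHEETAH software.

```python
def expected_circuit_delay(p_block, setup_delay, circuit_transfer_delay,
                           setup_failure_delay, internet_delay):
    """Average delay (seconds) when a circuit is attempted first.

    With probability (1 - p_block) the call is admitted and the file goes
    over the circuit; with probability p_block the setup fails and the
    transfer falls back to the Internet path.
    """
    if not 0.0 <= p_block <= 1.0:
        raise ValueError("p_block must be a probability")
    success = setup_delay + circuit_transfer_delay
    fallback = setup_failure_delay + internet_delay
    return (1.0 - p_block) * success + p_block * fallback


# Hypothetical inputs, all in seconds except the blocking probability.
delay = expected_circuit_delay(0.1, 5.0, 8.0, 1.0, 80.0)
print(delay)
```

The hard part in practice is not the arithmetic but obtaining trustworthy values for the blocking probability and the setup failure delay.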
11
Inputs to Delay Models
• Inputs to TCP Reno delay model:
– File size
– Bottleneck link rate
– Round trip time
– Packet loss rate
– Initial congestion window size
– Sender and receiver buffer sizes
• Inputs to circuit delay model:
– File size
– Circuit rate
– Round trip time over the circuit path
– Round trip time over the signaling path
– Call processing delay at each switch
– Signaling engine call load
– Number of switches on the path
– Call blocking probability
12
Limitations of the Model-based Solution
• Packet loss rate is difficult to measure. (Tools that I tested include Sting, iperf, ping and badabing.)
• The same is true of the call blocking probability and the signaling engine call load.
• Many TCP variants are emerging, but there is no delay model for them yet.
– e.g. BIC-TCP has been included in Linux kernel 2.6 but has not been modeled yet.
13
Measurement-based Solution
• Assumptions
– Circuit rates are fixed, e.g. 1Gbps, 100Mbps, …
– The number of destinations with which an end-host typically communicates is not large.
– Internet traffic has repeating patterns over time, which means that during a specific time period, the round trip time, packet loss rate and call blocking probability are likely to be the same.
[Figure: delay versus file size for the circuit and Internet paths; the two curves meet at the crossover file size.]
Idea: Discretize time and file size. In each time slot, for each destination and each circuit rate, measure the delays of file transfers over both paths to find the crossover file size.
14
Active and Passive Measurements
• Active measurements
– Traffic is injected into the network explicitly for the purpose of obtaining measurements.
• Passive measurements
– Data is collected under normal network usage.
15
A Best-case Active-measurement Experiment
Best case means that the packet loss rate and the call blocking probability are both zero and that TCP buffers are set to the bandwidth-delay product.
Delays on the Internet path and the circuit are random variables, DI and DC.
1. Find an interval (min, max) that contains the crossover file size;
2. Measure delays on both paths for file size mid=(min+max)/2;
3. If |E(DI)-E(DC)|<e, where e is a small threshold, crossover=mid and stop;
4. If E(DI)>E(DC), max=mid;
5. If E(DI)<E(DC), min=mid;
6. Go to 2.
[Figure: delay versus file size for the circuit and Internet paths, with min, crossover and max marked on the file-size axis.]
Drawback: measurement traffic overhead
Let M be the initial max file size and N be the initial min file size. Traffic size = O(M*log(M-N)).
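Steps 1 to 6 amount to a binary search on file size. The sketch below is one way to write it; `find_crossover`, its parameters and the two stand-in delay functions are assumptions made for illustration. In a real run the two callables would send a file of the given size over each path and return the mean measured delay.

```python
def find_crossover(measure_internet, measure_circuit, lo, hi,
                   tol=0.05, max_iter=50):
    """Binary search for the file size where both paths take equally long.

    measure_internet(size) and measure_circuit(size) are caller-supplied
    callables returning the mean measured delay E(DI) or E(DC) in seconds.
    Assumes the Internet path is faster below the crossover and slower
    above it, and that (lo, hi) already contains the crossover.
    """
    for _ in range(max_iter):
        mid = (lo + hi) / 2
        d_internet = measure_internet(mid)
        d_circuit = measure_circuit(mid)
        if abs(d_internet - d_circuit) < tol:
            return mid
        if d_internet > d_circuit:
            hi = mid      # circuit already wins at mid: crossover is below
        else:
            lo = mid      # Internet still wins at mid: crossover is above
    return (lo + hi) / 2


# Stand-in delay models for a dry run (no real traffic is sent):
# 100 Mbps Internet path, 1 Gbps circuit with a 5 s setup delay.
internet = lambda size_bits: size_bits / 100e6
circuit = lambda size_bits: 5.0 + size_bits / 1e9
crossover_bits = find_crossover(internet, circuit, 0.0, 8e9)
```

Every iteration transfers a file of size mid over both paths, which is where the measurement traffic overhead comes from.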
17
Passive Measurements
1. Initialize (min, max) with (0, +inf).
2. If file size < min, choose Internet;
3. If file size > max, choose circuit;
4. If min <= file size <= max, choose each path with probability ½. Record the data transfer delays.
5. Once there are sufficient records to compute Pr(DI-DC>0) for a file size in (min, max), adjust min or max based on Pr(DI-DC>0).
[Figure: p versus file size, with p-axis marks 0, 1/2 and 1 and file-size marks min, crossover and max.]
(Note that min and max are file sizes that appeared in application queries, and that DI and DC are assumed to follow normal distributions.)
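The passive procedure can be sketched as three small functions. This is an illustration under stated assumptions: DI and DC are treated as independent normal variables, and the decision margin used in `adjust_bounds` is a hypothetical choice, since the slide does not say how decisive Pr(DI-DC>0) must be before a bound moves.

```python
import random
from statistics import NormalDist, mean, stdev


def choose_path(file_size, lo, hi, rng=random):
    """Steps 2-4: pick a path for one transfer given the current (min, max)."""
    if file_size < lo:
        return "internet"
    if file_size > hi:
        return "circuit"
    return "internet" if rng.random() < 0.5 else "circuit"


def prob_internet_slower(internet_delays, circuit_delays):
    """Estimate Pr(DI - DC > 0) from recorded delays for one file size.

    Assumes DI and DC are independent and normally distributed, so
    DI - DC is normal with mean mu_I - mu_C and variance var_I + var_C.
    Each list needs at least two samples.
    """
    mu = mean(internet_delays) - mean(circuit_delays)
    sigma = (stdev(internet_delays) ** 2 + stdev(circuit_delays) ** 2) ** 0.5
    if sigma == 0:
        return 1.0 if mu > 0 else 0.0 if mu < 0 else 0.5
    return 1.0 - NormalDist(mu, sigma).cdf(0.0)


def adjust_bounds(file_size, lo, hi, p, margin=0.4):
    """Step 5 with an assumed rule: move a bound only when p is decisive."""
    if p > 0.5 + margin:      # Internet is reliably slower: crossover is below
        hi = min(hi, file_size)
    elif p < 0.5 - margin:    # Internet is reliably faster: crossover is above
        lo = max(lo, file_size)
    return lo, hi


# Made-up delay records (seconds) for a 70 MByte file.
p = prob_internet_slower([5.0, 5.1, 5.2], [4.0, 4.1, 3.9])
lo, hi = adjust_bounds(70e6, 0.0, float("inf"), p)
```

Because transfers inside (min, max) are split evenly between the two paths, records accumulate for both DI and DC without injecting extra traffic.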
18
Hybrid Measurements
• Fast startup
– Find the bottleneck link rate of the Internet path and the circuit setup delay through either passive or active measurement.
– Solve the following equation for “file_size”:
$$\frac{\text{file\_size}}{\text{circuit\_rate}} + \text{estimated\_setup\_delay} = \frac{\text{file\_size}}{\text{Internet\_path\_bottleneck\_link\_rate}}$$
– Initialize (min, max) with (file_size/2, file_size*2).
• Use active measurements when initiated by administrator users.
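The fast-startup estimate has a closed form once the equation is read as "circuit transfer time plus setup delay equals Internet transfer time". The sketch below assumes that reading; the function names are illustrative, and the example inputs are the rates and average setup delay quoted on the Evaluation slide of this deck.

```python
def estimate_crossover(setup_delay, circuit_rate, internet_rate):
    """Solve file_size/circuit_rate + setup_delay = file_size/internet_rate.

    Rates are in bits per second, setup_delay in seconds; the result is in
    bits. Only meaningful when the circuit is faster than the Internet path.
    """
    if circuit_rate <= internet_rate:
        raise ValueError("no crossover: circuit is not faster")
    return setup_delay / (1.0 / internet_rate - 1.0 / circuit_rate)


def initial_bounds(file_size):
    """Initial (min, max) search interval around the estimate."""
    return file_size / 2, file_size * 2


# 1 Gbps circuit, 100 Mbps bottleneck link, 5 s average setup delay.
size_bits = estimate_crossover(5.0, 1e9, 100e6)
lo, hi = initial_bounds(size_bits)
```

With these inputs the estimate is about 5.6e8 bits, roughly 69 MByte, so the search would start from an interval of about 35 MByte to 139 MByte.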
19
Bookkeeping Data Structure
Time Slot | Destination | Circuit Rate | Crossover File Size
02:00 – 03:00 Sunday | 128.109.34.22 | 1Gbps | 50MByte – 70MByte
… | … | … | …

Transfer Delay Records (for the entry above):
File Size | DI (sec) | DC (sec)
50MByte | 5.081 | 5.715
60MByte | 5.060 | 5.066
70MByte | 5.033 | 4.002
… | … | …
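One way to hold this bookkeeping in memory is a dictionary keyed by (time slot, destination, circuit rate). The sketch below is an assumption about layout, not the RD database schema: `CrossoverEntry`, `route` and the default answer for unknown keys are hypothetical. The sample values are the ones in the table.

```python
from dataclasses import dataclass, field


@dataclass
class CrossoverEntry:
    """One row of the table: crossover range plus raw delay records."""
    crossover_min: float            # bytes
    crossover_max: float            # bytes
    # file size in bytes -> list of (DI, DC) delay pairs in seconds
    records: dict = field(default_factory=dict)

    def add_record(self, file_size, d_internet, d_circuit):
        self.records.setdefault(file_size, []).append((d_internet, d_circuit))


MB = 10**6
# Key: (time slot, destination IP, circuit rate in bits per second).
table = {}
key = ("Sun 02:00-03:00", "128.109.34.22", 10**9)
entry = table.setdefault(key, CrossoverEntry(50 * MB, 70 * MB))
entry.add_record(50 * MB, 5.081, 5.715)
entry.add_record(60 * MB, 5.060, 5.066)
entry.add_record(70 * MB, 5.033, 4.002)


def route(table, key, file_size):
    """Answer a query: 'internet', 'circuit', or 'either' inside the range."""
    e = table.get(key)
    if e is None:
        return "internet"           # assumed default when nothing is known
    if file_size < e.crossover_min:
        return "internet"
    if file_size > e.crossover_max:
        return "circuit"
    return "either"
```

A lookup is a single dictionary access followed by two comparisons, so a query costs far less than a circuit setup.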
20
Interaction Between CHEETAH Software Modules and Applications
[Figure: block diagram of the software modules. Components shown: Application; Administrator and Admin Interface; Routing Decision Module, containing Thread 1 (Decision-making), Thread 2 (Measurement Monitor), Thread 3 (Active Measurement Scheduler) and the RD Database; Query Interface; Report Interface; RD API; RSVP API; RSVP / C-TCP Modules; TCP; SysCall Interface; Measurement Tools. Arrow labels: query, reply, update, trigger, report delays, report blocks, report delays or bandwidth. The arrows are numbered 1 to 7.]
21
Evaluation
• Experiment setup
– The Routing Decision server and an application run on a Linux-2.6 box with 2 Xeon 2.8GHz CPUs and 1GB memory.
– The application queries with parameters <128.109.34.22, 1Gbps circuit rate, 1GByte file size, time slot 02:00 Sunday>. The database has an entry corresponding to this IP address and time slot.
– Internet path: bottleneck link rate = 100Mbps; round trip time = 24ms. Circuit: round trip time = 8ms.
• Delay
– An application submits 100 queries.
– Mean query delay = 0.0055 sec < round trip time << 5 sec (the average setup delay).
– Query delay standard deviation = 2.3608e-004 sec < 0.3ms
22
Conclusion and Future Work
• Conclusion
– The measurement-based solution is better than the model-based solution:
  • Adaptive to new TCP variants
  • Adaptive to traffic pattern changes
  • Adaptive to hardware or software configuration changes
  • Low overhead
• Future work
– Scalability issues
  • For a computer that communicates with a large number of end-hosts (e.g. a web server), we can separate the RD module from the computer and run a separate RD server for it.
  • For computers in the same LAN with the same hardware and software configurations, we can create one RD server for the whole LAN.
23
Reference
[CHEETAH] M. Veeraraghavan, X. Zheng, H. Lee, M. Gardner, W. Feng, CHEETAH: Circuit-switched High-speed End-to-End Transport ArcHitecture, Proc. of Opticomm 2003, Oct. 13-17, 2003. Dallas, TX, Won Best Student Paper Award.
[C-TCP] A. P. Mudambi, X. Zheng, and M. Veeraraghavan, A Transport Protocol for Dedicated End-to-End Circuits, accepted by ICC 2006.
[UMass98] J. Padhye, V. Firoiu, D. Towsley and J. Kurose. Modeling TCP throughput: A simple model and its empirical validation. In SIGCOMM ’98, September 1998.
• Assume the delays observed on the Internet path and the circuit are normally distributed random variables, DI and DC. Each file size has these two random variables.
[Equations garbled in extraction. Surviving legend: z is the standard normal distribution, (symbol lost) is the sample standard deviation, (symbol lost) is the confidence level, and w is the width of the confidence interval.]
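$$E[T_{loss}] = l_{ss}\left(Q(p, E[W_{ss}])\,E[Z^{TO}] + \left(1 - Q(p, E[W_{ss}])\right) RTT\right)$$
$$l_{ss} = 1 - (1-p)^{d}$$
$$Q(p, w) = \min\left(1,\ \frac{1 + (1-p)^{3}\left(1 - (1-p)^{w-3}\right)}{\left(1 - (1-p)^{w}\right) / \left(1 - (1-p)^{3}\right)}\right)$$
$$E[Z^{TO}] = \frac{G(p)\,T_0}{1-p}$$
$$G(p) = 1 + \sum_{i=1}^{6} 2^{i-1} p^{i}$$
where $T_0$ is the expected duration of the first TO in a sequence of one or more successive timeouts.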
32
TCP-Reno delay model (3)
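(iii) Calculate $E[T_{ca}]$
$$E[T_{ca}] = \frac{E[d_{ca}]}{R(p, RTT, T_0, W_{max})}$$
$$E[d_{ca}] = d - E[d_{ss}]$$
$$W(p) = \frac{2+b}{3b} + \sqrt{\frac{8(1-p)}{3bp} + \left(\frac{2+b}{3b}\right)^{2}}$$
$$R(p, RTT, T_0, W_{max}) = \begin{cases} \dfrac{\frac{1-p}{p} + \frac{W(p)}{2} + Q(p, W(p))}{RTT\left(\frac{b}{2} W(p) + 1\right) + \frac{Q(p, W(p))\,G(p)\,T_0}{1-p}} & \text{when } W(p) < W_{max} \\[2ex] \dfrac{\frac{1-p}{p} + \frac{W_{max}}{2} + Q(p, W_{max})}{RTT\left(\frac{b}{8} W_{max} + \frac{1-p}{p\,W_{max}} + 2\right) + \frac{Q(p, W_{max})\,G(p)\,T_0}{1-p}} & \text{otherwise} \end{cases}$$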
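The congestion-avoidance term can be evaluated numerically once W(p), Q(p, w), G(p) and R are fixed. The sketch below assumes the forms in which these quantities are usually written for the TCP Reno model (the slide text is damaged, so the formulas in the code are an assumption rather than a transcription), with b as the number of packets acknowledged per ACK and 0 < p < 1.

```python
from math import sqrt


def G(p):
    """G(p) = 1 + p + 2p^2 + 4p^3 + 8p^4 + 16p^5 + 32p^6."""
    return 1 + sum(2 ** (i - 1) * p ** i for i in range(1, 7))


def Q(p, w):
    """Probability that a loss in a window of size w leads to a timeout."""
    q = 1 - p
    num = 1 + q ** 3 * (1 - q ** (w - 3))
    den = (1 - q ** w) / (1 - q ** 3)
    return min(1.0, num / den)


def W(p, b=2):
    """Expected congestion window in steady state (b = packets per ACK)."""
    a = (2 + b) / (3 * b)
    return a + sqrt(8 * (1 - p) / (3 * b * p) + a * a)


def R(p, rtt, t0, w_max, b=2):
    """Steady-state throughput in packets per second, for 0 < p < 1."""
    w = W(p, b)
    if w < w_max:
        num = (1 - p) / p + w / 2 + Q(p, w)
        den = rtt * (b / 2 * w + 1) + Q(p, w) * G(p) * t0 / (1 - p)
    else:
        num = (1 - p) / p + w_max / 2 + Q(p, w_max)
        den = (rtt * (b / 8 * w_max + (1 - p) / (p * w_max) + 2)
               + Q(p, w_max) * G(p) * t0 / (1 - p))
    return num / den


def expected_t_ca(d, d_ss, p, rtt, t0, w_max, b=2):
    """E[T_ca]: time to send the d - d_ss packets left after slow start."""
    return (d - d_ss) / R(p, rtt, t0, w_max, b)
```

For example, `expected_t_ca(1000, 100, 0.01, 0.024, 1.0, 64)` gives the congestion-avoidance time in seconds for a hypothetical 1000-packet transfer over a 24 ms path with 1% loss.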
33
Binary Search Algorithm for Determining the Crossover File Size for One Destination
[Flowchart; the branch order is partly lost in extraction. Boxes that can be read:
– Start
– Start setup delay timer; call bandwidth requester; setup success? (a "Too many fails" check leads to End)
– Stop setup delay timer; init sl = s = su = setup_delay*circuit_rate, cover = false
– Start circuit transfer delay timer; transfer file of size s over circuit; stop circuit transfer delay timer; compute circuit throughput
– Tear down circuit
– Start Internet transfer delay timer; transfer file of size s over the Internet; stop Internet transfer delay timer; compute Internet throughput
– | T_Internet - T_Circuit | < delta? Yes: crossover file size is s, update the DB, End
– Internet throughput > circuit throughput?
– Update box: sl = s; if (!cover) su = 2*su; s = (sl+su)/2
– Update box: su = s; s = (sl+su)/2; if (!cover) cover = true
– Update box: sl = 0; s = (sl+su)/2; if (!cover) cover = true
– Fragment: sl = su]
s denotes the file size, sl denotes the lower bound of s, su denotes the upper bound of s, cover denotes whether or not (sl, su) has covered the crossover file size, and delta is the threshold for the difference between circuit and Internet throughputs.
34
Measurement example room in
35
Experiment setup
Host: mvstu6
CPU | 2 CPUs, each an Intel(R) Xeon(TM) CPU 2.80GHz with 1024KB cache
Memory | 1GB
Hard disk | 1 MegaRAID, Model: LD 0 RAID0 69G
OS | 2.6.12-1.1381_FC3smp
File system | EXT3
NIC | Intel PRO/1000 Single Port Adapters working at rate 100Mbps, Full Duplex
36
Acronym
• CHEETAH – Circuit-switched High-speed End-to-End Transport ArcHitecture
• PLR – Packet Loss Rate
• SD – Setup/Teardown Delay
• RTT – Round Trip Time
• AB – Available Bandwidth
• GMPLS – Generalized Multiple Protocol Label