Jan 04, 2016
TDC561 Network Programming
Camelia Zlatea, PhD
Email: czlatea@cs.depaul.edu
Week 10: Performance Aspects of End-to-End (Transport) Protocols and API Programming
End-to-End (Transport) Protocols

Underlying best-effort network:
- drops messages
- re-orders messages
- delivers duplicate copies of a given message
- limits messages to some finite size
- delivers messages after an arbitrarily long delay

Common end-to-end services:
- guarantee message delivery
- deliver messages in the same order they are sent
- deliver at most one copy of each message
- support arbitrarily large messages
- support synchronization
- allow the receiver to apply flow control to the sender
- support multiple application processes on each host
Simple Demultiplexor (UDP)

- Unreliable and unordered datagram service
- Adds multiplexing
- No flow control
- Endpoints identified by ports
  - servers have well-known ports
  - see /etc/services on Unix
- Optional checksum
  - computed over pseudo header + UDP header + data
- Header format:

    0              16             31
   +--------------+--------------+
   |   SrcPort    |   DstPort    |
   +--------------+--------------+
   |    Length    |   Checksum   |
   +--------------+--------------+
   |            Data             |
   +-----------------------------+
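The optional checksum is the standard 16-bit one's-complement Internet checksum (RFC 1071), computed over the pseudo header, the UDP header, and the data. A minimal sketch in C, assuming the caller has already concatenated those pieces into one buffer:

    #include <stdint.h>
    #include <stddef.h>

    /* RFC 1071 Internet checksum: one's-complement sum of 16-bit
     * words, here applied to pseudo header + UDP header + data. */
    static uint16_t internet_checksum(const uint8_t *buf, size_t len)
    {
        uint32_t sum = 0;

        while (len > 1) {                   /* sum 16-bit words */
            sum += (uint32_t)buf[0] << 8 | buf[1];
            buf += 2;
            len -= 2;
        }
        if (len == 1)                       /* pad an odd trailing byte */
            sum += (uint32_t)buf[0] << 8;

        while (sum >> 16)                   /* fold the carries back in */
            sum = (sum & 0xffff) + (sum >> 16);

        return (uint16_t)~sum;              /* one's complement of the sum */
    }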
Simple Demultiplexor (UDP)

[Figure: packets arrive at UDP and are demultiplexed by port into
per-port message queues, one queue per application process.]
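A minimal sketch of the receiving side of this picture: one socket bound to one UDP port, draining that port's queue. The port number 5000 is a made-up example, not a well-known port from /etc/services.

    #include <arpa/inet.h>
    #include <netinet/in.h>
    #include <stdio.h>
    #include <string.h>
    #include <sys/socket.h>
    #include <sys/types.h>
    #include <unistd.h>

    int main(void)
    {
        /* Create a UDP socket and bind it to port 5000. */
        int s = socket(AF_INET, SOCK_DGRAM, 0);
        struct sockaddr_in addr;
        memset(&addr, 0, sizeof(addr));
        addr.sin_family = AF_INET;
        addr.sin_addr.s_addr = htonl(INADDR_ANY);
        addr.sin_port = htons(5000);
        bind(s, (struct sockaddr *)&addr, sizeof(addr));

        /* UDP demultiplexes on the destination port: every datagram
         * addressed to port 5000 waits in this socket's queue. */
        char buf[1500];
        ssize_t n = recvfrom(s, buf, sizeof(buf), 0, NULL, NULL);
        if (n >= 0)
            printf("received a %zd-byte datagram\n", n);
        close(s);
        return 0;
    }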
Reliable Byte-Stream (TCP)

- Connection-oriented
- Byte-stream
  - sending process writes some number of bytes
  - TCP breaks the stream into segments and sends them via IP
  - receiving process reads some number of bytes
- Full duplex
- Flow control: keep sender from overrunning receiver
- Congestion control: keep sender from overrunning network
Reliable Byte-Stream (TCP)

[Figure: the sending application writes bytes into TCP's send buffer;
TCP transmits them as segments; the receiving TCP places them in a
receive buffer, from which the receiving application reads bytes.]
End-to-End Issues

TCP is based on the sliding window protocol used at the data link
level, but the situation is very different:

- Potentially connects many different hosts
  - need explicit connection establishment and termination
- Potentially different RTTs (Round Trip Time)
  - need an adaptive timeout mechanism (see the sketch after this list)
- Potentially long delay in the network
  - need to be prepared for the arrival of very old packets
- Potentially different capacity at the destination
  - need to accommodate different amounts of buffering
- Potentially different network capacity
  - need to be prepared for network congestion
Segment Format

- Each connection identified with a 4-tuple:
  <SrcPort, SrcIPAddr, DstPort, DstIPAddr>
- Sliding window + flow control:
  Acknowledgment, SequenceNum, AdvertisedWindow
- Flags: SYN, FIN, RESET, PUSH, URG, ACK
- Checksum: pseudo header + TCP header + data
- Options (variable), then Data
- Header format (bit positions 0, 4, 10, 16, 31 mark the field
  boundaries):

    SrcPort | DstPort
    SequenceNum
    Acknowledgment
    HdrLen | 0 (reserved) | Flags | AdvertisedWindow
    Checksum | UrgPtr
    Options (variable)
    Data
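The same fixed 20-byte layout written as a C struct, a sketch only: real stacks use bitfields or explicit masking for HdrLen and Flags, and every multi-byte field travels in network byte order.

    #include <stdint.h>

    struct tcp_header {
        uint16_t src_port;          /* SrcPort */
        uint16_t dst_port;          /* DstPort */
        uint32_t sequence_num;      /* SequenceNum: first byte carried */
        uint32_t acknowledgment;    /* Acknowledgment: next byte expected */
        uint16_t hdrlen_flags;      /* 4-bit HdrLen, 6 reserved bits, 6
                                       flag bits: SYN, FIN, RESET, PUSH,
                                       URG, ACK */
        uint16_t advertised_window; /* receiver's flow-control window */
        uint16_t checksum;          /* pseudo header + TCP header + data */
        uint16_t urg_ptr;           /* UrgPtr, meaningful when URG set */
        /* options (variable length) and data follow */
    };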
[Figure: the sender labels data with SequenceNum; the receiver returns
Acknowledgment + AdvertisedWindow.]
TCP Sliding Window Revisited

[Figure: relationship between the TCP send buffer (a), marked by
LastByteAcked, LastByteSent, and LastByteWritten, and the receive
buffer (b), marked by LastByteRead, NextByteExpected, and
LastByteRcvd.]

- Each byte has a sequence number
- ACKs are cumulative
TCP Sliding Window Revisited

- Sending side
  - LastByteAcked <= LastByteSent
  - LastByteSent <= LastByteWritten
  - bytes between LastByteAcked and LastByteWritten must be buffered
- Receiving side
  - LastByteRead < NextByteExpected
  - NextByteExpected <= LastByteRcvd + 1
  - bytes between NextByteRead and LastByteRcvd must be buffered
Flow Control

- Send buffer size: MaxSendBuffer
- Receive buffer size: MaxRcvBuffer
- Receiving side
  - LastByteRcvd - NextByteRead <= MaxRcvBuffer
  - NextByteExpected <= LastByteRcvd + 1
  - AdvertisedWindow = MaxRcvBuffer - (LastByteRcvd - NextByteRead)
- Sending side
  - LastByteSent - LastByteAcked <= AdvertisedWindow
  - EffectiveWindow = AdvertisedWindow - (LastByteSent - LastByteAcked)
  - LastByteWritten - LastByteAcked <= MaxSendBuffer
  - block sender if (LastByteWritten - LastByteAcked) + y > MaxSendBuffer,
    where y is the number of bytes the application is trying to write
- Always send an ACK in response to an arriving data segment
- Persist (keep probing) when AdvertisedWindow = 0

Both window computations are transcribed in C below.
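A direct transcription of the two formulas above; the names mirror the slide, and unsigned 32-bit arithmetic absorbs sequence-number wrap-around.

    #include <stdint.h>

    /* Receive side: advertise however much buffer space remains. */
    uint32_t advertised_window(uint32_t MaxRcvBuffer,
                               uint32_t LastByteRcvd,
                               uint32_t NextByteRead)
    {
        return MaxRcvBuffer - (LastByteRcvd - NextByteRead);
    }

    /* Send side: how much new data may still be injected. */
    uint32_t effective_window(uint32_t AdvertisedWindow,
                              uint32_t LastByteSent,
                              uint32_t LastByteAcked)
    {
        uint32_t in_flight = LastByteSent - LastByteAcked;
        return AdvertisedWindow > in_flight ? AdvertisedWindow - in_flight
                                            : 0;
    }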
Network as a Pipe

[Figure: the network as a pipe; bandwidth is the pipe's width,
delay/latency its length.]
Keeping the Pipe Full

TCP correctness and performance aspects:
- The sizes of the SequenceNum and AdvertisedWindow fields affect the
  correctness and performance of TCP
- Wrap around: the 32-bit SequenceNum
- Bandwidth determines the time until wrap around (computed below):

  Bandwidth            Time Until Wrap Around
  T1 (1.5Mbps)         6.4 hours
  Ethernet (10Mbps)    57 minutes
  T3 (45Mbps)          13 minutes
  FDDI (100Mbps)       6 minutes
  STS-3 (155Mbps)      4 minutes
  STS-12 (622Mbps)     55 seconds
  STS-24 (1.2Gbps)     28 seconds
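The table follows from one line of arithmetic: a 32-bit sequence space covers 2^32 bytes, so it wraps after 2^32 x 8 bits divided by the link bandwidth. A small C check of the numbers:

    #include <stdio.h>

    int main(void)
    {
        const char  *name[] = { "T1", "Ethernet", "T3", "FDDI",
                                "STS-3", "STS-12", "STS-24" };
        const double bps[]  = { 1.5e6, 10e6, 45e6, 100e6,
                                155e6, 622e6, 1.2e9 };

        for (int i = 0; i < 7; i++) {
            /* 2^32 bytes of sequence space, 8 bits per byte */
            double secs = 4294967296.0 * 8.0 / bps[i];
            printf("%-8s %8.0f seconds until wrap-around\n",
                   name[i], secs);
        }
        return 0;   /* T1 prints ~22906s, i.e. the table's 6.4 hours */
    }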
Keeping the Pipe Full

TCP correctness and performance aspects:
- Bytes in transit: 16-bit AdvertisedWindow allows up to 64KB
- AdvertisedWindow must be large enough to let the sender keep the
  pipe full: the (Delay x Bandwidth) product
- Bandwidth and the (Delay x Bandwidth) product dictate how big
  AdvertisedWindow needs to be
- The required window size for a 100ms RTT motivates TCP protocol
  extensions (a check of the numbers follows the table):

  Bandwidth            Delay x Bandwidth Product
  T1 (1.5Mbps)         18KB
  Ethernet (10Mbps)    122KB
  T3 (45Mbps)          549KB
  FDDI (100Mbps)       1.2MB
  STS-3 (155Mbps)      1.8MB
  STS-12 (622Mbps)     7.4MB
  STS-24 (1.2Gbps)     14.8MB
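The same loop as before, now comparing each link's 100ms delay x bandwidth product against the largest window a 16-bit AdvertisedWindow can name (64KB), which is why the extensions are needed:

    #include <stdio.h>

    int main(void)
    {
        const char  *name[] = { "T1", "Ethernet", "T3", "FDDI",
                                "STS-3", "STS-12", "STS-24" };
        const double bps[]  = { 1.5e6, 10e6, 45e6, 100e6,
                                155e6, 622e6, 1.2e9 };

        for (int i = 0; i < 7; i++) {
            /* bytes that must be in flight to fill a 100ms RTT pipe */
            double need = bps[i] * 0.100 / 8.0;
            printf("%-8s needs %10.0f bytes in flight; 64KB window %s\n",
                   name[i], need,
                   need > 65535.0 ? "is too small" : "suffices");
        }
        return 0;   /* every link above T1 overflows the 16-bit window */
    }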
Application Programming Interface

- Separate the implementation of protocols from the interface they
  export.
- This matters most at the transport layer, since that is the point
  where application programs typically access the network.
- This interface is often called the application programming
  interface, or API.
- Notes:
  - The API is usually defined by the OS.
  - Example API: sockets, defined by BSD Unix but ported to other
    systems.

[Figure: the application sits above the API, which sits above the
transport protocol.]
Socket Operations

- Creating a socket:
  int socket(int domain, int type, int protocol)
  - domain = AF_INET, PF_UNIX
  - type = SOCK_STREAM, SOCK_DGRAM
- Passive open on the server:
  int bind(int socket, struct sockaddr *address, int addr_len)
  int listen(int socket, int backlog)
  int accept(int socket, struct sockaddr *address, int *addr_len)
- Active open on the client:
  int connect(int socket, struct sockaddr *address, int addr_len)
- Sending and receiving messages:
  int send(int socket, char *message, int msg_len, int flags)
  int recv(int socket, char *buffer, int buf_len, int flags)
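A minimal active-open sketch that strings these calls together; the server address 10.0.0.1 and port 7777 are made-up examples, and error handling is mostly omitted.

    #include <arpa/inet.h>
    #include <netinet/in.h>
    #include <string.h>
    #include <sys/socket.h>
    #include <unistd.h>

    int main(void)
    {
        int s = socket(PF_INET, SOCK_STREAM, 0);

        struct sockaddr_in srv;
        memset(&srv, 0, sizeof(srv));
        srv.sin_family = AF_INET;
        srv.sin_port = htons(7777);
        inet_pton(AF_INET, "10.0.0.1", &srv.sin_addr);

        /* Active open: connect to the server's well-known port. */
        if (connect(s, (struct sockaddr *)&srv, sizeof(srv)) == 0) {
            const char msg[] = "hello";
            send(s, msg, sizeof(msg) - 1, 0);   /* write bytes to stream */

            char buf[128];
            recv(s, buf, sizeof(buf), 0);       /* read the reply bytes */
        }
        close(s);
        return 0;
    }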
Performance
Performance Overview
Bandwidth:
- A measure of the width of a frequency band.
  - A voice-grade telephone line supports a frequency band ranging
    from 300Hz to 3300Hz; it is said to have a bandwidth of 3000Hz.
    When given in Hz, bandwidth refers to the range of signals that
    can be accommodated.
- Bandwidth of a communication link = the number of bits per second
  that can be transmitted on that link.
  - Ethernet bandwidth is 10Mbps.
  - Measured (available) bandwidth = the number of bits per second
    that can actually be transmitted on that link in practice.
  - Throughput = the measured performance of a system.
    - A pair of nodes connected by a link with a bandwidth of 10Mbps
      might achieve a throughput of 2Mbps; an application on one host
      could send data to the other host at 2Mbps.
- Bandwidth requirements of an application = the number of bits per
  second it needs to transmit over the network to perform acceptably.
Latency (Response Time, Delay) vs. RTT

[Figure: perceived latency (ms, log scale) vs. RTT (ms) for 1-MB,
2-KB, and 1-byte objects over 1.5-Mbps and 10-Mbps links.]
Latency = Propagation + Transmit + Queue
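A worked instance of this formula with made-up but representative numbers: a 1KB packet over a 4000km, 1.5Mbps link, queueing ignored; ~2 x 10^8 m/s is the usual propagation speed in fiber.

    #include <stdio.h>

    int main(void)
    {
        double distance  = 4000e3;      /* meters, cross-country link */
        double speed     = 2.0e8;       /* propagation speed in fiber, m/s */
        double size_bits = 1024 * 8;    /* a 1KB packet */
        double bandwidth = 1.5e6;       /* a T1's bits per second */

        double propagation = distance / speed;       /* 20ms */
        double transmit    = size_bits / bandwidth;  /* ~5.5ms */

        printf("propagation %.1fms + transmit %.1fms = %.1fms latency\n",
               propagation * 1e3, transmit * 1e3,
               (propagation + transmit) * 1e3);
        return 0;
    }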
Performance Overview: Latency and Bandwidth

[Figure: an 8.4Mb file fills a 1-Mbps cross-country pipe 84 times
over, but only 1/12 of a 1-Gbps cross-country pipe.]
Performance Overview

- 1Mbps and 1Gbps links have the same latency
  - limited by the speed of light
- To transfer a 1MB file takes...
  - 84 pipes full of data (8.4 seconds of 100ms RTTs) on a 1Mbps link
  - doesn't even fill a 1Gbps link (delay x bandwidth = 12.5MB)
- In other words: a 1MB file is to a 1Gbps network what a 1KB packet
  is to a 1Mbps network.
Latency/Bandwidth Tradeoff

- Throughput = TransferSize / TransferTime
  - if it takes 10ms to transfer 1MB, the effective throughput is
    1MB/10ms = 100MBps = 800Mbps
- TransferTime = Latency + TransferSize / Bandwidth
  - if the network bandwidth is 1Gbps (it takes 8Mb / 1Gbps = 8ms to
    transmit 1MB), an end-to-end transfer that also requires 1 RTT of
    100ms has a total transfer time of 108ms
  - the effective throughput is then 1MB/108ms = 74.1Mbps, not 1Gbps
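The same arithmetic as a check, so the numbers above can be reproduced:

    #include <stdio.h>

    int main(void)
    {
        double size_bits = 1e6 * 8.0;   /* a 1MB transfer */
        double bandwidth = 1e9;         /* a 1Gbps link */
        double latency   = 0.100;       /* one 100ms RTT */

        /* TransferTime = Latency + TransferSize / Bandwidth */
        double transfer_time = latency + size_bits / bandwidth;
        double throughput    = size_bits / transfer_time;

        printf("transfer time %.1fms, effective throughput %.1fMbps\n",
               transfer_time * 1e3, throughput / 1e6);
        return 0;   /* prints 108.0ms and 74.1Mbps */
    }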
Round-Trip Latency (µs)

Per-layer latency (1-byte round trip):
- ETH + wire: 216µs
- UDP/IP: 58µs

Measured round-trip latency (µs) vs. message size:

  Message size (bytes)    1  100  200  300   400   500   600   700   800   900  1000
  UDP                   297  413  572  732   898  1067  1226  1386  1551  1719  1878
  TCP                   365  519  691  853  1016  1185  1354  1514  1676  1845  2015

[Figure: the measured stack; App1 runs over TCP and App2 over UDP,
both over IP, ETH, and the PHY layer.]
Throughput (UDP/IP/ETH)

[Figure: measured throughput (Mbps, axis from 8 to 10) vs. message
size (1KB to 32KB); the curve rises with message size and flattens
near 9.5Mbps.]

- Throughput improves as messages get larger, up to a limit (~16KB)
  where the per-message overhead, amortized over the number of bytes
  sent, becomes insignificant.
- It flattens at ~9.5Mbps, below the 10Mbps Ethernet bandwidth.
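The shape of this curve falls out of a simple per-message-overhead model, throughput = size / (overhead + size/bandwidth); the 300 microsecond fixed cost below is a made-up illustrative value, not a measurement from these slides.

    #include <stdio.h>

    int main(void)
    {
        double bandwidth = 10e6;    /* 10Mbps Ethernet */
        double overhead  = 300e-6;  /* assumed fixed cost per message, s */

        for (int kb = 1; kb <= 32; kb *= 2) {
            double bits = kb * 1024.0 * 8.0;
            /* per-message overhead is amortized over more bytes as
             * the message grows, so throughput climbs toward 10Mbps */
            double tput = bits / (overhead + bits / bandwidth);
            printf("%2dKB messages: %.2fMbps\n", kb, tput / 1e6);
        }
        return 0;
    }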
Notes:
- Transferring a large amount of data helps improve the effective
  throughput; in the limit, an infinitely large transfer size drives
  the effective throughput toward the network bandwidth.
- Having to endure more than one RTT hurts the effective throughput
  of any transfer of finite size, and is most noticeable for small
  transfers.
Implications

- Congestion control
  - feedback-based mechanisms require an RTT to adjust
  - can send 10MB in one 100ms RTT on a 1Gbps network
  - that 10MB might congest a router and lead to massive losses
  - can lose half a delay x bandwidth product's worth of data during
    slow start
  - reservations work for continuous streams (e.g., video), but
    require an extra RTT for bulk transfers
- Retransmissions
  - retransmitting a packet costs 1 RTT
  - dropping even one packet (cell) halves the effective bandwidth
  - retransmission also implies buffering at the sender
  - possible solution: forward error correction (FEC)
- Trading bandwidth for latency
  - each RTT is precious
  - be willing to "waste" bandwidth to save latency
  - example: pre-fetching
Host Memory Bottleneck

- Issue: turning host-to-host bandwidth into
  application-to-application bandwidth
- Data has to be delivered across the I/O and memory buses into
  caches and registers
Memory Bandwidth

- The I/O bus must keep up with the network speed (it currently does
  for STS-12, assuming the peak rate is achievable)
- 114MBps (a measured number) is only slightly faster than the I/O
  bus; we can't afford to cross the memory bus twice
- Caches are of questionable value (they are rather small)
- There are many reasons to access buffers:
  - the user/kernel boundary
  - certain protocol functions (reassembly, checksumming)
  - the network device and its driver
- Same latency/bandwidth problems as high-speed networks
Integrated Services

- High-speed networks have enabled new applications
  - these applications also need "deliver on time" assurances from
    the network
- Applications that are sensitive to the timeliness of data are
  called real-time applications
  - voice
  - video
  - industrial control
- Timeliness guarantees must come from inside the network
  - end hosts cannot correct for late packets the way they can
    correct for lost packets
- Need more than best-effort: the IETF is standardizing extensions
  to the best-effort model