How to Minimize Transport Protocol Processing:
Implementation and Evaluation of Network Level Framing
Pål Halvorsen, Thomas Plagemann, and Vera Goebel
Institute for Informatics, University of Oslo, Norway
4th International Workshop on Multimedia Network Systems and Applications
(MNSA ’02), Vienna, Austria, July 2002
© 2002 Pål Halvorsen, MNSA ’02, Vienna, Austria, July 2002
Overview
Application scenario
The INSTANCE project
Network Level Framing (NLF): design and implementation, performance evaluation
Summary and conclusions
Application Scenario
Media-on-Demand server: applicable in applications like News- or Video-on-Demand provided by city-wide cable or pay-per-view companies
[Figure: Multimedia Storage Server connected to a Network]
Project goals: optimize performance within a single server:
• Reduce resource requirements
• Maximize number of clients
Retrieval is the bottleneck. Some important factors:
• Memory management
• Communication protocol processing
• Error management
The INSTANCE Project
We try to make optimal use of a given set of resources:
• memory architecture
• integrated error management
• network level framing (NLF)
Project goals: optimize performance within a single server:
• Reduce resource requirements
• Maximize number of clients
Traditional Approach
[Figure: four protocol stacks, each with TRANSPORT, NETWORK, and LINK layers]
Upload to server: frequency low (1)
Download from server: frequency very high
Network Level Framing (NLF): Basic Idea
[Figure: protocol stacks for upload (TRANSPORT, NETWORK, LINK) and download (NETWORK, LINK); transport-level packets are stored at the server]
Upload to server: frequency low (1)
Download from server: frequency very high
When to Store Packets
[Figure: Transport Layer, Network Layer, and Link Layer; protocol labels: TCP or UDP/FEC, UDP, IP]
Splitting the UDP Protocol
Traditional udp_output():
• Temporarily connect
• Prepend UDP and IP headers
• Prepare pseudo header for checksum
• Calculate checksum
• Fill in some other IP header fields
• Hand over datagram to IP
• Disconnect connected socket
NLF splits udp_output() in two:
udp_PreOut():
• Prepend UDP and IP headers
• Prepare pseudo header for checksum, clear unknown fields
• Precalculate checksum
udp_QuickOut():
• Update UDP and IP headers
• Update checksum, i.e., only add checksum of previously unknown fields
• Fill in other IP header fields
• Hand over datagram to IP
Traditional Checksum Operations – I
The UDP checksum covers three fields:
• A 12-byte pseudo header containing fields from the IP header
• The 8-byte UDP header
• The UDP data (payload)
Simplified checksum calculation function (in_cksum):
    u_int16_t *w;
    int checksum = 0;

    for each mbuf in packet {
        w = mbuf->m_data;
        while data in mbuf {
            checksum += *w;   /* add the 16-bit word w points to */
            w++;
        }
    }
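The simplified loop above can be written out as a small, self-contained C sketch. This is not the NetBSD in_cksum code: struct buf is a hypothetical stand-in for an mbuf, words are read in network byte order, and every buffer except the last is assumed to have an even length. The folding step follows the usual Internet checksum definition.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for a BSD mbuf: one buffer in a chain. */
struct buf {
    const uint8_t *data;
    size_t len;
    struct buf *next;
};

/* Add up all 16-bit big-endian words in the chain (no folding yet). */
static uint32_t sum_chain(const struct buf *b)
{
    uint32_t sum = 0;

    for (; b != NULL; b = b->next) {
        size_t i = 0;

        /* Assumption of this sketch: only the last buffer may be odd. */
        assert(b->len % 2 == 0 || b->next == NULL);

        for (; i + 1 < b->len; i += 2)
            sum += (uint32_t)(b->data[i] << 8 | b->data[i + 1]);
        if (i < b->len)                 /* odd trailing byte, zero padded */
            sum += (uint32_t)b->data[i] << 8;
    }
    return sum;
}

/* Fold the carries back into 16 bits and take the one's complement. */
static uint16_t fold_and_complement(uint32_t sum)
{
    while (sum >> 16)
        sum = (sum & 0xffff) + (sum >> 16);
    return (uint16_t)~sum;
}

/* Checksum over a whole buffer chain, touching every byte of the packet. */
static uint16_t cksum_chain(const struct buf *b)
{
    return fold_and_complement(sum_chain(b));
}
```

Because every byte of the packet is visited, the cost of this loop grows with the payload size, which is the cost NLF aims to move out of the transmission path.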
Traditional Checksum Operations – II
Traditional checksum operation: the in_cksum loop is run over the complete packet (pseudo header, UDP header, and payload) each time the packet is sent.
Modified Checksum Operations
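NLF checksum operation: only the checksum over the header is calculated at transmission time; it is added to the precalculated payload checksum to give the packet checksum.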
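Adding a precalculated payload checksum to a header checksum works because one's-complement addition is associative, as long as the header has an even number of bytes (true for the 12-byte pseudo header and the 8-byte UDP header). The following is a minimal sketch of that property, not the authors' kernel code; the names nlf_precalc_payload, nlf_cksum, and full_cksum are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Unfolded sum of 16-bit big-endian words over a byte range. */
static uint32_t partial_sum(const uint8_t *p, size_t len)
{
    uint32_t sum = 0;
    size_t i = 0;

    for (; i + 1 < len; i += 2)
        sum += (uint32_t)(p[i] << 8 | p[i + 1]);
    if (i < len)                        /* odd trailing byte, zero padded */
        sum += (uint32_t)p[i] << 8;
    return sum;
}

/* Fold carries into 16 bits (one's-complement sum, not yet inverted). */
static uint16_t fold16(uint32_t sum)
{
    while (sum >> 16)
        sum = (sum & 0xffff) + (sum >> 16);
    return (uint16_t)sum;
}

/* Done ahead of time; the result is stored alongside the data. */
static uint16_t nlf_precalc_payload(const uint8_t *payload, size_t len)
{
    return fold16(partial_sum(payload, len));
}

/* Done at transmission time: only the header bytes are summed. */
static uint16_t nlf_cksum(const uint8_t *hdr, size_t hdr_len,
                          uint16_t payload_sum)
{
    assert(hdr_len % 2 == 0);   /* payload must start on a word boundary */
    return (uint16_t)~fold16(partial_sum(hdr, hdr_len) + payload_sum);
}

/* Traditional operation for comparison: sum over the whole packet. */
static uint16_t full_cksum(const uint8_t *pkt, size_t len)
{
    return (uint16_t)~fold16(partial_sum(pkt, len));
}
```

For any packet with an even-length header, nlf_cksum(hdr, hdr_len, nlf_precalc_payload(payload, n)) gives the same value as full_cksum over the concatenated bytes, but the transmission-time work no longer depends on the payload size.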
Implementation – I
Straightforward implementation:
To allow flexibility, we have one data file and one meta-data file; the meta-data file holds the precalculated headers.
[Figure: data file and meta-data file (precalculated headers) feeding UDP]
Implementation – II
NLF version 1:
• most of the UDP/IP processing time is spent on checksum calculation
• precalculate checksum over data payload
• during transmission time:
  - generate header
  - calculate checksum over header and add precalculated payload checksum
NLF version 2:
• several reports show increased performance using header templates
• precalculate checksum over data payload
• during stream open:
  - generate header template
  - calculate header checksum
• during transmission time:
  - block copy header template
  - add header template checksum, payload checksum, and packet length field
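The version 2 steps can be sketched in user-space C as follows. This is an illustration under assumptions, not the NetBSD implementation: struct hdr_template, template_init, and template_apply are hypothetical names, only the UDP header and the IPv4 pseudo header are modelled (the IP header itself is left out), and the payload sum is assumed to have been precalculated as a folded 16-bit one's-complement sum.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define UDP_HDR_LEN 8
#define IPPROTO_UDP_NUM 17      /* protocol number used in the pseudo header */

/* Hypothetical per-stream template, built once at stream open. */
struct hdr_template {
    uint8_t  udp[UDP_HDR_LEN];  /* ports set; length and checksum left zero */
    uint32_t sum;               /* sum over addresses, protocol, and ports */
};

static uint32_t sum_bytes(const uint8_t *p, size_t len)
{
    uint32_t sum = 0;
    size_t i = 0;

    for (; i + 1 < len; i += 2)
        sum += (uint32_t)(p[i] << 8 | p[i + 1]);
    if (i < len)
        sum += (uint32_t)p[i] << 8;
    return sum;
}

static uint16_t fold16(uint32_t sum)
{
    while (sum >> 16)
        sum = (sum & 0xffff) + (sum >> 16);
    return (uint16_t)sum;
}

/* Stream open: generate the header template and its checksum. */
static void template_init(struct hdr_template *t,
                          const uint8_t src_ip[4], const uint8_t dst_ip[4],
                          uint16_t sport, uint16_t dport)
{
    memset(t, 0, sizeof *t);
    t->udp[0] = (uint8_t)(sport >> 8);
    t->udp[1] = (uint8_t)(sport & 0xff);
    t->udp[2] = (uint8_t)(dport >> 8);
    t->udp[3] = (uint8_t)(dport & 0xff);
    t->sum = sum_bytes(src_ip, 4) + sum_bytes(dst_ip, 4)
           + IPPROTO_UDP_NUM + sum_bytes(t->udp, UDP_HDR_LEN);
}

/* Transmission time: block copy the template, then add the template
 * checksum, the payload checksum, and the packet length field. */
static void template_apply(const struct hdr_template *t, uint8_t *udp_hdr,
                           uint16_t payload_len, uint16_t payload_sum)
{
    uint16_t udp_len, ck;

    assert(payload_len <= 65535 - UDP_HDR_LEN);
    udp_len = (uint16_t)(UDP_HDR_LEN + payload_len);

    memcpy(udp_hdr, t->udp, UDP_HDR_LEN);
    udp_hdr[4] = (uint8_t)(udp_len >> 8);
    udp_hdr[5] = (uint8_t)(udp_len & 0xff);

    /* The length is counted twice: pseudo header and UDP header. */
    ck = (uint16_t)~fold16(t->sum + payload_sum + 2u * udp_len);
    if (ck == 0)
        ck = 0xffff;            /* UDP transmits an all-zero result as 0xffff */
    udp_hdr[6] = (uint8_t)(ck >> 8);
    udp_hdr[7] = (uint8_t)(ck & 0xff);
}
```

A receiver-style check is a convenient way to validate the result: summing the pseudo header, the finished UDP header, and the payload should fold to 0xffff.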
Performance: Test Setup
• Implemented in NetBSD 1.5.2
• Dell Precision Workstation 620: PIII 933 MHz CPU, 3Com 1 Gbps NIC
• Software probe: RDTSC instruction, CPUID instruction, probe overhead 206 cycles
• Performed tests using 1 KB, 2 KB, 4 KB, and 8 KB UDP packets
• Transmitting 225 MB of data
• Data is transmitted using the zero-copy data path
Performance: Checksum
[Figure: CPU cycles (axis 0 to 7000) versus packet size (1 KB, 2 KB, 4 KB, 8 KB); series: Traditional, UDP data, UDP data + header; off-scale values labelled 11899 and 23674]
Figure annotations:
• Overhead increases linearly with payload size
• Overhead is constant regardless of payload
• ~50 cycles less
Performance: Header Overhead
[Figure: CPU cycles (axis 0 to 1000) versus packet size (1 KB, 2 KB, 4 KB, 8 KB); series: NLF v1, NLF v2; annotation: ~25 cycles more]
NLF version 3: use header template checksum, but generate header instead of block copy
Performance: UDP
[Figure: CPU cycles (axis 0 to 7000) versus packet size (1 KB, 2 KB, 4 KB, 8 KB); series: Traditional, NLF v1, NLF v2, NLF v3; off-scale values labelled 12304 and 24108]
Conclusions and Future Work
• Network Level Framing reduces communication system processing by precalculating:
  - payload checksum (off-line)
  - header checksum (stream open)
• Gain per packet depends on packet payload size, e.g., 97.3 % for 1 KB and 99.6 % for 8 KB
• Our mechanisms (at least) double the number of concurrent clients
• Ongoing and future work:
  - NLF in lower protocols (ongoing)
  - On-board processing
Questions?
Related Work
• Checksum caching in memory: at high data rates, cached elements will be removed before they can be reused
• Header templates: block-copying is time consuming
• On-board processing: useful and becoming "off-the-shelf" hardware; may be nice to combine with NLF