Profiling Grid Data Transfer Protocols and Servers George Kola, Tevfik Kosar and Miron Livny University of Wisconsin-Madison USA.
Motivation
  Grid enables large scale computation
  Problems
    Data intensive applications have suboptimal performance
    Scaling up creates problems
      Storage servers thrash and crash
    Users want to reduce failure rate and improve throughput
Profiling Protocols and Servers
  Profiling is a first step
  Enables us to understand how time is spent
  Gives valuable insights
  Helps
    computer architects add processor features
    OS designers add OS features
    middleware developers to optimize the
Profiling Setup
  Two server machines
    Moderate server: 1660 MHz Athlon XP CPU with 512 MB RAM
    Powerful server: dual Pentium 4 Xeon 2.4 GHz CPU with 1 GB RAM
  Client machines were more powerful (dual Xeons)
    To isolate server performance
  100 Mbps network connectivity
  Linux kernel 2.4.20, GridFTP server 2.4.3, NeST prerelease
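As a rough sanity check on the setup above, the 100 Mbps link caps any transfer at 12.5 megabytes per second before protocol overhead. A minimal sketch, assuming decimal units (1 Mbps = 10^6 bits per second); the function name is illustrative and not from the slides:

```python
def link_ceiling_mb_per_s(link_mbps: float) -> float:
    """Convert a link speed in megabits/s to megabytes/s (decimal units)."""
    return link_mbps / 8.0


ceiling = link_ceiling_mb_per_s(100)
print(f"100 Mbps link ceiling: {ceiling} MB/s")
```

If the MBPS figures reported later in the slides are megabytes per second (an assumption), they can be compared directly against this ceiling.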
GridFTP Profile
  [Figure: percentage of CPU time (axis 0 to 45) per component, Read From GridFTP versus Write To GridFTP. Components: Idle, Ethernet Driver, Interrupt Handling, Libc, Globus, Oprofile, IDE, File I/O, Rest of Kernel]
  Read Rate = 6.45 MBPS, Write Rate = 7.83 MBPS
  => Writes to server faster than reads from it
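The profile above reports each component's share of CPU time. With a sampling profiler (the chart legend lists Oprofile), that share is the number of samples attributed to a component divided by the total. A minimal sketch with made-up sample counts, not the measured data; the bucket names follow the chart legend and the function name is illustrative:

```python
def cpu_share(samples: dict[str, int]) -> dict[str, float]:
    """Return each bucket's percentage of the total sample count."""
    total = sum(samples.values())
    if total == 0:
        raise ValueError("no samples recorded")
    return {name: 100.0 * count / total for name, count in samples.items()}


# Hypothetical sample counts, for illustration only.
example = {
    "Idle": 400, "Ethernet Driver": 150, "Interrupt Handling": 100,
    "Libc": 50, "Globus": 100, "Oprofile": 50, "IDE": 50,
    "File I/O": 50, "Rest of Kernel": 50,
}
for name, pct in cpu_share(example).items():
    print(f"{name:20s}{pct:5.1f}%")
```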
GridFTP Profile
  [Figure: percentage of CPU time per component, Read From GridFTP versus Write To GridFTP]
  Writes to the network more expensive than reads
  => Interrupt coalescing
GridFTP Profile
  [Figure: percentage of CPU time per component, Read From GridFTP versus Write To GridFTP]
  IDE reads more expensive than writes
GridFTP Profile
  [Figure: percentage of CPU time per component, Read From GridFTP versus Write To GridFTP]
  File system writes costlier than reads
  => Need to allocate disk blocks
GridFTP Profile
  [Figure: percentage of CPU time per component, Read From GridFTP versus Write To GridFTP]
  More overhead for writes because of higher transfer rate
GridFTP Profile Summary
  Writes to the network more expensive than reads
    Interrupt coalescing
    DMA would help
  IDE reads more expensive than writes
    Tuning the disk elevator algorithm would help
  Writing to file system is costlier than reading
    Need to allocate disk blocks
    Larger block size would help
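To see why interrupt coalescing could help, consider how many interrupts the network card raises per second. A back-of-the-envelope sketch, assuming 1500-byte frames, one interrupt per frame when coalescing is off, and that MBPS means 10^6 bytes per second. Only the 6.45 MBPS read rate comes from the slides; everything else is a hypothetical illustration:

```python
import math


def interrupts_per_second(rate_bytes_per_s: float, frame_bytes: int = 1500,
                          frames_per_interrupt: int = 1) -> int:
    """Estimate NIC interrupts/s when one interrupt covers several frames."""
    frames = rate_bytes_per_s / frame_bytes
    return math.ceil(frames / frames_per_interrupt)


rate = 6.45e6  # GridFTP read rate from the slides, read as bytes per second
for batch in (1, 4, 16):
    print(batch, "frames/interrupt ->", interrupts_per_second(rate, frames_per_interrupt=batch))
```

Under these assumptions, batching frames divides the interrupt rate by the batch size, which is the CPU time the Interrupt Handling bucket would stand to save.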
NeST Profile
  [Figure: percentage of CPU time (axis 0 to 60) per component, Read From NeST versus Write To NeST. Components: Idle, Ethernet Driver, Interrupt Handling, Libc, NeST, Oprofile, IDE, File I/O, Rest of Kernel]
  Read Rate = 7.69 MBPS, Write Rate = 5.5 MBPS
NeST Profile
  [Figure: percentage of CPU time per component, Read From NeST versus Write To NeST]
  Similar trend as GridFTP
NeST Profile
  [Figure: percentage of CPU time per component, Read From NeST versus Write To NeST]
  More overhead for reads because of higher transfer rate
NeST Profile
  [Figure: percentage of CPU time per component, Read From NeST versus Write To NeST]
  Metadata updates (space allocation) make NeST writes more expensive
GridFTP versus NeST
  GridFTP
    Read Rate = 6.45 MBPS, Write Rate = 7.83 MBPS
  NeST
    Read Rate = 7.69 MBPS, Write Rate = 5.5 MBPS
  GridFTP is 16% slower on reads
    GridFTP I/O block size 1 MB (NeST 64 KB)
    Non-overlap of disk I/O & network I/O
  NeST is 30% slower on writes
    Lots (space reservation/allocation)
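The 16% and 30% figures follow from the four measured rates when each gap is expressed relative to the faster server. A short sketch of that arithmetic; the helper name is illustrative:

```python
def percent_slower(slow: float, fast: float) -> float:
    """How much slower `slow` is than `fast`, as a percentage of `fast`."""
    return 100.0 * (fast - slow) / fast


gridftp = {"read": 6.45, "write": 7.83}  # MBPS, from the slides
nest = {"read": 7.69, "write": 5.5}      # MBPS, from the slides

read_gap = percent_slower(gridftp["read"], nest["read"])
write_gap = percent_slower(nest["write"], gridftp["write"])
print(f"GridFTP reads: {read_gap:.1f}% slower")
print(f"NeST writes:   {write_gap:.1f}% slower")
```

The results come out at roughly 16.1% and 29.8%, consistent with the rounded values on the slide.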
Effect of Protocol Parameters
  Different tunable parameters
    I/O block size
    TCP buffer size
    Number of parallel streams
    Number of concurrent transfers
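A study of these four parameters amounts to sweeping a grid of configurations and measuring each one. A minimal sketch of enumerating such a grid; the slides name the parameters but not the values tested, so every value and key name below is a hypothetical placeholder:

```python
from itertools import product

# Hypothetical sweep values; the slides do not list the values used.
SWEEP = {
    "io_block_size_kb": [64, 256, 1024],
    "tcp_buffer_kb": [64, 256],
    "parallel_streams": [1, 2, 4],
    "concurrent_transfers": [1, 2, 4],
}


def configurations(sweep: dict[str, list[int]]):
    """Yield one dict per combination of parameter values."""
    names = list(sweep)
    for values in product(*(sweep[name] for name in names)):
        yield dict(zip(names, values))


configs = list(configurations(SWEEP))
print(len(configs), "configurations; first:", configs[0])
```

Each yielded dict would then drive one transfer run whose rate and server CPU load are recorded.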
Read Transfer Rate
Server CPU Load on Read
Write Transfer Rate
Server CPU Load on Write
Transfer Rate and CPU Load
Server CPU Load and L2 DTLB misses
L2 DTLB Misses
  Parallelism triggers the kernel to use a larger page size
  => fewer DTLB misses
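One way to see why a larger page size lowers DTLB misses is TLB reach: the amount of memory the TLB can map at once is the number of entries times the page size. A simplified sketch, assuming a hypothetical 64-entry DTLB and the 4 KB and 4 MB page sizes available on 32-bit x86. Real processors often have a different number of entries for large pages, which this ignores:

```python
def tlb_reach_bytes(entries: int, page_size_bytes: int) -> int:
    """Memory addressable without a TLB miss: entries times page size."""
    return entries * page_size_bytes


ENTRIES = 64                    # hypothetical DTLB size, not from the slides
SMALL_PAGE = 4 * 1024           # 4 KB page
LARGE_PAGE = 4 * 1024 * 1024    # 4 MB page

small_reach = tlb_reach_bytes(ENTRIES, SMALL_PAGE)
large_reach = tlb_reach_bytes(ENTRIES, LARGE_PAGE)
print(small_reach // 1024, "KB reach with 4 KB pages")
print(large_reach // (1024 * 1024), "MB reach with 4 MB pages")
```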
Profiles on powerful server
  The next set of graphs was obtained using the powerful server
Parallel Streams versus Concurrency
Effect of File Size (Local Area)
Transfer Rate versus Parallelism in Short Latency (10 ms) Wide Area
Server CPU Utilization
Conclusion
  Full system profile gives valuable insights
  Larger I/O block size may lower transfer rate
    Network, disk I/O not overlapped
  Parallelism may reduce CPU load
    May cause kernel to use larger page size
    Processor feature for variable sized pages would be useful
    Operating system support for variable page size would be useful
  Concurrency improves throughput at increased
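The conclusion ties the block-size effect to network and disk I/O not being overlapped. One common way to overlap them, not necessarily what either server does, is a reader thread that fills a small bounded queue while the main thread drains it to the network. A minimal sketch using in-memory streams as stand-ins for the disk file and the socket; all names are illustrative and error handling is omitted:

```python
import io
import queue
import threading


def overlapped_copy(src, dst, block_size=64 * 1024, depth=2):
    """Copy src to dst while the next block is read concurrently with writing."""
    blocks = queue.Queue(maxsize=depth)  # bounded, so the reader cannot run far ahead

    def reader():
        while True:
            block = src.read(block_size)
            blocks.put(block)            # an empty block marks end of input
            if not block:
                return

    worker = threading.Thread(target=reader, daemon=True)
    worker.start()
    copied = 0
    while True:
        block = blocks.get()
        if not block:
            break
        dst.write(block)                 # stands in for a socket send
        copied += len(block)
    worker.join()
    return copied


data = bytes(range(256)) * 1024          # 256 KB of sample data
sink = io.BytesIO()
total = overlapped_copy(io.BytesIO(data), sink)
print(total, "bytes copied")
```

With overlap, the time per block approaches the slower of the disk and network stages instead of their sum, so a large block no longer stalls the network while the disk is busy.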