Oct 13, 2015
Performance Analysis and System Tuning
Larry Woodman
D John Shakshober
Agenda: Red Hat Enterprise Linux (RHEL) Performance and Tuning
  References - valuable tuning guides/books
  Part 1 - Memory Management/File System Caching
  Part 2 - Disk and File System IO
  Part 3 - Performance Monitoring Tools
  Part 4 - Performance Tuning/Analysis
  Part 5 - Case Studies
Linux Performance Tuning References
  Alikins, "System Tuning Info for Linux Servers", http://people.redhat.com/alikins/system_tuning.html
  Axboe, J., "Deadline IO Scheduler Tunables", SuSE, EDF R&D, 2003.
  Braswell, B., Ciliendo, E., "Tuning Red Hat Enterprise Linux on IBM eServer xSeries Servers", http://www.ibm.com/redbooks
  Corbet, J., "The Continuing Development of IO Scheduling", http://lwn.net/Articles/21274.
  Ezolt, P., Optimizing Linux Performance, www.hp.com/hpbooks, Mar 2005.
  Heger, D., Pratt, S., "Workload Dependent Performance Evaluation of the Linux 2.6 IO Schedulers", Linux Symposium, Ottawa, Canada, July 2004.
  Red Hat Enterprise Linux "Performance Tuning Guide", http://people.redhat.com/dshaks/rhel3_perf_tuning.pdf
  Network, NFS performance covered in separate talks: http://nfs.sourceforge.net/nfshowto/performance.html
Memory Management
  Physical Memory (RAM) Management
    NUMA versus UMA
  Virtual Address Space Maps
    32-bit: x86 up, smp, hugemem, 1G/3G vs 4G/4G
    64-bit: x86_64, IA64
  Kernel Wired Memory
    Static boot-time
    Slabcache
    Pagetables
    HugeTLBfs
  Reclaimable User Memory
    Pagecache/Anonymous split
  Page Reclaim Dynamics
    kswapd, bdflush/pdflush, kupdated
Physical Memory (RAM) Management
  Physical Memory Layout
    NUMA Nodes
    Zones
    mem_map array
  Page lists
    Free list
    Active
    Inactive
Memory Zones
  32-bit:
    DMA Zone      0 to 16MB
    Normal Zone   16MB to 896MB or 3968MB
    Highmem Zone  896MB or 3968MB up to 64GB (PAE)
  64-bit:
    DMA Zone      0 to 16MB (or 4GB)
    Normal Zone   16MB (or 4GB) to end of RAM
Per NUMA-Node Resources
  Memory zones (DMA & Normal zones)
  CPUs
  IO/DMA capacity
  Page reclamation daemon (kswapd#)
NUMA Nodes and Zones (64-bit)
  Node 0: DMA Zone from 0 to 16MB (or 4GB), followed by a Normal Zone
  Node 1: Normal Zone, up to end of RAM
Memory Zone Utilization
  DMA: 24-bit I/O
  Normal: kernel static, kernel dynamic (slabcache, bounce buffers, driver allocations), user overflow
  Highmem (x86): user (anonymous, pagecache, pagetables)
Per-Zone Resources
  mem_map
  Free lists
  Active and inactive page lists
  Page reclamation
  Page reclamation watermarks
mem_map
  Kernel maintains a "page" struct for each 4KB (16KB on IA64) page of RAM.
  The mem_map array consumes a significant amount of lowmem at boot time.
  Page struct size:
    RHEL3 32-bit = 60 bytes
    RHEL3 64-bit = 112 bytes
    RHEL4 32-bit = 32 bytes
    RHEL4 64-bit = 56 bytes
  16GB x86 running RHEL3: 17179869184 / 4096 * 60 = ~250MB mem_map array!
  RHEL4 mem_map is only about 50% of the RHEL3 mem_map.
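The mem_map arithmetic above can be reproduced with a short sketch. The page struct sizes are the ones quoted on the slide; the function name `mem_map_bytes` and the 4KB default page size are illustrative assumptions, not kernel interfaces.

```python
PAGE_SIZE = 4096  # bytes per page on x86 (the slide notes 16KB on IA64)

# Page struct sizes in bytes, as quoted on the slide.
PAGE_STRUCT_BYTES = {
    ("RHEL3", 32): 60,
    ("RHEL3", 64): 112,
    ("RHEL4", 32): 32,
    ("RHEL4", 64): 56,
}


def mem_map_bytes(ram_bytes, release, bits, page_size=PAGE_SIZE):
    """Approximate mem_map size: one page struct per page of RAM."""
    pages = ram_bytes // page_size
    return pages * PAGE_STRUCT_BYTES[(release, bits)]


ram = 16 * 1024**3  # the 16GB x86 example from the slide
for release in ("RHEL3", "RHEL4"):
    size = mem_map_bytes(ram, release, 32)
    print(f"{release} 32-bit mem_map: {size / 1024**2:.0f} MiB")
```

For the 16GB case this gives 240 MiB (about 250 million bytes) on RHEL3 and 128 MiB on RHEL4, consistent with the "about 50%" remark.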
Per-zone Free list/buddy allocator lists
  Kernel maintains a per-zone free list.
  Buddy allocator coalesces free pages into larger physically contiguous pieces.
    DMA:     1*4kB 4*8kB 6*16kB 4*32kB 3*64kB 1*128kB 1*256kB 1*512kB 0*1024kB 1*2048kB 2*4096kB = 11588kB
    Normal:  217*4kB 207*8kB 1*16kB 1*32kB 0*64kB 1*128kB 1*256kB 1*512kB 0*1024kB 0*2048kB 0*4096kB = 3468kB
    HighMem: 847*4kB 409*8kB 17*16kB 1*32kB 1*64kB 1*128kB 1*256kB 1*512kB 0*1024kB 0*2048kB 0*4096kB = 7924kB
  Memory allocation failures
    Free list exhaustion.
    Free list fragmentation.
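As a rough illustration of how to read a buddy allocator free-list line, the sketch below sums the `count*sizekB` terms and reports the largest block size that still has free entries. Both function names are hypothetical helpers, not part of any Red Hat tool. A small largest block next to a non-trivial total is one sign of the fragmentation case mentioned above.

```python
import re

TERM = re.compile(r"(\d+)\*(\d+)kB")


def buddy_free_kb(line):
    """Sum all 'count*sizekB' terms of a buddy free-list line."""
    return sum(int(count) * int(size) for count, size in TERM.findall(line))


def largest_free_block_kb(line):
    """Largest block size (kB) that has at least one free entry."""
    sizes = [int(size) for count, size in TERM.findall(line) if int(count) > 0]
    return max(sizes) if sizes else 0


# The Normal zone line from the slide.
normal = ("217*4kB 207*8kB 1*16kB 1*32kB 0*64kB 1*128kB "
          "1*256kB 1*512kB 0*1024kB 0*2048kB 0*4096kB")
print(buddy_free_kb(normal), largest_free_block_kb(normal))
```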
Per-zone page lists
  Active - most recently referenced
    Anonymous - stack, heap, bss
    Pagecache - filesystem data
  Inactive - least recently referenced
    Dirty - modified
    Laundry - writeback in progress
    Clean - ready to free
  Free
    Coalesced buddy allocator
Virtual Address Space Maps
  32-bit
    3G/1G address space
    4G/4G address space
  64-bit
    x86_64
    IA64
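Linux 32-bit Address Spaces
  3G/1G Kernel (SMP): virtual 0GB-3GB is user space and 3GB-4GB is kernel; RAM is split into DMA, Normal and HighMem.
  4G/4G Kernel (Hugemem): user space(s) and the kernel each get their own virtual address space; RAM is split into DMA, Normal (up to 3968MB) and HighMem (above 3968MB).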
Linux 64-bit Address Space
  [Figure: x86_64 virtual address space (user and kernel) mapped onto RAM from 0 to 1TB (2^40); IA64 virtual address space shown as regions 0-7 mapped onto RAM]
Memory Pressure
  32-bit (DMA, Normal, Highmem): kernel allocations and user allocations are served from different zones.
  64-bit (DMA, Normal): kernel and user allocations share the same zones.
Kernel Memory Pressure
  Static - boot-time (DMA and Normal zones)
    Kernel text, data, BSS
    Bootmem allocator
    Tables and hashes (mem_map)
  Slabcache (Normal zone)
    Kernel data structs
    Inode cache, dentry cache and buffer header dynamics
  Pagetables (Highmem/Normal zone)
    32-bit versus 64-bit
  HugeTLBfs (Highmem/Normal zone)
    i.e. 4K pages w/ 4GB memory = 1 million TLB entries
         4M pages w/ 4GB memory = 1000 TLB entries
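The TLB entry counts quoted above follow directly from dividing the mapped memory by the page size. A minimal sketch (the helper name `pages_to_map` is an assumption for illustration):

```python
KB = 1024
MB = 1024 * KB
GB = 1024 * MB


def pages_to_map(mem_bytes, page_bytes):
    """Number of pages (and hence TLB entries) needed to map mem_bytes."""
    return mem_bytes // page_bytes


small = pages_to_map(4 * GB, 4 * KB)  # 4K pages
huge = pages_to_map(4 * GB, 4 * MB)   # 4M pages
print(small, huge, small // huge)
```

The exact values are 1048576 and 1024, which the slide rounds to "1 million" and "1000"; the large page size cuts the number of translations by a factor of 1024.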
User Memory Pressure - Anonymous/pagecache split
  [Figure: user memory divided between pagecache (grown by pagecache allocations) and anonymous memory (grown by page faults)]
PageCache/Anonymous memory split
  Pagecache memory is global and grows when filesystem data is accessed, until memory is exhausted.
  Pagecache is freed when:
    Underlying files are deleted.
    The filesystem is unmounted.
    Kswapd reclaims pagecache pages when memory is exhausted.
  Anonymous memory is private and grows on user demand:
    Allocation followed by page fault.
    Swapin.
  Anonymous memory is freed when:
    Process unmaps anonymous region or exits.
    Kswapd reclaims anonymous pages (swapout) when memory is exhausted.
  Balance between pagecache and anonymous memory:
    Dynamic.
    Controlled via /proc/sys/vm/pagecache.
32-bit Memory Reclamation (DMA, Normal, Highmem)
  Kernel allocations - kernel reclamation (kswapd):
    slabcache reaping, inode cache pruning, bufferhead freeing, dentry cache pruning
  User allocations - user reclamation (kswapd, bdflush/pdflush):
    page aging, pagecache shrinking, swapping
64-bit Memory Reclamation
  Kernel and user allocations share RAM, and kernel and user reclamation act on the same zones.
Anonymous/pagecache reclaiming
  Pagecache (grown by pagecache allocations) is reclaimed by: kswapd (bdflush, kupdated) page reclaim, deletion of a file, unmount of the filesystem.
  Anonymous memory (grown by page faults) is reclaimed by: kswapd page reclaim (swapout), unmap, exit.
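Per Node/Zone Paging Dynamics
  [Figure: user allocations take pages from FREE onto the ACTIVE list; page aging moves pages from ACTIVE to INACTIVE (dirty -> clean, via swapout/bdflush); referenced INACTIVE pages are reactivated; reclaiming and user deletions return pages to FREE]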
Part 2 - Performance Monitoring Tools
  Standard Unix OS tools
    Monitoring - cpu, memory, process, disk
    oprofile
  Kernel Tools
    /proc, info (cpu, mem, slab), dmesg, Alt-SysRq
    Profiling - nmi_watchdog=1, profile=2
  Tracing (separate summit talk)
    strace, ltrace
    dprobe, kprobe
  3rd party profiling/capacity monitoring
    Perfmon, Caliper, vtune
    SARcheck, KDE, BEA Patrol, HP Openview
Red Hat Top Tools
  CPU Tools: 1 top, 2 vmstat, 3 ps aux, 4 mpstat -P all, 5 sar -u, 6 iostat, 7 oprofile, 8 gnome-system-monitor, 9 KDE-monitor, 10 /proc
  Memory Tools: 1 top, 2 vmstat -s, 3 ps aur, 4 ipcs, 5 sar -r -B -W, 6 free, 7 oprofile, 8 gnome-system-monitor, 9 KDE-monitor, 10 /proc
  Process Tools: 1 top, 2 ps -o pmem, 3 gprof, 4 strace, ltrace, 5 sar
  Disk Tools: 1 iostat -x, 2 vmstat -D, 3 sar -DEV #, 4 nfsstat, 5 NEED MORE!
top - press h (help), m (memory), t (threads), > (column sort)
top - 09:01:04 up 8 days, 15:22, 2 users, load average: 1.71, 0.39, 0.12
Tasks: 114 total, 1 running, 113 sleeping, 0 stopped, 0 zombie
Cpu0 :  5.3% us,  2.3% sy,  0.0% ni,  0.0% id, 92.0% wa,  0.0% hi,  0.3% si
Cpu1 :  0.3% us,  0.3% sy,  0.0% ni, 89.7% id,  9.7% wa,  0.0% hi,  0.0% si
Mem:   2053860k total,  2036840k used,    17020k free,    99556k buffers
Swap:  2031608k total,      160k used,  2031448k free,   417720k cached
  PID USER    PR  NI  VIRT  RES  SHR S %CPU %MEM   TIME+  COMMAND
27830 oracle  16   0 1315m 1.2g 1.2g D  1.3 60.9  0:00.09 oracle
27802 oracle  16   0 1315m 1.2g 1.2g D  1.0 61.0  0:00.10 oracle
27811 oracle  16   0 1315m 1.2g 1.2g D  1.0 60.8  0:00.08 oracle
27827 oracle  16   0 1315m 1.2g 1.2g D  1.0 61.0  0:00.11 oracle
27805 oracle  17   0 1315m 1.2g 1.2g D  0.7 61.0  0:00.10 oracle
27828 oracle  15   0 27584 6648 4620 S  0.3  0.3  0:00.17 tpcc.exe
    1 root    16   0  4744  580  480 S  0.0  0.0  0:00.50 init
    2 root    RT   0     0    0    0 S  0.0  0.0  0:00.11 migration/0
    3 root    34  19     0    0    0 S  0.0  0.0  0:00.00 ksoftirqd/0
vmstat of IOzone to EXT3 fs, 6GB mem
#! deplete memory until pdflush turns on
procs | memory | swap | io | system | cpu
r b | swpd free buff cache | si so | bi bo | in cs | us sy wa id
200448352420052423457600546315251303096
020169784020052429314400057850482108539941221463
3001537884200524384109200193589463243144307321842
02052812020052462281720047888810177133921322246
01046140200524671373600179110719144718251303535
22050972200524670574400232119698131619710253144
....
#! now transition from write to reads
procs | memory | swap | io | system | cpu
r b | swpd free buff cache | si so | bi bo | in cs | us sy wa id
14051040200524670554400213351912658390265618
1103506420052467127240040118911136720210354223
01068264234372664702000767445420484032072073
01034468234372667801600773913416202834091872
01047320234372669035600810507717832916072073
10038756234372669834400761364420273705191972
01031472234372670653200767253316012807081973
iostat -x of same IOzone EXT3 file system
Iostat metrics
  rates per sec                        sizes and response time
  r|w rqm/s  - requests merged/s       avgrq-sz - average request size
  r|w sec/s  - 512-byte sectors/s      avgqu-sz - average queue size
  r|w KB/s   - kilobytes/s             await    - average wait time (ms)
  r|w /s     - operations/s            svctm    - average service time (ms)
Linux 2.4.21-27.0.2.ELsmp (node1)   05/09/2005
avg-cpu:  %user   %nice    %sys %iowait   %idle
           0.40    0.00    2.63    0.91   96.06
Device:   rrqm/s wrqm/s    r/s  w/s    rsec/s wsec/s    rkB/s wkB/s avgrq-sz avgqu-sz await svctm  %util
sdi     16164.60   0.00 523.40 0.00 133504.00   0.00 66752.00  0.00   255.07     1.00  1.91  1.88  98.40
sdi     17110.10   0.00 553.90 0.00 141312.00   0.00 70656.00  0.00   255.12     0.99  1.80  1.78  98.40
sdi     16153.50   0.00 522.50 0.00 133408.00   0.00 66704.00  0.00   255.33     0.98  1.88  1.86  97.00
sdi     17561.90   0.00 568.10 0.00 145040.00   0.00 72520.00  0.00   255.31     1.01  1.78  1.76 100.00
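The iostat columns are related to each other, which gives a quick sanity check when reading a sample. The sketch below recomputes the average request size (in 512-byte sectors) from the sector and operation rates of the samples above; the function names are illustrative only, and iostat itself derives the value from raw counters rather than from the printed rates.

```python
def avg_request_sectors(rsec_s, wsec_s, r_s, w_s):
    """Average request size in sectors: sectors/s divided by operations/s."""
    ops = r_s + w_s
    return (rsec_s + wsec_s) / ops if ops else 0.0


def sectors_to_kb(sectors):
    """512-byte sectors to kilobytes."""
    return sectors / 2


# (rsec/s, wsec/s, r/s, w/s) for the first three sdi samples.
samples = [
    (133504.0, 0.0, 523.4, 0.0),
    (141312.0, 0.0, 553.9, 0.0),
    (133408.0, 0.0, 522.5, 0.0),
]
for s in samples:
    print(round(avg_request_sectors(*s), 2), sectors_to_kb(s[0]))
```

An average of about 255 sectors (roughly 128KB) per request, with 16000+ merges per second, suggests that small page-sized reads are being merged into large requests before they reach the device.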
SAR
[root@localhost redhat]# sar -u 3 3
Linux 2.4.21-20.EL (localhost.localdomain)   05/16/2005
10:32:28 PM  CPU  %user  %nice  %system   %idle
10:32:31 PM  all   0.00   0.00     0.00  100.00
10:32:34 PM  all   1.33   0.00     0.33   98.33
10:32:37 PM  all   1.34   0.00     0.00   98.66
Average:     all   0.89   0.00     0.11   99.00
[root] sar -n DEV
Linux 2.4.21-20.EL (localhost.localdomain)   03/16/2005
01:10:01 PM  IFACE  rxpck/s  txpck/s  rxbyt/s  txbyt/s  rxcmp/s  txcmp/s  rxmcst/s
01:20:00 PM  lo        3.49     3.49   306.16   306.16     0.00     0.00      0.00
01:20:00 PM  eth0      3.89     3.53  2395.34   484.70     0.00     0.00      0.00
01:20:00 PM  eth1      0.00     0.00     0.00     0.00     0.00     0.00      0.00
free/numastat - memory allocation
[root@localhost redhat]# free -l
             total       used       free     shared    buffers     cached
Mem:        511368     342336     169032          0      29712     167408
Low:        511368     342336     169032          0          0          0
High:            0          0          0          0          0          0
-/+ buffers/cache:     145216     366152
Swap:      1043240          0    1043240
numastat (on 2-cpu x86_64 based system)
                  node1     node0
numa_hit        9803332  10905630
numa_miss       2049018   1609361
numa_foreign    1609361   2049018
interleave_hit    58689     54749
local_node      9770927  10880901
other_node      2081423   1634090
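One way to summarize the numastat counters is the share of a node's allocations that were actually intended for that node. This is only a sketch of one possible reading; `node_hit_ratio` is a hypothetical helper, and the counters are the ones printed above.

```python
def node_hit_ratio(numa_hit, numa_miss):
    """Fraction of pages allocated on a node that were intended for it."""
    total = numa_hit + numa_miss
    return numa_hit / total if total else 0.0


# Counters from the numastat output above.
nodes = {
    "node1": {"numa_hit": 9803332, "numa_miss": 2049018},
    "node0": {"numa_hit": 10905630, "numa_miss": 1609361},
}
for name, c in nodes.items():
    print(name, f"{node_hit_ratio(c['numa_hit'], c['numa_miss']):.1%}")
```

The sample also shows the expected symmetry on a two-node system: each node's numa_miss equals the other node's numa_foreign.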
ps, mpstat
[root@localhost root]# ps aux | more
USER  PID %CPU %MEM  VSZ  RSS TTY STAT START  TIME COMMAND
root    1  0.1  0.1 1528  516 ?   S    23:18  0:04 init
root    2  0.0  0.0    0    0 ?   SW   23:18  0:00 [keventd]
root    3  0.0  0.0    0    0 ?   SW   23:18  0:00 [kapmd]
root    4  0.0  0.0    0    0 ?   SWN  23:18  0:00 [ksoftirqd/0]
root    7  0.0  0.0    0    0 ?   SW   23:18  0:00 [bdflush]
root    5  0.0  0.0    0    0 ?   SW   23:18  0:00 [kswapd]
root    6  0.0  0.0    0    0 ?   SW   23:18  0:00 [kscand]
[root@localhost redhat]# mpstat 3 3
Linux 2.4.21-20.EL (localhost.localdomain)   05/16/2005
10:40:34 PM  CPU  %user  %nice  %system  %idle  intr/s
10:40:37 PM  all   3.00   0.00     0.00  97.00  193.67
10:40:40 PM  all   1.33   0.00     0.00  98.67  208.00
10:40:43 PM  all   1.67   0.00     0.00  98.33  196.00
Average:     all   2.00   0.00     0.00  98.00  199.22
pstree[root@dhcp8336proc]#pstreeinit atd
auditd
2*[automount]
bdflush
2*[bonoboactivati]
cannaserver
crond
cupsd
dhclient
eggcups
gconfd2
gdmbinary gdmbinary X
gnomesession sshagent
2*[gnomecalculato]
gnomepanel
gnomesettings
gnometerminal bash xchat
bash cscope bash cscope bash cscope bash cscope bash cscope bash
bash cscope bash cscope bash cscope bash cscope vi
gnomeptyhelpe
gnometerminal bash su bash pstree
bash cscope vi
gnomeptyhelpe
The /proc filesystem
  /proc
    acpi
    bus
    irq
    net
    scsi
    sys
    tty
    pid#
32-bit /proc//maps
[root@dhcp8336 proc]# cat 5808/maps
0022e000-0023b000 r-xp 00000000 03:03 4137068  /lib/tls/libpthread-0.60.so
0023b000-0023c000 rw-p 0000c000 03:03 4137068  /lib/tls/libpthread-0.60.so
0023c000-0023e000 rw-p 00000000 00:00 0
0037f000-00391000 r-xp 00000000 03:03 523285   /lib/libnsl-2.3.2.so
00391000-00392000 rw-p 00011000 03:03 523285   /lib/libnsl-2.3.2.so
00392000-00394000 rw-p 00000000 00:00 0
00c45000-00c5a000 r-xp 00000000 03:03 523268   /lib/ld-2.3.2.so
00c5a000-00c5b000 rw-p 00015000 03:03 523268   /lib/ld-2.3.2.so
00e5c000-00f8e000 r-xp 00000000 03:03 4137064  /lib/tls/libc-2.3.2.so
00f8e000-00f91000 rw-p 00131000 03:03 4137064  /lib/tls/libc-2.3.2.so
00f91000-00f94000 rw-p 00000000 00:00 0
08048000-0804f000 r-xp 00000000 03:03 1046791  /sbin/ypbind
0804f000-08050000 rw-p 00007000 03:03 1046791  /sbin/ypbind
09794000-097b5000 rw-p 00000000 00:00 0
b5fdd000-b5fde000 ---p 00000000 00:00 0
b5fde000-b69de000 rw-p 00001000 00:00 0
b69de000-b69df000 ---p 00000000 00:00 0
b69df000-b73df000 rw-p 00001000 00:00 0
b73df000-b75df000 r--p 00000000 03:03 3270410  /usr/lib/locale/locale-archive
b75df000-b75e1000 rw-p 00000000 00:00 0
bfff6000-c0000000 rw-p ffff8000 00:00 0
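To see how much address space each mapping covers, a maps line can be split into its fields and the end address subtracted from the start. The parser below is a minimal sketch that assumes the usual `start-end perms offset dev inode [path]` layout with hyphenated address ranges; `parse_maps_line` is a hypothetical helper name.

```python
def parse_maps_line(line):
    """Split one /proc/<pid>/maps line and compute the mapping size in kB."""
    fields = line.split(None, 5)
    start_s, end_s = fields[0].split("-")
    start, end = int(start_s, 16), int(end_s, 16)
    return {
        "start": start,
        "end": end,
        "size_kb": (end - start) // 1024,
        "perms": fields[1],
        "path": fields[5].strip() if len(fields) > 5 else "",
    }


anon = parse_maps_line("b5fde000-b69de000 rw-p 00001000 00:00 0")
libc = parse_maps_line(
    "00e5c000-00f8e000 r-xp 00000000 03:03 4137064  /lib/tls/libc-2.3.2.so")
print(anon["size_kb"], libc["size_kb"], libc["path"])
```

In the 32-bit listing, the two 10240kB anonymous rw-p regions, each preceded by a 4kB no-access region, have the shape one would expect from thread stacks with guard pages, although the listing itself does not say so.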
64bit/proc//maps#cat/proc/2345/maps004000000100b000rxp00000000fd:001933328/usr/sybase/ASE12_5/bin/dataserver.esd30110b00001433000rwp00c0b000fd:001933328/usr/sybase/ASE12_5/bin/dataserver.esd301433000014eb000rwxp0143300000:0004000000040001000p4000000000:0004000100040a01000rwxp4000100000:0002a95f730002a96073000p0012b000fd:00819273/lib64/tls/libc2.3.4.so2a960730002a96075000rp0012b000fd:00819273/lib64/tls/libc2.3.4.so2a960750002a96078000rwp0012d000fd:00819273/lib64/tls/libc2.3.4.so2a960780002a9607e000rwp2a9607800000:0002a9607e0002a98c3e000rws0000000000:06360450/SYSV0100401e(deleted)2a98c3e0002a98c47000rwp2a98c3e00000:0002a98c470002a98c51000rxp00000000fd:00819227/lib64/libnss_files2.3.4.so2a98c510002a98d51000p0000a000fd:00819227/lib64/libnss_files2.3.4.so2a98d510002a98d53000rwp0000a000fd:00819227/lib64/libnss_files2.3.4.so2a98d530002a98d57000rxp00000000fd:00819225/lib64/libnss_dns2.3.4.so2a98d570002a98e56000p00004000fd:00819225/lib64/libnss_dns2.3.4.so2a98e560002a98e58000rwp00003000fd:00819225/lib64/libnss_dns2.3.4.so2a98e580002a98e69000rxp00000000fd:00819237/lib64/libresolv2.3.4.so2a98e690002a98f69000p00011000fd:00819237/lib64/libresolv2.3.4.so2a98f690002a98f6b000rwp00011000fd:00819237/lib64/libresolv2.3.4.so2a98f6b0002a98f6d000rwp2a98f6b00000:00035c7e0000035c7e08000rxp00000000fd:00819469/lib64/libpam.so.0.7735c7e0800035c7f08000p00008000fd:00819469/lib64/libpam.so.0.7735c7f0800035c7f09000rwp00008000fd:00819469/lib64/libpam.so.0.7735c800000035c8011000rxp00000000fd:00819468/lib64/libaudit.so.0.0.035c801100035c8110000p00011000fd:00819468/lib64/libaudit.so.0.0.035c811000035c8118000rwp00010000fd:00819468/lib64/libaudit.so.0.0.035c900000035c900b000rxp00000000fd:00819457/lib64/libgcc_s3.4.420050721.so.135c900b00035c910a000p0000b000fd:00819457/lib64/libgcc_s3.4.420050721.so.135c910a00035c910b000rwp0000a000fd:00819457/lib64/libgcc_s3.4.420050721.so.17fbfff10007fc0000000rwxp7fbfff100000:000
/proc/meminfo
# cat /proc/meminfo
MemTotal:       514060 kB
MemFree:         23656 kB
Buffers:         53076 kB
Cached:         198344 kB
SwapCached:          0 kB
Active:         322964 kB
Inactive:        60620 kB
HighTotal:           0 kB
HighFree:            0 kB
LowTotal:       514060 kB
LowFree:         23656 kB
SwapTotal:     1044216 kB
SwapFree:      1044056 kB
Dirty:              40 kB
Writeback:           0 kB
Mapped:         168048 kB
Slab:            88956 kB
Committed_AS:   372800 kB
PageTables:       3876 kB
VmallocTotal:   499704 kB
VmallocUsed:      6848 kB
VmallocChunk:   491508 kB
HugePages_Total:     0
HugePages_Free:      0
Hugepagesize:     2048 kB
/proc/slabinfo
slabinfo - version: 2.0
biovec-128 256 260 1536 5 2 : tunables 24 12 8 : slabdata 52 52 0
biovec-64 256 260 768 5 1 : tunables 54 27 8 : slabdata 52 52 0
biovec-16 256 270 256 15 1 : tunables 120 60 8 : slabdata 18 18 0
biovec-4 256 305 64 61 1 : tunables 120 60 8 : slabdata 5 5 0
biovec-1 5906938 5907188 16 226 1 : tunables 120 60 8 : slabdata 26138 26138 0
bio 5906946 5907143 128 31 1 : tunables 120 60 8 : slabdata 190553 190553 0
file_lock_cache 7 123 96 41 1 : tunables 120 60 8 : slabdata 3 3 0
sock_inode_cache 29 63 512 7 1 : tunables 54 27 8 : slabdata 9 9 0
skbuff_head_cache 202 540 256 15 1 : tunables 120 60 8 : slabdata 36 36 0
sock 6 10 384 10 1 : tunables 54 27 8 : slabdata 1 1 0
proc_inode_cache 139 209 360 11 1 : tunables 54 27 8 : slabdata 19 19 0
sigqueue 2 27 148 27 1 : tunables 120 60 8 : slabdata 1 1 0
idr_layer_cache 82 116 136 29 1 : tunables 120 60 8 : slabdata 4 4 0
buffer_head 66027 133800 52 75 1 : tunables 120 60 8 : slabdata 1784 1784 0
mm_struct 44 70 768 5 1 : tunables 54 27 8 : slabdata 14 14 0
kmem_cache 150 150 256 15 1 : tunables 120 60 8 : slabdata 10 10 0
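A slabinfo line can be turned into an approximate memory footprint by multiplying the number of slabs by the pages per slab. The sketch below assumes the slabinfo 2.0 column order (name, active_objs, num_objs, objsize, objperslab, pagesperslab, then the tunables and slabdata groups) and a 4kB page; `slab_cache_kb` is a hypothetical helper name.

```python
PAGE_KB = 4  # assumed page size in kB (x86)


def slab_cache_kb(line, page_kb=PAGE_KB):
    """Return (cache name, approximate kB held) for one slabinfo 2.0 line."""
    head, _tunables, slabdata = line.split(":")
    name, _active, _num, _objsize, _objperslab, pagesperslab = head.split()
    _active_slabs, num_slabs, _shared = slabdata.split()[1:]
    return name, int(num_slabs) * int(pagesperslab) * page_kb


lines = [
    "bio 5906946 5907143 128 31 1 : tunables 120 60 8 : slabdata 190553 190553 0",
    "biovec-1 5906938 5907188 16 226 1 : tunables 120 60 8 : slabdata 26138 26138 0",
    "buffer_head 66027 133800 52 75 1 : tunables 120 60 8 : slabdata 1784 1784 0",
]
for entry in lines:
    name, kb = slab_cache_kb(entry)
    print(f"{name:12s} {kb:8d} kB")
```

Under these assumptions the bio cache alone accounts for about 744MB, far more than any other cache in the listing.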
Alt-SysRq-M - RHEL3/UMA
SysRq : Show Memory
Mem-info:
Zone:DMA freepages: 2929 min: 0 low: 0 high: 0
Zone:Normal freepages: 1941 min: 510 low: 2235 high: 3225
Zone:HighMem freepages: 0 min: 0 low: 0 high: 0
Free pages: 4870 (0 HighMem)
( Active: 72404/13523, inactive_laundry: 2429, inactive_clean: 1730, free: 4870 )
  aa:0 ac:0 id:0 il:0 ic:0 fr:2929
  aa:46140 ac:26264 id:13523 il:2429 ic:1730 fr:1941
  aa:0 ac:0 id:0 il:0 ic:0 fr:0
1*4kB 4*8kB 2*16kB 2*32kB 1*64kB 2*128kB 2*256kB 1*512kB 0*1024kB 1*2048kB 2*4096kB = 11716kB)
1255*4kB 89*8kB 5*16kB 1*32kB 0*64kB 1*128kB 1*256kB 1*512kB 1*1024kB 0*2048kB 0*4096kB = 7764kB)
Swap cache: add 958119, delete 918749, find 4611302/5276354, race 0+1
27234 pages of slabcache
244 pages of kernel stacks
1303 lowmem pagetables, 0 highmem pagetables
0 bounce buffer pages, 0 are on the emergency list
Free swap: 598960kB
130933 pages of RAM
0 pages of HIGHMEM
3497 reserved pages
34028 pages shared
39370 pages swap cached
Alt-SysRq-M - RHEL3/NUMA
SysRq : Show Memory
Mem-info:
Zone:DMA freepages: 0 min: 0 low: 0 high: 0
Zone:Normal freepages: 369423 min: 1022 low: 6909 high: 9980
Zone:HighMem freepages: 0 min: 0 low: 0 high: 0
Zone:DMA freepages: 2557 min: 0 low: 0 high: 0
Zone:Normal freepages: 494164 min: 1278 low: 9149 high: 13212
Zone:HighMem freepages: 0 min: 0 low: 0 high: 0
Free pages: 866144 (0 HighMem)
( Active: 9690/714, inactive_laundry: 764, inactive_clean: 35, free: 866144 )
  aa:0 ac:0 id:0 il:0 ic:0 fr:0
  aa:746 ac:2811 id:188 il:220 ic:0 fr:369423
  aa:0 ac:0 id:0 il:0 ic:0 fr:0
  aa:0 ac:0 id:0 il:0 ic:0 fr:2557
  aa:1719 ac:4414 id:526 il:544 ic:35 fr:494164
  aa:0 ac:0 id:0 il:0 ic:0 fr:0
2497*4kB 1575*8kB 902*16kB 515*32kB 305*64kB 166*128kB 96*256kB 56*512kB 39*1024kB 30*2048kB 300*4096kB = 1477692kB)
Swap cache: add 288168, delete 285993, find 726/2075, race 0+0
4059 pages of slabcache
146 pages of kernel stacks
388 lowmem pagetables, 638 highmem pagetables
Free swap: 1947848kB
917496 pages of RAM
869386 free pages
30921 reserved pages
21927 pages shared
2175 pages swap cached
Buffer memory: 9752kB
Cache memory: 34192kB
CLEAN: 696 buffers, 2772 kbyte, 51 used (last=696), 0 locked, 0 dirty 0 delay
DIRTY: 4 buffers, 16 kbyte, 4 used (last=4), 0 locked, 3 dirty 0 delay
Alt-SysRq-M - RHEL4/UMA
SysRq : Show Memory
Mem-info:
Free pages: 20128kB (0kB HighMem)
Active:72109 inactive:27657 dirty:1 writeback:0 unstable:0 free:5032 slab:19306 mapped:41755 pagetables:945
DMA free:12640kB min:20kB low:40kB high:60kB active:0kB inactive:0kB present:16384kB pages_scanned:847 all_unreclaimable? yes
protections[]: 0 0 0
Normal free:7488kB min:688kB low:1376kB high:2064kB active:288436kB inactive:110628kB present:507348kB pages_scanned:0 all_unreclaimable? no
protections[]: 0 0 0
HighMem free:0kB min:128kB low:256kB high:384kB active:0kB inactive:0kB present:0kB pages_scanned:0 all_unreclaimable? no
protections[]: 0 0 0
DMA: 4*4kB 4*8kB 3*16kB 4*32kB 4*64kB 1*128kB 1*256kB 1*512kB 1*1024kB 1*2048kB 2*4096kB = 12640kB
Normal: 1052*4kB 240*8kB 39*16kB 3*32kB 0*64kB 1*128kB 0*256kB 1*512kB 0*1024kB 0*2048kB 0*4096kB = 7488kB
HighMem: empty
Swap cache: add 52, delete 52, find 3/5, race 0+0
Free swap: 1044056kB
130933 pages of RAM
0 pages of HIGHMEM
2499 reserved pages
71122 pages shared
0 pages swap cached
Alt-SysRq-M - RHEL4/NUMA
Free pages: 16724kB (0kB HighMem)
Active:236461 inactive:254776 dirty:11 writeback:0 unstable:0 free:4181 slab:13679 mapped:34073 pagetables:853
Node 1 DMA free:0kB min:0kB low:0kB high:0kB active:0kB inactive:0kB present:0kB pages_scanned:0 all_unreclaimable? no
protections[]: 0 0 0
Node 1 Normal free:2784kB min:1016kB low:2032kB high:3048kB active:477596kB inactive:508444kB present:1048548kB pages_scanned:0 all_unreclaimable? no
protections[]: 0 0 0
Node 1 HighMem free:0kB min:128kB low:256kB high:384kB active:0kB inactive:0kB present:0kB pages_scanned:0 all_unreclaimable? no
protections[]: 0 0 0
Node 0 DMA free:11956kB min:12kB low:24kB high:36kB active:0kB inactive:0kB present:16384kB pages_scanned:1050 all_unreclaimable? yes
protections[]: 0 0 0
Node 0 Normal free:1984kB min:1000kB low:2000kB high:3000kB active:468248kB inactive:510660kB present:1032188kB pages_scanned:0 all_unreclaimable? no
protections[]: 0 0 0
Node 0 HighMem free:0kB min:128kB low:256kB high:384kB active:0kB inactive:0kB present:0kB pages_scanned:0 all_unreclaimable? no
protections[]: 0 0 0
Node 1 DMA: empty
Node 1 Normal: 0*4kB 0*8kB 30*16kB 10*32kB 1*64kB 1*128kB 1*256kB 1*512kB 1*1024kB 0*2048kB 0*4096kB = 2784kB
Node 1 HighMem: empty
Node 0 DMA: 5*4kB 4*8kB 4*16kB 2*32kB 2*64kB 3*128kB 2*256kB 1*512kB 0*1024kB 1*2048kB 2*4096kB = 11956kB
Node 0 Normal: 0*4kB 0*8kB 0*16kB 0*32kB 1*64kB 1*128kB 1*256kB 1*512kB 1*1024kB 0*2048kB 0*4096kB = 1984kB
Node 0 HighMem: empty
Swap cache: add 44, delete 44, find 0/0, race 0+0
Free swap: 2031432kB
524280 pages of RAM
10951 reserved pages
363446 pages shared
0 pages swap cached
Alt-SysRq-T
bash R current 0 1609 1606 (NOTLB)
Call Trace:
  [] snprintf [kernel] 0x27 (0xdb3c5e90)
  [] call_console_drivers [kernel] 0x63 (0xdb3c5eb4)
  [] printk [kernel] 0x153 (0xdb3c5eec)
  [] printk [kernel] 0x153 (0xdb3c5f00)
  [] show_trace [kernel] 0xd9 (0xdb3c5f0c)
  [] show_trace [kernel] 0xd9 (0xdb3c5f14)
  [] show_state [kernel] 0x62 (0xdb3c5f24)
  [] __handle_sysrq_nolock [kernel] 0x7a (0xdb3c5f38)
  [] handle_sysrq [kernel] 0x5d (0xdb3c5f58)
  [] write_sysrq_trigger [kernel] 0x53 (0xdb3c5f7c)
  [] sys_write [kernel] 0x97 (0xdb3c5f94)
* This output can get BIG; it is logged in /var/log/messages.
Kernel profiling
1. Enable kernel profiling.
   On the kernel boot line add profile=2 nmi_watchdog=1, i.e.
     kernel /vmlinuz-2.6.9-28.EL.smp ro profile=2 nmi_watchdog=1 root=0805
   then reboot.
2. Create and run a shell script containing the following lines:
     #!/bin/sh
     while /bin/true; do
       echo; date
       /usr/sbin/readprofile -v | sort -nr +2 | head -15
       /usr/sbin/readprofile -r
       sleep 5
     done
Kernel profiling - sample output
[root tiobench]# more rhel4_read_64k_prof.log
Fri Jan 28 08:59:19 EST 2005
0000000000000000 total                   239423   0.1291
ffffffff8010e3a0 do_arch_prctl           238564 213.0036
ffffffff80130540 del_timer                   95   0.5398
ffffffff80115940 read_ldt                    50   0.6250
ffffffff8015d21c .text.lock.shmem            44   0.1048
ffffffff8023e480 md_do_sync                  40   0.0329
ffffffff801202f0 scheduler_tick              38   0.0279
ffffffff80191cf0 dma_read_proc               30   0.2679
ffffffff801633b0 get_unused_buffer_head      25   0.0919
ffffffff801565d0 rw_swap_page_nolock         25   0.0822
ffffffff8023d850 status_unused               24   0.1500
ffffffff80153450 scan_active_list            24   0.0106
ffffffff801590a0 try_to_unuse                23   0.0288
ffffffff80192070 read_profile                22   0.0809
ffffffff80191f80 swaps_read_proc             18   0.1607
Linux 2.6.9-5.ELsmp (perf1.lab.boston.redhat.com)   01/28/2005
Command used: /usr/sbin/readprofile -v | sort -nr +2 | head -15
oprofile - built into RHEL4 (smp)
  opcontrol - on/off data
    --start   start collection
    --stop    stop collection
    --dump    output to disk
    --event=:name:count
  Example:
    # opcontrol --start
    # /bin/time test1
    # sleep 60
    # opcontrol --stop
    # opcontrol --dump
  opreport - analyze profile
    -r  reverse order sort
    -t [percentage]  threshold to view
    -f /path/filename
    -d  details
  opannotate
    -s /path/source
    -a /path/assembly
oprofile - opcontrol and opreport cpu_cycles
# vmlinux-2.6.9-prep
CPU: Itanium 2, speed 1300 MHz (estimated)
Counted CPU_CYCLES events (CPU Cycles) with a unit mask of 0x00 (No unit mask) count 100000
samples   %        image name   app name   symbol name
9093689   68.9674  vmlinux      vmlinux    default_idle
969885     7.3557  vmlinux      reread     _spin_unlock_irq
744445     5.6459  vmlinux      reread     _spin_unlock_irqrestore
420103     3.1861  vmlinux      vmlinux    _spin_unlock_irqrestore
146413     1.1104  vmlinux      reread     __blockdev_direct_IO
74918      0.5682  vmlinux      vmlinux    _spin_unlock_irq
65213      0.4946  vmlinux      reread     kmem_cache_alloc
59453      0.4509  vmlinux      vmlinux    dio_bio_complete
58636      0.4447  vmlinux      reread     mempool_alloc
56675      0.4298  scsi_mod.ko  reread     scsi_decide_disposition
53965      0.4093  vmlinux      reread     dio_bio_complete
53079      0.4026  vmlinux      reread     bio_check_pages_dirty
53035      0.4022  vmlinux      vmlinux    bio_check_pages_dirty
47430      0.3597  vmlinux      vmlinux    __end_that_request_first
47263      0.3584  vmlinux      reread     get_request
43383      0.3290  vmlinux      reread     __end_that_request_first
40251      0.3053  qla2xxx.ko   reread     qla2xxx_get_port_name
35919      0.2724  scsi_mod.ko  reread     __scsi_device_lookup
35564      0.2697  vmlinux      reread     aio_read_evt
32830      0.2490  vmlinux      reread     kmem_cache_free
32738      0.2483  scsi_mod.ko  scsi_mod   scsi_remove_host
Red Hat Confidential
Open source project: http://oprofile.sourceforge.net
Upstream; Red Hat contributes
Originally modeled after DEC Continuous Profiling Infrastructure (DCPI)
System-wide profiler (both kernel and user code)
Sample-based profiler with SMP machine support
Performance monitoring hardware support
Relatively low overhead, typically
Profiling Tools: SystemTap
  Open Source project (started 01/05)
    Collaboration between Red Hat, Intel, and IBM
    Linux answer to Solaris DTrace
  A tool to take a deeper look into a running system:
    Provides insight into system operation
    Assists in identifying causes of performance problems
    Simplifies building instrumentation
  Current snapshots available from: http://sources.redhat.com/systemtap
  Scheduled for inclusion in Red Hat Enterprise Linux Update 2 (Fall 2005)
    X86, X86-64, PPC64, Itanium2
  Processing pipeline: probe script + probe-set library -> parse -> elaborate -> translate to C, compile* (probe kernel object) -> load module, start probe -> extract output, unload -> probe output
    * Solaris DTrace is interpretive
How to tune Linux
  Capacity tuning
    Fixed by adding resources: CPU, memory, disk, network
  Performance Tuning Methodology
    1) Document config
    2) Baseline results
    3) While results non-optimal
       a) Monitor/instrument system/workload
       b) Apply tuning, 1 change at a time
       c) Analyze results, exit or loop
    4) Document final config
Part 3 - General System Tuning
Tuning - how to set kernel parameters
/proc
  [root@hairball fs]# cat /proc/sys/kernel/sysrq
  0
  [root@hairball fs]# echo 1 > /proc/sys/kernel/sysrq
  [root@hairball fs]# cat /proc/sys/kernel/sysrq
  1
Sysctl command
  [root@hairball fs]# sysctl kernel.sysrq
  kernel.sysrq = 0
  [root@hairball fs]# sysctl -w kernel.sysrq=1
  kernel.sysrq = 1
  [root@hairball fs]# sysctl kernel.sysrq
  kernel.sysrq = 1
Edit the /etc/sysctl.conf file
  # Kernel sysctl configuration file for Red Hat Linux
  # Controls the System Request debugging functionality of the kernel
  kernel.sysrq = 1
Use graphical tool /usr/bin/redhat-config-proc
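The /proc path and the sysctl name are two spellings of the same parameter: the dots in the sysctl name correspond to directory separators under /proc/sys. A minimal sketch of that mapping, with hypothetical helper names, assuming parameter names that contain no dots of their own:

```python
PROC_SYS = "/proc/sys/"


def sysctl_to_proc_path(name):
    """kernel.sysrq -> /proc/sys/kernel/sysrq"""
    return PROC_SYS + name.replace(".", "/")


def proc_path_to_sysctl(path):
    """/proc/sys/kernel/sysrq -> kernel.sysrq"""
    if not path.startswith(PROC_SYS):
        raise ValueError(f"not under {PROC_SYS}: {path}")
    return path[len(PROC_SYS):].replace("/", ".")


print(sysctl_to_proc_path("kernel.sysrq"))
print(proc_path_to_sysctl("/proc/sys/vm/overcommit_memory"))
```

Writing to the /proc file or using sysctl -w changes the running kernel only; the /etc/sysctl.conf entry is what makes the setting persist across reboots.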
Capacity Tuning
  Memory
    /proc/sys/vm/overcommit_memory
    /proc/sys/vm/overcommit_ratio
    /proc/sys/vm/max_map_count
    /proc/sys/vm/nr_hugepages
  Kernel
    /proc/sys/kernel/msgmax
    /proc/sys/kernel/msgmnb
    /proc/sys/kernel/msgmni
    /proc/sys/kernel/shmall
    /proc/sys/kernel/shmmax
    /proc/sys/kernel/shmmni
    /proc/sys/kernel/threads-max
  Filesystems
    /proc/sys/fs/aio-max-nr
    /proc/sys/fs/file-max
OOM kills - swap space exhaustion
Mem-info:
Zone:DMA freepages: 975 min: 1039 low: 1071 high: 1103
Zone:Normal freepages: 126 min: 255 low: 1950 high: 2925
Zone:HighMem freepages: 0 min: 0 low: 0 high: 0
Free pages: 1101 (0 HighMem)
( Active: 118821/401, inactive_laundry: 0, inactive_clean: 0, free: 1101 )
  aa:1938 ac:18 id:44 il:0 ic:0 fr:974
  aa:115717 ac:1148 id:357 il:0 ic:0 fr:126
  aa:0 ac:0 id:0 il:0 ic:0 fr:0
6*4kB 0*8kB 0*16kB 1*32kB 0*64kB 0*128kB 1*256kB 1*512kB 1*1024kB 1*2048kB 0*4096kB = 3896kB)
0*4kB 1*8kB 1*16kB 1*32kB 1*64kB 1*128kB 1*256kB 0*512kB 0*1024kB 0*2048kB 0*4096kB = 504kB)
Swap cache: add 620870, delete 620870, find 762437/910181, race 0+200
2454 pages of slabcache
484 pages of kernel stacks
2008 lowmem pagetables, 0 highmem pagetables
Free swap: 0kB
129008 pages of RAM
0 pages of HIGHMEM
3045 reserved pages
4009 pages shared
0 pages swap cached
OOM kills - lowmem consumption
Mem-info:
Zone:DMA freepages: 2029 min: 0 low: 0 high: 0
Zone:Normal freepages: 1249 min: 1279 low: 4544 high: 6304
Zone:HighMem freepages: 746 min: 255 low: 29184 high: 43776
Free pages: 4024 (746 HighMem)
( Active: 703448/665000, inactive_laundry: 99878, inactive_clean: 99730, free: 4024 )
  aa:0 ac:0 id:0 il:0 ic:0 fr:2029
  aa:128 ac:3346 id:113 il:240 ic:0 fr:1249
  aa:545577 ac:154397 id:664813 il:99713 ic:99730 fr:746
1*4kB 0*8kB 1*16kB 1*32kB 0*64kB 1*128kB 1*256kB 1*512kB 1*1024kB 1*2048kB 1*4096kB = 8116kB)
543*4kB 35*8kB 77*16kB 1*32kB 0*64kB 0*128kB 1*256kB 0*512kB 1*1024kB 0*2048kB 0*4096kB = 4996kB)
490*4kB 2*8kB 1*16kB 1*32kB 1*64kB 1*128kB 1*256kB 1*512kB 0*1024kB 0*2048kB 0*4096kB = 2984kB)
Swap cache: add 4327, delete 4173, find 190/1057, race 0+0
178558 pages of slabcache
1078 pages of kernel stacks
0 lowmem pagetables, 233961 highmem pagetables
Free swap: 8189016kB
2097152 pages of RAM
1801952 pages of HIGHMEM
103982 reserved pages
115582774 pages shared
154 pages swap cached
Out of Memory: Killed process 27100 (oracle).
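The two OOM dumps differ in a way that can be checked mechanically: in the first, free swap is 0kB; in the second, swap is almost untouched but the Normal zone has fewer free pages than its min watermark. The sketch below encodes that reading as a simplified rule of thumb; `likely_oom_cause` and its labels are illustrative assumptions, not kernel logic.

```python
def likely_oom_cause(normal_free, normal_min, free_swap_kb):
    """Rough classification of a RHEL3-style Alt-SysRq-M dump taken at OOM time."""
    if free_swap_kb == 0:
        return "swap space exhaustion"
    if normal_free < normal_min:
        return "lowmem consumption"
    return "undetermined"


# Values taken from the two dumps above (Normal zone freepages/min, Free swap).
first = likely_oom_cause(normal_free=126, normal_min=255, free_swap_kb=0)
second = likely_oom_cause(normal_free=1249, normal_min=1279, free_swap_kb=8189016)
print(first, "/", second)
```

In the second dump the 178558 pages of slabcache (roughly 700MB at 4kB per page) are a likely contributor to the Normal zone shortage, since slabcache lives in the Normal zone.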
Performance Tuning - VM (RHEL3)
  /proc/sys/vm/bdflush
  /proc/sys/vm/pagecache
  /proc/sys/vm/inactive_clean_percent
  /proc/sys/vm/page-cluster
  /proc/sys/vm/kscand_work_percent
  Swap device location
  Kernel selection
    x86 smp
    x86 Hugemem
    x86_64 numa
RHEL3 /proc/sys/vm/bdflush
  int nfract;              /* Percentage of buffer cache dirty to activate bdflush */
  int ndirty;              /* Maximum number of dirty blocks to write out per wake-cycle */
  int dummy2;              /* old "nrefill" */
  int dummy3;              /* unused */
  int interval;            /* jiffies delay between kupdate flushes */
  int age_buffer;          /* Time for normal buffer to age before we flush it */
  int nfract_sync;         /* Percentage of buffer cache dirty to activate bdflush synchronously */
  int nfract_stop_bdflush; /* Percentage of buffer cache dirty to stop bdflush */
  int dummy5;              /* unused */
Example:
  Settings for server with ample IO config (default r3 geared for ws)
  sysctl -w vm.bdflush="50 5000 0 0 200 5000 3000 60 20 0"
RHEL3 /proc/sys/vm/pagecache
  pagecache.minpercent
    Lower limit for pagecache page reclaiming.
    Kswapd will stop reclaiming pagecache pages below this percent of RAM.
  pagecache.borrowpercent
    Kswapd attempts to keep the pagecache at this percent of RAM.
  pagecache.maxpercent
    Upper limit for pagecache page reclaiming.
    RHEL2.1: hard limit, pagecache will not grow above this percent of RAM.
    RHEL3: kswapd only reclaims pagecache pages above this percent of RAM.
    Increasing maxpercent will increase swapping.
  Example: echo 1 10 50 > /proc/sys/vm/pagecache
Performance Tuning - VM (RHEL4)
  /proc/sys/vm/swappiness
  /proc/sys/vm/dirty_ratio
  /proc/sys/vm/dirty_background_ratio
  /proc/sys/vm/vfs_cache_pressure
  /proc/sys/vm/lower_zone_protection
  Swap device location
  Kernel selection
    x86 smp
    x86 Hugemem
    x86_64 numa
Kernel selection (16GB x86 running SMP)
Zone:DMA freepages: 2207 min: 0 low: 0 high: 0
Zone:Normal freepages: 484 min: 1279 low: 4544 high: 6304
Zone:HighMem freepages: 266 min: 255 low: 61952 high: 92928
Free pages: 2957 (266 HighMem)
( Active: 245828/1297300, inactive_laundry: 194673, inactive_clean: 194668, free: 2957 )
  aa:0 ac:0 id:0 il:0 ic:0 fr:2207
  aa:630 ac:1009 id:189 il:233 ic:0 fr:484
  aa:195237 ac:48952 id:1297057 il:194493 ic:194668 fr:266
1*4kB 1*8kB 1*16kB 1*32kB 1*64kB 0*128kB 0*256kB 1*512kB 0*1024kB 0*2048kB 2*4096kB = 8828kB)
48*4kB 8*8kB 97*16kB 4*32kB 0*64kB 0*128kB 0*256kB 0*512kB 0*1024kB 0*2048kB 0*4096kB = 1936kB)
12*4kB 1*8kB 1*16kB 1*32kB 1*64kB 1*128kB 1*256kB 1*512kB 0*1024kB 0*2048kB 0*4096kB = 1064kB)
Swap cache: add 3838024, delete 3808901, find 107105/1540587, race 0+2
138138 pages of slabcache
1100 pages of kernel stacks
0 lowmem pagetables, 37046 highmem pagetables
Free swap: 3986092kB
4194304 pages of RAM
3833824 pages of HIGHMEM
x86_64 numa
  aa:0 ac:0 id:0 il:0 ic:0 fr:0
  aa:901913 ac:1558 id:61553 il:11534 ic:6896 fr:10539
  aa:0 ac:0 id:0 il:0 ic:0 fr:0
  aa:0 ac:0 id:0 il:0 ic:0 fr:0
  aa:867678 ac:879 id:100296 il:19880 ic:10183 fr:17178
  aa:0 ac:0 id:0 il:0 ic:0 fr:0
  aa:0 ac:0 id:0 il:0 ic:0 fr:0
  aa:869084 ac:1449 id:100926 il:18792 ic:11396 fr:14445
  aa:0 ac:0 id:0 il:0 ic:0 fr:0
  aa:0 ac:0 id:0 il:0 ic:0 fr:2617
  aa:769 ac:2295 id:256 il:2 ic:825 fr:861136
  aa:0 ac:0 id:0 il:0 ic:0 fr:0
Swap cache: add 2633120, delete 2553093
CPU Scheduler
  Recognizes differences between logical and physical processors
    i.e. multi-core, hyperthreaded & chips/sockets
  Optimizes process scheduling to take advantage of shared on-chip cache, and NUMA memory nodes
  Implements multilevel run queues for sockets and cores (as opposed to one run queue per processor or per system)
    Strong CPU affinity avoids task bouncing
    Requires system BIOS to report CPU topology correctly
  [Figure: Scheduler Compute Queues - processes queued per socket (Socket 0, 1, 2), per core (Core 0, Core 1) and per thread (Thread 0, Thread 1)]
NUMA considerations
  Red Hat Enterprise Linux 4 provides improved NUMA support over version 3
  Goal: to locate application pages in low latency memory (local to CPU)
  AMD64, Itanium2
  Enabled by default (or boot command line NUMA=[on,off])
  numactl to set up NUMA behavior
  Used by latest TPC/H benchmark (>5% gain)
  [Figure: RHEL4 U2, HP L585, 4 dual-core AMD64, McCalpin Stream Copy b(x)=a(x); series: Copy numa=off, Copy numa=on, % gain numa/non-numa; x axis 1, 2, 4, 8]
Disk IO
  iostack lun limits
    RHEL3: 255 in SCSI stack
    RHEL4: 2**20, 18k useful with Fibre Channel
  /proc/scsi tuning
    queue depth tuning per lun
    edit R3 /etc/modules.conf, R4 modprobe.conf
  IRQ distribution default, smp affinity mask
    echo 03 > /proc/irq//smp_affinity
  Scalability
    Luns tested up to 64 luns [email protected]/s, 74k IO/sec
    Nodes tested up to 20 nodes w/ DIO
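The value written to smp_affinity is a hexadecimal bitmask with one bit per CPU, so the 03 in the example above selects CPUs 0 and 1. The following sketch converts between CPU lists and masks; both helper names are hypothetical.

```python
def cpus_to_affinity_mask(cpus):
    """CPU numbers -> hex mask string suitable for smp_affinity."""
    mask = 0
    for cpu in cpus:
        mask |= 1 << cpu
    return format(mask, "x")


def affinity_mask_to_cpus(mask_hex):
    """Hex mask string -> sorted list of CPU numbers."""
    mask = int(mask_hex, 16)
    return [cpu for cpu in range(mask.bit_length()) if mask & (1 << cpu)]


print(cpus_to_affinity_mask([0, 1]))  # mask for CPUs 0 and 1
print(affinity_mask_to_cpus("03"))    # CPUs selected by the slide's example
```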
Asynchronous I/O to File Systems
  Allows application to continue processing while I/O is in progress
    Eliminates synchronous I/O stall
    Critical for I/O intensive server applications
  Red Hat Enterprise Linux feature since 2002
    Support for RAW devices only
  With Red Hat Enterprise Linux 4, significant improvement:
    Support for Ext3, NFS, GFS file system access
    Supports Direct I/O (e.g. database applications)
    Makes benchmark results more appropriate for real-world comparisons
  [Figure: Synchronous I/O - the application issues an I/O request to the device driver and stalls until I/O completion. Asynchronous I/O - the application issues an I/O request and continues processing, with no stall for completion]
[Figure: R4 U2 FC AIO Read - MB/sec (0-160) vs. number of aios (1, 2, 4, 8, 16, 32, 64) for 4k, 8k, 16k, 32k and 64k transfers]
[Figure: R4 U2 FC AIO Write Perf - MB/sec (0-180) vs. number of aios (1, 2, 4, 8, 16, 32, 64) for 4k, 8k, 16k, 32k and 64k transfers]
Performance Tuning - DISK RHEL3
[root@dhcp8336 sysctl]# /sbin/elvtune /dev/hda
/dev/hda elevator ID 0
        read_latency:       2048
        write_latency:      8192
        max_bomb_segments:  6
[root@dhcp8336 sysctl]# /sbin/elvtune -r 1024 -w 2048 /dev/hda
/dev/hda elevator ID 0
        read_latency:       1024
        write_latency:      2048
        max_bomb_segments:  6
Disk IO tuning - RHEL4
  RHEL4: 4 tunable I/O schedulers
    CFQ - elevator=cfq. Completely Fair Queuing: default, balanced, fair for multiple luns, adaptors, smp servers
    NOOP - elevator=noop. No operation in kernel: simple, low cpu overhead, leave optimization to ramdisk, raid controller etc.
    Deadline - elevator=deadline. Optimize for run-time-like behavior, low latency per IO, balance issues with large IO luns/controllers
    Anticipatory - elevator=as. Inserts delays to help stack aggregate IO, best on system w/ limited physical IO - SATA
  Set at boot time on command line
File Systems
  Separate swap and busy partitions etc.
  EXT2/EXT3 - separate talk: http://www.redhat.com/support/wpapers/redhat/ext3/*.html
    tune2fs or mount options
      data=ordered - only metadata journaled
      data=journal - both metadata and data journaled
      data=writeback - use with care!
    Set up default block size at mkfs: -b XX
  RHEL4 EXT3 improves performance
    Scalability up to 5M files/system
    Sequential write by using block reservations
    Increase file system up to 8TB
  GFS - global file system - cluster file system
Part 4: RHEL3 vs RHEL4 Performance Case Study
Scheduler: O(1), taskset
IOzone RHEL3/4
  EXT3
  GFS
  NFS
OLTP Oracle 10G
  o_direct, async IO, hugemem/hugepages
  RHEL IO elevators
IOzone Benchmark
http://www.iozone.org/
IOzone is a filesystem benchmark tool. The benchmark tests file I/O performance for the following operations:
  Write, re-write, random write
  Read, re-read, random read, read backwards, read strided, pread
  fread, fwrite, mmap, aio_read, aio_write
IOzone Sample Output
[Chart: RHEL4 Ext3 Seq Write, 100 MB. Axes: transfer size (bytes), file size (k), bandwidth (KB/sec, 0 to 60000 in bands of 10000).]
Understanding IOzone Results
GeoMeans per category are statistically meaningful.
Understand HW setup
  Disk, RAID, HBA, PCI
Layout file systems
  LVM or MD devices
  Partitions w/ fdisk
Baseline raw IO with DD/DT
EXT3 perf w/ IOzone
  In-cache: file sizes which fit in memory, goal -> 90% of memory BW
  Out-of-cache: file sizes more than 2x memory size
  O_DIRECT: 95% of raw
Global File System (GFS) goal --> 90-95% of local EXT3
Use raw command
  fdisk /dev/sdX
  raw /dev/raw/rawX /dev/sdX1
  dd if=/dev/raw/rawX of=/dev/null bs=64k
Mount file system
  mkfs -t ext3 /dev/sdX1
  mount -t ext3 /dev/sdX1 /perf1
IOzone commands
  iozone -a -f /perf1/t1 (in-cache)
  iozone -a -I -f /perf1/t1 (w/ dio)
  iozone -s 2xmem -f /perf1/t1 (big)
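The "2xmem" placeholder in the out-of-cache run has to be turned into a real size. A small sketch of one way to do that, assuming MemTotal from /proc/meminfo is an acceptable measure of memory size; the helper names out_of_cache_kb and iozone_big_cmd are hypothetical, and the second one only prints the command rather than running it.

```shell
#!/bin/sh
# out_of_cache_kb [meminfo-file]
# Print twice the MemTotal value (in kB) from a /proc/meminfo-style file.
out_of_cache_kb() {
    meminfo="${1:-/proc/meminfo}"
    mem_kb=$(awk '/^MemTotal:/ { print $2 }' "$meminfo")
    echo $((mem_kb * 2))
}

# iozone_big_cmd <test-file> [meminfo-file]
# Print (do not run) an iozone command whose -s file size is 2x memory.
iozone_big_cmd() {
    size_kb=$(out_of_cache_kb "$2")
    echo "iozone -s ${size_kb}k -f $1"
}
```

For example, iozone_big_cmd /perf1/t1 prints a command line sized for the machine it runs on, which can be reviewed before it is executed.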
NFS vs EXT3 Comparison
[Chart: IOzone cached, R4 U2 EXT3 vs NFS. GeoMean, 1mb-4gb files, 1k-1m transfers. Categories: Fwrite, Re-fwrite, Fread, Re-fread, Overall GeoMean. Left axis: 0 to 2000000. Right axis: 0.0% to 120.0%. Series: R4_U2 EXT3, R4_U2_NFS, %Diff.]
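The GeoMean figures in these comparisons summarize many IOzone results (one per file size and transfer size) in a single number per operation. As a rough sketch of that calculation, assuming the individual throughput values have already been extracted one per line, the geometric mean is exp(mean(log(x))); the function name geomean is illustrative.

```shell
#!/bin/sh
# geomean: read one positive number per line on stdin and print the
# geometric mean, exp(mean(log(x))), to two decimal places.
# Lines whose first field is not positive are skipped.
geomean() {
    awk '$1 > 0 { sum += log($1); n++ }
         END { if (n > 0) printf "%.2f\n", exp(sum / n) }'
}
```

Unlike an arithmetic mean, the geometric mean is not dominated by the few very large in-cache results, which is one reason it suits a sweep that spans 1k to 1m transfers.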
GFS vs EXT3 Iozone Comparison
[Chart: IOzone cached, R4 U2 EXT3 vs GFS. GeoMean, 1mb-4gb files, 1k-1m transfers. Categories: Fwrite, Re-fwrite, Fread, Re-fread, Overall GeoMean. Left axis: 0 to 2000000. Right axis: 88.0% to 98.0%. Series: R4_U2, R4_U2_GFS, %Diff.]
Using IOzone w/ o_direct to mimic a database
Problem:
  File systems use memory for file cache
  Databases use memory for database cache
  Users want a file system for management outside database access (copy, backup, etc.)
  You DON'T want BOTH to cache.
Solution:
  File systems that support Direct IO
  Open files with the o_direct option
  Databases which support Direct IO (ORACLE)
  NO DOUBLE CACHING!
NFS vs EXT3 DIO Iozone Comparison
[Chart: IOzone (DIO), R4 U2 EXT3 vs NFS. GeoMean, 1mb-4gb files, 1k-1m transfers. Categories: Writer, Re-writer, Reader, Re-reader, Random Read, Random Write, Backward Read, Record Rewrite, Stride Read, Overall GeoMean. Left axis: 0 to 100000. Right axis: 0.0% to 70.0%. Series: R4_U2 EXT3, R4_U2_NFS, %Diff.]
GFS: Global Cluster File System
GFS: separate summit talk
  V6.0 shipping in RHEL3
  V6.1 ships w/ RHEL4 U1
Hint at GFS performance in RHEL3
  Data from different server/setup
  HP AMD64, 4 CPU, 2.4 GHz, 8GB memory
  1 QLA2300 Fibre Channel, 1 EVA5000
  Compared GFS iozone to EXT3
GFS vs EXT3 DIO Iozone Comparison
[Chart: IOzone (DIO), R4 U2 EXT3 vs GFS. GeoMean, 1mb-4gb files, 1k-1m transfers. Categories: Writer, Re-writer, Reader, Re-reader, Random Read, Random Write, Backward Read, Record Rewrite, Stride Read, Overall GeoMean. Left axis: 0 to 120000. Right axis: 85.0% to 115.0%. Series: R4_U2, R4_U2_GFS, %Diff.]
Evaluating Oracle Performance
Use OLTP workload based on TPC-C
Results with various Oracle tuning options
  RAW vs EXT3 w/ o_direct (i.e. direct IO in iozone)
  ASYNC IO options w/ Oracle, supported in RHEL4/EXT3
  HUGEMEM kernels on x86 kernels
Results comparing RHEL4 IO schedulers
  CFQ
  DEADLINE
  NOOP
  AS
  RHEL3 baseline
Oracle 10G OLTP: ext3, gfs/nfs, sync/aio/dio
AIO in Oracle 10G
  cd $ORACLE_HOME/rdbms/lib
  make -f ins_rdbms.mk async_on
  make -f ins_rdbms.mk ioracle
Add to init.ora (usually in $ORACLE_HOME/dbs)
  disk_asynch_io=true # for raw
  filesystemio_options=asynch
  filesystemio_options=directio
  filesystemio_options=setall
Oracle OLTP Filesystem Performance
[Chart: RHEL4 U2 Oracle 10G OLTP performance with different file systems. Categories: OLTP syncio, OLTP dio, OLTP aio, OLTP aio+dio. Y axis: Trans/Minute (TPM), 0 to 10000. Series: EXT3, NFS, GFS.]
Disk IO elevators
R3: general-purpose I/O elevator parameters
R4: 4 tunable I/O elevators
  CFQ: Completely Fair Queuing
  NOOP: No operation in kernel
  Deadline: Optimize for runtime
  Anticipatory: Optimize for interactive response
2 Oracle 10G workloads
  OLTP: 4k random, 50% R / 50% W
  DSS: 32k-256k sequential read
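[Chart: RHEL4 IO schedulers vs RHEL3 for database, Oracle 10G oltp/dss (relative performance). Bars: As, Noop, Rhel3, Deadline, CFQ. X axis: 0.0% to 125.0%. Series: %tran/min, %queries/hour. Data labels as extracted, pairing with bars not recoverable from the text: 100.0%, 87.2%, 84.1%, 77.7%, 28.4%; 100.0%, 108.9%, 84.8%, 75.9%, 23.2%.]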
HugeTLBFS
The Translation Lookaside Buffer (TLB) is a small CPU cache of recently used virtual-to-physical address mappings
TLB misses are extremely expensive on today's very fast, pipelined CPUs
Large-memory applications can incur high TLB miss rates
HugeTLBs permit memory to be managed in very large segments
  E.g. Itanium:
    Standard page: 16KB
    Default huge page: 256MB
    Roughly 16000:1 difference (16384:1)
File system mapping interface
Ideal for databases
  E.g. TLB can fully map a 32GB Oracle SGA
[Diagram: Physical Memory, Virtual Address Space, TLB (128 data, 128 instruction entries)]
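The 32GB figure follows from simple arithmetic: the memory a TLB can map without a miss is the number of entries times the page size. A minimal sketch, assuming the 128 data TLB entries shown in the diagram each map one page; the function names tlb_reach_mb and page_ratio are illustrative.

```shell
#!/bin/sh
# tlb_reach_mb <entries> <page-size-kb>
# Print how much memory (in MB) a TLB with that many entries can map.
tlb_reach_mb() {
    echo $(( $1 * $2 / 1024 ))
}

# page_ratio <huge-page-kb> <standard-page-kb>
# Print how many standard pages fit in one huge page.
page_ratio() {
    echo $(( $1 / $2 ))
}

tlb_reach_mb 128 16        # 16KB standard pages  -> 2 (MB)
tlb_reach_mb 128 262144    # 256MB huge pages     -> 32768 (MB), i.e. 32GB
page_ratio 262144 16       # huge vs standard     -> 16384
```

With standard 16KB pages the same 128 entries cover only 2MB, which is why a large SGA incurs a high miss rate without huge pages.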
[Chart: RHEL3 U6 with Oracle 10g TPC-C results comparing performance of the hugemem kernel (4G/4G) with and without hugepages enabled. Y axis: tpmC, 0 to 40000. Series: EXT3 No Hugepages, RAW No Hugepages, EXT3 With Hugepages, RAW With Hugepages.]
Tests performed on a 2 Xeon EM64T CPU HT system with 6G RAM and 14 spindles using mdadm raid0
Linux Performance Tuning Summary
Linux performance monitoring tools
  *stat, /proc/*, top, sar, ps, oprofile
  Determine capacity vs tunable performance issue
  Tune OS parameters and repeat
RHEL4 vs RHEL3 perf comparison
  "Have it your way" IO with 4 IO schedulers
  EXT3 improved: block reservations, up to 3x!
  GFS within 95% of EXT3, NFS improves with EXT3
  Oracle w/ FS o_direct, aio, hugepages: 95% of raw
Questions?
top: 2 streams running on 2 dual-core AMD CPUs
1) Sometimes the scheduler chooses a CPU pair on one memory interface, depending on OS state
Tasks: 101 total, 3 running, 96 sleeping, 0 stopped, 0 zombie
Cpu0 :   0.0% us,  0.0% sy,  0.0% ni, 100.0% id,  0.0% wa,  0.0% hi,  0.0% si
Cpu1 :   0.1% us,  0.1% sy,  0.0% ni, 100.0% id,  0.0% wa,  0.0% hi,  0.0% si
Cpu2 : 100.0% us,  0.0% sy,  0.0% ni,   0.0% id,  0.0% wa,  0.0% hi,  0.0% si
Cpu3 : 100.0% us,  0.0% sy,  0.0% ni,   0.0% id,  0.0% wa,  0.0% hi,  0.0% si
2) Scheduler w/ taskset -c cpu# ./stream, round robin odd, then even CPUs
Tasks: 101 total, 2 running, 96 sleeping, 0 stopped, 0 zombie
Cpu0 :   0.0% us,  0.0% sy,  0.0% ni, 100.0% id,  0.0% wa,  0.0% hi,  0.0% si
Cpu1 : 100.0% us,  0.0% sy,  0.0% ni,   0.0% id,  0.0% wa,  0.0% hi,  0.0% si
Cpu2 :   0.0% us,  0.3% sy,  0.0% ni,  99.7% id,  0.0% wa,  0.0% hi,  0.0% si
Cpu3 : 100.0% us,  0.0% sy,  0.0% ni,   0.0% id,  0.0% wa,  0.0% hi,  0.0% si
McCalpin Stream on 2-CPU dual core, 4 CPU binding via taskset
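The "round robin odd, then even" placement in case 2 can be sketched as a small script. This is an illustration under assumptions, not the script used for the measurements: the helper names odd_even_cpus and bind_cmds are hypothetical, and bind_cmds only prints the taskset command lines so they can be inspected before anything is launched.

```shell
#!/bin/sh
# odd_even_cpus <ncpus>
# Print CPU numbers 0..ncpus-1 with the odd ones first, then the even
# ones (for 4 CPUs: "1 3 0 2").
odd_even_cpus() {
    n=$1
    order=""
    cpu=1
    while [ "$cpu" -lt "$n" ]; do
        order="$order $cpu"
        cpu=$((cpu + 2))
    done
    cpu=0
    while [ "$cpu" -lt "$n" ]; do
        order="$order $cpu"
        cpu=$((cpu + 2))
    done
    echo $order
}

# bind_cmds <ncpus> <instances> <program>
# Print one "taskset -c <cpu> <program>" line per instance, assigning
# CPUs in odd-then-even order. Nothing is executed.
bind_cmds() {
    ncpus=$1
    instances=$2
    program=$3
    i=0
    for cpu in $(odd_even_cpus "$ncpus"); do
        if [ "$i" -ge "$instances" ]; then
            break
        fi
        echo "taskset -c $cpu $program"
        i=$((i + 1))
    done
}
```

For two instances on four CPUs, bind_cmds 4 2 ./stream prints bindings to CPUs 1 and 3, matching the busy CPUs in the second top listing.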
[Chart: RHEL4 U1, 2-CPU dual-core AMD64, McCalpin Stream Copy b(x)=a(x). X axis: number of CPUs (1, 2, 4). Y axis: bandwidth in MB/sec (0 to 7000). Series: Copy, Copy w/ Aff.]
[Chart: RHEL4 U1, 2-CPU dual-core AMD64, McCalpin Stream Triad c(x)=a(x)+b(x).c(x). X axis: number of CPUs (1, 2, 4). Y axis: bandwidth in MB/sec (0 to 7000). Series: Triad smp, Triad w/ Aff.]