Session Number: W02
Tuning the Java Virtual Machine for Optimal Performance: Means and Methods
Rajeev Palanki
IBM Java Technology Center
• “The purpose of Garbage Collection is to identify Java Heap storage which is no longer being used by the Java application and so is available for reuse by the JVM”
•Key questions:
•Performance and Scalability: How quickly can you find garbage?
•Accuracy: Can you find all the garbage?
•Effectiveness: How much space can you make available?
Garbage Collection: IBM Technology
• Concurrent mark
Most of the marking phase is done outside of ‘Stop the World’, while the ‘mutator’ threads are still active, giving a significant improvement in pause time.
• Parallelizing the garbage collection phases
The mark and sweep workload is distributed across the available processors, resulting in a significant improvement in pause times.
• Adaptive sizing of thread local heaps
Reduces the amount of Java Heap locking
• Incremental compaction
The expense of compaction is distributed across GCs, leading to a reduction in the occasional long pause time.
• Java 5 technologies
Lazy sweep, parallel compaction, generational and accurate collection
– Cache Allocation (for object allocations < 512 bytes) does not require the Heap Lock. Each thread has local storage in the heap (TLH, Thread Local Heap) where these objects are allocated.
– Heap Lock Allocation occurs when the allocation request is 512 bytes or more, and requires the Heap Lock.
If size is less than 512 and there is enough space in the cache
    try cacheAlloc
    return if OK
HEAP_LOCK
do forever
    If there is a big enough chunk on the freelist
        take it
        goto Gotit
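The two allocation paths above can be sketched in Java. This is a minimal illustration of the decision between the lock-free TLH path and the Heap Lock path; the class and names (AllocSketch, CACHE_THRESHOLD, tlhFree) are invented for this sketch, not JVM internals.

```java
// Illustrative sketch of cache (TLH) allocation vs. Heap Lock allocation.
// Not the JVM's actual implementation; names are hypothetical.
public class AllocSketch {
    static final int CACHE_THRESHOLD = 512; // cache path for requests under 512 bytes
    static int tlhFree = 4096;              // free bytes left in this thread's TLH
    static final Object HEAP_LOCK = new Object();

    // Returns which path a request of `size` bytes would take.
    static String allocate(int size) {
        if (size < CACHE_THRESHOLD && size <= tlhFree) {
            tlhFree -= size;        // bump-pointer allocation: no lock needed
            return "cache";
        }
        synchronized (HEAP_LOCK) {  // large (or TLH-exhausted) requests take the Heap Lock
            return "heap";          // ...and search the free list under the lock
        }
    }
}
```

The point of the split is contention: small allocations, which dominate most workloads, never touch the shared lock.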
• All objects >= 64 KB are termed “large” from the VM perspective
• In practice, objects of 10MB+ in size are usually considered large
• The Large Object Area is 5% of the active heap by default.
• The JVM first tries to satisfy any allocation from the free list of the main heap; if there is not enough contiguous space in the main heap to satisfy the request for an object >= 64 KB, it is allocated in the Large Object Area (the “wilderness”)
• Objects < 64K can only be allocated in the main heap and never in the Large Object Area
[Diagram: the Large Object Area (LOA) sits at the top of the active heap, which is bounded by -Xmx]
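The placement rules above reduce to a small decision function. The sketch below is illustrative only: placeOf and mainHeapContiguousFree are invented names, not JVM APIs.

```java
// Illustrative sketch of where an allocation lands under the rules above.
public class LoaSketch {
    static final int LARGE = 64 * 1024; // 64 KB "large object" threshold

    // mainHeapContiguousFree: largest contiguous free chunk in the main heap
    static String placeOf(int size, int mainHeapContiguousFree) {
        if (size < LARGE) return "main";                   // small objects: main heap only
        if (size <= mainHeapContiguousFree) return "main"; // large, but main heap can satisfy it
        return "LOA";                                      // large and main heap too fragmented
    }
}
```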
Users can identify the Java stack of a thread making an allocation request larger than the value specified with the environment variable ALLOCATION_THRESHOLD.
export ALLOCATION_THRESHOLD=5400
This will print Java stacks for object allocations greater than 5400 bytes.
Users can specify the desired percentage of the Large Object Area using the -Xloratio<n> option, where n determines the fraction of the heap designated for the LOA.
(For example, -Xloratio0.3 reserves 30% of the active heap for the Large Object Area.)
• Subpools provide an improved policy of object allocation and are available from JDK 1.4.1 releases, only on AIX.
– Improved time for allocating objects
– Avoid premature GCs due to allocation of large objects
– Improve MP scalability by reducing time under HEAP_LOCK
– Optimize TLH sizes and storage utilization
• The subpool algorithm uses multiple free lists rather than the single free list used by the default allocation scheme.
• It tries to predict the size of future allocation requests based on earlier allocation requests. It recreates free lists at the end of each GC based on these predictions.
• While allocating objects on the heap, free chunks are chosen using a “best fit” method, as against the “first fit” method used in other algorithms.
• It is enabled by the –Xgcpolicy:subpool option.
• Concurrent marking is disabled when subpool policy is used.
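The difference between the two chunk-selection methods can be shown in a few lines. This is a generic illustration of first fit vs. best fit over a free list, not the JVM's actual data structures.

```java
import java.util.Arrays;
import java.util.List;

// Sketch: first fit (default policy) vs. best fit (subpool policy).
public class FitSketch {
    // First fit: take the first chunk large enough for the request.
    static int firstFit(List<Integer> freeList, int size) {
        for (int chunk : freeList) if (chunk >= size) return chunk;
        return -1; // no chunk large enough
    }

    // Best fit: take the smallest chunk that still satisfies the request,
    // leaving larger chunks intact and reducing fragmentation.
    static int bestFit(List<Integer> freeList, int size) {
        int best = -1;
        for (int chunk : freeList)
            if (chunk >= size && (best == -1 || chunk < best)) best = chunk;
        return best;
    }

    public static void main(String[] args) {
        List<Integer> free = Arrays.asList(1024, 256, 128);
        System.out.println(firstFit(free, 100)); // picks 1024, the first big-enough chunk
        System.out.println(bestFit(free, 100));  // picks 128, the tightest fit
    }
}
```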
Together we achieve: some parallel processing
• GC Helper threads
On a multiprocessor system with N CPUs, a JVM supporting parallel mode starts N-1 garbage collection helper threads at the time of initialization.
These threads remain idle at all times when the application code is running; they are called into play only when garbage collection is active.
For a particular phase, work is divided between the thread driving the garbage collection and the helper threads, making a total of N threads running in parallel on an N-CPU machine.
The only way to disable the parallel mode is to use the -Xgcthreads parameter to change the number of garbage collection helper threads being started.
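The N-1 rule above is easy to state in code. This helper is hypothetical (GcHelperCount is not a JVM class); it simply mirrors the slide's arithmetic.

```java
// Sketch of the rule above: on an N-CPU machine the JVM starts
// N-1 GC helper threads. Illustrative code, not JVM internals.
public class GcHelperCount {
    static int helperThreads(int cpus) {
        return Math.max(0, cpus - 1); // a uniprocessor machine gets no helpers
    }

    public static void main(String[] args) {
        int cpus = Runtime.getRuntime().availableProcessors();
        System.out.println("GC helper threads: " + helperThreads(cpus));
    }
}
```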
• Parallel Mark
The basic idea is to augment object marking through the addition of helper threads and a facility for sharing work between them.
• Parallel Bitwise Sweep
Similar to parallel mark; uses the same helper threads as parallel mark. Improves sweep times by using the available processors.
Designed to give reduced and consistent GC pause times as heap size increases.
Concurrent marking aims to complete just before the heap is full.
In the concurrent phase, the Garbage Collector scans the roots by asking each thread to scan its own stack. These roots are then used to trace live objects concurrently.
Tracing is done by a low-priority background thread and by each application thread when it does a heap lock allocation.
Concurrent is enabled by the option: -Xgcpolicy:optavgpause
• Incremental compaction removes the dark matter (small fragmented free chunks that cannot be used) from the heap and reduces pause times significantly
• The fundamental idea behind incremental compaction is to split the heap up into sections and compact each section just as during a full compaction.
• Incremental compaction was introduced in JDK 1.4.0; it is enabled by default and triggered under particular conditions (called “reasons”)
• -Xpartialcompactgc: invoke incremental compaction in every GC cycle
• -Xnopartialcompactgc: disable incremental compaction
• -Xnocompactgc: disable full compaction
The heap is divided into regions.
The regions are further divided into sections.
Each section is handled by one helper thread.
A region is divided into (number of helper threads + 1) or 8 sections, whichever is less.
The whole heap is covered in a few GC cycles.
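The section-count rule above can be written out directly; the class and method names here are illustrative, not JVM code.

```java
// Sketch of the rule above: a region is split into
// min(number of helper threads + 1, 8) sections.
public class CompactSections {
    static int sectionsPerRegion(int helperThreads) {
        return Math.min(helperThreads + 1, 8);
    }
}
```

So on a 4-CPU machine (3 helper threads) each region gets 4 sections, while a 16-CPU machine is capped at 8 sections per region.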
Understanding a typical verbosegc output
<AF[71]: Allocation Failure. need 65552 bytes, 3 ms since last AF>
<AF[71]: managing allocation failure, action=2 (142696/10484224)>
<GC(71): GC cycle started Fri Mar 19 17:59:06 2004
<GC(71): freed 94184 bytes, 2% free (236880/10484224), in 12 ms>
<GC(71): mark: 5 ms, sweep: 0 ms, compact: 7 ms>
<GC(71): refs: soft 0 (age >= 32), weak 0, final 0, phantom 0>
<GC(71): moved 3095 objects, 188552 bytes, reason=1>
<AF[71]: managing allocation failure, action=3 (236880/10484224)>
<AF[71]: managing allocation failure, action=4 (236880/10484224)>
<AF[71]: managing allocation failure, action=6 (236880/10484224)>
JVMDG217: Dump Handler is Processing a Signal - Please Wait.
JVMDG315: JVM Requesting Heap dump file
JVMDG318: Heap dump file written to /workarea/rajeev/gctests/heapdump.20040319.175906.8467.txt
JVMDG303: JVM Requesting Java core file
JVMDG304: Java core file written to /workarea/rajeev/gctests/javacore.20040319.175906.8467.txt
JVMDG274: Dump Handler has Processed OutOfMemory.
<AF[71]: insufficient heap space to satisfy allocation request>
<AF[71]: completed in 203 ms>
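The numeric fields in a GC cycle line like the one above can be pulled out with a regular expression. This parser is a sketch written for exactly the line format shown, not a general verbosegc parser.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Illustrative parser for a "<GC(n): freed ..." line from the output above.
public class VerboseGcParse {
    static final Pattern FREED =
        Pattern.compile("freed (\\d+) bytes, (\\d+)% free \\((\\d+)/(\\d+)\\), in (\\d+) ms");

    // Returns {bytesFreed, freeBytes, heapBytes, pauseMs}, or null if no match.
    static long[] parse(String line) {
        Matcher m = FREED.matcher(line);
        if (!m.find()) return null;
        return new long[] { Long.parseLong(m.group(1)),   // bytes freed by this GC
                            Long.parseLong(m.group(3)),   // free bytes after GC
                            Long.parseLong(m.group(4)),   // total heap bytes
                            Long.parseLong(m.group(5)) }; // pause time in ms
    }
}
```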
Using verbosegc to set the heap size
• Use verbosegc to estimate the ideal heap size, then tune using -Xms and -Xmx.
• Setting -Xms: should be big enough to avoid AFs from the time the application starts until it becomes ‘ready’. (It should not be any bigger!)
• Setting -Xmx:
In the normal load condition, free heap space after each GC should be > minf (default is 30%).
There should not be any OutOfMemory errors.
In the heaviest load condition, if free heap space after each GC is > maxf (default is 70%), the heap size is too big.
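The minf/maxf heuristic above can be applied to a single verbosegc sample. This helper is hypothetical (not a real tool); the 30% and 70% values are the defaults quoted in the text.

```java
// Sketch of the sizing heuristic above, applied to one post-GC sample.
public class HeapAdvisor {
    static String advise(long freeBytes, long heapBytes) {
        double free = (double) freeBytes / heapBytes;
        if (free < 0.30) return "grow";   // below minf: heap likely too small
        if (free > 0.70) return "shrink"; // above maxf under heaviest load: heap too big
        return "ok";
    }
}
```

Applied to the sample output above (236880 free of 10484224, about 2%), this would say "grow", which matches the repeated allocation failures in the trace.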
Effect of wrong -Xms & -Xmx settings
Too small a heap = too frequent GC.
Too big a heap = too much GC pause time (irrespective of the amount of physical memory on the system).
Heap size > physical memory size = paging/swapping = bad for your application.
It is desirable to have -Xms much less than -Xmx if you are encountering fragmentation issues. This forces class allocations, threads, and persistent objects to be allocated at the bottom of the heap.
What about -Xms = -Xmx?
It means no heap expansion or shrinkage will ever occur.
Not normally recommended.
It may be good for a few apps which require constant high heap storage space.
High pause times and system activity
If pause times are usually acceptable, with the exception of a few “abnormally high” spikes, we are likely to infer that the deviation was the result of some system-level activity (heavy paging, for example) outside of the Java process.
Consideration: how many clock ticks did our process actually spend executing instructions, as opposed to time spent waiting for I/O or waiting for a CPU to become available to run on?
Headline Changes in the Java 5.0 Garbage Collector
Java Tiger: JSE 5.0
5.0 uses a completely new memory management framework
No pinned/dosed objects
Stack maps are used to provide a level of indirection between references and the heap
The 5.0 VM never pins arrays; it always makes a copy for the JNI code
The GC is Type Accurate
New efficient parallel compactor
-Xgcpolicy:optavgpause includes concurrent sweeping (as well as marking)
Why different GC policies?
• Availability of different GC policies gives you increased capabilities.
• The best choice depends upon application behaviour and workloads.
• Think about throughput, response times, and pause times.
Throughput is the amount of data processed by the application
Pause time is the amount of time the garbage collector has stopped threads while collecting the heap.
Response time is the latency of the application – how quickly it answers incoming requests
Policy: Optimize for throughput
Option: -Xgcpolicy:optthruput (default)
Description: Typically used for applications where raw throughput is more important than short GC pauses. The application is stopped each time garbage is collected.

Policy: Optimize for pause times
Option: -Xgcpolicy:optavgpause
Description: Trades high throughput for shorter GC pauses by performing some of the garbage collection concurrently. The application is paused for shorter times.

Policy: Generational Concurrent
Option: -Xgcpolicy:gencon
Description: Handles short-lived objects differently from longer-lived ones.

Policy: Subpool
Option: -Xgcpolicy:subpool
Description: Uses an algorithm similar to the default policy but employs an allocation strategy suitable for SMP machines.
Memory Management / Garbage Collection: How the IBM J9 Generational Garbage Collector Works
JVM heap layout: Nursery/Young Generation, Old Generation, Permanent Space (Sun JVM only)
Permanent space (Sun JVM only): -XX:MaxPermSize=nn
Nursery sizing: IBM J9: -Xmn (-Xmns/-Xmnx); Sun: -XX:NewSize=nn, -XX:MaxNewSize=nn, -Xmn<size>
Old generation sizing: IBM J9: -Xmo (-Xmos/-Xmox); Sun: -XX:NewRatio=n
• Minor Collection: takes place only in the young generation; normally done through direct copying, which is very efficient
• Major Collection: takes place in both the new and old generations and uses the normal mark/sweep (+compact) algorithm
[Diagram: nursery split into an Allocate Space and a Survivor Space]
• The nursery is split into two spaces (semi-spaces)
Only one contains live objects and is available for allocation
Minor collections (scavenges) move objects between the spaces
The roles of the spaces are then reversed
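The copy-and-flip scheme above can be modelled in a few lines. NurserySketch is a toy illustrating only the semi-space mechanics, not real GC behaviour (no pointers, ages, or tenuring).

```java
import java.util.ArrayList;
import java.util.List;

// Toy model of the semi-space scavenge described above: live objects are
// copied from the active space into the survivor space, then roles flip.
public class NurserySketch {
    List<String> allocate = new ArrayList<>(); // active semi-space (new objects go here)
    List<String> survivor = new ArrayList<>(); // empty semi-space

    void scavenge(List<String> liveObjects) {
        for (String obj : allocate)
            if (liveObjects.contains(obj)) survivor.add(obj); // copy only live objects
        allocate.clear();                                     // dead objects vanish for free
        List<String> tmp = allocate;                          // reverse the roles
        allocate = survivor;
        survivor = tmp;
    }
}
```

Note the payoff the slide implies: dead objects are never touched at all; the cost of a scavenge is proportional to the live set, not the nursery size.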
Extensible Verbose Toolkit: Analyzing your verbose GC output
• EVTK: verbose GC visualizer and analyzer
Available through ISA (IBM Support Assistant v3.0.2)
https://www14.software.ibm.com/webapp/iwm/web/preLogin.do?source=isa
Reduces barrier to understanding verbose output
Visualize GC data to track trends and relationships
Analyze results and provide general feedback
Extend to consume output related to the application
Build pluggable analyzers for specific application needs
• The Extensible Verbose Toolkit (EVTK) is a visualizer for verbose garbage collection output
The tool parses and plots verbose GC output and garbage collection traces (-Xtgcoutput)
• The tooling framework is extensible, and will be expanded over time to include visualization for other collections of data
• The EVTK provides
Raw view of data
Line plots to visualize a variety of GC data characteristics
Tabulated reports with heap occupancy recommendations
View of multiple datasets on a single set of axes
Ability to save data as an image (JPEG) or comma-separated file (CSV)
Reports and recommendations
• Report contents can be configured using VGC menu options
Occupancy recommendations tell you how to adjust heap size for better performance
Summary information is generated for each input in the dataset
Graphs are included for all GC display data
• Can export as HTML by right-clicking and using the context menu in the Report tab
What does real-time mean?
• Real-time = predictability of performance
Hard: violations of timing constraints are hard failures
Soft: timing constraints are simply performance goals
• Constraints vary in magnitude (microseconds to seconds)
• Consequences of missing a timing constraint range from a service level agreement miss (stock trading) to life in jeopardy (airplanes)
• Real-fast is not real-time, but Real-slow is not real-good
• Need a balance between predictability and throughput
• WRT is a Java runtime providing highly predictable operation
Real-time garbage collection (Metronome)
Static and dynamic compilation
Full support for RTSJ (JSR 1)
Java SE 5.0 compliant
Built and rigorously tested on a RT Linux OS with IBM Opteron blades
• Profiler Analyzer provides a powerful set of graphical and text-based views that allow users to narrow down performance problems to a particular process, thread, module, symbol, offset, instruction or source line.
– Supports time based system profiles
• Code Analyzer examines executable files and displays detailed information about functions, basic blocks and assembly instructions.
• Pipeline Analyzer displays the pipeline execution of instruction traces.
Counters: in the Process Hierarchy View (Window -> Show View -> Others -> Profile Analyzer -> Counters)