© 2016 IBM Corporation
IBM OMEGAMON for JVM on z/OS V5.4.0
Short Presentation
October 2016
Rev: 540-1
Purpose of this Presentation

This deck is intended to provide a brief but comprehensive overview of the IBM OMEGAMON for JVM product.

Sections within this presentation are:
– Why monitor JVMs on z/OS
– Introducing IBM OMEGAMON for JVM
– Resource monitoring of z/OS Connect Enterprise Edition
– Example scenarios
– Further Information

– = New in OMEGAMON for JVM V5.4.0
Why monitor JVMs?
• Clear understanding of how many and what JVMs are running within an LPAR
• Need to address tuning issues in the operations environment
• Highlight degradation of performance over time
  – Leaks leading to inefficient or excessive Garbage Collections
• Diagnose potential OutOfMemory conditions
  – Heap issues and native memory leaks
• Contention for shared resources
  – Poor performance caused by multiple threads waiting on resources
• Sub-optimal CPU utilization
  – Is work running on general CPU or specialty engines?
• Operating within the correct environment
  – Are there subsystems/applications running with insecure JVM levels or settings?
  – Are the correct application versions deployed?
Introducing OMEGAMON for JVM on z/OS

• Brand new OMEGAMON monitoring agent focused on helping z/OS system administrators, operators and SMEs identify problems, resolve them more quickly and optimize performance
• Lightweight overhead compared to other offerings
  – 90% of data collected is through the Health Center API
• Ability to view all JVMs side by side; no disconnect when switching between JVMs
• Collects data on any online JVM on z/OS
  – Subsystems: CICS, IMS, DB2, WAS, z/OS Connect, ODM
  – Standalone Batch and USS Java applications
  – Can identify and distinguish Liberty JVM servers
• Data presented on both the OMEGAMON enhanced 3270 UI and the Tivoli Enterprise Portal
• Reports on Garbage Collection, Active Threads, Lock Utilization, JVM Environment, CPU Utilization and Native Memory
• Detailed report on z/OS Connect EE resources for hybrid cloud monitoring
• Provides the standard OMEGAMON features:
  – Look back in time with historical data collection
  – Be alerted to abnormal conditions through defined event generation (Situations)
  – Easy to configure and deploy using PARMGEN
[Diagram: OMEGAMON JVM Agent monitoring JVMs in z/OS Connect, Liberty, Batch/USS, CICS TS, WAS on z/OS, DB2 on z/OS, IMS and ODM on z/OS]
z/OS Connect EE Resource Monitoring

Identify Service/API performance issues within z/OS Connect EE instances faster and avoid bottlenecks.

[Diagram: OMEGAMON for JVM monitoring a z/OS Connect Enterprise Edition instance alongside DB2, IMS, CICS and WAS]
Data Provided

• Highest JVM Statistics
• CPU Statistics
  – General CPU
  – Specialty Processor (IFA) CPU
  – Specialty Processor Work on General CPU
• Garbage Collection Statistics
  – Nursery GC Details
  – Global GC Details
  – % Time Paused
  – Heap Allocation
• Thread Details
  – Thread State
  – Contending Object
  – Stack Trace
• Lock Details
  – GET Count
  – Average Hold Time
  – Slow Gets
  – Recursive Acquires
  – Lock Utilization %
• JVM Environment
  – JVM Command Line
  – System Variables
  – Env Variables
  – JVM Parameters
  – Classpath
  – Boot Classpath
• z/OS Connect EE Request Metrics
  – Average Response Time
  – Slowest Services
• Native Memory
  – LE Heap Details
  – z/OS Extended Region Detail
  – Java Native Memory
Scenario: Visibility of all JVMs
• As noted, JVMs can be found all over the environment. Are you clear about what is online? Are there JVMs online that were not planned?
• When started, the JVM Monitor will seek out all JVMs on an LPAR, regardless of subsystem type and whether or not they have been configured for full monitoring.
• The agent will capture the jobname, ASID, subsystem type and basic details of each JVM.
"How much Java are we running? We need to see all JVMs that are currently online."

[Diagram: OMEGAMON JVM Agent discovering CICS TS, WAS, DB2 and IMS JVMs within an LPAR]
Scenario: Visibility of all JVMs
For a JVM to be fully monitored, it must be instrumented to allow OMEGAMON to collect
data. If not, we can still determine online JVMs and their subsystem type. These are reported on the second subpanel here. A user can then determine if they want to instrument that JVM
for full monitoring.
Scenario: Visibility of all JVMs
Equivalent Tivoli Enterprise Portal screen showing JVMs currently being fully monitored and those detected as online but not monitored by the JVM agent.
Scenario: Visibility of all JVMs
• To enable full monitoring of a JVM it must be instrumented to allow the OMEGAMON agent to interact with the JVM and issue requests via the Health Center API.
• Typical configuration is a minor change to the JVM startup parameters:
  -Xhealthcenter:level=inprocess
  -javaagent:/omegamon/uss/install/dir/kan/bin/IBM/kjj.jar
• OMEGAMON code will collect JVM environment information, capture JVM events (for example GCs) and push the details to the OMEGAMON JVM agent.
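For illustration only, the same two options could be appended when launching a standalone USS Java application from the command line. The class name MyBatchApp is a hypothetical placeholder; the agent path is the example path shown above, and the exact mechanism differs for subsystems such as CICS or WAS, which take JVM options from their own profiles:

  java -Xhealthcenter:level=inprocess \
       -javaagent:/omegamon/uss/install/dir/kan/bin/IBM/kjj.jar \
       MyBatchApp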
[Diagram: instrumented CICS TS, WAS, DB2 and IMS JVMs pushing data to the OMEGAMON JVM Agent within an LPAR]
Scenario: Optimizing Garbage Collection
• “Performance of JVM is poor. Could Garbage Collection be a cause?”
• Performance of the Garbage Collector has improved significantly in recent releases of Java; however, poor performance can still occur due to:
  • Insufficient heap allocation
  • Poorly written applications
• The symptoms of such problems might be:
  • An excessive number of GC events occurring within a given period of time
  • High heap occupancy even after a GC
  • Long pause times while a GC event is occurring
  • System GC events occurring
• The Garbage Collection Details workspaces provide insight into the performance of the JVM's GC, allowing the operator to confirm (or rule out) the JVM as a bottleneck in performance throughput.
Performance of JVM is poor. What can be causing this?
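For context, metrics such as the GC rate per minute can be derived from the standard JMX beans every Java runtime exposes. A minimal Java sketch (illustrative only, not OMEGAMON code):

  import java.lang.management.GarbageCollectorMXBean;
  import java.lang.management.ManagementFactory;

  public class GcRateSketch {
      public static void main(String[] args) throws InterruptedException {
          long before = totalCollections();
          Thread.sleep(60_000);                 // observe for one minute
          long after = totalCollections();
          System.out.println("GC events in the last minute: " + (after - before));
      }

      // Sum the cumulative collection counts across all collectors (nursery, global, ...)
      private static long totalCollections() {
          long count = 0;
          for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
              count += gc.getCollectionCount();
          }
          return count;
      }
  }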
Scenario: Optimizing Garbage Collection
The Highest JVM Statistics subpanel shows the poorest-performing statistics for key GC metrics.
If a threshold is exceeded (for example, GC Rate per Minute), zoom into the JVM that potentially has an issue.
Scenario: Optimizing Garbage Collection
GC Details can point out key values that may indicate a problem. A rolling 5-minute interval is used to scale values.
Does the Occupancy look OK? Is the Average Heap size fine?
Scenario: Out of Memory Conditions
• The java.lang.OutOfMemoryError is a severe condition that often occurs with little warning and usually brings down the JVM. The error occurs when the system runs out of memory – either Java heap space or native memory.
• OMEGAMON for JVM constantly monitors the proportion of the maximum Java heap size that is still allocated after garbage collection. If that value exceeds 80%, a situation is triggered that can take actions such as alerting operations staff or application SMEs. If the condition escalates, the application can be restarted in an orderly fashion before it crashes or impacts end users.
• The Native Memory analysis provides details that can help identify constraints in native memory: either the address space is over-committed or an application has a memory leak. By analyzing metrics such as Language Environment Heap utilization and Extended Region Free %, OMEGAMON can avert major outages caused by native memory exhaustion.
The address space periodically abends.
Can we see what caused the issue?
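For context, the heap-occupancy figure behind that 80% threshold is available to any Java program through standard JMX. A minimal sketch (illustrative only; OMEGAMON samples the value after garbage collection rather than at an arbitrary point as here):

  import java.lang.management.ManagementFactory;
  import java.lang.management.MemoryUsage;

  public class HeapCheckSketch {
      public static void main(String[] args) {
          MemoryUsage heap = ManagementFactory.getMemoryMXBean().getHeapMemoryUsage();
          long max = heap.getMax();             // -1 if no maximum is defined
          if (max > 0) {
              double pctOfMax = 100.0 * heap.getUsed() / max;
              // The 80% threshold is reproduced here purely for illustration
              if (pctOfMax > 80.0) {
                  System.out.println("WARNING: heap at " + pctOfMax + "% of maximum");
              }
          }
      }
  }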
Scenario: Out Of Memory Condition
Select a Job Name using the action menu and select option 'N' for 'Native Memory'
Scenario: Out Of Memory Condition
If the Extended Region Free % falls below 10 and continues to fall, it is an indication of a native memory leak. If the value falls below 5, the JVM may need to be shut down and restarted.
Scenario: Identify Possible Memory Leak
A snapshot of data is taken at regular intervals to allow viewing of system status at a specified point in the past.
Can we be alerted to memory issues before they cause an abend?
Scenario: Identify Possible Memory Leak
A creeping rise in heap occupancy after a GC has been performed is a sign of a possible memory leak. Left unaddressed, it could lead to an OutOfMemoryError, a JVM abend and a core dump.
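For context, the classic cause of such a creeping rise is a collection that only ever grows. A deliberately leaky Java sketch (hypothetical code, not from any product) that would eventually produce exactly this pattern and an OutOfMemoryError:

  import java.util.ArrayList;
  import java.util.List;

  public class LeakSketch {
      // A static list that is only ever appended to: every buffer stays
      // reachable, so heap occupancy after each GC keeps climbing.
      private static final List<byte[]> RETAINED = new ArrayList<>();

      public static void main(String[] args) {
          while (true) {
              RETAINED.add(new byte[1024 * 1024]); // retain another 1 MB per loop
          }
      }
  }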
Scenario: Identifying Locks and Thread Blocks
• If the problem is not GC-related, perhaps threads are being blocked for excessive periods, or locks within the JVM are being held for long periods, causing applications to wait for the monitor to yield.
• If high values are found here, the application owner (if applicable) can be alerted, or adjustments to the JVM environment could be made.
Our applications are performing poorly. Can we see what
might be the cause?
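For context, the BLOCKED state that the Thread Statistics workspace surfaces is what Java reports when threads contend for the same monitor. A minimal Java sketch (illustrative only) that produces it:

  public class ContentionSketch {
      private static final Object MONITOR = new Object();

      public static void main(String[] args) {
          Runnable worker = () -> {
              synchronized (MONITOR) {     // only one thread may hold this monitor
                  try {
                      Thread.sleep(5_000); // hold the lock for five seconds
                  } catch (InterruptedException e) {
                      Thread.currentThread().interrupt();
                  }
              }
          };
          new Thread(worker, "first").start();
          new Thread(worker, "second").start(); // whichever thread arrives second
                                                // shows as BLOCKED, contending on MONITOR
      }
  }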
Scenario: Identifying Locks and Thread Blocks
The Thread Statistics workspace drills down to all active threads, making BLOCKED threads easy to spot.
NEW in V540 – Also shows Thread CPU to spot loops!
Scenario: Identifying Locks and Thread Blocks
The Lock Statistics workspace shows which monitor objects were most often used as locks and how long they were held.
Scenario: Identify Environment Issues
• We are able to deep-dive into JVM environment details to view information like the classpath, system properties and the version of Java being used.
• We can also define a situation to check a setting and alert us to a problem; in this case, if a ‘bad’ Java version is being used.
We need to ensure the Java
levels being used are up to
date
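For reference, the Java level such a situation tests is the one the runtime itself reports. A plain Java program can read the same information from the standard java.* system properties (these property names are standard Java, not OMEGAMON attribute names):

  public class VersionSketch {
      public static void main(String[] args) {
          System.out.println(System.getProperty("java.version"));         // e.g. 1.8.0
          System.out.println(System.getProperty("java.runtime.version")); // full build level
          System.out.println(System.getProperty("java.vm.vendor"));
      }
  }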
Scenario: Identify Environment Issues
In the TEP Situation Editor we create a new Situation to check against the JVM's Version attribute. If this condition is ever met, a Warning alert will be raised.
Scenario: Identify Environment Issues
Once the situation is tripped, you can analyze the current conditions, identify the offending job and take appropriate
action
Scenario: Identify Environment Issues
The Situation Status Tree in the enhanced 3270 UI will show whether there is a JVM online with the offending Java level. A user could then take appropriate action.
Scenario: Slow API Response Time
• It’s important to be alerted to poor response times for the services/APIs you make available to consumers, potentially external ones, so that you can satisfy application performance goals and manage varying workloads before application owners raise complaints.
• The z/OS Connect Summary workspace displays all the z/OS Connect Services that were executed in the last 5 minutes. This workspace helps identify slow requests; in conjunction with the Garbage Collection or Threads workspaces, specific causes of the symptom can be found.
Reports are coming back that application
request response time into z/OS is
poor. Can we identify affected
services?
Scenario: Slow API Response Time
Identify the z/OS Connect Job by looking at the Application field. Select the Job using
option 'Z'
Sort the rows by 'Avg Response Time'. Identify and select the service name with the highest Avg Response Time. Selecting option 'S' will display more detailed information about a particular request.
Scenario: Slowest z/OS Connect Services
• There may be certain properties that will point us to a problem around the reported service performance issue. Maybe there is something specific about the slowest requests, the client connected, or the payload being submitted.
• The z/OS Connect Slowest Requests workspace displays the five worst-performing requests over the last 5 minutes for a particular z/OS Connect service. It can be used to provide diagnostic information about a specific request, which can help determine why that request performed poorly.
Response time of services through a z/OS Connect EE instance is slow.
Can we investigate requests to deduce
the issue?
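For context, a "five slowest requests over a rolling 5-minute window" view is a small top-k selection over a time window. A Java sketch of one way such a view could be maintained (illustrative only; this is not how the agent is implemented, and the Request fields are hypothetical):

  import java.util.ArrayDeque;
  import java.util.Comparator;
  import java.util.Deque;
  import java.util.List;
  import java.util.stream.Collectors;

  public class SlowestRequestsSketch {
      static class Request {
          final String id;
          final long responseMs;
          final long completedAtMs;
          Request(String id, long responseMs, long completedAtMs) {
              this.id = id;
              this.responseMs = responseMs;
              this.completedAtMs = completedAtMs;
          }
      }

      private static final long WINDOW_MS = 5 * 60 * 1000; // rolling 5-minute window
      private final Deque<Request> window = new ArrayDeque<>();

      // Record a completed request and drop observations that have aged out.
      public synchronized void record(Request r) {
          window.addLast(r);
          while (!window.isEmpty()
                  && window.peekFirst().completedAtMs < r.completedAtMs - WINDOW_MS) {
              window.removeFirst();
          }
      }

      // The five slowest requests currently inside the window.
      public synchronized List<Request> fiveSlowest() {
          return window.stream()
                  .sorted(Comparator.comparingLong((Request x) -> x.responseMs).reversed())
                  .limit(5)
                  .collect(Collectors.toList());
      }
  }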
Scenario: Slowest z/OS Connect Services
Identify the desired Service Name you want more details for and select it with option 'S'
Scenario: Slowest z/OS Connect Services
Here you can view the five slowest requests: their request ID, method, response time, and so on.
Requests with longer payloads may take longer, as they are sent as JSON, which can carry overhead depending on the subsystem being called.
In addition, if the Response Length is 0, there is no JSON response; the request may have encountered an error, which could also cause a slow response time.
The OMEGAMON Portfolio

Service Management Suite on z/OS
– OMEGAMON Performance Management Suite on z/OS
  – OMEGAMON z/OS Management Suite
    – OMEGAMON on z/OS
    – OMEGAMON for JVM
    – OMEGAMON Mainframe Networks
    – OMEGAMON for Storage
    – OMEGAMON Dashboard Edition
  – OMEGAMON for CICS
  – OMEGAMON for DB2 PE
  – OMEGAMON for IMS
  – OMEGAMON for Messaging
  – ITCAM for Application Diagnostics
– Service Management Unite
– NetView for z/OS
– System Automation for z/OS
– Tivoli Asset Discovery
More Information/References

OMEGAMON Product Home – overview and product information for all OMEGAMON products
  www.ibm.com/OMEGAMON

Service Management Connect – blogs, forums, articles and best-practice videos for IBM z Systems monitoring
  www.ibm.com/developerworks/servicemanagement/z
  Examples:
  – Introducing OMEGAMON Monitoring for JVM
  – Using OMEGAMON to Diagnose Slow JVMs Through Thread Data
  – OMEGAMON JVM monitoring for z/OS Locking Data

OMEGAMON Monitoring for JVM Technote – summary of latest fixes, known issues and updates
  www.ibm.biz/OMEGJVMTechnote
Contacts
Offering Management: Nathan Brice [email protected], Chris Walker [email protected]
Release Management: Jeff Summers [email protected], Dan Kitay [email protected]
Marketing Enablement: John Knutson [email protected]
Sales Enablement: Giulio Peri [email protected], Diego Bessone [email protected]
Video Overview

A short 4-minute overview introducing the OMEGAMON Monitoring Feature for JVM:
– YouTube direct link: https://youtu.be/QcqnD_B3xsg
– Service Management Connect blog with video embedded: www.ibm.biz/OMEGJVMVideoBlog