© 2016 IBM Corporation
IBM OMEGAMON for JVM on z/OS V5.4.0
Short Presentation
October 2016
Rev: 540-1
Purpose of this Presentation

This deck is intended to provide a brief but comprehensive overview of the IBM OMEGAMON for JVM product.

Sections within this presentation are:
– Why monitor JVMs on z/OS
– Introducing IBM OMEGAMON for JVM
– Resource monitoring of z/OS Connect Enterprise Edition
– Example scenarios
– Further Information

– = New in OMEGAMON for JVM V5.4.0
Why monitor JVMs?
• Clear understanding of how many and what JVMs are running within an LPAR
• Need to address tuning issues in the operations environment
• Highlight degradation of performance over time
  – Leaks leading to inefficient or excessive Garbage Collections
• Diagnose potential OutOfMemory conditions
  – Heap issues and native memory leaks
• Contention for shared resources
  – Poor performance caused by multiple threads waiting on resources
• Sub-optimal CPU utilization
  – Is work running on general CPU or specialty engines?
• Operating within the correct environment
  – Are there subsystems/applications running with insecure JVM levels or settings?
  – Are the correct application versions deployed?
Introducing OMEGAMON for JVM on z/OS

• Brand new OMEGAMON monitoring agent focused on helping z/OS system administrators, operators and SMEs identify problems, resolve them more quickly and optimize performance
• Lightweight overhead compared to other offerings
  – 90% of data collected is through the Health Center API
• Ability to view all JVMs side by side; no disconnect when switching between JVMs
• Collects data on any online JVM on z/OS
  – Subsystems: CICS, IMS, DB2, WAS, z/OS Connect, ODM
  – Standalone Batch and USS Java applications
  – Can identify and distinguish Liberty JVM servers
• Data presented on both the OMEGAMON enhanced 3270 UI and the Tivoli Enterprise Portal
• Reports on Garbage Collection, Active Threads, Lock Utilization, JVM Environment, CPU Utilization and Native Memory
• Detailed report on z/OS Connect EE resources for hybrid cloud monitoring
• Provides the standard OMEGAMON features:
  – Look back in time with historical data collection
  – Be alerted to abnormal conditions through defined event generation (Situations)
  – Easy to configure and deploy using PARMGEN
[Diagram: OMEGAMON JVM Agent monitoring JVMs in z/OS Connect, Liberty, Batch/USS, CICS TS, WAS on z/OS, DB2 on z/OS, IMS and ODM on z/OS]
z/OS Connect EE Resource Monitoring

Identify Service/API performance issues within z/OS Connect EE instances faster and avoid bottlenecks.

[Diagram: OMEGAMON for JVM monitoring a z/OS Connect Enterprise Edition instance alongside DB2, IMS, CICS and WAS]
Data Provided

• Highest JVM Statistics
• CPU Statistics
  – General CPU
  – Specialty Processor (IFA) CPU
  – Specialty Processor Work on General CPU
• Garbage Collection Statistics
  – Nursery GC Details
  – Global GC Details
  – % Time Paused
  – Heap Allocation
• Thread Details
  – Thread State
  – Contending Object
  – Stack Trace
• Lock Details
  – GET Count
  – Average Hold Time
  – Slow Gets
  – Recursive Acquires
  – Lock Utilization %
• JVM Environment
  – JVM Command Line
  – System Variables
  – Env Variables
  – JVM Parameters
  – Classpath
  – Boot Classpath
• z/OS Connect EE Request Metrics
  – Average Response Time
  – Slowest Services
• Native Memory
  – LE Heap Details
  – z/OS Extended Region Detail
  – Java Native Memory
Scenario: Visibility of all JVMs
• As noted, JVMs can be found all over the environment. Are you clear about what is online? Are there JVMs online that were not planned?
• When started, the JVM Monitor will seek out all JVMs on an LPAR, regardless of subsystem type and whether or not they have been configured for full monitoring.
• The agent will capture the jobname, ASID, subsystem type and basic details of each JVM.
"How much Java are we running? We need to see all JVMs that are currently online."

[Diagram: OMEGAMON JVM Agent discovering CICS TS, WAS, DB2 and IMS JVMs within an LPAR]
Scenario: Visibility of all JVMs
For a JVM to be fully monitored, it must be instrumented to allow OMEGAMON to collect
data. If not, we can still determine online JVMs and their subsystem type. These are reported on the second subpanel here. A user can then determine if they want to instrument that JVM
for full monitoring.
Scenario: Visibility of all JVMs
Equivalent Tivoli Enterprise Portal screen showing JVMs currently being fully monitored and those detected as online but not monitored by the JVM agent.
Scenario: Visibility of all JVMs
• To enable full monitoring of a JVM it must be instrumented to allow the OMEGAMON agent to interact with the JVM and issue requests via the Health Center API.
• Typical configuration is a minor change to the JVM startup parameters:
  -Xhealthcenter:level=inprocess
  -javaagent:/omegamon/uss/install/dir/kan/bin/IBM/kjj.jar
• OMEGAMON code will collect JVM environment information, capture JVM events (for example GCs) and push the details to the OMEGAMON JVM agent.
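For illustration only, the same two options could be appended when launching a standalone USS Java application from the command line. The class name MyBatchApp is a hypothetical placeholder; the agent path is the example path shown above, and the exact mechanism differs for subsystems such as CICS or WAS, which take JVM options from their own profiles:

  java -Xhealthcenter:level=inprocess \
       -javaagent:/omegamon/uss/install/dir/kan/bin/IBM/kjj.jar \
       MyBatchApp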
[Diagram: instrumented CICS TS, WAS, DB2 and IMS JVMs pushing data to the OMEGAMON JVM Agent within an LPAR]
Scenario: Optimizing Garbage Collection
• “Performance of JVM is poor. Could Garbage Collection be a cause?”
• Performance of the Garbage Collector has improved significantly in recent releases of Java; however, poor performance can still occur due to:
  • Insufficient heap allocation
  • Poorly written applications
• The symptoms of such problems might be:
  • An excessive number of GC events occurring within a given period of time
  • High heap occupancy even after a GC
  • Long pause times while a GC event is occurring
  • System GC events occurring
• The Garbage Collection Details workspaces provide insight into the performance of the JVM's GC, allowing the operator to confirm (or rule out) the JVM as a bottleneck in performance throughput.
Performance of JVM is poor. What can be causing this?
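For context, metrics such as the GC rate per minute can be derived from the standard JMX beans every Java runtime exposes. A minimal Java sketch (illustrative only, not OMEGAMON code):

  import java.lang.management.GarbageCollectorMXBean;
  import java.lang.management.ManagementFactory;

  public class GcRateSketch {
      public static void main(String[] args) throws InterruptedException {
          long before = totalCollections();
          Thread.sleep(60_000);                 // observe for one minute
          long after = totalCollections();
          System.out.println("GC events in the last minute: " + (after - before));
      }

      // Sum the cumulative collection counts across all collectors (nursery, global, ...)
      private static long totalCollections() {
          long count = 0;
          for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
              count += gc.getCollectionCount();
          }
          return count;
      }
  }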
Scenario: Optimizing Garbage Collection
The Highest JVM Statistics subpanel shows the poorest-performing statistics for key GC metrics.
If a threshold is exceeded (for example, GC Rate per Minute), zoom into the JVM that potentially has an issue.
Scenario: Optimizing Garbage Collection
GC Details can point out key values that may indicate a problem. A rolling 5-minute interval is used to scale values.
Does the Occupancy look OK? Is the Average Heap size fine?
Scenario: Out of Memory Conditions
• The java.lang.OutOfMemoryError is a severe condition that often occurs with little warning and usually brings down the JVM. The error occurs when the system runs out of memory – either Java heap space or native memory.
• OMEGAMON for JVM constantly monitors the proportion of the maximum Java heap size that is still allocated after garbage collection. If that value exceeds 80%, a situation is triggered that can take actions such as alerting operations staff or application SMEs. If the condition escalates, the application can be restarted in an orderly fashion before it crashes or impacts end users.
• The Native Memory analysis provides details that can help identify constraints in native memory: either the address space is over-committed or an application has a memory leak. By analyzing metrics such as Language Environment Heap utilization and Extended Region Free %, OMEGAMON can avert major outages caused by native memory exhaustion.
The address space periodically abends.
Can we see what caused the issue?
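For context, the heap-occupancy figure behind that 80% threshold is available to any Java program through standard JMX. A minimal sketch (illustrative only; OMEGAMON samples the value after garbage collection rather than at an arbitrary point as here):

  import java.lang.management.ManagementFactory;
  import java.lang.management.MemoryUsage;

  public class HeapCheckSketch {
      public static void main(String[] args) {
          MemoryUsage heap = ManagementFactory.getMemoryMXBean().getHeapMemoryUsage();
          long max = heap.getMax();             // -1 if no maximum is defined
          if (max > 0) {
              double pctOfMax = 100.0 * heap.getUsed() / max;
              // The 80% threshold is reproduced here purely for illustration
              if (pctOfMax > 80.0) {
                  System.out.println("WARNING: heap at " + pctOfMax + "% of maximum");
              }
          }
      }
  }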
Scenario: Out Of Memory Condition
Select a Job Name using the action menu and select option 'N' for 'Native Memory'
Scenario: Out Of Memory Condition
If the Extended Region Free % falls below 10 and continues to fall, it is an indication of a native memory leak. If the value falls below 5, the JVM may need to be shut down and restarted.
Scenario: Identify Possible Memory Leak
A snapshot of data is taken at regular intervals to allow viewing of system status at a specified point in the past.
Can we be alerted to memory issues before they cause an abend?
Scenario: Identify Possible Memory Leak
A creeping rise in heap occupancy after a GC has been performed is a sign of a possible memory leak. Left unaddressed, it could lead to an OutOfMemoryError, a JVM abend and a core dump.
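For context, the classic cause of such a creeping rise is a collection that only ever grows. A deliberately leaky Java sketch (hypothetical code, not from any product) that would eventually produce exactly this pattern and an OutOfMemoryError:

  import java.util.ArrayList;
  import java.util.List;

  public class LeakSketch {
      // A static list that is only ever appended to: every buffer stays
      // reachable, so heap occupancy after each GC keeps climbing.
      private static final List<byte[]> RETAINED = new ArrayList<>();

      public static void main(String[] args) {
          while (true) {
              RETAINED.add(new byte[1024 * 1024]); // retain another 1 MB per loop
          }
      }
  }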
Scenario: Identifying Locks and Thread Blocks
• If the problem is not GC-related, perhaps threads are being blocked for excessive periods, or locks within the JVM are being held for long periods, causing applications to wait for the monitor to yield.
• If high values are found here, the application owner (if applicable) can be alerted, or adjustments to the JVM environment could be made.
Our applications are performing poorly. Can we see what
might be the cause?
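For context, the BLOCKED state that the Thread Statistics workspace surfaces is what Java reports when threads contend for the same monitor. A minimal Java sketch (illustrative only) that produces it:

  public class ContentionSketch {
      private static final Object MONITOR = new Object();

      public static void main(String[] args) {
          Runnable worker = () -> {
              synchronized (MONITOR) {     // only one thread may hold this monitor
                  try {
                      Thread.sleep(5_000); // hold the lock for five seconds
                  } catch (InterruptedException e) {
                      Thread.currentThread().interrupt();
                  }
              }
          };
          new Thread(worker, "first").start();
          new Thread(worker, "second").start(); // whichever thread arrives second
                                                // shows as BLOCKED, contending on MONITOR
      }
  }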
Scenario: Identifying Locks and Thread Blocks
The Thread Statistics workspace drills down to all active threads, making BLOCKED threads easy to spot.
NEW in V540 – Also shows Thread CPU to spot loops!
Scenario: Identifying Locks and Thread Blocks
The Lock Statistics workspace shows which monitor objects were most often used as locks and how long they were held.
Scenario: Identify Environment Issues
• We are able to deep-dive into JVM environment details to view information like the classpath, system properties and the version of Java being used.
• We can also define a situation to check a setting and alert us to a problem; in this case, if a ‘bad’ Java version is being used.
We need to ensure the Java
levels being used are up to
date
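For reference, the Java level such a situation tests is the one the runtime itself reports. A plain Java program can read the same information from the standard java.* system properties (these property names are standard Java, not OMEGAMON attribute names):

  public class VersionSketch {
      public static void main(String[] args) {
          System.out.println(System.getProperty("java.version"));         // e.g. 1.8.0
          System.out.println(System.getProperty("java.runtime.version")); // full build level
          System.out.println(System.getProperty("java.vm.vendor"));
      }
  }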
Scenario: Identify Environment Issues
In the TEP Situation Editor we create a new Situation to check against the JVM's Version attribute. If this condition is ever met, a Warning alert will be raised.
Scenario: Identify Environment Issues
Once the situation is tripped, you can analyze the current conditions, identify the offending job and take appropriate
action
Scenario: Identify Environment Issues
The Situation Status Tree in the enhanced 3270 UI will show whether there is a JVM online with the offending Java level. A user could then take appropriate action.
Scenario: Slow API Response Time
• It’s important to be alerted to poor response times for the services/APIs you make available to consumers, potentially external ones, so that you can satisfy application performance goals and manage varying workloads before application owners raise complaints.
• The z/OS Connect Summary workspace displays all the z/OS Connect Services that were executed in the last 5 minutes. This workspace helps identify slow requests; in conjunction with the Garbage Collection or Threads workspaces, specific causes of the symptom can be found.
Reports are coming back that application
request response time into z/OS is
poor. Can we identify affected
services?
Scenario: Slow API Response Time
Identify the z/OS Connect Job by looking at the Application field. Select the Job using
option 'Z'
Sort the rows by 'Avg Response Time'. Identify and select the service name with the highest Avg Response Time. Selecting option 'S' will display more detailed information about a particular request.
Scenario: Slowest z/OS Connect Services
• There may be certain properties that will point us to a problem around the reported service performance issue. Maybe there is something specific about the slowest requests, the client connected, or the payload being submitted.
• The z/OS Connect Slowest Requests workspace displays the five worst-performing requests over the last 5 minutes for a particular z/OS Connect service. It can be used to provide diagnostic information about a specific request, which can help determine why that request performed poorly.
Response time of services through a z/OS Connect EE instance is slow.
Can we investigate requests to deduce
the issue?
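For context, a "five slowest requests over a rolling 5-minute window" view is a small top-k selection over a time window. A Java sketch of one way such a view could be maintained (illustrative only; this is not how the agent is implemented, and the Request fields are hypothetical):

  import java.util.ArrayDeque;
  import java.util.Comparator;
  import java.util.Deque;
  import java.util.List;
  import java.util.stream.Collectors;

  public class SlowestRequestsSketch {
      static class Request {
          final String id;
          final long responseMs;
          final long completedAtMs;
          Request(String id, long responseMs, long completedAtMs) {
              this.id = id;
              this.responseMs = responseMs;
              this.completedAtMs = completedAtMs;
          }
      }

      private static final long WINDOW_MS = 5 * 60 * 1000; // rolling 5-minute window
      private final Deque<Request> window = new ArrayDeque<>();

      // Record a completed request and drop observations that have aged out.
      public synchronized void record(Request r) {
          window.addLast(r);
          while (!window.isEmpty()
                  && window.peekFirst().completedAtMs < r.completedAtMs - WINDOW_MS) {
              window.removeFirst();
          }
      }

      // The five slowest requests currently inside the window.
      public synchronized List<Request> fiveSlowest() {
          return window.stream()
                  .sorted(Comparator.comparingLong((Request x) -> x.responseMs).reversed())
                  .limit(5)
                  .collect(Collectors.toList());
      }
  }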
Scenario: Slowest z/OS Connect Services
Identify the desired Service Name you want more details for and select it with option 'S'
Scenario: Slowest z/OS Connect Services
Here you can view the five slowest requests: their request ID, method, response time, and so on.
Requests with longer payloads may take longer, as they are sent as JSON, which can carry overhead depending on the subsystem being called.
In addition, if the Response Length is 0, there is no JSON response; the request may have encountered an error, which could also cause a slow response time.
The OMEGAMON Portfolio

Service Management Suite on z/OS
– OMEGAMON Performance Management Suite on z/OS
  – OMEGAMON z/OS Management Suite
    – OMEGAMON on z/OS
    – OMEGAMON for JVM
    – OMEGAMON Mainframe Networks
    – OMEGAMON for Storage
    – OMEGAMON Dashboard Edition
  – OMEGAMON for CICS
  – OMEGAMON for DB2 PE
  – OMEGAMON for IMS
  – OMEGAMON for Messaging
  – ITCAM for Application Diagnostics
– Service Management Unite
– NetView for z/OS
– System Automation for z/OS
– Tivoli Asset Discovery
More Information/References

OMEGAMON Product Home – overview and product information for all OMEGAMON products
  www.ibm.com/OMEGAMON

Service Management Connect – blogs, forums, articles and best-practice videos for IBM z Systems monitoring
  www.ibm.com/developerworks/servicemanagement/z
  Examples:
  – Introducing OMEGAMON Monitoring for JVM
  – Using OMEGAMON to Diagnose Slow JVMs Through Thread Data
  – OMEGAMON JVM monitoring for z/OS Locking Data

OMEGAMON Monitoring for JVM Technote – summary of latest fixes, known issues and updates
  www.ibm.biz/OMEGJVMTechnote
Contacts
Offering Management: Nathan Brice [email protected], Chris Walker [email protected]
Release Management: Jeff Summers [email protected], Dan Kitay [email protected]
Marketing Enablement: John Knutson [email protected]
Sales Enablement: Giulio Peri [email protected], Diego Bessone [email protected]
Video Overview

A short 4-minute overview introducing the OMEGAMON Monitoring Feature for JVM:
– YouTube direct link: https://youtu.be/QcqnD_B3xsg
– Service Management Connect blog with video embedded: www.ibm.biz/OMEGJVMVideoBlog