
Performance testing wrecking balls

Jun 29, 2015


Performance testing wrecking balls

Leonid Grinshpan, PhD

There are many ways to ruin a performance testing project, but only a handful of ways to do it right. This publication analyses the most widespread performance testing blunders. It is impossible to expose every variety of testing wrongdoing in one article, so this publication is necessarily open-ended. It is based on my posts on the Practical Performance Analyst web site (http://www1.practicalperformanceanalyst.com/); I am thankful to the site owner and manager Trevor Warren for giving me the opportunity and incentive to share my opinion on the topic with the engineering community.

Contents

Wrecking ball 1. Lacking knowledge of the application under test

Wrecking ball 2. Not seeing the forest for the trees

Wrecking ball 3. Disregarding monitoring

Wrecking ball 4. Ignoring workload specification

Wrecking ball 5. Overlooking software bottlenecks


Wrecking ball 1. Lacking knowledge of the application under test

When we have to fix a car engine, the first thing we do is learn how the engine works. But when we plan to load test and tune an application, knowledge of its functions and internals is sometimes considered an unnecessary burden. If getting acquainted with the application is not on the "must do" list of performance project activities, then the project will almost certainly be ruined by the application-unawareness wrecking ball. To avoid a project fiasco, here is what we have to know about an application before we start deploying a performance testing framework.

Application functionality

What is the application intended to do? What tasks does it help to automate? What are the user categories (for example, planners, analysts, reviewers, C-level executives)? How do the users of each category work with the application? Does the application process batch jobs? Does it do so concurrently with the users' interactive requests?

Knowledge of application functionality and usage enables us to specify a realistic workload to be generated by load testing tools.

Application components and tuning parameters

Today's enterprise applications are not monolithic software masses; they are composed of a number of software components hosted on different hardware servers. The most prevalent are Web servers, application servers, and database servers. Depending on its functionality, an application can also include reporting, consolidation, printing, data transformation, and many other components processing user requests. The components have tuning parameters that make them adaptable to varying service demands. Examples of tuning parameters are the Java Virtual Machine minimum and maximum heap sizes, the number of software threads supporting concurrent processing by a server, and the number of database connections.

If we know the functionality of each application component as well as its tuning parameters, we can fix a bottleneck caused by a particular component by changing the values of its tuning variables.
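As an illustration, tuning parameters can be thought of as a per-component catalog that a fix manipulates. The component names, parameter names, and values in this sketch are hypothetical; real parameter names vary by product.

```python
# Hypothetical tuning catalog; real parameter names vary by product.
tuning = {
    "app_server": {"jvm_min_heap_mb": 1024, "jvm_max_heap_mb": 4096,
                   "worker_threads": 50},
    "database": {"connection_pool_size": 25},
}

def widen_connection_pool(params, factor=2):
    """One possible fix when transactions queue for database connections."""
    params["connection_pool_size"] *= factor
    return params

widen_connection_pool(tuning["database"])
print(tuning["database"]["connection_pool_size"])  # 50
```

The point is not the specific values but the mapping: a bottleneck in a component is addressed through that component's own tuning variables, which is only possible if we know what they are.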

Transactions

The users communicate with an application by initiating transactions that require the allocation of system resources to be processed. The intensity of user transactions at any given time depends on the number of users actively interacting with the system. It is also a function of the pace at which each user submits one transaction after another.

In order to generate a test workload that closely emulates the real business workload, we have to identify the following transaction characteristics:


- The list of transactions generated by the users.
- For each transaction, its intensity, expressed as the average number of times one user initiates it per hour.
- For each transaction, the number of users requesting it.
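The hourly intensity also implies an average pacing per user, which is what a load testing tool has to reproduce. A one-line sketch (the rate value is illustrative):

```python
# From a transaction's hourly intensity per user, derive the average
# pacing between consecutive initiations by one user.
def mean_interval_s(rate_per_user_per_hour):
    """Average seconds between consecutive initiations by a single user."""
    return 3600 / rate_per_user_per_hour

print(mean_interval_s(5))  # a rate of 5 per hour means one initiation every 720 s
```

This interval is what think-time settings in a load generator approximate.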

Transaction profiles

A transaction profile is a measure of a single transaction's demand for system resources. A transaction triggers a multitude of processing activities by application components hosted on different hardware servers. Each component allocates its resources to transaction processing for a particular time interval. In general, each component has the following assets to be allocated.

Active resources:

- CPU time (data processing)
- I/O time (data transfer)
- Network time (data transfer)

Passive resources:

- Software connections to servers and services (for example, Web server connections, database connections)
- Software threads
- Storage space
- Memory space
- Software locks
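The active and passive resource lists above can be captured as a demand vector. The resource names and quantities in this sketch are illustrative assumptions, not measured values:

```python
# A transaction profile as a resource-demand vector.
profile = {
    "cpu_s": 0.8,         # active: CPU seconds consumed per transaction
    "io_s": 0.3,          # active: disk I/O seconds
    "network_s": 0.1,     # active: data-transfer seconds
    "db_connections": 1,  # passive: held for the duration of processing
    "memory_mb": 64,      # passive: working set while processing
}

def cpu_utilization(profile, arrivals_per_hour, n_cpus):
    """Fraction of total CPU capacity this transaction stream consumes."""
    return arrivals_per_hour * profile["cpu_s"] / (3600.0 * n_cpus)

u = cpu_utilization(profile, 500, 4)  # 500 transactions/hour on 4 CPUs
print(round(u, 3))                    # about 0.028, i.e. 2.8% of capacity
```

Multiplying a profile by the transaction arrival rate is exactly how a profile translates into utilization of, and contention for, each resource.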

Active resources implement transaction processing and data transfer. Passive resources provide access to active resources. Consumption of an active resource is measured by the time interval during which it served a transaction. The usage metric for a passive resource depends on its type: for software connections and threads it is a count; for memory and storage it is the size of the allocated space. A transaction profile is thus a set of numbers (a vector) specifying the quantity of each resource a transaction consumes during processing. Knowledge of transaction profiles is important for correct deployment of performance monitors: it ensures that they record meaningful performance counters capable of identifying a shortage of system resources under load.

Application processes

To the operating system (OS), an application is a set of processes working under OS control and receiving from the OS the resources needed to satisfy the demands of user transactions. We have to know the processes of our application in order to monitor the system resources allocated to each process. Windows Task Manager provides basic information on the behavior of each process.


Windows Performance Monitor delivers a broad range of performance counters for each process, for example for the FrameworkService process.


In the UNIX universe, depending on the UNIX flavor, process performance counters are reported by commands such as ps, top, or prstat.

Monitoring server-level counters (like total CPU utilization) might show a shortage of a particular hardware resource, but it won't identify which process experiences the deficit. To identify that process, monitoring has to be set up for each application process.
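As a sketch of per-process monitoring, the snippet below parses ps-style output into per-process CPU and memory figures. The sample text and process names are invented for illustration:

```python
# Sample ps-style output; the processes and numbers are invented.
SAMPLE_PS = """\
PID  %CPU %MEM COMMAND
101  52.3  4.1 appserver
102   3.0 18.7 dbwriter
103   0.2  0.1 sshd
"""

def parse_ps(text):
    """Return {command: (cpu_percent, mem_percent)} from ps-style output."""
    rows = {}
    for line in text.strip().splitlines()[1:]:  # skip the header row
        _pid, cpu, mem, cmd = line.split(None, 3)
        rows[cmd] = (float(cpu), float(mem))
    return rows

procs = parse_ps(SAMPLE_PS)
# Server-level CPU may look acceptable while one process is saturated;
# per-process figures identify which process experiences the deficit.
busiest = max(procs, key=lambda cmd: procs[cmd][0])
print(busiest)  # appserver
```

In practice the same idea is implemented by pointing Performance Monitor (or a UNIX equivalent) at each known application process rather than at totals.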

Application logs and error files

Application logs and error files are data gold mines: they keep valuable information on executed transactions, including, but not limited to, the times a transaction was processed by different application components. On some occasions such information lets us quickly identify the component causing a bottleneck.

Information in log and error files also helps to check whether the workload generated by load testing tools is in sync with the test requirements.

Wrecking ball 2. Not seeing the forest for the trees

In the course of performance testing we usually realize that the application whose performance malfunctions we are tasked to diagnose is tremendously complex. Today's distributed application is a tanglewood of physical and logical objects interacting in an intricate manner through multiple rules, algorithms, and protocols while serving communities of concurrent users generating fluctuating workloads. That makes application performance testing and troubleshooting extremely cumbersome and time-consuming. Identifying the few tuning parameters capable of eliminating a bottleneck can be compared to locating a needle in a large and messy haystack, because an application features hundreds of tuning parameters at the system and application levels.


System-level tuning parameters define the management policies of the operating system. Microsoft offers a 112-page document, "Performance Tuning Guidelines for Windows Server 2008 R2" (http://tinyurl.com/qx4v4gy). IBM's AIX operating system tuning guide is even fatter: it has 744 pages (http://tinyurl.com/o3b66o8).

Application tuning parameters control the application's demand for system resources as well as the configuration of its internal logical objects, such as software threads, connection pools, etc. Application vendors publish comprehensive tuning documentation to help optimize their products. Here are a few examples of Oracle's all-inclusive performance tuning publications: "Oracle® Fusion Middleware Performance and Tuning Guide" (http://tinyurl.com/kurmd9p), "Oracle® JRockit Performance Tuning Guide" (http://tinyurl.com/mggv55j), and "Oracle® Fusion Middleware Performance and Tuning for Oracle WebLogic Server" (http://tinyurl.com/panje7g).

Is it possible to perceive an application in a way that scales down its complexity (similar to taking an aerial view of the Earth)? In other words, can we conceptualize an application so as to abstract away numerous details and concentrate only on the objects that have the potential to create bottlenecks? In short, can less be more?

The answer is "yes". Reducing an application's complexity amounts to building its mental model. Wikipedia defines a mental model as an explanation of someone's thought process about how something works in the real world (http://en.wikipedia.org/wiki/Mental_model).

In what follows we show how to build a mental model of an application that exposes the relations between the demand for application services and the supply of application resources. By devising a mental model we leap into a different perspective on the application, one that highlights the application's components, their interconnections, and transaction processing inside the application.

Mental model constructs

We need three constructs to build a mental model that serves our purpose:

1. Nodes represent hardware components processing user requests. The nodes symbolize servers, appliances, and networks.

2. Interconnections among nodes stand for connections among hardware components. Interconnections and nodes define the application topology.

3. Transactions characterize user requests for application services. If we visualize a transaction as a physical object (for example, a car), we can create in our mind an image of a car-transaction visiting different nodes and spending some time in each one while receiving service.

The model's constructs map to application objects as shown in the table below:


Component of application | Matching element in the model
Users and their computers | Node "users"
Network | Node "network"
Server | Node "server"
Transactions initiated by users | Cars

An application can then be represented by a mental model built from these constructs.

A transaction starts its journey when a user clicks a menu item or a link, implicitly initiating a transaction. In the model this means the transaction leaves the node "users". After that it gets processed in the nodes "network" and "server". At the end of its journey the transaction comes back to the node "users". The total time a transaction has spent in the nodes "network" and "server" is the transaction response time.
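The journey above can be rendered in a few lines: response time is the total time spent in the "network" and "server" nodes. The per-node times here are illustrative assumptions:

```python
# Seconds a transaction spends in each node (service plus any waiting);
# the node names follow the mental model, the values are invented.
time_in_node = {"network": 0.05, "server": 0.90}

response_time = sum(time_in_node.values())
print(round(response_time, 2))  # 0.95
```

In a measured system the per-node times come from component logs or monitors, and the same additive relation decomposes an observed response time into node contributions.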

How mental models help to identify the bottlenecks

Let's consider what can delay the processing of a car-transaction in a node. One obvious reason: the node does not have enough capacity when a number of car-transactions concurrently request service. In such a case some car-transactions receive service while the others wait. Another reason for delay is less obvious. For example, in order for a transaction to be processed on a CPU, the application has to request and receive a particular amount of memory. But what happens if memory is not available? Obviously, the transaction will wait. In general, this means that transaction processing can be delayed by limited access to a node even if the node is not fully utilized.

We have come to an important conclusion: a transaction delay can be caused by two circumstances, a shortage of resource capacity and limited access to a resource.


That means that in order to identify where bottlenecks can potentially take place, we have to monitor all hardware resources that process transactions, as well as all objects providing access to those resources. Such objects include physical ones (like memory and disk space) as well as logical programmatic constructs (like software threads, connection pools, locks, semaphores, etc.).

The model suggests that bottlenecks in our application might happen when there are insufficient CPU and I/O resources in the hardware server, when the server has limited memory, or when the application spawns insufficient software threads or features a poorly tuned connection pool. Indeed, low network throughput can also cause bottlenecks.

The model channels our bottleneck troubleshooting efforts in the right directions: we will deploy monitors to collect meaningful CPU and I/O performance counters from the server, and we will monitor server memory availability and the behavior of the connection pools and software threads of our application processes. We will also use network monitors to assess network latency and throughput.

The bottom line: mental models expose the application fundamentals, distilled of the innumerable application particulars that conceal the roots of performance issues.

From mental models to queuing models

Application mental models are indispensable instruments for streamlining our performance testing and troubleshooting activities. A mental model points to the fact that a bottleneck happens when a node does not have sufficient capacity or when access to the node is limited. In both cases the processing of a transaction is delayed because the transaction is placed into a waiting queue.

Queuing is a major phenomenon defining application performance, but mental models cannot quantitatively assess its impact on transaction times or on application architecture. If we want to find an application architecture that delivers performance according to a service level agreement, we have to transition from application mental models to queuing models. The book [1] can be a guide on that journey.
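As a taste of what a queuing model adds, the textbook single-server (M/M/1) response time formula quantifies how waiting inflates response time as utilization grows. This is the standard result, not taken from the book cited above, and the service time and arrival rate are illustrative:

```python
def mm1_response_time(service_time_s, arrivals_per_s):
    """Average response time (service + queuing) for a single-server M/M/1 node."""
    utilization = arrivals_per_s * service_time_s
    if utilization >= 1.0:
        raise ValueError("node is saturated; the queue grows without bound")
    return service_time_s / (1.0 - utilization)

# A node serving a request in 0.5 s at 1.2 requests/s runs at 60% utilization,
# yet queuing pushes the average response time to 1.25 s.
print(round(mm1_response_time(0.5, 1.2), 2))  # 1.25
```

Even this simplest model makes the mental model's qualitative claim quantitative: the same node that looks only moderately busy already more than doubles the bare service time.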

Wrecking ball 3. Disregarding monitoring

In order to find out what causes unacceptable production system performance, we have to monitor its hardware and software components during normal operation or while running a load test. A system monitoring framework can be compared to a medical doctor's diagnostic equipment: without it a doctor has at his disposal only the patient's complaints, not data objectively describing vital body functions. Running performance tests without monitoring performance counters, or monitoring the wrong ones, is a useless undertaking, as it does not deliver any information on application performance under load.


Today's systems expose hundreds of counters available for monitoring. The commonly utilized counter categories are:

- Counters reporting utilization of system resources during some time interval (for example, percent of total CPU utilization, percent of CPU utilization by a particular process, percent of physical disk utilization)
- Resource throughput, measured as the number of operations executed by a resource during a particular time interval (for example, network throughput in bytes/second, or the number of I/O reads per second)

The common characteristic of both categories is that all their counters are time dependent. That means the accuracy of the value a counter reports depends on the accuracy of the time measurement. Unfortunately, in today's dominant virtualized and cloud computing environments, timekeeping is flawed. A detailed discussion can be found in [2,3]; here we just point to its fundamental reason: in a virtual environment the hypervisor treats a guest operating system (OS) like any other process that can be stopped and resumed at any time. When the guest OS is stopped, it cannot accept timer interrupts from the hardware clock. That means the guest OS misses time intervals, as it cannot measure time when it is not running, which makes time-dependent metrics unrepresentative. Taking this into consideration, what are the right objects to monitor in virtualized environments?

To find out, let's invoke a system's conceptual queuing model. We described the representation of systems by such models in the previous section, where we stated that a queuing model is an abstract representation of a system that includes a depiction of the system's resources as well as the demands for resources generated by the users (more on queuing models of systems can be found in the book [1]). Queuing models create a systematic framework for system performance analysis and capacity planning.

Queues are the major phenomenon defining system performance, because waiting time in the queues adds to the time a transaction spends being processed by system resources. A queue is an indicator of an imbalance between the demand generated by a fluctuating user workload and the availability of system resources to satisfy that demand. As such, while troubleshooting a performance bottleneck, it is necessary to find out where in a system the queues are building up and exceeding acceptable thresholds. That can be done by monitoring internal system queues. Because instantaneous counts of queue lengths do not depend on the implementation of a system's timekeeping mechanism, this approach delivers representative performance metrics in any environment.

We already noted that an application requests two kinds of resources to process user transactions: active and passive. In order to be processed by any active resource, a transaction has to request and be allocated passive resources. If any resource needed for transaction processing is not available because the entire supply is taken by other transactions, the transaction will wait in a queue until a resource is released. Indeed, the wait time will increase the transaction response time, and application performance will suffer.


Observing queues is not an exotic task: it can be done using the performance monitors and counter-reporting commands built into the OS. The table below compiles information on Windows counters that deliver instantaneous queue lengths for different system objects. The table is far from all-inclusive, but it is sufficient to demonstrate the queue-based performance monitoring tactic.

Counter Description

Active resources queues

Queue Length (object Server Work Queues)

Queue Length is the current length of the server work queue for a particular CPU. A sustained queue length greater than four might indicate processor congestion.

Processor Queue Length (object System)

Processor Queue Length is the number of threads in the processor queue. Unlike the disk counters, this counter shows ready threads only, not threads that are running. There is a single queue for processor time even on computers with multiple processors; therefore, if a computer has multiple processors, you need to divide this value by the number of processors servicing the workload. A sustained processor queue of fewer than 10 threads per processor is normally acceptable, depending on the workload.

Current Disk Queue Length (object Physical Disk)

Current Disk Queue Length is the number of requests outstanding on the disk at the time the performance data is collected. This counter might reflect a transitory high or low queue length, but if there is a sustained load on the disk drive, it is likely to be consistently high. The counter shows not only waiting requests but also those being processed; that is why request delays are proportional to the length of this queue minus the number of spindles on the disks. For good performance, this difference should average less than two.

Output Queue Length (object Network Interface)

Output Queue Length is the length of the output packet queue on Network Interface Card. If this is longer than two, there are delays and the bottleneck should be eliminated. Microsoft informs that Output Queue Length counter value is not reliable on computers with symmetric multiprocessing (SMP) architecture; the details are here: http://support.microsoft.com/kb/822226.

Jobs (object Print Queue)

The counter shows current number of jobs in a print queue.

Passive resources queues

Current Queue Size (object HTTP Service Request Queues)

The counter shows number of requests in the Web server queue. Requests begin to accumulate when IIS falls behind in dequeueing requests. The limit is set by the application pool’s queueLength attribute, and defaults to 1000. When limit is reached, HTTP.SYS returns 503 Service Unavailable [http://blog.leansentry.com/2013/07/all-about-iis-asp-net-request-queues/] .

Requests Queued (object ASP.NET)

The number of requests waiting to be processed. When this number starts to increment linearly with respect to client load, the Web server computer has reached the limit of concurrent requests that it can process [http://www.techiesbytes.com/2010/05/using-perfmon-for-web-application-based.html].


Requests in Application Queue (object ASP.NET Applications)

In Classic mode, ASP.NET queues all incoming requests to the per-application queue when there are not enough threads. The threads available for request processing are determined by the available threads in the CLR thread pool, minus the reserved threads set by the httpRuntime/minFreeThreads and httpRuntime/minFreeLocalThreads attributes. This queue indicates poor performance, and it does not guarantee FIFO policy in application pools with multiple applications: because threads are shared between multiple apps, a single app can starve the other applications of available threads [http://blog.leansentry.com/2013/07/all-about-iis-asp-net-request-queues/].

Requests Queued (object ASP.NET v4.0.30319)

In Integrated mode, ASP.NET will queue all incoming requests after the configured concurrency limit is reached. Concurrency limit is set by the MaxConcurrentRequestsPerCPU registry key or applicationPool/maxConcurrentRequestsPerCPU attribute (Defaults to 12 on .NET 2.0/3.5, and 5000 on .NET 4.0+) and MaxConcurrentThreadsPerCPU registry key or the applicationPool/MaxConcurrentThreadsPerCPU attribute (defaults to 0, disabled) [http://blog.leansentry.com/2013/07/all-about-iis-asp-net-request-queues/].

Current Queue Length (object .NET CLR LocksAndThreads)

This counter (pertaining to Common Language Runtime Microsoft’s execution environment) displays the last recorded number of threads currently waiting to acquire a managed lock in an application. You may want to run dedicated tests for a particular piece of code to identify the average queue length for the particular code path. This helps identify inefficient synchronization mechanisms [http://blog.monitis.com/2012/09/14/improving-net-performance-part-17-measuring-net-application-performance-ii/].

Objects .NET Data Provider for SqlServer and .NET Data Provider for Oracle

Both objects have ten counters each providing the counts (not rates!) characterizing size and status of connection pools and connections. Information on monitoring and tuning connection pools can be found here: http://betav.com/blogadmin/mt-search.cgi?search=Managing+and+Monitoring+.NET+Connection+Pools&IncludeBlogs=3&limit=20

Thread State and Thread Wait Reason (object Thread)

A software thread is a logical vehicle that moves execution forward. Each process initiates one or a few threads. Monitoring its thread states provides information on what causes delays in process execution. Thread State counter reports the current state of the thread. It is 0 for Initialized, 1 for Ready, 2 for Running, 3 for Standby, 4 for Terminated, 5 for Wait, 6 for Transition, 7 for Unknown. A Running thread is using a processor; a Standby thread is about to use one. A Ready thread wants to use a processor, but is waiting for a processor because none are free. A thread in Transition is waiting for a resource in order to execute, such as waiting for its execution stack to be paged in from disk. A Waiting thread has no use for the processor because it is waiting for a peripheral operation to complete or a resource to become free. Thread Wait Reason counter is only applicable when the thread is in the Wait state. There are multiple wait reasons, but each one indicates that execution of the thread is suspended as it is waiting for a particular event. The examples of thread wait reasons:


- waiting for a memory page to be freed
- waiting for a component of the Windows NT Executive
- waiting for a page to be written to disk

For enumeration of wait reasons visit http://msdn.microsoft.com/en-us/library/ms804615.aspx
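The rough thresholds quoted in the table can be applied mechanically. This sketch encodes two of them; the sample values are illustrative, and the thresholds are the guideline figures from the table, not hard limits:

```python
def processor_queue_ok(queue_length, n_processors):
    # Windows keeps a single ready-thread queue; normalize per processor.
    # Guideline from the table: under ~10 ready threads per processor.
    return queue_length / n_processors < 10

def disk_queue_ok(queue_length, n_spindles):
    # The counter counts in-flight requests too, so subtract the spindles.
    # Guideline from the table: the difference should average less than two.
    return queue_length - n_spindles < 2

print(processor_queue_ok(24, 4))  # 6 ready threads per processor -> True
print(disk_queue_ok(9, 8))        # 1 request beyond the spindles -> True
print(disk_queue_ok(12, 4))       # sustained backlog of 8 -> False
```

In a real deployment these checks would run against sampled counter values, and a sustained violation, not a single spike, is what signals a queue-based bottleneck.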

This list of queue-reporting counters is extracted from Windows Performance Monitor; its goal is to demonstrate the primary tenets of queue monitoring. Today's application landscape is tremendously diversified and requires monitoring of queue-reporting counters for multiple UNIX flavors, counters pertaining to different technologies, and application-specific counters exposed by built-in instrumentation. The common denominator is the same: if a counter shows that a queue exceeds an acceptable threshold, then system performance is degraded.

In addition to the timekeeping issue in VMs, technological advances like hyperthreading, power management, CPU entitlement, and others also distort time-dependent performance counters [4]. That makes queue monitoring a trusted and preferred methodology for a wide range of systems built upon sophisticated technologies.

Wrecking ball 4. Ignoring workload specification

The essential task of any application is processing the production workload generated by its users according to the requirements stated in a service level agreement. To ensure that an application is capable of successfully executing such a task, performance engineers carry out load testing. Correct specification of the production workload is a cornerstone of load testing projects; it enables us to come up with proper application sizing and tuning recommendations. This section analyses the components of an application workload; it demonstrates that ignoring or skewing the production workload specification when generating load makes tuning recommendations meaningless or, even worse, misleading.

The users communicate with an application by issuing transactions that represent requests to perform particular application functions. Each request triggers a transaction execution process that consumes various hardware and software resources: CPUs, memory, network bandwidth, software threads, database connections, etc. Examples of transactions are: log into the system with a user name and password; calculate the revenue of plasma TV sales in New England stores in August 2012; estimate the expenses of launching a new product line in China; create a company profit and loss statement. Each transaction ends by delivering its results to the users. As an example, Figure 1 shows a financial report generated as a reply to a reporting transaction (http://www.oracle.com/technetwork/middleware/bi-foundation/financial-reporting-large-125270.gif).


Figure 1. Financial report generated as a reply to a reporting transaction

The rich functionality an application supports for each line of business is reflected in a broad assortment of transactions available to the users. Some applications have hundreds of transactions accessible through their user interfaces; such richness of an application's front end sometimes collides with the need for simplicity in the user experience. Nevertheless, business applications must match the complexity of the business processes, and the way to achieve that is to let users initiate the transactions that deliver the information they need.

The flow of transactions constitutes the application's transactional workload. The intensity of the transactions at any given time depends on the number of users actively interacting with the system. It is also a function of the pace at which each user submits one transaction after another. The interval between consecutive transactions from the same user can be substantial, as a user needs time to assess the application's reply to the previous request and to prepare the next one.

A workload specification includes a list of transactions, the number of concurrent users initiating each transaction, and, per transaction, its rate, measured as the number of times one user sets it off during one hour. This is an example of a user-generated workload characterization:

List of transactions | Number of users initiating transaction | Transaction rate (average number of transactions initiated by one user per hour)
Retrieve financial report | 100 | 5
Consolidate sales data | 25 | 2
Enter quantity of sold on-line products | 400 | 2

The table indicates that every hour business users request 100*5 = 500 financial reports, launch consolidation of sales data 25*2 = 50 times, and enter quantities of products sold on-line 400*2 = 800 times. Let's assume the table specifies a real production workload; an application appropriately sized and tuned for this workload will be able to process it according to the service level agreement. But what happens if, for application tuning, we use a load testing framework that generates a workload deviating downward or upward from the numerical values in the table? In that case our load test does not generate the production workload, and under such a skewed workload the application will have either idle or insufficient capacity. Tuning an application for a skewed workload is technically doable, but because that workload never happens in real life, our tuning efforts are not just useless, they are misleading: they degrade quality of service and might lead to unnecessary expenses if we recommend purchasing additional hardware. While executing performance tuning and capacity planning projects, we have to be well aware of how representative our manufactured workloads are of the real production workloads. Let's consider the results of a load test conducted using LoadRunner, presented in Figure 2, and find out how the workload generated by this test relates to a production workload.
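The per-hour arithmetic above can be sketched in a few lines of code. This is a minimal illustration; the tuples simply restate the workload table, and the names are ours, not part of any tool:

```python
# Workload specification restated as (transaction, concurrent users, rate per user per hour)
workload = [
    ("Retrieve financial report", 100, 5),
    ("Consolidate sale data", 25, 2),
    ("Enter quantity of sold on-line products", 400, 2),
]

# Hourly demand per transaction = users * per-user rate
hourly = {name: users * rate for name, users, rate in workload}
total = sum(hourly.values())

for name, count in hourly.items():
    print(f"{name}: {count} transactions/hour")
print(f"Total: {total} transactions/hour")
```

Running it reproduces the figures in the text: 500, 50, and 800 transactions per hour, 1350 in total.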

Figure 2 Load test results as reported by LoadRunner


The test includes the following transactions, executed in this order:

1. User logs in
2. Open Planning application
3. Open Web Form
4. Save Form
5. If test time has not expired, go to 1; otherwise stop the test

As the column “Pass” indicates, each transaction was executed 372 times during the test, which lasted 7 minutes and 37 seconds. That equals 372 / 7 min 37 sec = ~50 executions of each transaction every minute. Because the number of concurrent users is 36, every user initiated a transaction (50/36)*4 = ~5.6 times per minute. Definitely, the workload generated by this test is not a production workload but a stress workload. It can be used for testing the stability of the system's servers, but not for sizing and tuning the production environment. A detailed discussion of application transactional workloads and the importance of their specification for proper sizing and tuning can be found in Chapter 3 of the book [1].
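The same estimate can be checked without intermediate rounding. The input figures are taken from the report discussed above; using the exact duration gives roughly 5.4 rather than 5.6 transactions per user per minute, which does not change the conclusion that this is a stress workload:

```python
# Figures from the LoadRunner report discussed above
passes = 372                    # "Pass" count for each transaction
duration_min = 7 + 37 / 60      # test length: 7 min 37 s
users = 36
script_transactions = 4         # login, open app, open form, save form

# Executions per minute of each transaction, then per-user rate across the whole script
per_minute = passes / duration_min
per_user_per_min = per_minute * script_transactions / users
print(round(per_minute, 1), round(per_user_per_min, 1))
```

A real user pausing to read each reply would submit transactions at a small fraction of that pace.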

Wreaking ball 5. Overlooking software bottlenecks

Software bottlenecks are an often overlooked cause of poor application performance because, when present, they limit usage of hardware resources, leaving IT departments satisfied with infrastructure capacity but system users unhappy with long transaction times.

The effect of software bottlenecks can be demonstrated by the following analogy (see picture below). Let's mentally model a hardware server as a highway toll station, each server CPU as one toll booth, incoming transactions as cars, and the waiting queue as the cars congregating on the toll plaza.


The cars arrive at the toll plaza from different directions and get jammed in the lanes approaching the toll booths because the plaza has too few of them. As soon as a car gets through the jam, it immediately finds an idle toll booth. Increasing the number of toll booths cannot improve the plaza's throughput; what will do the trick is adding approaching lanes. More lanes dissolve the traffic congestion and keep the toll booths busy.

What constitutes an application's “approaching lanes”? They are the software tuning parameters regulating access to hardware resources. The most common parameters are:

- Memory space an application can use
- Number of threads providing access to CPUs/cores
- Number of connections to the database
- Number of Web sessions with interactive users

An insufficient setting of any of these parameters limits access to available hardware resources, keeping some of them idle while causing an unacceptable increase in transaction times. Performance engineers have to find the optimal values of software tuning parameters to keep application performance in sync with business requirements.
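The toll-plaza analogy can be put into numbers with a back-of-the-envelope model. The figures below are illustrative, not from the article: a server with 8 cores, transactions needing 0.1 s of CPU each, and a thread pool (the “approaching lanes”) capped at 2:

```python
# Toy capacity model of a software bottleneck (illustrative numbers only)
cpu_cores = 8
service_time_s = 0.1            # CPU time one transaction needs
thread_pool_size = 2            # software concurrency limit: the "approaching lanes"

# Hardware could sustain cores / service_time transactions per second,
# but only thread_pool_size transactions can run at once.
hardware_capacity = cpu_cores / service_time_s                          # 80 tx/s
effective_capacity = min(thread_pool_size, cpu_cores) / service_time_s  # 20 tx/s
cpu_utilization = effective_capacity * service_time_s / cpu_cores       # 0.25

print(hardware_capacity, effective_capacity, cpu_utilization)
```

With the pool at 2 threads, throughput is capped at a quarter of the hardware's capacity while CPU utilization sits at 25%: exactly the pattern of idle infrastructure plus queuing users that the section describes. Raising the pool size (adding lanes), not buying more cores (booths), removes the bottleneck.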

1. Leonid Grinshpan. Solving Enterprise Applications Performance Puzzles: Queuing Models to the Rescue. Wiley-IEEE Press, 1st edition, 2012 (http://tinyurl.com/7hbalv5)

2. VMware. “Timekeeping in VMware Virtual Machines” http://www.vmware.com/files/pdf/Timekeeping-In-VirtualMachines.pdf

3. Bernd Harzog. “White Paper: Application Performance Management for Virtualized and Cloud based Environments” http://www.virtualizationpractice.com/blog/wp-content/plugins/downloads-manager/upload/APM_for_Virtualized_and_Cloud_Hosted_Applications.pdf

4. Adrian Cockcroft. “Utilization is Virtually Useless as a Metric!” http://www.hpts.ws/papers/2007/Cockcroft_CMG06-utilization.pdf