https://ntrs.nasa.gov/search.jsp?R=20100012870 2018-05-30T14:50:06+00:00Z

The Application Of Modeling And Simulation In Capacity Management Within The ITIL Framework

Sonya Rahmani; Otto von der Hoff
SRA International, Inc.

sonya [email protected]. otto von der [email protected]

Abstract. Tightly integrating modeling and simulation (M&S) techniques into Information Technology Infrastructure Library (ITIL) practices can be one of the driving factors behind a successful and cost-effective capacity management effort for any Information Technology (IT) system.

ITIL is a best practices framework for managing IT infrastructure, development and operations. Translating ITIL theory into operational reality can be a challenge. This paper aims to highlight how best to integrate modeling and simulation into an ITIL implementation.

For cases where the project team initially has difficulty gaining consensus on investing in modeling and simulation resources, a clear definition of how M&S is implemented within the ITIL framework, specifically its role in supporting Capacity Management, is critical to gaining the support required to secure these resources. This implementation should also help to clearly define how M&S supports the overall system mission.

This paper describes the development of an integrated modeling approach and how best to tie M&S to definitive goals for evaluating system capacity and performance requirements. Specifically, the paper discusses best practices for implementing modeling and simulation within ITIL. These practices hinge on implementing integrated M&S methods that 1) encompass two or more predictive modeling techniques, 2) complement each technique's respective strengths and weaknesses to support the validation of predicted results, and 3) are tied to the system's performance and workload monitoring efforts. The structuring of two forms of modeling, statistical and simulation, in the development of "As Is" and "To Be" efforts is used to exemplify the integrated M&S methods. The paper shows how these methods can better support the project's overall capacity management efforts.

1. Introduction

ITIL is a best practices framework and set of guidelines that define an integrated, process-based approach for managing information technology services. Translating ITIL theory into operational reality can be a challenge. Methods of implementation and best practices using ITIL principles are out of scope for this paper. Rather, this discussion aims to highlight how best to integrate modeling and simulation into ITIL implementations.

A clear definition of how M&S is implemented within the ITIL framework, especially its role in supporting Capacity Management, is critical to gaining customer and stakeholder buy-in. In the case example discussed later in this paper, the team had difficulty gaining consensus on investing in modeling and simulation resources. The benefits of modeling and simulation to the project's overall mission were unclear, and as a result modeling resources were under-allocated.

However, once M&S was tied directly to the system's Capacity Management activities as part of ITIL, the M&S efforts gained traction. Lessons learned from this case example have been used in developing this paper's thesis.

A successful implementation of M&S within ITIL has the following characteristics: 1) it uses two or more predictive modeling techniques, 2) the methods complement each other's respective strengths and weaknesses to support the validation of predicted results, and 3) the techniques are tied to the system's performance and workload monitoring efforts.

2. ITIL BACKGROUND

ITIL encompasses a set of concepts and policies for managing information technology infrastructure, development and operations. ITIL consists of the following five disciplines (illustrated in Figure 1):

• Service Strategy
• Service Design
• Service Transition
• Service Operation
• Continual Service Improvement

[Figure 1 diagram: ITIL processes grouped under Service Strategies, Service Design, Service Transition, Service Operations and Continual Service Improvement; Capacity Management appears under Service Design.]

Figure 1: M&S Integration into ITIL Framework

2.1 M&S and the ITIL Framework

The scope of Service Design includes the design of new services, as well as changes and improvements to existing ones. Service Design consists of several areas; however, for purposes of this discussion, the focus will be on the Capacity Management area.

2.2 Implementing M&S Using ITIL Framework

Capacity Management is the discipline that ensures IT infrastructure is provided at the right time, in the right volume and at the right price, and is used in the most efficient manner. The real success lies in implementing an integrated M&S approach that 1) encompasses two or more predictive modeling techniques, 2) complements each technique's respective strengths and weaknesses to support the validation of predicted results, and 3) is tied to the system's performance and workload monitoring efforts.

For system development and deployment projects that are still in early operational stages, additional model validation challenges may arise from the lack of a scalable Performance Test environment or a full system monitoring solution, which limits access to actual performance data. Using at least two types of modeling techniques can help to overcome this early validation challenge: confidence in model results rises where disparate modeling techniques reach general agreement. In addition, the combination of M&S methods can successfully deliver capacity forecasting flexibility for both large and small scale projects.

1 ITIL Open Guide. March 2, 2009. <http://www.itlibrary.org>

Projects with the following characteristics will benefit most from an M&S implementation tied to ITIL principles:

• Clear-cut performance analysis goals
• Strict Service Level Agreements (SLAs) or Operational Level Agreements (OLAs)
• Enterprise class applications
• Volumes experiencing significant growth
• Time-based mission critical or real-time systems
• Lack of a full-scale performance test environment (need for alternative system evaluation techniques)
• Cost sensitive capacity requirements
• Long lead-time resource acquisition

The M&S implementation should be driven by definitive goals for evaluating system capacity and behavior given clearly stated performance requirements. The M&S implementation team likewise needs to be equipped with performance analysis and engineering expertise together with target system subject matter knowledge. Furthermore, the project's ITIL framework should be tailored to tie M&S to the following ITIL activities: Monitoring, Demand Management, Performance Tuning and Application Sizing.

3. A CASE STUDY

A case study on a federal IT system is used below as an example to illustrate M&S implementation in ITIL's Capacity Management processes. The federal system contains over 100 million records and processes close to 50 million requests annually. In addition, the system meets the project characteristics listed in Section 2.2 above.

All these factors underscored the need for a robust and flexible capacity management program. As a result, a formal Capacity Management Process was created using the ITIL framework. The ITIL framework was tailored to support the federal system's overall Service Delivery and Service Support functions. In creating the Capacity Management processes, the project implemented modeling and simulation activities as a set of integrated activities. Figure 2 illustrates the M&S relationship central to Capacity Management within the program's ITIL process framework:


Figure 2: M&S Central Relation to Capacity Management within Enterprise ITIL Framework

As part of this implementation, M&S activities were joined to several ITIL activities (as described below):

• Monitoring - system performance data (e.g., resource utilization metrics, response times, throughput, etc.) and workload monitoring data (e.g., arrival patterns, transaction volume, etc.) were collected and analyzed from both the Production and Test environments. M&S uses these data to build and update the models.

• Demand Management - M&S applies stochastic abstractions and transaction volume models to workload impact analyses.

• Performance Tuning - M&S supports project efforts to identify steps required to handle current and/or new workloads to optimize system performance or operational policy.

• Application Sizing - M&S supports identification of resources needed for a new system application or a change to an existing application. For example, model results provide input into hardware acquisitions required for new system deployments.
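As a rough illustration of how a model result can feed Application Sizing, the sketch below applies the Utilization Law (utilization = throughput × service demand / number of servers) to estimate a CPU count. The function names, the 70% utilization target and the example workload figures are hypothetical and are not taken from the case study.

```python
import math

def cpus_needed(peak_requests_per_sec, cpu_seconds_per_request, target_utilization=0.7):
    """Smallest CPU count that keeps average utilization at or below the target.

    Utilization Law: U = X * D / m, where X is throughput (requests/s),
    D is the CPU service demand per request (s) and m is the number of CPUs.
    The 0.7 default target is an illustrative assumption.
    """
    if not 0 < target_utilization <= 1:
        raise ValueError("target_utilization must be in (0, 1]")
    offered_load = peak_requests_per_sec * cpu_seconds_per_request  # busy CPU-seconds per second
    return max(1, math.ceil(offered_load / target_utilization))

def utilization(requests_per_sec, cpu_seconds_per_request, cpus):
    """Average per-CPU utilization predicted by the Utilization Law."""
    return requests_per_sec * cpu_seconds_per_request / cpus

# Hypothetical forecast: 40 requests/s at peak, 0.25 CPU-seconds per request.
cpus = cpus_needed(40, 0.25)
print(cpus, round(utilization(40, 0.25, cpus), 3))
```

A steady-state estimate like this ignores queuing delay and burstiness, which is one reason a project might pair it with simulation before committing to a hardware purchase.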

One of the most significant factors that contributed to the success of the program's Capacity Management Process was the tightly integrated M&S implementation within the project's overall ITIL framework.

The ITIL framework references four modeling techniques: Trends Analysis, Analytical Modeling, Simulation Models and Baseline Models. This paper classifies both Trends Analysis and Analytical Modeling as forms of statistical techniques. In addition, Baseline Models are treated in the context of a simulation model and defined as a "benchmark" of the current ("As Is") system performance.

This case example illustrates that it was the combination of statistical and simulation modeling techniques that directly supported making the program's Capacity Management Process a success.

3.1 M&S Techniques in Case Example

A combination of statistical and simulation model techniques was used to quantify performance, estimate capacity, provide subject matter input, and afford validation to the overall modeling activities. Statistical techniques included:

• Trending using ARIMA (Auto-Regressive Integrated Moving Average) models for time series data - these methods were used to support characterization of existing system workloads and forecasting of future growth patterns based on historical volumes.

• Analytical model development efforts - these were used for several different needs, including deriving mathematical expressions of system workloads to characterize workload arrival patterns and critical resource capacity models. In addition, historical transaction data were analyzed to identify key performance factors and develop reusable statistical descriptions of the system's behavior.

Figure 3 illustrates typical transaction workload regression trending models for two classes of system transactions. The blue line depicts historical data whereas the red line represents the regression predictions. The ARIMA modeling techniques suitably capture the temporal characteristics of workload seasonality as well as year over year background growth where present.

[Figure 3 chart, panel title as extracted: "Class A Transaction Arrival History and ARIMA"; plot data not recoverable.]

Figure 3: Transaction Arrival Trending Models
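The two properties the trending models are said to capture, seasonality and year over year growth, can be illustrated without a full ARIMA fit. The sketch below is a deliberately simpler stand-in, not the ARIMA procedure used in the case study: it estimates growth from season-over-season differences and then a level for each position in the season. The function names and the synthetic quarterly history are hypothetical.

```python
from statistics import mean

def fit_seasonal_trend(volumes, period=12):
    """Fit volume[t] ~ slope * t + level[t % period].

    Differences one full season apart cancel the seasonal pattern,
    so their mean isolates the background growth per time step.
    """
    if len(volumes) < 2 * period:
        raise ValueError("need at least two full seasons of history")
    season_over_season = [volumes[t + period] - volumes[t]
                          for t in range(len(volumes) - period)]
    slope = mean(season_over_season) / period
    detrended = [v - slope * t for t, v in enumerate(volumes)]
    # Average the detrended values at each seasonal position.
    levels = [mean(detrended[k::period]) for k in range(period)]
    return slope, levels

def forecast(slope, levels, start, steps):
    """Project the fitted trend and seasonal levels over future time steps."""
    period = len(levels)
    return [slope * t + levels[t % period] for t in range(start, start + steps)]

# Synthetic history: three "years" of quarterly volumes with steady growth.
pattern = [10, -5, 0, 20]
history = [100 + 2 * t + pattern[t % 4] for t in range(12)]
slope, levels = fit_seasonal_trend(history, period=4)
print(forecast(slope, levels, start=12, steps=4))
```

Unlike ARIMA, this sketch has no autoregressive or moving-average terms, so it cannot model correlated noise; it only shows how seasonal differencing separates growth from the seasonal pattern.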

Statistical modeling strengths include relatively simple methods, shorter turnaround times to get answers, and a need for less detailed input data. Weaknesses include a higher risk of inaccuracy when predicting response times and throughput, loss of predictive accuracy where future behavioral patterns vary substantially from historical patterns, and an inability to deal with queuing and resource contention analysis.

Simulation modeling is used to gain more accurate predictive results for response time, throughput and resource consumption. The simulation modeling techniques included:

• "As Is" simulation model development efforts that craft simulation models of the existing systems and validate them against performance in the production environment (the baseline model, which "benchmarks" the current system).

• "To Be" simulation models that leverage the "As Is" models to develop the anticipated views (i.e., future operating conditions).

Simulation modeling strengths include more accurate projections of system throughput and response times in support of hardware acquisition estimates and architecture validation efforts, and the ability to predict and analyze dynamic queuing properties and resource contention conditions. Simulation modeling weaknesses can include a longer turnaround time and a need for large volumes of detailed output performance data. Valid use of the simulation model results depends on the accuracy of the performance data used to develop the models.

However, where used in collaboration, the two differing modeling techniques can be combined to support a broader set of performance analysis needs and introduce flexibility in satisfying the project's capacity management objectives.
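To make the division of labor concrete, the following minimal sketch simulates a single-server, first-in-first-out queue. The statistical side supplies arrival and service time inputs, and the simulation produces the waiting and response times that a purely statistical model cannot. The single-server structure, the exponential inter-arrival times and the gamma service times are assumptions for illustration; the paper does not describe the internals of its simulation models.

```python
import random

def simulate_fifo_queue(arrival_times, service_times):
    """Single-server FIFO queue: return per-request (waits, response_times)."""
    waits, responses = [], []
    server_free_at = 0.0
    for arrived, service in zip(arrival_times, service_times):
        start = max(arrived, server_free_at)  # wait if the server is still busy
        finish = start + service
        waits.append(start - arrived)
        responses.append(finish - arrived)
        server_free_at = finish
    return waits, responses

def synthetic_workload(n, mean_interarrival, shape, scale, seed=0):
    """Hypothetical inputs: exponential inter-arrival gaps, gamma service times."""
    rng = random.Random(seed)
    clock, arrivals, services = 0.0, [], []
    for _ in range(n):
        clock += rng.expovariate(1.0 / mean_interarrival)
        arrivals.append(clock)
        services.append(rng.gammavariate(shape, scale))
    return arrivals, services

arrivals, services = synthetic_workload(1000, mean_interarrival=10.0, shape=19.0, scale=0.4)
waits, responses = simulate_fifo_queue(arrivals, services)
print(max(waits), sum(responses) / len(responses))
```

Feeding the same arrival stream with a higher volume shows where queues start to build, which is the kind of contention effect the text attributes to simulation rather than to statistical models.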

3.2 Developing the "As Is" Models

Early on, one of the biggest challenges was the lack of production monitoring on the legacy system components. The project had an urgent need for precise simulation model results; however, most of the legacy system lacked any performance monitoring tools that would correlate workload to resource consumption (e.g., CPU, disk reads/writes, etc.). As a result, the integrated M&S methods were tailored to tackle these challenges by modeling parts of the system as a "black box" and using a combination of statistical and simulation techniques.

The statistical analysis encompassed evaluating historical performance data (such as response time and throughput) to characterize statistical latency distributions under no queuing conditions.

These techniques were used to compensate for the lack of instrumented performance data on specific pieces of the system. For these components, historical response data were analyzed to identify periods when there was little or no queuing in the system. During these periods, the start and finish times of each transaction were collected and used to create a histogram of the resulting service times. The histogram data were used to build a best fit curve characterized as a probability distribution. Thereafter, the team used the distribution to represent the system service time in the simulation model. Figures 4 and 5 below illustrate the histogram of response times under no queuing conditions for Production and the Simulation Model.
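The fitting step described above can be sketched as follows. The paper does not name the distribution family or the fitting method the team used, so the gamma distribution and the method of moments here are assumptions chosen for simplicity, and the sample service times are made up.

```python
import random
from statistics import mean, variance

def fit_gamma_by_moments(service_times):
    """Match a gamma distribution to the sample mean and variance.

    For a gamma with shape k and scale theta: mean = k * theta and
    variance = k * theta**2, so k = mean**2 / var and theta = var / mean.
    """
    m = mean(service_times)
    v = variance(service_times)
    return m * m / v, v / m

def sample_service_times(shape, scale, n, seed=0):
    """Draw service times for use as simulation input."""
    rng = random.Random(seed)
    # random.gammavariate takes (shape, scale) as (alpha, beta).
    return [rng.gammavariate(shape, scale) for _ in range(n)]

# Hypothetical no-queue service times in seconds (finish time minus start time).
observed = [6.0, 7.0, 8.0, 9.0, 10.0]
shape, scale = fit_gamma_by_moments(observed)
simulated = sample_service_times(shape, scale, n=20000)
print(shape, scale, mean(simulated))
```

Comparing the mean, median, mode and standard deviation of the simulated draws against the observed data is then a quick check on the fit before the distribution is used in a model.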

[Figure 4 chart: Histogram of Component A Response Time - No Queue (Production Environment); x-axis: Response time (seconds).]

Figure 4: Histogram of Component A Response Time - No Queue (Production)

[Figure 5 chart: Histogram of Component A Response Time - No Queue (Simulation Model); x-axis: Response Time (seconds).]

Figure 5: Histogram of Component A Response Time - No Queue (Simulation Model)

Although the simulation model was not able to entirely capture the long tail observed in Production data, the associated statistical data demonstrated that there was little difference in overall response time between the simulation and production data results (see Tables 1 and 2 below).

Table 1: Production Statistics

Mean 7.87
Median 8
Mode 7
Standard Deviation 1.79

Table 2: Simulation Model Statistics

Mean 7.72
Median 8
Mode 7
Standard Deviation 1.65

After validating service times, the queuing behavior was analyzed using a time period starting with an empty queue that gradually built over time. The service request arrival times were also assessed for that period. The simulation model was run with the statistically derived service and arrival time models. Figure 6 depicts the validated simulation results:

[Figure 6 chart: System Component A Queue Size (Simulation Model vs. Production Environment); x-axis: Time (hour:min); series: Model Simulation, Production; annotation: "Queue begins to build in Model".]

Figure 6: Simulation Model vs. Production Environment Queue Validation

The team compared the model's simulation results to production data in order to validate the model against the true system performance. In this manner, the team was able to leverage two different modeling techniques to successfully build the "As Is" simulation model. The statistical analysis facilitated service time characterization in a manner that could then be applied in the simulation models. This would not have been possible without these statistical models due to the lack of production performance data. In addition, if we had used statistical techniques in isolation, we would not have been able to vary response time and correlate this to queuing behavior over the course of a day.

Simulation models were subsequently updated once production monitoring tools had been deployed. Collected performance data were evaluated using analytical techniques to associate resource consumption with the workload executed (viz. CPU, database reads/writes, etc.). The simulation model was validated under full workload conditions by comparing results (response time, throughput, and CPU consumption) to the production environment. Production changes (e.g., new code deployed, architecture or platform changes, etc.) could then be quickly rendered in the simulation environment by leveraging monitored data against the validated baseline "As Is" model.

Implementing two different modeling techniques therefore proved critical to performing capacity management early in the system development lifecycle when performance data were not yet available. Model accuracy was in turn improved after production data became available.

3.3 Leveraging "As Is" to Forecast Impact of New Workloads

The program's Capacity Management forecasting responsibilities include regular engagement with the system stakeholders to identify workload changes that may impact the IT system's performance and computational resource needs.

A recent workload addition of several million records exemplifies the important role M&S played in the Capacity Management process. The M&S team worked closely with the Demand Management office to characterize the new workload's yearly demand based on the historical behavior of similar service request types. The team used statistical regression models to predict future seasonal arrival patterns and adapted existing workload distributions into daily workload arrival patterns for the new transactions. Finally, the "As Is" model was simulated with the new workloads. The team provided analysis on expected response times, throughput, and resource utilization, plus the impacts anticipated on existing workloads.

Figure 7 below illustrates an example of forecasted resource utilization data.

[Figure 7 chart: Server A CPU Utilization; x-axis: Hour; series: One Hour Average, 24 Hour Average.]

Figure 7: Simulation Model - Forecasted Server A Utilization

Figure 8 below illustrates an example of forecasted system response times. Adherence to SLA response times was of critical importance to the customer and program.

[Figure 8 chart: Forecasted System Response Time; x-axis: Hour; series: System Response Time, SLA Service Target.]

Figure 8: Simulation Model - Forecasted System Response Times

3.4 Leveraging "As Is" to Develop "To Be" Models

Recently, the government system went through a massive modernization effort that upgraded both its hardware and software components. The customer expressed several concerns about how this would impact operations and, most specifically, SLA adherence. An M&S Tiger Team was therefore tasked to develop simulation models that would help forecast the computational resources required to deliver needed capacity and to justify capital equipment acquisitions. Of additional concern were possible impacts to the front-end business processes and wide area network performance.

The M&S Tiger Team's objective was to develop an end-to-end analysis solution that would provide an impact analysis on all three aspects of the business. On the back-end system, specific questions were raised on identifying impacts to resource consumption and response times. For the latter, the back-end "To Be" system model was built leveraging the "As Is" simulation model described in Section 3.2 above. The resulting analysis assembled a comprehensive picture of the new system deployment impacts.

Performance analysis helped to proactively identify specific impacts and areas for operational improvement to ensure a smooth transition during system modernization. This was one of the most successful initiatives on the project, demonstrating the critical insight that can be gleaned from using a combination of modeling techniques.

4. CONCLUSION

In conclusion, the development of an integrated modeling approach can significantly impact the success of the project's overall capacity management efforts. The M&S implementation should encompass two or more predictive modeling techniques, complement each technique's respective strengths and weaknesses to support the validation of predicted results, and be tied directly to system performance and workload monitoring efforts.

The implementation should include evaluation of the "As Is" system as well as forecasting techniques. The models developed in support of the latter's analysis should provide estimates for response times, throughput, and resource utilization for the "To Be" system. Furthermore, models should be designed to guide the project's hardware acquisition and architecture validation efforts. From the beginning, the ITIL framework should be tailored to implement M&S within Capacity Management processes and relate it to the following activities: Monitoring, Demand Management, Performance Tuning and Application Sizing.

Following these high level guidelines will establish and promote a successful Capacity Management Program for a broad array of enterprise IT application systems.