ÉCOLE DE TECHNOLOGIE SUPÉRIEURE UNIVERSITÉ DU QUÉBEC
LITERATURE REVIEW PRESENTED TO ÉCOLE DE TECHNOLOGIE SUPÉRIEURE
IN PARTIAL FULFILLMENT OF THE REQUIREMENTS FOR THE DEGREE OF DOCTOR OF PHILOSOPHY
BY ANDERSON RAVANELLO
MODELING END USER PERFORMANCE PERSPECTIVE FOR CLOUD COMPUTING SYSTEMS USING DATA CENTER LOGS ON BIG DATA TECHNOLOGY: A
LITERATURE REVIEW
MONTREAL,
This Creative Commons licence allows readers to download this work and share it with others as long as the author is credited. The content of this work can’t be modified in any way or used commercially.
MODELING END USER PERFORMANCE PERSPECTIVE FOR CLOUD
COMPUTING SYSTEMS USING DATA CENTER LOGS ON BIG DATA
TECHNOLOGY: A LITERATURE REVIEW
Anderson RAVANELLO
ABSTRACT
End user performance management for information systems is a complex task that involves engineering and business perspectives that are not always aligned. The software engineering perspective approaches performance management with the support of the ISO/IEC 25000 family of standards, whereas the business perspective lacks such formal standardization. The business perspective, on the other hand, encompasses socio-anthropological factors such as user training, familiarity with technology, fitness to task and adherence to technology. All these components are compounded by the recent adoption, by companies, of cloud computing technology, which is architecturally more complicated, containing more, and more inter-related, components. This research aims to design a measurement model that, employing a performance management framework, is able to measure the end user performance perception of a system operating on cloud computing technology. This objective is to be achieved via the analysis of data center logs with the utilization of Big Data technology.
INTRODUCTION
Performance1 measurement of information systems is a challenging research topic.
Measuring the quality of information systems has long been a concern for organizations,
and measuring the quality of software, systems and Information Technology (IT)
services is challenging (Juran J, 2010). This is caused in part by the immaturity of
IT as a science, and in part by the fact that organizations are seldom able to keep up
with the rapidly evolving field (HP, 2013).
Software measurement can be conducted from different perspectives: the internal
perspective measures how well built and maintainable the system is; the
external perspective is interested in how well the system fits its technological
environment; and quality in use concerns the actual utilization of the system by users in
achieving their particular business goals (ISO/IEC, 2005). These perspectives are documented
in the ISO/IEC 25000 family of standards.
There is a difference between the software engineering perspective on software performance
measurement and the organizations’ – or business – perspective on software and IT
performance management. The software engineering perspective leans toward software
quality, using the internal, external and quality in use concepts: software created with
high internal quality has a better potential for offering high external quality, as long as it is
well integrated into its operating environment. If this is achieved, it then has a better potential
for achieving high quality in use. It also follows that if the users are well trained and
comfortable with its use, the potential for a quality system will be high (Stavrinoudis, 2008).
We can see that, for high quality to be achieved, a number of factors must be controlled and
measured to ensure success.
1 Performance is the ability to complete a given task, measured against preset known standards of accuracy, completeness, cost, and speed.
Alternatively, the business perspective considers software to be a part of the service the
organization renders to its customers; it is either useful or not to the organization in fulfilling its goals
(Bundschuh, 2008). This perspective, developed by businesses, is focused mainly on the end
result of the use of software, where end-user satisfaction with its performance is the
main criterion that organizations use to state whether a software product is useful or not (Glass,
1998).
Software systems performance measurement is currently conducted in many forms. One
popular approach is to use the logs readily available in different operating systems,
applications, computers and IT infrastructure components. Logs are binary files that collect
data from different components in a system and store this data in a file or database for
later use. Many commercial and easily accessible tools are available for collecting,
analyzing and generating performance dashboards that present technical measures of
the different system components used by a software application (Microsoft, Microsoft Perfmon).
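As an illustration of how such log data might be consumed, the following sketch averages each counter column of a CSV export. The layout and counter names are hypothetical, loosely modeled on a Perfmon-style export, and are not taken from any specific tool’s format:

```python
import csv
import io
from statistics import mean

def summarize_counters(csv_text):
    """Average each numeric counter column of a Perfmon-style CSV export."""
    reader = csv.DictReader(io.StringIO(csv_text))
    samples = {}
    for row in reader:
        for name, value in row.items():
            if name == "Timestamp":
                continue  # keep only numeric counter columns
            try:
                numeric = float(value)
            except ValueError:
                continue  # skip cells that do not parse as numbers
            samples.setdefault(name, []).append(numeric)
    return {name: mean(values) for name, values in samples.items()}

# Two hypothetical 15-second samples of two counters.
log = (
    "Timestamp,% Processor Time,Avg. Disk Queue Length\n"
    "2013-10-28 10:00:00,12.5,0.25\n"
    "2013-10-28 10:00:15,37.5,0.75\n"
)
print(summarize_counters(log))  # {'% Processor Time': 25.0, 'Avg. Disk Queue Length': 0.5}
```

A real collection pipeline would read from files or a database rather than an in-memory string, but the aggregation step is the same.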
In a cloud computing environment, after an end user accepts the task to be performed and
engages in its execution, which components between the user interface and the data
repositories and processors should be gauged in order to actually measure the end user
performance perspective? And how can this goal be achieved with the least possible
customization, which in the case of this specific research means analyzing only data center logs?
Performance measurement frameworks for cloud computing applications (CCA) are still in
early research stages (Bautista, 2012). Adoption of cloud computing technology by the
industry is also in its early stages (Phaphoom, 2013) (US General Services Administration,
2010). The study of cloud computing management has the potential for innovative research,
particularly with the utilization of recent large volume data processing technologies such as
Big Data. (Lin, 2010)
Understanding this, the motivation for this research is defined by the opportunity of
designing a performance measurement model for cloud computing applications, with the
utilization of Big Data technology. The model will try to include the software application,
infrastructure, network and end user perspectives of performance. To achieve this goal, it will
be necessary to identify performance measures, issued from the data center
logs, required to experiment with the subject of ‘end user performance measurement for cloud
computing applications’.
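Big Data stacks such as Hadoop process large log volumes through a map and reduce pattern. The following is a minimal in-process sketch of that pattern; the log line format and the host and counter names are invented for illustration:

```python
from collections import defaultdict

# Hypothetical log lines: "<host> <counter> <value>"
log_lines = [
    "web01 cpu 40",
    "web01 cpu 60",
    "web02 cpu 20",
]

def map_phase(lines):
    # Emit (key, value) pairs, as a Hadoop Streaming mapper would per line.
    for line in lines:
        host, counter, value = line.split()
        yield (host, counter), float(value)

def reduce_phase(pairs):
    # Group by key and reduce each group to an average.
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return {key: sum(vals) / len(vals) for key, vals in groups.items()}

print(reduce_phase(map_phase(log_lines)))
```

In an actual Hadoop deployment the two phases run as separate distributed jobs over files in HDFS; this sketch only shows the shape of the computation.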
1.2 Problem definition
Measuring end user performance is a complex task. First, internal software performance,
measured through a series of quantitative measures, must be correctly designed and validated
in the most efficient way possible. Second, these measures must be applied and yield
results within the specified decision time. Third, the measurement result should be exploited and
interpreted by some form of intelligent mechanism, machine or human, in
order to infer significance from the measurement. Finally, end user performance is a mix of the
resources available for performing a set of tasks, user motivation and engagement (Hutchins,
1985) (Davis S. &., 2001), and factors such as training (Marshall, 2008), perceived
usefulness and ease of use (Davis, 1989), and support, anxiety and experience towards
technology (Fagan, 2004) that influence the user’s ability to actually perform the task given
the resources available; it is the translation of quantitative metrics into qualitative
measurements.
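As a purely illustrative sketch of such a translation, several normalized quantitative measures can be folded into a single user-facing index. The measure names, weights and bounds below are hypothetical assumptions, not part of the proposed model:

```python
def normalize(value, worst, best):
    """Map a raw measure onto [0, 1], where 1 represents the best behavior."""
    score = (value - worst) / (best - worst)
    return max(0.0, min(1.0, score))

def composite_index(scores, weights):
    """Weighted mean of normalized scores; weights are assumed to sum to 1."""
    return sum(weights[name] * score for name, score in scores.items())

# Hypothetical readings: response time improves downward, so best < worst.
scores = {
    "response_time_s": normalize(2.0, worst=10.0, best=0.5),
    "task_completion": normalize(0.9, worst=0.0, best=1.0),
}
weights = {"response_time_s": 0.5, "task_completion": 0.5}
print(round(composite_index(scores, weights), 3))  # 0.871
```

The choice of bounds and weights is exactly where qualitative judgment enters: they encode what "good" means to the end user rather than to the infrastructure.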
Considering such a challenging scenario, the problem definition of this research can be
summarized as: the proposition of a performance measurement model that reflects the end
user experience of an application operating in a cloud computing environment. This
model will use data currently available from data center logs and, because of their large size,
will require the use of Big Data technology for their capture and experimentation.
1.3 Research question
Given the ample opportunities for discovery in the field of software performance
measurement from an end user perspective using cloud computing technology, this
research focuses on the proposition of a performance measurement model, considering
two main perspectives: 1) whether it is possible to measure and analyze the performance of an
application operating on the cloud, from an end user perspective, using only data
center log data; and 2) how to integrate the many internal quality measures reported
in these logs into a measurement model that reflects the performance as
perceived by end users. Finally, this measurement model would be explored and
validated using different case studies.
Based on these perspectives, the main research question can be formulated as: How
can end user performance of an application be measured in a cloud computing
environment?
This question is broken down into the following specific research questions:
- What defines a cloud computing environment?
- What influences end user performance perspective measurement in a cloud
computing environment?
- Which performance measures, found in existing data center logs, best relate to the
end user experience of a specific application?
- Which performance measurement framework can be used for the creation of a
performance model that represents the end user performance of an application that
uses cloud computing technology?
1.4 Methodology
In order to answer the research questions outlined in section 1.3, the author resorts to the
Basili framework (Basili, Selby, & Hutchens, 1986), used for experimentation in software
engineering, to plan the research and organize its four main research phases: 1)
definition, 2) planning, 3) development and operation, and 4) interpretation, as presented in
the following sections, 1.4.1 to 1.4.4.
1.4.1 Definition of the research
This phase, presented in table 1.1, identifies the research motivation, objective, goal and
users.
Table 1.1 - Research Definition

Motivation: The design of a performance measurement model that reflects the end user experience of an application operating on a cloud computing environment, using only data that is currently available from the data center logs.
Objectives:
- Define/clarify the notion of end user performance perspective
- Define/clarify the cloud computing technology
- Design a measurement model and its toolset to support the infrastructure specialist in proactively managing the cloud infrastructure to identify performance issues from the end user’s perspective
- Identify the data center log direct measures that best reflect the end user perspective of an application operating in a cloud
Goal: Design a performance measurement model and its toolset that is generalizable and is still capable of representing the end user experience of an application operating in a cloud.
Users: Students, researchers, IT professionals and managers.
The next phase, planning, helps determine the research problem as well as the specific
research activities that have to be completed in order to answer the research questions.
1.4.2 Planning
The planning phase contains the description of any deliverables that are required for solving
the problem and answering the research questions. For supporting these findings, the required
literature reviews are also described. Table 1.2 contains the appropriate inputs and outputs of
each.
Table 1.2 – Research planning
Planning step: State of the art of the concept of user perception of application software performance
Inputs: Literature review of:
- Software engineering performance
- End user expectation and perception of application performance
- End user performance perception, and other psychosocial entities that affect end user performance perception
Outputs:
- Literature review of the state of the art of end-user performance standards, models, techniques and methods
- State of the art of the end user performance perspective for cloud based computing systems

Planning step: State of the art of cloud computing and Big Data technology for data center log processing
Inputs: Literature review of:
- Cloud computing technology, components, types and utilization
- Existing data center log data analysis
- Hadoop and HBase project documentation
- REAP project data
Outputs:
- Literature review of existing data center log uses and techniques for their analysis, open source Big Data technology, and corroboration of the cloud computing syllabus by matching components with the experiment’s infrastructure
- First publication: paper on how to measure performance as perceived by end users of cloud applications
1.4.3 Development of theory and experimentation
The development phase of the research presents the activities where new knowledge and
theories are created, the definition of and preparation for the experimentation and validations, as
well as the main components that foster the answer to the main research question, as
presented in table 1.3.
Table 1.3 - Research Development and Operation

Development: Design of a measurement model and method specific to the research’s objective
- Identification of individual log measures that best reflect the end-user perspective
- Design of the measurement method associated with these measures
Validation:
- Fitness to the framework
- Applicability and comparison with commercial performance measurement tools

Development: Identification of an experimentation and validation approach for the model
- Identification of analysis techniques and feedback mechanisms best suited to an end-user perspective
Validation:
- Fitness to the framework
- Applicability and comparison with similar approaches according to the literature review
- Definition of the scope of the experiments

Development: Application of the performance measurement method rules (framework) over different laboratory test scenarios
Validation:
- Validation of the measurement method application
- Secondary performance measurement data
- Validation of the methodology
Analysis:
- Discovery of structural equations that could explain performance in the studied environment
- Application of big data exploration techniques on the collected data (paper on the application of big data over IT performance indexes)

Development: Results verification and evaluation
Validation:
- Measurement result analysis
- Adequacy and overall fitness of data to the frameworks
- Results of the end user performance perception instruments
Analysis:
- Validation of the framework
- Addition of the end user perspective to the framework
- Publication of the findings
1.4.4 Interpretation of the results
This section contains the information required for properly understanding the methods, use
cases and scenarios that are explored during the research, as well as providing grounds for
future research that could be conducted.
Table 1.4 – Interpretation of the results

Context: This research is conducted by evaluating the data contained in datacenter performance log files; these files might be generated by physical or virtual machines in a shared or dedicated cloud workspace. Application clusters are assigned according to the specific use cases tested, for example “all Outlook 2010 users”.
Extrapolation of results:
- Different case studies, for larger audiences
- Different sets of measurement variables
- Discovery of symbiotic applications in shared workspaces
- Machine learning approaches for dynamic work distribution based on end user performance measurement fluctuations
Future research:
- Can machine learning prevent degradation and resource misallocation?
- Is it possible to locate clusters of symbiotic applications (applications that consume different sets of resources, thus optimizing resource utilization)?
- Is it possible to locate clusters of symbiotic users?
- Can machine learning dynamically assign workloads according to symbiotic profiles?
CHAPTER 2
Literature review
This section presents the topics of end user performance management and cloud
computing. The first topic is approached from two perspectives: 1) the software engineering
perspective and 2) the business perspective. Software quality models have long been
Performance Efficiency
performance relative to the amount of resources used under stated conditions
• Time-behavior degree to which the response and processing times and throughput rates of a product or system, when performing its functions, meet requirements (benchmark)
• Resource utilization degree to which the amounts and types of resources used by a product or system when performing its functions meet requirements
• Capacity degree to which the maximum limits of a product or system parameter meet requirements
Functional Suitability
degree to which a product or system provides functions that meet stated and implied needs when used under specified conditions
• Functional completeness degree to which the set of functions covers all the specified tasks and user objectives
• Functional correctness degree to which a product or system provides the correct results with the needed degree of precision
• Functional appropriateness degree to which the functions facilitate the accomplishment of specified tasks and objectives. As an example : a user is only presented with the necessary steps to complete a task, excluding any unnecessary steps
Compatibility
degree to which a product, system or component can exchange information with other products, systems or components, and/or perform its required functions, while sharing the same hardware or software environment
• Coexistence degree to which a product can perform its required functions efficiently while sharing a common environment and resources with other products, without detrimental impact on any other product
• Interoperability degree to which two or more systems, products or components can exchange information and use the information that has been exchanged
Usability
degree to which a product or system can be used by specified users to achieve specified goals with effectiveness, efficiency and satisfaction in a specified context of use
• Appropriateness recognisability
degree to which users can recognize whether a product or system is appropriate for their needs. Appropriateness recognizability will depend on the ability to recognize the appropriateness of the product or system’s functions from initial impressions of the product or system and/or any associated documentation.
• Learnability degree to which a product or system can be used by specified users to achieve specified goals of learning to use the product or system with effectiveness, efficiency, freedom from risk and satisfaction in a specified context of use
• Operability degree to which a product or system has attributes that make it easy to operate and control
• User error protection degree to which a system protects users against making errors
• User interface aesthetics degree to which a user interface enables pleasing and satisfying interaction for the user
• Accessibility degree to which a product or system can be used by people with the widest range of characteristics and capabilities to achieve a specified goal in a specified context of use
Reliability
degree to which a system, product or component performs specified functions under specified conditions for a specified period of time
• Maturity degree to which a system meets needs for reliability under normal operation
• Availability degree to which a system, product or component is operational and accessible when required for use
• Fault Tolerance degree to which a system, product or component operates as intended despite the presence of hardware or software faults
• Recoverability degree to which, in the event of an interruption or a failure, a product or system can recover the data directly affected and re-establish the desired state of the system
Security
degree to which a product or system protects information and data so that persons or other products or systems have the degree of data access appropriate to their types and levels of authorization
• Confidentiality degree to which a product or system ensures that data are accessible only to those authorized to have access
• Integrity degree to which a system, product or component prevents unauthorized access to, or modification of, computer programs or data.
• Non-repudiation degree to which actions or events can be proven to have taken place, so that the events or actions cannot be repudiated later
• Accountability degree to which the actions of an entity can be traced uniquely to the entity
• Authenticity degree to which the identity of a subject or resource can be proved to be the one claimed
Maintainability
degree of effectiveness and efficiency with which a product or system can be modified by the intended maintainers
• Modularity degree to which a system or computer program is composed of discrete components such that a change to one component has minimal impact on other components
• Reusability degree to which an asset can be used in more than one system, or in building other assets
• Analyzability degree of effectiveness and efficiency with which it is possible to assess the impact on a product or system of an intended change to one or more of its parts, or to diagnose a product for deficiencies or causes of failures, or to identify parts to be modified
• Modifiability degree to which a product or system can be effectively and efficiently modified without introducing defects or degrading existing product quality
• Testability degree of effectiveness and efficiency with which test criteria can be established for a system, product or component and tests can be performed to determine whether those criteria have been met
Portability
degree of effectiveness and efficiency with which a system, product or component can be transferred from one hardware, software or other operational or usage environment to another
• Adaptability degree to which a product or system can effectively and efficiently be adapted for different or evolving hardware, software or other operational or usage environments
• Installability degree of effectiveness and efficiency with which a product or system can be successfully installed and/or uninstalled in a specified environment
• Replaceability degree to which a product can be replaced by another specified software product for the same purpose in the same environment
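The Time-behavior sub-characteristic above is assessed by comparing response and processing times against requirements. As a minimal illustrative sketch, where the benchmark value and the sample operation are both hypothetical, such a check could look like:

```python
import time

def meets_time_behavior(operation, benchmark_s, runs=5):
    """Time an operation several times and compare its mean against a benchmark."""
    elapsed = []
    for _ in range(runs):
        start = time.perf_counter()
        operation()
        elapsed.append(time.perf_counter() - start)
    mean_s = sum(elapsed) / len(elapsed)
    return mean_s, mean_s <= benchmark_s

# Hypothetical stand-in for a real system call; 0.1 s is an invented benchmark.
mean_s, ok = meets_time_behavior(lambda: sum(range(10_000)), benchmark_s=0.1)
print(ok)
```

In practice the benchmark would come from stated requirements, and the timed operation would be an end-to-end user transaction rather than a local computation.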
Annex 2 – (to be worked)
Characteristic Subdivision Primitive
Effectiveness Task completion Proportion of tasks completed correctly.
Effectiveness Task effectiveness Proportion of the goals of the tasks achieved.
Effectiveness Error frequency Frequency of user errors.
Efficiency Time efficiency Time for task completion against a desired target.
Efficiency Relative task time Relative time for task completion, comparing a regular user to an expert.
Efficiency Task efficiency Percentage of goals achieved per unit of time.
Efficiency Relative task efficiency Goals achieved per unit of time against a desired target.
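The efficiency and effectiveness primitives in this table reduce to simple ratios. As an illustrative sketch, with invented session figures, they can be computed directly:

```python
def task_effectiveness(goals_achieved, goals_total):
    """'Task effectiveness': proportion of the goals of the task achieved."""
    return goals_achieved / goals_total

def task_efficiency(goals_achieved, elapsed_s):
    """'Task efficiency': goals achieved per unit of time."""
    return goals_achieved / elapsed_s

def relative_task_time(user_time_s, expert_time_s):
    """'Relative task time': a regular user's completion time relative to an expert's."""
    return user_time_s / expert_time_s

# Hypothetical session: 9 of 10 goals reached in 180 s; an expert needs 120 s.
print(task_effectiveness(9, 10))     # 0.9
print(task_efficiency(9, 180))       # 0.05
print(relative_task_time(180, 120))  # 1.5
```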