MASTER'S THESIS CloudSimDisk Energy-Aware Storage Simulation in CloudSim Baptiste Louis 2015 Master of Science (120 credits) Computer Science and Engineering Luleå University of Technology Department of Computer Science, Electrical and Space Engineering
MASTER'S THESIS
CloudSimDisk: Energy-Aware Storage Simulation in CloudSim
Baptiste Louis, 2015
Master of Science (120 credits), Computer Science and Engineering
Luleå University of Technology, Department of Computer Science, Electrical and Space Engineering
Luleå University of Technology
Department of Computer Science, Electrical and Space Engineering
PERCCOM Master Program
Master's Thesis in
PERvasive Computing & COMmunications for sustainable development
Baptiste Louis
CLOUDSIMDISK: ENERGY-AWARE STORAGESIMULATION IN CLOUDSIM
2015
Supervisors: Professor Christer Åhlund - Luleå University of Technology
Doctor Karan Mitra - Luleå University of Technology
Doctor Saguna Saguna - Luleå University of Technology
Examiners: Assoc. Professor Karl Andersson - Luleå University of Technology
Professor Eric Rondeau - University of Lorraine
Professor Jari Porras - Lappeenranta University of Technology
This thesis is prepared as part of a European Erasmus Mundus program,
PERCCOM - Pervasive Computing & COMmunications for sustainable development.
This thesis has been accepted by partner institutions of the consortium (cf. UDL-DAJ, no. 1524, 2012 PERCCOM agreement).
Successful defense of this thesis is obligatory for graduation with the following national diplomas:
• Master in Complex Systems Engineering (University of Lorraine);
• Master of Science in Technology (Lappeenranta University of Technology);
• Degree of Master of Science (120 credits) - Major: Computer Science and
Engineering; Specialisation: Pervasive Computing and Communications for
Sustainable Development (Luleå University of Technology).
ABSTRACT
Luleå University of Technology
Department of Computer Science, Electrical and Space Engineering
PERCCOM Master Program
Baptiste Louis
CloudSimDisk: Energy-Aware Storage Simulation in CloudSim
Master's Thesis - 2015.
99 pages, 51 figures, 11 tables, and 4 appendices.
Keywords: Modelling and Simulation, Energy Awareness, CloudSim, Storage, Cloud
Computing.
The Cloud Computing paradigm is continually evolving, and with it, the size and the complexity of its infrastructure. Assessing the performance of a Cloud environment is an essential but strenuous task. Modeling and simulation tools have proved their usefulness and power in dealing with this issue. This master thesis work contributes to the development of the widely used cloud simulator CloudSim and proposes CloudSimDisk, a module for modeling and simulation of energy-aware storage in CloudSim. As a starting point, a review of Cloud simulators has been conducted and hard disk drive technology has been studied in detail. CloudSim has been identified as the most popular and sophisticated discrete event Cloud simulator. Thus, the CloudSimDisk module has been developed as an extension of CloudSim v3.0.3. The source code has been published for the research community. The simulation results proved to be in accordance with the analytic models, and the scalability of the module has been demonstrated for further development.
ACKNOWLEDGMENTS
I would like to express my gratitude to my supervisor Professor Christer Åhlund for the confidence that he has placed in me and for his continuous guidance during this research work. It is my honor to accomplish this master thesis under his supervision.
As well, I would like to thank Doctor Karan Mitra for his support and his valuable knowledge in terms of cloud computing and research work.
Thanks to Doctor Saguna for her advice and her daily dose of joviality.
Thanks to my PERCCOM classmates, especially Rohan Nanda and Khoi Ngo who were with me at Skellefteå.
Thanks to Karl Andersson and Robert Brännström for their presence, their accessibility and their assistance during my thesis work.
Thanks to Rodrigo Calheiros (Melbourne University) for his feedback on my implementation.
Thanks to Eric Rondeau, PERCCOM coordinator, and all the PERCCOM team
Each event stores a unique Tag number corresponding to a specific happening (example: VM_DATACENTER_EVENT, END_OF_SIMULATION) or indicating the type of action to be performed by the recipient (example: VM_CREATE, CLOUDLET_SUBMIT). Hence, the recipient of an event has to implement a method which handles this particular tag number. The execution can be a simple "print out" for the user or a more complex sequence of sub-methods which will eventually generate new events. Finally, an event has a data parameter of type Object used to carry a Cloudlet, datacenter characteristics, or any other information that needs to be transferred with the event.
2.3.4 Future Queue and Deferred Queue
Unlike GridSim [76], the core simulation of the CloudSim framework is a dynamic environment by reason of two event queues: the Future Queue and the Deferred Queue (see Figure 13). All events generated by entities during the runtime are added to the Future Queue and sorted by their time parameter (tX), the "time at which the event should be delivered to its destination entity [for execution]" [43]. As explained in 2.3.3, the execution of an event can result in the creation of new events, with similar or different "Time" parameters. In other words, event generation numbers (EventX) do not determine the order in the Future Queue. Afterwards, the "top of the queue" event in the Future Queue is moved to the Deferred Queue and will be processed at the next clock tick. If the next event in the Future Queue has the same time, it is moved as well, and so on.
Figure 13. Example of queue management during three clock ticks.
In the example diagrammed in Figure 13, Event1 is executed first and generates Event3. Then Event2, having the same Time parameter, is executed too and generates Event4 and Event5. No more events are in the Deferred Queue at this time. Afterwards, Event4, being at the "top of the queue" in the Future Queue, is moved to the Deferred Queue. A new event is in the Deferred Queue, so it is executed during the next tick and it generates Event7. No more events are in the Deferred Queue, so Event5, being at the "top of the queue" in the Future Queue, is moved to the Deferred Queue. Also, Event6 has the same time parameter, so it is moved to the Deferred Queue as well. The next step would be to execute Event5, then Event6, which will eventually generate new events.
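The two-queue mechanism described above can be sketched as follows. This is a simplified, self-contained illustration: the Event class and queue fields are stand-ins, not the actual CloudSim FutureQueue/DeferredQueue implementations.

```java
import java.util.ArrayDeque;
import java.util.PriorityQueue;
import java.util.Queue;

// Simplified sketch of CloudSim's two event queues.
class EventQueues {
    static class Event {
        final int id;
        final double time; // time at which the event should be delivered
        Event(int id, double time) { this.id = id; this.time = time; }
    }

    // Future Queue: ordered by delivery time, not by creation order.
    final PriorityQueue<Event> future =
            new PriorityQueue<>((a, b) -> Double.compare(a.time, b.time));
    // Deferred Queue: events waiting to be executed at the next clock tick.
    final Queue<Event> deferred = new ArrayDeque<>();

    // Move the "top of the queue" event, plus every event sharing the
    // same time parameter, from the Future Queue to the Deferred Queue.
    void tick() {
        if (future.isEmpty()) return;
        double t = future.peek().time;
        while (!future.isEmpty() && future.peek().time == t) {
            deferred.add(future.poll());
        }
    }
}
```

Because the Future Queue orders events by time, an event generated later (with a higher EventX number) can still be delivered before an earlier one, as in the Figure 13 example.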
2.4 Summary
In this chapter, the HDD has been identified as the main technology used in today's data centers, mainly due to its low cost and high areal density. From an energy perspective, cold storage has been the topic of most of the recent research related to energy-efficient storage in cloud environments.
Further, Cloud simulation has been identified as a cost-effective solution to perform experiments in a controllable, stable and repeatable way. Numerous Cloud simulators have been compared. As a consequence of its popularity, its availability and its extensibility, CloudSim has been the choice of this thesis work. The analysis revealed a lack in storage modeling that had not yet been overcome.
Last, a background on CloudSim operation has been presented to prepare, in the next chapter, the introduction of the CloudSimDisk module.
3 CLOUDSIMDISK: ENERGY-AWARE STORAGE
SIMULATION IN CLOUDSIM
This chapter presents CloudSimDisk, a module for energy-aware storage simulation in the CloudSim simulator. The first part explains the objectives of the module and the module requirements. Next, the main concepts of CloudSimDisk are described, such as the HDD model, the HDD power model, the data cloudlet and the data center persistent storage. Then, the execution flow and the package diagram are explained. At last, energy awareness and scalability of CloudSimDisk are discussed.
3.1 Module Requirements
CloudSimDisk has been developed according to different requirements, which provide several advantages to the module and explain the architectural design choices. The main requirement was to respect the architecture and the core processing of CloudSim. In fact, CloudSimDisk is a module for CloudSim, so it has to operate in the same way as CloudSim. Also, similar design choices reduce the learning curve for CloudSim users who want to adopt CloudSimDisk. Further, they encourage participation and contributions to the future development of the module.
Another important requirement is the scalability of the module. HDD technology is complex to model due to the electromechanical nature of the devices. Additionally, the technology is evolving rapidly. Hence, CloudSimDisk has to be developed with the idea that implementing more characteristics and features should be possible later. This capability is a positive argument for the adoption of CloudSimDisk.
An additional requirement was to consider first only the main parameters of the HDD technology, and to provide energy consumption results based on this simplified model. In fact, this work had to be achieved within strict deadlines, so some development decisions such as this one have been taken.
3.2 CloudSimDisk Module
This section introduces the CloudSimDisk module. At first, the Hard Disk Drive (HDD) model and the associated HDD power model are presented. Then, the data cloudlet object, or storage task, is explained in detail. Further, the data center persistent storage is defined.
3.2.1 HDD Model
As explained in Chapter 1, HDDs are still today the most used storage technology in Cloud computing environments. Unfortunately, CloudSim provides only one model of HDD, reused from the GridSim simulator [76], barely scalable and including some mistakes in its algorithm [78]. To overcome this barrier, the CloudSimDisk module implements a new HDD model.
According to [87] [88] [89], the main characteristics affecting the overall HDD performance are the mechanical components, a combination of the read/write head transversal movement and the platter rotational movement. Additionally, the internal data transfer rate, often called sustained rate, has been identified as a bottleneck of the overall data transfer rate of an HDD [90]. More recently, [91] proposed an HDD model based on 23 input parameters which achieves between 91% and 96.5% accuracy. Figure 14 shows a diagram of the model parameters used in their implementation, organized by functional category. Each parameter is described in detail in order of importance: the first parameter is the position time, "the sum of the seek time and the rotational latency", and the second is the transfer time, "the time required to transfer one sector of data to or from the media", namely the Internal Data Transfer Time.
Figure 14. Diagram of model parameters [91].
A new package, namely cloudsimdisk.models.hdd, has been created; it contains classes modeling HDD storage components. Each model implements one method, namely getCharacteristic(int key). In this method, the parameter key is an integer corresponding to a specific characteristic of the HDD. To ensure consistency between different HDD models, all the classes extend one common abstract class, which declares the getCharacteristic(int key) method. Thereby, the parameter key corresponds to the same HDD characteristic in each model.
However, it is not convenient for developers or users to work with key numbers. Hence, the getCharacteristic(int key) method has been declared as protected and cannot be used directly. Instead, the common abstract class implements a getter for each HDD characteristic (getCapacity(), getAvgSeekTime(), etc.). The getCharacteristic(int key) method is used only internally to retrieve the required characteristic. As a result, the methods accessed by users are semantically understandable. Table 3 inventories the available methods declared in HDD models to retrieve HDD characteristics.
Table 3. CloudSimDisk HDD characteristics.

KEY  METHOD                            DESCRIPTION
0    getManufacturerName()             The name of the manufacturer (e.g. Seagate Technology, Toshiba, Western Digital).
1    getModelNumber()                  The unique manufacturer reference (e.g. ST4000DM000).
2    getCapacity()                     The capacity of the HDD in megabytes (MB).
3    getAvgRotationLatency()           The average rotation latency of the disk, defined as half the amount of time it takes for the disk to make one full revolution, in seconds (s); directly dependent on the disk rotation speed in Rotations Per Minute (RPM).
4    getAvgSeekTime()                  The average seek time of the disk, defined as the average time needed to move the read/write head from track x to track y; it also corresponds to one third of the longest possible seek time (moving from the outermost track to the innermost track), assuming a uniform distribution of requests [92].
5    getMaxInternalDataTransferRate()  The maximum internal data transfer rate, defined as the rate at which data is transferred physically from the disk to the internal buffer; also called Sustained Data Rate or Sustained Transfer Rate.
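The getter pattern described above can be sketched as follows. This is a simplified, self-contained illustration: the class names and characteristic values are hypothetical, not an actual model shipped with CloudSimDisk.

```java
// Sketch of the CloudSimDisk pattern: a protected, key-based lookup
// wrapped by semantically named public getters in an abstract class.
abstract class StorageModelSketch {
    // Each concrete model maps the same key to the same kind of characteristic.
    protected abstract Object getCharacteristic(int key);

    public String getManufacturerName() { return (String) getCharacteristic(0); }
    public String getModelNumber()      { return (String) getCharacteristic(1); }
    public double getCapacity()         { return (Double) getCharacteristic(2); }
    public double getAvgSeekTime()      { return (Double) getCharacteristic(4); }
}

// A hypothetical HDD model with illustrative values.
class ExampleHdd extends StorageModelSketch {
    @Override
    protected Object getCharacteristic(int key) {
        switch (key) {
            case 0:  return "ExampleCorp"; // manufacturer name
            case 1:  return "EX4000";      // model number
            case 2:  return 4_000_000.0;   // capacity in MB
            case 4:  return 0.0089;        // average seek time in s
            default: return "n/a";
        }
    }
}
```

Because the key mapping lives in one abstract class, every concrete model answers the same getters consistently, which is the design goal stated above.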
3.2.2 HDD Power Model
For toolkit 3.0, Anton Beloglazov included a power package in CloudSim, based on his publication a year before [72]. This implementation provides the necessary algorithms for modeling and simulation of energy-aware computational resources, i.e. Hosts and Virtual Machines. However, it does not provide energy awareness for the storage component.
Thus, similarly to 3.2.1, the package cloudsimdisk.power.models.hdd has been created in accordance with the power package in place. Inside, the abstract class PowerModelHdd.java implements semantically understandable getters to retrieve the power data of a specific HDD in a particular operating mode. Table 4 inventories the available operating power modes declared in HDD power models.
Table 4. CloudSimDisk HDD power mode.
KEY  MODE    DESCRIPTION
0    Active  The disk is handling a request.
1    Idle    The disk is spinning but there is no activity on it.
3.2.3 Data Cloudlet
As explained in 2.3.1, CloudSim models a request with a Cloudlet component. However, the CloudSim implementation of this component interacts mainly with the Host's CPU hardware element. No examples of interactions with the storage element are provided and no results are printed out. Thus, an extension of the CloudSim Cloudlet is proposed by CloudSimDisk. The default Cloudlet constructor with eight parameters has been reused. Additionally, two new parameters have been defined:
• requiredFiles: a list of filenames that need to be retrieved by the cloudlet. These requested files have to be stored on the persistent storage of the Datacenter before the cloudlet is executed.
• dataFiles: a list of files that need to be stored by the cloudlet. These new files will be added to the persistent storage of the Datacenter during the cloudlet processing.
Note that requiredFiles had already been implemented in CloudSim v3.0.3, but the constructor parameter setting this variable was called fileList. However, this list is not a list of File objects, but a list of Strings corresponding to filenames. To make matters even more confusing, the new parameter dataFiles implemented in CloudSimDisk is a list of Files. Thus, in order to clarify things, the fileList parameter has not been reused by CloudSimDisk. Instead, requiredFiles and dataFiles are parameters of the new Cloudlet's constructor (see Figure 15).
Figure 15. CloudSimDisk Cloudlet constructor.
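The shape of the extended constructor can be sketched as follows. This is a minimal illustration: the eight original CloudSim Cloudlet parameters are elided, and DataFile is a hypothetical stand-in for CloudSim's File class.

```java
import java.util.List;

// Sketch of the CloudSimDisk Cloudlet extension: two new parameters
// alongside the eight reused CloudSim Cloudlet parameters (elided here).
class MyCloudletSketch {
    // Stand-in for CloudSim's File class: a name and a size in MB.
    static class DataFile {
        final String name;
        final int sizeMB;
        DataFile(String name, int sizeMB) { this.name = name; this.sizeMB = sizeMB; }
    }

    final List<String> requiredFiles;  // filenames to retrieve from storage
    final List<DataFile> dataFiles;    // files to add to storage

    MyCloudletSketch(/* ...eight CloudSim Cloudlet parameters..., */
                     List<String> requiredFiles, List<DataFile> dataFiles) {
        this.requiredFiles = requiredFiles;
        this.dataFiles = dataFiles;
    }
}
```

Keeping requiredFiles as names and dataFiles as full file objects mirrors the distinction explained above: retrieval only needs a name, while storing needs the file content and size.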
3.2.4 Data Center Persistent Storage
In CloudSim, one parameter of the data center entity is a list of Storage elements. This list models the data center persistent storage. Unfortunately, CloudSim does not provide any example of how to interact with this component.
CloudSimDisk's aim is to provide a module for storage modeling and simulation in CloudSim. Thus, an extension of the CloudSim data center model has been realized by CloudSimDisk. Methods have been deleted, overridden and created in order to interact only with the data center persistent storage. As a result, the data center model implements all the algorithms necessary to process the requiredFiles and dataFiles of a Cloudlet when one is received.
3.3 Execution Flow
This section diagrams the core execution of CloudSimDisk, including the communication between components, the event passing activity and the execution of the main methods.
As a starting point, CloudSim.startSimulation() starts all the entities and automatically generates the first events of the whole simulation process. At 0.1 second, one of these events calls the method submitCloudlets() of the broker, responsible for sending Cloudlets one by one to the data center. Therefore, for each Cloudlet, one event is scheduled with the data center as destination (see Figures 16 and 17)4. These events have the Tag CLOUDLET_SUBMIT, a scheduling time defined by the distribution chosen by the user, and they contain the Cloudlet as "event-data". Next, the data center calculates the transaction time for each file of the Cloudlet that needs to be added to, or retrieved from, the persistent storage. At the same time, it generates a confirmation event with itself as destination, with the Tag CLOUDLET_FILE_DONE and delayed by the calculated transaction time plus the eventual waiting delay due to the request queue on the target disk.
Figure 16. Event passing sequence diagram for "Basic Example 1", 3 Cloudlets.
Figure 16 presents a simple example where the transaction time of each cloudlet is shorter than the cloudlet arrival time intervals. Hence, there is no waiting delay. Figure 17 presents an example based on a real-world workload (Wikipedia). In this case, the cloudlet arrival rate is higher, so the interval time between two cloudlets is smaller. As a result, cloudlets have to wait in the disk queue before execution.
4 For the sake of simplicity, each Cloudlet contains only one file that needs to be added to the persistent storage, itself composed of only one HDD.
As a reminder, the Transaction Time is the sum of the Rotation Latency, the Seek
Time and the Transfer Time, obtained according to the target HDD characteristics
(see 3.2.1). The waiting delay is the time the request spends in the disk queue,
waiting to be executed. If no requests are in the queue, the waiting time is zero.
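The arithmetic just stated can be written down directly; this is an illustrative helper, not the actual CloudSimDisk method.

```java
// Transaction Time = Rotation Latency + Seek Time + Transfer Time;
// the waiting delay is the time left until the disk's queue empties.
class HddTiming {
    static double transactionTime(double rotationLatency, double seekTime,
                                  double transferTime) {
        return rotationLatency + seekTime + transferTime;
    }

    static double waitingDelay(double activeEndAt, double clock) {
        // If no requests are queued (the disk is already idle), the wait is zero.
        return Math.max(0.0, activeEndAt - clock);
    }
}
```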
Figure 18 presents in detail what happens when the data center receives an event with the Tag CLOUDLET_SUBMIT. It begins by retrieving the Cloudlet object from the data parameter of the event. Then, it retrieves the Cloudlet's dataFiles, a list that contains zero, one or many files (see the first part of Figure 18). Each file is added to the persistent storage according to the chosen algorithm (see 3.2.4). The transaction time for an operation is returned by the HDD, and also stored in the attributes of the File so that this information can be accessed later. After each operation, the method processOperationWithStorage(...) is called to handle the waiting delay of the request, the queue size of the HDD and the operating mode of the HDD. This method required careful design, combining logical, mathematical and engineering considerations. Also, this method generates the CLOUDLET_FILE_DONE events that will be used for the output results.
When all the data files have been handled, the same scenario is performed for the required files of the Cloudlet, except that the list of required files is a list of filenames that need to be retrieved, and so getFile(Filename_n) returns the requested File (see the second part of Figure 18).
Figure 18. Process when the data center receives a CLOUDLET_SUBMIT event.
On the HDD side, adding a file can be decomposed into three phases, chronologically organized (see Figure 19):
• Firstly, the transaction time is determined by retrieving successively the seek time, the rotation latency and the transfer time for the concerned file. All this information depends on the HDD model used.
• Secondly, the list of files, the list of filenames and the space used on the HDD are updated according to the file added. This action is necessary to keep track of the content of each HDD. Also, it facilitates the implementation of file striping.
• Thirdly, the file's attributes "transactionTime" and "ResourceID" are set respectively to the transaction time determined in phase I and the ID of the concerned HDD. Hence, this information can be reused later, for example, to analyze on which HDDs files are added.
After these three phases, the addFile method returns the transaction time. Note that phase II does not exist when getting a file since the files on the HDDs are not changed, so the list of files, the list of filenames and the space used on the HDD are unchanged. Also, the "ResourceID" is not modified in phase III since the file is still stored by the same device. Moreover, the file object needs to be retrieved from the list of files in the HDD before phase I. In order to do that, an iterator is instantiated to run through the list. A while loop compares the filenames of each element in the list until the required file is found. If the name of the required file does not match any file stored on the persistent storage, a null File object is returned by the method getFile(fileName); otherwise the matched file is returned.
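The lookup just described can be sketched as follows; StoredFile is a hypothetical stand-in for CloudSim's File class, and the method is a simplified stand-in for MyHarddriveStorage's getFile.

```java
import java.util.Iterator;
import java.util.List;

// Sketch of getFile(fileName): iterate over the HDD's file list and
// return the matching file, or null if no stored file matches.
class FileLookup {
    static class StoredFile {
        final String name;
        StoredFile(String name) { this.name = name; }
    }

    static StoredFile getFile(List<StoredFile> fileList, String fileName) {
        Iterator<StoredFile> it = fileList.iterator();
        while (it.hasNext()) {
            StoredFile f = it.next();
            if (f.name.equals(fileName)) {
                return f; // required file found
            }
        }
        return null; // no file with this name on the persistent storage
    }
}
```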
CloudSimDisk users can define any type of request arrival distribution as a parameter of the simulation, so the data center has to implement a scalable algorithm that handles the persistent storage in any situation. This feature is provided inside the method processOperationWithStorage().
Now, remember that the datacenter entity has to handle the persistent storage all along the simulation. This includes updating the storage state (idle or active) if needed and keeping a history of the time spent in each mode by each HDD. This task is not trivial since it mixes different times, delays and durations. Moreover, new cloudlets can arrive at any time to add or to retrieve files.
Figure 19. HDD internal process of adding a file.
Figure 20a depicts an example to explain how the state of the persistent storage is managed during a simulation. Additionally, Figure 20b shows the Java code responsible for this process, and Table 5 summarizes the key time values. The example considers three cloudlets, arriving at three different times.
When a cloudlet arrives to interact with the persistent storage, two cases can be identified:
• The target HDD is in Idle mode. In that case5:
  - the waiting time is zero;
  - the active end time is the current time plus the transaction time;
  - the event delay is the transaction time;
  - the storage is set in Active mode.
• The target HDD is in Active mode. In that case6:
  - the waiting time is the active end time minus the current time;
  - the active end time is increased by the transaction time;
  - the event delay is the waiting time plus the transaction time.
Note that in both cases, the transaction time is retrieved from the file's attributes and the total active duration of the target HDD is incremented by this duration. Further, the power needed by the HDD in active mode is retrieved, the energy is computed (based on the transaction time), and the confirmation event is scheduled carrying all the previously established variables.
Table 5. Trace of HDD values related to Figure 20.

Clock()     TransactionTime  WaitingTime  ActiveEnd  EventDelay
At 0.311 s  0.014 s          0.000 s      0.325 s    0.014 s
At 0.321 s  0.008 s          0.004 s      0.333 s    0.012 s
At 0.326 s  0.002 s          0.007 s      0.335 s    0.009 s

5 The HDD is processing nothing.
6 The HDD is already processing one or more request(s).
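The two cases can be condensed into a small state update. The sketch below uses illustrative field names rather than the actual code of Figure 20b, but its arithmetic reproduces the trace of Table 5.

```java
// Sketch of the per-request storage state update at the datacenter level.
class HddStateSketch {
    double activeEndAt = 0.0;         // time at which the disk becomes idle again
    double totalActiveDuration = 0.0; // accumulated active time

    // Returns the event delay: waiting time + transaction time.
    double process(double clock, double transactionTime) {
        double waitingTime;
        if (clock >= activeEndAt) {
            // Idle case: no wait; the disk becomes active now.
            waitingTime = 0.0;
            activeEndAt = clock + transactionTime;
        } else {
            // Active case: the request waits until the disk is free.
            waitingTime = activeEndAt - clock;
            activeEndAt += transactionTime;
        }
        totalActiveDuration += transactionTime;
        return waitingTime + transactionTime;
    }
}
```

Feeding in the three cloudlets of the example (clocks 0.311, 0.321 and 0.326 with transaction times 0.014, 0.008 and 0.002 seconds) yields the waiting times, active end times and event delays listed in Table 5.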
(a) HDD management at the datacenter level.
(b) Code snippet showing the storage's state update when receiving a new Cloudlet.
Figure 20. Example of storage management with one HDD: (a) graphic; (b) code.
3.4 Packages Description
CloudSimDisk is composed of 27 classes organized in 8 packages (see Figure 21), and inherits from 8 classes of CloudSim gathered in 4 different packages. Names follow the same pattern used in CloudSim, so users will be able to use these extensions with a minimal learning curve. The prefix "My" has been added to some class names to avoid confusion between CloudSim and CloudSimDisk. The package architecture has been designed to simplify a prospective integration into CloudSim. The overall class diagram is shown in Figure 22.
Figure 21. CloudSimDisk: 27 classes organized in 8 packages.
The following list describes each package of CloudSimDisk (in alphabetical order):
• cloudsimdisk contains the MyCloudlet and MyDatacenter components, extended from the Cloudlet and Datacenter components of CloudSim, and the MyHarddriveStorage component implementing the Storage interface of CloudSim;
• cloudsimdisk.core contains the CloudSimTags component extended from CloudSim to add the Tag CLOUDLET_FILE_DONE used in the CloudSimDisk extension and to implement a method converting "tag numbers" into "tag text" for logging purposes;
• cloudsimdisk.distributions contains a basic distribution for Example0 and Example1 that reads a simple file, a Wikipedia distribution for the Wikipedia examples that reads Wikipedia trace files, a seek time distribution characterized by a minimum, a maximum and a mean, and a distribution tester to test the algorithms implemented in this package;
• cloudsimdisk.examples contains the three examples (MyExample0, MyExample1 and MyWikipediaExample1) provided with CloudSimDisk, classes gathering constant values used by the examples, a Runner to run the CloudSimDisk examples and a Helper to assist in providing functionality to the Runner;
• cloudsimdisk.models.hdd contains three HDD models from three different manufacturers (HGST, Seagate and Toshiba) and one common abstract class
Similarly to the HDD characteristics, HDD power modes are implemented in a "switch-case" statement that returns a specific power consumption according to a key parameter. This key follows the same pattern for all HDD power models (see Table 4).
To implement a new HDD power mode, only two modifications need to be made:
1. In the target power model, a new case should be added which returns the power value of the new mode.
protected Object getPowerData(int key) {
    switch (key) {
        ...
        case <KEY_NUMBER>:
            return <POWER_VALUE>; // the power value of the new mode
        default:
            return "n/a";
    }
}
2. In the common abstract class PowerModelHdd.java, add one public method with clear semantics which retrieves the new power value.
public double getPowerActive() {
    return (double) getPowerData(1);
}

public <TYPE> getPowerOfYourMode() {
    return (<TYPE>) getPowerData(<KEY_NUMBER>);
}
3.6.3 Randomized Characteristics
In the real world, most of the HDD characteristics are variable, like the seek time or the rotation latency. CloudSimDisk randomizes these values by applying different distributions.
The rotation latency is generated from UniformDistr(0, 2 * avgRotationLatency), which returns values between 0 and two times the average rotation latency in a uniform way.
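A uniform draw over [0, 2 × average) keeps the expected value at the nominal average. Below is a minimal sketch of this idea, using java.util.Random as a stand-in for CloudSim's UniformDistr.

```java
import java.util.Random;

// Sketch of the rotation latency randomization: uniform between 0 and
// twice the average rotation latency, so the mean equals the average.
class RotationLatencySketch {
    final double avgRotationLatency; // in seconds, e.g. 0.00417 s at 7200 RPM
    final Random rng;

    RotationLatencySketch(double avgRotationLatency, long seed) {
        this.avgRotationLatency = avgRotationLatency;
        this.rng = new Random(seed);
    }

    double sample() {
        return rng.nextDouble() * 2 * avgRotationLatency;
    }
}
```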
To apply a new distribution to the seek time, one modification needs to be made in MyHarddriveStorage. When the seek time is set, a number generator is created according to the desired distribution.
public boolean setSeekTime(double avgSeekTime) {
    // previous seek time distribution (to remove)
    // ContinuousDistribution generator = new MySeekTimeDistr(0.0002, 3 * avgSeekTime, avgSeekTime);

    // new seek time distribution (to add)
    ContinuousDistribution generator = new YourPersoDistribution(...);

    this.genSeekTime = generator; // store the generator
    return true;
}
3.6.4 Data Center Persistent Storage Management
In CloudSimDisk, all the transactions are done with the persistent storage of the data center entity. This persistent storage is a pool of HDD elements. When a file needs to be added to this system, the data center needs to choose one HDD in the pool to which to redirect the request. This choice is made according to the algorithm configured by the user before the simulation.
In the MyDatacenter class, the method addFile(File) implements a "switch-case" statement where a key parameter defines which algorithm to apply while managing incoming requests to the persistent storage. At the moment, the basic FIRST-FOUND and ROUND-ROBIN algorithms are implemented: FIRST-FOUND comes from the CloudSim source code and ROUND-ROBIN has been added by the CloudSimDisk team.
Users who want to implement their own algorithm need to add one new case to the "switch-case" statement and write their code there.
public int addFile(File file) {
    int key = 1;
    ...
    switch (key) {
        ...
        case <KEY_NUMBER>:
            // write your own algorithm to manage requests to the persistent storage
            break;
        default:
            System.out.println("ERROR: no algorithm corresponding to this key.");
            break;
    }
    ...
}
3.6.5 Broker Request Arrival Distribution
The broker implemented in CloudSimDisk is responsible for sending Cloudlets one by one to the data center. Thus, MyPowerDatacenterBroker schedules each Cloudlet at a specific time according to a distribution defined by the user.
In this class, the method setDistri(type, source) implements a "switch-case" statement which instantiates a distribution according to the type parameter (see Table 7). The source parameter is used for the wiki and basic distributions as a path to the file containing the arrival time information.
Table 7. Setting the request arrival distribution in MyPowerDatacenterBroker.

TYPE     DISTRIBUTION
expo     Exponential distribution with an average of 60 seconds (arbitrary).
unif     Uniform distribution between 0 and 10 seconds (arbitrary).
basic    Reads the arrival times from a file, for the Basic Example.
wiki     Reads the arrival times from a Wikipedia trace file.
default  Uniform distribution between 1 and 1.001 seconds.
To apply a new request arrival distribution, one new case needs to be added to the "switch-case" statement.
public void setDistri(String type, String source) {
    switch (type) {
        ...
        case "<TYPE>":
            distri = new YourOwnDistribution(...);
            break;
        default:
            distri = new UniformDistr(1, 1.0001); // arbitrary parameters
            break;
    }
}
3.7 Summary
This chapter presented the CloudSimDisk module for modeling and simulation of storage in CloudSim. The HDD model and power model were presented, along with the execution flow and a description of the package diagram. Furthermore, energy awareness and scalability have been demonstrated. The following chapter aims to validate this module by presenting experimental results produced with CloudSimDisk.
4 RESULTS
This chapter presents simulation results obtained with the CloudSimDisk module. At first, the input parameters and simulation outputs are described. As a central part, several experiments are presented to demonstrate the core processing of the module, including the incoming request distribution, the HDD characteristic variations and the step-by-step evolution of the simulation. Additional results on energy consumption and disk array management are discussed as well.
4.1 Inputs and Outputs
In this section, the input parameters of the simulator and the simulation outputs
are listed and described one by one.
4.1.1 Input Parameters
CloudSimDisk simulations are based on 10 input parameters, presented in Table 8. The request arrival times on the data center persistent storage can be retrieved from a defined distribution (uniform, exponential) or from a file in which each line corresponds to a time7 (basic, wikipedia). Thus, the requestArrivalRateType parameter informs the data center broker about the request arrival distribution to create (expo, unif, basic or wiki). If the distribution needs to be read from a file, the path of this file should be defined in the requestArrivalTimesSource parameter.
When it comes to creating requests, one data file and one required file can be assigned to each Cloudlet8. So a Cloudlet contains zero-to-one file to add and zero-to-one file to retrieve. If some files need to be retrieved, they have to be added to the persistent storage before the simulation starts. The startingFilesList parameter is used for this purpose. The number of disks in the persistent storage should be at least one. The default algorithm to manage the persistent storage disks at the data center level is Round-Robin (the first file is added on the first drive, the second file on the second drive, and so on). This algorithm can be changed in MyDatacenter.java. The pool of disks is uniform, that is to say it is based on one unique HDD model. The power model chosen should be in accordance with the HDD model.
7 Examples are provided in the files folder of the CloudSimDisk project.
8 Technically, a Cloudlet can have more than one data file and required file.
Table 8. Input parameters for CloudSimDisk simulations.

No. | NAME | DESCRIPTION
1 | nameOfTheSimulation | The name of the simulation.
2 | requestArrivalRateType | The type of distribution (e.g. unif, expo, wiki).
3 | requestArrivalTimesSource | The path of the source file containing the arrival times of requests.
4 | numberOfRequest | The number of requests (Cloudlets) to create.
5 | requiredFiles | The path of the file containing the list of filenames required.
6 | dataFiles | The path of the file containing the list of filenames and file sizes that need to be stored during the simulation.
7 | startingFilesList | The path of the file containing the list of filenames and file sizes that need to be stored before the simulation starts.
8 | numberOfDisk | The number of Hard Disk Drives (HDDs) in the persistent storage of the datacenter.
9 | hddModel | The model of HDD for the whole persistent storage.
10 | hddPowerModel | The power model of HDD in the persistent storage.
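The ten parameters of Table 8 can be pictured as a single configuration object. The sketch below is purely illustrative: the class name and field grouping are assumptions for this thesis text, not part of the actual CloudSimDisk API.

```java
// Illustrative holder for the ten CloudSimDisk input parameters of Table 8.
// The class name and fields are hypothetical, not the CloudSimDisk API.
public class SimulationConfig {
    String nameOfTheSimulation;
    String requestArrivalRateType;    // "unif", "expo", "basic" or "wiki"
    String requestArrivalTimesSource; // path to the arrival-times file, if any
    int numberOfRequest;              // number of Cloudlets to create
    String requiredFiles;             // path to the list of filenames to retrieve
    String dataFiles;                 // path to the list of filenames/sizes to add
    String startingFilesList;         // files stored before the simulation starts
    int numberOfDisk;                 // at least one
    String hddModel;                  // one unique model for the whole pool
    String hddPowerModel;             // must match the HDD model
}
```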
4.1.2 Simulation Outputs
The outputs of a simulation are displayed in three different formats, each of them
containing different information for different purposes. First, the IDE console shows
the step-by-step evolution of the simulation and a summary of the results (see Figure
24). This output is erased by the next simulation. Also, the console length is often
limited, so the output may be truncated. Second, a log file is created in the logs folder
of the CloudSimDisk project. Its purpose is mainly to understand or debug the core
simulation. It contains a trace of all the events exchanged during the simulation,
the status of the persistent storage at the beginning of the simulation, the detailed
transaction time of each operation, the evolution of the length of each HDD queue
and the operating mode of each HDD in real time. Third, an Excel spreadsheet (see
Table 9) is created with all the information related to each Cloudlet operation⁹. This
Excel file includes prebuilt graphs about the request arrival distribution, the seek
time and rotation latency distributions, the energy consumption, the transaction
times and the request waiting times.

⁹ The spreadsheet can only store information about one operation per Cloudlet.
Figure 24. Example of console information output.
Table 9. Excel values output.

COLUMN | NAME | DESCRIPTION
A | CloudletID | The ID of each Cloudlet. All the results are related to a specific Cloudlet, unique by this ID.
B | Arrival Time (s) | The time in seconds at which the Cloudlet arrives.
C | Waiting Time (s) | The time in seconds the Cloudlet waited in the waiting queue of the target disk.
D | Transaction Time (s) | The time in seconds of the transaction, sum of the seek time, the rotation latency and the transfer time.
E | Seek Time (s) | The time in seconds to move the read/write arm to the required track. This seek time is related to one particular transaction.
F | Rotation Latency (s) | The time in seconds to rotate the disk and bring the required sector under the read/write head. This rotation latency is related to one particular transaction.
G | Transfer Time (s) | The time in seconds to transfer a file between the internal controller and the surface of the magnetic disk. This transfer time is related to one particular transaction.
H | Done Time (s) | The time in seconds at which the transaction is finished.
I | Filename | The name of the file subject to the transaction.
J | File size (MB) | The size of the file subject to the transaction.
K | Action | Denotes whether the file has been added or retrieved.
L | HDD name | The name of the target HDD.
M | Energy Consumption (J) | The energy consumed by the target disk to do the transaction. This output is available only with the power-aware component of CloudSimDisk.
4.2 Results
In this section, different results are presented to demonstrate the core processing of
CloudSimDisk. The method used aims to analyze the output results and to compare
them with the input configurations and the expected results from the analytical
models. In this way, the incoming request arrival distribution, the seek time, the
rotation latency and the data transfer time are analyzed. Additionally, the real-time
console output is compared with the expected sequential processing proposed by
CloudSimDisk. Further, results on energy consumption and disk array management
are discussed as well.
4.2.1 Request Arrival Distribution
The request arrival rate is defined by the input parameters "requestArrivalRateType"
and "requestArrivalTimesSource" (see Table 8). In the output Excel spreadsheet
produced by CloudSimDisk, the request arrival distribution is plotted to validate
the user input parameters. As an example, some Wikipedia workload traces [93] have
been used as input request arrival rate. Figure 25 shows an example of Wikipedia
workload drawn by CloudSimDisk. A uniform distribution can be observed with an
average of 3060 requests per second (153 requests every 0.05 second).
Figure 25. Wikipedia workload distribution.
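The rate quoted above is a simple unit conversion: 153 requests per 0.05-second bin is 3060 requests per second. A one-line sketch of that conversion:

```java
// Converts a per-bin request count into a requests-per-second rate,
// e.g. 153 requests every 0.05 s corresponds to 3060 requests/s.
public class ArrivalRate {
    public static double requestsPerSecond(int requestsPerBin, double binSeconds) {
        return requestsPerBin / binSeconds;
    }
}
```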
4.2.2 Sequential Processing
As presented in Chapter 3.3, Figure 20a, each Cloudlet (request) sent to an HDD is
executed according to a FIFO queue (First In, First Out). Thus, if one request is
received while another is executing, this request will have to wait
in the "waiting queue" before its execution.
For the sake of clarity, the following example is obtained using the simple workload
distribution shown by Figure 26.
Figure 26. Simple example distribution.
Figures 28 to 35 present chronologically the console output of the example depicted
by Figure 27. The persistent storage is composed of 1 HDD.
Figure 27. Sequential processing of requests illustrated.
First, the simulation is initialized, entities are started and components are created
(see Figure 28).
Figure 28. Sequential processing of requests part 1.1.
Then, Cloudlets are scheduled by the Broker entity (see Figure 29). In this example,
Cloudlets #1, #2 and #3 are scheduled respectively at 0.311, 0.321 and 0.356
second. Thus, they are expected to be executed in that order.
Figure 29. Sequential processing of requests part 1.2.
During the second part of the simulation, the datacenter entity starts receiving
Cloudlets (see Figure 30) as scheduled during the first part by the Broker entity
(see Figure 29). Cloudlet #1 arrives first and starts to be executed.
Figure 30. Sequential processing of requests part 2.1.
Cloudlet #2 arrived at 0.321000 second (see Figure 31), during the execution of
Cloudlet #1. Thus, Cloudlet #2 has to wait in the "waiting queue" of the disk
until Cloudlet #1 is done. Cloudlet #1 is completed at 0.333513 second, with a
transaction time of 0.022513 second and a null waiting time because the Cloudlet
was the first to be executed.
Figure 31. Sequential processing of requests part 2.2.
Cloudlet #3 arrived at 0.356000 second (see Figure 32), during the execution of
Cloudlet #2. Thus, Cloudlet #3 has to wait in the "waiting queue" of the disk
until Cloudlet #2 is done. Cloudlet #2 is completed at 0.374022 second, with a
transaction time of 0.040509 second and a waiting time of 0.012513 second in the
disk queue because of Cloudlet #1.
Figure 32. Sequential processing of requests part 2.3.
Cloudlet #3 is completed at 0.416238 second, with a transaction time of 0.042216
second and a waiting time of 0.018022 second in the disk queue because of Cloudlet
#2.
Figure 33. Sequential processing of requests part 2.4.
Since all the requests are now completed, the simulation termination is executed
(see Figure 34). All the entities are shut down, the CloudSim variables are reset and
the simulation is completed.
Figure 34. Sequential processing of requests part 3.1.
The final result is printed out (see Figure 35). The disk has been in Idle mode
between the beginning of the simulation and the reception of the first Cloudlet,
that is to say 0.311 second. Then the disk has been in Active mode until the end
of the simulation. This total active time (0.105 second) is actually the sum of the
transaction times of each Cloudlet (0.022513 + 0.040509 + 0.042216) rounded to the
nearest one-thousandth of a second.
The maximum "waiting queue" length of hdd1 is 1 request, reached both when Cloudlet
#2 arrived during the execution of Cloudlet #1 and when Cloudlet #3 arrived during
the execution of Cloudlet #2.
The total energy consumption of the persistent storage for this simulation is 3.332
joules. More explanations are provided in Section 4.2.7.
Figure 35. Sequential processing of requests part 3.2.
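The queueing arithmetic of the example above can be reproduced with a minimal sketch of a FIFO disk queue. This is an illustration of the scheduling rule only, not CloudSimDisk's actual implementation: a request starts when both the disk is free and the request has arrived, and its waiting time is the gap between arrival and start.

```java
// Minimal FIFO disk-queue sketch: given request arrival times (sorted) and
// per-request transaction times, computes each request's waiting time and
// completion ("done") time, as in the walkthrough above.
public class FifoDisk {
    // Returns {waitingTime, doneTime} for each request.
    public static double[][] process(double[] arrival, double[] transaction) {
        double[][] out = new double[arrival.length][2];
        double diskFreeAt = 0.0;
        for (int i = 0; i < arrival.length; i++) {
            double start = Math.max(arrival[i], diskFreeAt); // wait if disk busy
            out[i][0] = start - arrival[i];                  // waiting time
            diskFreeAt = start + transaction[i];             // disk busy until done
            out[i][1] = diskFreeAt;                          // done time
        }
        return out;
    }
}
```

Applied to the arrivals 0.311, 0.321 and 0.356 second with the transaction times 0.022513, 0.040509 and 0.042216 second, this rule reproduces the waiting times (0, 0.012513, 0.018022) and completion times (0.333513, 0.374022, 0.416238) reported by the console output.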
4.2.3 Seek Time Randomness
The seek time is the time to move the read/write head from an initial track x to the
required track y. Thus, this time depends on the distance between these two
tracks: the longer the distance to cover, the longer the seek time. However,
the seek time is not linear with the distance to travel because of the acceleration and
deceleration periods of the actuator arm of the disk.
CloudSimDisk randomizes the seek time according to the average seek time parameter
of the target disk. The seek time distribution MySeekTimeDistr returns random
values between a minimum min and a maximum max, with an average mean. According
to [92], CloudSimDisk distributes the seek time between 0 and 3 times the average
seek time, in seconds.
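A distribution on [0, 3a] whose mean equals a cannot be uniform (a uniform draw on [0, 3a] would average 1.5a). One way to satisfy the stated constraints is the piecewise-uniform sampler below: with probability 2/3 draw from U(0, a) and otherwise from U(a, 3a), giving a mean of (2/3)(a/2) + (1/3)(2a) = a. This is an illustrative sketch only, not the actual MySeekTimeDistr implementation.

```java
import java.util.Random;

// Illustrative seek-time sampler on [0, 3a] with mean a, where a is the
// average seek time in seconds. Not CloudSimDisk's MySeekTimeDistr.
public class SeekTimeSampler {
    private final Random rng;
    private final double avg;

    public SeekTimeSampler(double averageSeekTime, long seed) {
        this.avg = averageSeekTime;
        this.rng = new Random(seed);
    }

    public double sample() {
        if (rng.nextDouble() < 2.0 / 3.0) {
            return rng.nextDouble() * avg;         // U(0, a)
        }
        return avg + rng.nextDouble() * 2.0 * avg; // U(a, 3a)
    }
}
```

For a = 0.004 second (the HGST Ultrastar average seek time used below), samples stay within [0, 0.012] and their mean converges to 0.004, matching the bounds and average reported by the simulation.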
Figure 36 presents the seek time distribution obtained with CloudSimDisk for 5000
Cloudlets, based on the HGST Ultrastar (REF: HUC109090CSS600) HDD model
(AverageSeekTime: 0.004 second).
Figure 36. Seek Time distribution.
The average seek time obtained by simulation is 0.0040 second. The minimum seek
time is 0.0000 s and the maximum is 0.0120 s. These values are in accordance with
the previously described model.
4.2.4 Rotation Latency Randomness
The rotation latency is the time to rotate the platter of the disk and bring the
required sector under the read/write head. This time depends on the rotational
speed of the disk, measured in Rotations Per Minute (RPM). Also, the average
rotational latency of a disk is half the time needed for a full rotation. The minimal
rotation latency is null and corresponds to the case where the required sector is
already under the read/write head.
CloudSimDisk randomizes the rotation latency according to the average rotation
latency parameter of the target disk. Thus, a uniform distribution is used with a
minimum of 0 and a maximum of 2 times the average rotation latency, in seconds.
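The rotation latency model above is a plain uniform draw on [0, 2a], whose mean is exactly the average latency a. A minimal sketch (the class name is an assumption, not CloudSimDisk's own):

```java
import java.util.Random;

// Rotation latency as described above: uniform on [0, 2a], where a is the
// average rotation latency in seconds, so the mean of the draw equals a.
public class RotationLatencySampler {
    private final Random rng;
    private final double avg;

    public RotationLatencySampler(double averageRotationLatency, long seed) {
        this.avg = averageRotationLatency;
        this.rng = new Random(seed);
    }

    public double sample() {
        return rng.nextDouble() * 2.0 * avg; // U(0, 2a), mean a
    }
}
```

For a = 0.003 second, samples stay within [0, 0.006] and their mean converges to 0.003, matching the simulated distribution described below.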
Figure 37 presents the rotation latency distribution obtained with CloudSimDisk for
5000 Cloudlets, based on HGST Ultrastar (REF: HUC109090CSS600) HDD model
(AverageRotationLatency: 0.003 second).
Figure 37. Rotation Latency distribution.
The average rotation latency obtained by simulation is 0.0030 second. The minimum
rotation latency is 0.0000 s and the maximum is 0.0060 s. These values are in
accordance with the previously described model.
4.2.5 Data Transfer Time Variation
As explained in Section 3.2.1, the internal data transfer rate (or sustained rate)
has been identified as a bottleneck of the overall data transfer rate of an HDD.
This parameter is very difficult to model because of its dependency on several
factors, including Zoned Bit Recording variances, cache effects and file system
fragmentation [90].
The current data transfer rate model of CloudSimDisk does not take into account
these parameters and considers the outermost zone of tracks on the platter, where
the internal data transfer rate is maximal [94].
Thus, the data transfer time varies only according to the size of the file to transfer.
Figure 38 presents the transfer times obtained with CloudSimDisk for 500 Cloudlets,
based on the HGST Ultrastar (REF: HUC109090CSS600) HDD model (MaxInternalDataTransferRate:
198.0 MB/second) and considering file sizes between 1 MB and
10 MB.
Figure 38. Transfer times and �le sizes.
The result shows that the relation between transfer time and file size obtained with
CloudSimDisk is linear (fitted line y = 198x + 1E-12, with the file size in MB plotted
against the transfer time in seconds). The intercept 1E-12 can be
approximated to 0. The slope of the line, 198, corresponds to the maximum internal
data transfer rate of the HDD model used for the simulation. This value is in
accordance with the previously described model.
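Under this model the transfer time is simply the file size divided by the maximum internal data transfer rate, so it is linear in the file size. A one-method sketch:

```java
// Transfer time under the model above: file size divided by the maximum
// internal data transfer rate (outermost-zone rate), e.g. 198.0 MB/s for
// the HGST Ultrastar HUC109090CSS600.
public class TransferTime {
    public static double seconds(double fileSizeMB, double maxRateMBps) {
        return fileSizeMB / maxRateMBps;
    }
}
```

Doubling the file size doubles the transfer time, which is the linearity visible in Figure 38.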
4.2.6 Seek Time, Rotation Latency and Data Transfer Time Compared with Energy Consumption per Transaction
In the Excel spreadsheet output produced by CloudSimDisk, the energy consumption
for each transaction is calculated (see Section 4.2.7, equation (6)). This section
compares the variations of the seek time, the rotation latency and the transfer time
with the variation of the energy consumption, for each transaction.
The following scenario considers a Wikipedia workload with 50 Cloudlets, each of
them adding a file to the persistent storage. The model of HDD used is a Seagate
Enterprise (REF: ST6000VN0001) (ActivePower: 11.27 Watts). The file sizes vary
between 1 MB and 10 MB.
The results, presented by Figure 40, show that the data transfer time and the energy
vary in a similar way. This indicates that the data transfer time has a major impact
on the energy consumption per transaction.
Figure 39 shows the breakdown of each transaction time for this scenario. For 86% of the
Cloudlets (43 Cloudlets out of 50), the transfer time is the most significant part of
the total transaction time. This result is in accordance with the previous statement
relating data transfer time and energy consumption per transaction (see Figure
40c).
Figure 39. The transaction time: sum of the seek time, the rotation latency and the transfer time.
(a) Seek time compared with energy consumption per transaction.
(b) Rotation latency compared with energy consumption per transaction.
(c) Data transfer time compared with energy consumption per transaction.
Figure 40. Energy consumption per transaction compared with: (a) Seek Time; (b) Rotation Latency; (c) Data Transfer Time.
4.2.7 Persistent Storage Energy Consumption
The final result of a CloudSimDisk execution is the energy consumed by the storage
system during one simulation (see Figure 41).
Figure 41. "MyExampleWikipedia1", 5000 requests - Final result.
The energy consumed by the persistent storage of the data center is noted E_{persistentStorage}.
It is obtained as the sum of all E_{hdd_i}, the energy consumed by the i-th HDD out
of n in the persistent storage (see equation (1)).

E_{persistentStorage} = \sum_{i=1}^{n} E_{hdd_i}    (1)
Then, the energy consumed by the i-th HDD, E_{hdd_i}, is the sum of the energy consumed
by this HDD in Idle mode, E_{hdd_i,idle}, and in Active mode, E_{hdd_i,active} (see
equation (2)).

E_{hdd_i} = E_{hdd_i,idle} + E_{hdd_i,active}    (2)
The energy consumed in Idle mode for a specific HDD, E_{hdd_i,idle}, is the sum of all
E_{hdd_i,idle_j}, the energy consumed by the i-th HDD during the j-th idle interval out
of m (see equation (3)).

E_{hdd_i,idle} = \sum_{j=1}^{m} E_{hdd_i,idle_j}    (3)
The energy consumed in Active mode for a specific HDD, E_{hdd_i,active}, is the sum of
all E_{hdd_i,active_j}, the energy consumed by the i-th HDD during the j-th operation
out of m (see equation (4)).

E_{hdd_i,active} = \sum_{j=1}^{m} E_{hdd_i,active_j}    (4)
The energy consumed by a specific HDD during a specific idle interval, E_{hdd_i,idle_j},
is the interval duration t_{idle_j} multiplied by the power required by the HDD in Idle
mode, P_{hdd_i,idle} (see equation (5)).

E_{hdd_i,idle_j} = t_{idle_j} \times P_{hdd_i,idle}    (5)
The energy consumed by a specific HDD during a specific operation, E_{hdd_i,active_j},
is the operation duration t_{hdd_i,operation_j} multiplied by the power required by the
HDD in Active mode, P_{hdd_i,active} (see equation (6)).

E_{hdd_i,active_j} = t_{hdd_i,operation_j} \times P_{hdd_i,active}    (6)
The time of one specific operation, t_{hdd_i,operation_j}, is called the transaction time (see
Figure 39). It is the sum of the seek time t_{seekTime}, the rotation latency t_{rotationLatency}
and the transfer time t_{transferTime} (see equation (7)).

t_{hdd_i,operation_j} = t_{transactionTime} = t_{seekTime} + t_{rotationLatency} + t_{transferTime}    (7)
This analytical model has been implemented in CloudSimDisk. To validate the
implementation, several scenarios have been executed with the simulator, and the total
energy consumed by the persistent storage has been compared with the manually
calculated results. Both proved to be similar.
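The energy model of equations (1) to (7) can be sketched directly in code: per-HDD energy is idle power times the total idle time plus active power times the total transaction time, summed over all disks. The power values used in the usage note below are placeholders, not datasheet figures.

```java
// Sketch of the storage energy model of equations (1)-(7).
public class StorageEnergy {
    // Energy of one HDD: idle intervals at idle power plus transaction
    // (active) times at active power -- equations (2)-(6).
    public static double hddEnergy(double[] idleIntervals, double[] transactionTimes,
                                   double idlePower, double activePower) {
        double e = 0.0;
        for (double t : idleIntervals) e += t * idlePower;      // eq. (3), (5)
        for (double t : transactionTimes) e += t * activePower; // eq. (4), (6)
        return e;                                               // eq. (2)
    }

    // Energy of the whole persistent storage: sum over all HDDs -- eq. (1).
    public static double persistentStorageEnergy(double[][] idle, double[][] active,
                                                 double idlePower, double activePower) {
        double e = 0.0;
        for (int i = 0; i < idle.length; i++)
            e += hddEnergy(idle[i], active[i], idlePower, activePower);
        return e;
    }
}
```

For example, a single disk idle for 0.311 second and active for the three transaction times of Section 4.2.2, with hypothetical powers of 5 W idle and 10 W active, consumes 0.311 × 5 + 0.105238 × 10 ≈ 2.607 J.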
4.2.8 Energy Consumption and File Sizes
An operation on the persistent storage consists of adding/retrieving a file on/from
an HDD. A file has a name and a size. While the name is used to identify each file,
the size has an impact on the energy consumed by the storage. Indeed, in equation
(7), the transfer time t_{transferTime} can be expressed as the size of the file processed
f_{size} divided by the data transfer rate R of the target HDD (see equation (8)).

t_{transferTime} = f_{size} / R    (8)
Figure 42 shows the energy consumption per operation depending upon the size of
the file processed by this operation. The scenario consists of adding 100 files with
the same size to the persistent storage. The HDD model used is an HGST Western
Digital 900 GB, with an average seek time of 0.004 second, an average rotation
latency of 0.003 second and a maximum internal data transfer rate of 198.0 MB/s
(REF: HUC109090CSS600). The scenario has been repeated four times with file sizes
of 1, 10, 100, and 1000 megabytes.
To verify the validity of the simulations, an analytic result has been calculated from
the HDD characteristics and according to equation (9).
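An analytic expectation of this kind can be sketched as follows, assuming the expected transaction time is the average seek time plus the average rotation latency plus the transfer time of equation (8), multiplied by the active power as in equation (6). Equation (9) is not reproduced in this excerpt, so this combination of (6)-(8) is an assumption; the active power value in the usage note is a placeholder taken from the Seagate example of Section 4.2.6.

```java
// Expected per-operation energy, assuming transaction time =
// averageSeek + averageRotation + fileSize / maxRate (eqs. (6)-(8)).
// The activePower argument is whatever the chosen power model specifies.
public class EnergyPerOperation {
    public static double joules(double fileSizeMB, double avgSeek, double avgRotation,
                                double maxRateMBps, double activePowerW) {
        double transaction = avgSeek + avgRotation + fileSizeMB / maxRateMBps;
        return transaction * activePowerW;
    }
}
```

For instance, with a 1000 MB file, a 0.004 s average seek, a 0.003 s average rotation latency, a 198.0 MB/s rate and a hypothetical 11.27 W active power, the expected energy is (0.007 + 1000/198) × 11.27 J; larger files dominate the total through the transfer term.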