MASTER'S THESIS CloudSimDisk Energy-Aware Storage Simulation in CloudSim Baptiste Louis 2015 Master of Science (120 credits) Computer Science and Engineering Luleå University of Technology Department of Computer Science, Electrical and Space Engineering
MASTER'S THESIS
CloudSimDisk: Energy-Aware Storage Simulation in CloudSim
Baptiste Louis, 2015
Master of Science (120 credits), Computer Science and Engineering
Luleå University of Technology, Department of Computer Science, Electrical and Space Engineering
Luleå University of Technology
Department of Computer Science, Electrical and Space Engineering
PERCCOM Master Program
Master's Thesis in
PERvasive Computing & COMmunications for sustainable development
Baptiste Louis
CLOUDSIMDISK: ENERGY-AWARE STORAGESIMULATION IN CLOUDSIM
2015
Supervisors: Professor Christer Åhlund - Luleå University of Technology
Doctor Karan Mitra - Luleå University of Technology
Doctor Saguna Saguna - Luleå University of Technology
Examiners: Assoc. Professor Karl Andersson - Luleå University of Technology
Professor Eric Rondeau - University of Lorraine
Professor Jari Porras - Lappeenranta University of Technology
This thesis is prepared as part of a European Erasmus Mundus program,
PERCCOM - Pervasive Computing & COMmunications for sustainable development.
This thesis has been accepted by partner institutions of the consortium (cf. UDL-DAJ, no. 1524, 2012 PERCCOM agreement).
Successful defense of this thesis is obligatory for graduation with the following national diplomas:
• Master in Complex Systems Engineering (University of Lorraine);
• Master of Science in Technology (Lappeenranta University of Technology);
• Degree of Master of Science (120 credits) - Major: Computer Science and
Engineering; Specialisation: Pervasive Computing and Communications for
Sustainable Development (Luleå University of Technology).
ABSTRACT
Luleå University of Technology
Department of Computer Science, Electrical and Space Engineering
PERCCOM Master Program
Baptiste Louis
CloudSimDisk: Energy-Aware Storage Simulation in CloudSim
Master's Thesis - 2015.
99 pages, 51 figures, 11 tables, and 4 appendices.
Keywords: Modelling and Simulation, Energy Awareness, CloudSim, Storage, Cloud
Computing.
The Cloud Computing paradigm is continually evolving, and with it, the size and the complexity of its infrastructure. Assessing the performance of a Cloud environment is an essential but strenuous task. Modeling and simulation tools have proved their usefulness and power in dealing with this issue. This master thesis work contributes to the development of the widely used cloud simulator CloudSim and proposes CloudSimDisk, a module for modeling and simulation of energy-aware storage in CloudSim. As a starting point, a review of Cloud simulators has been conducted and hard disk drive technology has been studied in detail. CloudSim has been identified as the most popular and sophisticated discrete event Cloud simulator. Thus, the CloudSimDisk module has been developed as an extension of CloudSim v3.0.3. The source code has been published for the research community. The simulation results proved to be in accordance with the analytic models, and the scalability of the module has been demonstrated for further development.
ACKNOWLEDGMENTS
I would like to express my gratitude to my supervisor Professor Christer Åhlund for the confidence that he has placed in me and for his continuous guidance during this research work. It is my honor to accomplish this master thesis under his supervision.
As well, I would like to thank Doctor Karan Mitra for his support and his valuable knowledge in terms of cloud computing and research work.
Thanks to Doctor Saguna for her advice and her daily dose of joviality.
Thanks to my PERCCOM classmates, especially Rohan Nanda and Khoi Ngo who were with me at Skellefteå.
Thanks to Karl Andersson and Robert Brännström for their presence, their accessibility and their assistance during my thesis work.
Thanks to Rodrigo Calheiros (Melbourne University) for his feedback on my implementation.
Thanks to Eric Rondeau, PERCCOM coordinator, and all the PERCCOM team
Each event stores a unique Tag number corresponding to a specific happening (example: VM_DATACENTER_EVENT, END_OF_SIMULATION) or indicating the type of action to be performed by the recipient (example: VM_CREATE, CLOUDLET_SUBMIT). Hence, the recipient of an event has to implement a method which handles this particular tag number. The execution can be a simple "print out" for the user or a more complex sequence of sub-methods which will eventually generate new events. Finally, an event has a data parameter of type Object used to carry a Cloudlet, datacenter characteristics, or any other information that needs to be transferred with the event.
2.3.4 Future Queue and Deferred Queue
Unlike GridSim [76], the core simulation of the CloudSim framework is a dynamic environment by reason of two event queues: the Future Queue and the Deferred Queue (see Figure 13). All events generated by entities during the runtime are added to the Future Queue and sorted by their time parameter (tX), the "time at which the event should be delivered to its destination entity [for execution]" [43]. As explained in 2.3.3, the execution of an event can result in the creation of new events, with similar or different "Time" parameters. In other words, event generation numbers (EventX) do not determine the order in the Future Queue. Afterwards, the "top of the queue" event in the Future Queue is moved to the Deferred Queue and will be processed at the next clock tick. If the next event in the Future Queue has the same time, it is moved as well, and so on.
Figure 13. Example of queue management during three clock ticks.
In the example diagrammed in Figure 13, Event1 is executed first and generates Event3. Then Event2, having the same Time parameter, is executed too and generates Event4 and Event5. No more events are in the Deferred Queue at this time. Afterwards, Event4, being at the "top of the queue" in the Future Queue, is moved to the Deferred Queue. A new event is in the Deferred Queue, so it is executed during the next tick and it generates Event7. No more events are in the Deferred Queue, so Event5, being at the "top of the queue" in the Future Queue, is moved to the Deferred Queue. Also, Event6 has the same time parameter, so it is moved to the Deferred Queue as well. The next step would be to execute Event5, then Event6, which will eventually generate new events.
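The two-queue mechanism described above can be sketched as follows. This is a simplified, self-contained illustration: the Event class and queue fields are stand-ins, not the actual CloudSim FutureQueue/DeferredQueue implementations.

```java
import java.util.ArrayDeque;
import java.util.PriorityQueue;
import java.util.Queue;

// Simplified sketch of CloudSim's two event queues.
class EventQueues {
    static class Event {
        final int id;
        final double time; // time at which the event should be delivered
        Event(int id, double time) { this.id = id; this.time = time; }
    }

    // Future Queue: ordered by delivery time, not by creation order.
    final PriorityQueue<Event> future =
            new PriorityQueue<>((a, b) -> Double.compare(a.time, b.time));
    // Deferred Queue: events waiting to be executed at the next clock tick.
    final Queue<Event> deferred = new ArrayDeque<>();

    // Move the "top of the queue" event, plus every event sharing the
    // same time parameter, from the Future Queue to the Deferred Queue.
    void tick() {
        if (future.isEmpty()) return;
        double t = future.peek().time;
        while (!future.isEmpty() && future.peek().time == t) {
            deferred.add(future.poll());
        }
    }
}
```

Because the Future Queue orders events by time, an event generated later (with a higher EventX number) can still be delivered before an earlier one, as in the Figure 13 example.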
2.4 Summary
In this chapter, the HDD has been identified as the main technology used in today's data centers, mainly due to its low cost and high areal density. From an energy perspective, cold storage has been the topic of most of the recent research related to energy-efficient storage in cloud environments.
Further, Cloud simulation has been identified as a cost-effective solution to perform experiments in a controllable, stable and repeatable way. Numerous Cloud simulators have been compared. As a consequence of its popularity, its availability and its extensibility, CloudSim has been the choice of this thesis work. The analysis revealed a lack in storage modeling that had not yet been overcome.
Last, a background on CloudSim operation has been presented to prepare, in the next chapter, the introduction of the CloudSimDisk module.
3 CLOUDSIMDISK: ENERGY-AWARE STORAGE
SIMULATION IN CLOUDSIM
This chapter presents CloudSimDisk, a module for energy-aware storage simulation in the CloudSim simulator. The first part explains the objectives of the module and the module requirements. Next, the main concepts of CloudSimDisk are described, such as the HDD model, the HDD power model, the data cloudlet and the data center persistent storage. Then, the execution flow and the package diagram are explained. At last, energy awareness and scalability of CloudSimDisk are discussed.
3.1 Module Requirements
CloudSimDisk has been developed according to different requirements, which provide several advantages to the module and explain the architectural design choices. The main requirement was to respect the architecture and the core processing of CloudSim. In fact, CloudSimDisk is a module for CloudSim, so it has to operate in the same way as CloudSim. Also, similar design choices reduce the learning curve for CloudSim users who want to adopt CloudSimDisk. Further, they encourage participation and contributions to the future development of the module.
Another important requirement is the scalability of the module. HDD technology is complex to model due to the electromechanical nature of the devices. Additionally, the technology is evolving rapidly. Hence, CloudSimDisk has to be developed with the idea that implementing more characteristics and features should be possible later. This capability is a positive argument for the adoption of CloudSimDisk.
An additional requirement was to consider first only the main parameters of the HDD technology, and to provide energy consumption results based on this simplified model. In fact, this work had to be achieved within strict deadlines, so some development decisions such as this one have been taken.
3.2 CloudSimDisk Module
This section introduces the CloudSimDisk module. At first, the Hard Disk Drive (HDD) model and the associated HDD power model are presented. Then, the data cloudlet object, or storage task, is explained in detail. Further, the data center persistent storage is defined.
3.2.1 HDD Model
As explained in Chapter 1, HDDs are still today the most used storage technology in Cloud computing environments. Unfortunately, CloudSim provides only one model of HDD, reused from the GridSim simulator [76], barely scalable and including some mistakes in its algorithm [78]. To overcome this barrier, the CloudSimDisk module implements a new HDD model.
According to [87] [88] [89], the main characteristics affecting the overall HDD performance are the mechanical components, a combination of the read/write head transversal movement and the platter rotational movement. Additionally, the internal data transfer rate, often called sustained rate, has been identified as a bottleneck of the overall data transfer rate of an HDD [90]. More recently, [91] proposed an HDD model based on 23 input parameters which achieves between 91% and 96.5% accuracy. Figure 14 shows a diagram of the model parameters used in their implementation, organized by functional category. Each parameter is described in detail in order of importance: the first parameter is the position time, "the sum of the seek time and the rotational latency", and the second is the transfer time, "the time required to transfer one sector of data to or from the media", namely the Internal Data Transfer Time.
Figure 14. Diagram of model parameters [91].
A new package, namely cloudsimdisk.models.hdd, has been created; it contains classes modeling HDD storage components. Each model implements one method, namely getCharacteristic(int key). In this method, the parameter key is an integer corresponding to a specific characteristic of the HDD. To ensure consistency between different HDD models, all the classes extend one common abstract class, which declares the getCharacteristic(int key) method. Thereby, the parameter key corresponds to the same HDD characteristic in each model.
However, it is not convenient for developers or users to work with key numbers. Hence, the getCharacteristic(int key) method has been declared as protected and cannot be used directly. Instead, the common abstract class implements a getter for each HDD characteristic (getCapacity(), getAvgSeekTime(), etc.). The getCharacteristic(int key) method is used only internally to retrieve the required characteristic. As a result, the methods accessed by users are semantically understandable. Table 3 inventories the available methods declared in HDD models to retrieve HDD characteristics.
Table 3. CloudSimDisk HDD characteristics.

KEY  METHOD                            DESCRIPTION
0    getManufacturerName()             The name of the manufacturer (e.g. Seagate Technology, Toshiba, Western Digital).
1    getModelNumber()                  The unique manufacturer reference (e.g. ST4000DM000).
2    getCapacity()                     The capacity of the HDD in megabytes (MB).
3    getAvgRotationLatency()           The average rotation latency of the disk, defined as half the amount of time it takes for the disk to make one full revolution, in seconds (s); directly dependent on the disk rotation speed in Rotations Per Minute (RPM).
4    getAvgSeekTime()                  The average seek time of the disk, defined as the average time needed to move the read/write head from track x to track y; it also corresponds to one third of the longest possible seek time (moving from the outermost track to the innermost track), assuming a uniform distribution of requests [92].
5    getMaxInternalDataTransferRate()  The maximum internal data transfer rate, defined as the rate at which data is transferred physically from the disk to the internal buffer; also called Sustained Data Rate or Sustained Transfer Rate.
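The getter pattern described above can be sketched as follows. This is a simplified, self-contained illustration: the class names and characteristic values are hypothetical, not an actual model shipped with CloudSimDisk.

```java
// Sketch of the CloudSimDisk pattern: a protected, key-based lookup
// wrapped by semantically named public getters in an abstract class.
abstract class StorageModelSketch {
    // Each concrete model maps the same key to the same kind of characteristic.
    protected abstract Object getCharacteristic(int key);

    public String getManufacturerName() { return (String) getCharacteristic(0); }
    public String getModelNumber()      { return (String) getCharacteristic(1); }
    public double getCapacity()         { return (Double) getCharacteristic(2); }
    public double getAvgSeekTime()      { return (Double) getCharacteristic(4); }
}

// A hypothetical HDD model with illustrative values.
class ExampleHdd extends StorageModelSketch {
    @Override
    protected Object getCharacteristic(int key) {
        switch (key) {
            case 0:  return "ExampleCorp"; // manufacturer name
            case 1:  return "EX4000";      // model number
            case 2:  return 4_000_000.0;   // capacity in MB
            case 4:  return 0.0089;        // average seek time in s
            default: return "n/a";
        }
    }
}
```

Because the key mapping lives in one abstract class, every concrete model answers the same getters consistently, which is the design goal stated above.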
3.2.2 HDD Power Model
For toolkit 3.0, Anton Beloglazov included a power package in CloudSim, based on his publication a year before [72]. This implementation provides the necessary algorithms for modeling and simulation of energy-aware computational resources, i.e. Hosts and Virtual Machines. However, it does not provide energy awareness for the storage component.
Thus, similarly to 3.2.1, the package cloudsimdisk.power.models.hdd has been created in accordance with the power package in place. Inside, the abstract class PowerModelHdd.java implements semantically understandable getters to retrieve the power data of a specific HDD in a particular operating mode. Table 4 inventories the available operating power modes declared in HDD power models.
Table 4. CloudSimDisk HDD power mode.
KEY  MODE    DESCRIPTION
0    Active  The disk is handling a request.
1    Idle    The disk is spinning but there is no activity on it.
3.2.3 Data Cloudlet
As explained in 2.3.1, CloudSim models a request with a Cloudlet component. However, the CloudSim implementation of this component interacts mainly with the Host's CPU hardware element. No examples of interactions with the storage element are provided and no results are printed out. Thus, an extension of the CloudSim Cloudlet is proposed by CloudSimDisk. The default Cloudlet constructor with eight parameters has been reused. Additionally, two new parameters have been defined:
• requiredFiles: a list of filenames that need to be retrieved by the cloudlet. These requested files have to be stored on the persistent storage of the Datacenter before the cloudlet is executed.
• dataFiles: a list of files that need to be stored by the cloudlet. These new files will be added to the persistent storage of the Datacenter during the cloudlet processing.
Note that requiredFiles had already been implemented in CloudSim v3.0.3, but the constructor parameter setting this variable was called fileList. However, this list is not a list of File objects, but a list of Strings corresponding to filenames. To make matters even more confusing, the new parameter dataFiles implemented in CloudSimDisk is a list of Files. Thus, in order to clarify things, the fileList parameter has not been reused by CloudSimDisk. Instead, requiredFiles and dataFiles are parameters of the new Cloudlet's constructor (see Figure 15).
Figure 15. CloudSimDisk Cloudlet constructor.
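The shape of the extended constructor can be sketched as follows. This is a minimal illustration: the eight original CloudSim Cloudlet parameters are elided, and DataFile is a hypothetical stand-in for CloudSim's File class.

```java
import java.util.List;

// Sketch of the CloudSimDisk Cloudlet extension: two new parameters
// alongside the eight reused CloudSim Cloudlet parameters (elided here).
class MyCloudletSketch {
    // Stand-in for CloudSim's File class: a name and a size in MB.
    static class DataFile {
        final String name;
        final int sizeMB;
        DataFile(String name, int sizeMB) { this.name = name; this.sizeMB = sizeMB; }
    }

    final List<String> requiredFiles;  // filenames to retrieve from storage
    final List<DataFile> dataFiles;    // files to add to storage

    MyCloudletSketch(/* ...eight CloudSim Cloudlet parameters..., */
                     List<String> requiredFiles, List<DataFile> dataFiles) {
        this.requiredFiles = requiredFiles;
        this.dataFiles = dataFiles;
    }
}
```

Keeping requiredFiles as names and dataFiles as full file objects mirrors the distinction explained above: retrieval only needs a name, while storing needs the file content and size.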
3.2.4 Data Center Persistent Storage
In CloudSim, one parameter of the data center entity is a list of Storage elements. This list models the data center persistent storage. Unfortunately, CloudSim does not provide any example of how to interact with this component.
CloudSimDisk's aim is to provide a module for storage modeling and simulation in CloudSim. Thus, an extension of the CloudSim data center model has been realized by CloudSimDisk. Methods have been deleted, overridden and created in order to interact only with the data center persistent storage. As a result, the data center model implements all the algorithms necessary to process the requiredFiles and dataFiles of a Cloudlet when one is received.
3.3 Execution Flow
This section diagrams the core execution of CloudSimDisk, including the communication between components, the event passing activity and the execution of the main methods.
As a starting point, CloudSim.startSimulation() starts all the entities and automatically generates the first events of the whole simulation process. At 0.1 second, one of these events calls the method submitCloudlets() of the broker, responsible for sending Cloudlets one by one to the data center. Therefore, for each Cloudlet, one event is scheduled with the data center as destination (see Figures 16 and 17)4. These events have the Tag CLOUDLET_SUBMIT, a scheduling time defined by the distribution chosen by the user, and they contain the Cloudlet as "event-data". Next, the data center calculates the transaction time for each file of the Cloudlet that needs to be added to, or retrieved from, the persistent storage. At the same time, it generates a confirmation event with itself as destination, with the Tag CLOUDLET_FILE_DONE and delayed by the calculated transaction time plus the eventual waiting delay due to the request queue on the target disk.
Figure 16. Event passing sequence diagram for "Basic Example 1", 3 Cloudlets.
Figure 16 presents a simple example where the transaction time of each cloudlet is shorter than the cloudlet arrival time intervals. Hence, there is no waiting delay. Figure 17 presents an example based on a real-world workload (Wikipedia). In this case, the cloudlet arrival rate is higher, so the interval time between two cloudlets is smaller. As a result, cloudlets have to wait in the disk queue before execution.
4 For the sake of simplicity, each Cloudlet contains only one file that needs to be added to the persistent storage, itself composed of only one HDD.
As a reminder, the Transaction Time is the sum of the Rotation Latency, the Seek
Time and the Transfer Time, obtained according to the target HDD characteristics
(see 3.2.1). The waiting delay is the time the request spends in the disk queue,
waiting to be executed. If no requests are in the queue, the waiting time is zero.
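The arithmetic just stated can be written down directly; this is an illustrative helper, not the actual CloudSimDisk method.

```java
// Transaction Time = Rotation Latency + Seek Time + Transfer Time;
// the waiting delay is the time left until the disk's queue empties.
class HddTiming {
    static double transactionTime(double rotationLatency, double seekTime,
                                  double transferTime) {
        return rotationLatency + seekTime + transferTime;
    }

    static double waitingDelay(double activeEndAt, double clock) {
        // If no requests are queued (the disk is already idle), the wait is zero.
        return Math.max(0.0, activeEndAt - clock);
    }
}
```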
Figure 18 presents in detail what happens when the data center receives an event with the Tag CLOUDLET_SUBMIT. It begins by retrieving the Cloudlet object from the data parameter of the event. Then, it retrieves the Cloudlet's dataFiles, a list that contains zero, one or many files (see the first part of Figure 18). Each file is added to the persistent storage according to the chosen algorithm (see 3.2.4). The transaction time for an operation is returned by the HDD, and also stored in the attributes of the File so that this information can be accessed later. After each operation, the method processOperationWithStorage(...) is called to handle the waiting delay of the request, the queue size of the HDD and the operating mode of the HDD. This method required careful design, combining logical, mathematical and engineering considerations. Also, this method generates the CLOUDLET_FILE_DONE events that will be used for the output results.
When all the data files have been handled, the same scenario is performed for the required files of the Cloudlet, except that the list of required files is a list of filenames that need to be retrieved, and so getFile(Filename_n) returns the requested File (see the second part of Figure 18).
Figure 18. Process when the data center receives a CLOUDLET_SUBMIT event.
On the HDD side, adding a file can be decomposed into three phases, chronologically organized (see Figure 19):
• Firstly, the transaction time is determined by retrieving successively the seek time, the rotation latency and the transfer time for the concerned file. All this information depends on the HDD model used.
• Secondly, the list of files, the list of filenames and the space used on the HDD are updated according to the file added. This action is necessary to keep track of the content of each HDD. Also, it facilitates the implementation of file striping.
• Thirdly, the file's attributes "transactionTime" and "ResourceID" are set respectively to the transaction time determined in phase I and the ID of the concerned HDD. Hence, this information can be reused later, for example, to analyze on which HDDs files are added.
After these three phases, the addFile method returns the transaction time. Note that phase II does not exist when getting a file since the files on the HDDs are not changed, so the list of files, the list of filenames and the space used on the HDD are unchanged. Also, the "ResourceID" is not modified in phase III since the file is still stored by the same device. Moreover, the file object needs to be retrieved from the list of files in the HDD before phase I. In order to do that, an iterator is instantiated to run through the list. A while loop compares the filenames of each element in the list until the required file is found. If the name of the required file does not match any file stored on the persistent storage, a null File object is returned by the method getFile(fileName); otherwise the matched file is returned.
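The lookup just described can be sketched as follows; StoredFile is a hypothetical stand-in for CloudSim's File class, and the method is a simplified stand-in for MyHarddriveStorage's getFile.

```java
import java.util.Iterator;
import java.util.List;

// Sketch of getFile(fileName): iterate over the HDD's file list and
// return the matching file, or null if no stored file matches.
class FileLookup {
    static class StoredFile {
        final String name;
        StoredFile(String name) { this.name = name; }
    }

    static StoredFile getFile(List<StoredFile> fileList, String fileName) {
        Iterator<StoredFile> it = fileList.iterator();
        while (it.hasNext()) {
            StoredFile f = it.next();
            if (f.name.equals(fileName)) {
                return f; // required file found
            }
        }
        return null; // no file with this name on the persistent storage
    }
}
```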
CloudSimDisk users can define any type of request arrival distribution as a parameter of the simulation, so the data center has to implement a scalable algorithm that handles the persistent storage in any situation. This feature is provided inside the method processOperationWithStorage().
Now, remember that the datacenter entity has to handle the persistent storage all along the simulation. This includes updating the storage state (idle or active) if needed and keeping a history of the time spent in each mode by each HDD. This task is not trivial since it mixes different times, delays and durations. Moreover, new cloudlets can arrive at any time to add or to retrieve files.
Figure 19. HDD internal process of adding a file.
Figure 20a depicts an example to explain how the state of the persistent storage is managed during a simulation. Additionally, Figure 20b shows the Java code responsible for this process, and Table 5 summarizes the key time values. The example considers three cloudlets, arriving at three different times.
When a cloudlet arrives to interact with the persistent storage, two cases can be identified:
• The target HDD is in Idle mode. In that case5:
  - the waiting time is zero;
  - the active end time is the current time plus the transaction time;
  - the event delay is the transaction time;
  - the storage is set in Active mode.
• The target HDD is in Active mode. In that case6:
  - the waiting time is the active end time minus the current time;
  - the active end time is increased by the transaction time;
  - the event delay is the waiting time plus the transaction time.
Note that in both cases, the transaction time is retrieved from the file's attributes and the total active duration of the target HDD is incremented by this duration. Further, the power needed by the HDD in active mode is retrieved, the energy is computed (based on the transaction time), and the confirmation event is scheduled carrying all the previously established variables.
Table 5. Trace of HDD values related to Figure 20.

Clock()     TransactionTime  WaitingTime  ActiveEnd  EventDelay
At 0.311 s  0.014 s          0.000 s      0.325 s    0.014 s
At 0.321 s  0.008 s          0.004 s      0.333 s    0.012 s
At 0.326 s  0.002 s          0.007 s      0.335 s    0.009 s

5 The HDD is processing nothing.
6 The HDD is already processing one or more request(s).
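The two cases can be condensed into a small state update. The sketch below uses illustrative field names rather than the actual code of Figure 20b, but its arithmetic reproduces the trace of Table 5.

```java
// Sketch of the per-request storage state update at the datacenter level.
class HddStateSketch {
    double activeEndAt = 0.0;         // time at which the disk becomes idle again
    double totalActiveDuration = 0.0; // accumulated active time

    // Returns the event delay: waiting time + transaction time.
    double process(double clock, double transactionTime) {
        double waitingTime;
        if (clock >= activeEndAt) {
            // Idle case: no wait; the disk becomes active now.
            waitingTime = 0.0;
            activeEndAt = clock + transactionTime;
        } else {
            // Active case: the request waits until the disk is free.
            waitingTime = activeEndAt - clock;
            activeEndAt += transactionTime;
        }
        totalActiveDuration += transactionTime;
        return waitingTime + transactionTime;
    }
}
```

Feeding in the three cloudlets of the example (clocks 0.311, 0.321 and 0.326 with transaction times 0.014, 0.008 and 0.002 seconds) yields the waiting times, active end times and event delays listed in Table 5.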
(a) HDD management at the datacenter level.
(b) Code snippet showing the storage's state update when receiving a new Cloudlet.
Figure 20. Example of storage management with one HDD: (a) graphic; (b) code.
3.4 Packages Description
CloudSimDisk is composed of 27 classes organized in 8 packages (see Figure 21), and inherits from 8 classes of CloudSim gathered in 4 different packages. Names follow the same pattern used in CloudSim, so users will be able to use these extensions with a minimal learning curve. The prefix "My" has been added to some class names to avoid confusion between CloudSim and CloudSimDisk. The package architecture has been designed to simplify a prospective integration into CloudSim. The overall class diagram is shown in Figure 22.
Figure 21. CloudSimDisk: 27 classes organized in 8 packages.
The following list describes each package of CloudSimDisk (in alphabetical order):
• cloudsimdisk contains the MyCloudlet and MyDatacenter components, extended from the Cloudlet and Datacenter components of CloudSim, and the MyHarddriveStorage component implementing the Storage interface of CloudSim;
• cloudsimdisk.core contains the CloudSimTags component extended from CloudSim to add the Tag CLOUDLET_FILE_DONE used in the CloudSimDisk extension and to implement a method converting "tag numbers" into "tag text" for logging purposes;
• cloudsimdisk.distributions contains a basic distribution for Example0 and Example1 that reads a simple file, a Wikipedia distribution for the Wikipedia examples that reads Wikipedia trace files, a seek time distribution characterized by a minimum, a maximum and a mean, and a distribution tester to test the algorithms implemented in this package;
• cloudsimdisk.examples contains the three examples (MyExample0, MyExample1 and MyWikipediaExample1) provided with CloudSimDisk, classes gathering constant values used by the examples, a Runner to run the CloudSimDisk examples and a Helper to assist in providing functionality to the Runner;
• cloudsimdisk.models.hdd contains three HDD models from three different manufacturers (HGST, Seagate and Toshiba) and one common abstract class
Similarly to the HDD characteristics, HDD power modes are implemented in a "switch-case" statement that returns a specific power consumption according to a key parameter. This key follows the same pattern for all HDD power models (see Table 4).
To implement a new HDD power mode, only two modifications need to be made:
1. In the target power model, a new case should be added which returns the power value of the new mode.
protected Object getPowerData(int key) {
    switch (key) {
        ...
        case <KEY_NUMBER>:
            return <POWER_VALUE>; // the power value of the new mode
        default:
            return "n/a";
    }
}
2. In the common abstract class PowerModelHdd.java, add one public method with clear semantics which retrieves the new power value.
public double getPowerActive() {
    return (double) getPowerData(1);
}

public <TYPE> getPowerOfYourMode() {
    return (<TYPE>) getPowerData(<KEY_NUMBER>);
}
3.6.3 Randomized Characteristics
In the real world, most of the HDD characteristics are variable, like the seek time or the rotation latency. CloudSimDisk randomizes these values by applying different distributions.
The rotation latency is generated from UniformDistr(0, 2 * avgRotationLatency), which returns values between 0 and two times the average rotation latency in a uniform way.
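A uniform draw over [0, 2 × average) keeps the expected value at the nominal average. Below is a minimal sketch of this idea, using java.util.Random as a stand-in for CloudSim's UniformDistr.

```java
import java.util.Random;

// Sketch of the rotation latency randomization: uniform between 0 and
// twice the average rotation latency, so the mean equals the average.
class RotationLatencySketch {
    final double avgRotationLatency; // in seconds, e.g. 0.00417 s at 7200 RPM
    final Random rng;

    RotationLatencySketch(double avgRotationLatency, long seed) {
        this.avgRotationLatency = avgRotationLatency;
        this.rng = new Random(seed);
    }

    double sample() {
        return rng.nextDouble() * 2 * avgRotationLatency;
    }
}
```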
To apply a new distribution to the seek time, one modification needs to be made in MyHarddriveStorage. When the seek time is set, a number generator is created according to the desired distribution.
public boolean setSeekTime(double avgSeekTime) {
    // previous seek time distribution (to remove)
    // ContinuousDistribution generator = new MySeekTimeDistr(0.0002, 3 * avgSeekTime, avgSeekTime);

    // new seek time distribution (to add)
    ContinuousDistribution generator = new YourPersoDistribution(...);

    this.genSeekTime = generator; // store the generator
    return true;
}
3.6.4 Data Center Persistent Storage Management
In CloudSimDisk, all the transactions are done with the persistent storage of the data center entity. This persistent storage is a pool of HDD elements. When a file needs to be added to this system, the data center needs to choose one HDD in the pool to which to redirect the request. This choice is made according to the algorithm configured by the user before the simulation.
In the MyDatacenter class, the method addFile(File) implements a "switch-case" statement where a key parameter defines which algorithm to apply while managing incoming requests to the persistent storage. At the moment, the basic FIRST-FOUND and ROUND-ROBIN algorithms are implemented: FIRST-FOUND comes from the CloudSim source code and ROUND-ROBIN has been added by the CloudSimDisk team.
Users who want to implement their own algorithm need to add one new case to the "switch-case" statement and write their code there.
public int addFile(File file) {
    int key = 1;
    ...
    switch (key) {
        ...
        case <KEY_NUMBER>:
            // write your own algorithm to manage requests to the persistent storage
            break;
        default:
            System.out.println("ERROR: no algorithm corresponding to this key.");
            break;
    }
    ...
}
3.6.5 Broker Request Arrival Distribution
The broker implemented in CloudSimDisk is responsible for sending Cloudlets one by one to the data center. Thus, MyPowerDatacenterBroker schedules each Cloudlet at a specific time according to a distribution defined by the user.
In this class, the method setDistri(type, source) implements a "switch-case" statement which instantiates a distribution according to the type parameter (see Table 7). The source parameter is used for the wiki and basic distributions as a path to the file containing the arrival time information.
Table 7. Setting the request arrival distribution in MyPowerDatacenterBroker.

TYPE     DISTRIBUTION
expo     Exponential distribution with an average of 60 seconds (arbitrary).
unif     Uniform distribution between 0 and 10 seconds (arbitrary).
basic    Reads the arrival times from a file, for the Basic Example.
wiki     Reads the arrival times from a Wikipedia trace file.
default  Uniform distribution between 1 and 1.001 seconds.
To apply a new request arrival distribution, one new case needs to be added to the "switch-case" statement.
public void setDistri(String type, String source) {
    switch (type) {
        ...
        case "<TYPE>":
            distri = new YourOwnDistribution(...);
            break;
        default:
            distri = new UniformDistr(1, 1.0001); // arbitrary parameters
            break;
    }
}
3.7 Summary
This chapter presented the CloudSimDisk module for modeling and simulation of storage in CloudSim. The HDD model and power model were presented, along with the execution flow and a description of the package diagram. Furthermore, energy awareness and scalability have been demonstrated. The following chapter aims to validate this module by presenting experimental results produced with CloudSimDisk.
4 RESULTS
This chapter presents simulation results obtained with the CloudSimDisk module. At first, the input parameters and simulation outputs are described. As a central part, several experiments are presented to demonstrate the core processing of the module, including the incoming request distribution, the HDD characteristic variations and the step-by-step evolution of the simulation. Additional results on energy consumption and disk array management are discussed as well.
4.1 Inputs and Outputs
In this section, the input parameters of the simulator and the simulation outputs
are listed and described one by one.
4.1.1 Input Parameters
CloudSimDisk simulations are based on 10 input parameters, presented in Table 8. The request arrival times on the data center persistent storage can be retrieved from a defined distribution (uniform, exponential) or from a file in which each line corresponds to a time7 (basic, wikipedia). Thus, the requestArrivalRateType parameter informs the data center broker about the request arrival distribution to create (expo, unif, basic or wiki). If the distribution needs to be read from a file, the path of this file should be defined in the requestArrivalTimesSource parameter.
When it comes to creating requests, one data file and one required file can be assigned to each Cloudlet8. So a Cloudlet contains zero-to-one file to add and zero-to-one file to retrieve. If some files need to be retrieved, they have to be added to the persistent storage before the simulation starts. The startingFilesList parameter is used for this purpose. The number of disks in the persistent storage should be at least one. The default algorithm to manage the persistent storage disks at the data center level is Round-Robin (the first file is added on the first drive, the second file on the second drive, and so on). This algorithm can be changed in MyDatacenter.java. The pool of disks is uniform, that is to say it is based on one unique HDD model. The power model chosen should be in accordance with the HDD model.
7 Examples are provided in the files folder of the CloudSimDisk project.
8 Technically, a Cloudlet can have more than one data file and required file.
Table 8. Input parameters for CloudSimDisk simulations.

No. | NAME | DESCRIPTION
1 | nameOfTheSimulation | The name of the simulation.
2 | requestArrivalRateType | The type of distribution (e.g. unif, expo, wiki).
3 | requestArrivalTimesSource | The path of the source file containing the arrival times of requests.
4 | numberOfRequest | The number of requests (Cloudlets) to create.
5 | requiredFiles | The path of the file containing the list of filenames required.
6 | dataFiles | The path of the file containing the list of filenames and file sizes that need to be stored during the simulation.
7 | startingFilesList | The path of the file containing the list of filenames and file sizes that need to be stored before the simulation starts.
8 | numberOfDisk | The number of Hard Disk Drives (HDDs) in the persistent storage of the datacenter.
9 | hddModel | The model of HDD for the whole persistent storage.
10 | hddPowerModel | The power model of HDD in the persistent storage.
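The ten parameters of Table 8 can be pictured as a single configuration object. The sketch below is purely illustrative: the class name and field grouping are assumptions for this thesis text, not part of the actual CloudSimDisk API.

```java
// Illustrative holder for the ten CloudSimDisk input parameters of Table 8.
// The class name and fields are hypothetical, not the CloudSimDisk API.
public class SimulationConfig {
    String nameOfTheSimulation;
    String requestArrivalRateType;    // "unif", "expo", "basic" or "wiki"
    String requestArrivalTimesSource; // path to the arrival-times file, if any
    int numberOfRequest;              // number of Cloudlets to create
    String requiredFiles;             // path to the list of filenames to retrieve
    String dataFiles;                 // path to the list of filenames/sizes to add
    String startingFilesList;         // files stored before the simulation starts
    int numberOfDisk;                 // at least one
    String hddModel;                  // one unique model for the whole pool
    String hddPowerModel;             // must match the HDD model
}
```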
4.1.2 Simulation Outputs
The outputs of a simulation are displayed in three different formats, each of them
containing different information for different purposes. First, the IDE console shows
the step-by-step evolution of the simulation and a summary of the results (see Figure
24). This output is erased by the next simulation. Also, the console length is often
limited, so the output may be truncated. Second, a log file is created in the logs folder
of the CloudSimDisk project. Its purpose is mainly to understand or debug the core
simulation. It contains a trace of all the events exchanged during the simulation,
the status of the persistent storage at the beginning of the simulation, the detailed
transaction time of each operation, the evolution of the length of each HDD queue
and the operating mode of each HDD in real time. Third, an Excel spreadsheet (see
Table 9) is created with all the information related to each Cloudlet operation⁹. This
Excel file includes prebuilt graphs about the request arrival distribution, the seek
time and rotation latency distributions, the energy consumption, the transaction
times and the request waiting times.

⁹ The spreadsheet can only store information about one operation per Cloudlet.
Figure 24. Example of console information output.
Table 9. Excel values output.

COLUMN | NAME | DESCRIPTION
A | CloudletID | The ID of each Cloudlet. All the results are related to a specific Cloudlet, unique by this ID.
B | Arrival Time (s) | The time in seconds at which the Cloudlet arrives.
C | Waiting Time (s) | The time in seconds the Cloudlet waited in the waiting queue of the target disk.
D | Transaction Time (s) | The time in seconds of the transaction, sum of the seek time, the rotation latency and the transfer time.
E | Seek Time (s) | The time in seconds to move the read/write arm to the required track. This seek time is related to one particular transaction.
F | Rotation Latency (s) | The time in seconds to rotate the disk and bring the required sector under the read/write head. This rotation latency is related to one particular transaction.
G | Transfer Time (s) | The time in seconds to transfer a file between the internal controller and the surface of the magnetic disk. This transfer time is related to one particular transaction.
H | Done Time (s) | The time in seconds at which the transaction is finished.
I | Filename | The name of the file subject to the transaction.
J | File size (MB) | The size of the file subject to the transaction.
K | Action | Denotes whether the file has been added or retrieved.
L | HDD name | The name of the target HDD.
M | Energy Consumption (J) | The energy consumed by the target disk to do the transaction. This output is available only with the power-aware component of CloudSimDisk.
4.2 Results
In this section, different results are presented to demonstrate the core processing of
CloudSimDisk. The method used aims to analyze the output results and to compare
them with the input configurations and the expected results from the analytical
models. In this way, the incoming request arrival distribution, the seek time, the
rotation latency and the data transfer time are analyzed. Additionally, the real-time
console output is compared with the expected sequential processing proposed by
CloudSimDisk. Further, results on energy consumption and disk array management
are discussed as well.
4.2.1 Request Arrival Distribution
The request arrival rate is defined by the input parameters "requestArrivalRateType"
and "requestArrivalTimesSource" (see Table 8). In the output Excel spreadsheet
produced by CloudSimDisk, the request arrival distribution is plotted to validate
the user input parameters. As an example, some Wikipedia workload traces [93] have
been used as input request arrival rate. Figure 25 shows an example of Wikipedia
workload drawn by CloudSimDisk. A uniform distribution can be observed with an
average of 3060 requests per second (153 requests every 0.05 second).
Figure 25. Wikipedia workload distribution.
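The rate quoted above is a simple unit conversion: 153 requests per 0.05-second bin is 3060 requests per second. A one-line sketch of that conversion:

```java
// Converts a per-bin request count into a requests-per-second rate,
// e.g. 153 requests every 0.05 s corresponds to 3060 requests/s.
public class ArrivalRate {
    public static double requestsPerSecond(int requestsPerBin, double binSeconds) {
        return requestsPerBin / binSeconds;
    }
}
```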
4.2.2 Sequential Processing
As presented in Chapter 3.3, Figure 20a, each Cloudlet (request) sent to an HDD is
executed according to a FIFO queue (First In, First Out). Thus, if one request is
received while another is executing, this request will have to wait
in the "waiting queue" before its execution.
For the sake of clarity, the following example is obtained using the simple workload
distribution shown by Figure 26.
Figure 26. Simple example distribution.
Figures 28 to 35 present chronologically the console output of the example depicted
by Figure 27. The persistent storage is composed of 1 HDD.
Figure 27. Sequential processing of requests illustrated.
First, the simulation is initialized, entities are started and components are created
(see Figure 28).
Figure 28. Sequential processing of requests part 1.1.
Then, Cloudlets are scheduled by the Broker entity (see Figure 29). In this example,
Cloudlets #1, #2 and #3 are scheduled respectively at 0.311, 0.321 and 0.356
second. Thus, they are expected to be executed in that order.
Figure 29. Sequential processing of requests part 1.2.
During the second part of the simulation, the datacenter entity starts receiving
Cloudlets (see Figure 30) as scheduled during the first part by the Broker entity
(see Figure 29). Cloudlet #1 arrives first and starts to be executed.
Figure 30. Sequential processing of requests part 2.1.
Cloudlet #2 arrived at 0.321000 second (see Figure 31), during the execution of
Cloudlet #1. Thus, Cloudlet #2 has to wait in the "waiting queue" of the disk
until Cloudlet #1 is done. Cloudlet #1 is completed at 0.333513 second, with a
transaction time of 0.022513 second and a null waiting time because the Cloudlet
was the first to be executed.
Figure 31. Sequential processing of requests part 2.2.
Cloudlet #3 arrived at 0.356000 second (see Figure 32), during the execution of
Cloudlet #2. Thus, Cloudlet #3 has to wait in the "waiting queue" of the disk
until Cloudlet #2 is done. Cloudlet #2 is completed at 0.374022 second, with a
transaction time of 0.040509 second and a waiting time of 0.012513 second in the
disk queue because of Cloudlet #1.
Figure 32. Sequential processing of requests part 2.3.
Cloudlet #3 is completed at 0.416238 second, with a transaction time of 0.042216
second and a waiting time of 0.018022 second in the disk queue because of Cloudlet
#2.
Figure 33. Sequential processing of requests part 2.4.
Since all the requests are now completed, the simulation termination is executed
(see Figure 34). All the entities are shut down, the CloudSim variables are reset and
the simulation is completed.
Figure 34. Sequential processing of requests part 3.1.
The final result is printed out (see Figure 35). The disk has been in Idle mode
between the beginning of the simulation and the reception of the first Cloudlet,
that is to say 0.311 second. Then the disk has been in Active mode until the end
of the simulation. This total active time (0.105 second) is actually the sum of the
transaction times of each Cloudlet (0.022513 + 0.040509 + 0.042216) rounded to the
nearest one-thousandth of a second.
The maximum "waiting queue" length of hdd1 is 1 request, reached both when Cloudlet
#2 arrived during the execution of Cloudlet #1 and when Cloudlet #3 arrived during
the execution of Cloudlet #2.
The total energy consumption of the persistent storage for this simulation is 3.332
joules. More explanations are provided in Section 4.2.7.
Figure 35. Sequential processing of requests part 3.2.
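The queueing arithmetic of the example above can be reproduced with a minimal sketch of a FIFO disk queue. This is an illustration of the scheduling rule only, not CloudSimDisk's actual implementation: a request starts when both the disk is free and the request has arrived, and its waiting time is the gap between arrival and start.

```java
// Minimal FIFO disk-queue sketch: given request arrival times (sorted) and
// per-request transaction times, computes each request's waiting time and
// completion ("done") time, as in the walkthrough above.
public class FifoDisk {
    // Returns {waitingTime, doneTime} for each request.
    public static double[][] process(double[] arrival, double[] transaction) {
        double[][] out = new double[arrival.length][2];
        double diskFreeAt = 0.0;
        for (int i = 0; i < arrival.length; i++) {
            double start = Math.max(arrival[i], diskFreeAt); // wait if disk busy
            out[i][0] = start - arrival[i];                  // waiting time
            diskFreeAt = start + transaction[i];             // disk busy until done
            out[i][1] = diskFreeAt;                          // done time
        }
        return out;
    }
}
```

Applied to the arrivals 0.311, 0.321 and 0.356 second with the transaction times 0.022513, 0.040509 and 0.042216 second, this rule reproduces the waiting times (0, 0.012513, 0.018022) and completion times (0.333513, 0.374022, 0.416238) reported by the console output.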
4.2.3 Seek Time Randomness
The seek time is the time to move the read/write head from an initial track x to the
required track y. Thus, this time depends on the distance between these two
tracks: the longer the distance to cover, the longer the seek time. However,
the seek time is not linear with the distance to travel because of the acceleration and
deceleration periods of the actuator arm of the disk.
CloudSimDisk randomizes the seek time according to the average seek time parameter
of the target disk. The seek time distribution MySeekTimeDistr returns random
values between a minimum min and a maximum max, with an average mean. According
to [92], CloudSimDisk distributes the seek time between 0 and 3 times the average
seek time, in seconds.
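A distribution on [0, 3a] whose mean equals a cannot be uniform (a uniform draw on [0, 3a] would average 1.5a). One way to satisfy the stated constraints is the piecewise-uniform sampler below: with probability 2/3 draw from U(0, a) and otherwise from U(a, 3a), giving a mean of (2/3)(a/2) + (1/3)(2a) = a. This is an illustrative sketch only, not the actual MySeekTimeDistr implementation.

```java
import java.util.Random;

// Illustrative seek-time sampler on [0, 3a] with mean a, where a is the
// average seek time in seconds. Not CloudSimDisk's MySeekTimeDistr.
public class SeekTimeSampler {
    private final Random rng;
    private final double avg;

    public SeekTimeSampler(double averageSeekTime, long seed) {
        this.avg = averageSeekTime;
        this.rng = new Random(seed);
    }

    public double sample() {
        if (rng.nextDouble() < 2.0 / 3.0) {
            return rng.nextDouble() * avg;         // U(0, a)
        }
        return avg + rng.nextDouble() * 2.0 * avg; // U(a, 3a)
    }
}
```

For a = 0.004 second (the HGST Ultrastar average seek time used below), samples stay within [0, 0.012] and their mean converges to 0.004, matching the bounds and average reported by the simulation.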
Figure 36 presents the seek time distribution obtained with CloudSimDisk for 5000
Cloudlets, based on the HGST Ultrastar (REF: HUC109090CSS600) HDD model
(AverageSeekTime: 0.004 second).
Figure 36. Seek Time distribution.
The average seek time obtained by simulation is 0.0040 second. The minimum seek
time is 0.0000 s and the maximum is 0.0120 s. These values are in accordance with
the previously described model.
4.2.4 Rotation Latency Randomness
The rotation latency is the time to rotate the platter of the disk and bring the
required sector under the read/write head. This time depends on the rotational
speed of the disk, measured in Rotations Per Minute (RPM). Also, the average
rotational latency of a disk is half the time needed for a full rotation. The minimal
rotation latency is null and corresponds to the case where the required sector is
already under the read/write head.
CloudSimDisk randomizes the rotation latency according to the average rotation
latency parameter of the target disk. Thus, a uniform distribution is used with a
minimum of 0 and a maximum of 2 times the average rotation latency, in seconds.
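The rotation latency model above is a plain uniform draw on [0, 2a], whose mean is exactly the average latency a. A minimal sketch (the class name is an assumption, not CloudSimDisk's own):

```java
import java.util.Random;

// Rotation latency as described above: uniform on [0, 2a], where a is the
// average rotation latency in seconds, so the mean of the draw equals a.
public class RotationLatencySampler {
    private final Random rng;
    private final double avg;

    public RotationLatencySampler(double averageRotationLatency, long seed) {
        this.avg = averageRotationLatency;
        this.rng = new Random(seed);
    }

    public double sample() {
        return rng.nextDouble() * 2.0 * avg; // U(0, 2a), mean a
    }
}
```

For a = 0.003 second, samples stay within [0, 0.006] and their mean converges to 0.003, matching the simulated distribution described below.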
Figure 37 presents the rotation latency distribution obtained with CloudSimDisk for
5000 Cloudlets, based on HGST Ultrastar (REF: HUC109090CSS600) HDD model
(AverageRotationLatency: 0.003 second).
Figure 37. Rotation Latency distribution.
The average rotation latency obtained by simulation is 0.0030 second. The minimum
rotation latency is 0.0000 s and the maximum is 0.0060 s. These values are in
accordance with the previously described model.
4.2.5 Data Transfer Time Variation
As explained in Section 3.2.1, the internal data transfer rate (or sustained rate)
has been identified as a bottleneck of the overall data transfer rate of an HDD.
This parameter is very difficult to model because of its dependency on several
factors, including Zoned Bit Recording variances, cache effects and file system
fragmentation [90].
The current data transfer rate model of CloudSimDisk does not take into account
these parameters and considers the outermost zone of tracks on the platter, where
the internal data transfer rate is maximal [94].
Thus, the data transfer time varies only according to the size of the file to transfer.
Figure 38 presents the transfer times obtained with CloudSimDisk for 500 Cloudlets,
based on the HGST Ultrastar (REF: HUC109090CSS600) HDD model (MaxInternalDataTransferRate:
198.0 MB/second) and considering file sizes between 1 MB and
10 MB.
Figure 38. Transfer times and �le sizes.
The result shows that the relation between transfer time and file size obtained with
CloudSimDisk is linear (fitted line y = 198x + 1E-12, with the file size in MB plotted
against the transfer time in seconds). The intercept 1E-12 can be
approximated to 0. The slope of the line, 198, corresponds to the maximum internal
data transfer rate of the HDD model used for the simulation. This value is in
accordance with the previously described model.
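Under this model the transfer time is simply the file size divided by the maximum internal data transfer rate, so it is linear in the file size. A one-method sketch:

```java
// Transfer time under the model above: file size divided by the maximum
// internal data transfer rate (outermost-zone rate), e.g. 198.0 MB/s for
// the HGST Ultrastar HUC109090CSS600.
public class TransferTime {
    public static double seconds(double fileSizeMB, double maxRateMBps) {
        return fileSizeMB / maxRateMBps;
    }
}
```

Doubling the file size doubles the transfer time, which is the linearity visible in Figure 38.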
4.2.6 Seek Time, Rotation Latency and Data Transfer Time Compared with Energy Consumption per Transaction
In the Excel spreadsheet output produced by CloudSimDisk, the energy consumption
for each transaction is calculated (see Section 4.2.7, equation (6)). This section
compares the variations of the seek time, the rotation latency and the transfer time
with the variation of the energy consumption, for each transaction.
The following scenario considers a Wikipedia workload with 50 Cloudlets, each of
them adding a file to the persistent storage. The model of HDD used is a Seagate
Enterprise (REF: ST6000VN0001) (ActivePower: 11.27 Watts). The file sizes vary
between 1 MB and 10 MB.
The results, presented by Figure 40, show that the data transfer time and the energy
vary in a similar way. This indicates that the data transfer time has a major impact
on the energy consumption per transaction.
Figure 39 shows the breakdown of each transaction time for this scenario. For 86% of the
Cloudlets (43 Cloudlets out of 50), the transfer time is the most significant part of
the total transaction time. This result is in accordance with the previous statement
relating data transfer time and energy consumption per transaction (see Figure
40c).
Figure 39. The transaction time: sum of the seek time, the rotation latency and the transfer time.
(a) Seek time compared with energy consumption per transaction.
(b) Rotation latency compared with energy consumption per transaction.
(c) Data transfer time compared with energy consumption per transaction.
Figure 40. Energy consumption per transaction compared with: (a) Seek Time; (b) Rotation Latency; (c) Data Transfer Time.
4.2.7 Persistent Storage Energy Consumption
The final result of a CloudSimDisk execution is the energy consumed by the storage
system during one simulation (see Figure 41).
Figure 41. "MyExampleWikipedia1", 5000 requests - Final result.
The energy consumed by the persistent storage of the data center is noted E_{persistentStorage}.
It is obtained as the sum of all E_{hdd_i}, the energy consumed by the i-th HDD out
of n in the persistent storage (see equation (1)).

E_{persistentStorage} = \sum_{i=1}^{n} E_{hdd_i}    (1)
Then, the energy consumed by the i-th HDD, E_{hdd_i}, is the sum of the energy consumed
by this HDD in Idle mode, E_{hdd_i,idle}, and in Active mode, E_{hdd_i,active} (see
equation (2)).

E_{hdd_i} = E_{hdd_i,idle} + E_{hdd_i,active}    (2)
The energy consumed in Idle mode for a specific HDD, E_{hdd_i,idle}, is the sum of all
E_{hdd_i,idle_j}, the energy consumed by the i-th HDD during the j-th idle interval out
of m (see equation (3)).

E_{hdd_i,idle} = \sum_{j=1}^{m} E_{hdd_i,idle_j}    (3)
The energy consumed in Active mode for a specific HDD, E_{hdd_i,active}, is the sum of
all E_{hdd_i,active_j}, the energy consumed by the i-th HDD during the j-th operation
out of m (see equation (4)).

E_{hdd_i,active} = \sum_{j=1}^{m} E_{hdd_i,active_j}    (4)
The energy consumed by a specific HDD during a specific idle interval, E_{hdd_i,idle_j},
is the interval duration t_{idle_j} multiplied by the power required by the HDD in Idle
mode, P_{hdd_i,idle} (see equation (5)).

E_{hdd_i,idle_j} = t_{idle_j} \times P_{hdd_i,idle}    (5)
The energy consumed by a specific HDD during a specific operation, E_{hdd_i,active_j},
is the operation duration t_{hdd_i,operation_j} multiplied by the power required by the
HDD in Active mode, P_{hdd_i,active} (see equation (6)).

E_{hdd_i,active_j} = t_{hdd_i,operation_j} \times P_{hdd_i,active}    (6)
The time of one specific operation, t_{hdd_i,operation_j}, is called the transaction time (see
Figure 39). It is the sum of the seek time t_{seekTime}, the rotation latency t_{rotationLatency}
and the transfer time t_{transferTime} (see equation (7)).

t_{hdd_i,operation_j} = t_{transactionTime} = t_{seekTime} + t_{rotationLatency} + t_{transferTime}    (7)
This analytical model has been implemented in CloudSimDisk. To validate the
implementation, several scenarios have been executed with the simulator, and the total
energy consumed by the persistent storage has been compared with the manually
calculated results. Both proved to be similar.
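The energy model of equations (1) to (7) can be sketched directly in code: per-HDD energy is idle power times the total idle time plus active power times the total transaction time, summed over all disks. The power values used in the usage note below are placeholders, not datasheet figures.

```java
// Sketch of the storage energy model of equations (1)-(7).
public class StorageEnergy {
    // Energy of one HDD: idle intervals at idle power plus transaction
    // (active) times at active power -- equations (2)-(6).
    public static double hddEnergy(double[] idleIntervals, double[] transactionTimes,
                                   double idlePower, double activePower) {
        double e = 0.0;
        for (double t : idleIntervals) e += t * idlePower;      // eq. (3), (5)
        for (double t : transactionTimes) e += t * activePower; // eq. (4), (6)
        return e;                                               // eq. (2)
    }

    // Energy of the whole persistent storage: sum over all HDDs -- eq. (1).
    public static double persistentStorageEnergy(double[][] idle, double[][] active,
                                                 double idlePower, double activePower) {
        double e = 0.0;
        for (int i = 0; i < idle.length; i++)
            e += hddEnergy(idle[i], active[i], idlePower, activePower);
        return e;
    }
}
```

For example, a single disk idle for 0.311 second and active for the three transaction times of Section 4.2.2, with hypothetical powers of 5 W idle and 10 W active, consumes 0.311 × 5 + 0.105238 × 10 ≈ 2.607 J.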
4.2.8 Energy Consumption and File Sizes
An operation on the persistent storage consists of adding/retrieving a file on/from
an HDD. A file has a name and a size. While the name is used to identify each file,
the size has an impact on the energy consumed by the storage. Indeed, in equation
(7), the transfer time t_{transferTime} can be expressed as the size of the file processed
f_{size} divided by the data transfer rate R of the target HDD (see equation (8)).

t_{transferTime} = f_{size} / R    (8)
Figure 42 shows the energy consumption per operation depending upon the size of
the file processed by this operation. The scenario consists of adding 100 files with
the same size to the persistent storage. The HDD model used is an HGST Western
Digital 900 GB, with an average seek time of 0.004 second, an average rotation
latency of 0.003 second and a maximum internal data transfer rate of 198.0 MB/s
(REF: HUC109090CSS600). The scenario has been repeated four times with file sizes
of 1, 10, 100, and 1000 megabytes.
To verify the validity of the simulations, an analytic result has been calculated from
the HDD characteristics and according to equation (9).
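An analytic expectation of this kind can be sketched as follows, assuming the expected transaction time is the average seek time plus the average rotation latency plus the transfer time of equation (8), multiplied by the active power as in equation (6). Equation (9) is not reproduced in this excerpt, so this combination of (6)-(8) is an assumption; the active power value in the usage note is a placeholder taken from the Seagate example of Section 4.2.6.

```java
// Expected per-operation energy, assuming transaction time =
// averageSeek + averageRotation + fileSize / maxRate (eqs. (6)-(8)).
// The activePower argument is whatever the chosen power model specifies.
public class EnergyPerOperation {
    public static double joules(double fileSizeMB, double avgSeek, double avgRotation,
                                double maxRateMBps, double activePowerW) {
        double transaction = avgSeek + avgRotation + fileSizeMB / maxRateMBps;
        return transaction * activePowerW;
    }
}
```

For instance, with a 1000 MB file, a 0.004 s average seek, a 0.003 s average rotation latency, a 198.0 MB/s rate and a hypothetical 11.27 W active power, the expected energy is (0.007 + 1000/198) × 11.27 J; larger files dominate the total through the transfer term.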