Evolution Towards Cloud: Overview of Next Generation Computing Architecture

by Monowar Hasan & Sabbir Ahmed

A Thesis submitted to the Department of Computer Science and Engineering in partial fulfillment of the requirements for the degree of Bachelor of Science (B.Sc.) in the Department of Computer Science and Engineering

Bangladesh University of Engineering and Technology
22 March 2012
Dhaka, Bangladesh
Abstract

Nowadays Cloud Computing has become a buzzword in distributed processing. Cloud Computing originates from the ideas of concurrent processing in Computer Clusters, and it enhances the established architecture and standards of the Grid, another technology for parallel processing, with the ideas of Utility Computing and Service-oriented Computing. Cloud Computing in effect provides a business model in the form of X-as-a-Service, where X may be hardware, software, a development platform, or some storage medium. End-users can consume any of these services from providers on a pay-as-you-go basis without knowing the details of the underlying architecture. Hence, the Cloud provides layers of abstraction to end-users and gives end-users, developers, and providers scope to adapt applications to demand.
Acknowledgements

We are grateful to several people for this thesis, without whom it would not have been a successful one. Our heartfelt thanks to our supervisor, Professor Dr. Md. Humayun Kabir, for his support and valuable guidance. His continuous feedback and assistance helped us to clarify our ideas and understanding of the topics.

Special thanks to Professor Dr. Hanan Lutfiyya of the University of Western Ontario, Canada, and Professor Dr. Ivona Brandic of the Vienna University of Technology, Vienna, Austria, for providing their research publications, which helped the progress of our thesis.

The Department of Computer Science and Engineering, Bangladesh University of Engineering and Technology, provided us with a sound working environment and helped us to access on-line publications.

Last but not least, we acknowledge the contribution and support of our family members for being with us and encouraging us all the way. Without their sacrifice this thesis would not have been successful.
Table of Contents

Abstract
Acknowledgements
Table of Contents
List of Tables
List of Figures

1 Computing with Distributed Units: Computer Clusters
  1.1 Distributed Systems
    1.1.1 Centralized vs Distributed Systems
    1.1.2 Advantages of Distributed Systems
    1.1.3 Issues and Challenges in Distributed Systems
  1.2 Computer Clusters
  1.3 Architecture of Computer Clusters
  1.4 Cluster Interconnection
  1.5 Protocols for Cluster Communication
    1.5.1 Internet Protocols
    1.5.2 Low-latency Protocols
      1.5.2.1 Active Messages
      1.5.2.2 Fast Messages
      1.5.2.3 VMMC
      1.5.2.4 U-net
      1.5.2.5 BIP
    1.5.3 Standards for Cluster Communication
      1.5.3.1 VIA
      1.5.3.2 InfiniBand
  1.6 Single System Image (SSI)
  1.7 Cluster Middleware
    1.7.1 Message-based Middleware
    1.7.2 RPC-based Middleware
    1.7.3 Object Request Broker
  1.8 Concluding Remarks

2 Grid Computing: An Introduction
  2.1 Grid Computing: definitions and overview
    2.1.1 Virtualization and Grid
    2.1.2 Grids over Cluster Computing
  2.2 An example of a Grid Computing environment
  2.3 Grid Architecture
    2.3.1 Fabric Layer: Interfaces to Local Resources
    2.3.2 Connectivity Layer: Managing Communications
    2.3.3 Resource Layer: Sharing of a Single Resource
    2.3.4 Collective Layer: Co-ordination with Multiple Resources
    2.3.5 Application Layer: User-defined Grid Applications
  2.4 Grid Computing with Globus
  2.5 Resource Management in Grid Computing
    2.5.1 Resource Specification Language
    2.5.2 Globus Resource Allocation Manager (GRAM)
  2.6 Evolution towards Cloud Computing from Grid
  2.7 Concluding remarks

3 An overview of Cloud Architecture
  3.1 Cloud Components
  3.2 Cloud Architectures
    3.2.1 A layered model of Cloud architecture - Cloud ontology
    3.2.2 Cloud Business Model
    3.2.3 Cloud Deployment Model
  3.3 Cloud Services
    3.3.1 Infrastructure as a Service (IaaS)
    3.3.2 Platform as a Service (PaaS)
    3.3.3 Software as a Service (SaaS)
  3.4 Virtualization on Cloud
  3.5 Example of a Cloud Implementation
  3.6 Conclusion

4 Grid and Cloud Computing Comparisons: Similarities & Differences
  4.1 Major Focus
  4.2 Points of Consideration
    4.2.1 Business Model
    4.2.2 Scalability issues
    4.2.3 Multitasking and Availability
    4.2.4 Resource Management
    4.2.5 Application Model
    4.2.6 Other issues
  4.3 Case Study
    4.3.1 Comparative results
  4.4 Concluding remarks

5 Conclusion and Future works
List of Tables

3.1 Example of existing Cloud Systems w.r.t. classification into layers of Cloud Ontology
3.2 CPU utilization in Full Virtualization and Paravirtualization
4.1 Comparative analysis
List of Figures

1.1 Eras of Computing
1.2 Distributed computing
1.3 Architecture of Cluster Computing
1.4 Categories of Cluster Interconnection Hardware
1.5 Traditional Protocol Overhead and Transmission Time
1.6 The InfiniBand Architecture
2.1 Evolution of Grid Computing
2.2 Resource availability according to demand
2.3 Serving job requests in traditional environment
2.4 Serving job requests in traditional environment
2.5 Google search architecture
2.6 Grid Protocol Architecture
2.7 Collective and Resource layer protocols are combined in various ways to provide application functionality
2.8 Programmer's view of Grid Architecture. Dotted lines denote protocol interactions; solid lines represent a direct call
2.9 A resource management architecture for Grid Computing environment
2.10 Globus GRAM Architecture
2.11 Enhancement of generic Grid architecture to Service Oriented Grid
3.1 Components of a Cloud Computing Solution
3.2 Hierarchical abstraction layers of Cluster, Grid and Cloud Computing
3.3 Cloud layered architecture: consists of five layers; figure represents inter-dependency between layers
3.4 Non-cloud environment needs three servers but in the Cloud, two servers are used
3.5 Cloud computing Business model
3.6 External or Public Cloud
3.7 Internal or Private Cloud
3.8 Example of Hybrid Cloud
3.9 Correlation between Cloud Architecture and Cloud Services
3.10 Infrastructure as a Service
3.11 Platform as a Service
3.12 Software as a Service
3.13 A fully virtualized deployment where the operating platform running on servers is displayed
3.14 A Paravirtualized deployment where many OS can run simultaneously
4.1 Motivation of Grid and Cloud
4.2 Comparison regarding performance, reliability and cost
Chapter 1

Computing with Distributed Units: Computer Clusters

The computing industry is one of the fastest growing industries, and its history dates back to 1943. Computers [1, 2] built between 1943 and 1959 are usually regarded as first generation computers. They were based on valves and wire circuits, and are [3] characterized by the use of punched cards and vacuum valves. All programming was done in machine code.
The Second Generation computers were built between 1959 and 1964. They were based on transistors and printed circuits, so they were much smaller. These computers were also more powerful and accepted English-like commands, which made them much more flexible in their applications.

Computers built between 1964 and 1972 are often regarded as Third Generation computers. They were based on the first integrated circuits, creating even smaller machines.
Computers built after 1972 are often called fourth generation computers. These computers were based on LSI (Large Scale Integration) of circuits such as microprocessors, with typically 500 or more components on a chip. Later developments include VLSI (Very Large Scale Integration), with typically 10,000 components.
The fifth generation computers are based on parallel processing and VLSI integration, and are still being developed. Recent advances in VLSI (Very Large Scale Integration) technology have played a major role in the development of powerful sequential and parallel computers. Software technology is developing fast as well: mature software such as operating systems, programming languages, development methodologies, and tools are now available. This enables the development and deployment of applications for scientific, engineering, and commercial needs. In addition, several challenging applications, such as weather forecasting and earthquake analysis, have become the main driving force behind the development of powerful parallel computers.
So we can view computing as two prominent eras:

- Sequential Computing Era
- Parallel Computing Era

A graphical view of the changes in computing eras is shown in Figure 1.1. Each computing era started with the hardware architectures of the system, followed by system software (especially operating systems and compilers), then applications, and finally, as the era matured, Problem Solving Environments. Each component of a computing era had to pass through three phases: R&D (Research and Development), commercialization, and commodity. The technology for the components of the parallel era is not yet as mature as that of the sequential era.
There are several reasons for using parallel computers. Some of them are:

Figure 1.1: Eras of Computing
- Parallelism is one of the best ways to overcome the speed bottleneck of a single processor.
- The price/performance ratio of a small cluster-based parallel computer, as opposed to a minicomputer, is much smaller, and consequently a better value.
- Developing and producing systems of moderate speed using parallel architectures is much cheaper than achieving the equivalent performance with a sequential system.

In the 1980s it was believed that computer performance was best improved by creating faster and more efficient processors. This idea was challenged by parallel processing, which means linking together two or more computers to jointly solve a computational problem. Since the early 1990s there has been an increasing trend to move away from expensive and specialized proprietary parallel supercomputers towards networks of workstations. This was the driving force behind Cluster Computing. Later, several distributed computing systems were developed, such as Grid Computing and Cloud Computing. In this chapter we discuss Cluster Computing.
1.1 Distributed Systems

A distributed system is a computing system in which several autonomous computers, linked by a computer network, appear to the users of the system as a single computer.
The computers in the network interact with each other in order to achieve a common goal. A program that runs on a distributed system is called a distributed program. By running distributed system software, the computers are enabled to:

- Coordinate their activities.
- Share resources: hardware, software, data.
- Achieve transparency of resources: the illusion of a single system while running on multiple systems.

Figure 1.2: Distributed computing

Distributed systems are useful for [4] breaking down an application into individual computing agents (Figure 1.2) so that the sub-problems can be easily solved. These agents are distributed over a network and work together on a cooperative task. They can solve larger problems without larger computers, so they are very cheap in comparison to single-system computing; for this reason distributed systems have become increasingly preferable nowadays. Typically there is a central server and several clients connected together. Various parallel devices are connected to the whole system through the distributed system, and both operator and client can use them.
1.1.1 Centralized vs Distributed Systems

Here are some [5] differences between centralized and distributed systems.

Centralized systems:
- Centralized systems have non-autonomous components.
- Centralized systems are often built using homogeneous technology.
- Multiple users share the resources of a centralized system at all times.
- Centralized systems have a single point of control and of failure.

Distributed systems:
- Distributed systems have autonomous components.
- Distributed systems may be built using heterogeneous technology.
- Distributed system components may be used exclusively.
- Distributed systems are executed in concurrent processes.
- Distributed systems have multiple points of failure.
1.1.2 Advantages of Distributed Systems

A distributed system has [6] several advantages over a single system. Some of them are:

- Performance: Very often a collection of processors can provide higher performance than a centralized computer. A distributed system also has a better price/performance ratio.
- Distribution: Some applications involve, by their nature, spatially separated machines (banking, commercial, automotive systems).
- Reliability: Machines may crash. On a single system, if the machine crashes then all data is lost; in a distributed system, if some of the machines crash, the system can survive.
- Incremental growth: As requirements on processing power grow, new machines can be added incrementally.
- Sharing of data/resources: Shared data is essential to many applications (banking, computer supported cooperative work, reservation systems); other resources can also be shared (e.g. expensive printers).
- Communication: Distributed systems give the opportunity for human-to-human communication.
1.1.3 Issues and Challenges in Distributed Systems

Though a [7] distributed system has several advantages, there are some disadvantages as well. Some of them are:

- Difficulty of developing distributed software: It is difficult to develop software for distributed systems. It is hard to determine how operating systems, programming languages, and applications should look.
- Networking problems: Several problems are created by the network infrastructure, which have to be dealt with: loss of messages, overloading, etc.
- Security problems: Sharing generates the problem of data security.
- More components to fail: As distributed systems involve larger networks, there are more possibilities of failure of the system and of data transfer.
1.2 Computer Clusters

A cluster [8] is a type of parallel or distributed processing system. It consists of a collection of interconnected stand-alone computers working together as a single, integrated computing resource. All the component subsystems of a cluster are supervised within a single administrative domain, usually reside in a single room, and are managed as a single computer system. We can use cluster computing [9] for load balancing as well as for high availability. We can also use cluster computing as a relatively low-cost form of parallel processing for scientific and other applications that lend themselves to parallel operations. Some properties of cluster computing:

- The computers, also known as nodes, of a cluster are networked in a tightly-coupled fashion. They are all on the same subnet of the same domain and often networked with very high bandwidth connections.
- The nodes of a cluster are homogeneous. They all use the same hardware, run the same software, and are generally configured identically.
- Each node in a cluster is a dedicated resource: generally only the cluster applications run on a cluster node.

We use the Message Passing Interface (MPI) [10] in clusters, which is a programming interface that allows the distributed application instances to communicate with each other and share information. The dedicated hardware, high-speed interconnects, and MPI used in cluster computing give clusters the ability to work efficiently on fine-grained parallel problems, where the subtasks must communicate many times per second, including problems with short tasks, some of which may depend on the results of previous tasks.
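The message-passing style that MPI provides can be illustrated in miniature with the Python standard library. The sketch below is our own analogy, not MPI itself: threads stand in for ranks and queues stand in for the interconnect, while real MPI (MPI_Send/MPI_Recv) runs across separate processes and machines over a high-speed network.

```python
# Toy analogy of MPI-style point-to-point messaging: each "rank" blocks
# on a receive, computes a partial result, and sends it back, and the
# caller scatters chunks of work and gathers the partial sums (a mini
# scatter/reduce). Threads and queues here merely stand in for MPI ranks
# and the cluster interconnect.
import queue
import threading

def worker(inbox, outbox, rank):
    """Block on a receive (like MPI_Recv), compute, reply (like MPI_Send)."""
    task = inbox.get()
    outbox.put((rank, sum(task)))

def scatter_and_gather(data, n_workers=2):
    """Scatter equal chunks of 'data' to workers, gather and reduce the sums.

    Assumes len(data) is divisible by n_workers, as a real scatter would
    have to handle remainders explicitly.
    """
    outbox = queue.Queue()
    chunk = len(data) // n_workers
    threads = []
    for rank in range(n_workers):
        inbox = queue.Queue()
        t = threading.Thread(target=worker, args=(inbox, outbox, rank))
        t.start()
        inbox.put(data[rank * chunk:(rank + 1) * chunk])  # scatter one chunk
        threads.append(t)
    for t in threads:
        t.join()
    parts = [outbox.get() for _ in range(n_workers)]
    return sum(s for _, s in parts)
```

The blocking `get`/`put` pair mirrors the synchronous send/receive semantics the text describes; everything else about real MPI (communicators, collectives, zero-copy transports) is outside this sketch.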
1.3 Architecture of Computer Clusters

In cluster computing, a node can be a single- or multiprocessor system [11]. The nodes can be PCs, workstations, or SMPs with memory, I/O facilities, and an operating system. In a cluster, two or more nodes are connected together. These nodes can exist in a single cabinet or be physically separated and connected via a LAN. Such a LAN-based interconnected cluster of computers appears as a single system to users and applications. Cluster computing can provide a cost-effective way to gain features and benefits, such as fast and reliable services, that previously could be found only on more expensive proprietary shared memory systems. The typical architecture of a cluster is shown in Figure 1.3.

Figure 1.3: Architecture of Cluster Computing

A cluster computing system consists of several components. The following are some prominent components of cluster computers:

- Multiple high performance computers, which can be PCs, workstations, or SMPs.
- A state-of-the-art operating system, which can be layered or micro-kernel based.
- High performance networks/switches used to connect the nodes of the cluster; among them, Gigabit Ethernet and Myrinet are the most common.
- Network interface cards, used for the cluster interconnection.
- Fast communication protocols and services used to communicate between nodes. Active Messages and Fast Messages are such protocols; later, standards such as InfiniBand emerged for this communication.
- Middleware, which sits between the operating system and applications. Middleware provides the system with a Single System Image (SSI) and a system availability infrastructure. It can consist of hardware such as the Digital (DEC) Memory Channel, and of an operating system kernel or gluing layer such as Solaris MC and GLUnix.
- Applications and subsystems, consisting of applications (such as system management tools), runtime systems (such as software DSM and parallel file systems), and resource management and scheduling software such as LSF (Load Sharing Facility).
- Parallel programming environments and tools, such as compilers and MPI (Message Passing Interface).
- Both sequential and parallel or distributed applications.
1.4 Cluster Interconnection

The choice of interconnection technology is a key component of cluster computing. We can classify interconnection technologies into four categories, depending on whether the internal connection is from the I/O bus or the memory bus, and on whether communication between the computers is performed primarily using messages or using shared storage [12]. Figure 1.4 illustrates the four types of interconnection.

Figure 1.4: Categories of Cluster Interconnection Hardware.

Among the four interconnection categories, I/O attached message-based systems are by far the most common. This category includes all commonly-used wide-area and local-area network technologies, as well as several recent products that are specifically designed for cluster computing. I/O attached shared storage systems include computers that share a common disk sub-system. Memory attached systems are less common than I/O attached systems, since the memory bus of an individual computer generally has a design that is unique to that type of computer. However, many memory-attached systems have been implemented, mostly in software or with memory-mapped I/O, such as Reflective Memory [13]. There are also hybrid systems that combine the features of more than one category; an example is the InfiniBand standard. InfiniBand [14] is an I/O attached interconnection that can be used to send data to a shared disk sub-system as well as to send messages to another computer.

Many factors affect the choice of interconnect technology for a cluster, such as compatibility with the cluster hardware and operating system, price, and performance. The performance of a cluster depends on latency and bandwidth.
Latency is the time needed to send data from one computer to another. It includes the overhead for the software to construct the message as well as the time to transfer the bits from one computer to another. Bandwidth is the number of bits per second that can be transmitted over the interconnect hardware. Applications that use small messages benefit mainly from reduced latency, while applications that send large messages benefit mainly from increased bandwidth. The latency is a function of both the communication software and the network hardware.
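The relationship between latency, bandwidth, and message size sketched above can be captured in a first-order cost model: total time = latency + size / bandwidth. The numbers below are illustrative values of our own choosing, not measurements from any particular interconnect.

```python
def transfer_time(message_bytes, latency_s, bandwidth_bytes_per_s):
    """First-order cost model: fixed per-message latency plus the time
    to push the message's bits through the link."""
    return latency_s + message_bytes / bandwidth_bytes_per_s

# Illustrative (made-up) link: 100 microseconds of combined software and
# hardware latency, 1 Gbit/s ~= 125 MB/s of bandwidth.
latency = 100e-6
bandwidth = 125e6

small = transfer_time(1_000, latency, bandwidth)        # latency dominates
large = transfer_time(100_000_000, latency, bandwidth)  # bandwidth dominates
```

For the 1 kB message almost all of the cost is the fixed latency term, while for the 100 MB message the serialization term dwarfs it, which is why the text attributes small-message performance to latency and large-message performance to bandwidth.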
1.5 Protocols for Cluster Communication

A communication protocol defines a set [15] of rules and conventions for communication between the nodes of a cluster. Each protocol uses a different technique to exchange information. Communication protocols can be classified as:

- Connection-oriented or connectionless.
- Offering various levels of reliability: a protocol can be reliable, where messages are fully guaranteed to arrive in order, or unreliable, where they are not.
- Unbuffered, which is synchronous, or buffered, which is asynchronous.
- By the number of intermediate data copies between buffers, which may be zero, one, or more.
Several protocols are used in clusters. Traditional Internet protocols were used first for clustering; later, several protocols were designed specifically for cluster communication; finally, two new protocol standards were specially designed for use in cluster computing.
1.5.1 Internet Protocols

The Internet Protocol (IP) is the standard for networking worldwide. The Transmission Control Protocol (TCP) and the User Datagram Protocol (UDP) are both transport layer protocols built over the Internet Protocol. TCP and UDP, together with the de facto standard BSD sockets Application Programmer's Interface (API) to TCP and UDP, were among the first messaging libraries used for [16] cluster computing. The Internet protocols use one or more buffers in system memory with the help of operating system services. A user application constructs the message in user memory and then makes an operating system request to copy the message into a system buffer. A system interrupt is required for both send and receive. With the Internet protocols, the operating system overhead and the overhead for copies to and from system memory form a significant portion of the total time to send a message. As network hardware became faster during the 1990s, the overhead of the communication protocols became significantly larger than the actual hardware transmission time for messages, as shown in Figure 1.5. This created the need for new types of protocols for cluster computing.
Figure 1.5: Traditional Protocol Overhead and Transmission
Time.
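The BSD sockets API described above remains the most familiar interface to TCP and UDP. As a concrete reminder of its shape, the sketch below sends a single UDP datagram over the loopback interface; the function name is ours, and the `sendto`/`recvfrom` calls each involve exactly the user-to-kernel buffer copies whose cost motivated the low-latency protocols discussed next.

```python
# Minimal use of the BSD sockets API: one UDP datagram over loopback.
# sendto() copies the user buffer into a kernel buffer; recvfrom() copies
# a kernel buffer back into user memory -- the per-message overhead that
# traditional Internet protocols impose.
import socket

def udp_round_trip(payload: bytes) -> bytes:
    recv_sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    recv_sock.bind(("127.0.0.1", 0))       # let the OS pick a free port
    addr = recv_sock.getsockname()

    send_sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    send_sock.sendto(payload, addr)        # user buffer -> kernel buffer

    data, _ = recv_sock.recvfrom(4096)     # kernel buffer -> user buffer
    send_sock.close()
    recv_sock.close()
    return data
```

Every such exchange also crosses the kernel boundary with a system call on each side, which is precisely the intervention the low-latency protocols of the next section try to eliminate.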
1.5.2 Low-latency Protocols

To avoid operating system intervention, several research projects were undertaken during the 1990s. These projects led to the development of low-latency protocols, which provide user-level messaging services across high-speed networks. Low-latency protocols developed during the 1990s include Active Messages, Fast Messages, the VMMC (Virtual Memory-Mapped Communication) system, U-net, and the Basic Interface for Parallelism (BIP), among others.

1.5.2.1 Active Messages
Active Messages was developed at the University of California, Berkeley. It [17] is the enabling low-latency communications library for the Berkeley Network of Workstations (NOW) project [18]. Short messages in Active Messages are synchronous and based on the concept of a request-reply protocol. The sending user-level application constructs a message in user memory. To transfer the data, the receiving process allocates a receive buffer in user memory on the receiving side and sends a request to the sender. The sender replies by copying the message from the user buffer on the sending side directly to the network. No buffering in system memory is performed. The network hardware transfers the message to the receiver, and the message is then transferred from the network to the receive buffer in user memory. Active Messages requires that user virtual memory on both the sending and receiving sides be pinned to an address in physical memory, so that it will not be paged out during the network operation. Once the pinned user memory buffers are established, no operating system intervention is required for a message to be sent. Since no copies from user memory to system memory are used, this protocol is known as a zero-copy protocol. To support multiple concurrent parallel applications in a cluster, Active Messages was extended to Generic Active Messages (GAM). In GAM, a copy sometimes occurs to a buffer in system memory on the receiving side so that user buffers can be reused more efficiently. In this case, the protocol is referred to as a one-copy protocol.

1.5.2.2 Fast Messages
Fast Messages was developed at the University of Illinois and is similar to Active Messages [19]. Fast Messages extends Active Messages by imposing stronger guarantees on the underlying communication: it guarantees that all messages arrive reliably and in order, even if the underlying network hardware does not. Fast Messages uses flow control to ensure that a fast sender cannot overrun a slow receiver and thereby cause messages to be lost. Flow control is implemented in Fast Messages with a credit system that manages pinned memory in the host computers.
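The credit idea can be sketched abstractly as follows. The class and method names are our own invention, and real Fast Messages ties credits to pinned receive buffers and returns them inside protocol traffic rather than through an explicit call; the sketch only shows the invariant that a sender without credits cannot transmit.

```python
# Sketch of credit-based flow control: the sender starts with one credit
# per receive buffer available on the receiver. It may transmit only while
# it holds credits, so a fast sender can never overrun a slow receiver;
# the receiver returns credits as it frees buffers.
class CreditedSender:
    def __init__(self, receiver_buffers):
        self.credits = receiver_buffers   # one credit per receive buffer

    def try_send(self, message, wire):
        if self.credits == 0:
            return False                  # must wait for credits to return
        self.credits -= 1
        wire.append(message)              # "transmit" the message
        return True

    def credit_returned(self, n=1):
        """The receiver freed n buffers and sent the credits back."""
        self.credits += n
```

Because a credit is consumed before each transmission and only restored when the receiver frees a buffer, the number of in-flight messages can never exceed the receiver's buffer capacity, which is the guarantee the text attributes to Fast Messages.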
1.5.2.3 VMMC
The Virtual Memory-Mapped Communication (VMMC) [20] system was a low-latency protocol developed for the Princeton SHRIMP project. One goal of VMMC is to view messaging as reads and writes into the user-level virtual memory system. VMMC works by mapping a page of user virtual memory to physical memory and establishing a correspondence between pages on the sending and receiving sides. It uses specially designed hardware that allows the network interface to snoop writes to memory on the local host and have these writes automatically updated in the remote host's memory. Various optimisations of these writes have been developed that help to minimize the total number of writes and the network traffic, and to improve overall application performance. VMMC is an example of a paradigm known as distributed shared memory (DSM). In DSM systems, memory is physically distributed among the nodes in a system, but processes in an application may view shared memory locations as identical and perform reads and writes to them.

1.5.2.4 U-net
The U-net network interface architecture [21] was developed at Cornell University. U-net provides zero-copy messaging where possible. U-net adds the concept of a virtual network interface for each connection in a user application. Just as an application has a virtual memory address space that is mapped to real physical memory on demand, each communication endpoint of the application is viewed as a virtual network interface mapped to a real set of network buffers and queues on demand. The advantage of this architecture is that once the mapping is defined, each active interface has direct access to the network without operating system intervention. The result is that communication can occur with very low latency.

1.5.2.5 BIP
BIP (Basic Interface for Parallelism) [22] is a low-latency protocol that was developed at the University of Lyon. BIP is designed as a low-level message layer over which a higher-level layer such as the Message Passing Interface (MPI) [10] can be built, so programmers can use MPI over BIP for parallel application programming. The initial BIP interface consisted of both blocking and non-blocking calls. Later versions (BIP-SMP) provide multiplexing between the network and shared memory under a single API for use on clusters of symmetric multiprocessors.

BIP achieves low latency and high bandwidth by using different protocols for various message sizes, and it provides a zero- or single-memory-copy transfer of user data. To simplify the design and keep the overheads low, BIP guarantees in-order delivery of messages, although some flow control issues for small messages are passed to higher software levels.
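The idea of switching wire strategies by message size can be sketched generically. The cutoff and strategy names below are our invention for illustration, not BIP's actual thresholds or protocol names; the point is only that small messages trade a copy for latency while large ones trade a handshake for zero-copy bandwidth.

```python
# Generic sketch of size-dependent protocol selection in the spirit of
# BIP: tiny messages are copied once into a preallocated receive slot
# (cheapest in latency), larger ones are sent zero-copy after a handshake
# (cheapest in copies). The threshold is illustrative only.
SMALL_CUTOFF = 256  # bytes; an assumed value, not BIP's real threshold

def choose_strategy(message_bytes: int) -> str:
    if message_bytes <= SMALL_CUTOFF:
        return "copy-into-preposted-buffer"  # one memory copy, lowest latency
    return "zero-copy-rendezvous"            # no copy, costs a handshake
```

This is the same small-message/large-message trade-off as in the latency and bandwidth discussion of Section 1.4: below the cutoff the fixed costs dominate, so an extra copy is cheaper than a handshake; above it, avoiding copies wins.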
1.5.3
Standards for Cluster Communication
Research on low-latency protocols had progressed sufficiently
that a new standard for low-latency messaging, the Virtual
Interface Architecture (VIA), was developed. During a
similar period, industrial researchers worked on standards
for shared storage subsystems. The combined efforts of many
researchers resulted in the InfiniBand standard.
1.5.3.1
VIA
The Virtual Interface Architecture (VIA) [23] is a
communications standard that combines many of the best features of
various academic projects. A consortium of academic and industrial
partners, including Intel, Compaq, and Microsoft, developed the
standard. VIA supported heterogeneous hardware and was available as
of early 2001. It was based on the concept of a virtual network
interface. Before a message can be sent in VIA, send and receive
buffers must be allocated and pinned to physical memory locations.
No system calls are needed after the buffers and associated
data structures are allocated. A send or receive operation in a
user application consists of posting a descriptor to a queue. The
application can choose to wait for a confirmation that the operation
has completed, or can continue host processing while the message is
being processed.
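The descriptor-posting model can be illustrated with a small sketch. This is a conceptual model only, not a VIA binding: the class, queue layout and status strings are assumptions, with an ordinary thread standing in for the network interface hardware.

```python
import queue
import threading

# Conceptual sketch of VIA-style descriptor posting (not a real VIA API).
# A "descriptor" stands in for a work request on pinned buffers; the
# worker thread plays the role of the NIC draining the send queue.

class VirtualInterface:
    def __init__(self):
        self.send_queue = queue.Queue()   # posted descriptors
        self.completions = queue.Queue()  # completed descriptors
        threading.Thread(target=self._nic, daemon=True).start()

    def _nic(self):
        while True:
            desc = self.send_queue.get()   # NIC picks up the descriptor
            desc["status"] = "complete"    # ...transfer would happen here...
            self.completions.put(desc)

    def post_send(self, buffer):
        """Posting is just enqueueing a descriptor -- no system call."""
        desc = {"buffer": buffer, "status": "posted"}
        self.send_queue.put(desc)
        return desc

    def wait(self):
        """Block until one posted operation completes."""
        return self.completions.get()

vi = VirtualInterface()
vi.post_send(b"hello")   # returns immediately; host can keep working
done = vi.wait()         # or block for a completion notification
assert done["status"] == "complete"
```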
Several hardware vendors and some independent developers have
developed VIA implementations for various network products [24, 25].
VIA implementations can be classified as native or
emulated. A native implementation of VIA off-loads a portion of the
processing required to send and receive messages to special
hardware on the network interface card.
When a message arrives in a native VIA implementation, the
network card performs at least a portion of the work required to
copy the message into user memory. In an emulated VIA implementation,
the host CPU performs the processing to send and receive messages.
Although the host processor is used in both cases, an emulated
implementation of VIA has less overhead than TCP/IP. However, the
services provided by VIA are different from those provided by TCP/IP,
since communication is not guaranteed to arrive reliably in
VIA.
1.5.3.2
InniBand
The InfiniBand standard [26] is another cluster protocol standard,
supported by a large consortium of industrial
partners, including Compaq, Dell, Hewlett-Packard, IBM, Intel,
Microsoft and Sun Microsystems. The InfiniBand architecture replaces
the standard shared bus for I/O on current computers with a
high-speed, serial, channel-based, message-passing, scalable,
switched fabric. All systems and devices attach to the fabric
through one of two types of adapter: host channel adapters (HCA) or
target channel adapters (TCA), as shown in Figure 1.6. In InfiniBand,
data is sent in packets, and six types of transfer methods are
available: reliable and unreliable connections, reliable
and unreliable datagrams, multicast connections, and raw
packets.
Figure 1.6: The InfiniBand architecture
InfiniBand supports remote direct memory access (RDMA) read and
write operations, which allow one processor to read or write the
contents of memory at another processor, and it also directly
supports IPv6 [27] messaging for the Internet. InfiniBand has
several components:
Host channel adapter (HCA): An interface that resides within a
server. It communicates directly with the server's memory, processor,
target channel adapter or a switch. It guarantees delivery of data
and can recover from transmission errors.
Target channel adapter (TCA): Enables I/O devices to be located
within the network, independent of a host computer. It includes an
I/O controller that is specific to its particular device's protocol.
TCAs can communicate with an HCA or a switch.
Switch: Virtually equivalent to a traffic cop, a switch allows many
HCAs and TCAs to connect to it and handles network traffic. It offers
higher availability, higher aggregate bandwidth, load balancing, data
mirroring and much more. It looks at the local route header on each
packet of data and forwards it to the appropriate location. A group
of switches is referred to as a fabric. If a host computer is down,
the switch still continues to operate, and it also frees up servers
and other devices by handling network traffic.
Router: Forwards data packets from a local network (called a subnet)
to other external subnets. It reads the global route header, forwards
to the appropriate address, and rebuilds each packet with the proper
local address header as it passes it to the new subnet.
Subnet manager: An application responsible for configuring the local
subnet and ensuring its continued operation. Configuration
responsibilities include managing switch and router setups and
reconfiguring the subnet if a link goes down or a new one is added.
The IBA comprises four primary layers that describe communication
devices and methodology.
Physical layer: Defines the electrical and mechanical characteristics
of the IBA, including the cables, connectors and hot-swap
characteristics. IBA connectors include fiber, copper and backplane
connectors. There are three link speeds, specified as 1X, 4X and 12X.
A 1X link cable has four wires, two for each direction of
communication (read and write).
Link layer: Includes packet layout, point-to-point link instructions,
switching within a local subnet and data integrity. There are two
types of packets, management and data. Management packets handle link
configuration and maintenance. Data packets carry up to 4 kilobytes
of transaction payload. Every device in a local subnet has a local ID
(LID) for forwarding data appropriately. Data integrity is handled by
including variant and invariant cyclic redundancy checks (CRC). The
variant CRC checks fields that change from point to point, and the
invariant CRC provides end-to-end data integrity.
Network layer: The network layer is responsible for routing packets
from one subnet to another. The global route header located within a
packet includes an IPv6 address for the source and destination of
each packet. For single-subnet environments, the network layer
information is not used.
Transport layer: Handles the order of packet delivery. It also
handles partitioning, multiplexing and the transport services that
determine reliable connections.
1.6
Single System Image (SSI)
Single System Image (SSI) is a property through which we can
view a distributed system as a single unified computing resource.
This property hides the distributed and heterogeneous nature of the
available resources and presents them to the users as a single,
powerful, unified computing resource [28]. A system using SSI gives
the users a system-wide view of the resources available to them;
they do not have to know the node with which a resource is physically
associated. These resources can range from access and manipulation of
remote processes to the use of a global file-system. SSI provides
high availability: the system can continue to operate after a
failure. It also ensures that the nodes are evenly loaded. SSI
cluster-based systems mainly focus on complete transparency of
resource management, scalable performance and system availability in
supporting user applications [28, 29, 30, 31, 32]. There are several
desirable key SSI attributes, among them: a single point of entry,
user interface, process space, I/O and memory space, job-management
system, and point of management and control. The most important
benefits of SSI [28] include: SSI allows the use of resources in a
transparent way; the user does not have to think about their physical
location. It offers the same command syntax as on other systems and
thus reduces the risk of operator errors, with the result that
end-users see improved performance, reliability and higher
availability of the system. The end-user does not have to know where
in the cluster an application will run. SSI greatly simplifies system
management and thus reduces the cost of ownership. It promotes the
development of standard tools and utilities.
1.7
Cluster Middleware
Middleware is the layer of software sandwiched between the
operating system and applications. It has re-emerged as a means of
integrating software applications that run in a heterogeneous
environment. There is a large overlap between the infrastructure that
is provided to a cluster by high-level Single System Image (SSI)
services and that provided by the traditional view of middleware.
Middleware helps a developer overcome three potential problems of
developing applications on a heterogeneous cluster: it gives the
ability to access software inside or outside the developer's site; it
helps to integrate software from different sources; and it supports
rapid application development.
The services that middleware provides are not restricted to
application development. Middleware also provides services for the
management and administration of a heterogeneous system.
1.7.1
Message-based Middleware
Message-based middleware uses a common communications protocol
to exchange data between applications. The communications protocol
hides many of the low-level message passing primitives from the
application developer. Message-based middleware software can pass
messages directly between applications, send messages via software
that queues waiting messages, or use some combination of the two.
Examples of this type of middleware are the three upper layers of
the OSI model [33]: the session, presentation and application
layers.
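A minimal sketch of the queued variant, where the middleware holds messages until a receiver asks for them, so sender and receiver never interact directly. The MessageBroker class and its method names are illustrative assumptions, not a real product's API.

```python
import queue

# Toy store-and-forward message middleware: send() returns immediately,
# and messages wait in a per-destination queue until receive() is called.

class MessageBroker:
    def __init__(self):
        self._queues = {}   # destination name -> pending messages

    def send(self, dest, message):
        """Queue a message for `dest`; the sender does not block."""
        self._queues.setdefault(dest, queue.Queue()).put(message)

    def receive(self, dest):
        """Deliver the oldest pending message for `dest`."""
        return self._queues[dest].get()

broker = MessageBroker()
broker.send("billing", {"order": 42})   # sender continues immediately
broker.send("billing", {"order": 43})   # messages wait in the queue
assert broker.receive("billing") == {"order": 42}   # delivered in order
assert broker.receive("billing") == {"order": 43}
```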
1.7.2
RPC-based Middleware
There are many applications where the interactions between
processes in a distributed system are remote operations, often with
a return value. For these applications, Remote Procedure Call (RPC)
is used. The implementation of the client/server model in terms of
RPC allows the code of the application to remain the same whether
the procedures it calls are local or remote.
Inter-process communication mechanisms serve four important
functions [34]: They offer mechanisms to protect against failure, and
also provide the means to cross administrative boundaries. They allow
communication between separate processes over a computer network.
They enforce clean and simple interfaces, thus providing a natural
aid for the modular structure of large distributed applications.
They hide the distinction between local and remote
communication, thus allowing static or dynamic reconfiguration.
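As a concrete illustration of a remote operation with a return value, the pattern can be shown with Python's standard XML-RPC modules; the function name and loopback setup are arbitrary choices for this sketch, not part of any system discussed above.

```python
import threading
from xmlrpc.server import SimpleXMLRPCServer
from xmlrpc.client import ServerProxy

# One concrete RPC implementation using the Python standard library.
# Server side: register an ordinary function under a remote name.
def add(a, b):
    return a + b

server = SimpleXMLRPCServer(("127.0.0.1", 0), logRequests=False)
port = server.server_address[1]        # port 0 -> OS picks a free port
server.register_function(add, "add")
threading.Thread(target=server.serve_forever, daemon=True).start()

# Client side: the call looks like a local procedure call, but the
# arguments are marshalled, sent over the network, and the result is
# returned -- the remote communication is hidden from the caller.
client = ServerProxy(f"http://127.0.0.1:{port}")
assert client.add(2, 3) == 5
```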
1.7.3
Object Request Broker
An Object Request Broker (ORB) is a type of middleware that
supports the remote execution of objects. An international ORB
standard is CORBA (Common Object Request Broker Architecture). It
is supported by more than 700 groups and managed by the Object
Management Group (OMG) [35]. The OMG is a non-profit organization
whose objective is to define and promote standards for object
orientation in order to integrate applications based on existing
technologies.
The Object Management Architecture (OMA) is characterized by the
following:
The Object Request Broker (ORB): The controlling element of the
architecture; it supports the portability of objects and their
interoperability in a network of heterogeneous systems.
Object services: Specific system services for the manipulation of
objects. Their goal is to simplify the process of constructing
applications.
Application services: These offer a set of facilities that allow
applications to access databases and printing services, to
synchronize with other applications, and so on.
Application objects: These allow the rapid development of
applications. A new application can be formed from objects in a
combined library of application services.
1.8
Concluding Remarks
To begin the thesis, we have studied the necessity of and issues
related to parallel computation, focusing on the architectures,
protocols and standards of Computer Clusters. The motivation of
distributed processing using Computer Clusters leads to a more
advanced technology named Grid Computing, which we discuss in the
next chapter.
Chapter 2
Grid Computing: An Introduction
Grid Computing, more specifically a Grid Computing System, is a
virtualized distributed environment. A Grid environment provides
dynamic runtime selection, sharing and aggregation of geographically
distributed resources based on the availability, capability,
performance and cost of these computing resources. Fundamentally,
Grid Computing is an advanced form of distributed processing,
combining a decentralized architecture for managing computing
resources with a layered hierarchical architecture for providing
services to the user [36].
The rest of the chapter is organized as follows. We begin our
discussion with the definition of Grid Computing and the benefits of
virtualization in the Grid in Section 2.1. In Sections 2.3 and 2.4 we
consider the underlying layers of Grid Computing in detail. The
resource-management architecture is discussed in Section 2.5, and a
protocol for resource management (GRAM) is discussed in Section
2.5.2. We conclude our discussion in Section 2.6 by introducing a new
approach to distributed processing named Cloud Computing.
2.1
Grid Computing: definitions and overview
The concept of the Grid was introduced in the early 1990s, when high
performance computers were connected by fast data communication
links. The motivation of that approach was to support calculation-
and data-intensive scientific applications. Figure 2.1 [37] shows the
evolution of the Grid over time.
Figure 2.1: Evolution of Grid Computing
The basic idea of the Grid is the co-allocation of distributed
computational resources. The most cited definition of the Grid
is [38]:
A computational grid is a hardware and software infrastructure
that provides dependable, consistent, pervasive, and inexpensive
access to high-end computational capabilities.
Again, according to the IBM definition [39],
A grid is a collection of distributed computing resources available
over a local or wide area network that appear to an end user or
application as one large virtual computing system. The vision is to
create virtual dynamic organizations through secure, coordinated
resource-sharing among individuals, institutions, and
resources.
A Grid Computing environment must include:
Coordinated resources: The Grid environment must provide the
necessary infrastructure for co-ordination of resources based upon
policies and service-level agreements.
Open standard protocols and frameworks: Open standards provide
interoperability and integration facilities. These standards should
be applied to resource discovery, resource access and resource
co-ordination. The Open Grid Services Infrastructure (OGSI) [40] and
Open Grid Services Architecture (OGSA) [41] were published by the
Global Grid Forum (GGF) as proposed recommendations for this
approach.
Grid Computing can also be distinguished from High Performance
Computing (HPC) and clustered systems: the Grid focuses on resource
sharing and can result in HPC, whereas HPC does not necessarily
involve sharing of resources [42].
2.1.1
Virtualization and Grid
Virtualization is the process of making resources accessible to
a user as if they were a single, larger, homogeneous resource.
Virtualization supports the concept of dynamically shifting
resources across various platforms so that computing demands can be
scaled with available resources [43]. Figure 2.2 shows the necessity
of virtualization to support the proper utilization of resources:
although average utilization of the resources may be relatively low,
during peak cycles a server might be overtaxed and resources may not
be available.
Figure 2.2: Resource availability according to demand
Grid environments can support the benefits of virtualization. The
Grid enables the abstraction of distributed systems and resources
such as processing, network bandwidth and data storage to create a
single system image. Such abstraction provides continuous access to a
large pool of IT capabilities. Figures 2.3 and 2.4 [37] compare the
Grid environment with traditional computation. In Figure 2.4 an
organization-owned computational grid is shown, where a scheduler
sets policies and priorities for placing jobs in the grid
infrastructure.
2.1.2
Grids over Cluster Computing
The Computer Clusters detailed in Chapter XX are local to their
domain. Clusters are designed to resolve the problem of inadequate
computing power: they provide more computational power by pooling
computational resources and parallelizing the workload. As Clusters
provide dedicated functionality to a local domain, they are not a
suitable solution for resource sharing between users of various
domains. Nodes in a Cluster are controlled centrally, and a Cluster
manager monitors the state of the nodes [44]. So, in brief, Cluster
units provide only a subset of Grid functionality.
Figure 2.3: Serving job requests in a traditional environment
2.2
An example of a Grid Computing environment
We consider searching the World Wide Web with Google as an example of
a Grid Computing environment. Figure 2.5 shows an abstract view of
the Google search architecture [45]. Google processes tens of
thousands of queries per second. Each query is first received by one
of the Web Servers, which then passes it to the array of Index
Servers. Index Servers are responsible for keeping an index of the
words and phrases found on websites. The servers are distributed over
several machines, and hence the search runs concurrently. In a
fraction of a second, the index servers perform a logical AND
operation and return references to the websites containing the query
(search phrase). The resulting references are then sent to the Store
Servers. Store Servers maintain compressed copies of all the pages
known to Google. These compressed copies are used to prepare page
snippets, which are finally presented to the end user in a readable
form.
Figure 2.4: Serving job requests in a Grid environment
Crawler Machines continuously crawl the web and update the
Google database of pages stored in the Index and Store Servers. The
Store Servers therefore contain relatively recent compressed
copies of all the pages available on the web.
Grid Computing can facilitate the above scenario of efficient
searching. As stated earlier, the servers are distributed and the
search must run in parallel in order to achieve efficiency. The
infrastructure also needs to scale with the growth of the web as the
number of pages and indexes increases. Different organizations share
their numerous servers with Google, which is allowed to copy their
content and transform it into its local resources. The local
resources comprise the keyword database of the Index Servers and the
cached content in the database of the Store Servers. These resources
are partially shared with end-users, who send queries through their
browsers. Users can then directly contact the original servers to
request the full content of a web page.
Google also shares its computing cycles: it shares computing
resources, such as storage and computing capability, with the
end-user by performing data caching, ranking and searching of
queries.
Figure 2.5: Google search architecture
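A toy model of the index-server step described above: the word index is sharded across "servers", the shards are queried concurrently, and the per-shard results are combined. The data, shard layout and function names are invented for illustration and say nothing about Google's real implementation.

```python
from concurrent.futures import ThreadPoolExecutor

# Each "server" holds a shard of the word -> page-id index.
index_shards = [
    {"grid": {1, 2, 5}, "cloud": {2, 3}},
    {"grid": {7}, "cloud": {7, 8}, "cluster": {9}},
]

def query_shard(shard, words):
    """Pages in this shard that contain every query word (logical AND)."""
    hits = [shard.get(w, set()) for w in words]
    return set.intersection(*hits) if hits else set()

def search(words):
    # Query all shards concurrently, then merge the partial results.
    with ThreadPoolExecutor() as pool:
        partials = list(pool.map(lambda s: query_shard(s, words),
                                 index_shards))
    return set().union(*partials)   # each page lives in exactly one shard

assert search(["grid", "cloud"]) == {2, 7}
```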
2.3
Grid Architecture
In this section we discuss the Grid architecture, which
identifies the basic components of a Grid system. It also defines the
purpose and functions of such components. This layered Grid
architecture also indicates how these components interact with one
another. Here we present the Grid architecture by analogy with the
Internet protocol architecture: as the Internet protocol architecture
extends from network to application, we can relate the Grid layers to
the Internet layers [46]. Figure 2.6 shows the Grid layers from top
to bottom.
Figure 2.6: Grid protocol architecture
In the Grid architecture described in [46], the Resource and
Connectivity protocols are responsible for sharing individual
resources. The protocols in this layer are designed to be implemented
on top of the various types of resources, which we identify as the
Fabric layer. The raw Fabric, however, can be used to support
application-specific requirements.
2.3.1
Fabric Layer: Interfaces to Local Resources
The Fabric layer provides the resources that can be shared in the
Grid environment. Examples of such resources are computational
resources, storage systems, sensors and network systems. The Grid
architecture does not deal with logical resources, for example
distributed file systems, whose implementation requires individual
internal protocols [46].
Components of the Fabric layer implement the local, resource-specific
operations on specific resources, whether physical or logical. These
resource-specific operations provide the functionality for sharing
operations at higher levels. In order to support sharing mechanisms
we need to provide [44]: an inquiry mechanism, so that the components
of the Fabric layer can discover and monitor resources; and
appropriate resource-management functionality (application-dependent,
unified, or both) to control QoS in the Grid environment.
2.3.2
Connectivity Layer: Managing Communications
The Connectivity layer defines the core communication and
authentication protocols necessary for Grid networks. Communication
protocols transfer data between Fabric layer resources.
Authentication protocols build on the communication services to
provide cryptographically secure mechanisms for Grid users and
resources.
The communication protocols can work with any networking-layer
protocols that support transport, routing, and naming functionality.
In computational Grids, the TCP/IP Internet protocol stack is
commonly used [46].
2.3.3
Resource Layer: Sharing of a Single Resource
The Resource layer sits on top of the Connectivity layer and defines
the protocols, along with APIs and SDKs, for the secure negotiation,
monitoring, initialization, control and payment of sharing operations
on individual resources. The Resource layer uses Fabric layer
interfaces and functions to access and control local resources. This
layer considers only local and individual resources and therefore
ignores global resource-management issues [46]. To share a single
resource, we need two classes of Resource layer protocols [46]:
Information protocols: Information protocols are used to discover
information about the state and structure of a resource, for example
its configuration, current load, usage policy or cost.
Management protocols: Management protocols in the Resource layer are
used to control access to a shared resource. The protocols specify
resource requirements, including advance reservation and QoS, and the
operations on resources, such as process creation and data access.
Protocols are also needed to support monitoring of application status
and termination of operations.
2.3.4
Collective Layer: Co-ordination of Multiple Resources
The Resource layer, described in Section 2.3.3, deals with the
operation and management of a single resource, but for global
resource co-ordination the Collective layer protocols are used. This
layer provides the necessary APIs and SDKs associated not with any
specific resource but with the global resources of the overall Grid
environment.
Figure 2.7: Collective and Resource layer protocols are combined
in various ways to provide application functionality
The implementation of Collective layer functions can be built on
Resource layer or other Collective layer protocols and APIs [46].
Figure 2.7 shows a Collective co-allocation API and SDK that uses a
Resource layer management protocol to control resources. On top of
this, we define a co-reservation service protocol and the service
itself. Calling the co-allocation API to implement co-allocation
operations provides additional functionality such as authorization
and fault tolerance. An application can then use the co-reservation
service protocol to request and perform end-to-end reservations.
2.3.5
Application Layer: User-defined Grid Applications
The top layer of the Grid consists of user applications, which
are constructed by utilizing the services defined at each lower
layer. At each layer we have well-defined protocols that provide
access to useful services, for example resource management, data
access and resource discovery. Figure 2.8 shows the correlation
between the different layers [46]. APIs are implemented by SDKs,
which use Grid protocols to provide functionality to the end user.
Higher-level SDKs can also provide functionality that is not directly
mapped to a specific protocol; they may combine protocol operations
with calls to additional APIs to implement local functionality.
Figure 2.8: Programmer's view of the Grid architecture. Dotted lines
denote protocol interactions; solid lines represent direct calls.
2.4
Grid Computing with Globus
Globus [47] provides a software infrastructure so that
applications can treat distributed computing resources as a single
virtual machine [48]. The Globus Toolkit, the core component of the
infrastructure, defines the basic services and capabilities required
for a computational Grid. Globus is designed as a layered
architecture in which high-level global services are built on top of
low-level local services. In this section we discuss how the Globus
Toolkit protocols interact with the Grid layers.
Fabric layer: The Globus Toolkit is designed to use existing Fabric
components [46]. For example, enquiry software is provided for
discovering state information about various common resources, such
as computers (e.g. OS version, hardware configuration), storage
systems (e.g. available space), etc. Resource management, in the
higher-level protocols (particularly at the Resource layer), is
normally assumed to be the domain of local resource managers.
Connectivity layer: Globus uses the public-key based Grid Security
Infrastructure (GSI) protocols [49, 50] for authentication,
communication protection, and authorization. GSI extends the
Transport Layer Security (TLS) protocols [51] to address the issues
of single sign-on, delegation, and integration with various local
security solutions.
Resource layer: The Grid Resource Information Protocol (GRIP) [52]
defines a standard resource-information protocol. The HTTP-based Grid
Resource Access and Management (GRAM) protocol [53] is used for the
allocation of computational resources and for monitoring and
controlling computation on those resources. An extended version of
FTP, GridFTP [54], is used for partial file access and for management
of parallelism in high-speed data transfers [46].
The Globus Toolkit defines client-side C and Java APIs and SDKs
for these protocols. Server-side SDKs are also provided for each
protocol, to support the integration of various resources
(computational, storage, network) into the Grid [46].
Collective layer: Grid Information Index Servers (GIISs) support
arbitrary views on resource subsets; the LDAP information protocol is
used to access resource-specific GRISs to obtain resource state, and
the Grid Resource Registration Protocol (GRRP) is used for resource
registration. A couple of replica catalog and replica management
services support the management of dataset replicas. An on-line
credential repository service known as MyProxy provides secure
storage for proxy credentials [55]. The Dynamically-Updated Request
Online Coallocator (DUROC) provides an SDK and API for resource
co-allocation [56].
2.5
Resource Management in Grid Computing
In this section we discuss a resource-management architecture for
Grid environments, described in [53]. A block diagram of the
architecture is shown in Figure 2.9. To communicate requests for
resources between components, a Resource Specification Language (RSL)
is used, which is described in detail in Section 2.5.1. Through a
process called specialization, Resource Brokers transform a
high-level RSL specification into a concrete specification of
resources. This specification of the request, called a ground
request, is passed to a co-allocator, which is responsible for
allocating and managing resources at multiple sites. A multi-request
is a request that involves resources at multiple sites; resource
co-allocators can break such a multi-request into components and pass
each component to the appropriate resource manager. The information
service, sitting between the Resource Broker and the co-allocator, is
responsible for giving access to the availability and capability of
resources.
Figure 2.9: A resource management architecture for a Grid Computing
environment
2.5.1
Resource Specification Language
The Resource Specification Language (RSL) is a combination of
parameter specifications joined by the operators:
& : conjunction of parameter specifications
| : disjunction of parameter specifications
+ : combination of two or more requests into a single compound
request, or multi-request
Resource brokers, co-allocators and resource managers each define a
set of parameter names. Resource managers generally recognize two
types of parameter names in order to communicate with local
schedulers:
MDS attribute names: express constraints on resources, for example
memory>64 or network=atm.
Scheduler parameters: used to communicate job-related information,
i.e. count (number of nodes required), max_time (maximum time
required), executable, environment (environment variables), etc.
For example, the following simple specification, taken from [53]:
&(executable=myprog)(|(&(count=5)(memory>=64))(&(count=10)(memory>=32)))
requests 5 nodes with at least 64 MB of memory, or 10 nodes with at
least 32 MB of memory. Here, executable and count are scheduler
parameters.
Again, the following is an example of a multi-request:
+(&(count=80)(memory>=64)(executable=my_executable)(resourcemanager=rm1))
(&(count=256)(network=atm)(executable=my_executable)(resourcemanager=rm2))
Here two requests are combined by the + operator. This is also an
example of a ground request, as every component of the request
specifies a resource manager.
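Specifications like these can be assembled mechanically. The helpers below mirror only the textual RSL syntax shown above; they are illustrative and not part of any Globus API.

```python
# Tiny string builders for RSL-style specifications (illustrative only).

def param(name, value, op="="):
    """One parameter specification, e.g. (count=5) or (memory>=64)."""
    return f"({name}{op}{value})"

def conj(*specs):
    """& : all constraints must hold."""
    return "&" + "".join(specs)

def disj(*specs):
    """| : any one alternative may be chosen."""
    return "|" + "".join(f"({s})" for s in specs)

def multi(*requests):
    """+ : combine requests into a single multi-request."""
    return "+" + "".join(f"({r})" for r in requests)

# Rebuild the simple specification from the text above.
alt = disj(conj(param("count", 5), param("memory", 64, ">=")),
           conj(param("count", 10), param("memory", 32, ">=")))
spec = conj(param("executable", "myprog"), f"({alt})")
assert spec == ("&(executable=myprog)"
                "(|(&(count=5)(memory>=64))(&(count=10)(memory>=32)))")
```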
2.5.2
Globus Resource Allocation Manager (GRAM)
The Globus Resource Allocation Manager (GRAM) is designed to run jobs
remotely, providing an API for submitting, monitoring, and
terminating jobs. GRAM is the lowest level of the Globus
resource-management architecture [57].
Figure 2.10: Globus GRAM architecture
Figure 2.10 shows the basic architecture of GRAM. When a job is
submitted, the request is sent to the gatekeeper of the remote
computer. The gatekeeper handles the request and creates a job
manager for the job. The job manager then starts and monitors the
remote program, communicating state changes back to the user on the
local machine. When the remote application terminates (either
normally or by failing), the job manager also terminates [57].
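The gatekeeper/job-manager flow can be caricatured in a few lines. All class, state and callback names below are assumptions for illustration; this is not the Globus GRAM API, and authentication and remote execution are reduced to placeholders.

```python
# Conceptual sketch of the GRAM flow in Figure 2.10: a gatekeeper
# accepts a request, creates a job manager, and the job manager
# reports state changes back to the submitting client.

class JobManager:
    def __init__(self, job, callback):
        self.job = job
        self.callback = callback   # reports state changes to the client

    def run(self):
        self.callback("PENDING")
        result = self.job()        # start and monitor the "remote" program
        self.callback("DONE")      # on termination (normal or failed),
        return result              # the job manager itself terminates

class Gatekeeper:
    def submit(self, job, callback):
        """Authenticate the request (omitted) and hand off to a manager."""
        return JobManager(job, callback).run()

states = []
gatekeeper = Gatekeeper()
result = gatekeeper.submit(lambda: 6 * 7, states.append)
assert result == 42
assert states == ["PENDING", "DONE"]
```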
2.6
Evolution towards Cloud Computing from Grid
The convergence of Grid Computing with Service-Oriented
Computing (SOC) exposes Grid functionality in the form of services.
A service-oriented Grid offers virtualization of the available
resources, which increases the versatility of the Grid [58]. It also
binds Grid-specific services to the hardware-level and application
services. With the help of Grid Computing it is possible to integrate
heterogeneous physical resources into a virtualized and centrally
accessible computing unit. Based on this convergence with SOC, Grid
Computing is offered in the form of Grid services [42], as shown in
Figure 2.11.
Figure 2.11: Enhancement of the generic Grid architecture to a
Service-Oriented Grid
In order to meet market demands, providers aim to offer the following
functionality [42]: a scalable, flexible, robust and reliable
physical infrastructure; platform services enabling programmatic
access to the physical infrastructure through abstract interfaces;
and SaaS (described in Chapter XX) supported by a scalable physical
infrastructure. All this is emerging in new on-line platforms,
referred to as Cloud Computing, that provide X-as-a-Service products,
which we discuss in the next chapter.
2.7
Concluding remarks
In this chapter, we have given a brief overview of the Grid Computing
environment and compared it with traditional Clusters. We have also
discussed the layered architecture of the Grid. As an implementation
of the Grid, we considered the Globus Toolkit and correlated the Grid
layers with the Globus implementation. We then discussed
resource-management issues in the Grid, focusing on how the GRAM
protocol is used in the Globus Toolkit to manage resource requests.
We conclude the chapter by introducing Cloud Computing, a new trend
in distributed systems inspired by Grid and Service-oriented
Computing.
Chapter 3
An overview of Cloud Architecture
In a Cloud environment, hardware and software services are stored on web servers (the Cloud) rather than on a single computer, and are accessed through the Internet. Cloud Computing is responsible for delivering IT functionality to external users by obtaining that functionality from external providers as services, in a pay-per-use manner over the Internet. These Cloud services are consumed via a web browser or a defined API [59].
The rest of the chapter is organized as follows: we begin our discussion with a detailed architectural overview of the Cloud Computing environment in Section 3.2. Details of the Cloud services (PaaS, IaaS, SaaS) are discussed in Section 3.3, and virtualization in the Cloud is discussed in Section 3.4. We conclude the Chapter by explaining a practical Cloud implementation in Section 3.5.
3.1
Cloud Components
Cloud environments consist of the following elements: Clients, Data-centers and Distributed Servers [60]. These components are combined together to build a Cloud Computing solution, as shown in Figure 3.1. Each element has distinct functionalities, which we describe next.
Figure 3.1: Components of a Cloud Computing Solution
i. Clients:
Clients are the same as in traditional Local Area Networks (LANs). In general, clients are the computers or machines used for accessing functionality. These machines may include laptops, tablet computers, mobile or cellular phones and PDAs, favoured for their mobility. Clients are generally classified into the following three categories:
Mobile Clients: mobile devices such as PDAs or Smartphones. Examples are the Blackberry, Windows Mobile Smartphones and the iPhone/iPad.
Thin Clients: computers that do not have an internal hard drive; instead, the server does all the work and the client's task is only to display the information. Generally used as terminals.
Thick Clients: regular computers that use a web browser to connect to the Cloud.
ii. Data-centers:
A data-center is a collection of servers on which the processes or applications are hosted. The servers can be physically grouped in a room or building, or can be distributed throughout the world. On virtualized servers, an application is installed in a way that allows multiple instances, so that all the virtual servers can access it. Using this principle, several virtual servers can run on one physical server. The number of virtual servers on a physical server depends upon the type of application, the size and speed of the server, and the service offered by the provider.
iii. Distributed Servers:
As mentioned earlier, servers are often in geographically disparate locations. To end-users, however, the servers act as if they operate right next to each other. This gives flexibility in operations and enhances security and privacy. If any of the servers goes down due to failure or for maintenance, the service provided by the system can still be accessed through the other distributed server(s).
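The failover behaviour described here can be sketched as follows; this is a minimal Python illustration in which the server records and their fields are invented for the example, not taken from any real Cloud API.

```python
# Minimal failover sketch: try each distributed server in turn and
# serve the request from the first one that is up. The server names
# and the up/handled convention are illustrative assumptions.

def serve(request, servers):
    for server in servers:
        if server["up"]:
            return f"{server['name']} handled {request}"
    raise RuntimeError("all servers are down")

servers = [
    {"name": "dc-east", "up": False},  # down for maintenance
    {"name": "dc-west", "up": True},   # picks up the request instead
]
print(serve("GET /data", servers))  # dc-west handled GET /data
```

From the end-user's point of view nothing changes when dc-east is down: the request is simply answered by another replica of the service.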
3.2
Cloud Architectures
Cloud architectures address the difficulties that arise in large-scale data processing. In the traditional approach it is difficult to allocate processing units according to application demand, and it is sometimes difficult to access CPUs according to users' requirements. Job allocation is another problem: it is often difficult to distribute and maintain large-scale jobs on different machines, and a recovery mechanism on another machine must be provided to cope with failures. Scalability is a further issue in the traditional approach: it is difficult to scale up and scale down automatically. Cloud architectures, in contrast to traditional approaches, concentrate on solving these problems [61].
In Cloud Computing, computational resources are provided as services, generally written XaaS and known as X-as-a-Service. In particular, the Cloud is a virtualization of the Grid and of traditional web services. Once Cloud services and platforms have been created, it is possible to give access to a virtual Grid to the companies that request it by creating Guest Virtual Organizations (GVOs) [62]. One possible distinction between Cluster, Grid and Cloud architectures is shown in Figure 3.2.
Figure 3.2: Hierarchical abstraction layers of Cluster, Grid and Cloud Computing
In the rest of the section we discuss various approaches to Cloud architectures and give a brief overview of the underlying layers.
3.2.1
A layered model of Cloud architecture - Cloud ontology
The Cloud ontology is considered as a stack of layers. Each layer consists of one or more Cloud services. Services with the same level of abstraction (determined by their targeted users) belong to the same layer [63]. For example, the Cloud software environment is mainly targeted at programmers or developers, while Cloud applications target end-users. Hence, the Cloud software environment and Cloud applications are classified as different layers.
The ordering of the Cloud stack is important: it determines the work-flow in the Cloud. For example, Cloud applications are composed from Cloud software environments; hence, the application layer occupies the upper position in the Cloud stack. The Cloud ontology is shown in Figure 3.3, which depicts it as a stack of five layers [63]: a) Cloud Application Layer, b) Cloud Software Environment Layer, c) Cloud Software Infrastructure Layer, d) Software Kernel Layer and e) Hardware Layer.
Figure 3.3: Cloud layered architecture: consists of five layers; the figure represents the interdependency between the layers
(a) Cloud Application Layer:
The application layer is the top Cloud layer and the one most visible to end-users. Users can access these services through the Internet by paying the necessary fees. The layer carries computational work from the user's terminal (input) to the processing units (e.g. data centers) where the applications are hosted. The entire procedure is abstracted from the end-users, who receive the outputs of CPU-intensive and memory-intensive large-scale tasks on their local machines.
From the provider's perspective, higher manageability can be achieved. The application is deployed in the provider's infrastructure, not on the client's machine; hence, the provider can maintain or upgrade the system without interrupting users.
This model is generally known as Software as a Service (SaaS). Cloud applications can themselves be composed as services for other Cloud services. Cloud applications can be developed in the Cloud software environments or, sometimes, on Cloud infrastructure components.
(b) Cloud Software Environment Layer:
The layer just below the application layer is the software environment layer. This layer mainly targets developers, who build and deploy software for end-users in the Cloud. Providers at this layer supply a suitable programming-language-level development environment by means of a well-defined and documented API. The API integrates the developers' software and provides the necessary deployment and scalability support. The service provided by this layer is known as Platform as a Service (PaaS).
Developers benefit from building their applications in a Cloud programming environment with support for automatic load balancing, authentication services, e-mail services, etc. Developers can add necessary services to their applications on demand, which makes application development less tedious and minimizes logic faults [63]. Hadoop [64], a Cloud software environment, provides developers with a programming environment (MapReduce, a programming model for data processing on large clusters [65]). Yahoo's Pig [66] is a high-level language which can process very large files in the Hadoop environment. In this way developers can benefit from several services as per necessity.
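To make the MapReduce model concrete, the classic word-count example can be sketched in plain Python. This is only a single-process illustration of the programming model; real Hadoop distributes the map, shuffle and reduce phases across a cluster, and its API differs.

```python
from collections import defaultdict

# Map phase: each input record is turned into (key, value) pairs.
def map_words(line):
    return [(word, 1) for word in line.split()]

# Shuffle phase: group the intermediate values by key.
def shuffle(pairs):
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups

# Reduce phase: combine the values for each key.
def reduce_counts(key, values):
    return key, sum(values)

lines = ["the cloud", "the grid and the cloud"]
intermediate = [pair for line in lines for pair in map_words(line)]
result = dict(reduce_counts(k, v) for k, v in shuffle(intermediate).items())
print(result)  # {'the': 3, 'cloud': 2, 'grid': 1, 'and': 1}
```

The developer only writes the map and reduce functions; the framework takes care of splitting the input, the shuffle, and fault tolerance, which is precisely what makes the model attractive for large clusters.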
(c) Cloud Software Infrastructure Layer:
The software infrastructure layer provides the necessary resources to the higher-level layers. The services offered in this layer are classified into the following subclasses: i. Computational Resources, ii. Data Storage and iii. Communications.
i. Computational Resources:
In this layer, Cloud users obtain computational resources in the form of Virtual Machines (VMs). The service provided is often known as Infrastructure as a Service (IaaS). Virtualization gives the user flexibility in configuring settings; at the same time, it protects the physical infrastructure of the provider's data center [63]. Virtualization is shown in Figure 3.4, where the traditional non-cloud environment runs three different applications, each on its own server, while the Cloud shares the servers between operating systems and applications, which results in fewer servers [67].
Figure 3.4: A non-cloud environment needs three servers, but in the Cloud only two servers are used
IaaS benefits from two types of virtualization technology: paravirtualization and hardware-assisted virtualization. Still, the problem of performance interference between VMs sharing the same cache and TLB hierarchy remains unsolved. Modern multi-core machines used in main servers sometimes create a performance isolation problem. This lack of performance isolation between VMs that share the same physical node is problematic for optimal performance [63]. We will cover more on virtualization in Section 3.4.
ii. Data Storage:
Data storage is another infrastructure resource in this layer. It allows users to store their data on remote storage devices and provides an access mechanism at any time and from anywhere. The service provided by Cloud providers is known as Database as a Service (DaaS). DaaS brings scalability to Cloud applications for both users and developers.
At the most basic level, a Cloud storage system needs only one data server connected to the Internet. A client can access the data by interacting with the database server through a web-based interface. The server may send back files kept by the user or provide functionality to manipulate the data on-line. In practice, however, commercial Cloud storage systems use hundreds of data servers. For server maintenance or repair it is necessary to keep multiple machines available to fulfil users' demands. This creates redundancy, but without this redundancy clients might not be able to access their information at any given time. Often, providers also keep copies of the data on servers running on different power supplies, which ensures that clients can still access and manipulate their data even in case of a power failure [68].
Some examples of data storage systems are: distributed file systems (e.g. the Google File System [69]), replicated relational databases (RDBMS) (e.g. Bayou [70]) and key-value stores (e.g. Dynamo [71]). The RDBMS model puts more focus on the consistency model [72, 73], but pays the cost in availability of data. Key-value stores, on the other hand, give much more importance to the availability of data while loosening the consistency model [63].
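The trade-off between availability and consistency can be illustrated with a toy sketch. The following Python model is our own simplification, not Dynamo's actual replication protocol: writes succeed as long as any replica is reachable, at the price of possibly stale reads.

```python
# Toy replicated key-value store: a write succeeds as long as ANY
# replica is reachable (availability first), so a replica that was
# down can later answer reads with a stale value (relaxed consistency).

class Replica:
    def __init__(self):
        self.data = {}
        self.up = True

class KVStore:
    def __init__(self, n=3):
        self.replicas = [Replica() for _ in range(n)]

    def put(self, key, value):
        live = [r for r in self.replicas if r.up]
        if not live:
            raise RuntimeError("no replica available")
        for r in live:
            r.data[key] = value      # only live replicas see the write

    def get(self, key):
        for r in self.replicas:
            if r.up and key in r.data:
                return r.data[key]   # may return a stale value
        return None

store = KVStore()
store.put("x", 1)
store.replicas[0].up = False   # one replica goes down
store.put("x", 2)              # the write still succeeds (availability)
store.replicas[0].up = True    # it comes back holding the old value
print(store.get("x"))          # 1 -- a stale read: consistency was relaxed
```

A strongly consistent RDBMS would instead refuse the write (or block) until all replicas agree, which is exactly the availability cost mentioned above.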
iii. Communication:
The rate of data transfer is high in a Cloud environment, so communication plays a vital role in providing Quality of Service (QoS) in the Cloud infrastructure. To meet QoS requirements, the concept of Communication as a Service (CaaS) has been introduced, covering network security, dynamic traffic isolation or dedicated bandwidth, guaranteed message delay, communication encryption, network monitoring, etc. [63]. Though CaaS is the least discussed topic in the literature, there are a couple of research publications and articles [74, 75, 76] focusing on the design and architecture of CaaS for providing QoS in communication systems. A practical example of CaaS is Microsoft's Connected Service Framework (CSF) [77]. VoIP telephone systems and instant-messaging software in the Cloud can also use CaaS for better network utilization.
(d) Software Kernel:
The software kernel layer provides software management functionality for the physical servers in the Cloud. Such a software kernel can be implemented as an OS kernel, hypervisor, Virtual Machine Monitor (VMM) and/or as clustering middleware [63]. Grid applications can run in this layer, connected through several clusters of machines. However, due to the lack of virtualization in the Grid, periodic check-pointing and load balancing are somewhat complicated, because jobs are tied to the actual hardware infrastructure, not to the kernel. Two such middleware systems for the Grid are Globus [78] and Condor [79].
(e) Hardware and Firmware:
The bottom layer of the Cloud layered architecture is the fabric layer, i.e. the actual physical hardware and switches which form the so-called backbone of the Cloud [63]. Users of this layer are organizations with massive IT requirements. Providers sometimes offer Hardware as a Service (HaaS). This model helps enterprise clients in that they need not build and maintain large data centers. The services included in HaaS are (but are not limited to) servers, desktops, notebooks, infrastructure components, licensing, etc. [80].
Some technical challenges still exist in implementing HaaS effectively. Efficiency and speed in large-scale systems is a challenging issue. Remote scriptable bootloaders (for example, U-Boot [81]) are one solution for booting systems remotely and deploying applications hosted in distributed data centers. Other challenges in HaaS include data center management, scheduling, power consumption optimization, etc. [63]. In Table 3.1 [63] we provide examples of some existing Cloud systems, classified into the layers of the Cloud ontology.
Cloud Layers                    Example of existing Cloud Solutions
Cloud Application Layer         Google Apps, Salesforce Customer Relation Management (CRM)
Cloud Software Environment      Google App Engine, Salesforce Apex System
Cloud Software Infrastructure   Computational Resources: Amazon EC2, Enomalism Elastic Cloud; Cloud Storage: Amazon S3, EMC Storage Managed Service; Communication: Microsoft Connected Service Framework (CSF)
Software Kernel                 Grid and Cluster Computing Systems (for example: Globus and Condor)
Firmware or Hardware            IBM-Morgan Stanley's Computing Sublease, IBM Kittyhawk Project
Table 3.1: Examples of existing Cloud systems classified into the layers of the Cloud ontology
3.2.2
Cloud Business Model
Cloud Computing provides a service-driven business model [82]. In the Cloud, hardware and platform resources (which are actually provided as services) are available on demand. Each layer discussed in the layered architecture can be offered as a service to the layer above it. In other words, every layer can be considered a consumer of the layer below it.
Figure 3.5: Cloud Computing business model
Cloud services are generally grouped into three categories: a) Infrastructure as a Service (IaaS), b) Platform as a Service (PaaS) and c) Software as a Service (SaaS).
(a) Infrastructure as a Service (IaaS): In IaaS, the customer can deploy his own software on the provider's infrastructure. IaaS provides infrastructural resources (for example: servers, storage systems, networking devices, data center space, etc. [83]) on demand, with the benefit of Virtual Machines (VMs). An organization offering IaaS is known as an IaaS provider. Common examples of IaaS providers include Amazon EC2 [84], GoGrid [85] and 3Tera [86].
(b) Platform as a Service (PaaS): PaaS provides platform-level resources, which may include support for operating systems and software development frameworks [82]. The combination of operating systems and software development frameworks (for example, the LAMP platform: Linux, Apache, MySQL, PHP) ensures the manageability and scalability of the Cloud environment [83]. Microsoft Windows Azure [87], Google App Engine [88] and Force.com [89] are common examples of PaaS providers.
(c) Software as a Service (SaaS): SaaS provides on-demand applications through the Internet. A single instance of the service (one or more pieces of software) runs on the Cloud, and multiple users connected through the Cloud can access it. Customers benefit by saving on equipment investment and software licensing costs; providers, on the other hand, benefit because only a single instance of the software (service) needs to be hosted and maintained. SaaS is offered by Google [90], Microsoft [91] and Rackspace [92], among others.
Figure 3.5 illustrates a typical Cloud business model. Based on the layered architecture of the Cloud, PaaS providers run on top of the services of IaaS providers. In current business markets, however, IaaS and PaaS providers often provide their services jointly (for example, Google and Salesforce) [82]. For that reason, PaaS and IaaS providers are often considered together as infrastructure providers or Cloud providers [93].
We will cover the details of these services in Section 3.3.
3.2.3
Cloud Deployment Model
The Cloud deployment model describes the Cloud deployment scenarios available to a typical organization. The deployment model mainly defines [94]: a) External (or Public) Cloud, b) Internal (or Private) Cloud, c) Hybrid (or Integrated) Cloud and d) Community (or Vertical) Cloud.
Other than adopting traditional Cloud solutions, an organization can implement a Cloud internally, commonly known as a Private Cloud. With a Private Cloud, a business organization can make effective utilization of its computing resources while, at the same time, ensuring the security and privacy of its data. Many analysts suggest that implementing Cloud systems internally inside an organization actually defeats the main objective of the Cloud [94].
The main focus of the traditional Cloud is obtaining computing resources from a network of Cloud service providers on demand, with provision for the dynamic addition or subtraction of capacity. Implementing an internal Cloud means internal capacity. In the traditional (Public) Cloud, end-users need not pay infrastructure costs once they purchase services from the providers; a Private Cloud, by contrast, like an internal data center, incurs depreciation costs. As a matter of fact, some would argue that a Private Cloud is merely the use of internal resources through a highly virtualized hardware and application wrapper [94]. Regardless of this debate, these different types of Cloud, each with its own advantages and drawbacks, are discussed here.
(a) External (Public) Cloud:
This Cloud solution is provided by independent third-party Cloud service providers. The service providers offer their resources as services to all, from the general public to business organizations. Examples of the External (Public) Cloud deployment model are Amazon, Salesforce, Google and other Cloud service providers. The key attributes [94] of this deployment model are: services accessed through the web with a self-service user interface; well-documented user guides, APIs and technical support; Service Level Agreements (SLAs) between clients and providers; availability of multiple virtual machines with various configurations based upon requirements (which includes configuration of processor, memory, operating system, application server, development environments and so on); and provision of different types of Cloud resources: for example, Amazon provides different services targeting different groups of users, such as the Amazon Simple Storage Service (S3) and Amazon SimpleDB for storage, and the Amazon Elastic Compute Cloud (EC2) for computation.
Figure 3.6 shows an example of a Public Cloud. One of the major benefits of the Public Cloud is that it requires no initial investment in infrastructure. There is, however, the criticism that Public Clouds lack control over data, network and security settings, which may hamper their effectiveness in many business organizations [82].
Figure 3.6: External or Public Cloud
(b) Internal (Private) Cloud:
Internal or Private Clouds are designed mainly for a single organization. This type of Cloud can be built and managed by the organization itself or by external providers. The benefits of a Private Cloud include the highest degree of control over performance, reliability, security and privacy. But, as said earlier, Private Clouds are criticized for their similarity to traditional proprietary servers or data centers, and hence they do not provide the benefit of zero up-front capital costs [82]. Figure 3.7 shows an example of a Private Cloud.
Figure 3.7: Internal or Private Cloud
Private vs. Public Cloud Computing: Several distinguishing characteristics [95] of a Private Cloud differentiate it from traditional distributed systems.
Firstly, a Private Cloud differs from Public Clouds in that the infrastructure of a Private Cloud is dedicated solely to a single business enterprise and is not shared with others. Its users may include corporate clients, business partners, intranet vendors or other such groups. Secondly, security credentials are generally stricter in the Private Cloud deployment model. Though a Private Cloud is not inherently more secure than a Public Cloud, an organization that has security issues and risk concerns may adopt tighter security measures.
(c) Hybrid (Integrated) Cloud:
The combination of the Public and Private Cloud models is the Hybrid (or Integrated) Cloud. In this type of deployment model, part of the services run in a Private Cloud while the rest of the services run in a Public Cloud. The Hybrid deployment model provides more adaptability, which makes it more flexible than the Public or Private models. More generally, Hybrid Clouds provide stronger security features and more control over applications and data compared to Public Clouds, while still providing scalability and serving clients' on-demand requests. The complex part, however, is determining the optimum partition, or splitting boundary, between the public and private components [82]. Hence, a Hybrid Cloud requires Cloud integration, and so this model is often known as the Integrated deployment model. Cloud integration and interoperability is one of the major research challenges in the Cloud industry [94]. Some Cloud interfaces and APIs, Cloud integration and interoperability standards, and tools for cross-cloud composition already exist to meet business requirements, but they need to be improved for optimized performance and to meet future demands. Figure 3.8 shows an example of a Hybrid Cloud.
Figure 3.8: Example of Hybrid Cloud
The major attributes [94] of Hybrid Clouds are: a combination of Private (Internal) Cloud and Public (External) Cloud enabled resources; the cost-effectiveness benefits of external third-party Clouds, with mitigation of risks by maintaining an internal Private Cloud for critical processes (and application data); and the integration of externally and internally provided capabilities, which includes the integration of vendor-proprietary APIs with internal interfaces.
3.3
Cloud Services
In the Cloud business model in Section 3.2.2 we gave a brief overview of Cloud services. In this Section, we cover the services in more detail and describe how they are logically connected to each other.
3.3.1
Infrastructure as a Service (IaaS)
IaaS provides computing resources, such as processing or storage, that can be obtained as a service. IaaS providers typically offer virtualized infrastructure as a service, so that end-users need not buy raw hardware infrastructure. The raw hardware resources, such as compute, storage and network resources, are considered the fabric layer. Typically, through virtualization, hardware-level resources are abstracted, encapsulated and exposed to end-users through a standardized interface [59], as shown in Figure 3.9.
Figure 3.9: Correlation between Cloud architecture and Cloud services
IaaS provides resources such as server space, network equipment, memory, CPU cycles, storage space, etc. [68]. Figure 3.10 shows an example of IaaS. The infrastructure can be dynamically scaled up or down, based on the application's demand for resources.
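The scale-up/scale-down behaviour can be sketched as a simple control loop. The thresholds and the interface below are illustrative assumptions made for this discussion, not any particular IaaS provider's API.

```python
# Toy autoscaler: add a VM when the average load per VM is high,
# release one when it is low. The threshold values are arbitrary
# examples chosen for illustration.

def scale(vm_count, total_load, high=0.8, low=0.3, min_vms=1):
    load_per_vm = total_load / vm_count
    if load_per_vm > high:
        vm_count += 1                      # scale up on demand
    elif load_per_vm < low and vm_count > min_vms:
        vm_count -= 1                      # scale down to save cost
    return vm_count

vms = 2
vms = scale(vms, total_load=2.0)  # 2.0 / 2 = 1.0 > 0.8, so add a VM
print(vms)                        # 3
vms = scale(vms, total_load=0.6)  # 0.6 / 3 = 0.2 < 0.3, so release one
print(vms)                        # 2
```

Real IaaS platforms expose this elasticity through provisioning APIs and billing the consumer only for the VM-hours actually used, which is the pay-as-you-go aspect discussed earlier.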
3.3.2
Platform as a Service (PaaS)
Platforms are an abstraction layer between the software applications (SaaS) and the virtualized infrastructure (IaaS). PaaS is targeted at software developers. Developers can write applications based on the specifications of a particular platform without going deeper about