An Overview of Cloud & Distributed Computing - Programming Paradigms
VCV. Rao
Centre for Development of Advanced Computing (C-DAC), Pune University Campus
National Workshop on Big Data Analytics (BiDA2014) at CRRao AIMSCS, University of Hyderabad, August 22-24, 2014,
jointly with CMSD, University of Hyderabad, and the Computer Society of India.
VCV. Rao, Ph.D.
Associate Director / HoD
HPC Frontier Technologies Exploration (HPC-FTE) Group
Centre for Development of Advanced Computing (C-DAC)
C-DAC is an R&D institute of the Department of Electronics and Information Technology (DeitY), Ministry of Communications & Information Technology (MCIT), Government of India.
Distributed and Cloud Computing - Programming & Software Environments – An Overview
Lecture Outline
The following topics will be discussed:
• An overview of distributed computing architectures (HPC systems: High-Performance Computing and High-Throughput Computing; Grid Computing and Utility Computing) and Cloud Computing
• Programming on cloud and distributed computing systems
• Opportunities for applications
References & Source for Presentation
This presentation is based on the author's experience with various research projects on cloud and distributed computing, and on the references given in this presentation.
Source : Text books, research articles, and web sites, as indicated on individual slides and in the references of this presentation.
Courtesy : The authors of the research work in those text books, research articles, and web sites.
C-DAC, in brief
The Centre for Development of Advanced Computing (C-DAC) is the premier R&D organization of the Department of Electronics and Information Technology (DeitY), Ministry of Communications & Information Technology (MCIT), for carrying out R&D in IT, electronics and associated areas.
Set up in 1988, C-DAC has been building multiple generations of supercomputers, starting from PARAM with 1 GF.
Scalable Computing over the Internet
The New Computing Paradigms
Computing Paradigm Distinctions
Centralized Computing
Parallel Computing
Distributed Computing
Cloud Computing
Distributed System Models & Enabling Technologies
Source : Distributed and Cloud Computing Book, From Parallel Processing to the Internet of Things, by Kai Hwang, Geoffrey C. Fox, Jack J. Dongarra, Morgan & Kaufmann, Publishers 2012
Source : References given in the presentation
Scalable Computing over the Internet
High-Performance Computing (HPC)
Supercomputers (massively parallel processors, or MPPs) are gradually being replaced by clusters of cooperative computers, out of a desire to share computing resources.
A cluster is often a collection of homogeneous compute nodes that are physically connected in close range to one another.
Distributed System Models and Enabling Technologies
Scalable Computing over the Internet
High-Throughput Computing (HTC)
Peer-to-peer (P2P) networks are formed for distributed file sharing and content delivery applications.
A shift is taking place from the HPC paradigm to an HTC paradigm (high-flux computing). The main applications of high-flux computing are Internet searches and web services used by a million or more users simultaneously.
Distributed System Models & Enabling Technologies
Distributed Computing & Software Environments
Parallel and Distributed Programming Paradigms
Consider a distributed computing system consisting of a set of networked nodes or workers. To parallelize an application on a distributed or cloud computing system, the following issues should be addressed:
• Partitioning
– Computation partitioning
– Data partitioning
• Mapping
• Synchronization
• Communication
• Scheduling
Source : Distributed and Cloud Computing Book, From Parallel Processing to the Internet of Things, by Kai Hwang, Geoffrey C. Fox, Jack J. Dongarra, Morgan & Kaufmann, Publishers 2012
Source : References given in the presentation
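As an illustration of the data-partitioning issue listed above, the following is a minimal sketch in C, not taken from the slides. It assumes the simplest possible scheme, a block partition of n items over p workers; the names range_t and block_partition are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* Half-open index range [begin, end) owned by one worker. */
typedef struct {
    size_t begin;
    size_t end;
} range_t;

/* Split n items across p workers as evenly as possible.
 * The first (n % p) workers receive one extra item.
 * Assumption: workers are numbered 0 .. p-1. */
range_t block_partition(size_t n, size_t p, size_t worker)
{
    assert(p > 0 && worker < p);

    size_t base = n / p;    /* items every worker gets */
    size_t extra = n % p;   /* leftover items, one each to the first workers */
    range_t r;

    r.begin = worker * base + (worker < extra ? worker : extra);
    r.end = r.begin + base + (worker < extra ? 1 : 0);
    return r;
}
```

For example, with 10 items and 3 workers, worker 0 would own indices 0 to 3, worker 1 indices 4 to 6, and worker 2 indices 7 to 9. Mapping, synchronization, communication and scheduling are not covered by this sketch.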
Shared Pool of Computing – Applications
Shared pool of computing resources: processors, memory, disks.
• Guarantee at least one workstation to many individuals (when active).
• Deliver a large percentage of the collective resources to a few individuals at any one time.
Source : Distributed and Cloud Computing Book, From Parallel Processing to the Internet of Things, by Kai Hwang, Geoffrey C. Fox, Jack J. Dongarra, Morgan & Kaufmann, Publishers 2012
Source : References given in the presentation
Distributed System Models and Enabling Technologies
[Figure: Estimated system availability by system size of common configurations in 2010. Vertical axis: system availability, from low (0) to high (100%). Horizontal axis: system size (number of processor cores), from small to large (10^6). Configurations plotted: SMP, NUMA, MPP, cluster, grid, cloud, P2P network.]
Source : Distributed and Cloud Computing Book, From Parallel Processing to the Internet of Things, by Kai Hwang, Geoffrey C. Fox, Jack J. Dongarra, Morgan & Kaufmann, Publishers 2012
Distributed System Models and Enabling Technologies
[Figure: System scalability versus multiplicity of OS images. One axis shows scalability (number of processors or cores in a system); both axes are logarithmic, from 1 to 10^7. Configurations plotted: SMP, NUMA, cluster, grid, cloud, P2P.]
Source : Distributed and Cloud Computing Book, From Parallel Processing to the Internet of Things, by Kai Hwang, Geoffrey C. Fox, Jack J. Dongarra, Morgan & Kaufmann, Publishers 2012
Grid Services: A Layered Grid Architecture
• Application
• Fabric: "Controlling things locally": access to, and control of, resources
• Connectivity: "Talking to things": communication (Internet protocols) and security
• Resource: "Sharing single resources": negotiating access, controlling use
Classification of Grid applications
All applications that make use of coupled computational resources that are not available at a single site:
• Grid-aware applications
• Multi-disciplinary applications
• Meta applications
SAME
Applications in Grid Computing
Source : References; the Globus CoG home page, http://www.globus.org/cog; Globus, http://www.globus.org; http://www.teragrid.org/; the NASA Information Power Grid home page, http://www.ipg.nasa.gov; http://www.cdac.in/
Utility Computing
Utility computing is the packaging of computing resources, such as computation and storage, as a metered service similar to a traditional public utility (such as electricity, water, natural gas or the telephone network).
A utility computing service is one in which customers receive computing resources (hardware and/or software) from a service provider and "pay by the drink," much as you do for your electric service at home.
• More related to cloud computing
– Applications, storage, computing power and network
• Requires cloud-like infrastructure
• Pay-by-the-drink model
– Similar to electric service at home
• Pay for extra resources when needed
– To handle expected surges in demand
– Unanticipated surges in demand
• Better economics
• Amazon Web Services (AWS)
– Elastic Compute Cloud (EC2)
– Simple Storage Service (S3)
• EMC cloud storage
• Microsoft Azure
• Google App Engine
Source : References; the Globus CoG home page, http://www.globus.org/cog; the NASA Information Power Grid home page, http://www.ipg.nasa.gov; http://www.cdac.in/
Rajkumar Buyya, Kris Bubendorfer (Eds.), Market-Oriented Grid and Utility Computing, John Wiley & Sons, 2009.
Distributed System Models & Enabling Technologies
[Figure: The trend towards utility computing: the vision of computer utilities in modern distributed computing systems. Technology convergence of computing paradigms: web services, data centers, utility computing, service computing, grid computing, P2P computing, cloud computing. HTC in business and HPC in scientific applications. Attributes/capabilities: ubiquitous (reliable and scalable), automatic (dynamic and discovery), composable (QoS, SLA, etc.).]
Source : Distributed and Cloud Computing Book, From Parallel Processing to the Internet of Things, by Kai Hwang, Geoffrey C. Fox, Jack J. Dongarra, Morgan & Kaufmann, Publishers 2012
Source : References given in the presentation
Rajkumar Buyya, Kris Bubendorfer (Eds.), Market-Oriented Grid and Utility Computing, John Wiley & Sons, 2009.
[Figure: Evolutionary trend toward parallel, distributed, and cloud computing with clusters, MPPs, P2P networks, grids, clouds, web services, and the Internet of Things. Diagram labels: HTC systems, HPC systems, P2P network, clusters of MPPs, computational and data grids, Web 2.0 services, Internet clouds, Internet of Things; homogeneous nodes, disparate nodes, file sharing, distributed control, geographically sparse, service-oriented architecture (SOA), centralized control, high speed, disparate clusters, RFID and sensors, virtualization.]
Distributed and Cloud Computing
Source : Distributed and Cloud Computing Book, From Parallel Processing to the Internet of Things, by Kai Hwang, Geoffrey C. Fox, Jack J. Dongarra, Morgan & Kaufmann, Publishers 2012
Source : References given in the presentation
[Figure: Virtualized resources from data centres form an Internet cloud, provisioned with hardware, software, storage, network, and services for paid users to run their applications. Users submit requests to the Internet cloud and receive paid services.]
Cloud Computing over the Internet
Distributed System Models & Enabling Technologies
Source : Distributed and Cloud Computing Book, From Parallel Processing to the Internet of Things, by Kai Hwang, Geoffrey C. Fox, Jack J. Dongarra, Morgan & Kaufmann, Publishers 2012
Source : References given in the presentation
Cloud Computing over the Internet
Distributed System Models & Enabling Technologies
Features of Cloud & Grid Platforms
• Cloud capabilities and platform features
• Traditional features common to grids and clouds
• Data features and databases
• Programming and runtime support
Cloud Platforms – Capabilities
• Should offer cost-effective utility computing with the elasticity to scale up and down in power
• Commercial clouds offer different capabilities:
– Commonly termed "Platform as a Service (PaaS)". Ex : Azure
– Commonly termed "Infrastructure as a Service (IaaS)". Ex : Amazon
Source : Distributed and Cloud Computing Book, From Parallel Processing to the Internet of Things, by Kai Hwang, Geoffrey C. Fox, Jack J. Dongarra, Morgan & Kaufmann, Publishers 2012
Source : References given in the presentation
[Figure: Public, private and hybrid clouds illustrated by functional architecture and connectivity of representative clouds. Diagram labels: public clouds (Microsoft Azure, Amazon AWS, Salesforce Force.com, Google App Engine, IBM Blue Cloud) on the Internet; private cloud (IBM RC2) on an intranet; a hybrid cloud; cloud users. A typical public cloud: platform front end (web service API), cloud service queues, server cluster (VMs), cloud storage, data center; connected to users or other public clouds over the Internet.]
Cloud Computing and Service Models
Public, Private and Hybrid Clouds
Source : Distributed and Cloud Computing Book, From Parallel Processing to the Internet of Things, by Kai Hwang, Geoffrey C. Fox, Jack J. Dongarra, Morgan & Kaufmann, Publishers 2012
Source : References given in the presentation
Standard data-center networking for the cloud to access the Internet.
Data-Center Networking Structure
Cloud Computing and Service Models
Source : Distributed and Cloud Computing Book, From Parallel Processing to the Internet of Things, by Kai Hwang, Geoffrey C. Fox, Jack J. Dongarra, Morgan & Kaufmann, Publishers 2012
[Figure: IT cloud. Diagram labels: service consumers (access services); component vendors / software publishers (publish and update components); component library; cloud administrator (monitor and manage resources); HPC/HTC system infrastructure.]
Cloud Programming & Software Environment
Parallel and Distributed Programming Paradigms
• Parallel computing and programming paradigms
• MapReduce, Twister and iterative MapReduce
• Hadoop library from Apache
• Dryad and DryadLINQ from Microsoft
• Sawzall and Pig Latin high-level languages
• Mapping applications to parallel and distributed systems
Programming Support of Google App Engine
• Programming on Google App Engine
• Google File System (GFS)
• BigTable, Google's NoSQL system
• Chubby, Google's distributed lock service
Programming on Amazon AWS and Microsoft Azure
• Programming on Amazon EC2
• Amazon Simple Storage Service (S3)
• Amazon Elastic Block Store (EBS) and SimpleDB
• Microsoft Azure programming support
Source : Distributed and Cloud Computing Book, From Parallel Processing to the Internet of Things, by Kai Hwang, Geoffrey C. Fox, Jack J. Dongarra, Morgan & Kaufmann, Publishers 2012
Source : References given in the presentation
Technologies for Network-based Systems
Virtual Networking and Virtualization Middleware
Virtual Machines (VMs) - x86 architecture – guest OS
Between the VMs and the host platform, one needs to deploy a middleware layer called a "virtual machine monitor (VMM)".
A native VM is installed with the use of a VMM, called a hypervisor, running in privileged mode.
Distributed System Models & Virtualization
Cloud Computing : Storage Virtualization
[Figure: Physical resources pass through a virtualization layer to give a logical representation: virtual servers, virtual storage, virtual network, virtual applications/middleware, virtual clients.]
Storage virtualization is a technology that makes one set of resources look and feel like another set of resources, preferably with more desirable characteristics.
It is a logical representation of resources that is not constrained by physical limitations:
- Hides some of the complexity
- Adds or integrates new functions with existing services
- Can be nested or applied to multiple layers of a system
Source : Distributed and Cloud Computing Book, From Parallel Processing to the Internet of Things, By Kai Hwang, Geoffrey C. Fox, Jack J. Dongarra, Morgan & Kaufmann, Publishers 2012
Source : References given in the presentation
Programming APIs -Integrating with Schedulers & Resource Utilization –Threading on Multi-core CPUs & GPUs
Cloud Storage and Virtualization
Source : Distributed and Cloud Computing Book, From Parallel Processing to the Internet of Things, By Kai Hwang, Geoffrey C. Fox, Jack J. Dongarra, Morgan & Kaufmann, Publishers 2012
Source : References given in the presentation
Storage
Cloud storage is of two types: object-based and block-based.
• Block storage is like raw disk space on which a file system can be created.
• For both storage types, the back end can be anything: locally attached disks, NAS, DAS, or iSCSI-based or Fibre Channel-based storage boxes.
Use of these storage types in the cloud:
• Object storage is mostly used for holding virtual machine/virtual instance templates and ISOs.
• Object storage can also be used to store application- and user-related files.
• Block storage provides the persistent disk space for virtual machines/virtual instances.
Cloud giants such as Amazon also provide these options: S3 (Simple Storage Service; object-based storage) and Elastic Block Store (block-based storage).
Cloud Storage and Virtualization
Source : Distributed and Cloud Computing Book, From Parallel Processing to the Internet of Things, By Kai Hwang, Geoffrey C. Fox, Jack J. Dongarra, Morgan & Kaufmann, Publishers 2012
Source : References given in the presentation
Storage : open-source cloud middleware (OpenStack and Eucalyptus). For OpenStack-based implementations, refer to these URLs:
High-level overview of object-based (Swift) and block-based (Cinder) storage
Cloud Storage and Virtualization
Source : Distributed and Cloud Computing Book, From Parallel Processing to the Internet of Things, By Kai Hwang, Geoffrey C. Fox, Jack J. Dongarra, Morgan & Kaufmann, Publishers 2012
Source : References given in the presentation
Storage : for Eucalyptus cloud-based implementations, refer to these URLs:
• Overview of object-based storage (Walrus) in Eucalyptus clouds: http://www.eucalyptus.com/blog/2013/03/18/eucalyptus-walrus-storage-considerations
• A few slides on block storage (elastic block storage) in Eucalyptus clouds
• Storage types in Eucalyptus: http://www.eucalyptus.com/blog/2012/10/03/cloud-storage-types
Programming on Cloud & Distributed Computing
Parallel and Distributed Programming Models & Tool Sets
Model | Description | Features
PVM | A software package that permits a heterogeneous collection of serial, parallel and vector computers connected to a network to appear as one large computing resource | Parallel Virtual Machine (PVM) uses the message-passing model to allow programmers to exploit distributed computing across a wide variety of computer types, including MPPs. PVM transparently handles all message routing, data conversion, and task scheduling across a network of incompatible computer architectures.
MPI | A library of subprograms that can be called from C or FORTRAN to write parallel programs running on distributed computer systems | Specifies synchronous or asynchronous point-to-point and collective communication commands and I/O operations in user programs for message-passing execution
MapReduce | A web programming model for scalable data processing on large clusters over large data sets, or in web search operations | The Map function generates a set of intermediate key/value pairs; the Reduce function merges all intermediate values with the same key
Hadoop | A software library to write and run large user applications on vast data sets in business applications (http://hadoop.apache.org/core) | A scalable, economical, efficient, and reliable tool for providing users with easy access to commercial clusters
Motivation : Simplicity of writing parallel programs is an important metric for parallel and distributed programming paradigms. The motivations behind parallel and distributed programming models are:
1. To improve the productivity of programmers
2. To decrease a program's time to market
3. To leverage underlying resources more efficiently
4. To increase system throughput
5. To support higher levels of abstraction
Source : Distributed and Cloud Computing Book, From Parallel Processing to the Internet of Things, by Kai Hwang, Geoffrey C. Fox, Jack J. Dongarra, Morgan & Kaufmann, Publishers 2012
Source : References given in the presentation
Parallel and Distributed Programming Paradigms
Cloud Prog. & Software Environments
MapReduce : The software framework abstracts the data flow of running a parallel program on a distributed computing system by providing users with two interfaces, in the form of two functions: Map and Reduce.
Users can override these two functions to interact with and manipulate the data flow of running their programs.
In the MapReduce framework, the "value" part of the (key, value) data is the actual data, and the "key" part is used only by the MapReduce controller to control the data flow.
Source : Distributed and Cloud Computing Book, From Parallel Processing to the Internet of Things, by Kai Hwang, Geoffrey C. Fox, Jack J. Dongarra, Morgan & Kaufmann, Publishers 2012
Source : References given in the presentation
Parallel and Distributed Programming Paradigms
Cloud Prog. & Software Environments
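The Map and Reduce functions described above can be sketched for the usual word-count example. This is a single-process illustration in C under stated assumptions, not the API of any MapReduce library: kv_t, map_words and reduce_sum are hypothetical names, keys are truncated to a fixed length, and the controller that would group the pairs by key is left out.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>
#include <string.h>

#define MAX_KEY 32

/* One intermediate (key, value) pair. */
typedef struct {
    char key[MAX_KEY];
    int value;
} kv_t;

/* Map: emit (word, 1) for every whitespace-separated word in text.
 * Returns the number of pairs written to out (at most cap). */
size_t map_words(const char *text, kv_t *out, size_t cap)
{
    assert(text != NULL && out != NULL);

    size_t n = 0;
    const char *p = text;
    while (*p != '\0' && n < cap) {
        while (*p != '\0' && isspace((unsigned char)*p)) {
            p++;                              /* skip separators */
        }
        if (*p == '\0') {
            break;
        }
        size_t len = 0;
        while (p[len] != '\0' && !isspace((unsigned char)p[len])) {
            len++;                            /* measure the word */
        }
        size_t copy = len < MAX_KEY - 1 ? len : MAX_KEY - 1;
        memcpy(out[n].key, p, copy);          /* the key is the word itself */
        out[n].key[copy] = '\0';
        out[n].value = 1;                     /* the value is the count 1 */
        n++;
        p += len;
    }
    return n;
}

/* Reduce: merge all values that share the given key into one sum. */
int reduce_sum(const char *key, const kv_t *pairs, size_t n)
{
    assert(key != NULL && pairs != NULL);

    int total = 0;
    for (size_t i = 0; i < n; i++) {
        if (strcmp(pairs[i].key, key) == 0) {
            total += pairs[i].value;
        }
    }
    return total;
}
```

Calling map_words on the text "a b a c a" would yield five pairs, and reduce_sum for the key "a" over those pairs would return 3. In a real framework the user supplies only the two functions; the grouping by key is done by the controller.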
Definition of MapReduce : The MapReduce software framework provides users with an abstraction layer over the data flow and flow of control, and hides the implementation of all data flow steps such as:
• Data partitioning
• Mapping
• Synchronization
• Communication
• Scheduling
Cloud Prog. & Software Environments
Parallel and Distributed Programming Paradigms
Source : Distributed and Cloud Computing Book, From Parallel Processing to the Internet of Things, by Kai Hwang, Geoffrey C. Fox, Jack J. Dongarra, Morgan & Kaufmann, Publishers 2012
Source : References given in the presentation
Traditional parallel computing models : MPI
Recently proposed parallel and distributed programming paradigms, used for information retrieval applications:
1. MapReduce
2. Hadoop
3. Dryad
Source : Distributed and Cloud Computing Book, From Parallel Processing to the Internet of Things, by Kai Hwang, Geoffrey C. Fox, Jack J. Dongarra, Morgan & Kaufmann, Publishers 2012
Source : References given in the presentation
Parallel and Distributed Programming Paradigms
Cloud Prog. & Software Environments
Programming on Cloud and Distributed Computing
Programming Environment
A long history of research efforts in message passing:
• P4
• Chameleon
• Parmacs
• TCGMSG
• CHIMP
• NX (Intel i860, Paragon)
• PVM (Got PVM now! Good for distributed computing)
• MPI (Got MPI now! Good for large multi-processing)
• MPI is a standard; it has a steeper learning curve and does not have a standard way to start tasks. MPICH does have an "mpirun" command.
• If building a new, scalable production code, you should use MPI (widely supported now).
Evaluate the needs of your application, then choose (PVM or MPI).
Source : MPI References
Message Passing Architecture (MPI) Model
Message-passing programming paradigm : processors are connected using a message-passing interconnection network.
[Figure: processors (P), each with its own memory (M), attached to a communication network.]
On most parallel systems, the processes involved in the execution of a parallel program are identified by a sequence of non-negative integers. If there are p processes executing a program, they will have ranks 0, 1, 2, ..., p-1.
Source : MPI References
The Minimal Set of MPI Routines
The MPI library contains over 125 routines, but fully functional message-passing programs can be written using only the following 6 MPI routines:
MPI_Init         Initializes MPI
MPI_Finalize     Terminates MPI
MPI_Comm_size    Determines the number of processes
MPI_Comm_rank    Determines the label of the calling process
MPI_Send         Sends a message
MPI_Recv         Receives a message
All 6 functions return MPI_SUCCESS upon successful completion; otherwise they return an implementation-defined error code.
All MPI routines, data types and constants are prefixed by MPI_.
All of them are defined in mpi.h (for C/C++).
Source : MPI References
Starting and Terminating the MPI Library
#include <mpi.h>

int main(int argc, char *argv[])
{
    MPI_Init(&argc, &argv);    /* start MPI; called by every process */
    /* ... do some work ... */
    MPI_Finalize();            /* shut MPI down; no MPI calls after this */
    return 0;
}
Both MPI_Init and MPI_Finalize must be called by all processes.
The command line should be processed only after MPI_Init.
No MPI function may be called after MPI_Finalize.
Parallel Virtual Machine (PVM) uses the message passing model to allow programmers to exploit distributed computing across a wide variety of computer types, including MPPs.
When a programmer wishes to exploit a collection of networked computers, they may have to contend with several different types of heterogeneity:
• architecture
• data format
• computational speed
• machine load and
• network load.
An Overview Message Passing : Parallel Virtual Machine (PVM)
Source : PVM References
An Overview Message Passing : Parallel Virtual Machine (PVM)
PVM in a Nutshell
• Each host (which could be an MPP or SMP) runs a PVM daemon (PVMD).
• A collection of PVMDs defines a virtual machine.
• Once configured, tasks can be started (spawned), killed, and signaled from a console.
• Basic message passing
• Performance is OK, but API semantics limit optimizations.
MapReduce Actual Data and Control Flow
The main responsibility of the MapReduce framework : the partitioning function
[Figure: Use of the MapReduce partitioning function to link the Map and Reduce workers. Each Map worker applies the partitioning function to divide its output into regions 1, 2, 3, which are passed to the Reduce workers.]
Cloud Prog. & Software Environments
Parallel and Distributed Programming Paradigms
Source : Distributed and Cloud Computing Book, From Parallel Processing to the Internet of Things, By Kai Hwang, Geoffrey C. Fox, Jack J. Dongarra, Morgan & Kaufmann, Publishers 2012
Source : References given in the presentation
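The partitioning function that links Map and Reduce workers can be sketched as a hash of the intermediate key taken modulo the number of Reduce workers. This is an assumption for illustration only: the slides do not say which function is used, and hash_key and partition are hypothetical names.

```c
#include <assert.h>
#include <stddef.h>

/* Simple string hash (djb2 style). Any deterministic hash would do;
 * this particular choice is an assumption, not taken from the slides. */
static unsigned long hash_key(const char *key)
{
    unsigned long h = 5381;
    for (const unsigned char *p = (const unsigned char *)key; *p != '\0'; p++) {
        h = h * 33 + *p;
    }
    return h;
}

/* Map an intermediate key to one of num_reducers regions (0-based).
 * Equal keys always land in the same region, so a single Reduce
 * worker sees every value for a given key. */
size_t partition(const char *key, size_t num_reducers)
{
    assert(key != NULL && num_reducers > 0);
    return (size_t)(hash_key(key) % num_reducers);
}
```

With three Reduce workers, every Map worker would call partition(key, 3) on each intermediate pair and write the pair to the region with the returned index.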
[Figure: MapReduce framework. Input data flows through the Map and Reduce functions to generate the output result, under the control flow of the MapReduce software library. Special user interfaces are used to access the Map and Reduce resources. Diagram labels: input files, Map function, Reduce function, output files, flow of data, control flow, MapReduce library, controller, MapReduce software framework, abstraction layer, user interfaces.]
Parallel and Distributed Programming Paradigms
MapReduce, Twister, and Iterative MapReduce
Source : Distributed and Cloud Computing Book, From Parallel Processing to the Internet of Things, by Kai Hwang, Geoffrey C. Fox, Jack J. Dongarra, Morgan & Kaufmann, Publishers 2012
Source : References given in the presentation
Cloud Prog. & Software Environments
[Figure: Data flow implementation of many functions in the Map workers and in the Reduce workers through multiple sequences of partitioning, combining, synchronization and communication, sorting and grouping, and reduce. Stages shown, in order: input file partitioning (input splits), Map function (M), combiner (C, sort and group), partitioning function (regions R1 and R2), synchronization and communication, sorting and grouping, Reduce (R), output file. The intermediate data are key:value pairs with keys k1 to k5.]
Source: Distributed and Cloud Computing: From Parallel Processing to the Internet of Things, by Kai Hwang, Geoffrey C. Fox, and Jack J. Dongarra, Morgan Kaufmann Publishers, 2012
Source: References given in the presentation
[Figure: MapReduce control flow. Elements shown: the user program, the master (controller), worker copies of the user program, the input files (Split 1, Split 2, Split 3), and the output files (File 1). Labeled steps: (2) Fork, (3) Assign map task / Assign reduce task, (4) Read, (5) Read, (9) Communications, (11) Reduce, (12) Write.]
Control flow implementation of the MapReduce functionalities in Map workers and Reduce workers (running user programs) from input files to the output files under the control of the master user program.
MapReduce Actual Data and Control Flow :
Cloud Prog. & Software Environments
Parallel and Distributed Programming Paradigms
Source: Distributed and Cloud Computing: From Parallel Processing to the Internet of Things, by Kai Hwang, Geoffrey C. Fox, and Jack J. Dongarra, Morgan Kaufmann Publishers, 2012
Source: References given in the presentation
The Main Responsibility of the MapReduce Framework :
Summary of the following steps :
How to describe the map workers?
MapReduce Actual Data and Control Flow :
Reading the Input data (data distribution)
• Each Map worker reads its corresponding portion of the input data, namely the input data split, and sends it to its Map function.
• A map worker may run more than one Map function, which means it has been assigned more than one input data split; usually, however, each worker is assigned only one input split.
Cloud Prog. & Software Environments
Parallel and Distributed Programming Paradigms
The input data to the Map function is in the form of a (key, value) pair.
The output data from the Map function is structured as (key, value) pairs called intermediate (key, value) pairs.
The user-defined Map function processes each input (key, value) pair and produces a number of (zero, one, or more) intermediate (key, value) pairs.
The goal is to process all input (key, value) pairs to the Map functions in parallel.
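A minimal Python sketch of such a user-defined Map function follows. Word counting is an assumed example workload, and the name `map_fn` and the input format (line number as key, line text as value) are illustrative choices, not part of the source.

```python
def map_fn(key, value):
    """Map: emit one intermediate (word, 1) pair per word in the input value.

    An empty line emits zero pairs, a one-word line emits one pair,
    and a longer line emits several pairs.
    """
    for word in value.split():
        yield (word.lower(), 1)


# Input (key, value) pairs: key = line number, value = line text (assumed format).
input_pairs = [(0, "the quick fox"), (1, ""), (2, "the end")]

# Each call is independent of the others, which is why the calls can run in parallel.
intermediate = [pair for k, v in input_pairs for pair in map_fn(k, v)]
```

Here the calls are run one after another for simplicity; since no call depends on another, a framework is free to distribute them across map workers.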
MapReduce Logical Dataflow
Source: Distributed and Cloud Computing: From Parallel Processing to the Internet of Things, by Kai Hwang, Geoffrey C. Fox, and Jack J. Dongarra, Morgan Kaufmann Publishers, 2012
Source: References given in the presentation
Cloud Prog. & Software Environments
Parallel and Distributed Programming Paradigms
The Reduce function receives the intermediate (key, value)
pairs in the form of a group of intermediate values associated
with one intermediate key (key, [set of values] ).
The MapReduce framework forms these groups by first
sorting the intermediate (key, value) pairs and then grouping
values with the same key.
Data is sorted to simplify the grouping process.
The Reduce function processes each (key, [set of
values] ) group and produces a set of (key, value) pairs as
output.
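The sort-and-group step and the Reduce function described above can be sketched as follows. This is a minimal single-process illustration; the names `sort_and_group` and `reduce_fn` are hypothetical, and summation is an assumed reduce operation.

```python
from itertools import groupby
from operator import itemgetter


def sort_and_group(pairs):
    """Sort intermediate pairs by key, then collect the values of each key.

    Sorting first puts equal keys next to each other, so grouping is a
    single linear pass (this is why the data is sorted).
    """
    ordered = sorted(pairs, key=itemgetter(0))
    return [(key, [value for _, value in group])
            for key, group in groupby(ordered, key=itemgetter(0))]


def reduce_fn(key, values):
    """Reduce (assumed to sum): turn (key, [values]) into output (key, value) pairs."""
    yield (key, sum(values))


intermediate = [("k1", 1), ("k2", 1), ("k1", 1), ("k3", 1), ("k1", 1)]
groups = sort_and_group(intermediate)
output = [pair for key, values in groups for pair in reduce_fn(key, values)]
```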
MapReduce Logical Dataflow
Cloud Prog. & Software Environments
Parallel and Distributed Programming Paradigms
The Main Responsibility of the MapReduce Framework : Run a user’s program on a distributed computing system. To do so, the MapReduce framework handles all of the following:
MapReduce Actual Data and Control Flow :
• Data partitioning
• Mapping
• Synchronization
• Communication
• Scheduling
The details of these data flows must be understood in order to address overheads and performance issues from a scalability point of view.
Cloud Prog. & Software Environments
Parallel and Distributed Programming Paradigms
Source: Distributed and Cloud Computing: From Parallel Processing to the Internet of Things, by Kai Hwang, Geoffrey C. Fox, and Jack J. Dongarra, Morgan Kaufmann Publishers, 2012
Source: References given in the presentation
The Main Responsibility of the MapReduce Framework :
Summary of the following steps :
MapReduce Actual Data and Control Flow :
1. Data Partitioning : The MapReduce library splits the input
data (files) already stored in GFS into M pieces that also
correspond to the number of map tasks.
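A minimal Python sketch of splitting input into M pieces follows. It assumes an in-memory list of records stands in for files stored in GFS and that the pieces should be contiguous and nearly equal in size; the function name `split_input` is hypothetical.

```python
def split_input(records, m):
    """Split records into m contiguous pieces whose sizes differ by at most one."""
    if m <= 0:
        raise ValueError("m must be positive")
    base, extra = divmod(len(records), m)
    splits, start = [], 0
    for i in range(m):
        size = base + (1 if i < extra else 0)  # the first `extra` pieces get one more
        splits.append(records[start:start + size])
        start += size
    return splits


records = [f"line-{i}" for i in range(10)]
splits = split_input(records, m=3)  # M = 3 pieces, hence 3 map tasks
```

Each of the M pieces would then be handed to one map task.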
Cloud Prog. & Software Environments
Parallel and Distributed Programming Paradigms
Source: Distributed and Cloud Computing: From Parallel Processing to the Internet of Things, by Kai Hwang, Geoffrey C. Fox, and Jack J. Dongarra, Morgan Kaufmann Publishers, 2012
Source: References given in the presentation
The Main Responsibility of the MapReduce Framework :
Summary of the following steps :
MapReduce Actual Data and Control Flow :
2. Computation Partitioning : This is implicitly handled (in
the MapReduce framework) by obliging users to write
their programs in the form of the Map and Reduce
functions. Therefore, the MapReduce library only
generates copies of a user program (e.g., by a fork system
call) containing the Map and Reduce functions, distributes
them, and starts them up on a number of available
computation engines.
Cloud Prog. & Software Environments
Parallel and Distributed Programming Paradigms
Source: Distributed and Cloud Computing: From Parallel Processing to the Internet of Things, by Kai Hwang, Geoffrey C. Fox, and Jack J. Dongarra, Morgan Kaufmann Publishers, 2012
Source: References given in the presentation
The Main Responsibility of the MapReduce Framework :
Summary of the following steps :
MapReduce Actual Data and Control Flow :
3. Determining the master and workers : The MapReduce
architecture is based on a master-worker model.
Therefore, one of the copies of the user program
becomes the master and the rest become workers. The
master picks idle workers, and assigns the Map and
Reduce tasks to them.
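The master-worker model can be sketched with Python threads. This is an illustration under stated assumptions, not the MapReduce library's mechanism: threads stand in for the forked copies of the user program, a shared queue stands in for the master's assignment of tasks to idle workers, and the tasks are independent of each other. The name `run_master` is hypothetical.

```python
import queue
import threading


def run_master(tasks, num_workers):
    """Master: queue all tasks; each idle worker takes the next one until none remain."""
    task_queue = queue.Queue()
    for task in tasks:
        task_queue.put(task)

    results = {}
    lock = threading.Lock()

    def worker(worker_id):
        while True:
            try:
                task_id, fn, arg = task_queue.get_nowait()
            except queue.Empty:
                return  # no tasks left: this worker stops
            value = fn(arg)
            with lock:
                results[task_id] = (worker_id, value)

    threads = [threading.Thread(target=worker, args=(i,)) for i in range(num_workers)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return results


# Each task: (task id, function to run, argument). The tasks are illustrative only.
tasks = [("map-0", len, "abc"), ("map-1", len, "de"), ("reduce-0", sum, [3, 2])]
results = run_master(tasks, num_workers=2)
values = {task_id: value for task_id, (_, value) in results.items()}
```

Which worker runs which task varies from run to run; only the computed values are deterministic.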
Cloud Prog. & Software Environments
Parallel and Distributed Programming Paradigms
Source: Distributed and Cloud Computing: From Parallel Processing to the Internet of Things, by Kai Hwang, Geoffrey C. Fox, and Jack J. Dongarra, Morgan Kaufmann Publishers, 2012
Source: References given in the presentation
The Main Responsibility of the MapReduce Framework :
Summary of the following steps :
MapReduce Actual Data and Control Flow :
3. Determining the master and workers : The MapReduce
architecture is based on a master-worker model.
What is a computation engine?
A map/reduce worker is typically a computation engine, such as a cluster node, that runs map/reduce tasks by executing Map/Reduce functions.
Cloud Prog. & Software Environments
Parallel and Distributed Programming Paradigms
Source: Distributed and Cloud Computing: From Parallel Processing to the Internet of Things, by Kai Hwang, Geoffrey C. Fox, and Jack J. Dongarra, Morgan Kaufmann Publishers, 2012
Source: References given in the presentation
Programming on Cloud and
Distributed Computing
Application Challenges
Application-oriented Opportunities Through
Artificial Intelligence Algorithms
High Performance Comp. for massive graphs
Streaming
Visualization of large Data Sets
Heterogeneous Systems under CLOUD Comp.
• Combined use of Many-core (PARAM YUVA-II) (A
combination of Co-processors & GPU Accelerators)
• Power-aware Computing – Energy Efficient
Where are Opportunities ?
Massive Data Computing - BIG DATA ANALYTICS
Distributed Computing - Prog. Paradigms
Graphs are pervasive in large-scale data analysis
Astro-Physics
• Problem: Outlier detection of observations
• Challenges: Massive data sets; variation over time
• Graph problems: Data mining algorithms (association and clusters); data storage handling
Bio-Informatics
• Problem: Drug design / protein / sequence analysis
• Challenges: Massive data sets; data heterogeneity
• Graph problems: Data mining algorithms (clustering); fast query searching algorithms
Bio-Informatics
• Problem: Discover emergent communities; real-time information spread and knowledge recovery
• Challenges: Massive data sets; data heterogeneity; new analytics
• Graph problems: Data mining algorithms (clustering); shortest path; small/large queries; data movement; irregular access
• Graphs are pervasive throughout large-scale data analysis
• Sources of massive data: petascale simulations, the Internet, scientific applications
• Challenges: data size, heterogeneity, uncertainty, data quality
• Discover emergent communities, model the spread of information
• Source: References & David Bader’s talk on Massive Scale Graph Analytics at HiPC 2012, India (Georgia Tech College of Computing)
[Figure: Ocean and Land Temperature (Jan 1982). Gridded global data for SST, precipitation, NPP, and pressure, indexed by longitude, latitude, and time; grid cells are grouped into zones.]
A key interest is finding connections between the ocean and the land; this requires large computing power.
Research Goals: Data Mining
• Find global climate patterns of interest to Earth Scientists
• Parallel Data Mining
• Global snapshots of values for a number of variables on land surfaces or water.
Application: Data Mining – Large Scale Weather Simulation
Social Networks Analysis & Pattern Recognition
Massive Data Computing - BIG DATA ANALYTICS
[Figure: Many Core - “Killer Apps of Tomorrow”: Parallel Alg. Design (Source: Intel). Labels in the figure include application areas (computer vision, physical simulation, (financial) analytics, data mining, rendering, machine learning, media synthesis); workloads (body tracking, face detection, CFD, face/cloth/rigid body, portfolio management, option pricing, cluster/classify, text index, global illumination, collision detection); methods (PDE, NLP, FIMI, level set, particle filtering, SVM classification, SVM training, IPM (LP, QP), fast marching method, K-means, Monte Carlo, LCP, filter/transform, non-convex method); and common kernels: basic matrix primitives (dense/sparse, structured/unstructured), basic iterative solvers (Jacobi, GS, SOR), direct solver (Cholesky), Krylov iterative solvers (PCG), and basic geometry primitives (partitioning structures, primitive tests).]
Where are Opportunities ?
Massive Data Computing - BIG DATA ANALYTICS
Distributed Computing - Prog. Paradigms
Opportunity 1 : Use of Massive Graph Computations in High Performance Computing
• Interactions: vertices (number); edges (type of interaction); variation over time; no data re-use
• Non-locality of data: distributed cache-based systems
• Models based on massive multithreading may be required
• Data movement, storage, and energy efficiency
Focused research is required to design and implement a new class of algorithms based on graph computations
• Source: References & David Bader’s talk on Massive Scale Graph Analytics at HiPC 2012, India (Georgia Tech College of Computing)
Massive Data Computing - BIG DATA ANALYTICS
Distributed Computing - Prog. Paradigms
Opportunity 1 : Use of Massive Graph Computations in High Performance Computing
O(Billion) vertices, O(Trillion) edges, 1 Million updates/sec
Challenge
Maintain analytics and update them quickly
• Connected Components
• Agglomerative Clustering
Problem :
• Irregular execution
• Bandwidth and latency bound
• Irregular memory access
• Low-to-no reuse
• Source: References & David Bader’s talk on Massive Scale Graph Analytics at HiPC 2012, India (Georgia Tech College of Computing)
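As a small illustration of maintaining an analytic under a stream of updates, the sketch below keeps connected components current as edges are inserted, using a union-find (disjoint-set) structure. This is a textbook technique chosen for illustration, not the method presented in the cited talk; it handles edge insertions only, and the class name `UnionFind` and its methods are hypothetical.

```python
class UnionFind:
    """Connected components maintained incrementally under edge insertions."""

    def __init__(self):
        self.parent = {}
        self.size = {}
        self.components = 0

    def add(self, v):
        """Register vertex v as its own component if it is new."""
        if v not in self.parent:
            self.parent[v] = v
            self.size[v] = 1
            self.components += 1

    def find(self, v):
        """Return the representative of v's component, compressing the path."""
        root = v
        while self.parent[root] != root:
            root = self.parent[root]
        while self.parent[v] != root:
            self.parent[v], v = root, self.parent[v]
        return root

    def insert_edge(self, u, v):
        """Process one streamed edge: merge the two components if they differ."""
        self.add(u)
        self.add(v)
        ru, rv = self.find(u), self.find(v)
        if ru == rv:
            return
        if self.size[ru] < self.size[rv]:
            ru, rv = rv, ru  # attach the smaller tree under the larger one
        self.parent[rv] = ru
        self.size[ru] += self.size[rv]
        self.components -= 1


uf = UnionFind()
for u, v in [(1, 2), (3, 4), (2, 3), (5, 6)]:
    uf.insert_edge(u, v)
```

The pointer chasing in `find` is an example of the irregular, low-reuse memory access listed above: which entries are touched depends entirely on the data.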
Opportunity 2 : Use of Graph Databases
• Efficient data structures for graphs with a few billion vertices (for example, Facebook). A graph database contains nodes and relationships:
– Nodes contain properties (key-value pairs)
– Relationships are named, directed, and always have a start node and an end node
– Relationships can also contain properties
• Data movement, storage, and energy efficiency
Focused research is required to implement graph-based computations (FOCUS on GRAPH DATABASES)
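The node-and-relationship model described above can be sketched with plain Python data classes. This is an in-memory illustration only, not the API of any graph database product; the names `Node`, `Relationship`, `PropertyGraph`, and the sample data are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    node_id: int
    properties: dict = field(default_factory=dict)  # key-value pairs


@dataclass
class Relationship:
    rel_name: str   # relationships are named
    start: Node     # and directed: from a start node ...
    end: Node       # ... to an end node
    properties: dict = field(default_factory=dict)  # relationships can carry properties


class PropertyGraph:
    def __init__(self):
        self.nodes = {}
        self.relationships = []

    def add_node(self, node_id, **properties):
        node = Node(node_id, properties)
        self.nodes[node_id] = node
        return node

    def relate(self, start_id, rel_name, end_id, **properties):
        rel = Relationship(rel_name, self.nodes[start_id], self.nodes[end_id], properties)
        self.relationships.append(rel)
        return rel

    def outgoing(self, node_id, rel_name=None):
        """Relationships that start at node_id, optionally filtered by name."""
        return [r for r in self.relationships
                if r.start.node_id == node_id
                and (rel_name is None or r.rel_name == rel_name)]


g = PropertyGraph()
g.add_node(1, name="Asha")
g.add_node(2, name="Ravi")
rel = g.relate(1, "FRIEND_OF", 2, since=2012)
```

The linear scan in `outgoing` is acceptable for a sketch; at the scale of billions of vertices, an adjacency index per node would be needed, which is where the efficient data structures mentioned above come in.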
Where are Opportunities ?
Massive Data Computing - BIG DATA ANALYTICS
Distributed Computing - Prog. Paradigms
• Source: References & David Bader’s talk on Massive Scale Graph Analytics at HiPC 2012, India (Georgia Tech College of Computing)
Going Beyond Social Networks
Graphs are suitable for
capturing arbitrary relations
between the various
elements.
Data Instance → Graph Instance
Element → Vertex
Element’s Attributes → Vertex Label
Relation Between Two Elements → Edge
Type Of Relation → Edge Label
Relation between a Set of Elements → Hyper Edge
Provide enormous flexibility for modeling the underlying data as they allow
the modeler to decide on what the elements should be and the type of
Graph Classification Approach
Graph Databases
1. Discover Frequent Sub-graphs
2. Select Discriminating Features
3. Transform Graphs in Feature Representation
4. Learn a Classification Model
• Graphs with multi-dimensional labels
• Stream graphs
– phone-network connections
• Hypergraphs
– compact representation of set relations
• Benchmarks and real-life test cases!
Modeling Data with Graphs
Source : References & Nishith-Pathak, Analyzing Information Flow in Social Networks: Communities, Topics, Cognition and Influence, Doctor of Philosophy Thesis, Department of Computer Science, University of Minnesota, Minneapolis, March 2012
Opportunity 3: Quick Visualization of Data –
New tech. for Massive Graph Data
Scientific Computing – Billions of vertices with multiple weights proportional to floating-point data
Graph Analytics – for discrete data – a million to a few billion vertices
Use of Data Mining & Artificial Intelligence Alg.
Context-based or situation-aware computing – Co-processors & Accelerators
Where are Opportunities ?
Massive Data Computing - BIG DATA ANALYTICS
Distributed Computing - Prog. Paradigms
• Source: References & David Bader’s talk on Massive Scale Graph Analytics at HiPC 2012, India (Georgia Tech College of Computing)
Opportunity 4: Combining a Heterogeneous Cloud Computing framework with Many Core Systems
Solving Data analysis
Combination of Grid Computing & Cloud Computing
Combination of Hadoop & Parallel Programming for
Massively Multi-threading for SNA & BIG DATA
New Algorithms for Storage APIs for Data Science
Where are Opportunities ?
Massive Data Computing - BIG DATA ANALYTICS
Distributed Computing - Prog. Paradigms
• Source: References & David Bader’s talk on Massive Scale Graph Analytics at HiPC 2012, India (Georgia Tech College of Computing)
Opportunity 5: Energy-Efficient Computing
Power-aware computing methodologies (calculate power in milliwatts and performance of application kernels)
Power consumption in milliwatts: understand floating-point-intensive or memory-intensive computing
Power consumption in milliwatts: movement of data across different levels of storage
New algorithms with lower power consumption for SNA and BIG DATA
Where are Opportunities ?
Massive Data Computing - BIG DATA ANALYTICS
Distributed Computing - Prog. Paradigms
• Source: References & David Bader’s talk on Massive Scale Graph Analytics at HiPC 2012, India (Georgia Tech College of Computing)
Graph Algorithms - Large Scale Databases -
Graph - Data Mining – Association & Clustering
Use Graph Databases (Irregular Computations / Irregular Memory Access)
BIG Data Analytics (Scientific Comp. / Non-Scientific Comp.) – Data Parallelism Approach
To work on each Opportunity
What is Required ?
Massive Data Computing - BIG DATA ANALYTICS
Distributed Computing - Prog. Paradigms
• Source: References & David Bader’s talk on Massive Scale Graph Analytics at HiPC 2012, India (Georgia Tech College of Computing)
Storage – Data Access in real time & Data
Management - Unstructured /Irregular Data
Storage – HPC Storage (HDF5, Berkeley DB,
PLFS, netCDF, MPI-IO, SWIFT)
Hadoop implementation of MapReduce
To work on each Opportunity
What is Required ?
Massive Data Computing - BIG DATA ANALYTICS
Distributed Computing - Prog. Paradigms
• Source: References & David Bader’s talk on Massive Scale Graph Analytics at HiPC 2012, India (Georgia Tech College of Computing)
An Overview of Distributed Computing and Cloud
Computing
Programming on Cloud and Distributed Computing
Systems
Opportunities for Applications on Distributed Computing
Conclusions
Cloud and Distributed System Technologies
1. K. Hwang, G. C. Fox, J. J. Dongarra, Distributed and Cloud Computing: From Parallel Processing to the Internet of Things, Morgan Kaufmann Publishers, 2012.
2. Amazon EC2 and S3, Elastic Compute Cloud (EC2) and Simple Scalable Storage (S3), http://en.wikipedia.org/wiki/Amazon_Elastic_Compute_Cloud and http://spatten_presentations.s3.amazonaws.com/s3-pm-rails.pdf
3. F. Berman, G. Fox, T. Hey (Eds.), Grid Computing, Wiley, 2003.
4. K. Birman, Reliable Distributed Systems: Technologies, Web Services, and Applications, Springer-Verlag, 2005.
5. G. Boss, et al., Cloud Computing-The BlueCloud Project, www.ibm.com/developerworks/webspehere/zones/hipods/, October 2007.
6. J. Dongarra, et al. (Eds.), Source Book of Parallel Computing, Morgan Kaufmann, San Francisco, 2003.
7. V. K. Garg, Elements of Distributed Computing, Wiley-IEEE Press, 2002.
8. K. Hwang, Advanced Computer Architecture: Parallelism, Scalability, Programmability, McGraw-Hill, 1993.
9. K. Hwang, Z. Xu, Scalable Parallel Computing, McGraw-Hill, 1998.
10. NVIDIA Corp., Kepler: NVIDIA’s Next-Generation CUDA Compute Architecture, White paper, 2013.
11. I. Taylor, From P2P to Web Services and Grids, Springer-Verlag, London, 2005.
12. Twister, Open Source Software for Iterative MapReduce, http://www.iterativemapreduce.org/.
13. Wikipedia, Internet of Things, http://en.wikipedia.org/wiki/Internet_of_Things, June 2010.
14. Wikipedia, CUDA, http://en.wikipedia.org/wiki/CUDA, March 2011.
15. Wikipedia, TOP500, http://en.wikipedia.org/wiki/TOP500, February 2011.
15. D. Bader, R. Pennington, Cluster computing applications, Int. J. High Perform. Comput. (May) (2001).
16. R. Buyya (Ed.), High-Performance Cluster Computing, Vols. 1 and 2, Prentice Hall, New Jersey, 1999.
17. J. Dongarra, Survey of present and future supercomputer architectures and their interconnects, in: International Supercomputer Conference, Heidelberg, Germany, 2004.
18. Wikipedia, CUDA, http://en.wikipedia.org/wiki/CUDA, 2011 (accessed 19.02.2011).
19. R. Buyya, J. Broberg, A. Goscinski (Eds.), Cloud Computing: Principles and Paradigms, Wiley Press, New York, 2011.
20. K. Hwang, D. Li, Trusted cloud computing with secure resources and data coloring, IEEE Internet Comput. (September/October) (2010) 30-39.
21. D. Meyer, et al., Parallax: Virtual disks for virtual machines, in: Proceedings of EuroSys, 2008.
22. L. Shi, H. Chen, J. Sun, vCUDA: GPU accelerated high performance computing in virtual machines, in: Proceedings of the IEEE International Symposium on Parallel and Distributed Processing, 2009.
23. J. Smith, R. Nair, The architecture of virtual machines, IEEE Comput. (May) (2005).
24. J. Smith, R. Nair, Virtual Machines: Versatile Platforms for Systems and Processes, Morgan Kaufmann, 2006.
25. R. Uhlig, et al., Intel virtualization technology, IEEE Comput. (May) (2005).
26. VMware (white paper), Understanding Full Virtualization, Paravirtualization, and Hardware Assist, www.vmware.com/files/pdf/VMware_paravirtualization.pdf.
27. J. Walters, et al., A comparison of virtualization technologies for HPC, in: Proceedings of Advanced Information Networking and Applications (AINA), 2008.
28. K. Aberer, Z. Despotovic, Managing trust in a peer-to-peer information system, in: ACM CIKM International Conference on Information and Knowledge Management, 2001.
29. Amazon EC2 and S3, Elastic Compute Cloud (EC2) and Simple Scalable Storage (S3).
30. M. Armbrust, A. Fox, R. Griffith, et al., Above the Clouds: A Berkeley View of Cloud Computing, Technical Report No. UCB/EECS-2009-28, University of California at Berkeley, 10 February 2009.
31. I. Arutyun, et al., Open Cirrus: a global cloud computing testbed, IEEE Comput. Mag. (2010) 35-43.
32. G. Boss, P. Malladi, et al., Cloud Computing: the BlueCloud project, www.ibn.com/developerworks/websphere/zones/hipods/, 2007.
33. R. Buyya, C. S. Yeo, S. Venugopal, Market-oriented cloud computing: vision, hype, and reality for delivering IT services as computing utilities, in: Proceedings of the 10th IEEE International Conference on High Performance Computing and Communications (HPCC), Dalian, China, 25-27 September 2008.
34. I. Foster, The grid: computing without bounds, Sci. Am. 288 (4) (2003) 78-85.
35. V. Jinesh, Cloud Architectures, White paper, Amazon, http://aws.amazon.com/about-aws/whats-new/2008/07/16/cloud-architectures-white-paper/.
36. W. Norman, M. Paton, T. de Aragao, et al., Optimizing utility in cloud computing through autonomous workload execution, in: Bulletin of the IEEE Computer Society Technical Committee on Data Engineering, 2009.
37. S. Roschke, F. Cheng, C. Meinel, Intrusion detection in the cloud, in: IEEE International Conference on Dependable, Autonomic, and Secure Computing (DASC 09), 13 December 2009.
38. J. Rittinghouse, J. Ransome, Cloud Computing: Implementation, Management, and Security, CRC Publishers, 2010.
39. Salesforce.com, http://en/wikipedia.org/wikiSalesforce.com/, 2010.
40. VMware, Inc., Migrating Virtual Machines with Zero Downtime, www.vmware.com/, 2010 (accessed 07).
41. Wikipedia, Cloud computing, http://en.wikipedia.org/wiki/Cloud_computing, 2010 (accessed 26.01.10).
42. Wikipedia, Data center, http://en.wikipedia.org/wiki/Data_center, 2010 (accessed 26.01.10).
43. M. Hadley, Web Application Description Language (WADL), W3C Member Submission, www.w3.org/Submission/wadl/, 2009 (accessed 18.10.10).
44. I. Foster, H. Kishimoto, A. Savva, et al., The open grid services architecture version 1.5, Open Grid Forum, GFD.80, www.ogf.org/documents/GFD.80.pdf, 2006.
45. G. Fox, D. Gannon, A Survey of the Role and Use of Web Services and Service Oriented Architectures in Scientific/Technical Grids, http://grids.ucs.indiana.edu/ptliupages/publications/ReviewofServices and Workflow-IU-Aug2006B.pdf, 2006 (accessed 16.10.10).
46. Apache ActiveMQ open source messaging system, http://activemq-apache.org/.
47. NaradaBrokering open source content distribution infrastructure, ww.naradabrokering.org/.
48. Amazon Simple Queue Service (Amazon SQS), http://aws.amazon.com/sqs/.
49. Microsoft Azure Queues, http://msdn.microsoft.com/en-us/windowsazure/ff635854.aspx.
50. N. Wilkins-Diehr, Special issue: Science gateways – common community interfaces to grid resources, Concurr. Comput. Pract. Exper. 19 (6) (2007) 743-749.
51. D. Thain, T. Tannenbaum, M. Livny, Distributed computing in practice: the Condor experience, Concurr. Comput. Pract. Exper. 17 (2-4) (2005) 323-356.
52. Open Grid Computing Environments web site, www.collab-pgce.org (accessed 18.10.10).
53. E. Deelman, D. Gannon, M. Shields, I. Taylor, Workflows and e-Science: an overview of workflow system features and capabilities, Future Generation Comp. Syst. 25 (5) (2009) 528-540, doi: http://dx.doi.org/10.1016/j.future.2008.06.012
54. R. Allen, Workflow: An Introduction, Workflow Handbook, Workflow Management Coalition, 2001.
55. Kepler Open Source Scientific Workflow System, http://kepler-project.org.
56. H. Gadgil, G. Fox, S. Pallickara, M. Pierce, Managing grid messaging middleware, in: Challenges of Large Applications in Distributed Environments (CLADE), 2006, pp. 83-91.
57. Condor home page, www.cs.wisc.edu/condor/.
58. Microsoft, Project Trident: A Scientific Workflow Workbench, http//research.microsoft.com/en-us/collaboration/tools/trident.aspx, 2010.
59. Zend PHP Company, The Simple Cloud API for Storage, Queues and Table, http://www.simplecloug.org/ home, 2010.
60. NOSQL Movement, Wikipedia list of resources, http://en.wikipedia.org/wiki/NosQL, 2010.
61. NoSQL Link Archive, LIST OF NOSQL DATABASES, http://nosql-database.org/, 2010.
62. Amazon, Welcome to Amazon SimpleDB, http://docs.amazonwebservices.com/AmazonSimpleDB/latest/DevelperGuide/index.html, 2010.
63. Apache, The CouchDB document-oriented database project, http://couchdb.apache.org/, 2010.
64. Pig! Platform for analyzing large data sets, http://hadoop.apache.org/pig/, 2010.
65. A. Grama, G. Karypis, V. Kumar, A. Gupta, Introduction to Parallel Computing, second ed., Addison Wesley, 2003.
66. S. Hariri, M. Parashar, Tools and Environments for Parallel and Distributed Computing, Series on Parallel and Distributed Computing, Wiley, 2004, ISBN: 978-0471332886.
67. L. Silva, R. Buyya, Parallel Programming Models and Paradigms, (2007).
68. G. Fox, MPI and MapReduce, in: Clusters, Clouds, and Grids for Scientific Computing CCGSC, Flat Rock, NC, http://grids.ucs.indiana.edu/ptliupages/presentations/CCGSC-Sept8-2010.pptx, 8 September 2010.
69. J. Ekanayake, X. Qiu, T. Gunarathne, S. Beason, G. Fox, High Performance Parallel Computing with Clouds and Cloud Technologies, Cloud Computing and Software Services: Theory and Technologies, CRC Press (Taylor and Francis), 2010.
70. Wikipedia, MapReduce, http://en.wikipedia.org/wiki/MapReduce, 2010 (accessed 06.11.10).
71. R. Lammel, Google’s MapReduce programming model – Revisited, Sci. Comput. Prog. 68 (3) (2007) 208-237.
72. S. Ghemawat, H. Gobioff, S. Leung, The Google File System, in: 19th ACM Symposium on Operating Systems Principles, 2003, pp. 20-43.
73. Google, Introduction to Parallel Programming and MapReduce, http://code.ggogle.com/edu/parallel/
74. SALSA Group, Iterative MapReduce, http://www,iterativemapreduce.org/, 2010.
75. T. White, Hadoop: The Definitive Guide, second ed., Yahoo Press, 2010.
76. Apache, HDFS Overview, http://hadoop.apache.org/hdfs/, 2010.
77. Apache! Pig! (part of Hadoop), http://pig.apache.org/hdfs/, 2010.
78. G. C. Fox, A. Ho, E. Chan, W. Wang, Measured characteristics of distributed and cloud computing infrastructure for message-based collaboration applications, in: Proceedings of the 2009 International Symposium on Collaborative Technologies and Systems, IEEE Computer Society, 2009, pp. 465-467.
79. Eucalyptus LLC, White Papers, http://www.eucalyptus.com/whitepapers.
80. Nimbus, Cloud computing for science, http://www.nimbusproject.org, 2010.
81. Nimbus, Frequently Asked Questions, http://www.nimbusproject.org/docs/current/faq.html, 2010.
82. Amazon, Amazon Elastic Compute Cloud (Amazon EC2), http://aws.amazon.com/ec2.
83. Sector/Sphere, High Performance Distributed File System and Parallel Data Processing Engine, http://sector.sourceforge.net
84. Open Stack, Open Source, Open Standards Cloud, http://openstack.org/index.php, 2010.
85. SALSA Group, Catalog of Cloud Material, http://salsaphc.indiana.edu/content/cloud-materials, 2010.
86. Microsoft Research, Cloud Futures Workshop, http://research.microsoft.com/en-us/events/cloudfutures2010/default.aspx, 2010.
87. P. Chaganti, Cloud computing with Amazon Web Services, Part 1: Introduction – When it’s smarter to rent than to buy, http://www.ibm.com/developerworks/architecture/library/ar-cloudaes1/, 2008.
88. Cloud computing with Amazon Web Services, Part 2: Storage in the cloud with Amazon Simple Storage Service (S3) – Reliable, flexible, and inexpensive storage and retrieval of your data, http://www.ibm.com/developerworks/architecture/library/ar-cloudaws2/, 2008.
89. P. Chaganti, Cloud computing with Amazon Web Services, Part 3: Servers on demand with EC2, http:// www.ibm.com/developerworks/architecture/library/ar-cloudaws3/, 2008.
90. G. Fox, S. Bae, J. Ekanayake, X. Qiu, H. Yuan, Parallel Data Mining from Multicore to Cloudy Grids, book chapter of High Speed and Large Scale Scientific Computing, IOS Press, Amsterdam, 2009, http://grids.ucs.indiana.edu/ptilupages/publications.CetratoWriteupJune11-09. pdf.
91. W. Allcock, J. Bresnahan, R. Kettimuthu, et al., The Globus striped GridFTP framework and server, in: Proceedings of the ACM/IEEE Conference on Supercomputing, 2005.
92. R. Aydt, D. Gunter, W. Smith, et al., A Grid Monitoring Architecture, Global Grid Forum Performance Working Group, 2002.
93. R. Buyya, K. Bubendorfer (Eds.), Market Oriented Grid and Utility Computing, John Wiley & Sons, 2009.
94. J. Dongarra, I. Foster, G. Fox, et al., Sourcebook of Parallel Computing, Morgan Kaufmann Publishers, 2002.
95. L. Ferreira, et al., Grid Computing in Research and Education, (http://www.redbooks.ibm.com/abstracts/sg246895.html?Open)
96. L. Ferreira, et al., Grid Computing in Research and Education, (http://www.redbooks.ibm.com/abstracts/sg246649.html?Open)
97. I. Foster, C. Kesselman, S. Tuecke, The anatomy of the grid: enabling scalable virtual organizations, Int. J. High Perform. Comput. Appl. 15 (3) (2001) 200.
98. I. Foster, Globus toolkit version 4: software for service-oriented systems, J. Comput. Sci. Technol. 21 (4) (2006) 513-520.
99. H. Jin, Challenges of grid computing, Advances in Web-Age Information Management, Lecture Notes in Computer Science 3739 (2005) 25-31.
100. I. Taylor, From P2P to Web Services and Grids, Springer-Verlag, London, 2005.
101. D. Thain, T. Tannenbaum, M. Livny, Distributed computing in practice: the condor experience,
86VCV.Rao. C-DAC, Pune National Workshop- BIDA-2014 at CRRAO AIMSCS @ UoH August 22-24, 2014
103. S. Androutsellis-Theotokis, D. Spinellis, A survey of P2P content distribution technologies, ACM Comput. Surv. (December) (2004).
104. J. Buford, H. Yu, E. Lua, P2P Networking and Applications, Morgan Kaufmann, December 2008. Also www.p2pna.com.
105. X. Cheng, J. Liu, NetTube: exploring social networks for peer-to-peer short video sharing, in: Proceedings of IEEE INFOCOM, March 2009.
106. P. B. Godfrey, S. Shenker, I. Stoica, Minimizing churn in distributed systems, in: Proceedings of ACM SIGCOMM, 2006.
107. K. Ross, D. Rubenstein, Peer-to-peer systems, in: IEEE INFOCOM, Hong Kong, 2004, (Tutorial slides).
108. M. Armbrust, A. Fox, R. Griffith, et al., Above the Clouds: A Berkeley View of Cloud Computing, Technical Report No. UCB/EECS-2009-28, University of California at Berkeley, 10 February 2009.
109. J. Ekanayake, T. Gunarathne, J. Qiu, Cloud technologies for bioinformatics applications, in: IEEE Transactions on Parallel and Distributed Systems, accepted to appear, http://grids.ucs.indiana.edu/ptliupages/publications/BioCloud_TPDS_Journal_Jan4_2010.pdf, 2011.
110. S. Garfinkel, Commodity grid computing with Amazon’s S3 and EC2, Login 32 (1) (2007) 7-13.
111. Grid'5000 and ALADDIN-G5K: An infrastructure distributed in 9 sites around France, for research in large-scale parallel and distributed systems. http://www.grid5000.fr/mediawiki/index.php/Grid5000:Home, (accessed 20.11.10).
112. R. Grossman, Y. Gu, M. Sabala, et al., The open cloud testbed: Supporting open source cloud computing systems based on large scale high performance, in: A. Doulamis, et al., (Eds.), Dynamic Network Services, Springer, Berlin Heidelberg, 2010, pp. 89-97.
113. J. Rittinghouse, J. Ransome, Cloud Computing: Implementation, Management and Security, CRC Publisher, 2010.
114. H. Song, Exploring Facebook and Twitter Technologies for P2P Social Networking, in: EE 657 Final Project Report, University of Southern California, May 2010.
115. E. Walker, Benchmarking Amazon EC2 for high-performance scientific computing, Login 33 (5) (2008) 18-23.
116. M. Weng, A Multimedia Social Networking Community for Mobile Devices, Tisch School of The Arts, New York University, 2007.
117. M. Yigitbasi, A. Iosup, D. Epema, C-Meter: a framework for performance analysis of computing clouds, in: International Workshop on Cloud Computing, May 2009.
118. Al Geist, Adam Beguelin, Jack Dongarra, Weicheng Jiang, Robert Manchek, Vaidy Sunderam, PVM: Parallel Virtual Machine - A User’s Guide and Tutorial for Networked Parallel Computing, The MIT Press, Cambridge, Massachusetts, London, England, 1994
119. PVM home: http://www.csm.ornl.gov/pvm/pvm_home.html
120. David Browning, “Embarrassingly Parallel Benchmark under PVM”, Computer Science Corp., NASA Ames Research Labs.
121. J. Dongarra et al., “PVM - experiences, current status and Future Directions”, ORNL - Tennessee.
122. “Building Linux Clusters”, O'Reilly
123. A Performance Comparison of DSM, PVM, and MPI, Paul Werstein, Mark Pethick, Zhiyi Huang, University
124. Andrews, Gregory R. (2000), Foundations of Multithreaded, Parallel, and Distributed Programming, Boston, MA: Addison-Wesley
125. Butenhof, David R. (1997), Programming with POSIX Threads, Boston, MA: Addison-Wesley Professional
126. Culler, David E., Jaswinder Pal Singh (1999), Parallel Computer Architecture - A Hardware/Software Approach, San Francisco, CA: Morgan Kaufmann
127. Grama, Ananth, Anshul Gupta, George Karypis and Vipin Kumar (2003), Introduction to Parallel Computing, Boston, MA: Addison-Wesley
128. Intel Corporation (2003), Intel Hyper-Threading Technology, Technical User's Guide, Santa Clara, CA: Intel Corporation. Available at: http://www.intel.com
129. Shameem Akhter, Jason Roberts (April 2006), Multi-Core Programming - Increasing Performance through Software Multi-threading, Intel Press, Intel Corporation
130. Bradford Nichols, Dick Buttlar and Jacqueline Proulx Farrell (1996), Pthreads Programming, O'Reilly and Associates, Newton, MA 02164
131. James Reinders (2007), Intel Threading Building Blocks, O’Reilly series
132. Laurence T. Yang & Minyi Guo (Editors) (2006), High Performance Computing - Paradigm and Infrastructure, Wiley Series on Parallel and Distributed Computing, Albert Y. Zomaya, Series Editor
133. Intel Threading Methodology: Principles and Practices, Version 2.0, copyright (March 2003), Intel Corporation
134. William Gropp, Ewing Lusk, Rajeev Thakur (1999), Using MPI-2: Advanced Features of the Message-Passing Interface, The MIT Press.
135. Pacheco S. Peter, (1992), Parallel Programming with MPI, University of San Francisco, Morgan
Programming), McGraw Hill New York.
137. Michael J. Quinn (2004), Parallel Programming in C with MPI and OpenMP, McGraw-Hill International Editions, Computer Science Series, McGraw-Hill, Inc., New York
138. Andrews, Gregory R. (2000), Foundations of Multithreaded, Parallel, and Distributed
139. SunSoft, Solaris Multithreaded Programming Guide, SunSoft Press, Mountain View, CA (1996); Zomaya, editor, Parallel and Distributed Computing Handbook, McGraw-Hill
140. Chandra, Rohit, Leonardo Dagum, Dave Kohr, Dror Maydan, Jeff McDonald, and Ramesh Menon (2001), Parallel Programming in OpenMP, San Francisco: Morgan Kaufmann
141. S. Kleiman, D. Shah, and B. Smaalders (1995), Programming with Threads, SunSoft Press, Mountain View, CA
142. Mattson, Tim (2002), Nuts and Bolts of Multi-threaded Programming, Santa Clara, CA: Intel Corporation. Available at: http://www.intel.com
143. I. Foster (1995), Designing and Building Parallel Programs: Concepts and Tools for Parallel Software Engineering, Addison-Wesley
144. J. Dongarra, I.S. Duff, D. Sorensen, and H.V. Vorst (1999), Numerical Linear Algebra for High Performance Computers (Software, Environments, Tools), SIAM
145. "OpenMP C and C++ Application Program Interface, Version 1.0" (1998), OpenMP Architecture Review Board, October 1998
146. D. A. Lewine (1991), POSIX Programmer's Guide: Writing Portable UNIX Programs with the POSIX.1 Standard, O'Reilly & Associates
147. Emery D. Berger, Kathryn S. McKinley, Robert D. Blumofe, Paul R. Wilson, Hoard: A Scalable Memory Allocator for Multi-threaded Applications; The Ninth International Conference on Architectural Support for Programming Languages and Operating Systems (ASPLOS-IX), Cambridge, MA, November (2000). Web site URL: http://www.hoard.org/
148. Marc Snir, Steve Otto, Steven Huss-Lederman, David Walker and Jack Dongarra (1998), MPI - The Complete Reference: Volume 1, The MPI Core, second edition [MCMPI-07].
149. William Gropp, Steven Huss-Lederman, Andrew Lumsdaine, Ewing Lusk, Bill Nitzberg, William Saphir, and Marc Snir (1998), MPI - The Complete Reference: Volume 2, The MPI-2 Extensions
150. A. Zomaya, editor, Parallel and Distributed Computing Handbook, McGraw-Hill (1996)
151. “OpenMP C and C++ Application Program Interface, Version 2.5 (May 2005)”, from the OpenMP web site, URL: http://www.openmp.org/
152. Stokes, Jon (2002), Introduction to Multithreading, Super-threading and Hyper-threading, Ars Technica, October 2002
153. http://www.cdac.in/opecg2009/
154. Stokes, Jon (2002), Introduction to Multithreading, Super-threading and Hyper-threading, Ars Technica, October 2002
155. Andrews, Gregory R. (2000), Foundations of Multi-threaded, Parallel and Distributed Programming, Boston, MA: Addison-Wesley
156. Deborah T. Marr, Frank Binns, David L. Hill, Glenn Hinton, David A. Koufaty, J. Alan Miller, Michael Upton, “Hyperthreading, Technology Architecture and Microarchitecture”, Intel (2000-01)
157. http://www.erc.msstate.edu/mpi/
158. http://www.arc.unm.edu/workshop/mpi/mpi.html
159. http://www.mcs.anl.gov/mpi/mpich
160. The MPI home page, with links to specifications for MPI-1 and MPI-2 standards: http://www.mpi-forum.org
161. Hybrid Programming Working Group Proposals, Argonne National Laboratory, Chicago (2007-2008)
162. TRAC Link: https://svn.mpi-forum.org/trac/mpi-forum-web/wiki/MPI3Hybrid
163. Threads and MPI Software, Intel Software Products and Services 2008 - 2009
164. Sun MPI 3.0 Guide, November 2007
165. Treating threads as MPI processes through Registration/deregistration, Intel Software Products and
167. The MPI home page, with links to specifications for MPI-1 and MPI-2 standards: http://www.mpi-forum.org
168. Hybrid Programming Working Group Proposals, Argonne National Laboratory, Chicago (2007-2008)
169. TRAC Link: https://svn.mpi-forum.org/trac/mpi-forum-web/wiki/MPI3Hybrid
170. Threads and MPI Software, Intel Software Products and Services 2008 - 2009
171. Sun MPI 3.0 Guide, November 2007
173. Intel MPI library 3.2 - http://www.hearne.com.au/products/Intelcluster/edition/mpi/663/
174. http://www.cdac.in/opecg2009/
175. PGI Compilers http://www.pgi.com
176. David A. Bader, Massive-Scale Graph Analytics, Georgia Tech College of Computing, Department of Computational Science and Engineering, HiPC-2012, Pune, India
177. D.A. Bader, G. Cong, and J. Feo, “On the Architectural Requirements for Efficient Execution of Graph Algorithms,” The 34th International Conference on Parallel Processing (ICPP 2005), pp. 547-556, Georg Sverdrups House, University of Oslo, Norway, June 14-17, 2005.
178. D.A. Bader and K. Madduri, “Design and Implementation of the HPCS Graph Analysis Benchmark on Symmetric Multiprocessors,” The 12th International Conference on High Performance Computing (HiPC 2005), D.A. Bader et al., (eds.), Springer-Verlag LNCS 3769, 465-476, Goa, India, December 2005.
179. Kamesh Madduri, David A. Bader, Jonathan W. Berry, Joseph R. Crobak, and Bruce A. Hendrickson, “Multithreaded Algorithms for Processing Massive Graphs,” in D.A. Bader, editor, Petascale Computing: Algorithms and Applications, Chapman & Hall / CRC Press, Chapter 12, 2007.
180. Cloud Application Architectures: Building Applications and Infrastructure in the Cloud, George Reese, O'Reilly, June 2011
181. Cloud Computing Bible, Barrie Sosinsky, Wiley India Ltd, 2011
182. Cloud Computing Explained, John Rhoton, British Library Cataloguing in Publication Data, 2009
183. Distributed and Cloud Computing: From Parallel Processing to the Internet of Things, by Kai Hwang, Geoffrey C. Fox, Jack J. Dongarra, Morgan Kaufmann Publishers, 2012
180. Cloud Computing Infrastructure http://www.ibm.com
181. Cloud Computing - Virtualization Made Easy http://www.cisco.com/
182. Cloud Computing Infrastructure http://www.softlayer.com
183. Microsoft Cloud OS: http://www.microsoft.com
184. Amazon EC2 Cloud - Discover Windows Azure Storage http://www.windowsazure.com
185. Programming Windows Azure http://www.amazon.com
186. App Engine - Google Cloud Platform https://cloud.google.com/appengine
187. Google App Engine - Google Developers; https://developers.google.com/appengine/
188. Oracle NoSQL Database Architecture & Implementation http://www.oracle.com/NOSQLDatabase
189. NoSQL - MongoDB www.mongodb.com
190. Cloud Computing Meghdoot: http://www.cdac.in
191. High level overview of Object based (Swift) and block based storage http://www.openstack.org/software/openstack-storage/
192. Storage types in Eucalyptus: http://www.eucalyptus.com/
193. Graph Partitioning Software - METIS / ParMETIS / hMETIS: http://www.cs.umn.edu/~karypis
194. Graph Partitioning Software - METIS / ParMETIS / hMETIS: http://www.cs.umn.edu/~kumar
195. Jie Tang, Jimeng Sun, Chi Wang, and Zi Yang. 2009. Social influence analysis in large-scale networks. In Proceedings of the 15th ACM SIGKDD international conference on Knowledge discovery and data mining (KDD '09). ACM, New York, NY, USA, 807-816
196. Nishith Pathak, Analyzing Information Flow in Social Networks: Communities, Topics, Cognition and Influence, Doctor of Philosophy Thesis, Department of Computer Science, University of Minnesota, Minneapolis, March 2012
197. Prof. Jaideep Srivastava; http://www.cs.umn.edu/~jaideep
198. G. Karypis and V. Kumar, Multilevel k-way partitioning scheme for irregular graphs, Journal of Parallel and Distributed Computing, 48:96-129, 1998
199. The Anatomy of the Grid: Enabling Scalable Virtual Organizations, by Ian Foster, Carl Kesselman, et al.
200. Foster, I. Internet Computing and the Emerging Grid. Nature Web Matters, 2000. http://www.nature.com/nature/webmatters/grid/grid.html
201. Foster, I. and Karonis, N. A Grid-Enabled MPI: Message Passing in Heterogeneous Distributed Computing Systems. In Proc. SC'98, 1998.
202. The Globus CoG Home Page. http://www.globus.org/cog.
203. Globus. http://www.globus.org.
204. The NASA Information Power Grid Home Page. http://www.ipg.nasa.gov.
205. TeraGrid; http://www.teragrid.org/