BITS Pilani Hyderabad Campus BITS Pilani presentation D. Powar Lecturer, BITS-Pilani, Hyderabad Campus


Dec 18, 2015


Brian Holt


SSZG527

Lecture 18

Cloud Computing


Lectures

Lecture No. | Objectives
Lecture 10 | Capacity management
Lecture 11 | Introduction to PaaS (Drupal, Wolf frameworks, force.com), 5 Principles of UI Design by AWS: MADPO Principles
Lecture 12 | RAID (Redundant Array of Independent Disks)
Lecture 13 | MapReduce: distributed programming framework, Pig, Hive
Lecture 14 | Distributed File System (GFS, HDFS), cloud storage
Lecture 15 | Multi-tenancy, 4 levels of multi-tenancy
Lecture 16 | Cloud security
Lecture 17 | OpenStack: a cloud computing operating system


MapReduce


Map+Reduce

Map:
– Accepts an input key/value pair
– Emits intermediate key/value pairs

Reduce:
– Accepts an intermediate key with its list of values (key/value*)
– Emits output key/value pairs

[Figure: Very big data → MAP → REDUCE → Result]


MapReduce Programming Model

Data type: key-value records

Map function:

(K_in, V_in) → list(K_inter, V_inter)

Reduce function:

(K_inter, list(V_inter)) → list(K_out, V_out)
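The two signatures can be tried out with a small single-process simulation. This is a minimal Python sketch, assuming all data fits in memory on one machine; `run_mapreduce` is a hypothetical helper, not part of Hadoop or any other framework, and a real system runs the map and reduce calls in parallel on many nodes.

```python
from collections import defaultdict
from typing import Any, Callable, Iterable, List, Tuple

Pair = Tuple[Any, Any]
# map:    (K_in, V_in)             -> list(K_inter, V_inter)
MapFn = Callable[[Any, Any], Iterable[Pair]]
# reduce: (K_inter, list(V_inter)) -> list(K_out, V_out)
ReduceFn = Callable[[Any, List[Any]], Iterable[Pair]]


def run_mapreduce(records: Iterable[Pair], mapper: MapFn, reducer: ReduceFn) -> List[Pair]:
    """Hypothetical in-memory driver: map -> group by key -> reduce."""
    groups = defaultdict(list)
    # Map phase: every input record may emit zero or more intermediate pairs.
    for k_in, v_in in records:
        for k_inter, v_inter in mapper(k_in, v_in):
            # Shuffle: collect all intermediate values that share a key.
            groups[k_inter].append(v_inter)
    # Reduce phase: one reducer call per distinct intermediate key, in sorted key order.
    output: List[Pair] = []
    for k_inter in sorted(groups):
        output.extend(reducer(k_inter, groups[k_inter]))
    return output


# Usage: sum the values that share a key.
pairs = [("a", 1), ("b", 2), ("a", 3)]
print(run_mapreduce(pairs, lambda k, v: [(k, v)], lambda k, vs: [(k, sum(vs))]))
# prints [('a', 4), ('b', 2)]
```

The reducer receives a key together with all of its values, which is why the grouping step sits between the two user-supplied functions.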


Examples

let map(k, v) = emit(k.toUpper(), v.toUpper())
– ("foo", "bar") -> ("FOO", "BAR")
– ("key2", "data") -> ("KEY2", "DATA")

let map(k, v) = foreach char c in v: emit(k, c)
– ("A", "cats") -> ("A", "c"), ("A", "a"), ("A", "t"), ("A", "s")
– ("B", "hi") -> ("B", "h"), ("B", "i")

let map(k, v) = if (isPrime(v)) then emit(k, v)
– ("foo", 7) -> ("foo", 7)
– ("test", 10) -> (nothing)

let map(k, v) = emit(v.length, v)
– ("hi", "test") -> (4, "test")
– ("x", "quux") -> (4, "quux")
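The four example map functions can be written as Python generators, where `yield` plays the role of `emit`. This is a sketch; the function names (`upper_map`, `explode_map`, `prime_filter_map`, `length_map`, `is_prime`) are hypothetical and chosen only for illustration.

```python
def upper_map(k: str, v: str):
    # emit(k.toUpper(), v.toUpper())
    yield k.upper(), v.upper()


def explode_map(k: str, v: str):
    # foreach char c in v: emit(k, c)
    for c in v:
        yield k, c


def is_prime(n: int) -> bool:
    # Trial division; adequate for small illustrative values.
    if n < 2:
        return False
    d = 2
    while d * d <= n:
        if n % d == 0:
            return False
        d += 1
    return True


def prime_filter_map(k: str, v: int):
    # if (isPrime(v)) then emit(k, v); otherwise emit nothing
    if is_prime(v):
        yield k, v


def length_map(k: str, v: str):
    # emit(v.length, v): the input key is discarded and the value is re-keyed
    yield len(v), v


print(list(upper_map("foo", "bar")))       # [('FOO', 'BAR')]
print(list(explode_map("B", "hi")))        # [('B', 'h'), ('B', 'i')]
print(list(prime_filter_map("test", 10)))  # []
print(list(length_map("x", "quux")))       # [(4, 'quux')]
```

The last mapper shows that a map function may replace the key entirely; this is what lets the reduce phase regroup data, e.g. all values of length 4 reach the same reducer.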


Example: Word Count

def mapper(line):
    for word in line.split():
        output(word, 1)

def reducer(key, values):
    output(key, sum(values))
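A runnable Python version of the word count pseudocode is sketched below. It assumes a single process, replaces `output(...)` with `yield`, and uses a hypothetical `word_count` helper to stand in for the framework's shuffle and sort step; the three input lines are the ones used on the Word Count Execution slide.

```python
from collections import defaultdict


def mapper(line):
    # Emit (word, 1) for every whitespace-separated word in the line.
    for word in line.split():
        yield word, 1


def reducer(key, values):
    # Sum all partial counts emitted for one word.
    yield key, sum(values)


def word_count(lines):
    """Hypothetical single-process driver standing in for the framework."""
    groups = defaultdict(list)  # shuffle & sort: word -> [1, 1, ...]
    for line in lines:
        for word, one in mapper(line):
            groups[word].append(one)
    counts = {}
    for word in sorted(groups):
        for key, total in reducer(word, groups[word]):
            counts[key] = total
    return counts


lines = ["the quick brown fox", "the fox ate the mouse", "how now brown cow"]
print(word_count(lines))
# {'ate': 1, 'brown': 2, 'cow': 1, 'fox': 2, 'how': 1, 'mouse': 1, 'now': 1, 'quick': 1, 'the': 3}
```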


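Word Count Execution

[Figure: Input → Map → Shuffle & Sort → Reduce → Output]

Input (one line per map task): "the quick brown fox", "the fox ate the mouse", "how now brown cow"

Map output:
– Map 1: (the, 1), (brown, 1), (fox, 1), (quick, 1)
– Map 2: (the, 1), (fox, 1), (the, 1), (ate, 1), (mouse, 1)
– Map 3: (how, 1), (now, 1), (brown, 1), (cow, 1)

Reduce output:
– Reducer 1: (brown, 2), (fox, 2), (how, 1), (now, 1), (the, 3)
– Reducer 2: (ate, 1), (cow, 1), (mouse, 1), (quick, 1)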
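In the word count execution, the shuffle step sends every occurrence of a key to the same reducer, so with two reducers the keys split into two disjoint groups. A common way to choose the reducer is to hash the key modulo the number of reducers; Hadoop's default partitioner works this way, using the key's `hashCode()`. The sketch below assumes CRC-32 as the hash, so its split will not necessarily match the one in the figure; `partition` and `shuffle` are hypothetical names.

```python
import zlib
from collections import defaultdict


def partition(key: str, num_reducers: int) -> int:
    # CRC-32 is deterministic across runs. Python's built-in hash() of str is
    # randomized per process, which would send the same key to different
    # reducers if map tasks ran in separate processes.
    return zlib.crc32(key.encode("utf-8")) % num_reducers


def shuffle(map_outputs, num_reducers: int):
    """Route each (key, value) pair to one reducer and group values by key."""
    reducers = [defaultdict(list) for _ in range(num_reducers)]
    for key, value in map_outputs:
        reducers[partition(key, num_reducers)][key].append(value)
    # Sort step: each reducer receives its keys in sorted order.
    return [dict(sorted(groups.items())) for groups in reducers]


lines = ["the quick brown fox", "the fox ate the mouse", "how now brown cow"]
map_outputs = [(word, 1) for line in lines for word in line.split()]
for i, groups in enumerate(shuffle(map_outputs, num_reducers=2)):
    print(f"reducer {i}:", {word: sum(ones) for word, ones in groups.items()})
```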


Word Count example code (Java)

http://hadoop.apache.org/docs/stable/mapred_tutorial.html

http://wiki.apache.org/hadoop/WordCount


Distributed File Systems


The Google File System

GFS stores a huge number of files, totaling many terabytes of data.

Individual file characteristics:
– Very large, multiple gigabytes per file
– Files are updated by appending new entries to the end (faster than overwriting existing data)
– Files are virtually never modified (other than by appends) and virtually never deleted
– Files are mostly read-only


Google File System

Files are divided into large 64 MB chunks, and the chunks are distributed and replicated across many servers.

A couple of important details:
– The master maintains only a (file name, chunk server) table in main memory, i.e. the mapping from files to chunks and from chunks to the chunkservers holding them: minimal I/O
– Files are replicated using a primary-backup scheme; the master is kept out of the loop (file data does not pass through it)
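Because chunks have a fixed size, a client can turn a byte offset into a chunk index by itself and only needs to ask the master which chunkservers hold that chunk; the data then flows directly between client and chunkserver. The Python sketch below illustrates this lookup; the table layout, the file name, the chunk handles, and the server names are hypothetical.

```python
CHUNK_SIZE = 64 * 1024 * 1024  # 64 MB chunks, as on the slide

# Hypothetical in-memory master metadata:
# (file name, chunk index) -> (chunk handle, replica chunkservers)
master_table = {
    ("/logs/web.log", 0): ("h100", ["cs1", "cs4", "cs7"]),
    ("/logs/web.log", 1): ("h101", ["cs2", "cs4", "cs9"]),
    ("/logs/web.log", 2): ("h102", ["cs1", "cs3", "cs8"]),
    ("/logs/web.log", 3): ("h103", ["cs5", "cs6", "cs9"]),
}


def locate(file_name: str, offset: int):
    """Translate a byte offset into (chunk handle, replicas, offset inside the chunk)."""
    chunk_index = offset // CHUNK_SIZE  # computed by the client, no master I/O
    handle, replicas = master_table[(file_name, chunk_index)]  # one in-memory lookup
    return handle, replicas, offset % CHUNK_SIZE


handle, replicas, inner = locate("/logs/web.log", 200 * 1024 * 1024)
print(handle, replicas, inner // (1024 * 1024))  # h103 ['cs5', 'cs6', 'cs9'] 8
```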


HDFS

Hadoop's Distributed File System (HDFS) is designed to reliably store very large files across machines in a large cluster.

It is inspired by the Google File System. HDFS stores each file as a sequence of blocks; all blocks in a file except the last one are the same size.

Blocks belonging to a file are replicated for fault tolerance. The block size and replication factor are configurable per file. Files in HDFS are "write once" and have strictly one writer at any time.
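The rule that every block except the last has the same size, together with the per-file replication factor, determines how much raw cluster storage a file consumes. A small Python sketch follows; the 128 MB block size and replication factor of 3 are illustrative inputs (both are configurable per file), and `block_layout` and `raw_storage` are hypothetical helpers.

```python
MB = 1024 * 1024


def block_layout(file_size: int, block_size: int):
    """Split a file into blocks; all blocks are block_size except possibly the last."""
    full, rest = divmod(file_size, block_size)
    return [block_size] * full + ([rest] if rest else [])


def raw_storage(file_size: int, replication: int) -> int:
    # The last block is not padded to full size, so raw usage is size x replicas.
    return file_size * replication


blocks = block_layout(300 * MB, 128 * MB)
print([b // MB for b in blocks])                   # [128, 128, 44]
print(raw_storage(300 * MB, replication=3) // MB)  # 900
```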


Hadoop Distributed File System: Goals
• Store large data sets
• Cope with hardware failure
• Emphasize streaming data access


From GFS to HDFS

Terminology differences:
– GFS master = Hadoop namenode
– GFS chunkservers = Hadoop datanodes

Functional differences:
– No file appends in HDFS (a planned feature when this comparison was made; later Hadoop releases added append support)
– HDFS performance is (likely) slower


HDFS Architecture

Adapted from (Ghemawat et al., SOSP 2003)

[Figure: An application calls the HDFS Client, which sends (file name, block id) to the HDFS namenode and receives (block id, block location). The namenode holds the file namespace (e.g. /foo/bar → block 3df2), sends instructions to the datanodes, and receives datanode state from them. The client then sends (block id, byte range) directly to an HDFS datanode and receives block data. Each datanode stores its blocks on a local Linux file system.]
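The read path in the architecture figure can be simulated in a few lines: the client asks the namenode only for metadata and then fetches block data from a datanode directly, so no file data passes through the namenode. This is a hypothetical Python sketch; the `NameNode`, `DataNode`, and `read_file` names, the in-memory dictionaries, and block id "3df3" are illustrative assumptions, not the HDFS client API.

```python
from typing import Dict, List, Optional, Tuple


class DataNode:
    def __init__(self, name: str) -> None:
        self.name = name
        # block id -> data (stands in for block files on the local Linux file system)
        self.blocks: Dict[str, bytes] = {}

    def read_block(self, block_id: str, start: int = 0, end: Optional[int] = None) -> bytes:
        # Serves a (block id, byte range) request straight to the client.
        return self.blocks[block_id][start:end]


class NameNode:
    def __init__(self) -> None:
        self.namespace: Dict[str, List[str]] = {}             # path -> ordered block ids
        self.block_locations: Dict[str, List[DataNode]] = {}  # block id -> replicas

    def get_block_locations(self, path: str) -> List[Tuple[str, List[DataNode]]]:
        # Metadata only: the namenode never returns block contents.
        return [(b, self.block_locations[b]) for b in self.namespace[path]]


def read_file(namenode: NameNode, path: str) -> bytes:
    data = bytearray()
    for block_id, replicas in namenode.get_block_locations(path):
        # Read from the first listed replica; a real client would prefer the
        # closest one and fall back to another replica on failure.
        data += replicas[0].read_block(block_id)
    return bytes(data)


nn, dn1, dn2 = NameNode(), DataNode("dn1"), DataNode("dn2")
dn1.blocks["3df2"] = dn2.blocks["3df2"] = b"hello "  # block 3df2 replicated on two datanodes
dn2.blocks["3df3"] = b"world"
nn.namespace["/foo/bar"] = ["3df2", "3df3"]
nn.block_locations = {"3df2": [dn1, dn2], "3df3": [dn2]}
print(read_file(nn, "/foo/bar"))  # b'hello world'
```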


Namenode Responsibilities

Managing the file system namespace:
– Holds file/directory structure, metadata, file-to-block mapping, access permissions, etc.

Coordinating file operations:
– Directs clients to datanodes for reads and writes
– No data is moved through the namenode

Maintaining overall health:
– Periodic communication with the datanodes
– Garbage collection
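"Periodic communication with the datanodes" is typically done with heartbeats: a datanode that stays silent too long is presumed dead, and blocks that lose replicas because of it can be scheduled for re-replication. The Python sketch below shows that bookkeeping under stated assumptions: the 30-second timeout is a hypothetical value (HDFS makes its timeouts configurable), and `HealthMonitor` and its methods are illustrative names.

```python
from dataclasses import dataclass, field
from typing import Dict, Set

HEARTBEAT_TIMEOUT_S = 30.0  # hypothetical value


@dataclass
class HealthMonitor:
    replication: int  # target number of replicas per block
    last_heartbeat: Dict[str, float] = field(default_factory=dict)     # datanode -> time (s)
    block_replicas: Dict[str, Set[str]] = field(default_factory=dict)  # block id -> datanodes

    def heartbeat(self, datanode: str, now: float) -> None:
        # Called whenever a datanode reports in.
        self.last_heartbeat[datanode] = now

    def live_nodes(self, now: float) -> Set[str]:
        # A datanode is presumed dead if it has been silent longer than the timeout.
        return {dn for dn, t in self.last_heartbeat.items() if now - t <= HEARTBEAT_TIMEOUT_S}

    def under_replicated(self, now: float) -> Dict[str, int]:
        # Blocks whose live replica count fell below target -> number of copies missing.
        live = self.live_nodes(now)
        missing = {}
        for block, nodes in self.block_replicas.items():
            alive = len(nodes & live)
            if alive < self.replication:
                missing[block] = self.replication - alive
        return missing


monitor = HealthMonitor(replication=3, block_replicas={"3df2": {"dn1", "dn2", "dn3"}})
for dn in ("dn1", "dn2", "dn3"):
    monitor.heartbeat(dn, now=0.0)
monitor.heartbeat("dn1", now=20.0)
monitor.heartbeat("dn2", now=20.0)
print(monitor.under_replicated(now=40.0))  # {'3df2': 1}  (dn3 silent for 40 s)
```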


Cloud Storage

Cloud storage is a model of networked online storage in which data is stored in virtualized pools of storage.

Companies operate large data centers, and people who require their data to be hosted buy or lease storage capacity from them.

Cloud storage services may be accessed through a web service application programming interface (API), a cloud storage gateway, or a Web-based user interface.

It is difficult to pin down a canonical definition of cloud storage architecture, but object storage is reasonably analogous.
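Object storage, named above as the closest analogy, exposes a flat namespace of buckets and keys accessed through simple put/get/list calls, rather than a mounted directory tree. The following in-memory Python sketch only illustrates that model; the `ObjectStore` class and its methods are hypothetical and do not reproduce any particular provider's API.

```python
from typing import Dict, List


class ObjectStore:
    """Hypothetical in-memory object store: buckets hold objects under flat string keys."""

    def __init__(self) -> None:
        self._buckets: Dict[str, Dict[str, bytes]] = {}

    def create_bucket(self, bucket: str) -> None:
        self._buckets.setdefault(bucket, {})

    def put(self, bucket: str, key: str, data: bytes) -> None:
        # Whole objects are written at once; there is no partial in-place update.
        self._buckets[bucket][key] = bytes(data)

    def get(self, bucket: str, key: str) -> bytes:
        return self._buckets[bucket][key]

    def list_keys(self, bucket: str, prefix: str = "") -> List[str]:
        # "Folders" are just shared key prefixes; no directory objects exist.
        return sorted(k for k in self._buckets[bucket] if k.startswith(prefix))


store = ObjectStore()
store.create_bucket("photos")
store.put("photos", "2015/dec/a.jpg", b"...")
store.put("photos", "2015/nov/b.jpg", b"...")
store.put("photos", "2014/c.jpg", b"...")
print(store.list_keys("photos", prefix="2015/"))  # ['2015/dec/a.jpg', '2015/nov/b.jpg']
```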


Multi-tenancy


Basic SaaS maturity model

1. Ad-hoc/custom
2. Configurable single-tenant
3. Configurable multi-tenant
4. Configurable multi-tenant (scalable)


Ad-hoc/custom instances

Each customer has their own custom version of the software.

Represents an enterprise data center where there are multiple instances and versions of the software.

Each customer has their own binaries, as well as their own dedicated processes implementing the application.

Disadv: Difficult management: each customer needs their own management support.


Configurable instances

All customers share the same version of the software (with a separate copy for each customer).

Adv: Easy management: a single version of the software to maintain.


Configurable multi-tenant efficient instances

All customers share the same version of the software (a single copy shared by all customers).

Adv: Easy management: only a single instance is run.


Configurable multi-tenant efficient instances (scalable)

All customers share the same version of the software (a single copy shared by all customers).

The software is hosted on a cluster of computers, which allows the capacity of the system to scale almost limitlessly. Thus, both the number of customers and the capacity can grow.

Ex: Gmail, Yahoo Mail, etc.

Disadv: Shared storage problem
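The difference between levels 3 and 4 can be sketched in code: at level 3 a single application instance serves every tenant and looks up each tenant's configuration per request; at level 4 several identical instances run on a cluster behind a load balancer, so capacity grows by adding instances. This Python sketch assumes round-robin load balancing and a shared in-memory configuration store; `TenantConfig`, `AppInstance`, `LoadBalancer`, and the `theme` setting are hypothetical.

```python
from dataclasses import dataclass
from itertools import cycle
from typing import Dict, List


@dataclass(frozen=True)
class TenantConfig:
    display_name: str
    theme: str


# Shared metadata: every instance reads the same per-tenant configuration.
CONFIGS: Dict[str, TenantConfig] = {
    "t1": TenantConfig("Best garage", "blue"),
    "t2": TenantConfig("Friendly garage", "green"),
}


class AppInstance:
    """Level 3: one copy of the software serves all tenants, customised by configuration."""

    def __init__(self, name: str) -> None:
        self.name = name

    def handle(self, tenant_id: str, path: str) -> str:
        cfg = CONFIGS[tenant_id]  # configuration, not separate code, distinguishes tenants
        return f"[{self.name}] {cfg.display_name} ({cfg.theme} theme): {path}"


class LoadBalancer:
    """Level 4: identical instances on a cluster; more instances means more capacity."""

    def __init__(self, instances: List[AppInstance]) -> None:
        self._next = cycle(instances)  # simple round-robin dispatch

    def handle(self, tenant_id: str, path: str) -> str:
        return next(self._next).handle(tenant_id, path)


lb = LoadBalancer([AppInstance("node-1"), AppInstance("node-2")])
print(lb.handle("t1", "/invoices"))  # [node-1] Best garage (blue theme): /invoices
print(lb.handle("t2", "/invoices"))  # [node-2] Friendly garage (green theme): /invoices
```

Because all instances share one configuration and data store, that store is where the "shared storage problem" listed above shows up.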


Share vs. isolate
– Business model (can I monetise?)
– Architectural model (can I do it?)
– Operational model (can I guarantee SLAs?)


[Figure: access control, meta-data]


Authentication

Unlike in traditional computer systems, the tenant specifies the valid users, and the cloud service provider authenticates them.

Two basic approaches are used:
– Centralized authentication
– Decentralized authentication


Authentication (contd.)

Centralized authentication:
– Authentication is performed using a centralized user database
– The cloud admin gives the tenant admin rights to manage user accounts for that tenant
– Multiple (two) sign-on service
– Given the self-service nature of the cloud, this approach is more commonly used

Decentralized authentication:
– Each tenant maintains their own user database and needs to deploy a federation service that interfaces between that tenant's authentication framework and the cloud system's authentication service
– Single sign-on service
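Centralized authentication can be sketched as a single user database keyed by tenant and user name, in which the cloud admin delegates account management for each tenant to a tenant admin. This is a hypothetical Python sketch; the class, method, tenant, and user names are illustrative, and passwords are stored as salted PBKDF2 hashes from the standard library rather than in plain text.

```python
import hashlib
import hmac
import os


class CentralUserDB:
    """Hypothetical centralized user database shared by all tenants of a cloud."""

    def __init__(self) -> None:
        self._users = {}          # (tenant, username) -> (salt, password hash)
        self._tenant_admins = {}  # admin account -> tenant it may manage

    @staticmethod
    def _hash(password: str, salt: bytes) -> bytes:
        return hashlib.pbkdf2_hmac("sha256", password.encode("utf-8"), salt, 100_000)

    def grant_tenant_admin(self, admin: str, tenant: str) -> None:
        # Done by the cloud admin: delegate user management for one tenant.
        self._tenant_admins[admin] = tenant

    def add_user(self, admin: str, tenant: str, username: str, password: str) -> None:
        # A tenant admin may create accounts only inside their own tenant.
        if self._tenant_admins.get(admin) != tenant:
            raise PermissionError(f"{admin} may not manage users of {tenant}")
        salt = os.urandom(16)
        self._users[(tenant, username)] = (salt, self._hash(password, salt))

    def authenticate(self, tenant: str, username: str, password: str) -> bool:
        # The cloud provider checks credentials against the central database.
        record = self._users.get((tenant, username))
        if record is None:
            return False
        salt, expected = record
        return hmac.compare_digest(expected, self._hash(password, salt))


db = CentralUserDB()
db.grant_tenant_admin("admin@best", "best-garage")
db.add_user("admin@best", "best-garage", "ravi", "s3cret")
print(db.authenticate("best-garage", "ravi", "s3cret"))  # True
print(db.authenticate("best-garage", "ravi", "wrong"))   # False
```

In the decentralized variant, the check in `authenticate` would instead be delegated to the federation service run by each tenant.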


Resource sharing

The two major resources that need to be shared are storage and servers.

Storage resources are shared in two forms:
– File systems
– Databases

Since file system storage is a well-known mechanism, we restrict our discussion to database storage.


Database

There are two methods of sharing data in a single database:
– Dedicated tables per tenant
– Shared table

Dedicated tables per tenant: each tenant stores its data in a separate set of tables, different from those of other tenants.

Ex: a www.mygarage.com portal, in which each auto repair shop's table may be stored as a separate file.


Dedicated tables per tenant:

[Figure: three separate tables, one each for Best garage, Friendly garage, and Honest garage, each with the columns Car license | Service | Cost]
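The dedicated-tables approach can be sketched with Python's built-in `sqlite3` module: each tenant gets its own table, so no query ever needs a tenant filter. The table naming scheme, the helper functions, and the sample rows are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")


def table_for(tenant_id: int) -> str:
    # SQL cannot bind table names as parameters, so build them only from an integer id.
    return f"repairs_tenant_{int(tenant_id)}"


def create_tenant(tenant_id: int) -> None:
    # Each tenant gets its own table (and could later alter its own schema).
    conn.execute(f"CREATE TABLE {table_for(tenant_id)} "
                 "(car_license TEXT, service TEXT, cost REAL)")


def add_repair(tenant_id: int, car_license: str, service: str, cost: float) -> None:
    conn.execute(f"INSERT INTO {table_for(tenant_id)} VALUES (?, ?, ?)",
                 (car_license, service, cost))


def repairs(tenant_id: int):
    return conn.execute(
        f"SELECT car_license, service, cost FROM {table_for(tenant_id)}").fetchall()


for tid in (1, 2, 3):  # 1 = Best garage, 2 = Friendly garage, 3 = Honest garage
    create_tenant(tid)
add_repair(1, "AP09-1234", "Oil change", 40.0)  # hypothetical sample rows
add_repair(2, "TS07-5678", "Brake pads", 120.0)
print(repairs(1))  # [('AP09-1234', 'Oil change', 40.0)]
print(repairs(3))  # []
```

Isolation comes for free here, but the number of tables grows with the number of tenants, which is the cost the shared-table approach avoids.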


Shared table:

The data for all tenants is stored in the same table, in different rows.

One of the columns in the table identifies the tenant to which a particular row belongs.

It is more space-efficient than the previous approach. An auxiliary table, called a metadata table, stores information about the tenants.


Shared table (contd.)

Data table 1:
Tenant ID | Car license | Repair | Cost
1 | | |
2 | | |
2 | | |
1 | | |
3 | | |
2 | | |

Metadata table 1:
Tenant ID | Data
1 | Best garage
2 | Friendly garage
3 | Honest garage
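The shared-table layout can be sketched with `sqlite3`: one data table with a tenant ID column, plus a metadata table describing the tenants. The tenant IDs follow the order on the slide; the car licenses, repairs, and costs are hypothetical, and `repairs_for` is an illustrative helper.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE tenants (tenant_id INTEGER PRIMARY KEY, data TEXT);   -- metadata table
    CREATE TABLE repairs (tenant_id INTEGER REFERENCES tenants(tenant_id),
                          car_license TEXT, repair TEXT, cost REAL);   -- shared data table
""")
conn.executemany("INSERT INTO tenants VALUES (?, ?)",
                 [(1, "Best garage"), (2, "Friendly garage"), (3, "Honest garage")])
# Hypothetical rows; tenant IDs follow the slide's order (1, 2, 2, 1, 3, 2).
conn.executemany("INSERT INTO repairs VALUES (?, ?, ?, ?)", [
    (1, "AP09-1111", "Oil change", 40.0),
    (2, "TS07-2222", "Brake pads", 120.0),
    (2, "TS07-3333", "Tyre rotation", 25.0),
    (1, "AP09-4444", "Battery", 90.0),
    (3, "KA01-5555", "Wheel alignment", 35.0),
    (2, "TS07-6666", "Oil change", 45.0),
])


def repairs_for(tenant_id: int):
    # Every query must filter on tenant_id, otherwise tenants would see each other's rows.
    return conn.execute(
        "SELECT t.data, r.car_license, r.repair, r.cost "
        "FROM repairs r JOIN tenants t ON t.tenant_id = r.tenant_id "
        "WHERE r.tenant_id = ? ORDER BY r.rowid", (tenant_id,)).fetchall()


for row in repairs_for(2):
    print(row)
```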


Data customization

It is important for the cloud infrastructure to support customization of the stored data, since different tenants are likely to want to store different data in their tables.

In the dedicated table method, each tenant has its own table and therefore can have a different schema.

The difficulty lies with the shared table approach. Three methods are used:
– Pre-allocated columns
– Name-value pairs
– XML method


Pre-allocated columns

Space is reserved in the tables for custom columns, which tenants can use to define new columns.

Salesforce.com reserves 500 columns. Some of the tenants may not use these columns.

Disadv: There could be a lot of wasted space.


Pre-allocated columns

Data table 1:
Tenant ID | Car license | Service | Cost | Custom1 | Custom2
1 | | | | |
2 | | | | |
2 | | | | |
1 | | | | |
3 | | | | |
2 | | | | |

Metadata table 1:
Tenant ID | Tenant name | Custom1 name | Custom1 type
1 | Best garage | Service rating | int
2 | Friendly garage | Service manager | string
3 | Honest garage | |
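With pre-allocated columns, the data table carries generic columns (`custom1`, `custom2`) whose meaning and type come from the tenant's row in the metadata table. The `sqlite3` sketch below mirrors the slide's metadata (Best garage uses Custom1 as an int "Service rating", Friendly garage as a string "Service manager"); the data rows, the `CASTS` mapping, and `rows_for` are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE tenants (tenant_id INTEGER PRIMARY KEY, tenant_name TEXT,
                          custom1_name TEXT, custom1_type TEXT);
    CREATE TABLE repairs (tenant_id INTEGER, car_license TEXT, service TEXT, cost REAL,
                          custom1 TEXT, custom2 TEXT);  -- reserved, meaning set per tenant
""")
conn.executemany("INSERT INTO tenants VALUES (?, ?, ?, ?)", [
    (1, "Best garage", "Service rating", "int"),
    (2, "Friendly garage", "Service manager", "string"),
    (3, "Honest garage", None, None),  # uses no custom column: the space is wasted
])
conn.executemany("INSERT INTO repairs VALUES (?, ?, ?, ?, ?, ?)", [
    (1, "AP09-1111", "Oil change", 40.0, "4", None),
    (2, "TS07-2222", "Brake pads", 120.0, "Ravi", None),
    (3, "KA01-5555", "Alignment", 35.0, None, None),
])

CASTS = {"int": int, "string": str}


def rows_for(tenant_id: int):
    """Return rows as dicts, naming and typing custom1 according to the metadata table."""
    name, ctype = conn.execute(
        "SELECT custom1_name, custom1_type FROM tenants WHERE tenant_id = ?",
        (tenant_id,)).fetchone()
    result = []
    for lic, service, cost, c1 in conn.execute(
            "SELECT car_license, service, cost, custom1 FROM repairs WHERE tenant_id = ?",
            (tenant_id,)):
        row = {"Car license": lic, "Service": service, "Cost": cost}
        if name is not None and c1 is not None:
            row[name] = CASTS[ctype](c1)
        result.append(row)
    return result


print(rows_for(1))
# [{'Car license': 'AP09-1111', 'Service': 'Oil change', 'Cost': 40.0, 'Service rating': 4}]
```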


Name-value pair

The standard table has an extra column that points to a table of name-value pairs, which holds the additional custom fields for a record.

The name-value pair table is also called a pivot table.

This method avoids the storage wastage of the previous method.


Name-value pair (contd.)

Data table 1 (columns: Tenant ID | Car license | Service | Cost | Name-value pair record; e.g. a row of tenant 1 points to name-value record 275)

Data table 2:
Name-value pair | Name ID | Value
275 | 15 | 5.5

Metadata table 2:
Name ID | Name | Type
15 | Service rating | int
 | Service manager | string

Metadata table 1:
Tenant ID | Data
1 | Best garage
2 | Friendly garage
3 | Honest garage
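The name-value (pivot) approach stores custom fields only for records that use them: a record's pointer selects rows in the pivot table, and a metadata table gives each name ID its name and type. The `sqlite3` sketch below reuses record 275 and name ID 15 ("Service rating", int) from the slide; name ID 16, the stored value 4, the data rows, and `custom_fields` are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE tenants (tenant_id INTEGER PRIMARY KEY, data TEXT);              -- metadata table 1
    CREATE TABLE field_names (name_id INTEGER PRIMARY KEY, name TEXT, type TEXT); -- metadata table 2
    CREATE TABLE repairs (tenant_id INTEGER, car_license TEXT, service TEXT,
                          cost REAL, nv_record INTEGER);                          -- data table 1
    CREATE TABLE name_values (nv_record INTEGER, name_id INTEGER, value TEXT);    -- data table 2 (pivot)
""")
conn.executemany("INSERT INTO tenants VALUES (?, ?)",
                 [(1, "Best garage"), (2, "Friendly garage"), (3, "Honest garage")])
conn.executemany("INSERT INTO field_names VALUES (?, ?, ?)",
                 [(15, "Service rating", "int"), (16, "Service manager", "string")])  # 16 is hypothetical
# Hypothetical rows: only records that actually use custom fields get pivot entries.
conn.executemany("INSERT INTO repairs VALUES (?, ?, ?, ?, ?)", [
    (1, "AP09-1111", "Oil change", 40.0, 275),
    (3, "KA01-5555", "Alignment", 35.0, None),
])
conn.execute("INSERT INTO name_values VALUES (?, ?, ?)", (275, 15, "4"))

CASTS = {"int": int, "string": str}


def custom_fields(car_license: str, tenant_id: int) -> dict:
    """Follow the record's pointer into the pivot table and decode each custom field."""
    rows = conn.execute(
        "SELECT f.name, f.type, nv.value "
        "FROM repairs r "
        "JOIN name_values nv ON nv.nv_record = r.nv_record "
        "JOIN field_names f ON f.name_id = nv.name_id "
        "WHERE r.tenant_id = ? AND r.car_license = ?", (tenant_id, car_license))
    return {name: CASTS[ftype](value) for name, ftype, value in rows}


print(custom_fields("AP09-1111", tenant_id=1))  # {'Service rating': 4}
print(custom_fields("KA01-5555", tenant_id=3))  # {}
```

A record with no custom fields (NULL pointer) costs no pivot-table rows, which is the storage saving over pre-allocated columns; the price is an extra join on every read.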


OpenStack – a cloud computing operating system


9 core components of OpenStack (Havana)

Nova - Compute Service
Swift - Object Storage Service
Glance - Image Service
Keystone - Identity Service
Horizon - UI (Dashboard) Service
Quantum (renamed Neutron in the Havana release) - Network connectivity Service
Cinder - Block Storage Service
Ceilometer - Metering for billing, benchmarking, scalability, and statistics purposes
Heat - Orchestrates multiple composite cloud applications


OpenStack conceptual architecture


Table 1.1. OpenStack current services (Havana)

Service | Project name | Description
Dashboard | Horizon | Enables users to interact with OpenStack services to launch an instance, assign IP addresses, set access controls, and so on.
Compute | Nova | Provisions and manages large networks of virtual machines on demand.
Networking | Neutron | Enables network connectivity as a service among interface devices managed by other OpenStack services, usually Compute. Enables users to create and attach interfaces to networks. Has a pluggable architecture that supports many popular networking vendors and technologies.

Storage
Object Storage | Swift | Stores and gets files. Does not mount directories like a file server.
Block Storage | Cinder | Provides persistent block storage to guest virtual machines.

Shared services
Identity Service | Keystone | Provides authentication and authorization for the OpenStack services. Also provides a service catalog within a particular OpenStack cloud.
Image Service | Glance | Provides a registry of virtual machine images. Compute uses it to provision instances.
Metering/Monitoring Service | Ceilometer | Monitors and meters the OpenStack cloud for billing, benchmarking, scalability, and statistics purposes.

Higher-level services
Orchestration Service | Heat | Orchestrates multiple composite cloud applications by using either the native HOT template format or the AWS CloudFormation template format, through both an OpenStack-native REST API and a CloudFormation-compatible Query API.


Summary
– Capacity management
– Introduction to PaaS (Drupal, Wolf frameworks, force.com), 5 Principles of UI Design by AWS
– RAID (Redundant Array of Independent Disks)
– MapReduce: distributed programming framework, Pig, Hive
– Distributed File System (GFS, HDFS), cloud storage
– Multi-tenancy, 4 levels of multi-tenancy
– Cloud security
– OpenStack: a cloud computing operating system