CEG7380 Cloud Computing Lecture 1 Keke Chen. Outline Syllabus Scope of this course Tentative schedule Prerequisites Resources Assignments Introduction.

Post on 11-Jan-2016


CEG7380 Cloud Computing
Lecture 1

Keke Chen

Outline
• Syllabus
  • Scope of this course
  • Tentative schedule
  • Prerequisites
  • Resources
  • Assignments
• Introduction

Scope of this course
• Understand the basic ideas of cloud computing
• Get familiar with
  • Tools
  • Systems
• Get exposed to some research topics
• Two major parts:
  • Processing large data with the cloud
  • Scaling web applications up and down with the cloud
• Note: some programming parts require self-study

Prerequisites
• Some programming skills
  • Java, Python, shell
  • Comfortable with learning new programming frameworks
• Sufficient knowledge of
  • Data structures and databases
  • Operating systems
  • Distributed systems

Assignments and Grading
• Reading papers (~3) (10%)
• Some mini-projects (4~5) (60%)
  • Help you master the concepts
  • Learn to use tools and systems
• Self-motivated research projects are strongly encouraged!
• Final exam (20%)
• Class attendance and discussion (10%)

Resources
• Updated reference list
• In-house Hadoop cluster
• AWS access
  • Coupon code for each student
• Pilot
  • Submitting reading assignments and projects

Tentative Schedule
• Parallel data processing
  • Distributed file systems (GFS, HDFS)
  • MapReduce
  • High-level distributed data management
• Cloud infrastructures
  • Virtualization
  • AWS and Eucalyptus
  • Interactive front-end: Google App Engine
• Cloud security and privacy
• Research topics
• In projects, we will learn to use
  • Hadoop MapReduce, Pig Latin
  • AWS
  • Google App Engine

Cloud Computing
Lecture 1-2

Some slides are borrowed from UC Berkeley RAD Lab

Keke Chen

Outline
• What is cloud computing?
• Why now?
• Cloud killer applications
• Cloud economics
• Challenges and opportunities
  • “Above the Clouds”
  • “Claremont Report”

What is Cloud Computing?
• Old idea: Software as a Service (SaaS)
  • Def: delivering applications over the Internet
• Recently: “[Hardware, Infrastructure, Platform] as a Service”
• Utility computing: pay-as-you-use computing
  • Illusion of infinite resources
  • No up-front cost
  • Fine-grained billing (e.g., hourly)
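As a rough illustration of fine-grained, pay-as-you-use billing, the sketch below charges per started hour with no up-front fee. The function name `hourly_bill`, the rate, and the round-up-to-the-next-hour rule are assumptions made for this example; they are not taken from any particular provider's price list.

```python
import math

def hourly_bill(seconds_used, rate_per_hour):
    """Charge for every started hour of use (assumed rounding rule), with no up-front cost."""
    hours_billed = math.ceil(seconds_used / 3600)
    return hours_billed * rate_per_hour

# Hypothetical usage: one server for 1.5 hours at an assumed $0.10 per hour.
# 1.5 hours rounds up to 2 billed hours.
bill = hourly_bill(seconds_used=5400, rate_per_hour=0.10)
```

Under these assumptions a user who stops using the resource stops paying, which is the main contrast with buying hardware in advance.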

Cloud computing vs. grid computing
• Cloud computing = virtualization + grid + services + utility computing
• Grid computing: resource provisioning, load balancing, parallel processing
• Views of different users
  • System admins / Hadoop users: grid
  • Application owners / service users: service, utility

Users and cloud providers

Why Now?
• Experience with very large datacenters
  • Profitable for cloud providers: economies of scale
• Pervasive broadband Internet
• Fast x86 virtualization
• Pay-as-you-go billing model
• Large user base
  • Online payment
  • Online ads
  • Content distribution
  • Web 2.0 lowers the entry point to e-business
    • More small e-business owners
    • Large user base of clouds

Spectrum of Clouds
• Instruction set VM (Amazon EC2, 3Tera)
• Bytecode VM (Microsoft Azure)
• Framework VM
  • Google AppEngine, Force.com

[Figure: spectrum running from "lower-level, less management" to "higher-level, more management": EC2, Azure, AppEngine, Force.com]

Cloud Killer Apps
• Mobile and web applications
• Batch processing / MapReduce
• Data analytics (big data)
  • E.g., OLAP, data mining, machine learning
• Extensions of desktop software
  • Matlab, Mathematica

Cloud Economics
• Pay by use instead of provisioning for peak

[Figure: two plots of capacity and demand over time, "Static data center" and "Data center in the cloud"; in the static case the gap between capacity and demand is labeled "Unused resources"]
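The difference between provisioning for peak and paying by use can be made concrete with a little arithmetic. The following is a minimal sketch with made-up demand numbers and hypothetical function names. It assumes the same unit price per server-hour in both settings, which is a simplification; in practice a rented server-hour may cost more than an owned one.

```python
def peak_provisioned_cost(demand, unit_cost):
    """Static data center: capacity equals peak demand and is paid for in every period."""
    return max(demand) * len(demand) * unit_cost

def pay_per_use_cost(demand, unit_cost):
    """Cloud: pay only for the capacity actually used in each period."""
    return sum(demand) * unit_cost

def utilization(demand):
    """Fraction of peak-provisioned capacity that is actually used."""
    return sum(demand) / (max(demand) * len(demand))

# Hypothetical number of servers needed in each of four hours.
hourly_demand = [20, 40, 100, 40]

static_cost = peak_provisioned_cost(hourly_demand, unit_cost=1.0)  # 100 servers * 4 hours
cloud_cost = pay_per_use_cost(hourly_demand, unit_cost=1.0)        # 20 + 40 + 100 + 40
used_fraction = utilization(hourly_demand)
```

With this sample workload the static data center pays for twice the capacity it uses; the unused half corresponds to the "unused resources" area in the figure.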

Economics of Cloud Users
• Risk of over-provisioning: underutilization

[Figure: static data center; capacity and demand over time, with the gap between them labeled "Unused resources"]

Economics of Cloud Users
• Heavy penalty for under-provisioning

[Figure: three plots of capacity and demand over time (days 1 to 3), showing lost revenue and lost users when demand exceeds capacity]
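The under-provisioning penalty can be sketched the same way: any demand above a fixed capacity goes unserved. The numbers, the function names, and the flat revenue per unit of demand below are all assumptions for illustration. The sketch covers only lost revenue and does not model the second effect in the figure, where turned-away users leave and later demand shrinks.

```python
def unmet_demand(demand, capacity):
    """Demand in each period that a fixed capacity cannot serve."""
    return [max(0, d - capacity) for d in demand]

def lost_revenue(demand, capacity, revenue_per_unit):
    """Revenue forgone because some demand was turned away (assumed flat rate)."""
    return sum(unmet_demand(demand, capacity)) * revenue_per_unit

# Hypothetical peak demand on days 1 to 3 against a fixed capacity of 100.
daily_peak_demand = [80, 120, 150]

shortfall = unmet_demand(daily_peak_demand, capacity=100)
loss = lost_revenue(daily_peak_demand, capacity=100, revenue_per_unit=2.0)
```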

Economics of Cloud Providers
• 5-7x economies of scale [Hamilton 2008]

Resource         Cost in Medium DC     Cost in Very Large DC    Ratio
Network          $95 / Mbps / month    $13 / Mbps / month       7.1x
Storage          $2.20 / GB / month    $0.40 / GB / month       5.7x
Administration   ≈140 servers/admin    >1000 servers/admin      7.1x

• Extra benefits
  • Amazon: utilize off-peak capacity
  • Microsoft: sell .NET tools
  • Google: reuse existing infrastructure

Adoption Challenges

Challenge                                          Opportunity
Availability                                       Multiple providers & DCs
Data lock-in                                       Standardization
Data confidentiality, auditability, and privacy    Encryption, VLANs, firewalls; geographical data storage; privacy-preserving data outsourcing

Growth Challenges

Challenge                            Opportunity
Data transfer bottlenecks            FedEx-ing disks, data backup/archival
Performance unpredictability         Improved VM support, flash memory, scheduling VMs
Scalable storage                     Invent scalable store
Bugs in large distributed systems    Invent debugger that relies on distributed VMs
Scaling quickly                      Invent auto-scaler that relies on ML; snapshots

Policy and Business Challenges

Challenge                  Opportunity
Reputation fate sharing    Offer reputation-guarding services like those for email
Software licensing         Pay-for-use licenses; bulk use sales

Research Challenges Mentioned by Database Community (Claremont Report)

Functionality and operational cost
• Background: compare massive-scale data-intensive computing systems with today’s DBMS
• Limited functionality
  • Simple APIs (e.g., MapReduce)
  • Pushes more burden onto developers
• Benefits
  • Easier to manage
  • Lower operational cost
  • Service Level Agreement (SLA) that is hard to provide for a SQL DBMS
• P.S. DB systems are notorious for their installation and maintenance expenses.
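To give a feel for how small a MapReduce-style API is, and how much is left to the developer, here is a toy word count written as a map function and a reduce function with an in-memory driver. This is a single-process sketch for illustration only: `map_fn`, `reduce_fn`, and `run_mapreduce` are hypothetical names, not Hadoop's API, and a real framework would distribute the grouping step across machines.

```python
from collections import defaultdict

def map_fn(line):
    """Map step: emit a (word, 1) pair for every word in one input line."""
    for word in line.lower().split():
        yield word, 1

def reduce_fn(word, counts):
    """Reduce step: combine all counts emitted for one word."""
    return word, sum(counts)

def run_mapreduce(lines):
    """Toy in-memory driver: map every line, group values by key, then reduce each group."""
    groups = defaultdict(list)
    for line in lines:
        for key, value in map_fn(line):
            groups[key].append(value)
    return dict(reduce_fn(key, values) for key, values in groups.items())

word_counts = run_mapreduce(["the cloud scales", "the cloud bills hourly"])
```

Anything beyond this pattern, such as joins, indexing, or query optimization, has to be written by the developer, which is the "limited functionality" trade-off noted above.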

Manageability
• Features of cloud systems
  • Limited human intervention
  • High-variance workloads
  • A variety of shared infrastructures
  • No DBAs or administrators to assist developers
• Systems need to do work automatically
  • Self-managing
  • Adaptive (autonomous) computing

Data security and privacy
• Users share physical resources in a cloud
  • Protect users from each other (security)
  • Protect users from curious cloud providers (privacy)
• Success may depend on specific target usage scenarios
• Examples
  • Query-based services
  • Mining-based services

Datasets over multiple clouds
• Interesting datasets might be available in different clouds
  • Different cloud providers
  • Private or public clouds
• Services mashing up datasets
  • Inevitably crossing clouds
• Federated cloud architectures

Algorithms on Big Data
• Working on “Big Data”
  • Data mining
  • Machine learning
  • Visualization
• Traditional algorithms assume data is in flat files or relational databases
• Distributed data organization poses new challenges
  • Redesign algorithms
  • Redesign frameworks
