
Cloud Architectures - Jinesh Varia - GrepTheWeb

Jun 10, 2015


Technology

jineshvaria

Paper: http://media.amazonwebservices.com/AWS_Cloud_Architectures.pdf
Transcript
Page 1: Cloud Architectures - Jinesh Varia - GrepTheWeb
Page 2: Cloud Architectures - Jinesh Varia - GrepTheWeb

On Cloud Computing….

“We in academia and the government labs have not kept up with the times. Universities really need to get on board.”

- Randal E. Bryant, Dean of the Computer Science School at Carnegie Mellon University.

source: http://www.nytimes.com/2007/10/08/technology/08cloud.html

Page 3: Cloud Architectures - Jinesh Varia - GrepTheWeb

What is Amazon?


Page 4: Cloud Architectures - Jinesh Varia - GrepTheWeb

Amazon.com and AWS

[Chart: bandwidth consumed by Amazon Web Services compared with bandwidth consumed by Amazon’s global websites, by year; year labels from 1996 to 2008]

Page 5: Cloud Architectures - Jinesh Varia - GrepTheWeb

AWS Customer Momentum (490,000)

[Bar chart: bars for Q1 2006, Q1 2007, Q1 2008, and Q4 2008 on a scale of 0 to 600]

Page 6: Cloud Architectures - Jinesh Varia - GrepTheWeb

Amazon S3 Momentum

Total Objects Stored in Amazon S3

Q2 2006: 800,000,000

Q2 2007: 5,000,000,000

Q3 2007: 10,000,000,000

Q4 2008: 40,000,000,000

Page 7: Cloud Architectures - Jinesh Varia - GrepTheWeb

Why Are People So Excited ?

Page 8: Cloud Architectures - Jinesh Varia - GrepTheWeb
Page 9: Cloud Architectures - Jinesh Varia - GrepTheWeb

Most Companies Worry About This

Your Idea -> Successful Product

Undifferentiated “Heavy Lifting”:

• Power/Cooling

• Hardware Management

• Bandwidth Management

• Contract Negotiations

• Maintenance

• Deployment

• Purchasing Decisions

• Load Balancing/Scaling

• Managing Growth

Page 10: Cloud Architectures - Jinesh Varia - GrepTheWeb

70/30 Switch

Page 11: Cloud Architectures - Jinesh Varia - GrepTheWeb

Focus on Innovation

Your Idea

Successful Product

Undifferentiated “Heavy Lifting”

Cloud Computing

Page 12: Cloud Architectures - Jinesh Varia - GrepTheWeb

Amazon Cloud Computing

Focus On Your Idea

Spend Cash Wisely

Get Big Fast

Pay As You Go

Simple, Reliable, Fast

Elastic Unlimited Capacity

Page 13: Cloud Architectures - Jinesh Varia - GrepTheWeb

Amazon EC2-EBS

Amazon SimpleDB

Amazon S3

Amazon EC2

Amazon SQS

Page 14: Cloud Architectures - Jinesh Varia - GrepTheWeb

ANIMOTO.COM

Page 15: Cloud Architectures - Jinesh Varia - GrepTheWeb

Scale: 50 servers to 5000 servers in 3 days

[Chart: number of EC2 instances per day, 4/12/2008 to 4/20/2008]

• Launch of Facebook modification

• Amazon EC2 easily scaled to handle additional traffic

• Peak of 5000 instances

• Steady state of ~40 instances

Page 16: Cloud Architectures - Jinesh Varia - GrepTheWeb
Page 17: Cloud Architectures - Jinesh Varia - GrepTheWeb
Page 18: Cloud Architectures - Jinesh Varia - GrepTheWeb
Page 19: Cloud Architectures - Jinesh Varia - GrepTheWeb
Page 20: Cloud Architectures - Jinesh Varia - GrepTheWeb
Page 21: Cloud Architectures - Jinesh Varia - GrepTheWeb

“TimesMachine” from NY Times

1851-1922 Articles

TIFF -> PDF

Input: 11 Million Articles (4TB of data)

What did he do?

100 EC2 Instances for 24 hours

All data on S3

Output: 1.5 TB of Data

Hadoop, iText, JetS3t
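
The "100 EC2 Instances for 24 hours" figure can be read as a budget of instance-hours. The sketch below only does that arithmetic. It assumes the work divides evenly across instances, which the slide does not state, and the function names are illustrative.

```python
def instance_hours(instances: int, hours: int) -> int:
    """Total instance-hours consumed by a cluster of `instances` running for `hours`."""
    return instances * hours


def wall_clock_hours(total_instance_hours: int, instances: int) -> float:
    """Wall-clock time for the same work on a different cluster size.

    Assumes perfectly parallel work (an idealization, not a claim from the slides).
    """
    return total_instance_hours / instances


total = instance_hours(100, 24)      # figures from the slide
print(total)                         # 2400
print(wall_clock_hours(total, 1))    # 2400.0 hours (100 days) on a single instance
print(wall_clock_hours(total, 100))  # 24.0 hours on 100 instances
```

Under that assumption the instance-hours are the same either way; only the elapsed time changes.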

Page 22: Cloud Architectures - Jinesh Varia - GrepTheWeb
Page 23: Cloud Architectures - Jinesh Varia - GrepTheWeb
Page 24: Cloud Architectures - Jinesh Varia - GrepTheWeb
Page 25: Cloud Architectures - Jinesh Varia - GrepTheWeb
Page 26: Cloud Architectures - Jinesh Varia - GrepTheWeb


Page 27: Cloud Architectures - Jinesh Varia - GrepTheWeb
Page 28: Cloud Architectures - Jinesh Varia - GrepTheWeb
Page 29: Cloud Architectures - Jinesh Varia - GrepTheWeb

CS290F : Scalable Internet Services

UCSB Fall 2006

Prof created an app to manage team usage

Ruby on Rails

Complete Stack: From Load balancer, App Server to DB

Learn how to scale: Simulated load

Generated Graphs

All course contents, student assignments, and lessons learned are on the Wiki

Page 30: Cloud Architectures - Jinesh Varia - GrepTheWeb

CS345a : Data Mining @ Stanford

Tools used:

Shell/Linux/Java

Hadoop on EC2

Data set on S3

Datasets: Netflix, Alexa, IR datasets from TREC

Class organization:

Stanford Winter 2007

30-35 Students

Each Team spawns 10-15 Hadoop slave nodes

TA created Getting-Started AMIs (& scripts)

TA managed the students’ usage

Page 31: Cloud Architectures - Jinesh Varia - GrepTheWeb

Bioinformatics @ Northwestern University

• Using Hadoop to perform sequence alignments on large genomic datasets

– Northwestern University (Flatow & Lin) presented a talk at the Next-gen Sequencing Data Analysis meeting

• “An understanding of the industrial strength map-reduce paradigm will be invaluable to those looking to cope with the next-generation datasets. Combined with the power of elastic computing clouds, many of the potential barriers to dealing with such large-scale data can be completely eliminated.”

Page 32: Cloud Architectures - Jinesh Varia - GrepTheWeb
Page 33: Cloud Architectures - Jinesh Varia - GrepTheWeb
Page 34: Cloud Architectures - Jinesh Varia - GrepTheWeb
Page 35: Cloud Architectures - Jinesh Varia - GrepTheWeb

Cloud Architectures

[Chart labels: Hardware, Infrastructure/Cost, time, Job execution time]

Page 36: Cloud Architectures - Jinesh Varia - GrepTheWeb

Shrink your processing time

[Chart: CPUs vs. time]

Page 37: Cloud Architectures - Jinesh Varia - GrepTheWeb

Shrink your processing time

[Chart: CPUs vs. time]

Page 38: Cloud Architectures - Jinesh Varia - GrepTheWeb

Main Problems

Technical

• How do I coordinate jobs between machines (distributed processing)?

• What if a machine fails?

• How will I scale out?

Business

• How do I get management sign-off?

• Resources to manage the infrastructure?

• How do I get rid of the idle infrastructure?

Hadoop

Web Services

Cloud Computing

Page 39: Cloud Architectures - Jinesh Varia - GrepTheWeb

GrepTheWeb

Page 40: Cloud Architectures - Jinesh Varia - GrepTheWeb

What’s so cool about GrepTheWeb?

RegEx

WWW

Page 41: Cloud Architectures - Jinesh Varia - GrepTheWeb

Examples of Patterns

Source Code

int x = 40 + i

Anything with punctuation

“Hey!” he said, “Are you ok?”

Case Sensitive

Function CallOrderController()

Equations

f(x) = x^2

Other Patterns

(dis)integration of life, Email Address
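
The pattern categories above can be sketched as regular expressions. The expressions below are illustrative guesses written for Python's `re` module; they are not taken from the slides or from GrepTheWeb itself, and the email pattern is deliberately simplistic.

```python
import re

# Illustrative patterns for the example categories on the slide (assumed, not from the source).
PATTERNS = {
    "source code": r"int\s+\w+\s*=\s*\d+\s*\+\s*\w+",
    "case sensitive": r"\bCallOrderController\(\)",
    "equation": r"f\(x\)\s*=\s*x\^2",
    "other": r"\(dis\)integration",
    "email address": r"[\w.+-]+@[\w-]+(?:\.[\w-]+)+",
}


def find_matches(name: str, text: str) -> list:
    """Return every substring of `text` matched by the named pattern."""
    return [m.group(0) for m in re.finditer(PATTERNS[name], text)]


print(find_matches("source code", "int x = 40 + i"))
print(find_matches("case sensitive", "Function callordercontroller()"))  # no match: case differs
print(find_matches("email address", "write to someone@example.com today"))
```

Punctuation and parentheses have to be escaped (`\(`, `\^`) because they are regex metacharacters, which is part of what makes these searches awkward for a plain keyword index.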

Page 42: Cloud Architectures - Jinesh Varia - GrepTheWeb

Zoom Level 1

GrepTheWeb Service (Alexa)

• RegEx

• GetStatus

• Input dataset (List of Document URLs)

• Subset of document URLs that matched the RegEx
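
At this zoom level the service is a black box: a regular expression and a list of documents go in, and the subset of matching document URLs comes out. A minimal single-machine sketch of that contract follows. The `grep_documents` name and the URL-to-text dictionary are assumptions made for illustration; the real service works on crawl data, not an in-memory dictionary.

```python
import re
from typing import Dict, List


def grep_documents(regex: str, documents: Dict[str, str]) -> List[str]:
    """Return the URLs whose document text contains a match for `regex`.

    `documents` maps a URL to its text (a stand-in for the input dataset).
    """
    pattern = re.compile(regex)
    return [url for url, text in documents.items() if pattern.search(text)]


docs = {
    "http://example.com/a": "int x = 40 + i",
    "http://example.com/b": "no numbers here",
}
print(grep_documents(r"\d+", docs))  # ['http://example.com/a']
```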

Page 43: Cloud Architectures - Jinesh Varia - GrepTheWeb

Zoom Level 2

[Architecture diagram]

Components: Amazon SQS, Controller, Amazon EC2 Cluster, Amazon S3, Amazon SimpleDB (DB)

Labels:

• Manage phases

• Launch, Monitor, Shutdown

• User info, Job status info

• Input, Output

• StartGrep, RegEx, GetStatus, Get Output

• Input Files (Alexa Crawl)

Amazon SQS: Distributed Transient Buffer

• Never lose a message

• Ideal for small short-lived messages

• Access control

• Message Locking

Amazon S3: Infinitely Scalable Storage in the cloud

• Highly Available, Durable and Reliable

• Private and Public Storage

• Pay by the GB

Amazon EC2: Resizable Computing Capacity in the cloud

• Spawn Server Instances using a Web Service call

• Root Level Access

• Pay by the hour

Amazon SimpleDB: Database in the cloud

• Lightweight Query-able Attribute Store

• Distributed and Partitioned

• Pay by GB, Pay per Query
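
The "Never lose a message" and "Message Locking" points can be illustrated with a toy in-memory queue. This is a sketch of the general idea only (a received message is hidden for a timeout and reappears unless it is deleted); it is not the Amazon SQS API, and the class name, method names, and explicit `now` clock are assumptions made to keep the example deterministic.

```python
import itertools
from typing import Dict, Optional, Tuple


class LockingQueue:
    """Toy queue: a received message is locked until its timeout expires or it is deleted."""

    def __init__(self, lock_timeout: float = 30.0) -> None:
        self.lock_timeout = lock_timeout
        self._ids = itertools.count()
        # message id -> (body, time at which the message becomes visible again)
        self._messages: Dict[int, Tuple[str, float]] = {}

    def send(self, body: str) -> int:
        message_id = next(self._ids)
        self._messages[message_id] = (body, 0.0)
        return message_id

    def receive(self, now: float) -> Optional[Tuple[int, str]]:
        """Return one visible message and lock it, or None if nothing is visible."""
        for message_id, (body, visible_at) in self._messages.items():
            if visible_at <= now:
                self._messages[message_id] = (body, now + self.lock_timeout)
                return message_id, body
        return None

    def delete(self, message_id: int) -> None:
        """Called by a worker once it has finished processing the message."""
        self._messages.pop(message_id, None)


queue = LockingQueue(lock_timeout=30.0)
queue.send("job-1")
print(queue.receive(now=0.0))   # (0, 'job-1'): delivered and locked
print(queue.receive(now=10.0))  # None: still locked by the first receiver
print(queue.receive(now=31.0))  # (0, 'job-1'): the lock expired, so it is redelivered
```

If the first worker crashes before calling `delete`, the message is not lost; another worker picks it up after the lock expires.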

Page 44: Cloud Architectures - Jinesh Varia - GrepTheWeb

Zoom Level 3

[Architecture diagram]

Components:

• Amazon SQS: Launch Queue, Monitor Queue, Shutdown Queue, Billing Queue

• Controller: Launch Controller, Monitor Controller, Shutdown Controller, Billing Controller

• Amazon SimpleDB: Status DB

• Amazon S3: Input, Output

• Hadoop Cluster on Amazon EC2: Master M, Slaves N, HDFS

• Billing Service

Labels:

• launch, ping, Shutdown

• Insert JobID, Status

• Insert EC2 info

• Get EC2 Info

• Put File, Get File

• Check for results

• StartGrep, GetStatus, Get Output

• Input Files (Alexa Crawl)
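
The queue-and-controller chain in this diagram can be sketched as a small pipeline in which each controller takes a job from its own queue, records a status, and hands the job to the next queue. This is a single-process illustration using Python's standard `queue.Queue` and a dictionary in place of the status database; the class, method names, and status strings are hypothetical, and the real controllers talk to Amazon SQS, SimpleDB, and EC2.

```python
from queue import Queue
from typing import Dict, List

# Phase names follow the queues on the slide.
PHASES = ["launch", "monitor", "shutdown", "billing"]


class Pipeline:
    def __init__(self) -> None:
        self.queues: Dict[str, Queue] = {phase: Queue() for phase in PHASES}
        self.status_db: Dict[str, List[str]] = {}  # stand-in for the status DB

    def start_grep(self, job_id: str) -> None:
        """Record the job and place it on the launch queue."""
        self.status_db[job_id] = ["submitted"]
        self.queues["launch"].put(job_id)

    def run_controller(self, phase: str) -> bool:
        """Handle one message for `phase`; return False if its queue is empty."""
        queue = self.queues[phase]
        if queue.empty():
            return False
        job_id = queue.get()
        self.status_db[job_id].append(phase + " done")
        next_index = PHASES.index(phase) + 1
        if next_index < len(PHASES):
            self.queues[PHASES[next_index]].put(job_id)
        return True

    def get_status(self, job_id: str) -> str:
        return self.status_db[job_id][-1]


pipeline = Pipeline()
pipeline.start_grep("job-1")
for phase in PHASES:
    pipeline.run_controller(phase)
print(pipeline.get_status("job-1"))  # billing done
```

Because the controllers only communicate through queues and the status store, each one can run, fail, and restart independently of the others.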

Page 45: Cloud Architectures - Jinesh Varia - GrepTheWeb

Zoom Level 4

[Diagram: each job started through the Service runs as its own Hadoop job, whose tasks are Map, Map, Map, ….., Map, then Combine and Reduce]

• User1: StartJob1, StopJob1

• User2: StartJob2, StopJob2

• Store status and results

• Get Result

Page 46: Cloud Architectures - Jinesh Varia - GrepTheWeb

SideTrack: WordCount Example

MAPPER: For each input record, extract a set of key/value pairs that we care about from each record

REDUCER: For each extracted key/value pair, combine it with other values that share the same key

“Hi Hadoop, Bye Hadoop”

(“Hi”, 1), (“Hadoop”, 1),

(“Bye”, 1), (“Hadoop”, 1)

(“Hadoop”, [1,1])

(“Hadoop”, 2)

Source: Doug Cutting’s Slide Deck on Hadoop

[Diagram: Input -> Map -> Reduce. Input key/value pairs are mapped to keys with values (key 1, key 3, ...), all values for a key are aggregated, and the reduce step emits a final key with its values]
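
The “Hi Hadoop, Bye Hadoop” walk-through can be reproduced on one machine with plain Python. This is a sketch of the map, group-by-key, and reduce steps, not Hadoop code; the function names and the `\w+` tokenizer are assumptions.

```python
import re
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple


def mapper(record: str) -> List[Tuple[str, int]]:
    """Emit (word, 1) for every word in the record."""
    return [(word, 1) for word in re.findall(r"\w+", record)]


def group_by_key(pairs: Iterable[Tuple[str, int]]) -> Dict[str, List[int]]:
    """Collect all values that share a key, e.g. ("Hadoop", [1, 1])."""
    grouped: Dict[str, List[int]] = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return dict(grouped)


def reducer(key: str, values: List[int]) -> Tuple[str, int]:
    """Combine the values for one key into a single count."""
    return key, sum(values)


def word_count(records: Iterable[str]) -> Dict[str, int]:
    pairs = [pair for record in records for pair in mapper(record)]
    return dict(reducer(key, values) for key, values in group_by_key(pairs).items())


print(mapper("Hi Hadoop, Bye Hadoop"))
print(word_count(["Hi Hadoop, Bye Hadoop"]))  # {'Hi': 1, 'Hadoop': 2, 'Bye': 1}
```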

Page 47: Cloud Architectures - Jinesh Varia - GrepTheWeb

Zoom Level 5 (Hadoop MapReduce)

MAPPER: For each input record, extract a set of key/value pairs that we care about from each record

• Input: (LineNumber, s3pointer)

• Output: (s3pointer, [matches])

REDUCER: For each extracted key/value pair, combine it with other values that share the same key

• Identity Function

[Diagram: Input -> Map -> Reduce. Input key/value pairs are mapped to keys with values (key 1, key 3, ...), all values for a key are aggregated, and the reduce step emits a final key with its values]

Source: Doug Cutting’s Slide Deck on Hadoop
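
The grep job on this slide, with (LineNumber, s3pointer) records going in, (s3pointer, [matches]) pairs coming out, and an identity function as the reduce step, can be sketched in plain Python. This is not Hadoop code. The `fetch` callable, the `s3://bucket/...` pointers, and the in-memory store are hypothetical stand-ins for reading documents from Amazon S3.

```python
import re
from typing import Callable, Iterable, List, Tuple

Record = Tuple[int, str]          # (line number, s3pointer)
Result = Tuple[str, List[str]]    # (s3pointer, [matches])


def make_mapper(regex: str, fetch: Callable[[str], str]) -> Callable[[int, str], List[Result]]:
    """Build a mapper that greps the document behind each s3pointer."""
    pattern = re.compile(regex)

    def mapper(line_number: int, s3pointer: str) -> List[Result]:
        matches = [m.group(0) for m in pattern.finditer(fetch(s3pointer))]
        # Emit nothing for documents without a match.
        return [(s3pointer, matches)] if matches else []

    return mapper


def identity_reducer(key: str, value: List[str]) -> Result:
    """Pass the mapper output through unchanged."""
    return key, value


def run_grep_job(regex: str, records: Iterable[Record], fetch: Callable[[str], str]) -> List[Result]:
    mapper = make_mapper(regex, fetch)
    mapped = [pair for line_number, pointer in records for pair in mapper(line_number, pointer)]
    return [identity_reducer(key, value) for key, value in mapped]


# Hypothetical in-memory stand-in for documents stored in S3.
store = {"s3://bucket/a": "int x = 40 + i", "s3://bucket/b": "nothing here"}
records = [(1, "s3://bucket/a"), (2, "s3://bucket/b")]
print(run_grep_job(r"\d+", records, store.__getitem__))  # [('s3://bucket/a', ['40'])]
```

All of the work happens in the map step, which is why the job parallelizes well: each document can be fetched and searched independently.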

Page 48: Cloud Architectures - Jinesh Varia - GrepTheWeb
Page 49: Cloud Architectures - Jinesh Varia - GrepTheWeb