Amazon Elastic MapReduce (EMR), Saturday, December 6, 2014
Page 1: Amazon EMR

Amazon Elastic MapReduce (EMR)

Saturday, December 6, 2014

Page 2: Amazon EMR

Agenda

08:30 AM Breakfast
09:00 AM Introduction and Strengths of Technologies
10:00 AM Start an EMR Cluster
10:15 AM Break + set up query tool
10:30 AM Hadoop hands-on
10:55 AM Break
11:10 AM Redshift hands-on
11:40 AM Operationalizing your code
12:00 PM Adjourn

12/6/2014 2

Page 3: Amazon EMR

Session Goals

• Understand:

• When to use EMR?

• Do:

• Start Cluster

• Load Data from S3

• Transform Data

• Unload Data to S3

Draw elements from Gil’s deck

Pattern

Page 4: Amazon EMR

When to use EMR?

• Some Boolean combination of the following:

• Ephemeral clusters

• Batch processing: daily, weekly, etc.

• User Defined Functions (UDF)

• File formats

• TB, PB data sets in S3

• Instant gratification

Page 5: Amazon EMR

Let’s Do This!

What do we need?

• Key (.pem file)

• SQL Workbench

What will we do?

• Start Cluster

• Load stock market data from S3

• Calculate Sharpe ratio

• Unload Sharpe ratio results to S3

The Sharpe Ratio characterizes how well the return of an asset compensates the investor for the risk taken. Roughly, the higher the better.
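
The slide gives only the verbal definition. One common textbook form is sketched below as an assumption, since the deck does not state the exact formula it uses. Here $R_a$ is the asset's return, $R_f$ is a risk-free reference return, and $\sigma_a$ is the standard deviation of the excess return; these symbols are introduced for illustration only.

```latex
S_a = \frac{\mathbb{E}\left[ R_a - R_f \right]}{\sigma_a},
\qquad
\sigma_a = \sqrt{\operatorname{Var}\left[ R_a - R_f \right]}
```

Under this form, a higher average excess return or a lower volatility both raise the ratio, which matches the "higher is better" reading above.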

Page 6: Amazon EMR

AWS Console

• Just google “aws console”

Page 7: Amazon EMR

Click Here

Where’s EMR?

Page 8: Amazon EMR

Create Cluster

Page 9: Amazon EMR

Cluster Options

• Lots of them!

• Cluster Configuration

• Tags - Skip

• Software Configuration

• File System Configuration

• Hardware Configuration

• Security and Access

• IAM Roles

• Bootstrap Actions

• Steps
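
The console options above map roughly onto the fields of a scripted cluster request. The sketch below builds such a request for boto3's `run_job_flow` call as one possible way to reproduce the console walkthrough; it is not how the workshop itself created the cluster. The release label, instance type, and region are assumptions, and the key name is guessed from the `.pem` file name used later in the deck.

```python
def build_cluster_request(name, key_name,
                          release_label="emr-6.15.0",   # assumption: any current release
                          master_type="m5.xlarge"):     # assumption: not from the slides
    """Build keyword arguments for boto3's EMR run_job_flow call."""
    return {
        "Name": name,                               # Cluster Configuration
        "ReleaseLabel": release_label,              # Software Configuration
        "Applications": [{"Name": "Hive"}],
        "Instances": {                              # Hardware Configuration
            "MasterInstanceType": master_type,
            "InstanceCount": 1,                     # master only: Core and Task at 0
            "Ec2KeyName": key_name,                 # Security and Access
            "KeepJobFlowAliveWhenNoSteps": True,    # stay up for interactive queries
        },
        "JobFlowRole": "EMR_EC2_DefaultRole",       # IAM Roles: defaults
        "ServiceRole": "EMR_DefaultRole",
        "BootstrapActions": [],                     # none needed for this exercise
        "Steps": [],                                # steps can be added later
    }


def launch(request, region="us-east-1"):            # region is an assumption
    """Submit the request; needs boto3 installed and AWS credentials configured."""
    import boto3  # third-party, imported lazily so the builder works without it
    emr = boto3.client("emr", region_name=region)
    return emr.run_job_flow(**request)["JobFlowId"]
```

Calling `build_cluster_request("emr-training", "emr-training")` only builds the dictionary; nothing is created in AWS until `launch` is called with it.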

Page 10: Amazon EMR

Cluster Configuration

Page 11: Amazon EMR

Software Configuration

More fun stuff in here

Page 12: Amazon EMR

File System Configuration

Page 13: Amazon EMR

Hardware Configuration

$ 0.28 / hour

Set Core and Task to 0
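
As a rough cost check, the sketch below multiplies the quoted rate by the time the cluster stays up. It assumes the $0.28 figure is the full hourly price of the single remaining node and that partial hours are billed as whole hours; neither is stated on the slide, and current pricing may differ.

```python
import math


def estimate_cost(hours_running, hourly_rate=0.28, node_count=1):
    """Estimate cluster cost in dollars, rounding partial hours up (an assumption)."""
    billed_hours = math.ceil(hours_running)
    return billed_hours * hourly_rate * node_count
```

With the agenda's roughly two hours of hands-on time, `estimate_cost(2)` comes to about $0.56 for a master-only cluster.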

Page 14: Amazon EMR

Security and Access

Finally we get to use our keys!

Page 15: Amazon EMR

IAM Roles

Just defaults, please

More JSON in here

Page 16: Amazon EMR

Bootstrap Actions

• Tweak configuration

• Install custom application (Apache Drill, Mahout, etc.)

• Shell scripts

Page 17: Amazon EMR

Steps

Page 18: Amazon EMR

Steps

Page 19: Amazon EMR

Steps: Hive Program

Page 20: Amazon EMR

Provisioning

Page 21: Amazon EMR

Bootstrapping

Here’s your hostname

SSH Info

Page 22: Amazon EMR

Monitor Startup Progress

Page 23: Amazon EMR

SSH – Linux/Mac

Page 24: Amazon EMR

SSH - Windows

Page 25: Amazon EMR

Port Forwarding (Mac/Linux)

ssh -i ~/.ec2/emr-training.pem -L 10000:localhost:10000 [email protected]
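
The same tunnel can be assembled from a script, which helps when the master hostname changes with every new cluster. This is a minimal sketch: the user and host are obscured in this transcript, so `USER@MASTER_HOSTNAME` below is a placeholder to be replaced with the SSH details shown on the cluster page, and `tunnel_command` is a hypothetical helper name.

```python
import os
import shlex


def tunnel_command(key_path, user_at_host, local_port=10000, remote_port=10000):
    """Return the ssh argument list that forwards local_port to the master node."""
    return [
        "ssh",
        "-i", os.path.expanduser(key_path),            # expand ~ ourselves; no shell is used
        "-L", f"{local_port}:localhost:{remote_port}",  # local port -> port on the master
        user_at_host,
    ]


if __name__ == "__main__":
    # Placeholder target: substitute the real user and hostname before running.
    cmd = tunnel_command("~/.ec2/emr-training.pem", "USER@MASTER_HOSTNAME")
    print(shlex.join(cmd))  # pass cmd to subprocess.run(cmd) to open the tunnel
```

Building the command as a list rather than a single string avoids shell quoting issues if the key path contains spaces.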

Page 26: Amazon EMR

Connect with SQL Workbench:

• Localhost

• Autocommit

• Default URL
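
Before opening SQL Workbench it can help to confirm that the forwarded port is actually reachable. The sketch below assumes HiveServer2 is listening behind local port 10000 and that the `default` database is the target; the slide does not spell out the URL, so the `jdbc:hive2://` form is an assumption based on the standard Hive JDBC scheme.

```python
import socket


def hive_jdbc_url(host="localhost", port=10000, database="default"):
    """Build a HiveServer2 JDBC URL for the forwarded port."""
    return f"jdbc:hive2://{host}:{port}/{database}"


def port_is_open(host="localhost", port=10000, timeout=2.0):
    """Return True if a TCP connection to host:port succeeds within the timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False
```

If `port_is_open()` returns `False`, the SSH tunnel is probably not running, and the connection in SQL Workbench will fail as well.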

Page 27: Amazon EMR

Load Data from S3

Familiar SQL

Describe file format

Pull from DK bucket
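
The three callouts above describe a Hive external table: familiar SQL, a clause describing the file format, and an S3 location. The slide's actual statement is not in the transcript, so the sketch below renders a plausible one from Python. The table name, column names other than `adjclose`, and the bucket path are all hypothetical.

```python
def external_table_ddl(table, columns, s3_location, delimiter=","):
    """Render a HiveQL CREATE EXTERNAL TABLE statement over delimited text files."""
    cols = ",\n".join(f"  {name} {hive_type}" for name, hive_type in columns)
    return (
        f"CREATE EXTERNAL TABLE IF NOT EXISTS {table} (\n{cols}\n)\n"
        f"ROW FORMAT DELIMITED FIELDS TERMINATED BY '{delimiter}'\n"  # file format
        "STORED AS TEXTFILE\n"
        f"LOCATION '{s3_location}'"                                   # data stays in S3
    )


if __name__ == "__main__":
    # Hypothetical schema and bucket, for illustration only.
    print(external_table_ddl(
        "stocks_raw",
        [("symbol", "STRING"), ("trade_date", "STRING"), ("adjclose", "DOUBLE")],
        "s3://EXAMPLE-BUCKET/stocks/",
    ))
```

Because the table is external, dropping it in Hive removes only the table definition and leaves the files in S3 untouched.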

Page 28: Amazon EMR

Calculate Daily Returns

Copy data into our new table

Create a table in HDFS

Hive has Windowing and Analytic Features

Daily Return = (adjclose[n] / adjclose[n-1]) - 1
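
To make the daily return concrete, the sketch below computes it for a short price series in plain Python, taking each day's adjusted close relative to the previous day's. In Hive the same pairing of a row with its predecessor would presumably be done with a window function such as `LAG`; the function name and the sample prices here are illustrative only.

```python
def daily_returns(adj_close):
    """Return adj_close[n] / adj_close[n-1] - 1 for every day after the first."""
    return [
        current / previous - 1.0
        for previous, current in zip(adj_close, adj_close[1:])
    ]


if __name__ == "__main__":
    prices = [100.0, 110.0, 99.0]   # made-up adjusted closes, oldest first
    print(daily_returns(prices))    # roughly +10% then -10%
```

The first day has no previous close, so a series of N prices yields N-1 returns.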

Page 29: Amazon EMR

Calculate Sharpe Ratio
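
The query on this slide is not captured in the transcript. As a stand-in, the sketch below computes a Sharpe ratio from a list of daily returns, assuming the common form of mean excess return divided by its sample standard deviation, scaled by the square root of the number of periods per year. The zero risk-free rate and the 252 trading days are assumptions, not values from the deck.

```python
import math
import statistics


def sharpe_ratio(returns, risk_free_rate=0.0, periods_per_year=252):
    """Annualized Sharpe ratio of per-period returns (sample standard deviation)."""
    excess = [r - risk_free_rate for r in returns]
    spread = statistics.stdev(excess)     # needs at least two returns
    if spread == 0:
        raise ValueError("returns have zero volatility; ratio is undefined")
    return math.sqrt(periods_per_year) * statistics.mean(excess) / spread
```

Passing `periods_per_year=1` skips the annualization, which makes it easier to compare against a hand calculation.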

Page 30: Amazon EMR

Export Our Data

Define CSV output

Write out data
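
The two callouts suggest a pair of Hive statements: one defining a comma-delimited table whose location is an S3 path, and one writing the results into it. The sketch below renders such a pair from Python as one way this step could look. The table names, columns, and bucket path are hypothetical, and the deck's actual statements may differ.

```python
def export_statements(source_table, export_table, columns, s3_location):
    """Render HiveQL that defines a CSV-style S3 table and fills it from source_table."""
    cols_ddl = ",\n".join(f"  {name} {hive_type}" for name, hive_type in columns)
    col_names = ", ".join(name for name, _ in columns)
    define_output = (                                   # "Define CSV output"
        f"CREATE EXTERNAL TABLE IF NOT EXISTS {export_table} (\n{cols_ddl}\n)\n"
        "ROW FORMAT DELIMITED FIELDS TERMINATED BY ','\n"
        f"LOCATION '{s3_location}'"
    )
    write_data = (                                      # "Write out data"
        f"INSERT OVERWRITE TABLE {export_table}\n"
        f"SELECT {col_names} FROM {source_table}"
    )
    return [define_output, write_data]


if __name__ == "__main__":
    # Hypothetical names and bucket, for illustration only.
    for statement in export_statements(
        "sharpe_results", "sharpe_export",
        [("symbol", "STRING"), ("sharpe", "DOUBLE")],
        "s3://EXAMPLE-BUCKET/output/sharpe/",
    ):
        print(statement + ";\n")
```

`INSERT OVERWRITE` replaces whatever is already at the target location, so each run should point at a path that is safe to overwrite.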

Page 31: Amazon EMR

Terminate!

Page 32: Amazon EMR

Links and Resources

• SQLWorkbench/J

• AWS EMR Documentation

• Hive Language Manual
