Thailand Hadoop Big Data Challenge #1 13-15 March 2015
Page 1: Thailand Hadoop Big Data Challenge #1

Thailand Hadoop Big Data Challenge #1

13-15 March 2015

Page 2: Thailand Hadoop Big Data Challenge #1

Special thanks to Amazon Web Services for providing AWS credits to run the EMR Hadoop cluster

Page 3: Thailand Hadoop Big Data Challenge #1

Schedule

13 March

– 16.00 - 18.00 Workshop / demo on Big Data analytics using Amazon EMR

– 18.00 Registration opens for those interested in running the cluster for 30 hours; account access to Amazon EMR will be given

14 March

– 06.00 Amazon EMR cluster opens

– Participants discuss online / via social media

15 March (@ EGA Office)

– 12.00 Amazon EMR cluster closes

– 13.00 Presentation by each competitor on the result

– 15.30 Winner Announcement

Page 4: Thailand Hadoop Big Data Challenge #1

Architecture Overview of Amazon EMR

Page 5: Thailand Hadoop Big Data Challenge #1

Hadoop Cluster for the challenge

10 AWS m3.xlarge EC2 servers, each with 4 vCPUs, 15 GB of memory, and 80 GB of SSD storage

A sample data set with more than 10 million records will be given

Page 6: Thailand Hadoop Big Data Challenge #1

Challenge rules

A competitor can analyse the sample data with Hive, Pig or MapReduce

In addition, a competitor can use their own large data set.

The winner will be chosen from those with the best innovation / result from the analytics.

Those who would just like to try using the cluster are also welcome

Page 7: Thailand Hadoop Big Data Challenge #1

Judging Criteria:

Complexity of the problem & Data Set 30%

Benefit to the society 20%

Innovation 30%

Presentation 20%
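
As a rough illustration of how the weights above combine, the sketch below computes a weighted total. The 0-100 scale for each criterion and the dictionary keys are assumptions made for this example; the slides do not say how the judges record individual scores.

```python
# Weights taken from the judging criteria slide.
WEIGHTS = {
    "complexity": 0.30,    # Complexity of the problem & data set
    "benefit": 0.20,       # Benefit to the society
    "innovation": 0.30,    # Innovation
    "presentation": 0.20,  # Presentation
}


def weighted_score(scores):
    """Combine per-criterion scores (assumed 0-100) into one total."""
    missing = WEIGHTS.keys() - scores.keys()
    if missing:
        raise ValueError(f"missing criteria: {sorted(missing)}")
    return sum(WEIGHTS[name] * scores[name] for name in WEIGHTS)


# Hypothetical scores for one competitor.
example = {"complexity": 90, "benefit": 70, "innovation": 80, "presentation": 60}
total = weighted_score(example)  # 27 + 14 + 24 + 12
```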

Page 8: Thailand Hadoop Big Data Challenge #1

Judges

Assoc. Prof. Dr. Jirapun Daengdej

Mr. Danairat Thanabodithammachari

Dr. Thanachart Numnonda

Ms. Nantawan Wongkachonkitti

Page 9: Thailand Hadoop Big Data Challenge #1

Awards

The overall winner will receive an Apple TV.

Two winners will be selected for two free training courses:

– Big Data using Hadoop Workshop; 30-31 March 2015

– Business Intelligence Design and Process; 18-20, 25-26 May 2015

Starbucks Card 200 Baht

Page 10: Thailand Hadoop Big Data Challenge #1

EMR Cluster Setup (This will be done by IMC Institute)

Page 11: Thailand Hadoop Big Data Challenge #1

Thanachart Numnonda, [email protected] Feb 2015
Big Data Hadoop on Amazon EMR – Hands On Workshop

Select EMR

Page 12: Thailand Hadoop Big Data Challenge #1

Creating a cluster in EMR

Page 13: Thailand Hadoop Big Data Challenge #1

Creating a cluster in EMR (cont.)

Name the cluster and specify the log folder

Page 14: Thailand Hadoop Big Data Challenge #1

Creating a cluster in EMR (cont.)

Leave the Software Configuration as default

Page 15: Thailand Hadoop Big Data Challenge #1

Creating a cluster in EMR (cont.)

Leave the Hardware Configuration as default

Choose an existing EC2 key pair

Page 16: Thailand Hadoop Big Data Challenge #1

Creating a cluster in EMR (cont.)

Leave the others as default

Select Create Cluster

Page 17: Thailand Hadoop Big Data Challenge #1

EMR Cluster Details

Note the Master public DNS.

To see details on how to connect to the master node using SSH, click SSH

Page 18: Thailand Hadoop Big Data Challenge #1

Running the cluster

Page 19: Thailand Hadoop Big Data Challenge #1

Set Up an SSH Tunnel to the Master Node

– See instructions at

– http://docs.aws.amazon.com/ElasticMapReduce/latest/DeveloperGuide/emr-ssh-tunnel.html
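
A minimal sketch of the tunnel command described in the linked guide, built in Python so it can be handed to subprocess. The `hadoop` login user and the local port 8157 follow the conventions used in the AWS documentation; the key path and host name below are placeholders, not the credentials issued for the challenge.

```python
def ssh_tunnel_command(key_path, master_dns, local_port=8157, user="hadoop"):
    """Build an ssh command that opens a SOCKS proxy to the master node.

    -i  private key of the EC2 key pair chosen at cluster creation
    -N  do not run a remote command, only forward ports
    -D  dynamic (SOCKS) port forwarding on the given local port
    """
    return [
        "ssh",
        "-i", key_path,
        "-N",
        "-D", str(local_port),
        f"{user}@{master_dns}",
    ]


# Placeholder values; substitute the key file and public DNS you were sent.
command = ssh_tunnel_command("~/mykeypair.pem", "master-public-dns-name")
# To open the tunnel: subprocess.run(command) after `import subprocess`.
```

Dropping `-N` and `-D <port>` from the same command gives a plain interactive login to the master node.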

Page 20: Thailand Hadoop Big Data Challenge #1

SSH Instruction for Mac/Linux

Page 21: Thailand Hadoop Big Data Challenge #1

SSH Instruction for Windows

Page 22: Thailand Hadoop Big Data Challenge #1

Connect to the master node

Page 23: Thailand Hadoop Big Data Challenge #1

Launch the Hue Web Interface

Set Up an SSH Tunnel to the Master Node

– See instructions at

– http://docs.aws.amazon.com/ElasticMapReduce/latest/DeveloperGuide/emr-ssh-tunnel.html

Configure Proxy Settings to View Websites

– See instructions at

– http://docs.aws.amazon.com/ElasticMapReduce/latest/DeveloperGuide/emr-connect-master-node-proxy.html

Page 24: Thailand Hadoop Big Data Challenge #1

Launch the Hue Web Interface (Cont.)

http://master-public-dns-name:8888/

Page 25: Thailand Hadoop Big Data Challenge #1

Page 26: Thailand Hadoop Big Data Challenge #1

Web Interface Host on EMR Cluster

Page 27: Thailand Hadoop Big Data Challenge #1

Running Hive Demo

Page 28: Thailand Hadoop Big Data Challenge #1

MovieLens Data

http://grouplens.org/datasets/movielens/

MovieLens 10M

(http://files.grouplens.org/datasets/movielens/ml-10m.zip)

– ratings.dat

– tags.dat

– movies.dat

Page 29: Thailand Hadoop Big Data Challenge #1

Transfer Data to Hadoop Cluster

wget http://files.grouplens.org/datasets/movielens/ml-10m.zip
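
For readers who prefer a script to `wget` plus a manual unzip, the following is a small sketch using only the Python standard library. The function names and the target directory are assumptions for illustration; only the URL comes from the slide.

```python
import urllib.request
import zipfile
from pathlib import Path

DATA_URL = "http://files.grouplens.org/datasets/movielens/ml-10m.zip"


def download(url, dest):
    """Fetch url to dest unless the file is already present."""
    dest = Path(dest)
    if not dest.exists():
        urllib.request.urlretrieve(url, dest)
    return dest


def extract(archive, target_dir):
    """Unpack a zip archive and return the sorted member names."""
    with zipfile.ZipFile(archive) as zf:
        zf.extractall(target_dir)
        return sorted(zf.namelist())


# Typical use on the master node (not run here, since it needs network access):
#   archive = download(DATA_URL, "ml-10m.zip")
#   print(extract(archive, "."))
```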

Page 30: Thailand Hadoop Big Data Challenge #1

Change data format
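
The screenshot for this step is not reproduced in the transcript, so the exact conversion used in the workshop is unknown. The MovieLens 10M files separate fields with `::`, while Hive's default delimited text format takes a single-character field delimiter, so one plausible approach is sketched below. The function names and the tab default are assumptions; a comma also works for ratings.dat, but movie titles can contain commas, which is why the sketch refuses a delimiter that already appears in a field.

```python
def convert_line(line, delimiter="\t"):
    """Rewrite one '::'-separated record using a single-character delimiter."""
    fields = line.rstrip("\n").split("::")
    for field in fields:
        if delimiter in field:
            raise ValueError(
                f"delimiter {delimiter!r} already occurs in field {field!r}"
            )
    return delimiter.join(fields)


def convert_file(src_path, dst_path, delimiter="\t"):
    """Convert a whole .dat file line by line."""
    with open(src_path, encoding="utf-8") as src, \
            open(dst_path, "w", encoding="utf-8") as dst:
        for line in src:
            dst.write(convert_line(line, delimiter) + "\n")


# A ratings.dat record has the form UserID::MovieID::Rating::Timestamp.
converted = convert_line("1::122::5::838985046\n", ",")
```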

Page 31: Thailand Hadoop Big Data Challenge #1

Upload Data to Amazon S3

hadoop fs -put movies.csv s3://imcinstitute/data

Page 32: Thailand Hadoop Big Data Challenge #1

Running Hive from CLI
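
The slide's screenshot is missing from the transcript, so the statements actually run are not known. As a hedged sketch, the helper below renders a HiveQL `CREATE EXTERNAL TABLE` statement over the S3 location used earlier. The table name, column names, and tab delimiter are assumptions for illustration; the generated text can be passed to the Hive CLI with `hive -e`, or saved to a file and run with `hive -f`.

```python
def external_table_ddl(table, columns, location, delimiter="\\t"):
    """Render HiveQL for an external table over delimited text files.

    columns is a list of (name, hive_type) pairs; delimiter is written
    into the statement as given, so the default produces '\\t' for Hive
    to interpret as a tab.
    """
    cols = ",\n  ".join(f"{name} {hive_type}" for name, hive_type in columns)
    return (
        f"CREATE EXTERNAL TABLE IF NOT EXISTS {table} (\n  {cols}\n)\n"
        f"ROW FORMAT DELIMITED FIELDS TERMINATED BY '{delimiter}'\n"
        "STORED AS TEXTFILE\n"
        f"LOCATION '{location}';"
    )


# Hypothetical schema for the converted movies file.
ddl = external_table_ddl(
    "movies",
    [("movieid", "INT"), ("title", "STRING"), ("genres", "STRING")],
    "s3://imcinstitute/data/",
)
print(ddl)
```

An external table leaves the files in S3 untouched when the table is dropped, which suits a shared bucket.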

Page 33: Thailand Hadoop Big Data Challenge #1

Running Hive from Hue

Page 34: Thailand Hadoop Big Data Challenge #1

Running Example

https://github.com/myui/hivemall/wiki/MovieLens-Dataset

Page 35: Thailand Hadoop Big Data Challenge #1

Data Challenge

Page 36: Thailand Hadoop Big Data Challenge #1

Flight Details Data

http://stat-computing.org/dataexpo/2009/the-data.html
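
To give a feel for the kind of question this data set supports, here is a small sketch that averages arrival delay per carrier. It assumes the CSV header names `UniqueCarrier` and `ArrDelay` and the `NA` marker for missing values, as in the Data Expo 2009 column description; the four sample rows are made up for illustration and are not taken from the real files.

```python
import csv
import io
from collections import defaultdict


def mean_arrival_delay(rows):
    """Average ArrDelay (minutes) per UniqueCarrier, skipping 'NA' values."""
    totals = defaultdict(lambda: [0, 0])  # carrier -> [sum of delays, count]
    for row in rows:
        delay = row["ArrDelay"]
        if delay == "NA":
            continue
        entry = totals[row["UniqueCarrier"]]
        entry[0] += int(delay)
        entry[1] += 1
    return {carrier: s / n for carrier, (s, n) in totals.items()}


# Made-up sample with only the two columns the function reads.
SAMPLE = """UniqueCarrier,ArrDelay
WN,-14
WN,2
XE,NA
XE,10
"""

result = mean_arrival_delay(csv.DictReader(io.StringIO(SAMPLE)))
```

On the cluster the same aggregation would normally be expressed in Hive or Pig; the Python version is only meant for checking logic on a small extract.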

Page 37: Thailand Hadoop Big Data Challenge #1

Data Description

Page 38: Thailand Hadoop Big Data Challenge #1

Snapshot of Dataset

Page 39: Thailand Hadoop Big Data Challenge #1

Register for the challenge

Page 40: Thailand Hadoop Big Data Challenge #1

Registration

Provide your name, organization, mobile number, and e-mail address

On-site registration at 17.00, 13 March

E-mail: [email protected]

Facebook message to Thanachart Numnonda

Your username, password, key, and public DNS will be sent to your e-mail by 06.00, 14 March

Page 41: Thailand Hadoop Big Data Challenge #1

On-line communication

Facebook Group: Hadoop-Thailand

Line group

Facebook message

E-mail to [email protected]

Page 42: Thailand Hadoop Big Data Challenge #1

www.facebook.com/imcinstitute

Page 43: Thailand Hadoop Big Data Challenge #1

Thank you

[email protected]/imcinstitute
www.slideshare.net/imcinstitute
www.thanachart.org