Hola Hadoop

Hola Hadoop. 0. Clean-Up The Hard-disks Delete tmp/ folder from workspace/mdp-lab3 Delete unneeded downloads.

Dec 24, 2015

0. Clean-Up The Hard-disks

• Delete the tmp/ folder from workspace/mdp-lab3
• Delete unneeded downloads

0. Danger!

Please please please please please please please please please please please please … please please please be careful of what you are doing!

• Think twice before using:
– rm
– mv
– cp
– kill
– emacs/vim/… on configuration files

• Cluster: cluster.dcc.uchile.cl

1. Download tools

• http://aidanhogan.com/teaching/cc5212-1/tools/

• Unzip them somewhere you can find them

2. PuTTY: Log in

3. Open DFS Browser

http://cluster.dcc.uchile.cl:50070/

3. PuTTY: Upload data to HDFS

• hadoop fs -ls /
• hadoop fs -ls /uhadoop
• hadoop fs -mkdir /uhadoop/[username]
– [username] = first letter of your first name followed by your last name (e.g., “ahogan”)
• cd /data/hadoop/hadoop/data/
• hadoop fs -copyFromLocal /data/hadoop/hadoop/data/es-abstracts.txt /uhadoop/[username]/es-abstracts.txt
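The [username] convention and the two commands that use it can be sketched as a tiny Java helper. This is purely illustrative: the class and method names are hypothetical, and it assumes a single last name with no spaces (lab members with two surnames should use whatever the instructor specifies). It only builds and prints the command strings to paste into PuTTY; it does not talk to HDFS.

```java
import java.util.Locale;

/** Hypothetical helper illustrating the [username] convention used in the upload commands. */
public class HdfsUserPaths {

    /** First letter of the first name followed by the last name, lower-cased (e.g. "ahogan"). */
    static String username(String firstName, String lastName) {
        return (firstName.substring(0, 1) + lastName).toLowerCase(Locale.ROOT);
    }

    /** The user's HDFS directory, created with `hadoop fs -mkdir`. */
    static String hdfsUserDir(String username) {
        return "/uhadoop/" + username;
    }

    public static void main(String[] args) {
        String user = username("Aidan", "Hogan");
        String local = "/data/hadoop/hadoop/data/es-abstracts.txt";
        // Print the commands; nothing is executed here.
        System.out.println("hadoop fs -mkdir " + hdfsUserDir(user));
        System.out.println("hadoop fs -copyFromLocal " + local + " "
                + hdfsUserDir(user) + "/es-abstracts.txt");
    }
}
```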

Note on namespace

• If you need to disambiguate local and remote files, prefix the path with its scheme:
• HDFS file
– hdfs://cm:9000/uhadoop/…
• Local file
– file:///data/hadoop/...
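The difference between the two prefixes is just the URI scheme, which the sketch below reads with the standard java.net.URI class (not Hadoop's own Path API). The full paths are example values assembled from the upload step, with “ahogan” as the username; a path with no scheme, such as /uhadoop/ahogan, is resolved by `hadoop fs` against the cluster's default file system rather than being identified here.

```java
import java.net.URI;

/** Sketch: tell an HDFS path from a local path by its URI scheme (plain java.net.URI). */
public class NamespaceCheck {

    /** True if the URI explicitly names HDFS, e.g. hdfs://cm:9000/uhadoop/... */
    static boolean isHdfs(String uri) {
        return "hdfs".equals(URI.create(uri).getScheme());
    }

    /** True if the URI explicitly names the local file system, e.g. file:///data/hadoop/... */
    static boolean isLocal(String uri) {
        return "file".equals(URI.create(uri).getScheme());
    }

    public static void main(String[] args) {
        URI hdfs = URI.create("hdfs://cm:9000/uhadoop/ahogan/es-abstracts.txt");
        URI local = URI.create("file:///data/hadoop/hadoop/data/es-abstracts.txt");
        // For hdfs:// URIs, host and port name the HDFS NameNode; the path is the HDFS path.
        System.out.println(hdfs.getScheme() + " " + hdfs.getHost() + " "
                + hdfs.getPort() + " " + hdfs.getPath());
        // file:/// has an empty authority, so only the local path remains.
        System.out.println(local.getScheme() + " " + local.getPath());
    }
}
```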

4. Let’s Build Our First MapReduce Job

• Hint: Use Monday’s slides for “inspiration”
– http://aidanhogan.com/teaching/cc5212-1/

1. Implement map(.,.,.,.) method

2. Implement reduce(.,.,.,.) method

3. Implement main(.) method
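Before building the jar, the logic of the three methods can be checked locally with an in-memory imitation of WordCount. This is a plain-Java sketch, not Hadoop's Mapper/Reducer API: the class and method names are hypothetical, it assumes Java 9+ (for Map.entry and List.of), and the tokenization (lower-casing, then splitting on anything that is not a letter or digit) is an assumption, so its counts may differ from the lab's reference implementation.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.TreeMap;

/**
 * In-memory sketch of WordCount (no Hadoop dependency).
 * map: line -> (word, 1) pairs; group by word; reduce: sum the 1s.
 */
public class LocalWordCount {

    /** Map phase: emit (word, 1) for every token in the line. */
    static List<Map.Entry<String, Integer>> map(String line) {
        List<Map.Entry<String, Integer>> out = new ArrayList<>();
        for (String token : line.toLowerCase(Locale.ROOT).split("[^\\p{L}\\p{N}]+")) {
            if (!token.isEmpty()) { // a leading delimiter produces an empty first token
                out.add(Map.entry(token, 1));
            }
        }
        return out;
    }

    /** Reduce phase: sum all counts emitted for one word. */
    static int reduce(String word, List<Integer> counts) {
        int sum = 0;
        for (int c : counts) {
            sum += c;
        }
        return sum;
    }

    /** Driver: map every line, group values by key (Hadoop's shuffle), then reduce each group. */
    static Map<String, Integer> run(List<String> lines) {
        Map<String, List<Integer>> grouped = new TreeMap<>(); // sorted keys, like Hadoop's sort phase
        for (String line : lines) {
            for (Map.Entry<String, Integer> kv : map(line)) {
                grouped.computeIfAbsent(kv.getKey(), k -> new ArrayList<>()).add(kv.getValue());
            }
        }
        Map<String, Integer> result = new TreeMap<>();
        for (Map.Entry<String, List<Integer>> g : grouped.entrySet()) {
            result.put(g.getKey(), reduce(g.getKey(), g.getValue()));
        }
        return result;
    }

    public static void main(String[] args) {
        List<String> lines = List.of("El perro de Juan", "La casa de la playa");
        // Print "word<TAB>count", one pair per line.
        run(lines).forEach((w, c) -> System.out.println(w + "\t" + c));
    }
}
```

In the real job, the grouping done by `run` is performed by the framework between the map and reduce phases, so the Hadoop versions of map and reduce only need the per-record and per-key logic shown here.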

5. Eclipse: Build jar

Right-click build.xml > dist

(You might need to create a dist folder first.)

6. WinSCP: Copy .jar to Master Server

Don’t save password!

• Create dir: /data/2014/uhadoop/[username]/
• Copy your mdp-lab4.jar into it

7. PuTTY: Run job

• hadoop jar /data/2014/uhadoop/[username]/mdp-lab4.jar WordCount /uhadoop/[username]/es-abstracts.txt /uhadoop/[username]/wc/

All one command!

8. Look at output

• hadoop fs -ls /uhadoop/[username]/wc/

• hadoop fs -cat /uhadoop/[username]/wc/part-00000 | more

• hadoop fs -cat /uhadoop/[username]/wc/part-00000 | grep -e "^de" | more

All one command!

Look for “de”: the local run found 4575144 occurrences.
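Note that `grep -e "^de"` matches every word that starts with “de” (for example “del” or “desde”), so to compare against the local-run figure you need the line for exactly “de”. The sketch below does both checks on the part file's lines, assuming each line is the word, a tab, then the count (the layout of Hadoop's default text output). The class and method names are hypothetical; compile it and pipe the file into it, e.g. `hadoop fs -cat /uhadoop/[username]/wc/part-00000 | java OutputCheck`.

```java
import java.io.BufferedReader;
import java.io.InputStreamReader;
import java.nio.charset.StandardCharsets;
import java.util.List;
import java.util.stream.Collectors;

/** Sketch: inspect WordCount output lines of the form "word<TAB>count". */
public class OutputCheck {

    /** Lines that start with the prefix, like grep -e "^prefix". */
    static List<String> startingWith(List<String> lines, String prefix) {
        return lines.stream().filter(l -> l.startsWith(prefix)).collect(Collectors.toList());
    }

    /** Count for exactly this word, or -1 if the word does not appear. */
    static long countOf(List<String> lines, String word) {
        for (String line : lines) {
            String[] parts = line.split("\t", 2);
            if (parts.length == 2 && parts[0].equals(word)) {
                return Long.parseLong(parts[1].trim());
            }
        }
        return -1;
    }

    public static void main(String[] args) {
        // Read the part file from standard input as UTF-8 (the abstracts are in Spanish).
        List<String> lines = new BufferedReader(new InputStreamReader(System.in, StandardCharsets.UTF_8))
                .lines().collect(Collectors.toList());
        startingWith(lines, "de").forEach(System.out::println);
        System.out.println("count(de) = " + countOf(lines, "de"));
    }
}
```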

9. Look at output through browser

http://cluster.dcc.uchile.cl:50070/