Hands on Hadoop
Daniel Templeton & Inyoung Cho
Cloudera, Inc.
Jun 10, 2015
©2014 Cloudera, Inc. All rights reserved.
Your Hosts
Daniel Templeton
• Certification Developer
• Crusty, old HPC guy
• Likes Perl
Inyoung Cho
• Certification Developer
• Recovering Java Evangelist
• Invented JavaOne Hands-on Labs
What is “Big Data”?
• Super-cool marketing buzzword
  • “Come see our new line of BIG DATA toasters…”
• “The Five V’s”
• Any data that is difficult to store in a traditional RDBMS
  • Too big, changes schema too often, unstructured, …
What is Hadoop?
HDFS in a Nutshell
• Distributed “file system” service
• Highly scalable and fault-resilient
• Chunks files into “blocks” that are replicated and distributed across the cluster
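The following minimal Python sketch illustrates the block idea. It is not HDFS code. It splits a file's bytes into fixed-size blocks and assigns each block to several distinct nodes. The tiny block size, the node names, and the round-robin placement are illustrative assumptions. Real HDFS uses far larger blocks (128 MB by default in Hadoop 2) and a default replication factor of 3, and it places replicas with a rack-aware policy.

```python
from typing import Dict, List


def split_into_blocks(data: bytes, block_size: int) -> List[bytes]:
    """Chunk a file's contents into fixed-size blocks; the last block may be shorter."""
    if block_size <= 0:
        raise ValueError("block_size must be positive")
    return [data[i:i + block_size] for i in range(0, len(data), block_size)]


def place_replicas(num_blocks: int, nodes: List[str], replication: int) -> Dict[int, List[str]]:
    """Assign each block to `replication` distinct nodes.

    Simplified round-robin placement (an assumption for illustration);
    HDFS itself uses a rack-aware placement policy.
    """
    if replication > len(nodes):
        raise ValueError("replication factor cannot exceed the number of nodes")
    placement: Dict[int, List[str]] = {}
    for block_id in range(num_blocks):
        # Start at a different node for each block so replicas spread across the cluster.
        placement[block_id] = [nodes[(block_id + r) % len(nodes)] for r in range(replication)]
    return placement


if __name__ == "__main__":
    blocks = split_into_blocks(b"x" * 10, block_size=4)  # a tiny 10-byte "file"
    print([len(b) for b in blocks])  # [4, 4, 2]
    nodes = ["node1", "node2", "node3", "node4"]
    for block_id, replicas in place_replicas(len(blocks), nodes, replication=3).items():
        print(block_id, replicas)
```

Because every block has copies on several nodes, losing one node leaves the other replicas readable. This is why the design tolerates individual machine failures.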
MapReduce in a Nutshell
• Embarrassingly parallel batch execution engine
• Two phases: map and reduce
• https://www.youtube.com/watch?v=bcjSe0xCHbE
• Tasks are scheduled to run where the data is
• Jobs are written against the Java API
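Word count is the classic way to show the two phases. The Python sketch below simulates map, shuffle/sort, and reduce in a single process. It is not the Hadoop Java API, and the function names are hypothetical. In a real job, the framework runs many map tasks in parallel on the nodes that hold the input blocks, then performs the shuffle across the network.

```python
from collections import defaultdict
from typing import Dict, Iterable, Iterator, List, Tuple


def map_phase(line: str) -> Iterator[Tuple[str, int]]:
    """Mapper: emit (word, 1) for every word in one input record."""
    for word in line.lower().split():
        yield word, 1


def shuffle(pairs: Iterable[Tuple[str, int]]) -> Dict[str, List[int]]:
    """Group all values by key, as the framework does between the two phases."""
    groups: Dict[str, List[int]] = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    # Reducers receive their keys in sorted order.
    return dict(sorted(groups.items()))


def reduce_phase(key: str, values: List[int]) -> Tuple[str, int]:
    """Reducer: sum the counts for a single word."""
    return key, sum(values)


def word_count(lines: Iterable[str]) -> Dict[str, int]:
    # Each line is mapped independently, which is what makes the map phase
    # embarrassingly parallel.
    mapped = (pair for line in lines for pair in map_phase(line))
    return dict(reduce_phase(k, vs) for k, vs in shuffle(mapped).items())


if __name__ == "__main__":
    print(word_count(["the quick fox", "the lazy dog", "The end"]))
    # {'dog': 1, 'end': 1, 'fox': 1, 'lazy': 1, 'quick': 1, 'the': 3}
```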
Hive in a Nutshell
• SQL engine for Hadoop
• Translates HiveQL into MapReduce jobs
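Consider a simple aggregate such as `SELECT dept, COUNT(*) FROM employees GROUP BY dept`. It fits naturally into one MapReduce job: the map phase emits the GROUP BY column as the key, and the reduce phase applies the aggregate. The Python sketch below simulates that plan in memory. The table, its columns, and the functions are all hypothetical. Hive's real planner can produce more elaborate plans, for example with map-side partial aggregation.

```python
from collections import defaultdict
from typing import Dict, Iterable, Iterator, List, Tuple

# Hypothetical rows of an "employees" table: (name, dept)
Row = Tuple[str, str]


def mapper(row: Row) -> Iterator[Tuple[str, int]]:
    """Map side of GROUP BY dept: key each row by the grouping column."""
    _name, dept = row
    yield dept, 1


def reducer(dept: str, ones: List[int]) -> Tuple[str, int]:
    """Reduce side of COUNT(*): count the rows that share a key."""
    return dept, sum(ones)


def run_group_by_count(rows: Iterable[Row]) -> Dict[str, int]:
    """Simulate: SELECT dept, COUNT(*) FROM employees GROUP BY dept"""
    groups: Dict[str, List[int]] = defaultdict(list)
    for row in rows:
        for key, value in mapper(row):
            groups[key].append(value)  # the shuffle: group values by key
    return dict(reducer(k, v) for k, v in sorted(groups.items()))


if __name__ == "__main__":
    employees = [("ann", "eng"), ("bob", "sales"), ("cy", "eng")]
    print(run_group_by_count(employees))  # {'eng': 2, 'sales': 1}
```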
Impala in a Nutshell
• Hive without the MapReduce: a native, massively parallel SQL engine that does not translate queries into MapReduce jobs
Pig in a Nutshell
• Script-like language (Pig Latin) for data operations
• Translates scripts into MapReduce jobs
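A Pig Latin script is a sequence of named steps such as LOAD, FILTER, GROUP, and FOREACH ... GENERATE, which Pig compiles into MapReduce jobs. The Python sketch below mimics that step-by-step dataflow on in-memory records so the shape of a script is visible. The data, field names, and helper functions are hypothetical. The Pig statements in the comments show roughly how the same pipeline might be written in Pig Latin.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

Record = Tuple[str, int]  # hypothetical (user, bytes) records


def load() -> List[Record]:
    # logs = LOAD 'logs' AS (user:chararray, bytes:int);
    return [("ann", 50), ("bob", 300), ("ann", 200), ("bob", 120), ("cy", 10)]


def pipeline(logs: List[Record]) -> Dict[str, int]:
    # big = FILTER logs BY bytes > 100;
    big = [r for r in logs if r[1] > 100]
    # by_user = GROUP big BY user;
    by_user: Dict[str, List[Record]] = defaultdict(list)
    for r in big:
        by_user[r[0]].append(r)
    # totals = FOREACH by_user GENERATE group, SUM(big.bytes);
    return {user: sum(b for _u, b in bag) for user, bag in sorted(by_user.items())}


if __name__ == "__main__":
    print(pipeline(load()))  # {'ann': 200, 'bob': 420}
```

Each named intermediate relation (`big`, `by_user`) corresponds to one step of the script. This step-by-step style is what makes Pig feel more like a scripting language than like SQL.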
The Lab
• Self-paced
• Should take about 2 hours
• “Additional Exercises” if you finish early
• Inyoung and I are here to answer questions
• Have fun!
Aaron Myers & Daniel Templeton