Marc Cluet – Lynx Consultants What’s behind Big Data
Page 1: Introduction to hadoop

Marc Cluet – Lynx Consultants
What’s behind Big Data

Page 2: Introduction to hadoop

What we’ll cover

- Understand Hadoop components
- Understand the different technologies involved
- Embrace Big Data!

Lynx  Consultants  ©  2013  

Page 3: Introduction to hadoop

What is Big Data?

Page 4: Introduction to hadoop

What is Big Data?

- SQL has a limited ability to process changing data
  - The SQL schema is the truth; data needs to fit it

Page 9: Introduction to hadoop

What is Big Data?

- Big Data is the solution!
  - Data can be truly dynamic
  - Designed to handle Terabytes of data
  - Designed for fault tolerance and securing data
  - Designed around exploiting hardware to the fullest
  - Designed around Map/Reduce

Page 10: Introduction to hadoop

Who runs Big Data?

¡  A  few  small  companies  

Page 22: Introduction to hadoop

What is Hadoop?

Page 25: Introduction to hadoop

What is Hadoop?

- Hadoop is one of the big players for Big Data
  - Developed as an Open Source implementation of the ideas in Google's MapReduce and Google File System papers (HBase is the component modelled on Google BigTable)
  - Mainly developed at Yahoo!
  - Current companies behind it: Hortonworks and Cloudera

Page 26: Introduction to hadoop

What are the features of Hadoop?

- HDFS – Hadoop Distributed File System
  - HDFS is a filesystem distributed across many nodes
  - Keeps several copies of your data (default replication factor: 3)
  - If a node goes down, the blocks it held are re-replicated from the surviving copies
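
The replication idea can be sketched in a few lines of Python. This is a toy simulation, not HDFS code: the node names, block ids, and the `place_blocks` and `handle_node_failure` helpers are all made up for illustration, and real HDFS placement also takes racks and free space into account.

```python
import random

REPLICATION = 3  # default replication factor mentioned on the slide


def place_blocks(blocks, nodes, replication=REPLICATION, seed=0):
    """Assign each block to `replication` distinct nodes (toy placement, no rack awareness)."""
    rng = random.Random(seed)
    return {block: set(rng.sample(sorted(nodes), replication)) for block in blocks}


def handle_node_failure(placement, failed, nodes, replication=REPLICATION, seed=0):
    """Forget the failed node, then copy under-replicated blocks to other live nodes."""
    rng = random.Random(seed)
    live = set(nodes) - {failed}
    for block, holders in placement.items():
        holders.discard(failed)
        candidates = sorted(live - holders)
        missing = replication - len(holders)
        holders.update(rng.sample(candidates, min(missing, len(candidates))))
    return placement


# Hypothetical five-node cluster holding three blocks.
nodes = {"dn1", "dn2", "dn3", "dn4", "dn5"}
placement = place_blocks(["blk_1", "blk_2", "blk_3"], nodes)
placement = handle_node_failure(placement, "dn2", nodes)
for block, holders in sorted(placement.items()):
    print(block, sorted(holders))
```

After the simulated failure every block is back on three live nodes, which is the behaviour the slide describes.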

Page 28: Introduction to hadoop

What are the features of Hadoop?

- HDFS – Hadoop Distributed File System
- HBase – Hadoop NoSQL Database
  - Schemaless Key-Value storage
  - All data exportable in JSON
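
To make "schemaless key-value" concrete, here is a minimal Python sketch. It does not use the HBase client API; the `table`, `put` and `get` names and the `family:qualifier` column strings are assumptions chosen only to show that each row can carry different columns and that the result serialises to JSON.

```python
import json

# Toy in-memory "table": row key -> {"family:qualifier": value}.
# No schema is declared up front; each row holds whatever columns it was given.
table = {}


def put(row_key, column, value):
    """Store one cell under a row key, creating the row if needed."""
    table.setdefault(row_key, {})[column] = value


def get(row_key):
    """Return all cells of a row, or an empty dict if the row does not exist."""
    return table.get(row_key, {})


put("user#1001", "info:name", "Ada")
put("user#1001", "info:email", "ada@example.com")
put("user#1002", "info:name", "Grace")
put("user#1002", "stats:logins", 42)  # a column the first row never had

print(json.dumps(table, indent=2, sort_keys=True))
```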

Page 30: Introduction to hadoop

What are the features of Hadoop?

- HDFS – Hadoop Distributed File System
- HBase – Hadoop NoSQL Database
- Map/Reduce – The key to it all
  - The model was introduced by Google
  - Given a dataset, we Map every record to key/value pairs, keeping those that match our criteria
  - Then we Reduce the values for each key to a result
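
The Map and Reduce steps can be sketched in plain Python. This is a single-process illustration of the idea, not Hadoop's Java API; the log format, `error_mapper` and `count_reducer` are invented for the example. The mapper keeps only requests that failed with a 5xx status, and the reducer counts them per path.

```python
from collections import defaultdict


def map_phase(records, mapper):
    """Run the mapper over every record and collect the (key, value) pairs it emits."""
    for record in records:
        yield from mapper(record)


def shuffle(pairs):
    """Group values by key, as the framework does between Map and Reduce."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(groups, reducer):
    """Collapse each key's list of values into a single result."""
    return {key: reducer(key, values) for key, values in groups.items()}


def error_mapper(line):
    """Emit (path, 1) only for lines that match the criteria: a 5xx status."""
    method, path, status = line.split()
    if status.startswith("5"):
        yield path, 1


def count_reducer(key, values):
    return sum(values)


log_lines = [
    "GET /index 200",
    "GET /cart 500",
    "POST /cart 503",
    "GET /search 200",
    "GET /search 500",
]
result = reduce_phase(shuffle(map_phase(log_lines, error_mapper)), count_reducer)
print(result)  # {'/cart': 2, '/search': 1}
```

On a real cluster the map calls run on the nodes that hold the data and the shuffle moves pairs across the network; the shape of the computation is the same.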

Page 32: Introduction to hadoop

What are the features of Hadoop?

- HDFS – Hadoop Distributed File System
- HBase – Hadoop NoSQL Database
- Map/Reduce – The key to it all
- Hive – SQL for NoSQL
  - Hive provides a SQL-like language called HiveQL
  - Provides a good entrance for SQL users :)

Page 33: Introduction to hadoop

What are the features of Hadoop?

- HDFS – Hadoop Distributed File System
- HBase – Hadoop NoSQL Database
- Map/Reduce – The key to it all
- Hive – SQL for NoSQL
- Pig – Map/Reduce made easy
  - Produces results from scripts written in a small data-flow language (Pig Latin)
  - Reinvents SQL, in a way

Page 36: Introduction to hadoop

What are the features of Hadoop?

- HDFS – Hadoop Distributed File System
- HBase – Hadoop NoSQL Database
- Map/Reduce – The key to it all
- Hive – SQL for NoSQL
- Pig – Map/Reduce made easy
- Flume – Fault Tolerant transport

Page 37: Introduction to hadoop

What are the features of Hadoop?

- Flume
  - Divided into Sources, Channels and Sinks
  - You can run several of each, which makes it fault tolerant
  - Many sources!
    - Avro, Exec, JMS, Syslog, HTTP, NetCat, Your Own (Java)
  - Many channels!
    - Memory, File, Your Own (Java)
  - Many sinks!
    - Avro, HDFS, Logger, IRC, File, HBase, ElasticSearch, S3, Community sinks, Your Own (Java)
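
The source, channel and sink split can be sketched conceptually in Python. This is not Flume code or Flume configuration: `MemoryChannel`, `run_source` and `drain` are hypothetical names, and the two lists merely stand in for an HDFS-style sink and a logger-style sink.

```python
from collections import deque


class MemoryChannel:
    """Buffers events between a source and a sink (toy stand-in for a memory channel)."""

    def __init__(self, capacity=100):
        self.events = deque()
        self.capacity = capacity

    def put(self, event):
        if len(self.events) >= self.capacity:
            raise OverflowError("channel full")
        self.events.append(event)

    def take(self):
        """Return the oldest buffered event, or None when the channel is empty."""
        return self.events.popleft() if self.events else None


def run_source(lines, channels):
    """Source: wrap each input line in an event and copy it to every channel."""
    for line in lines:
        for channel in channels:
            channel.put({"body": line})


def drain(channel, sink):
    """Sink runner: take events off the channel and hand them to the sink."""
    delivered = 0
    while (event := channel.take()) is not None:
        sink(event)
        delivered += 1
    return delivered


hdfs_like, logger_like = [], []  # stand-ins for two different sinks
ch1, ch2 = MemoryChannel(), MemoryChannel()
run_source(["login ok", "login failed"], [ch1, ch2])
drain(ch1, hdfs_like.append)
drain(ch2, lambda event: logger_like.append(event["body"].upper()))
print(hdfs_like, logger_like)
```

Because each sink reads from its own channel, a slow or failed sink does not block the other one, which is the point of having "multiple of everything".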

Page 41: Introduction to hadoop

What Hadoop looks like in a DC

- Components
  - Primary Namenode
    - Controls the whole cluster, knows where the data resides
    - Often also runs the job tracker, which keeps track of Map/Reduce jobs
    - Biggest point of failure, shadowing it is a potential option
  - Secondary Namenode
    - Performs housekeeping: periodically merges the Namenode's edit log into its filesystem image (it is not a hot standby)
  - Data Node
    - Stores the actual data blocks
    - Runs the Map/Reduce tasks

Page 46: Introduction to hadoop

Questions?
