Bare metal Hadoop provisioning

Aug 23, 2014

GoDataDriven

Creating a Hadoop cluster with cobbler and ansible. Easy and fully automated.
Transcript
Page 1: Bare metal Hadoop provisioning

GoDataDriven PROUDLY PART OF THE XEBIA GROUP

@krisgeus [email protected]

Bare metal Hadoop provisioning

Kris Geusebroek, Big Data Hacker

With ansible and cobbler

Page 2: Bare metal Hadoop provisioning

-- Big Data Borat

“Give man Hadoop cluster he gain insight for a day. Teach man build Hadoop cluster he soon leave for better job. #bigdata”

Page 3: Bare metal Hadoop provisioning

-- Kris Geusebroek

“We’re hiring”

Page 4: Bare metal Hadoop provisioning

Don’t want to...
- Manually install everything needed for a Hadoop cluster

Page 5: Bare metal Hadoop provisioning

Separate layers...
- Hardware
- OS
- Basic install and configuration (firewalls, IPsec, IPv6, NTPd, raise ulimits, disk formatting and mounting)
- Cluster install (Cloudera Manager or Hortonworks Data Platform)
- Extra stuff (monitoring with Ganglia, R and R packages, ...)

Page 6: Bare metal Hadoop provisioning

Want...
- Horizontal scaling: the effort for an extra machine is minimal
- Commodity (industry-standard) hardware
  - So we must cope with errors, malfunctions and re-installation
- Multiple clusters
- Experiment first to find the appropriate configuration for a specific goal
  - Think memory, hard disks, number of nodes

Page 7: Bare metal Hadoop provisioning

Want...
- Automate all the tasks for every layer
- Parameterise a lot
- Simple configuration of the separate layers
- Definition of roles (master node, data node, etc.)

Page 8: Bare metal Hadoop provisioning

Possible with...
- Vendor-specific tools: the problem here is that they can only do a subset of all tasks

Page 9: Bare metal Hadoop provisioning

What we have done here...
Nothing new, just another possibility.

Nothing tool-specific: the demo installs Cloudera Manager, but the setup also works with Hortonworks Data Platform.

Most important is:

Page 10: Bare metal Hadoop provisioning

Stack...

Page 11: Bare metal Hadoop provisioning

-- Big Data Borat

“Essentially, this solution is CoSSaaS.”

Page 12: Bare metal Hadoop provisioning

-- Big Data Borat

“Essentially, this solution is CoSSaaS. (Couple of Shell Scripts as a Service)”

Page 13: Bare metal Hadoop provisioning

Cobbler...

Cobbler used for:
- CMS
- DHCP server
- OS image hosting
- OS kickstart

cobblerd.org

Page 14: Bare metal Hadoop provisioning

Ansible...

Ansible used for:
- Tying it all together
- Initial setup of the network configuration
- One-time push of the SSH key
- Full software install

ansible.cc
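The one-time SSH key push could look roughly like the sketch below. It swaps in plain `ssh-copy-id` rather than whatever Ansible task the talk's repository uses, and it assumes that root can still log in with a password right after the kickstart and that the public key lives in `~/.ssh/id_rsa.pub`. The names `push_key`, `DRY_RUN` and `SSH_PUBKEY` are hypothetical; by default the script only prints the commands it would run.

```shell
#!/bin/sh
# Sketch only: one-time push of an SSH public key to freshly installed nodes.
# Assumptions (not from the talk): root can still log in with a password
# right after the kickstart, and the key lives in ~/.ssh/id_rsa.pub.
# push_key, DRY_RUN and SSH_PUBKEY are hypothetical names.
set -eu

# Print the command when DRY_RUN=1 (the default), otherwise execute it.
run() {
  if [ "${DRY_RUN:-1}" = "1" ]; then
    echo "$*"
  else
    "$@"
  fi
}

# push_key HOST...: copy the public key to root@HOST for every host given.
push_key() {
  key="${SSH_PUBKEY:-$HOME/.ssh/id_rsa.pub}"
  for host in "$@"; do
    run ssh-copy-id -i "$key" "root@$host"
  done
}

# Only act when hosts are passed on the command line.
if [ $# -gt 0 ]; then
  push_key "$@"
fi
```

Running `DRY_RUN=0 ./push_key.sh node01 node02 node03` (hypothetical file name) would copy the key for real; from then on Ansible can reach every node without a password.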

Page 15: Bare metal Hadoop provisioning

Cloudera Manager...

Cloudera Manager used for:
- Cluster software install
- Currently manual labour; this can be automated using the Cloudera Manager API

cloudera.com

Page 16: Bare metal Hadoop provisioning

Show me the code...

Add node information to the Cobbler CMS.

First make the install DVD known to Cobbler (the import names the resulting profile CentOS64-x86_64, which is used in the next step):

    sudo mkdir -p /mnt/dvd
    sudo mount -t iso9660 -o loop /<directoryname>/CentOS-6.4-x86_64-bin-DVD1.iso /mnt/dvd
    sudo cobbler import --path=/mnt/dvd --name=CentOS64

Next make the node information known (replace the MAC placeholder with the node's real MAC address):

    sudo cobbler system add --name=node01 --profile=CentOS64-x86_64 --hostname=node01 --mac=<00:00:00:00:00:00> --ip-address=10.20.0.101 --static=True

If needed, re-enable the netboot flag:

    sudo cobbler system edit --name=node01 --netboot-enabled=True
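Registering nodes one command at a time does not scale well. A minimal sketch, assuming a hypothetical plain-text list with one `<name> <mac> <ip>` line per node, loops over that list with the same `cobbler system add` options as above and finishes with `cobbler sync`, which makes Cobbler regenerate the files it manages (such as DHCP and PXE configuration). `register_nodes`, `DRY_RUN` and `PROFILE` are illustrative names, not part of Cobbler.

```shell
#!/bin/sh
# Sketch only: register many nodes with Cobbler from one plain-text list,
# so an extra machine costs one extra line. Assumed (hypothetical) format,
# one node per line, '#' starts a comment:
#   <name> <mac> <ip>
# register_nodes, DRY_RUN and PROFILE are hypothetical names.
set -eu

# Default to the profile created by importing the CentOS64 DVD.
PROFILE="${PROFILE:-CentOS64-x86_64}"

# Print the command when DRY_RUN=1 (the default), otherwise execute it.
run() {
  if [ "${DRY_RUN:-1}" = "1" ]; then
    echo "$*"
  else
    "$@"
  fi
}

# register_nodes FILE: one "cobbler system add" per node, then one sync.
register_nodes() {
  while read -r name mac ip; do
    # Skip blank lines and comment lines.
    case "$name" in ''|'#'*) continue ;; esac
    run sudo cobbler system add --name="$name" --profile="$PROFILE" \
      --hostname="$name" --mac="$mac" --ip-address="$ip" --static=True
  done < "$1"
  # Let Cobbler regenerate the configuration it manages (DHCP, PXE, ...).
  run sudo cobbler sync
}

if [ $# -gt 0 ]; then
  register_nodes "$1"
fi
```

With a `nodes.txt` containing lines such as `node01 00:11:22:33:44:01 10.20.0.101`, running `DRY_RUN=0 ./register_nodes.sh nodes.txt` would register the nodes. The default dry run only prints the commands, which is a cheap way to review them first.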

Page 17: Bare metal Hadoop provisioning

Show me the code...

Ansible needs to know what goes where:

    [cluster]
    node01
    node02
    node03

    [cobbler]
    cobbler

    [proxy]
    cobbler

    [ganglia-master]
    node01

    [ganglia-nodes:children]
    cluster

    [cloudera-manager]
    node01
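Because the inventory is plain INI text, it can also be generated. The sketch below is an illustration under stated assumptions: nodes are named `node01`..`nodeNN`, and `node01` keeps the Ganglia master and Cloudera Manager roles as on the slide. `gen_inventory` is a hypothetical helper, not part of Ansible.

```shell
#!/bin/sh
# Sketch only: generate the inventory for any number of nodes, so growing
# the cluster means changing one number. Assumption: nodes are named
# node01..nodeNN and node01 keeps the Ganglia master and Cloudera Manager
# roles. gen_inventory is a hypothetical name.
set -eu

# gen_inventory N: print the inventory for nodes node01..nodeN.
gen_inventory() {
  echo "[cluster]"
  i=1
  while [ "$i" -le "$1" ]; do
    printf 'node%02d\n' "$i"
    i=$((i + 1))
  done
  # The remaining groups do not depend on the cluster size.
  printf '\n[cobbler]\ncobbler\n'
  printf '\n[proxy]\ncobbler\n'
  printf '\n[ganglia-master]\nnode01\n'
  printf '\n[ganglia-nodes:children]\ncluster\n'
  printf '\n[cloudera-manager]\nnode01\n'
}

if [ $# -gt 0 ]; then
  gen_inventory "$1"
fi
```

`./gen_inventory.sh 3 > hosts` reproduces the inventory above; a larger number adds nodes to the `[cluster]` group, and through `ganglia-nodes:children` to the Ganglia nodes as well.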

Page 18: Bare metal Hadoop provisioning

Show me the code...

For the rest it’s just a DSL thingy with extras:

    # Ansible 1.x syntax as used in 2014; newer releases use "become: yes"
    # instead of "sudo: yes" and "include_tasks" instead of "include".
    - hosts:
        - cloudera-manager
        - cluster
      user: root
      sudo: yes
      vars_files:
        - vars/common.yml
      tasks:
        - include: cloudera-manager/tasks/common.yml
      handlers:
        - include: cloudera-manager/handlers/main.yml

A few of the tasks:

    - name: Configure CM4 Repo
      copy: src=cloudera-manager/files/etc/yum.repos.d/cm4.repo dest=/etc/yum.repos.d/ owner=root group=root

    - name: Install CM4 common stuff
      yum: name={{ item }} state=installed
      # with_items: <package list not shown on the slide>
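Each layer from the earlier slides can live in its own playbook and be applied in order. The sketch below assumes hypothetical playbook names (`base.yml`, `cloudera-manager.yml`, `ganglia.yml`) and an inventory file called `hosts`; only `ansible-playbook` with its `-i` and `--limit` options is real. `provision`, `DRY_RUN`, `INVENTORY` and `LAYERS` are illustrative names.

```shell
#!/bin/sh
# Sketch only: apply the layers one after another with ansible-playbook.
# The playbook file names are hypothetical placeholders.
# provision, DRY_RUN, INVENTORY and LAYERS are hypothetical names.
set -eu

INVENTORY="${INVENTORY:-hosts}"
# Order matters: basic OS configuration before the cluster software.
LAYERS="${LAYERS:-base.yml cloudera-manager.yml ganglia.yml}"

# Print the command when DRY_RUN=1 (the default), otherwise execute it.
run() {
  if [ "${DRY_RUN:-1}" = "1" ]; then
    echo "$*"
  else
    "$@"
  fi
}

# provision [HOST_OR_GROUP]: run every layer, optionally limited to one
# host or group (for example a freshly added node). With set -e, the first
# failing playbook stops the run when DRY_RUN=0.
provision() {
  for playbook in $LAYERS; do   # unquoted on purpose: split on spaces
    if [ $# -gt 0 ]; then
      run ansible-playbook -i "$INVENTORY" "$playbook" --limit "$1"
    else
      run ansible-playbook -i "$INVENTORY" "$playbook"
    fi
  done
}

provision "$@"
```

`DRY_RUN=0 ./provision.sh node04` would take a freshly kickstarted node through all layers without touching the rest of the cluster; without an argument, every layer runs against the whole inventory.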

Page 19: Bare metal Hadoop provisioning

Demo...

Page 20: Bare metal Hadoop provisioning

Shared problems...
- No magic: vendor-specific hardware can screw things up (for example, strange names for disk mounts)
- BIOS settings and different RAID settings are not handled (yet)
- A large amount of initial network traffic with large clusters (each of the N nodes downloads the same software packages from the yum repositories) => repo mirroring to the rescue
- The MAC addresses of all nodes must be known
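Since every MAC address must be known in advance, a typo in one address most likely means Cobbler's DHCP server never recognises that node, so it never gets its kickstart. A small check before registration catches malformed and duplicate addresses. This is a sketch assuming a hypothetical node list with one `<name> <mac> <ip>` line per node; `check_nodes` is an illustrative name.

```shell
#!/bin/sh
# Sketch only: sanity-check a node list before handing it to Cobbler.
# Assumed (hypothetical) format, one node per line, '#' starts a comment:
#   <name> <mac> <ip>
# check_nodes is a hypothetical name.
set -eu

# check_nodes FILE: report malformed or duplicate MAC addresses on stderr;
# return 1 if any were found, 0 otherwise.
check_nodes() {
  hex='[0-9A-Fa-f][0-9A-Fa-f]'
  seen=''
  status=0
  lineno=0
  while read -r name mac rest; do
    lineno=$((lineno + 1))
    # Skip blank lines and comment lines.
    case "$name" in ''|'#'*) continue ;; esac
    # Expect six colon-separated pairs of hex digits.
    if ! printf '%s\n' "$mac" | grep -Eq "^$hex(:$hex){5}\$"; then
      echo "line $lineno: bad or missing MAC for $name" >&2
      status=1
      continue
    fi
    # Compare case-insensitively: 00:AA:... and 00:aa:... are the same NIC.
    lower=$(printf '%s' "$mac" | tr 'A-F' 'a-f')
    case " $seen " in
      *" $lower "*)
        echo "line $lineno: duplicate MAC $lower" >&2
        status=1
        ;;
      *) seen="$seen $lower" ;;
    esac
  done < "$1"
  return "$status"
}

if [ $# -gt 0 ]; then
  check_nodes "$1"
fi
```

Running `./check_nodes.sh nodes.txt` before registering the nodes exits with status 1 and lists the offending lines if anything is wrong.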

Page 21: Bare metal Hadoop provisioning

Takeaways...
- Do automate from the start
- It’s easy
- Use (our) open source code to get a head start: https://github.com/godatadriven/ansible_cluster
- Our team will do the additional consultancy

Page 22: Bare metal Hadoop provisioning

We’re hiring / Questions? / Thank you!

@krisgeus [email protected]

Kris Geusebroek, Big Data Hacker
