Bare metal Hadoop provisioning

Post on 23-Aug-2014

DESCRIPTION

Creating a Hadoop cluster with cobbler and ansible. Easy and fully automated.

Transcript

GoDataDriven
PROUDLY PART OF THE XEBIA GROUP

Bare metal Hadoop provisioning
With ansible and cobbler

Kris Geusebroek, Big Data Hacker
@krisgeus
krisgeusebroek@godatadriven.com

1

-- Big Data Borat

“Give man Hadoop cluster he gain insight for a day. Teach man build Hadoop cluster he soon leave for better job. #bigdata”

2

-- Kris Geusebroek

“We’re hiring”

3

Don’t want to...

Manually install everything needed for a Hadoop cluster...

4

Separate layers...

- Hardware
- OS
- Basic install and configuration (firewalls, IPsec, IPv6, NTPd, raising ulimits, disk formatting and mounting)
- Cluster install (Cloudera Manager or Hortonworks Data Platform)
- Extra stuff (monitoring with Ganglia, R and R packages, ...)

5

Want...

- Horizontal scaling: effort for an extra machine is minimal
- Commodity, industry-standard hardware
  - So cope with errors, malfunctioning, re-installation
- Multiple clusters
- Experiment first with the appropriate configuration for a specific goal
  - Think memory, hard disks, number of nodes

6

Want...

- Automate all the tasks for every layer
- Parameterise a lot
- Simple configuration of the separate layers
- Definition of roles (master node, data node, etc.)

7

Possible with...

Vendor-specific tools. The problem here is that they can do only a subset of all the tasks.

8

What we have done here...

Nothing new, just another possibility.

Nothing tool specific: the demo installs Cloudera Manager, but it also works with Hortonworks Data Platform.

Most important is:

9

Stack...

10

-- Big Data Borat

“Essentially, this solution is CoSSaaS. (Couple of Shell Scripts as a Service)”

12

Cobbler...

Cobbler used for:
- CMS
- DHCP server
- OS image hosting
- OS kickstart

cobblerd.org

13

Ansible...

Ansible used for:
- Tying it all together
- Initial setup of network config
- One-time push of SSH key
- Full software install

ansible.cc

14

Cloudera Manager...

Cloudera Manager used for:
- Installing the cluster software
- Currently manual labour, but this can be automated using the API

cloudera.com
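The slide notes that the Cloudera Manager step is still manual but could be driven through its API. A minimal sketch of what a first call might look like, assuming the REST API is reachable under /api/v1 on Cloudera Manager's default port 7180 and that curl is installed. The helper names cm_api_url and cm_list_clusters, the CM_USER and CM_PASS variables and the admin:admin fallback are illustrative assumptions, not part of the talk; check the API documentation of your Cloudera Manager release for the version it supports.

```shell
# Build the URL of a Cloudera Manager REST endpoint.
# Assumption: API version v1 on the default port 7180.
cm_api_url() {
    host="$1"
    path="$2"
    printf 'http://%s:7180/api/v1/%s\n' "$host" "$path"
}

# List the clusters known to Cloudera Manager (the reply is JSON).
# CM_USER and CM_PASS are hypothetical environment variables.
cm_list_clusters() {
    curl --silent --fail --user "${CM_USER:-admin}:${CM_PASS:-admin}" \
        "$(cm_api_url "$1" clusters)"
}
```

With the inventory used in this talk, `cm_list_clusters node01` would query the Cloudera Manager host. Nothing runs until one of the functions is called.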

15

Show me the code...

Add node information to the cobbler CMS

First make the install DVD known to cobbler:

    mount -t iso9660 -o loop /<directoryname>/CentOS-6.4-x86_64-bin-DVD1.iso /mnt/dvd
    cobbler import --path=/mnt/dvd --name=CentOS64

Next make the node information known:

    sudo cobbler system add --name=node01 --profile=CentOS64-x86_64 --hostname=node01 --mac=<00:00:00:00:00:00> --ip-address=10.20.0.101 --static=True

If needed, re-enable the netboot flag:

    sudo cobbler system edit --name=node01 --netboot-enabled=True
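Registering nodes one at a time gets tedious for a larger cluster. The following is a sketch of how the `cobbler system add` call above could be generated for many nodes. The function name print_cobbler_add and the "name mac ip" input format are assumptions made for this example; only the cobbler flags already shown on the slide are used. It prints the commands instead of running them, so they can be reviewed first.

```shell
# Print one "cobbler system add" command per node.
# Usage: print_cobbler_add <profile> < nodes.txt
# Input (hypothetical format): lines of "<name> <mac> <ip>" on stdin.
# Nothing is executed; review the output, then pipe it to sh.
print_cobbler_add() {
    profile="$1"
    while read -r name mac ip; do
        # Skip blank lines and comment lines.
        case "$name" in
            ''|'#'*) continue ;;
        esac
        printf 'sudo cobbler system add --name=%s --profile=%s --hostname=%s --mac=%s --ip-address=%s --static=True\n' \
            "$name" "$profile" "$name" "$mac" "$ip"
    done
}
```

For example, `print_cobbler_add CentOS64-x86_64 < nodes.txt | sh` would register every node listed in a file called nodes.txt (a hypothetical file name).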

16

Show me the code...

Ansible needs to know what goes where:

    [cluster]
    node01
    node02
    node03

    [cobbler]
    cobbler

    [proxy]
    cobbler

    [ganglia-master]
    node01

    [ganglia-nodes:children]
    cluster

    [cloudera-manager]
    node01
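Because the inventory follows a fixed pattern, it could also be generated from a list of node names, which keeps the effort for an extra machine small. This is only a sketch: the function name print_inventory is made up for this example, and it assumes, as in the inventory above, that the first node doubles as Ganglia master and Cloudera Manager host and that the cobbler host is named cobbler.

```shell
# Print an Ansible inventory in the layout shown above.
# Usage: print_inventory <master-node> [<other-node>...]
# The first argument is also used for the ganglia-master and
# cloudera-manager groups; every argument is listed under [cluster].
print_inventory() {
    master="$1"
    printf '[cluster]\n'
    for node in "$@"; do
        printf '%s\n' "$node"
    done
    printf '\n[cobbler]\ncobbler\n'
    printf '\n[proxy]\ncobbler\n'
    printf '\n[ganglia-master]\n%s\n' "$master"
    printf '\n[ganglia-nodes:children]\ncluster\n'
    printf '\n[cloudera-manager]\n%s\n' "$master"
}
```

Calling `print_inventory node01 node02 node03 > hosts` would write the inventory to a file named hosts (a hypothetical file name) that can then be passed to Ansible.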

17

Show me the code...

For the rest it’s just a DSL thingy with extras:

    - hosts:
        - cloudera-manager
        - cluster
      user: root
      sudo: yes
      vars_files:
        - vars/common.yml
      tasks:
        - include: cloudera-manager/tasks/common.yml
      handlers:
        - include: cloudera-manager/handlers/main.yml

    - name: Configure CM4 Repo
      copy: src=cloudera-manager/files/etc/yum.repos.d/cm4.repo dest=/etc/yum.repos.d/ owner=root group=root

    - name: Install CM4 common stuff
      yum: name=$item state=installed

18

Demo...

19

Shared problems...

- No magic: vendor-specific hardware can screw things up (strange names for disk mounts, for example)
- BIOS settings and different RAID settings are not handled (yet)
- Large amount of initial network traffic with large clusters (downloading the same software packages from yum repositories N times) => repo mirroring to the rescue
- MAC addresses of all nodes must be known
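Repo mirroring means the nodes fetch packages from a machine on the local network instead of from the upstream yum repositories. One way this might look on the node side is a .repo file whose baseurl points at the mirror. The sketch below generates such a file; the function name print_mirror_repo and the mirror URL in the usage note are hypothetical, and how the mirror itself is filled is left out.

```shell
# Print a yum .repo file that points at a local mirror.
# Usage: print_mirror_repo <repo-id> <base-url>
# gpgcheck=0 is a simplification for this sketch; enable it and add a
# gpgkey= line if the mirrored packages are signed.
print_mirror_repo() {
    repo_id="$1"
    base_url="$2"
    printf '[%s]\n' "$repo_id"
    printf 'name=%s (local mirror)\n' "$repo_id"
    printf 'baseurl=%s\n' "$base_url"
    printf 'enabled=1\n'
    printf 'gpgcheck=0\n'
}
```

For example, `print_mirror_repo cm4 http://cobbler/mirror/cm4/ > cm4.repo` would produce a file that could be pushed to /etc/yum.repos.d/ on every node, assuming the mirror is served at that made-up URL.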

20

Take aways...

- Do automate from the start
- It’s easy
- Use (our) open source code to get a head start: https://github.com/godatadriven/ansible_cluster
- Our team will do the additional consultancy

21

We’re hiring / Questions? / Thank you!

Kris Geusebroek, Big Data Hacker
@krisgeus
krisgeusebroek@godatadriven.com

22
