
JUC Europe 2015: Jenkins Pipeline for Continuous Delivery of Big Data Projects

Jan 15, 2017

Transcript
Page 1: JUC Europe 2015: Jenkins Pipeline for Continuous Delivery of Big Data Projects

Gerrit and Jenkins for Big Data Continuous Delivery

London, UK, June 2015

Page 2: JUC Europe 2015: Jenkins Pipeline for Continuous Delivery of Big Data Projects

www.gerritforge.com

#jenkinsconf

About GerritForge

•  Founded in 2009 in London
•  Committed to open source

Page 3: JUC Europe 2015: Jenkins Pipeline for Continuous Delivery of Big Data Projects


The Team

Luca Milanesio
•  Co-founder and Director of GerritForge
•  Over 20 years in Agile Development and ALM
•  Open source contributor to many projects (BigData, Continuous Integration, Git/Gerrit)

Antonios Chalkiopoulos
•  Author of Programming MapReduce with Scalding
•  Open source contributor to many BigData projects
•  Working on the "land-of-Hadoop" (landoop.com)

Page 4: JUC Europe 2015: Jenkins Pipeline for Continuous Delivery of Big Data Projects


The Team (2)

Tiago Palma
•  Data Warehouse & Big Data Development
•  Senior Data Modeler
•  Big Data infrastructure specialist

Stefano Galarraga
•  20 years of Agile Development
•  Middleware, Big Data, Reactive Distributed Systems
•  Open source contributor to many BigData projects

Page 5: JUC Europe 2015: Jenkins Pipeline for Continuous Delivery of Big Data Projects


Agenda

•  Why continuous deployment on BigData?
•  Our Development Lifecycle ingredients
   –  Gerrit, Jenkins, Mesos, Marathon, CDH / Spark
•  Topics to address in BigData development
   –  Types of tests (Unit vs. Integration)
   –  Testing the "real thing" (aka the Cluster)
•  Our BigData virtualised infrastructure
   –  Marathon, Mesos and Docker containers all around
•  Live (minimised) Demo

Page 6: JUC Europe 2015: Jenkins Pipeline for Continuous Delivery of Big Data Projects


WHY?

•  Early BigData had no process at all: it could fail at any time
•  Mature BigData is a mission-critical decision maker
•  Need for more stable software engineering methodologies:
   –  Test-Driven Development (Stefano's ScaldingUnit)
   –  Continuous Integration with Jenkins
   –  Integration & Performance testing
   –  Code review and validation

Page 7: JUC Europe 2015: Jenkins Pipeline for Continuous Delivery of Big Data Projects


Code-Review BigData Lifecycle (1)

•  Git used by distributed teams (UK, Israel, India)
•  Topics and Code Review
•  Jenkins build on every patch-set
•  Commits reviewed / approved via Gerrit Submit

Page 8: JUC Europe 2015: Jenkins Pipeline for Continuous Delivery of Big Data Projects


Code-Review BigData Lifecycle (2)

Page 9: JUC Europe 2015: Jenkins Pipeline for Continuous Delivery of Big Data Projects


Code-Review BigData Lifecycle (3)

•  Submitting a Topic automatically:
   –  merges all patch-sets (semi-atomically)
   –  triggers a longer chain of CI steps
   –  promotes a RC if everything passes
•  Jenkins automation via Gerrit Trigger Plugin
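
As an illustration of the gate behind a Topic submit, the sketch below checks whether every change in a topic carries both approvals. It assumes the change list comes from Gerrit's REST API (`GET /changes/?q=topic:NAME+status:open&o=LABELS`), whose JSON responses start with a `)]}'` prefix line. The label names `Verified` and `Code-Review` are common defaults but are configurable per project, and the topic name and sample response are invented for this example; this is not the authors' implementation.

```python
import json

# Gerrit prefixes JSON responses with this line to defeat XSSI attacks.
GERRIT_MAGIC_PREFIX = ")]}'"


def parse_gerrit_json(body):
    """Strip Gerrit's anti-XSSI prefix and decode the JSON payload."""
    if body.startswith(GERRIT_MAGIC_PREFIX):
        body = body[len(GERRIT_MAGIC_PREFIX):]
    return json.loads(body)


def label_approved(change, label):
    """True if the label has an approving vote and no blocking vote."""
    info = change.get("labels", {}).get(label, {})
    return "approved" in info and "rejected" not in info


def topic_ready(changes):
    """A topic is ready only if it is non-empty and every change is approved."""
    return bool(changes) and all(
        label_approved(c, "Verified") and label_approved(c, "Code-Review")
        for c in changes
    )


# Hypothetical response for a topic named "etl-oracle" (sample data only).
SAMPLE_RESPONSE = """)]}'
[
  {"_number": 101, "topic": "etl-oracle",
   "labels": {"Verified": {"approved": {"_account_id": 1}},
              "Code-Review": {"approved": {"_account_id": 2}}}},
  {"_number": 102, "topic": "etl-oracle",
   "labels": {"Verified": {"rejected": {"_account_id": 1}},
              "Code-Review": {"approved": {"_account_id": 2}}}}
]"""

changes = parse_gerrit_json(SAMPLE_RESPONSE)
print(topic_ready(changes))  # False: change 102 failed verification
```

Treating an empty topic as not ready avoids promoting a release candidate when the query matched nothing.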

Page 10: JUC Europe 2015: Jenkins Pipeline for Continuous Delivery of Big Data Projects


Ingredients: Gerrit

•  Git-based Code Review system
•  Pre-commit review
•  Allows multiple validation steps (pipeline)
•  Validation + Integration flags

Page 11: JUC Europe 2015: Jenkins Pipeline for Continuous Delivery of Big Data Projects


Ingredients: Jenkins

•  Plugins:
   –  Gerrit Trigger
   –  Docker build step
   –  Post-build script plugin

Page 12: JUC Europe 2015: Jenkins Pipeline for Continuous Delivery of Big Data Projects


Fitting CDH Into this Picture

•  Integration Test
   –  Running integration tests in a CDH-enabled Docker container
   –  Hadoop/local and Spark/standalone are not enough
   –  Need to test class serialisation
   –  Validate packaged fat-jars (library conflicts with CDH)
   –  Performance on a real cluster
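
One cheap pre-check for the fat-jar point is to list the classes a jar bundles under packages that the cluster already provides. The sketch below is only an assumption about how such a check could look: the package prefixes are illustrative rather than an authoritative list of what CDH ships, and the demo jar is built in memory. It complements running the job on a CDH-enabled container; it does not replace it.

```python
import io
import zipfile

# Assumption: packages expected to be supplied by the cluster classpath.
# Adjust to the actual CDH distribution in use.
PROVIDED_PREFIXES = ("org/apache/hadoop/", "org/apache/spark/")


def bundled_provided_classes(jar, prefixes=PROVIDED_PREFIXES):
    """Return class entries in the jar that sit under cluster-provided packages."""
    with zipfile.ZipFile(jar) as zf:
        return sorted(
            name
            for name in zf.namelist()
            if name.endswith(".class") and name.startswith(prefixes)
        )


def make_demo_jar(entries):
    """Build an in-memory jar (a zip archive) with empty entries, for the demo."""
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w") as zf:
        for name in entries:
            zf.writestr(name, b"")
    buf.seek(0)
    return buf


demo_jar = make_demo_jar([
    "META-INF/MANIFEST.MF",
    "com/example/etl/Job.class",        # hypothetical application class
    "org/apache/hadoop/fs/Path.class",  # would clash with the cluster's copy
])
print(bundled_provided_classes(demo_jar))
```

`bundled_provided_classes` also accepts a file path, so a build step could fail when the returned list is non-empty.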

Page 13: JUC Europe 2015: Jenkins Pipeline for Continuous Delivery of Big Data Projects


Fitting CDH Into this Picture

•  Acceptance / performance tests with short-lived CDH clusters
•  Solution: Mesos, Marathon and Docker:
   –  Ephemeral clusters with defined capacity
   –  Automatic cluster-config
   –  All controlled via Docker/Mesos

Page 14: JUC Europe 2015: Jenkins Pipeline for Continuous Delivery of Big Data Projects


Mesos + Marathon

•  Apache Mesos
   –  Abstracts CPU, memory, storage and other compute resources away from machines
•  Marathon Framework
   –  Runs on top of Mesos
   –  Guarantees that long-running applications never stop
   –  REST API for managing and scaling services
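
To make the REST API point concrete, here is a minimal sketch of an app definition for a Docker container and the `POST /v2/apps` request that would create it. The app id, image name, host name and resource figures are hypothetical, and the `container.docker.network` field follows the app format of Marathon releases from the period of this talk; newer versions express networking differently. The request is built but not sent.

```python
import json
import urllib.request


def marathon_app(app_id, image, instances, cpus=1.0, mem=2048):
    """Build a Marathon app definition that runs a Docker image."""
    return {
        "id": app_id,
        "instances": instances,  # Marathon keeps this many tasks running
        "cpus": cpus,            # CPU share requested per task
        "mem": mem,              # memory per task, in MB
        "container": {
            "type": "DOCKER",
            "docker": {"image": image, "network": "BRIDGE"},
        },
    }


def create_app_request(marathon_url, app):
    """Prepare (but do not send) the POST /v2/apps request for this app."""
    return urllib.request.Request(
        url=marathon_url.rstrip("/") + "/v2/apps",
        data=json.dumps(app).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


# Hypothetical names: registry, image and Marathon host are placeholders.
agents = marathon_app("/cdh/agent", "registry.example.local/cdh-agent", instances=3)
request = create_app_request("http://marathon.example.local:8080", agents)
print(request.get_method(), request.full_url)
```

Passing `request` to `urllib.request.urlopen` would submit it; scaling later amounts to changing `instances` on the same app id.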

Page 15: JUC Europe 2015: Jenkins Pipeline for Continuous Delivery of Big Data Projects


CDH Components

•  CDH 5.4.1 distribution
   –  Apache Spark
   –  Hadoop HDFS
   –  YARN

Page 16: JUC Europe 2015: Jenkins Pipeline for Continuous Delivery of Big Data Projects

Integration Test Flow on CDH Cluster

[Diagram components: Jenkins Master, Marathon, Mesos Master, Mesos Slave, Docker (on the Slave Host), Private Docker Registry]

•  POST to Marathon REST API to start 1 Docker container with Cloudera Manager and N Docker containers with Cloudera agents
•  Marathon Framework receives resource offers from Mesos Master and submits the tasks
•  The task is sent to the Mesos Slave
•  Mesos Slave starts the Docker container
•  Docker image is fetched from the Docker registry if not present on the Slave host
•  Waiting for the Docker containers, until they are UP
•  Install Cloudera packages via Cloudera Manager API using Python
•  Deploy the ETL, run the ETL and the Integration Tests
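
The waiting step in this flow can be sketched as a polling loop over Marathon's `GET /v2/apps/{appId}` response, which reports `instances` and `tasksRunning` for an app. This is a sketch under assumptions, not the authors' script: the function name, timeout and polling interval are invented, and `fetch_app` stands in for whatever HTTP call retrieves the JSON.

```python
import time


def wait_until_running(fetch_app, timeout_s=600, poll_s=5,
                       sleep=time.sleep, clock=time.monotonic):
    """Poll until all requested instances are running, or raise TimeoutError.

    fetch_app is a callable returning the decoded JSON of
    GET /v2/apps/{appId}, i.e. a dict with a top-level "app" object.
    """
    deadline = clock() + timeout_s
    while True:
        app = fetch_app()["app"]
        if app["tasksRunning"] >= app["instances"]:
            return app
        if clock() >= deadline:
            raise TimeoutError(
                "only %d of %d tasks running"
                % (app["tasksRunning"], app["instances"])
            )
        sleep(poll_s)


# Simulated responses: the containers come up on the second poll.
responses = iter([
    {"app": {"id": "/cdh/agent", "instances": 3, "tasksRunning": 1}},
    {"app": {"id": "/cdh/agent", "instances": 3, "tasksRunning": 3}},
])
app = wait_until_running(lambda: next(responses), sleep=lambda _: None)
print(app["tasksRunning"])  # 3
```

Injecting `sleep` and `clock` keeps the loop testable without real delays. A running task only means the container has started, so the Cloudera Manager API may still need its own readiness check before packages are installed.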

Page 17: JUC Europe 2015: Jenkins Pipeline for Continuous Delivery of Big Data Projects


Unit and Integration Tests sample

•  Test project:
   –  Test Spark project
   –  ETL from Oracle to HDFS
•  Unit tests directly on Spark logic
•  Integration tests for every patch-set:
   –  VERY small dataset just for this demo
   –  CDH and Oracle Docker images
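
A common way to unit-test Spark logic without a cluster is to keep each transformation a plain function and test it directly; the same function can later be handed to an RDD `map`. The example below is hypothetical throughout: the column names, the output format and the `normalise_row` function are invented for illustration and are not taken from the demo project.

```python
import unittest


def normalise_row(row):
    """Turn one source row (a dict) into a tab-separated output line."""
    customer = row["CUSTOMER_NAME"].strip().upper()
    amount = "{:.2f}".format(float(row["AMOUNT"]))
    return "\t".join([str(row["ID"]), customer, amount])


class NormaliseRowTest(unittest.TestCase):
    def test_trims_name_and_formats_amount(self):
        row = {"ID": 7, "CUSTOMER_NAME": "  acme ltd ", "AMOUNT": "12.5"}
        self.assertEqual(normalise_row(row), "7\tACME LTD\t12.50")

    def test_missing_column_is_an_error(self):
        with self.assertRaises(KeyError):
            normalise_row({"ID": 7, "AMOUNT": "1"})


if __name__ == "__main__":
    unittest.main()
```

Tests like these run in milliseconds on every patch-set, leaving serialisation and packaging problems to the integration tests on the CDH containers.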

Page 18: JUC Europe 2015: Jenkins Pipeline for Continuous Delivery of Big Data Projects

Unit and Integration Tests

[Diagram components: Jenkins Build Job, Spark Standalone, Hadoop Pseudo-distributed mode. Arrow labels: init, Submit job, Init/read HDFS]

Page 19: JUC Europe 2015: Jenkins Pipeline for Continuous Delivery of Big Data Projects

DEMO: Small-scale BigData Delivery Pipeline

Page 20: JUC Europe 2015: Jenkins Pipeline for Continuous Delivery of Big Data Projects


References

•  Demo sources https://github.com/GerritForge

•  Blog: https://gitenterprise.me

•  Twitter: @GerritReview @GitEnterprise @GerritForge

•  Learn Gerrit Code Review book: GerritHub.io/book

•  Get in touch with GerritForge: [email protected]
