Genetic Programming on Hadoop @DanRosanova Senior Architect West Monroe Partners

Genetic Programming on Hadoop - WordPress.com · What is Genetic Programming? Biologically inspired computation A type of Evolutionary Computation A stochastic programming concept

Jul 07, 2018

Page 1

Genetic Programming on Hadoop

@DanRosanova

Senior Architect

West Monroe Partners

Page 2

A little about me & West Monroe Partners

15 years in technology consulting

5-time Microsoft Integration MVP

Author of BizTalk 2010 Patterns

Specialize in distributed computing

Business & Technology Consulting

450+ staffers

10 offices across North America

Partner of Bearing Point

Page 3

What is Genetic Programming?

Biologically inspired computation

A type of Evolutionary Computation

A stochastic programming concept

Requires massive computation

Big Data?

Page 4

Basic Steps of an Evolutionary Program

1. Setup

2. Create Initial Population

3. Assess Fitness

4. Breed & Create Next Generation

5. Repeat steps 3-4 until an optimal organism exists
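The steps above can be sketched as a short loop. This is only an illustration, not the code from the talk: the bit-string organisms, the `evolve` function, and all of its parameters are assumptions made for the example, and the toy fitness (count the 1 bits) stands in for a real fitness measure.

```python
import random

def evolve(fitness, target, genome_length=20, population_size=30,
           generations=40, mutation_rate=0.2, seed=0):
    """Toy evolutionary loop over bit-string organisms (illustrative only)."""
    rng = random.Random(seed)                                   # 1. setup
    population = [[rng.randint(0, 1) for _ in range(genome_length)]
                  for _ in range(population_size)]              # 2. initial population
    for _ in range(generations):
        ranked = sorted(population, key=fitness, reverse=True)  # 3. assess fitness
        if fitness(ranked[0]) >= target:                        # good enough: stop
            break
        parents = ranked[:population_size // 2]                 # keep the fitter half
        children = []
        while len(parents) + len(children) < population_size:   # 4. breed
            mother, father = rng.sample(parents, 2)
            cut = rng.randrange(1, genome_length)
            child = mother[:cut] + father[cut:]                 # new list, parents untouched
            if rng.random() < mutation_rate:                    # occasional random mutation
                spot = rng.randrange(genome_length)
                child[spot] = 1 - child[spot]
            children.append(child)
        population = parents + children                         # 5. repeat with the new generation
    return max(population, key=fitness)

# Toy fitness: the number of 1 bits; a perfect organism scores 20.
best = evolve(fitness=sum, target=20)
print(sum(best), best)
```

Because the fitter half is carried over unchanged, the best score never gets worse from one generation to the next.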

Page 5

Creating the Environment

Environment in which to run simulations

This needs to match the space we’re working in

It can be a fixed space or a formulaic / Computational space

Page 6

Representing a Genetic Program

Arrays (Genetic Algorithms)

Trees / graphs (Genetic Programs)

Genetic algorithms vs. Genetic programs

Fixed pieces vs. executable
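To make the contrast concrete, here is a small sketch. The names `ga_organism`, `gp_organism`, `OPS`, and `evaluate`, and the nested-tuple encoding of the tree, are assumptions made for illustration: the array is just a fixed set of values, while the tree is something you can actually execute.

```python
import operator

# Operators the toy tree is allowed to use (an assumption for this example).
OPS = {"+": operator.add, "*": operator.mul, "<": operator.lt}

# Genetic algorithm style: a fixed-shape array of values.
ga_organism = [3, 1, 2, 0]

# Genetic program style: a tree, written here as nested (op, left, right) tuples.
# This one means: (x + 1) < (2 * x)
gp_organism = ("<", ("+", "x", 1), ("*", 2, "x"))

def evaluate(node, env):
    """Execute a tree: tuples are operations, strings are variables, the rest are constants."""
    if isinstance(node, tuple):
        op, left, right = node
        return OPS[op](evaluate(left, env), evaluate(right, env))
    if isinstance(node, str):
        return env[node]
    return node

print(evaluate(gp_organism, {"x": 3}))   # (3 + 1) < (2 * 3)
```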

Page 7

Creating an Initial Population

Random is important

Really important!

Initial boundaries and weights

How big is an organism

How large is the population
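A minimal sketch of those choices, assuming organisms are strings over a small alphabet. The function name `random_population` and its parameters are hypothetical; the population size and the minimum and maximum organism length are the "boundaries" you pick up front.

```python
import random

def random_population(size, min_genes, max_genes, alphabet, seed=None):
    """Build `size` random organisms, each between min_genes and max_genes long."""
    rng = random.Random(seed)        # pass a seed only when you need repeatable runs
    population = []
    for _ in range(size):
        length = rng.randint(min_genes, max_genes)   # bounds are inclusive
        population.append("".join(rng.choice(alphabet) for _ in range(length)))
    return population

population = random_population(size=100, min_genes=3, max_genes=25, alphabet="GATC", seed=1)
print(len(population), population[:3])
```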

Page 8

Assessing Fitness

Fitness Measure is how we rank the population

Fitness measures should generally try to assess:

Effort

Cost

Max / Min limits or exposures (i.e. risk)

Source of optimization
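One way to fold those ingredients into a single number is sketched below. Everything here is an assumption for illustration: the `fitness` signature, the weights, and the three made-up candidates. The idea is that reward is reduced by effort and cost, and anything past a hard limit is ranked last.

```python
def fitness(reward, effort, cost, exposure, max_exposure,
            effort_weight=1, cost_weight=2):
    """Higher is better; organisms that break the exposure limit always lose."""
    if exposure > max_exposure:
        return float("-inf")
    return reward - effort_weight * effort - cost_weight * cost

# Three hypothetical organisms, already measured.
candidates = {
    "steady":   fitness(reward=50, effort=10, cost=5, exposure=1, max_exposure=3),
    "frugal":   fitness(reward=40, effort=5,  cost=1, exposure=1, max_exposure=3),
    "reckless": fitness(reward=90, effort=10, cost=5, exposure=9, max_exposure=3),
}

# Rank the population: best first.
ranking = sorted(candidates, key=candidates.get, reverse=True)
print(ranking)
```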

Page 9

The Generational Divide

Keep the fit

Cross breed a subset

Introduce random mutation

Create new organisms to mix into the population

Image from SHIVESH BHATIA http://shivesh-writerspoint.blogpost.in/
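The four moves on this slide can be sketched in one function. This is an assumption-laden illustration, not the talk's reducer: `next_generation`, the `keep` / `fresh` / `mutation_rate` fractions, and the string encoding of organisms are all made up for the example, and it expects a population of at least two organisms already sorted best first.

```python
import random

def next_generation(ranked, alphabet, keep=0.2, fresh=0.1, mutation_rate=0.05, seed=None):
    """ranked: organisms (strings) sorted best first. Returns a population of the same size."""
    rng = random.Random(seed)
    size = len(ranked)
    n_keep = max(2, int(size * keep))       # need at least two parents
    n_fresh = int(size * fresh)
    elite = ranked[:n_keep]                 # keep the fit
    children = []
    while len(children) < size - n_keep - n_fresh:
        mother, father = rng.sample(elite, 2)                    # cross breed a subset
        child = mother[:len(mother) // 2] + father[len(father) // 2:]
        child = "".join(rng.choice(alphabet) if rng.random() < mutation_rate else gene
                        for gene in child)                       # introduce random mutation
        children.append(child)
    newcomers = ["".join(rng.choice(alphabet) for _ in range(len(rng.choice(ranked))))
                 for _ in range(n_fresh)]                        # brand new organisms
    return elite + children + newcomers

ranked = ["GATTACA", "GGGG", "AT", "CCCCC", "TTT", "GA", "ACGT", "TTTTTT", "C", "AAAA"]
new_population = next_generation(ranked, alphabet="GATC", seed=7)
print(new_population)
```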

Page 10

Today’s Genetic Program - Ants

Ants is a simple demonstration of Genetic Algorithms

Made to be easy to understand

Is basically a search / optimization algorithm

It can be run on small data or big

Good starting point for other problems

- like swarm / colony solutions

Page 11

Creating the Board (Environment)

Simple Grid – 2D Array

Food is represented as 1’s

Ants will find food as they run

The board is not the target

There will be only one board
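A board like this could be built as follows. This is a sketch, not the talk's board generator: the function name `generate_board`, the `food_fraction` parameter, and the seed are assumptions for the example.

```python
import random

def generate_board(size, food_fraction=0.1, seed=None):
    """Return a size x size grid (list of rows) where 1 marks food and 0 is empty."""
    rng = random.Random(seed)
    return [[1 if rng.random() < food_fraction else 0 for _ in range(size)]
            for _ in range(size)]

board = generate_board(size=10, seed=42)
for row in board:
    print("".join(str(cell) for cell in row))
```

Every ant in every generation runs against the same board, so it is generated once and reused.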

Page 16

Representing our Programs

Ants can make steps in specific directions

The directions are a fixed set:

Up

Down

Left

Right

The length of the algorithm is variable

If four parts seem too simple, remember G, A, T, C – the nucleobases
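Here is a sketch of what "running" such a program could look like. The letter encoding (U, D, L, R), the `run_ant` function, the start position, and the wrap-around at the board edges are all assumptions for illustration; the real scripts may handle these differently.

```python
# One letter per step: Up, Down, Left, Right (an assumed encoding).
MOVES = {"U": (-1, 0), "D": (1, 0), "L": (0, -1), "R": (0, 1)}

def run_ant(genome, board, start=(0, 0)):
    """Walk the genome over the board and count distinct food cells visited."""
    row, col = start
    eaten = set()
    if board[row][col] == 1:
        eaten.add((row, col))
    for step in genome:
        d_row, d_col = MOVES[step]
        row = (row + d_row) % len(board)         # assumption: edges wrap around
        col = (col + d_col) % len(board[0])
        if board[row][col] == 1:
            eaten.add((row, col))                # the shared board itself is not modified
    return len(eaten)

board = [[0, 1, 0],
         [0, 1, 0],
         [0, 0, 1]]
print(run_ant("RDRD", board))
```

The count returned here is a natural fitness score: more food found means a fitter ant.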

Page 17

Cross Breeding Offspring

Select two parents

Split them each in half

Combine the two halves

Works almost like we do!
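With organisms written as strings of steps, that recipe is a one-liner. The function name `cross_breed` and the choice of which half comes from which parent are assumptions for this sketch.

```python
def cross_breed(mother, father):
    """Child = first half of the mother + second half of the father."""
    return mother[:len(mother) // 2] + father[len(father) // 2:]

print(cross_breed("UUUU", "DDDDDD"))   # parents may differ in length
print(cross_breed("DDDDDD", "UUUU"))   # swapping the parents gives a sibling
```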

Page 18

The Genetic Process

1. Start with a random population

2. Assess the population fitness

3. Combine to create new Generation

4. Repeat steps 2 and 3

Page 19

An Express Tour of Hadoop

Page 20

HDFS - the Core of Hadoop

Self managing & self healing

Scales linearly

Programs go to data – NOT the normal way

Simple core – modular and extensible

It’s a file system – think of basic I/O operations

Page 21

What you'll need to follow along

Hortonworks Sandbox 2.0: http://hortonworks.com/products/hortonworks-sandbox/

Python – 2.6.6 is on the Sandbox

Web browser

Files from http://danrosanova.wordpress.com/ants/

A little patience and imagination – for Ants not the Sandbox

Page 22

Map Reduce - How we will use Hadoop

Batch Based

Standard in/out (i.e. command line)

Lowest common approach / works with anything

Sends Key Value pairs between steps

[Diagram: Population In → Map tasks → Implicit Sort → Reduce tasks → Population Out]
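The flow can be imitated locally to see what each step receives. This is a sketch under assumptions, not the talk's scripts: `mapper`, `reducer`, the zero-padded key, and using `len` as a stand-in score are all made up for the example. What it does reflect is the streaming contract: each step reads lines, the mapper writes key, tab, value, and the reducer sees the lines sorted by key as text.

```python
def mapper(lines, score):
    """Map step: one organism per input line, emitted as 'fitness<TAB>genome'."""
    for line in lines:
        genome = line.strip()
        if genome:
            # Zero-pad the key so that sorting it as text matches numeric order.
            yield "%05d\t%s" % (score(genome), genome)

def reducer(sorted_lines, keep):
    """Reduce step: input arrives sorted by key, so the fittest are at the end."""
    genomes = [line.rstrip("\n").split("\t", 1)[1] for line in sorted_lines]
    return genomes[-keep:][::-1]

population_in = ["UD\n", "RRRR\n", "L\n", "\n"]
mapped = list(mapper(population_in, score=len))   # len is a stand-in fitness
shuffled = sorted(mapped)                         # plays the role of the implicit sort
population_out = reducer(shuffled, keep=2)
print(population_out)
```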

Page 23

Map Reduce with Python

Python is easy to use*

Easy to test

Has lots of features

Is easily readable

Is part of the Hortonworks Hadoop distribution

Page 24

Solution Structure

/Ants

/Runtime/N (where N = 0 to the number of generations)

/Final

/GenerateBoard.py

/GenPopMap.py

/RunMap.py

/ReproReducer.py

/RunGenetic.sh (yes, I know, that’s not python)

Page 25

Logging onto the Sandbox

1. Browse to: http://192.168.56.101/

2. Click “GO TO SANDBOX”

3. Some have the wrong IP in the link; if yours does, make sure it matches what the VM says on the startup screen

Page 26

Take a look around

Page 27

Go to Hue Shell

Click on Bash

Start typing stuff

hadoop fs -mkdir /Ants

cd Ants

Page 28

Upload files

Click /

Click Ants

Click Upload

Page 29

Back in the Hue / Bash shell

mkdir Ants

cd Ants

hadoop fs -get /Ants/*

python GenerateBoards.py 100 testboard

python GenPopMap.py 100 3 25 > testpop.txt

cat testpop.txt | python RunMap.py testboard 50 100

cat testpop.txt | python RunMap.py testboard 50 100 | python ReproReducer.py 100 3 25

Page 30

Check Point

We have a basic genetic process working

We can see how the | operator allows us to pipe lines of the population

Now let’s look at Hadoop Streaming

Page 31

Hadoop Streaming

This is how we could generate a population with Hadoop in HDFS

hadoop

jar /usr/lib/hadoop-mapreduce/hadoop-streaming-2*.jar

-file 'GenPopMap.py'

-input testboard (this is the HDFS path)

-output (some HDFS path for output)

-mapper "python ./GenPopMap.py 100 3 25"
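If you would rather assemble that command from a script, a sketch might look like this. The helper `streaming_command` is hypothetical, and the output path is deliberately left as a placeholder because the slide does not name one. Note also that the `2*` wildcard in the jar path is only expanded by a shell, so a program that runs this list directly would need to resolve the real jar name first.

```python
def streaming_command(jar, script, script_args, input_path, output_path):
    """Build the argument list for a map-only Hadoop Streaming job (nothing is executed)."""
    mapper = "python ./%s %s" % (script, " ".join(str(arg) for arg in script_args))
    return [
        "hadoop", "jar", jar,
        "-file", script,          # ship the script with the job
        "-input", input_path,     # HDFS path
        "-output", output_path,   # HDFS path; must be chosen by you
        "-mapper", mapper,
    ]

command = streaming_command(
    jar="/usr/lib/hadoop-mapreduce/hadoop-streaming-2*.jar",
    script="GenPopMap.py",
    script_args=[100, 3, 25],
    input_path="testboard",
    output_path="<some HDFS path for output>",   # placeholder, not a real path
)
print(" ".join(command))
```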

Page 32

Running the whole process

chmod +x RunGenetic.sh

./RunGenetic.sh 4 500 200 100 5 30

Generations = 4

Population = 500

Board Size = 100

Min Gene = 5

Max Gene = 30

Results in /Ants/Runtime (in HDFS)

Page 33

Why is this a good use case?

WAY easier than MPI

We’re using it like MPI and distributing our files with the job

They’ve already been deleted so you won’t see them

We get the intermediate files so we can see variations between generations

This is actually a big issue with Evolutionary Computation

This scales, and the Ants program is designed to let it

Try a board of 1000 and a population of 1,000,000+

Don’t try it on the Sandbox

Page 34

Genetic Trading Algorithms & Programs

[Diagram: expression tree with "<" at the root, comparing MA (5) with EMA (12)]

Page 35

Q&A

Page 36

Conclusion

Genetic Programs and algorithms are good for wide space problems

Undirected learning – perhaps when there are so many variables you just don’t know

Beware of local optima and overfitting

Go play God for a while!
