
[Start] Generate random population of N chromosomes (suitable solutions for the problem)

Feb 25, 2016


Ms Hudson

Transcript
Page 1: [Start] Generate random population of N chromosomes (suitable solutions for the problem)

1. [Start] Generate a random population of N chromosomes (suitable solutions for the problem).

2. [Fitness] Evaluate the fitness f(x) of each chromosome x in the population.

3. [New population] Create a new population by repeating the following steps until the new population is complete:

   1. [Selection] Select two parent chromosomes from the population according to their fitness (the better the fitness, the bigger the chance of being selected).

   2. [Crossover] With a crossover probability, cross over the parents to form new offspring (children). If no crossover is performed, the offspring are exact copies of the parents.

   3. [Mutation] With a mutation probability, mutate the new offspring at each locus (position in the chromosome).

   4. [Accepting] Place the new offspring in the new population.

4. [Replace] Use the newly generated population for a further run of the algorithm.

5. [Test] If the end condition is satisfied, stop and return the best solution in the current population.

6. [Loop] Go to step 2.
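The steps above can be sketched as a short Python loop. Everything concrete below – the bit-string encoding, tournament selection, one-point crossover, and the parameter values – is an illustrative assumption, not something specified on the slides:

```python
import random

def run_sga(fitness, n_bits=20, pop_size=30, generations=50,
            p_crossover=0.9, p_mutation=0.01, rng=None):
    """Minimal simple-GA loop following the slide's steps 1-6."""
    rng = rng or random.Random(0)
    # [Start] random population of bit-string chromosomes
    pop = [[rng.randint(0, 1) for _ in range(n_bits)] for _ in range(pop_size)]
    best = max(pop, key=fitness)
    for _ in range(generations):                      # [Loop]
        new_pop = []
        while len(new_pop) < pop_size:                # [New population]
            # [Selection] fitness-biased tournament of size 2
            p1 = max(rng.sample(pop, 2), key=fitness)
            p2 = max(rng.sample(pop, 2), key=fitness)
            # [Crossover] one-point crossover with probability p_crossover
            if rng.random() < p_crossover:
                cut = rng.randrange(1, n_bits)
                child = p1[:cut] + p2[cut:]
            else:
                child = p1[:]                         # exact copy of a parent
            # [Mutation] flip each locus with probability p_mutation
            child = [b ^ 1 if rng.random() < p_mutation else b for b in child]
            new_pop.append(child)                     # [Accepting]
        pop = new_pop                                 # [Replace]
        best = max(pop + [best], key=fitness)         # remember the best so far
    return best                                       # [Test] best solution found

# Example: maximize the number of ones ("OneMax"), so fitness is just sum()
best = run_sga(sum)
```

The [Test] step here is simplified to a fixed generation budget; a real stop criterion could also watch for fitness stagnation.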

Page 2:

• Check the following applet:
  http://www.obitko.com/tutorials/genetic-algorithms/example-function-minimum.php

  maximum at x = 6.092, y = 7.799, f(x, y)_max = 100

Page 3:

• The GA described so far is similar to Holland’s original GA.

• It is now known as the simple genetic algorithm (SGA).

• Other GAs use different:
  – representations
  – mutations
  – crossovers
  – selection mechanisms

Page 4:

Crossover enhancements

• Multipoint crossover

• Uniform crossover
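The two enhancements can be sketched as follows; the bit-string parents and helper names are illustrative assumptions, not from the slides:

```python
import random

def multipoint_crossover(p1, p2, n_points, rng):
    """Cut both parents at n_points random loci and alternate the segments."""
    cuts = sorted(rng.sample(range(1, len(p1)), n_points))
    child, take_first, prev = [], True, 0
    for cut in cuts + [len(p1)]:
        child.extend((p1 if take_first else p2)[prev:cut])
        take_first = not take_first          # switch parent at every cut point
        prev = cut
    return child

def uniform_crossover(p1, p2, rng):
    """Choose each gene independently from one of the two parents."""
    return [a if rng.random() < 0.5 else b for a, b in zip(p1, p2)]

rng = random.Random(1)
mom, dad = [0] * 10, [1] * 10
kid_multi = multipoint_crossover(mom, dad, n_points=3, rng=rng)
kid_uni = uniform_crossover(mom, dad, rng)
```

With all-zero and all-one parents, the children make the inheritance pattern visible: the multipoint child is made of contiguous 0/1 segments, while the uniform child mixes genes locus by locus.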

Page 5:

Crossover OR mutation?

• Decade-long debate: which one is better?
• Answer (at least, rather wide agreement):
  – it depends on the problem, but
  – in general, it is good to have both
  – the two have different roles
  – a mutation-only GA is possible; a crossover-only GA would not work

Page 6:

• Crossover is explorative:
  – discovering promising areas in the search space
  – it makes a big jump to an area somewhere “in between” two (parent) areas
• Mutation is exploitative:
  – optimizing present information within an already discovered promising region
  – creating small random deviations and thereby not wandering far from the parents

Page 7:

• They complement each other:
  – only crossover can bring together information from both parents
  – only mutation can introduce completely new information

Page 8:

Feature Selection, Feature Extraction

Page 9:

Why do it?

1. We’re interested in the features themselves – we want to know which are relevant, and if we fit a model, it should be interpretable.
   • facilitates data visualization and data understanding
   • reduces experimental costs (measurements)

2. We’re interested in prediction – the features are not interesting in themselves; we just want to build a good predictor.
   • faster training
   • defies the curse of dimensionality

Page 10:

Selection vs. Extraction

• In feature selection we try to find the best subset of the input feature set.
• In feature extraction we create new features based on transformations or combinations of the original feature set.
• Both selection and extraction lead to dimensionality reduction.
• There is no clear-cut evidence that one of them is superior to the other on all types of tasks.

Page 11:

Feature selection (FS)

Page 12:

Classification of FS methods

• Filter
  – Assess the relevance of features only by looking at the intrinsic properties of the data.
  – Usually, calculate a feature relevance score and remove low-scoring features.
• Wrapper
  – Bundle the search for the best model with the FS.
  – Generate and evaluate various subsets of features; the evaluation is obtained by training and testing a specific ML model.
• Embedded
  – The search for an optimal subset is built into the classifier construction (e.g. decision trees).

Page 13:

Filter methods

• Two steps (score-and-filter approach):
  1. assess each feature individually for its potential in discriminating among classes in the data
  2. features falling beyond a threshold are eliminated
• Advantages:
  – easily scale to high-dimensional data
  – simple and fast
  – independent of the classification algorithm
• Disadvantages:
  – ignore the interaction with the classifier
  – most techniques are univariate (each feature is considered separately)

Page 14:

Scores in filter methods

• Distance measures
  – Euclidean distance
• Dependence measures
  – Pearson correlation coefficient
  – χ²-test
  – t-test
  – AUC
• Information measures
  – information gain
  – mutual information
• Complexity: O(d)
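A score-and-filter pass with one of these scores – the Pearson correlation coefficient – fits in a few lines of pure Python. The feature names, threshold, and toy data below are invented for illustration:

```python
from math import sqrt

def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length samples."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sqrt(sum((x - mx) ** 2 for x in xs))
    sy = sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)

def correlation_filter(features, target, threshold):
    """Score each feature by |r| with the target, drop low-scoring ones."""
    scores = {name: abs(pearson(vals, target)) for name, vals in features.items()}
    kept = [name for name, s in scores.items() if s >= threshold]
    return kept, scores

target = [1.0, 2.0, 3.0, 4.0, 5.0]
features = {
    "copy":  [1.0, 2.0, 3.0, 4.0, 5.0],   # perfectly correlated
    "flip":  [5.0, 4.0, 3.0, 2.0, 1.0],   # perfectly anti-correlated
    "noise": [2.0, 1.0, 4.0, 1.0, 3.0],   # weakly related
}
kept, scores = correlation_filter(features, target, threshold=0.9)
```

Note how the filter treats each feature in isolation (univariate), which is exactly the disadvantage listed on the previous slide.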

Page 15:

Wrappers

• Search for the best feature subset in combination with a fixed classification method.
• The goodness of a feature subset is determined using cross-validation (k-fold, LOOCV).
• Advantages:
  – interaction between the feature-subset search and model selection
  – take feature dependencies into account
  – generally more accurate
• Disadvantages:
  – higher risk of overfitting than filter methods
  – very computationally intensive

Page 16:

Exhaustive search

• Evaluate all possible subsets using exhaustive search – this leads to the optimum subset.
• For a total of d variables, the number of possible subsets of size p is

  C(d, p) = d! / (p! (d − p)!)

  e.g. d = 100, p = 10 → ≈2×10¹³ subsets

• Complexity: O(2^d) (exponential).
• There are various strategies to reduce the search space.
  – They are still O(2^d), but much faster (at least 1000-times), e.g. “branch and bound”.
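The subset counts can be checked directly with the standard library:

```python
from math import comb

# Number of size-p subsets of d variables: d! / (p! (d - p)!)
n_subsets = comb(100, 10)   # the d = 100, p = 10 case from the slide

# Subsets of *all* sizes: 2^d, which grows exponentially
n_all = 2 ** 20             # already over a million subsets for just d = 20
```

Even evaluating one subset per microsecond, the d = 100, p = 10 case alone would take months – which is why the stochastic and deterministic search strategies on the next slides exist.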

Page 17:

Stochastic

• Genetic algorithms
• Simulated annealing

Page 18:

Deterministic

• Sequential Forward Selection (SFS)
• Sequential Backward Selection (SBS)
• “Plus q take away r” Selection
• Sequential Forward Floating Search (SFFS)
• Sequential Backward Floating Search (SBFS)

Page 19:

Sequential Forward Selection

• At the beginning, select the best single feature using a scalar criterion function.
• Then add one feature at a time – the one which, along with the already selected features, maximizes the criterion function.
• A greedy algorithm; it cannot retract (this is also called the nesting effect).
• Complexity is O(d).
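A minimal SFS sketch, where the criterion function is a made-up stand-in for a real subset-quality measure (in a wrapper it would be cross-validated model accuracy):

```python
def sfs(all_features, criterion, k):
    """Greedy Sequential Forward Selection: grow the subset one feature at a time."""
    selected = []
    while len(selected) < k:
        remaining = [f for f in all_features if f not in selected]
        # add the feature that maximizes the criterion together with those already chosen
        best_f = max(remaining, key=lambda f: criterion(selected + [f]))
        selected.append(best_f)
    return selected

# Toy criterion: each feature has a fixed usefulness, but f2 and f3 are
# redundant with each other, modeled as a penalty when both are present.
usefulness = {"f1": 5.0, "f2": 4.0, "f3": 3.9, "f4": 1.0}

def criterion(subset):
    score = sum(usefulness[f] for f in subset)
    if "f2" in subset and "f3" in subset:
        score -= 3.0   # redundancy penalty
    return score

chosen = sfs(list(usefulness), criterion, k=3)
```

The toy run shows why SFS beats univariate filtering: because the criterion sees the whole subset, the redundant f3 is skipped in favor of the individually weaker f4. It also shows the nesting effect: once a feature is in, it can never be removed.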

Page 20:

Sequential Backward Selection

• At the beginning, select all d features.
• Delete one feature at a time, keeping the subset which maximizes the criterion function.
• Also a greedy algorithm; it cannot retract.
• Complexity is O(d).

Page 21:

“Plus q take away r” Selection

• First add q features by forward selection, then discard r features by backward selection.
• Need to decide the optimal q and r.
• Avoids the subset-nesting problems of SFS and SBS.

Page 22:

Sequential Forward Floating Search

• A generalized “plus q take away r” algorithm.
• The values of q and r are determined automatically.
• Close to the optimal solution at an affordable computational cost.
• Also exists in a backward variant (SBFS).

Page 23:

Embedded FS

• The feature selection process is done inside the ML algorithm.
• Decision trees
  – In the final tree, only a subset of the features is used.
• Regularization
  – Effectively “shuts down” unnecessary features.
  – Pruning in neural networks.

Page 24:

Feature extraction (FE)

Page 25:

• FS – identify and select the “best” features with respect to the target task.
  – Selected features retain their original physical interpretation.
• FE – create new features as transformations (combinations) of the original features; usually followed by FS.
  – May provide better discriminatory ability than the best subset.
  – The new features do not retain the original physical interpretation and may have no clear meaning.

Page 26:

Principal Component Analysis (PCA)

Page 27:

[Figure: a scatter of data points in the (x1, x2) plane]

Page 28:

[Figure: the same data in the (x1, x2) plane after centering]

Centering: make the data have zero mean (i.e., move the data so its centroid is at the point [0, 0]).

Page 29:
Page 30:

[Figure: the centered data with two candidate lines, one given by the equation w0 + w1x1 + w2x2 and another by w’0 + w’1x1 + w’2x2]

Page 31:

[Figure: the same two lines]

The variability in the data is highest along one of these lines; it is called the 1st principal component. The line orthogonal to it is the 2nd principal component.

Page 32:

[Figure: the two principal-component axes, w0 + w1x1 + w2x2 and w’0 + w’1x1 + w’2x2, overlaid on the data]

Principal components (PCs) are linear combinations of the original coordinates.

The coefficients of the linear combination (w0, w1, …) are called loadings.

In the transformed coordinate system, individual data points have different coordinates; these are called scores.

Page 33:

• PCA is an orthogonal linear transformation that changes the data into a new coordinate system such that the variance is ordered from the greatest to the least.
• Solving the problem = finding the new orthogonal coordinate system = finding the loadings.
• The PCs (vectors) and their corresponding variances (scalars) are found by eigenvalue decomposition of the covariance matrix C = XXᵀ of the variables xi.
  – The eigenvector corresponding to the largest eigenvalue is the 1st PC.
  – The 2nd eigenvector (the 2nd largest eigenvalue) is orthogonal to the 1st one, and so on.
• The eigenvalue decomposition is computed using standard algorithms: eigendecomposition of the covariance matrix (e.g. the QR algorithm), or SVD of the mean-centered data matrix.
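For the 2-D case in the figures, the whole pipeline (centering, covariance matrix, eigendecomposition) fits in a few lines. This pure-Python sketch uses the closed-form eigenvalues of a symmetric 2×2 matrix instead of a general algorithm, and the data set is invented for illustration:

```python
from math import sqrt, hypot

def pca_2d(points):
    """PCA of 2-D data: center, build the covariance matrix, eigendecompose it."""
    n = len(points)
    mx = sum(x for x, _ in points) / n
    my = sum(y for _, y in points) / n
    centered = [(x - mx, y - my) for x, y in points]     # centering step
    # covariance matrix [[a, b], [b, c]]
    a = sum(x * x for x, _ in centered) / n
    b = sum(x * y for x, y in centered) / n
    c = sum(y * y for _, y in centered) / n
    # closed-form eigenvalues of a symmetric 2x2 matrix
    mean, half_gap = (a + c) / 2, sqrt(((a - c) / 2) ** 2 + b ** 2)
    lam1, lam2 = mean + half_gap, mean - half_gap        # variances along the PCs
    # eigenvector (the loadings) for the largest eigenvalue
    vx, vy = (b, lam1 - a) if abs(b) > 1e-12 else (1.0, 0.0)
    norm = hypot(vx, vy)
    pc1 = (vx / norm, vy / norm)
    return lam1, lam2, pc1

# Data lying almost exactly on the line x2 = x1
data = [(1, 1.1), (2, 1.9), (3, 3.05), (4, 4.0), (5, 4.95)]
lam1, lam2, pc1 = pca_2d(data)
explained = lam1 / (lam1 + lam2)   # relative variance of the 1st PC
```

Because the points lie close to a line, almost all the variance ends up in the first component, and its loadings point along the (1, 1) direction, matching the picture on the earlier slides.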

Page 34:

Interpretation of PCA

• New variables (PCs) have a variance equal to their corresponding eigenvalue: Var(Yᵢ) = λᵢ for all i = 1, …, p.
• A small λᵢ means small variance: the data change little in the direction of component Yᵢ.
• The relative variance explained by each PC is given by λᵢ / Σⱼ λⱼ.

Page 35:

How many components?

• Keep enough PCs to have a cumulative variance explained by the PCs that is >50–70%.
• Kaiser criterion: keep PCs with eigenvalues >1.
• Scree plot: represents the ability of the PCs to explain the variation in the data.
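The first two rules can be applied to a concrete eigenvalue spectrum; the eigenvalues below are invented for illustration:

```python
def choose_components(eigenvalues, target=0.7):
    """Apply the cumulative-variance and Kaiser rules to a sorted eigenvalue list."""
    total = sum(eigenvalues)
    cumulative, running = [], 0.0
    for lam in eigenvalues:
        running += lam
        cumulative.append(running / total)   # λi / Σj λj, accumulated
    # smallest number of PCs whose cumulative explained variance exceeds the target
    by_variance = next(i + 1 for i, c in enumerate(cumulative) if c > target)
    # Kaiser criterion: keep PCs with eigenvalue > 1
    by_kaiser = sum(1 for lam in eigenvalues if lam > 1)
    return cumulative, by_variance, by_kaiser

cumulative, by_variance, by_kaiser = choose_components([4.0, 2.0, 1.0, 0.6, 0.4])
```

For this spectrum both rules agree on two components (the scree plot would show the same "elbow" after the second eigenvalue), but on real data the rules can disagree, which is why the slide lists all three.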

Page 36:
Page 37:
Page 38:

Difficulty in Searching Global Optima

[Figure: a cost landscape showing a starting point, a descent direction, local minima, the global minimum, and a barrier to local search]

Introduction to Simulated Annealing, Dr. Gildardo Sánchez ITESM Campus Guadalajara

Page 39:

Consequences of the Occasional Ascents

• Occasional ascents help escape local optima (the desired effect).
• But the search might pass the global optimum after reaching it (an adverse effect).


Page 40:

Simulated annealing

• Slowly cool down a heated solid, so that all particles arrange in the ground energy state.

• At each temperature wait until the solid reaches its thermal equilibrium.

• Probability of being in a state with energy E:

  P(E) = (1 / Z(T)) · exp(−E / (k_B · T))

  E … energy
  T … temperature
  k_B … Boltzmann constant
  Z(T) … normalization factor

Page 41:

Cooling simulation

• At a fixed temperature T:
  – Perturb (randomly) the current state to a new state.
  – ΔE is the difference in energy between the current and the new state.
  – If ΔE < 0 (the new state is lower), accept the new state as the current state.
  – If ΔE ≥ 0, accept the new state with probability

    P = exp(−ΔE / (k_B · T))

• Eventually the system evolves into thermal equilibrium at temperature T.
• When equilibrium is reached, the temperature T can be lowered and the process repeated.

(Metropolis algorithm; Metropolis, 1953)

Page 42:

Simulated annealing

• The same algorithm can be used for combinatorial optimization problems:
  – The energy E corresponds to the objective function C.
  – The temperature T is a parameter controlled within the algorithm.

Page 43:

Algorithm

initialize;
REPEAT
    REPEAT
        perturb(config. i → config. j, ΔC_ij);
        IF ΔC_ij < 0 THEN accept
        ELSE IF exp(−ΔC_ij / T) > random[0, 1) THEN accept;
        IF accept THEN update(config. j);
    UNTIL equilibrium is approached sufficiently closely;
    T := next_lower(T);
UNTIL system is frozen or stop criterion is reached
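The pseudocode above maps almost line-for-line onto Python. The objective (a 1-D function with many local minima), the step size, and the schedule constants below are illustrative choices, with k_B absorbed into T:

```python
import math
import random

def simulated_annealing(cost, x0, t_start=10.0, t_min=1e-3, steps_per_t=100,
                        cooling=0.8, rng=None):
    """SA for a 1-D continuous objective, following the slide's algorithm."""
    rng = rng or random.Random(42)
    x, t = x0, t_start
    best_x = x
    while t > t_min:                                   # outer REPEAT ... UNTIL frozen
        for _ in range(steps_per_t):                   # inner loop: L(k) steps at this T
            x_new = x + rng.uniform(-1.0, 1.0)         # perturb
            delta = cost(x_new) - cost(x)              # ΔC
            if delta < 0 or math.exp(-delta / t) > rng.random():
                x = x_new                              # accept
            if cost(x) < cost(best_x):
                best_x = x                             # remember the best seen so far
        t *= cooling                                   # next_lower(T): geometric schedule
    return best_x

# Objective with many local minima; the global minimum is at x = 0
f = lambda x: x * x + 3.0 * math.sin(5.0 * x) ** 2
best = simulated_annealing(f, x0=8.0)
```

Tracking `best_x` separately implements the advice on the next slides: the walk itself may wander away from the best solution it has visited.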

Page 44:

Parameters

• Choose the start value of T so that in the beginning nearly all perturbations are accepted (exploration), but not so big that run times become excessive.

• At each temperature, search is allowed to proceed for a certain number of steps, L(k).

• The function next_lower (T(k)) is generally a simple function to decrease T, e.g. a fixed part (80%) of current T.

Page 45:

• At the end T is so small that only a very small number of the perturbations is accepted (exploitation).

• The choice of parameters {T(k), L(k)} is called the cooling schedule.

• If possible, always remember explicitly the best solution found so far; the algorithm itself can wander away from its best solution and never find it again.

Page 46: