An Evolutionary Method for Training Autoencoders for Deep Learning Networks
Master's Thesis Defense: Sean Lander, Master's Candidate
Advisor: Yi Shang
University of Missouri, Department of Computer Science; Informatics Institute
Transcript
University of Missouri, Department of Computer Science
University of Missouri, Informatics Institute

An Evolutionary Method for Training Autoencoders for Deep Learning Networks
MASTER'S THESIS DEFENSE
SEAN LANDER
ADVISOR: YI SHANG
Sean Lander, Master's Candidate, University of Missouri, Department of Computer Science
Agenda
- Overview
- Background and Related Work
- Methods
- Performance and Testing
- Results
- Conclusion and Future Work
Overview: Deep Learning classification/reconstruction
- Since 2006, Deep Learning Networks (DLNs) have changed the landscape of classification problems
- Strong ability to create and utilize abstract features
- Easily lends itself to GPU and distributed systems
- Does not require labeled data - VERY IMPORTANT
- Can be used for feature reduction and classification
Overview: Problem and proposed solution
Problems with DLNs:
- Costly to train with large data sets or high feature spaces
- Local minima are systemic to Artificial Neural Networks
- Hyper-parameters must be hand selected

Proposed solutions:
- Evolutionary-based approach with a local search phase
  - Increased chance of reaching the global minimum
- Optimizes structure based on abstracted features
- Data partitions based on population size (large data only)
- Reduced training time
- Reduced chance of overfitting
Background: Perceptrons
- Started with the Perceptron in the 1950s
- Only capable of linear separability
- Failed on XOR
Background: Artificial Neural Networks (ANNs)
- ANNs went out of favor until the Multilayer Perceptron (MLP) was introduced
  - Pro: non-linear classification
  - Con: time consuming
- Advance in training: backpropagation
  - Increased training speeds
  - Limited to shallow networks
  - Error propagation diminishes as the number of layers increases
Background: Backpropagation using gradient descent
- Proposed in 1988, based on classification error
- Given m training samples {(x⁽¹⁾, y⁽¹⁾), ..., (x⁽ᵐ⁾, y⁽ᵐ⁾)}:
- For each sample (x⁽ⁱ⁾, y⁽ⁱ⁾) calculate its error:
  J(W, b; x⁽ⁱ⁾, y⁽ⁱ⁾) = (1/2) ‖h_{W,b}(x⁽ⁱ⁾) − y⁽ⁱ⁾‖²
- For all m training samples the total error can be calculated as:
  J(W, b) = (1/m) Σᵢ₌₁..ₘ J(W, b; x⁽ⁱ⁾, y⁽ⁱ⁾)
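The per-sample and total cost above can be sketched in plain Python; the function names here are illustrative, not from the thesis.

```python
# Squared-error cost for backpropagation, sketched over plain lists.

def sample_error(prediction, target):
    # J(W, b; x, y) = 1/2 * ||h(x) - y||^2 for one sample
    return 0.5 * sum((p - t) ** 2 for p, t in zip(prediction, target))

def total_error(predictions, targets):
    # J(W, b) = (1/m) * sum of the m per-sample errors
    m = len(predictions)
    return sum(sample_error(p, t) for p, t in zip(predictions, targets)) / m

preds   = [[0.9, 0.1], [0.2, 0.8]]
targets = [[1.0, 0.0], [0.0, 1.0]]
print(total_error(preds, targets))  # 0.025
```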
Background: Deep Learning Networks (DLNs)
- Allows for deep networks with multiple layers
- Layers pre-trained using unlabeled data
- Layers are "stacked" and fine-tuned
- Minimizes error degradation for deep neural networks (many layers)
- Still costly to train
- Manual selection of hyper-parameters
- Local, not global, minimum
Background: Autoencoders for reconstruction
- Autoencoders can be used for feature reduction and clustering
- "Classification error" is the ability to reconstruct the sample input
- Abstracted features (the output from the hidden layer) can be used to replace raw input for other techniques
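A minimal sketch of this reconstruction setup, assuming a single sigmoid hidden layer and made-up toy weights; the hidden output h is the abstracted feature vector the slide refers to.

```python
import math

# Toy autoencoder forward pass: encode input x to hidden features h,
# decode h back to a reconstruction x_hat, and score the reconstruction.

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def encode(x, W1, b1):
    # Hidden-layer output: the abstracted features
    return [sigmoid(sum(w * xi for w, xi in zip(row, x)) + b)
            for row, b in zip(W1, b1)]

def decode(h, W2, b2):
    return [sigmoid(sum(w * hi for w, hi in zip(row, h)) + b)
            for row, b in zip(W2, b2)]

def reconstruction_error(x, x_hat):
    # The "classification error" of an autoencoder
    return 0.5 * sum((a - b) ** 2 for a, b in zip(x, x_hat))

x  = [1.0, 0.0, 1.0]
W1 = [[0.5, -0.5, 0.5], [-0.5, 0.5, -0.5]]    # 3 inputs -> 2 hidden nodes
b1 = [0.0, 0.0]
W2 = [[1.0, -1.0], [-1.0, 1.0], [1.0, -1.0]]  # 2 hidden -> 3 outputs
b2 = [0.0, 0.0, 0.0]

h = encode(x, W1, b1)
print(reconstruction_error(x, decode(h, W2, b2)))
```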
Related Work: Evolutionary and genetic ANNs
- First use of Genetic Algorithms (GAs) in 1989
  - Two-layer ANN on a small data set
  - Tested multiple types of chromosomal encodings and mutation types
- Late 1990s and early 2000s introduced other techniques
  - Multi-level mutations and mutation priority
  - Addition of local search in each generation
  - Inclusion of hyper-parameters as part of the mutation
- Issue of competing conventions starts to appear
  - Two ANNs produce the same results by sharing the same nodes but in a permuted order
Related Work: Hyper-parameter selection for DLNs
- Majority of the work explored newer technologies and methods such as GPU and distributed (MapReduce) training
- Improved versions of backpropagation, such as Conjugate Gradient or Limited-memory BFGS, were tested under different conditions
- Most conclusions pointed toward manual parameter selection via trial and error
Method 1: Evolutionary Autoencoder (EvoAE)
- IDEA: an autoencoder's power is in its feature abstraction, the hidden-node output
- Training many AEs will produce more potential abstracted features
- The best AEs will contain the best features
- Joining these features should create a better AE
Method 1: Evolutionary Autoencoder (EvoAE)
[Figure: the EvoAE cycle on two autoencoders with hidden nodes A1-A4 and B1-B3, inputs x, hidden outputs h, and reconstructions x': initialization, local search, crossover, and mutation]
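The crossover-and-mutation cycle can be illustrated roughly as follows, treating each autoencoder as a list of hidden-node weight vectors. The operators shown are simplified stand-ins, not the thesis's actual encoding.

```python
import random

# Toy EvoAE-style operators: crossover mixes hidden nodes from two
# parent autoencoders; mutation perturbs individual weights.

def crossover(parent_a, parent_b, rng):
    # Child takes each hidden node from one parent or the other
    return [list(rng.choice(pair)) for pair in zip(parent_a, parent_b)]

def mutate(nodes, rate, rng):
    # With probability `rate`, nudge a weight with Gaussian noise
    return [[w + rng.gauss(0, 0.1) if rng.random() < rate else w
             for w in node] for node in nodes]

rng = random.Random(0)
a = [[0.1, 0.2], [0.3, 0.4]]  # autoencoder A's hidden nodes
b = [[0.9, 0.8], [0.7, 0.6]]  # autoencoder B's hidden nodes
child = mutate(crossover(a, b, rng), rate=0.1, rng=rng)
print(child)  # two hidden nodes drawn from a and b, lightly mutated
```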
Method 1A: Distributed learning and mini-batches
- Training time of the generic EvoAE increases linearly with the size of the population
- ANN training time increases drastically with data size
- To combat this, mini-batches can be used, where each AE is trained against a batch and updated
- Batch size << total data
Method 1A: Distributed learning and mini-batches
- EvoAE lends itself to distributed systems
- Data storage now an issue due to data duplication
[Pipeline, repeated for Batch 1 through Batch N:
 Train (forward propagation, backpropagation) -> Rank (calculate error, sort) -> GA (crossover, mutate)]
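The Train -> Rank -> GA loop over mini-batches might be sketched like this; the helper names and the batch-slicing scheme are assumptions, and real training would run forward propagation and backpropagation inside the Train step.

```python
# Per-generation loop skeleton: pick a mini-batch, (train against it),
# then rank the population by error before the GA step.

def next_batch(data, batch_size, generation):
    # Mini-batch: each generation sees a slice where batch size << total data
    start = (generation * batch_size) % len(data)
    return data[start:start + batch_size]

def rank(population, errors):
    # Rank step: sort individuals by reconstruction error, best first
    return [ind for _, ind in sorted(zip(errors, population))]

data = list(range(10))
print(next_batch(data, 3, 0))              # [0, 1, 2]
print(next_batch(data, 3, 1))              # [3, 4, 5]
print(rank(['ae_a', 'ae_b'], [0.5, 0.1]))  # ['ae_b', 'ae_a']
```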
Method 2: EvoAE Evo-batches
- IDEA: when data is large, small batches can be representative
- Prevents overfitting, as nodes being trained are almost always introduced to new data
- Scales well with large amounts of data even when parallel training is not possible
- Works well on limited-memory systems: increasing the size of the population reduces the data per batch
- Quick training of large populations, equivalent to training a single autoencoder using traditional methods
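A rough sketch of the Evo-batch idea, assuming the data is split evenly across the population and slices rotate each generation; the exact partitioning and rotation scheme here is an assumption, not taken from the thesis.

```python
# Evo-batch sketch: each autoencoder in the population trains on its
# own slice of the data, and slices rotate so nodes keep seeing new data.

def evo_batches(data, pop_size):
    # Partition the data across the population; bigger population
    # means less data per batch (good for limited-memory systems)
    size = len(data) // pop_size
    return [data[i * size:(i + 1) * size] for i in range(pop_size)]

def rotate(batches, generation):
    # Each generation, individual i sees batch (i + generation) mod N
    shift = generation % len(batches)
    return batches[shift:] + batches[:shift]

data = list(range(8))
batches = evo_batches(data, 4)
print(batches)            # [[0, 1], [2, 3], [4, 5], [6, 7]]
print(rotate(batches, 1)) # [[2, 3], [4, 5], [6, 7], [0, 1]]
```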
Method 2: EvoAE Evo-batches
[Figure: the original data is split into batches Data A through Data D, which rotate across the population through the local search, crossover, and mutate phases]
Performance and Testing: Hardware and testing parameters
- Lenovo Y500 laptop
- Intel i7 3rd generation, 2.4 GHz
- 12 GB RAM
- Full-data training was too slow to run on the dataset
- EvoAE with population 30 trains as quickly as a single baseline AE when using Evo-batch

Parameter       MNIST
Hidden Size     200
Hidden Std Dev  80
Hidden +/-      NULL
Mutation Rate   0.1
Conclusions: Good for large problems
- Traditional methods are still the preferred choice for small problems and toy problems
- EvoAE with Evo-batch produces effective and efficient feature reduction given a large volume of data
- EvoAE is robust against poorly-chosen hyper-parameters, specifically learning rate
Future Work
- Immediate goals:
  - Transition to a distributed system, MapReduce-based or otherwise
  - Harness GPU technology for increased speeds (~50% in some cases)
- Long-term goals:
  - Open the system for use by novices and non-programmers
  - Make the system easy to use and transparent to the user for both modification and training purposes
Thank you
Background: Backpropagation with weight decay
- Cost is prone to overfitting, so a weight decay variable λ is added:
  J(W, b) = (1/m) Σᵢ₌₁..ₘ J(W, b; x⁽ⁱ⁾, y⁽ⁱ⁾) + (λ/2) ‖W‖²
- We use this new cost to update weights and biases given some learning rate α:
  W := W − α ∂J(W, b)/∂W
  b := b − α ∂J(W, b)/∂b
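A scalar-weight sketch of the decayed update, assuming the λ term is added directly to the gradient; the function name and constants are illustrative.

```python
# Gradient descent step with weight decay: the lambda * w term
# penalizes large weights, shrinking them on every update.

def decayed_update(w, grad, alpha, lam):
    # W := W - alpha * (dJ/dW + lambda * W)
    return w - alpha * (grad + lam * w)

w = 2.0
w = decayed_update(w, grad=0.0, alpha=0.1, lam=0.5)
print(w)  # 1.9: even with a zero data gradient, decay shrinks the weight
```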
Background: Conjugate Gradient Descent
- Gradient descent can become stuck in a loop, however, so we add a momentum term β:
  ΔW(t) = −α ∂J(W, b)/∂W + β ΔW(t−1)
- This adds memory to the equation, as we use previous updates
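The momentum update above can be sketched as follows, with illustrative constants; β blends in the previous step, which damps the back-and-forth oscillation described on the slide.

```python
# Momentum step: the new update remembers a fraction (beta) of the
# previous update, smoothing an oscillating gradient.

def momentum_step(grad, prev_step, alpha, beta):
    # delta(t) = -alpha * grad + beta * delta(t-1)
    return -alpha * grad + beta * prev_step

step, w = 0.0, 1.0
for grad in [1.0, -1.0, 1.0, -1.0]:   # oscillating gradient
    step = momentum_step(grad, step, alpha=0.1, beta=0.9)
    w += step
print(w)  # the memory term keeps the steps from simply flip-flopping
```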
Background: Architecture and hyper-parameters
- Architecture and hyper-parameter selection is usually done through trial and error
- Manually optimized and updated by hand
- Dynamic learning rates can be implemented to correct for sub-optimal learning rate selection
Results: Small datasets - UCI Iris
- The UCI Iris dataset has 150 samples with 4 features and 3 classes
- Best error-to-speed: Baseline 1
- Best overall error: Full data None

Parameter       Iris
Hidden Size     32
Hidden Std Dev  NULL
Hidden +/-      16
Mutation Rate   0.1
Results: Small datasets - UCI Heart Disease
- The UCI Heart Disease dataset has 297 samples with 13 features and 5 classes
- Best error-to-time: Baseline 1