Deep Learning
All-purpose machine learning
Using Neural Networks:
- Using large amounts of data
- Learning very complex problems
- Automatically learning features
A new era of machine learning
Deep learning wins all competitions
- IJCNN 2011 Traffic Sign Recognition Competition
- ISBI 2012 Segmentation of neuronal structures in EM stacks challenge
- ICDAR 2011 Chinese handwriting recognition
Applications
A lot of state-of-the-art systems use deep learning to some extent:
- IBM’s Watson: Jeopardy contest 2011
- Google’s self-driving car
- Google Glass
- Facebook face recognition
- Facebook user modelling
Mostly image and sound recognition tasks (difficult)
Google Brain (2011)
- 10 million YouTube/ImageNet images
- 1 billion parameters
- 16,000 processors
- Largely unsupervised!
- 20,000 categories
- 15.8% accuracy
Bigger, better
Deep Learning:
- The scope of what computers can learn has greatly increased
- Interaction with the real world
Linking neurons and training
- Initialize the weights randomly.
- Feed the network data sequentially.
- Measure the difference between the network output and the actual (target) output.
- Update the weights according to this error.
- End result: give the model an input, and it produces a proper output.
Quest for the weights. The weights are the model!
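The training steps above can be sketched for the smallest possible "network": a single linear neuron. This is only a minimal illustration, not the method of any particular system. The toy data (`SAMPLES`), the function name `train_neuron`, the learning rate and the number of epochs are all assumptions chosen for the example.

```python
import random


def train_neuron(samples, epochs=200, learning_rate=0.1, seed=0):
    """Train one linear neuron, output = w * x + b, with the delta rule."""
    rng = random.Random(seed)
    # Step 1: initialize the weights randomly.
    w = rng.uniform(-1.0, 1.0)
    b = rng.uniform(-1.0, 1.0)
    for _ in range(epochs):
        # Step 2: give the network the data one sample at a time.
        for x, target in samples:
            output = w * x + b
            # Step 3: difference between network output and target output.
            error = output - target
            # Step 4: update the weights according to this error.
            w -= learning_rate * error * x
            b -= learning_rate * error
    return w, b


# Hypothetical toy data generated from target = 2 * x + 1.
SAMPLES = [(i / 10.0, 2.0 * (i / 10.0) + 1.0) for i in range(-10, 11)]

if __name__ == "__main__":
    w, b = train_neuron(SAMPLES)
    # Step 5: the trained weights are the model; use them on a new input.
    print(f"w={w:.3f}, b={b:.3f}, prediction for x=0.5: {w * 0.5 + b:.3f}")
```

The function returns only `w` and `b`, which is the point of the slide: once training is done, the weights are the model.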
The Perceptron (1958)
“A machine which senses, recognizes, remembers, and responds like the human mind”
“Remarkable machine… [was] capable of what amounts to thought” - The New Yorker
Criticism and downfall (1969)
- Perceptrons are painfully limited. They cannot even learn a simple XOR function!
- No feasible way of learning networks with multiple layers
- Interest in neural networks almost completely disappeared
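The XOR limitation is easy to reproduce. The sketch below trains a single threshold unit with the classic perceptron learning rule, once on AND (linearly separable) and once on XOR (not linearly separable). The function names, the zero initialization and the epoch count are assumptions made for this illustration.

```python
def train_perceptron(samples, epochs=100):
    """Perceptron learning rule for a single threshold unit with two inputs."""
    w1 = w2 = bias = 0
    for _ in range(epochs):
        for (x1, x2), target in samples:
            output = 1 if w1 * x1 + w2 * x2 + bias > 0 else 0
            error = target - output  # -1, 0 or +1
            # Move the decision line towards the misclassified sample.
            w1 += error * x1
            w2 += error * x2
            bias += error
    return w1, w2, bias


def accuracy(weights, samples):
    """Fraction of samples the threshold unit classifies correctly."""
    w1, w2, bias = weights
    hits = sum(
        (1 if w1 * x1 + w2 * x2 + bias > 0 else 0) == target
        for (x1, x2), target in samples
    )
    return hits / len(samples)


AND = [((0, 0), 0), ((0, 1), 0), ((1, 0), 0), ((1, 1), 1)]
XOR = [((0, 0), 0), ((0, 1), 1), ((1, 0), 1), ((1, 1), 0)]

if __name__ == "__main__":
    print("AND accuracy:", accuracy(train_perceptron(AND), AND))
    print("XOR accuracy:", accuracy(train_perceptron(XOR), XOR))
```

On AND the rule finds a separating line and classifies all four cases. On XOR no single line separates the two classes, so at least one case is always wrong, however long the unit is trained.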
Renewed interest (90’s)
- Learning multiple layers
- “Back propagation”
- Can theoretically approximate any function!
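Back propagation applies the chain rule layer by layer, starting at the output error and working back towards the inputs. The sketch below does this for a hypothetical network with 2 inputs, 2 sigmoid hidden units and 1 sigmoid output, using a squared-error loss. It then compares the result with a finite-difference estimate as a sanity check. The network size, the parameter values in `PARAMS` and all function names are assumptions for illustration.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def forward(params, x):
    """2 inputs -> 2 sigmoid hidden units -> 1 sigmoid output."""
    w_hidden, b_hidden, w_out, b_out = params
    hidden = [sigmoid(sum(w * xi for w, xi in zip(row, x)) + b)
              for row, b in zip(w_hidden, b_hidden)]
    output = sigmoid(sum(w * h for w, h in zip(w_out, hidden)) + b_out)
    return hidden, output


def loss(params, x, target):
    _, output = forward(params, x)
    return 0.5 * (output - target) ** 2


def backprop(params, x, target):
    """Gradients of the loss, computed from the output layer backwards."""
    w_hidden, b_hidden, w_out, b_out = params
    hidden, output = forward(params, x)
    # Error signal at the output unit (chain rule through its sigmoid).
    delta_out = (output - target) * output * (1.0 - output)
    grad_w_out = [delta_out * h for h in hidden]
    # Propagate the error signal back to each hidden unit.
    delta_hidden = [delta_out * w * h * (1.0 - h)
                    for w, h in zip(w_out, hidden)]
    grad_w_hidden = [[d * xi for xi in x] for d in delta_hidden]
    return grad_w_hidden, delta_hidden, grad_w_out, delta_out


def numerical_grad_w_hidden(params, x, target, i, j, eps=1e-6):
    """Central finite-difference estimate of d loss / d w_hidden[i][j]."""
    w_hidden, b_hidden, w_out, b_out = params

    def loss_with(value):
        shifted = [list(row) for row in w_hidden]
        shifted[i][j] = value
        return loss((shifted, b_hidden, w_out, b_out), x, target)

    base = w_hidden[i][j]
    return (loss_with(base + eps) - loss_with(base - eps)) / (2 * eps)


# Hypothetical fixed parameters: (w_hidden, b_hidden, w_out, b_out).
PARAMS = ([[0.5, -0.3], [0.8, 0.2]], [0.1, -0.1], [0.7, -0.4], 0.05)
X, TARGET = [1.0, 0.5], 1.0

if __name__ == "__main__":
    analytic = backprop(PARAMS, X, TARGET)[0][0][1]
    numeric = numerical_grad_w_hidden(PARAMS, X, TARGET, 0, 1)
    print("analytic:", analytic, "numeric:", numeric)
```

A full training loop would subtract a small multiple of these gradients from the weights after each sample.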
But…
Very slow and inefficient
- Machine learning attention shifted towards SVMs, random forests, etc.
Deep learning (2006)
- Quest: Mimic human brain representations
- Large networks
- Lots of data
Problem: Simple back propagation fails on large networks.
Deep learning (2006)
- Exactly the same networks as before, just BIGGER
- Combination of three factors:
  - (Big data)
  - Better algorithms
  - Parallel computing (GPU)
Better algorithms
Restricted Boltzmann machine
Pre-training: Learn the representation by parts!
Very strong unsupervised learning
After pre-training, use back propagation
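A rough sketch of what unsupervised pre-training with a restricted Boltzmann machine can look like. It assumes binary units and one-step contrastive divergence (CD-1), a common training rule for RBMs; the slides do not say which rule was used. The class name `TinyRBM`, the toy `PATTERNS` and the hyperparameters are hypothetical. The later fine-tuning with back propagation is not shown.

```python
import math
import random


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


class TinyRBM:
    """Binary RBM trained with one-step contrastive divergence (CD-1)."""

    def __init__(self, n_visible, n_hidden, seed=0):
        self.rng = random.Random(seed)
        # w[i][j] connects visible unit i to hidden unit j.
        self.w = [[self.rng.gauss(0.0, 0.1) for _ in range(n_hidden)]
                  for _ in range(n_visible)]
        self.b_visible = [0.0] * n_visible
        self.b_hidden = [0.0] * n_hidden

    def hidden_probs(self, visible):
        return [sigmoid(self.b_hidden[j]
                        + sum(v * row[j] for v, row in zip(visible, self.w)))
                for j in range(len(self.b_hidden))]

    def visible_probs(self, hidden):
        return [sigmoid(self.b_visible[i]
                        + sum(h * w for h, w in zip(hidden, self.w[i])))
                for i in range(len(self.b_visible))]

    def cd1_update(self, v0, learning_rate=0.1):
        # Positive phase: hidden activations driven by the data.
        h0 = self.hidden_probs(v0)
        h0_sample = [1.0 if self.rng.random() < p else 0.0 for p in h0]
        # Negative phase: one reconstruction step.
        v1 = self.visible_probs(h0_sample)
        h1 = self.hidden_probs(v1)
        for i in range(len(v0)):
            for j in range(len(h0)):
                self.w[i][j] += learning_rate * (v0[i] * h0[j] - v1[i] * h1[j])
            self.b_visible[i] += learning_rate * (v0[i] - v1[i])
        for j in range(len(h0)):
            self.b_hidden[j] += learning_rate * (h0[j] - h1[j])

    def reconstruct(self, visible):
        return self.visible_probs(self.hidden_probs(visible))


# Hypothetical toy data: two binary patterns, no labels needed.
PATTERNS = [[1, 1, 1, 0, 0, 0], [0, 0, 0, 1, 1, 1]]


def pretrain(patterns, n_hidden=2, epochs=100, seed=0):
    rbm = TinyRBM(len(patterns[0]), n_hidden, seed)
    for _ in range(epochs):
        for pattern in patterns:
            rbm.cd1_update(pattern)
    return rbm


if __name__ == "__main__":
    rbm = pretrain(PATTERNS)
    print([round(p, 2) for p in rbm.reconstruct(PATTERNS[0])])
```

In a deep network, the weights learned this way would serve as the starting point for one layer, and back propagation would then adjust all layers together.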
Parallel (GPU) power
- Every set of weights can be stored as a matrix (w_ij)
- GPUs are made to do common parallel problems fast!
- All similar calculations are done at the same time, giving a huge performance boost.
- CPU parallelization
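To make the matrix view concrete, the sketch below computes one layer as a matrix-vector product. It assumes row i of the matrix holds the incoming weights of output neuron i, which is only one possible reading of w_ij. Each row is an independent calculation, which is what makes the work easy to spread over many cores. The thread pool only demonstrates that independence; Python threads will not reproduce a GPU speed-up. All names and values are hypothetical.

```python
from concurrent.futures import ThreadPoolExecutor


def neuron_output(weight_row, inputs):
    """Weighted sum for one output neuron: sum over j of w_ij * x_j."""
    return sum(w * x for w, x in zip(weight_row, inputs))


def layer_sequential(weights, inputs):
    """Compute the output neurons one after another."""
    return [neuron_output(row, inputs) for row in weights]


def layer_parallel(weights, inputs, workers=4):
    """Same result, with each row handed out as an independent task."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(lambda row: neuron_output(row, inputs), weights))


# Hypothetical layer: 3 inputs, 2 output neurons.
WEIGHTS = [[1.0, 2.0, 3.0],
           [0.0, -1.0, 0.5]]
INPUTS = [1.0, 1.0, 2.0]

if __name__ == "__main__":
    print(layer_sequential(WEIGHTS, INPUTS))
    print(layer_parallel(WEIGHTS, INPUTS))
```

Both functions return the same list, because no row depends on the result of another row.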
Future of Deep Learning
- Currently an explosion of developments
- Hessian-free optimization (2010)
- Long Short-Term Memory (2011)
- Large convolutional nets, max-pooling (2011)
- Nesterov’s Gradient Descent (2013)
- Currently state of the art, but...
- No way of doing logical inference (extrapolation)
- No easy integration of abstract knowledge
- Hypothesis space bias might not conform with reality
When to apply Deep Learning
- Generally, vision and sound recognition, but...
- Works great for any other problem too!
  - A lot of data / features
  - Don’t want to make your own features
  - State of the art results
How to apply Deep Learning
Deep learning is very difficult!
- No easy plug-and-play software
- Far too many different networks/options/additions
- Mathematics and programming are very challenging
- Research is fast-paced
- Training a network is both an art and a science
My advice:
Cooperation university <=> business