Source: people.ee.duke.edu/~lcarin/Piyush8.1.2014.pdf

Page 1: Deep Learning and Representation Learning

Discussion by: Piyush Rai

(Some figures from Ruslan Salakhutdinov)

August 01, 2014

Page 2: Deep Feature Learning

Page 3: Some Deep Architectures

Deep architectures based on undirected graphical models (e.g., Deep Belief Nets):

Usually use undirected models such as the Restricted Boltzmann Machine (RBM) as building blocks

Inference for the hidden variables is easy: P(h|x) = ∏_i P(h_i|x)

It’s possible to train the model in a layer-wise fashion. Training is not so easy, but is possible via approximations such as Contrastive Divergence
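To make this concrete, here is a minimal sketch of one Contrastive Divergence (CD-1) update for a binary RBM; the shapes, learning rate, and variable names are illustrative choices, not from the slides.

import numpy as np

rng = np.random.default_rng(0)

def sigmoid(a):
    return 1.0 / (1.0 + np.exp(-a))

def cd1_update(x, W, b, c, lr=0.1):
    """One CD-1 step for a binary RBM on a single visible vector x."""
    # Positive phase: P(h|x) factorizes, so each h_i is an independent Bernoulli
    ph_pos = sigmoid(c + W @ x)
    h = (rng.random(ph_pos.shape) < ph_pos).astype(float)
    # Negative phase: one Gibbs step down to the visibles and back up
    pv = sigmoid(b + W.T @ h)
    v_neg = (rng.random(pv.shape) < pv).astype(float)
    ph_neg = sigmoid(c + W @ v_neg)
    # Approximate log-likelihood gradient: data statistics minus model statistics
    W += lr * (np.outer(ph_pos, x) - np.outer(ph_neg, v_neg))
    b += lr * (x - v_neg)
    c += lr * (ph_pos - ph_neg)

# Toy usage: D = 6 visible units, F = 3 hidden units
D, F = 6, 3
W = rng.normal(0, 0.1, (F, D))
b, c = np.zeros(D), np.zeros(F)
x = rng.integers(0, 2, D).astype(float)
cd1_update(x, W, b, c)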

Page 4: Today

Restricted Boltzmann Machine and its variants

Autoencoder and its variants

Building invariances: Convolutional Neural Networks

Deep architectures for supervised learning

Global training of deep architectures

Page 5: A typical RBM

Binary visible units v ∈ {0, 1}^D, binary hidden units h ∈ {0, 1}^F
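For reference (the slide's figure and equations are not reproduced in this transcript), the standard binary RBM with weights W ∈ R^{D×F} and biases b, c is defined by the energy function

E(v, h) = -b^\top v - c^\top h - v^\top W h, \qquad P(v, h) = \frac{1}{Z} \exp(-E(v, h))

and both conditionals factorize:

P(h_j = 1 \mid v) = \sigma\Big(c_j + \sum_i W_{ij} v_i\Big), \qquad P(v_i = 1 \mid h) = \sigma\Big(b_i + \sum_j W_{ij} h_j\Big)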

Page 6: RBM for real-valued data

Real-valued visible units v, binary-valued hidden units h
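The slide's figure is not reproduced here; the standard extension to real-valued inputs is the Gaussian-Bernoulli RBM, whose energy (assuming unit-variance visible units) is

E(v, h) = \sum_i \frac{(v_i - b_i)^2}{2} - \sum_j c_j h_j - \sum_{i,j} v_i W_{ij} h_j

so P(h_j = 1 | v) remains a logistic function of v, while P(v | h) = \mathcal{N}(b + W h, I) becomes Gaussian.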

Page 7: RBM for word counts

Count-valued visible units v, binary-valued hidden units h
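The slide's figure is not shown here; a standard RBM variant for count data is the Replicated Softmax model (Salakhutdinov and Hinton, 2009), in which each word is a softmax visible unit and all words share weights. Assuming that model, with v_k the count of word k and N the document length:

P(h_j = 1 \mid v) = \sigma\Big(N c_j + \sum_k v_k W_{kj}\Big), \qquad P(\text{word} = k \mid h) = \frac{\exp\big(b_k + \sum_j W_{kj} h_j\big)}{\sum_{k'} \exp\big(b_{k'} + \sum_j W_{k'j} h_j\big)}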

Page 8: Conditional RBM

Traditional RBM Pθ(x, h) has observed variables x, hidden variables h, and parameters θ = {W, b, c}

Often, we may have some context variables (or covariates) z

Conditional RBM Pθ(x, h | z) assumes that the parameters θ = f(z, ω), where ω are the actual “free” parameters

Example: hidden unit bias c = β + Mz; the weights W could also depend on the context variables/covariates in some applications
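A minimal sketch of that example (β and M are the slide's symbols; shapes and values here are illustrative):

import numpy as np

rng = np.random.default_rng(1)
F, C = 4, 2                     # number of hidden units, context dimension
beta = np.zeros(F)              # "free" bias parameter
M = rng.normal(0, 0.1, (F, C))  # maps the context to a bias offset
z = rng.normal(size=C)          # context variables / covariates

c = beta + M @ z                # hidden-unit bias now depends on the context z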

Page 9: Autoencoder

Provides a direct parametric mapping from inputs to feature representation

Often used as a building block in deep architectures (just like RBMs)

Basic principle: learns an encoding of the inputs so that the original input can be recovered well from the encoding

Page 10: Autoencoder

Real-valued inputs, binary-valued encodings

Sigmoid encoder (parameter matrix W), linear decoder (parameter matrix D), learned via:

argmin_{D,W} E(D, W) = Σ_{n=1}^{N} ‖D z_n − x_n‖² = Σ_{n=1}^{N} ‖D σ(W x_n) − x_n‖², with z_n = σ(W x_n)

If the encoder is also linear, the autoencoder is equivalent to PCA (with squared error, it learns the same subspace as the top principal components)
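A minimal numpy sketch of this objective, trained with stochastic gradient descent (illustrative code, not from the slides):

import numpy as np

def sigmoid(a):
    return 1.0 / (1.0 + np.exp(-a))

def train_autoencoder(X, F, lr=0.01, epochs=100, seed=0):
    """Sigmoid encoder W, linear decoder D; minimizes sum_n ||D sigma(W x_n) - x_n||^2."""
    rng = np.random.default_rng(seed)
    N, dim = X.shape
    W = rng.normal(0, 0.1, (F, dim))   # encoder weights
    D = rng.normal(0, 0.1, (dim, F))   # decoder weights
    for _ in range(epochs):
        for x in X:
            z = sigmoid(W @ x)                             # encoding z_n
            r = D @ z - x                                  # reconstruction residual
            gD = 2 * np.outer(r, z)                        # gradient w.r.t. D
            gW = np.outer(2 * (D.T @ r) * z * (1 - z), x)  # backprop through the sigmoid
            D -= lr * gD
            W -= lr * gW
    return W, D

# Toy usage: encode 5-D data with 3 hidden units
X = np.random.default_rng(1).normal(size=(100, 5))
W, D = train_autoencoder(X, F=3)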

Page 11: Autoencoder

Binary-valued inputs, binary-valued encodings

Similar to an RBM

Need constraints to avoid an identity mapping (e.g., by imposing sparsity on the encodings or by “corrupting” the inputs)

Page 12: Sparse Autoencoders

Sparse binary encodings. Can impose an L1 penalty on the codes

Predictive Sparse Decomposition (learns an explicit mapping from the input to the encoding)

argmin_{D,W,z} Σ_{n=1}^{N} ‖D z_n − x_n‖² + λ‖z_n‖₁ + ‖σ(W x_n) − z_n‖²
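A direct transcription of this objective in numpy (in practice one alternates between minimizing over the codes z_n and over D, W; this sketch only evaluates the objective):

import numpy as np

def psd_objective(D, W, Z, X, lam):
    """One column per example: z_n = Z[:, n], x_n = X[:, n]."""
    enc = 1.0 / (1.0 + np.exp(-(W @ X)))   # sigma(W x_n) for all n at once
    recon = np.sum((D @ Z - X) ** 2)       # reconstruction term
    sparse = lam * np.sum(np.abs(Z))       # L1 penalty on the codes
    pred = np.sum((enc - Z) ** 2)          # encoder-prediction term
    return recon + sparse + pred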

Page 13: Denoising Autoencoders

Idea: introduce stochastic corruption to the input; e.g.:

Hide some features
Add Gaussian noise
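Both corruptions in a couple of numpy lines (the rates here are arbitrary choices); the autoencoder is then trained to reconstruct the clean x from the corrupted input:

import numpy as np

rng = np.random.default_rng(0)
x = rng.normal(size=20)

x_masked = x * (rng.random(x.shape) >= 0.3)   # hide ~30% of the features
x_noisy = x + rng.normal(0.0, 0.1, x.shape)   # add Gaussian noise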

Page 14: Stacked Autoencoders

Can be learned in a greedy layer-wise fashion
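As a sketch (reusing sigmoid and train_autoencoder from the autoencoder sketch above): each layer is trained to reconstruct the codes produced by the layer below, and its codes then become the next layer's input.

def greedy_stack(X, layer_sizes):
    """Greedy layer-wise training; returns each layer's encoder weights."""
    encoders, H = [], X
    for F in layer_sizes:
        W, _ = train_autoencoder(H, F)   # train this layer on the current codes
        encoders.append(W)
        H = sigmoid(H @ W.T)             # codes become the next layer's input
    return encoders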

Page 15: Building Invariances: Convolutional Neural Network

Exploits topological structure in the data via three key ideas: local receptive fields, shared weights, and spatial or temporal sub-sampling

Ensures some degree of shift/scale/distortion invariance in the learned representation
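All three ideas fit in a tiny numpy sketch (shapes and sizes are illustrative): one shared 3x3 filter slides over the image (local receptive fields with tied weights), and 2x2 max-pooling sub-samples the result.

import numpy as np

rng = np.random.default_rng(0)
img = rng.normal(size=(8, 8))   # toy single-channel "image"
k = rng.normal(size=(3, 3))     # one 3x3 filter, shared across all locations

# Valid convolution: each output unit sees only a local 3x3 patch
conv = np.array([[np.sum(img[i:i+3, j:j+3] * k)
                  for j in range(6)] for i in range(6)])

# 2x2 max-pooling: spatial sub-sampling, giving some shift invariance
pool = conv.reshape(3, 2, 3, 2).max(axis=(1, 3))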

Page 16: Building Invariances: Other ways

Generating transformed examples

.. by introducing random deformations that don’t change the target label

Temporal coherence and slow feature analysis

Page 17: Supervised Learning with Deep Architectures

Consider a Deep Belief Net trained in a supervised fashion

Given labels y, train on the joint log-likelihood of inputs and their labels, log P(x, y)

Usually a two-step procedure is used

1. Unsupervised pre-training of the DBN without labels

2. Fine-tuning the parameters by maximizing the conditional log-likelihood log P(y | x)
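In symbols, the two steps optimize different objectives over the same parameters θ:

\text{(1) pre-train: } \max_\theta \sum_n \log P_\theta(x_n), \qquad \text{(2) fine-tune: } \max_\theta \sum_n \log P_\theta(y_n \mid x_n)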

Page 18: Global Training of Deep Architectures?

Early successes were mainly attributed to layer-wise pre-training

Some recent successes with global training of deep architectures

Lots of labeled data, lots of tasks, artificially transformed examples

Proper initialization, efficient training (e.g., using GPUs), adaptive step-sizes

Choice of nonlinearities

Unsupervised layer-wise pre-training seems to act like a regularizer

.. less of a necessity when labeled training data is abundant

Page 19: Other extensions of deep architectures

Hierarchical Deep Models: Deep + (NP)Bayes

Putting an HDP over the states of the top layer of a deep model

Allows sharing of statistical strength across categories/classes and/or helps generalize to novel/unseen categories by transfer learning

Deep models for multimodal data (text and images)

Thanks! Questions?
