Deep Learning for Vision Presented by Kevin Matzen Wednesday, April 9, 14

Deep Learning for Vision - Cornell University · Deep Learning for Vision ... CVPR 2014 DeepFace: Closing the Gap to Human-Level ... DeepPose: Human Pose Estimation via Deep Neural


Deep Learning for Vision

Presented by Kevin Matzen

Wednesday, April 9, 14


Quick Intro - DNN

• Feed-forward

• Sparse connectivity (layer to layer)

• Different layer types

• Recently popularized for vision [Krizhevsky et al., NIPS 2012]


The Layers

• Convolution

• Fully connected

• Pooling

• Neuron activation function

• Normalization

• Loss functions

• Image processing
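As a rough illustration of how several of these layer types compose, here is a minimal NumPy sketch of a single-channel convolution, a ReLU activation, max pooling, and a fully connected layer. The function names, the 8x8 input, and the 3x3 kernel are assumptions made for the example; real frameworks use batched, multi-channel, GPU-backed versions of the same ideas.

```python
import numpy as np

def conv2d_valid(image, kernel):
    """Single-channel 'valid' cross-correlation, the core of a convolution layer."""
    kh, kw = kernel.shape
    out_h = image.shape[0] - kh + 1
    out_w = image.shape[1] - kw + 1
    out = np.empty((out_h, out_w))
    for r in range(out_h):
        for c in range(out_w):
            # The same kernel is applied at every location.
            out[r, c] = np.sum(image[r:r + kh, c:c + kw] * kernel)
    return out

def relu(x):
    """Neuron activation function: max(0, x) elementwise."""
    return np.maximum(0.0, x)

def max_pool(x, size=2):
    """Non-overlapping max pooling; rows/columns that do not fill a window are dropped."""
    h = (x.shape[0] // size) * size
    w = (x.shape[1] // size) * size
    blocks = x[:h, :w].reshape(h // size, size, w // size, size)
    return blocks.max(axis=(1, 3))

def fully_connected(x, weights, bias):
    """Every output unit sees every input value."""
    return weights @ x.ravel() + bias

# Hypothetical sizes chosen only for illustration.
rng = np.random.default_rng(0)
image = rng.standard_normal((8, 8))
kernel = rng.standard_normal((3, 3))
features = max_pool(relu(conv2d_valid(image, kernel)))  # 8x8 -> 6x6 -> 3x3
scores = fully_connected(features, rng.standard_normal((4, 9)), np.zeros(4))
```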


deeplearning.net/tutorial/lenet.html


[Krizhevsky, NIPS 2012]


Software

• code.google.com/p/cuda-convnet/ [NVIDIA GPU]

• github.com/UCB-ICSI-Vision-Group/decaf-release/ [deprecated; CPU-only]

• caffe.berkeleyvision.org [CPU; NVIDIA GPU]

• research.google.com/archive/large_deep_networks_nips2012.html [proprietary; distributed system]


DeepPose: Human Pose Estimation via Deep Neural Networks
Alexander Toshev, Christian Szegedy - CVPR 2014

DeepFace: Closing the Gap to Human-Level Performance in Face Verification
Yaniv Taigman, Ming Yang, Marc’Aurelio Ranzato, Lior Wolf - CVPR 2014


DeepPose: Human Pose Estimation via Deep Neural Networks
Alexander Toshev, Christian Szegedy - CVPR 2014


Input: Uncropped photo

Output: Joint locations


Pipeline

1. Person detection

2. Joint position regression

3. Joint refinement


Datasets

• Leeds Sports Pose (LSP) [Johnson et al., BMVC 2010]: 14 joint locations; 2000 images; main person roughly 150 px

• Frames Labeled in Cinema (FLIC) [Sapp et al., CVPR 2013]: 5003 images; person detector run on every 10th frame of 30 movies; 20k candidates sent to MTurk; 10 upper-body joints

• Image Parse [Ramanan, NIPS 2006]: 305 images; similar to Leeds; includes casual photos

• Buffy Stickmen: 748 frames


Person Detection

• Input: Uncropped image

• Output: Cropped image

• LSP dataset - No person detector

• FLIC dataset - Enlarged face detector


Main difference


Runtime

• 0.1 s per image on 12 cores (previous state of the art: 1.5 s and 4 s)

• Training stage 0 - 3 days

• Training refinement - 7 days each


Evaluation

• Percentage of Correct Parts (PCP)

• A limb is correct if its predicted joint locations are within 1/2 of the true limb length of the true joint locations

• Percentage of Detected Joints (PDJ)

• A joint is detected if the distance between the predicted and true joint is within some fraction of the torso diameter
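The two metrics above can be sketched in a few lines of NumPy. This is only an illustration under stated assumptions: joints are rows of an (N, 2) array, `limbs` is a hypothetical list of joint-index pairs, PCP uses the strict variant in which both endpoints must be within the threshold (other variants average the two endpoint errors), and the 0.2 torso fraction is an arbitrary default rather than a value from the slides.

```python
import numpy as np

def pcp(pred, true, limbs, threshold=0.5):
    """Fraction of limbs whose two predicted endpoints both lie within
    threshold * (true limb length) of the corresponding true endpoints."""
    correct = 0
    for a, b in limbs:
        limb_len = np.linalg.norm(true[a] - true[b])
        err_a = np.linalg.norm(pred[a] - true[a])
        err_b = np.linalg.norm(pred[b] - true[b])
        if err_a <= threshold * limb_len and err_b <= threshold * limb_len:
            correct += 1
    return correct / len(limbs)

def pdj(pred, true, torso_diameter, fraction=0.2):
    """Fraction of joints whose prediction error is at most
    fraction * torso_diameter."""
    errors = np.linalg.norm(pred - true, axis=1)
    return float(np.mean(errors <= fraction * torso_diameter))

# Toy example with three joints and two limbs (hypothetical values).
true_joints = np.array([[0.0, 0.0], [0.0, 10.0], [10.0, 10.0]])
pred_joints = np.array([[0.0, 1.0], [0.0, 11.0], [10.0, 30.0]])
limbs = [(0, 1), (1, 2)]
pcp_score = pcp(pred_joints, true_joints, limbs)
pdj_score = pdj(pred_joints, true_joints, torso_diameter=10.0)
```

Sweeping `fraction` over a range of values gives the kind of detection-rate curve usually reported for PDJ.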


DeepFace: Closing the Gap to Human-Level Performance in Face Verification
Yaniv Taigman, Ming Yang, Marc’Aurelio Ranzato, Lior Wolf - CVPR 2014


Pipeline

• Detect faces

• Correct out-of-plane rotation

• Generate features via CNN

• Classify


Alignment


Fiducial Detection

• LBP histograms

• Support Vector Regressor

• Iteratively transform and predict

• 6 fiducial points for 2D alignment

• 67 fiducial points for 3D alignment


3D Alignment

• Iterative affine camera PnP

• 3D reference - Average mesh of USF Human-ID dataset

• Considers fiducial covariance

• Residuals applied to reference mesh

• Affine warp texture
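To make the camera-fitting step more concrete, the sketch below fits a 2x4 affine camera to 2D-3D fiducial correspondences with ordinary least squares. It is a simplification, not the method from the slides: it is a single non-iterative solve and it ignores the fiducial covariance mentioned above. The function name and the synthetic data are assumptions for illustration.

```python
import numpy as np

def fit_affine_camera(points_3d, points_2d):
    """Least-squares fit of a 2x4 affine camera P such that
    points_2d ~= [points_3d, 1] @ P.T.

    points_3d: (N, 3) reference-mesh fiducials
    points_2d: (N, 2) detected image fiducials
    Returns the camera and the per-point 2D residuals.
    """
    n = points_3d.shape[0]
    homog = np.hstack([points_3d, np.ones((n, 1))])               # (N, 4)
    solution, *_ = np.linalg.lstsq(homog, points_2d, rcond=None)  # (4, 2)
    camera = solution.T                                           # (2, 4)
    residuals = points_2d - homog @ solution                      # (N, 2)
    return camera, residuals

# Synthetic check with 67 fiducials and a made-up camera.
rng = np.random.default_rng(1)
mesh_points = rng.standard_normal((67, 3))
true_camera = rng.standard_normal((2, 4))
image_points = np.hstack([mesh_points, np.ones((67, 1))]) @ true_camera.T
camera, residuals = fit_affine_camera(mesh_points, image_points)
```

With real detections the residuals are nonzero; those are the quantities the slide describes as being applied back to the reference mesh.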


CNN Architecture

[Figure labels: Features; weight sharing; no weight sharing]
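The difference between layers with and without weight sharing shows up clearly in parameter counts. The sketch below compares a convolutional layer (one filter bank reused everywhere) with a locally connected layer (a separate filter bank per output location). The layer sizes are hypothetical and are not taken from the DeepFace architecture; giving each location its own bias is also an assumption.

```python
def conv_params(in_channels, out_channels, kernel):
    """Weight sharing: one filter bank (plus one bias per filter) for the whole image."""
    return out_channels * (in_channels * kernel * kernel + 1)

def locally_connected_params(in_channels, out_channels, kernel, out_h, out_w):
    """No weight sharing: an independent filter bank and bias at every output location."""
    return out_h * out_w * conv_params(in_channels, out_channels, kernel)

# Hypothetical layer: 16 -> 16 channels, 3x3 kernels, 20x20 output map.
shared = conv_params(16, 16, 3)
unshared = locally_connected_params(16, 16, 3, 20, 20)
ratio = unshared // shared  # equals the number of output locations
```

Dropping weight sharing multiplies the parameter count by the number of output locations, which is affordable mainly when the inputs are aligned and the training set is large.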


Training

• Softmax cross-entropy loss: $-\log p_k$, where $p_k$ is the softmax probability assigned to the correct class $k$
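A minimal NumPy sketch of this loss for a single example follows. The function name is an assumption, and the max-subtraction is a standard numerical-stability trick rather than something stated on the slide.

```python
import numpy as np

def softmax_cross_entropy(logits, k):
    """Return -log p_k, where p = softmax(logits) and k is the true class index."""
    shifted = logits - np.max(logits)  # avoids overflow in exp
    log_probs = shifted - np.log(np.sum(np.exp(shifted)))
    return -log_probs[k]

# With uniform logits over 4 classes, p_k = 1/4 for every class.
uniform_loss = softmax_cross_entropy(np.zeros(4), k=2)
```

The loss is zero only when all probability mass sits on the correct class, and it grows without bound as $p_k$ approaches zero.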


Sparsity

• ReLU nonlinearity - rectified linear unit max(0, x)

• About 75% of the feature components in the topmost layers are exactly 0

• Dropout - first fully connected layer
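The two mechanisms named above can be sketched as follows. The dropout shown is the "inverted" variant, which rescales the kept activations during training so nothing changes at test time; that variant, the 0.5 rate, and the function names are assumptions, since the slide does not specify them.

```python
import numpy as np

def relu(x):
    """Rectified linear unit: negative inputs become exactly zero."""
    return np.maximum(0.0, x)

def dropout(x, rate, rng, training=True):
    """Inverted dropout: zero each activation with probability `rate`
    and divide the survivors by (1 - rate). Identity when not training."""
    if not training or rate == 0.0:
        return x
    keep = rng.random(x.shape) >= rate
    return x * keep / (1.0 - rate)

rng = np.random.default_rng(0)
pre_activations = rng.standard_normal(10_000)
activations = relu(pre_activations)
zero_fraction = float(np.mean(activations == 0.0))  # sparsity induced by ReLU
dropped = dropout(activations, rate=0.5, rng=rng)
```

For zero-mean random inputs roughly half of the ReLU outputs are zero; a trained network can be sparser or denser than that depending on its learned biases.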


Normalization

• ReLU - unbounded

• Normalize features to [0, 1] based on holdout
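One simple way to realize this normalization is to divide each feature component by its largest value on the holdout set. The sketch below assumes that per-component maximum scaling, and it additionally clips new features that exceed the holdout maximum; both choices and the function names are assumptions, not details from the slide.

```python
import numpy as np

def fit_feature_scale(holdout_features, eps=1e-12):
    """Per-component maximum over the holdout set (rows are examples).
    eps guards against components that are always zero."""
    return np.maximum(holdout_features.max(axis=0), eps)

def normalize(features, scale):
    """Scale non-negative ReLU features into [0, 1]; values above the
    holdout maximum are clipped to 1."""
    return np.clip(features / scale, 0.0, 1.0)

holdout = np.array([[0.0, 2.0], [4.0, 1.0]])
scale = fit_feature_scale(holdout)
normalized = normalize(np.array([[2.0, 1.0], [8.0, 0.0]]), scale)
```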


Verification Metrics

• Unsupervised - dot product

• $\chi^2$ similarity

• Siamese network


$\chi^2$ Similarity

• $\chi^2(f_1, f_2) = \sum_i w_i \, (f_1[i] - f_2[i])^2 / (f_1[i] + f_2[i])$

• Weights $w_i$ learned via SVM
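The weighted sum above translates directly into NumPy. In this sketch the weights are passed in as given rather than learned, and components where both features are zero are skipped to avoid 0/0; that guard and the function name are assumptions added for the example.

```python
import numpy as np

def chi2_similarity(f1, f2, w, eps=1e-12):
    """Weighted chi-squared distance between two non-negative feature vectors:
    sum_i w[i] * (f1[i] - f2[i])**2 / (f1[i] + f2[i])."""
    num = (f1 - f2) ** 2
    den = f1 + f2
    # Components with f1[i] + f2[i] == 0 contribute nothing.
    terms = np.where(den > eps, num / np.maximum(den, eps), 0.0)
    return float(np.sum(w * terms))

f1 = np.array([1.0, 0.0, 3.0])
f2 = np.array([1.0, 0.0, 1.0])
score = chi2_similarity(f1, f2, w=np.ones(3))
```

With learned weights, each per-component term acts as one input feature to a linear classifier that decides same versus not same.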


Siamese Network

[Figure: the two feature vectors are subtracted (-) and passed to a fully connected layer, FC 4096-to-1]
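The top of the Siamese network can be sketched as a difference of the two 4096-dimensional feature vectors followed by a single 4096-to-1 fully connected unit. Taking the absolute value of the difference and squashing the output with a sigmoid are assumptions made here; the slide's figure only shows the subtraction and the FC layer. The weights below are made up for illustration, not trained.

```python
import numpy as np

def siamese_score(f1, f2, weights, bias):
    """Map a pair of feature vectors to a score in (0, 1):
    sigmoid(weights . |f1 - f2| + bias)."""
    diff = np.abs(f1 - f2)          # assumed: absolute difference
    logit = weights @ diff + bias   # FC 4096-to-1
    return 1.0 / (1.0 + np.exp(-logit))

# Hypothetical weights that penalize any feature difference.
weights = -0.01 * np.ones(4096)
same = siamese_score(np.ones(4096), np.ones(4096), weights, bias=0.0)
different = siamese_score(np.ones(4096), np.zeros(4096), weights, bias=0.0)
```

Because both inputs pass through the same feature extractor, only the 4096 weights and the bias of this top layer are specific to the verification task.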


Datasets

• Social Face Classification (SFC)

• Presumably Facebook photos

• 4.4 mil faces; 4,030 people

• No overlap with other datasets


Datasets

• Labeled Faces in the Wild (LFW)

• 13,233 faces; 5,749 celebs

• 6,000 pairs

• Restricted protocol - only same/not-same pair labels available at training

• Unrestricted protocol - identity labels also available during training

• Unsupervised - no training on LFW


Datasets

• YouTube Faces (YTF)

• 3,425 videos of 1,595 subjects

• Subset of celebs from LFW


SFC Training Performance

• Reduce data by omitting people

• Reduce data by omitting examples

• Remove layers from network


LFW Performance


Runtime

• 0.18 s - feature extraction (1 core; 2.2 GHz)

• 0.05 s - alignment

• 0.33 s - total per image (includes stages not itemized above)


Questions?
