Page 1: L2 knn linear - cc.gatech.edu

CS 4803 / 7643: Deep Learning

Zsolt Kira

Georgia Tech

Topics:

– Image Classification

– Supervised Learning view

– K-NN

– Linear Classifier

Page 2:

Last Time
• High-level intro to what deep learning is

• Quick overview of logistics
– Requirements: ML, math (linear algebra, calculus), programming (Python)

– Grades: 80% PS/HW, 20% Project, Piazza Bonus

– Project: Topic of your choosing (related to DL), in groups of 3-4, with undergrads and grads grouped separately

– 7 free late days

– 1 week re-grading period

– No Cheating

• PS0 out, due Tuesday 01/14 11:55pm
– Graded pass/fail

– Intended to be done on your own

– Don’t worry if you’re rusty! It’s OK to need a refresher on various subjects to do it. Some of it (e.g. the last question) is more suitable for graduate students.

– If not registered, email staff for a Gradescope account

• Look through the slides on the website for all details

(C) Dhruv Batra & Zsolt Kira

Page 3:

Current TAs

• New TAs: Zhuoran Yu, Manas Sahni, (in process) Harish Kamath
• Official office hours coming soon (TA and instructor)
• For this & next week:
– 11:30am-12:30pm Friday 01/09 (Zhuoran Yu)
– 11:30am-12:30pm Monday (Patrick)
– 11:30am-12:30pm Tuesday (Sameer)
– 4-5pm Tuesday (Jiachen)
– 1:30-2:30pm Wednesday (Anishi)
– 11:30am-12:30pm Thursday (Rahul)

Rahul Duggal

2nd year CS PhD student

http://www.rahulduggal.com/

Patrick Grady

2nd year Robotics PhD student

https://www.linkedin.com/in/patrick-grady

Sameer Dharur

MS-CS student

https://www.linkedin.com/in/sameerdharur/

Jiachen Yang

2nd year ML PhD

https://www.cc.gatech.edu/~jyang462/

Yinquan Lu

2nd year MSCSE student

https://www.cc.gatech.edu/~jyang462/

Anishi Mehta

MSCS student

https://www.linkedin.com/in/anishimehta

Page 4:

Registration/Access

• Waitlist– Still a large waitlist for grad, still adding some capacity

• Canvas – Anybody not have access?

• Piazza – 110+ people signed up. Please use it for questions.

Website: http://www.cc.gatech.edu/classes/AY2020/cs7643_spring/

Piazza: https://piazza.com/gatech/spring2020/cs4803dl7643a/
Staff mailing list (personal questions): [email protected]

Gradescope: https://www.gradescope.com/courses/78537
Canvas: https://gatech.instructure.com/courses/94450/

Course Access Code (Piazza): MWXKY8

Page 5:

http://cs231n.github.io/python-numpy-tutorial/

Prep for HW1: Python+Numpy Tutorial

Slide Credit: Fei-Fei Li, Justin Johnson, Serena Yeung, CS 231n

Page 6:

Plan for Today

• Reminder: What changed to enable DL

• Some Problems with DL

• Image Classification

• Supervised Learning view

• K-NN

• (Beginning of) Linear Classifiers


Page 7:

Reminder: What Deep Learning Is


• We will learn a complex non-linear hierarchical (compositional) function in an end-to-end manner

• (Hierarchical) Compositionality
– Cascade of non-linear transformations
– Multiple layers of representations

• End-to-End Learning
– Learning (goal-driven) representations
– Learning feature extraction

• Distributed Representations
– No single neuron “encodes” everything
– Groups of neurons work together

Page 8:

What Changed?

• Few people saw this combination coming: gigantic growth in data and processing enabled depth and feature learning
– Combined with specialized hardware (GPUs) and open-source distribution (arXiv, GitHub)

• If your input features are poor, your results will be too
– If your model is poor, your results will be too
– If your optimizer is poor, your results will be too

• We now have methods for feature learning that work (after some finesse)
– Still have to guard against overfitting (very complex functions!)

– Still tune hyper-parameters

– Still design neural network architectures

– Lots of research to automate this too, e.g. via reinforcement learning!

Page 9:

Problems with Deep Learning

• Problem #1: Non-Convex! Non-Convex! Non-Convex!
– Depth >= 3: most losses are non-convex in the parameters
– Theoretically, all bets are off
– Leads to stochasticity
• different initializations → different local minima

• Standard response #1
– “Yes, but all interesting learning problems are non-convex”
– For example, human learning
• Order matters → (wave hands) → non-convexity

• Standard response #2
– “Yes, but it often works!”
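The "different initializations → different local minima" point can be seen on a toy example (illustrative only, not from the slides): gradient descent on a simple non-convex 1-D function lands in different minima depending on where it starts.

```python
# Toy illustration of non-convexity: f(x) = x^4 - 3x^2 + x has two local
# minima, and gradient descent converges to a different one depending on
# the initialization. (The function and step size are made up for this demo.)
def grad_descent(x, lr=0.01, steps=2000):
    for _ in range(steps):
        x -= lr * (4 * x**3 - 6 * x + 1)  # f'(x) = 4x^3 - 6x + 1
    return x

left = grad_descent(-2.0)   # converges near x ~ -1.30 (the global minimum)
right = grad_descent(+2.0)  # converges near x ~ +1.13 (a worse local minimum)
print(left, right)
```

Same algorithm, same function, two different answers: exactly the stochasticity the slide refers to.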

Page 10:

Problems with Deep Learning

• Problem #2: Lack of interpretability
– Hard to track down what’s failing
– Pipeline systems have “oracle” performances at each step
– In end-to-end systems, it’s hard to know why things are not working

Page 11:

Problems with Deep Learning

• Problem #2: Lack of interpretability

[Figure: end-to-end vs pipeline image captioning systems]

[Fang et al. CVPR15] [Vinyals et al. CVPR15]

Page 12:

Problems with Deep Learning

• Problem #2: Lack of interpretability
– Hard to track down what’s failing
– Pipeline systems have “oracle” performances at each step
– In end-to-end systems, it’s hard to know why things are not working

• Standard response #1
– Tricks of the trade: visualize features, add losses at different layers, pre-train to avoid degenerate initializations…
– “We’re working on it”

• Standard response #2
– “Yes, but it often works!”

Page 13:

Problems with Deep Learning

• Problem #3: Lack of easy reproducibility
– A direct consequence of stochasticity & non-convexity

• Standard response #1
– It’s getting much better
– Standard toolkits/libraries/frameworks now available
– Caffe, Theano, (Py)Torch

• Standard response #2
– “Yes, but it often works!”

Page 14:

Yes it works, but how?

Page 15:

Image Classification

Page 16:

Image Classification: A core task in Computer Vision

[Image: a cat]

(assume a given set of discrete labels)
{dog, cat, truck, plane, ...}

Page 17:

The Problem: Semantic Gap

What the computer sees: an image is just a big grid of numbers between [0, 255], e.g. 800 x 600 x 3 (3 RGB channels)
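In code, "what the computer sees" is literally an H x W x 3 array of small integers (a sketch; the random values here stand in for real pixel data):

```python
import numpy as np

# An "800 x 600" RGB image as the computer sees it: a height x width x 3
# grid of integers in [0, 255]. Random values stand in for real pixels.
rng = np.random.default_rng(0)
image = rng.integers(0, 256, size=(600, 800, 3), dtype=np.uint8)

print(image.shape)   # (600, 800, 3): height x width x RGB channels
print(image[0, 0])   # one pixel: three values in [0, 255]
```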

Page 18:

Challenges: Viewpoint variation

All pixels change when the camera moves!

Page 19:

Challenges: Illumination

Page 20:

Challenges: Deformation

Page 21:

Challenges: Occlusion

Page 22:

Challenges: Background Clutter

Page 23:

An image classifier

Unlike e.g. sorting a list of numbers, there is no obvious way to hard-code the algorithm for recognizing a cat, or other classes.

Page 24:

Attempts have been made

John Canny, “A Computational Approach to Edge Detection”, IEEE TPAMI 1986

[Figure: hand-designed pipeline: find edges → find corners → ?]

Page 25:

ML: A Data-Driven Approach

1. Collect a dataset of images and labels
2. Use Machine Learning to train a classifier
3. Evaluate the classifier on new images

Example training set
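The data-driven approach amounts to a two-function API (a sketch; the function names and toy "model" below are illustrative, not from a specific library):

```python
from collections import Counter

# Step 2 of the recipe: learning happens inside train(). The simplest
# possible "model" just memorizes the data, like Nearest Neighbor will.
def train(images, labels):
    return {"images": images, "labels": labels}

# Step 3: use the model to assign a label to each new image.
# Placeholder logic: always predict the most common training label.
def predict(model, test_images):
    majority = Counter(model["labels"]).most_common(1)[0][0]
    return [majority for _ in test_images]

model = train(["img1", "img2", "img3"], ["cat", "cat", "dog"])
print(predict(model, ["img4"]))  # ['cat']
```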

Page 26:

Supervised Learning

• Input: x (images, text, emails…)

• Output: y (spam or non-spam…)

• (Unknown) Target Function
– f: X → Y (the “true” mapping / reality)

• Data
– (x1,y1), (x2,y2), …, (xN,yN)

• Model / Hypothesis Class
– h: X → Y
– y = h(x) = sign(w^T x)

• Learning = Search in hypothesis space
– Find the best h in the model class.
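The hypothesis class h(x) = sign(w^T x) mentioned above, as a minimal sketch (the weight vector is made up for illustration):

```python
import numpy as np

# One member of the hypothesis class h(x) = sign(w^T x).
# This particular w is an arbitrary example, not a learned model.
w = np.array([2.0, -1.0])

def h(x):
    # Returns +1 or -1 (0 only if x lies exactly on the decision boundary)
    return int(np.sign(w @ x))

print(h(np.array([1.0, 0.5])))   # w^T x = 1.5  -> +1
print(h(np.array([0.0, 3.0])))   # w^T x = -3.0 -> -1
```

Learning would mean searching over w for the choice that best fits the data.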

Page 27:

Procedural View

• Training Stage:
– Training Data { (x,y) } → f (Learning)

• Testing Stage:
– Test Data x → f(x) (Apply function, evaluate error)

Page 28:

Statistical Estimation View

• Probabilities to the rescue:
– X and Y are random variables
– D = {(x1,y1), (x2,y2), …, (xN,yN)} ~ P(X,Y)

• IID: Independent, Identically Distributed
– Both training & testing data are sampled IID from P(X,Y)
– Learn on the training set
– Have some hope of generalizing to the test set

Page 29:

Error Decomposition

[Figure: error decomposition diagram: “Reality” vs. the learned model]

Page 30:

Error Decomposition

[Figure: error decomposition with AlexNet as the model: Input → 11x11 conv, 96 → Pool → 5x5 conv, 256 → Pool → 3x3 conv, 384 → 3x3 conv, 384 → 3x3 conv, 256 → Pool → FC 4096 → FC 4096 → FC 1000 → Softmax]

Page 31:

Error Decomposition

[Figure: error decomposition with Multi-class Logistic Regression as the model: Input → FC (HxWx3) → Softmax]

Page 32:

Error Decomposition

[Figure: error decomposition with VGG19 as the model: Input → stacks of 3x3 conv layers (64, 128, 256, 512) interleaved with pooling → FC 4096 → FC 4096 → FC 1000 → Softmax]

Page 33:

Error Decomposition

• Approximation/Modeling Error
– You approximated reality with a model

• Estimation Error
– You tried to learn the model with finite data

• Optimization Error
– You were lazy and couldn’t/didn’t optimize to completion

• Bayes Error
– Reality just sucks
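One common way to write this decomposition (the notation here is assumed, not from the slide): let err_Bayes be the Bayes error, h* the best hypothesis in the class H, h_D the best hypothesis learnable from the finite dataset D, and h-hat the hypothesis the optimizer actually returns. The excess error then telescopes:

```latex
\underbrace{\mathrm{err}(\hat{h}) - \mathrm{err}_{\mathrm{Bayes}}}_{\text{excess error}}
= \underbrace{\mathrm{err}(h^{*}) - \mathrm{err}_{\mathrm{Bayes}}}_{\text{approximation}}
+ \underbrace{\mathrm{err}(h_{D}) - \mathrm{err}(h^{*})}_{\text{estimation}}
+ \underbrace{\mathrm{err}(\hat{h}) - \mathrm{err}(h_{D})}_{\text{optimization}}
```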

Page 34:

Guarantees

• 20 years of research in Learning Theory, oversimplified:

• If you have:
– Enough training data D
– and H is not too complex
– then we can probably generalize to unseen test data

• Note: there are several ways to measure complexity:
– Vapnik–Chervonenkis dimension
– Rademacher complexity

Page 35:

Learning is hard!

Page 36:

Learning is hard!

• No assumptions = No learning

Page 37:

First classifier: Nearest Neighbor

Memorize all data and labels

Predict the label of the most similar training image

Page 38:

Example Dataset: CIFAR10

Alex Krizhevsky, “Learning Multiple Layers of Features from Tiny Images”, Technical Report, 2009.

10 classes
50,000 training images
10,000 testing images

Pages 39–40:

Nearest Neighbours

[Figures: example nearest-neighbour retrievals]

Page 41:

Instance/Memory-based Learning

Four things make a memory-based learner:

• A distance metric

• How many nearby neighbors to look at?

• A weighting function (optional)

• How to fit with the local points?

Slide Credit: Carlos Guestrin

Page 42:

1-Nearest Neighbour

Four things make a memory-based learner:

• A distance metric
– Euclidean (and others)

• How many nearby neighbors to look at?
– 1

• A weighting function (optional)
– Unused

• How to fit with the local points?
– Just predict the same output as the nearest neighbour.

Page 43:

k-Nearest Neighbour

Four things make a memory-based learner:

• A distance metric
– Euclidean (and others)

• How many nearby neighbors to look at?
– k

• A weighting function (optional)
– Unused

• How to fit with the local points?
– Predict the majority label (for classification) or the average output (for regression) among the k nearest neighbours.

Page 44:

1-NN for Regression

[Figure: 1-NN regression: the prediction at a query x is the y value of the closest training datapoint]

Figure Credit: Carlos Guestrin

Page 45:

Distance Metric to compare images

L1 distance: d_1(I_1, I_2) = sum_p |I_1^p - I_2^p|
(take pixel-wise absolute differences, then add them up)
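The same computation in NumPy, on two 4x4 toy arrays standing in for real pixel grids (values chosen for illustration):

```python
import numpy as np

# L1 distance between two "images": pixel-wise |difference|, then add.
I1 = np.array([[56, 32, 10, 18],
               [90, 23, 128, 133],
               [24, 26, 178, 200],
               [2, 0, 255, 220]])
I2 = np.array([[10, 20, 24, 17],
               [8, 10, 89, 100],
               [12, 16, 178, 170],
               [4, 32, 233, 112]])

d1 = np.sum(np.abs(I1 - I2))
print(d1)  # -> 456
```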

Page 46:

Nearest Neighbor classifier

Page 47:

Nearest Neighbor classifier

Memorize training data

Page 48:

Nearest Neighbor classifier

For each test image:
– Find the closest training image
– Predict the label of that nearest image
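A minimal sketch of this classifier in the spirit of the CS231n example code (the toy data at the bottom is made up): training just memorizes the data, and prediction finds, for each test image, the training image with the smallest L1 distance.

```python
import numpy as np

class NearestNeighbor:
    def train(self, X, y):
        # X: N x D array (each row is a flattened image), y: N labels.
        # "Training" is pure memorization.
        self.Xtr = X
        self.ytr = y

    def predict(self, X):
        preds = []
        for x in X:
            # L1 distance from x to every training image
            distances = np.sum(np.abs(self.Xtr - x), axis=1)
            preds.append(self.ytr[np.argmin(distances)])
        return np.array(preds)

# Toy usage with made-up 1x4 "images":
nn = NearestNeighbor()
nn.train(np.array([[0, 0, 0, 0], [255, 255, 255, 255]]),
         np.array(["dark", "bright"]))
print(nn.predict(np.array([[10, 5, 0, 20], [200, 230, 255, 240]])))
```

Note how the loop over test images makes the cost of prediction linear in the training-set size, which motivates the question on the next slide.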

Pages 49–51:

Nearest Neighbor classifier

Q: With N examples, how fast are training and prediction?

A: Train O(1), predict O(N)

This is bad: we want classifiers that are fast at prediction; slow training is OK.

Page 52:

What does this look like?

Page 53:

Nearest Neighbour

• Demo 1: http://vision.stanford.edu/teaching/cs231n-demos/knn/
• Demo 2: http://www.cs.technion.ac.il/~rani/LocBoost/

Page 54:

Parametric vs Non-Parametric Models

• Does the capacity (the size of the hypothesis class) grow with the size of the training data?
– Yes = Non-Parametric Models
– No = Parametric Models

Page 55:

K-Nearest Neighbors: Distance Metric

L1 (Manhattan) distance: d_1(I_1, I_2) = sum_p |I_1^p - I_2^p|
L2 (Euclidean) distance: d_2(I_1, I_2) = sqrt( sum_p (I_1^p - I_2^p)^2 )

Page 56:

K-Nearest Neighbors: Distance Metric

[Figure: K = 1 decision boundaries under L1 (Manhattan) vs L2 (Euclidean) distance]

Page 57:

K-Nearest Neighbors: Distance Metric

L1 norm (absolute)
L-infinity (max) norm
Scaled Euclidean (L2)
Mahalanobis

Slide by Andrew W. Moore

Page 58:

More Distance Metrics

Other Metrics…
• Mahalanobis, Rank-based, Correlation-based (Stanfill+Waltz, Maes’ Ringo system…)

Pages 59–60:

Hyperparameters

What is the best value of k to use?
What is the best distance to use?

These are hyperparameters: choices about the algorithm that we set rather than learn.

Very problem-dependent. Must try them all out and see what works best.

Pages 61–65:

Hyperparameters

Your Dataset

Idea #1: Choose hyperparameters that work best on the data
BAD: K = 1 always works perfectly on training data

Idea #2: Split data into train and test; choose hyperparameters that work best on test data
BAD: No idea how the algorithm will perform on new data

Idea #3: Split data into train, val, and test; choose hyperparameters on val and evaluate on test
Better!

train | validation | test

Page 66:

Hyperparameters

Your Dataset

Idea #4: Cross-Validation: Split data into folds, try each fold as validation and average the results

[Figure: five folds; in each round a different fold (fold 1 … fold 5) serves as validation, with a held-out test set]

Useful for small datasets, but not used too frequently in deep learning
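Idea #4 can be sketched in a few lines (illustrative only: the helper names and the random toy dataset below are made up, and the k-NN inside uses L1 distance with majority vote):

```python
import numpy as np

def knn_accuracy(tr_X, tr_y, val_X, val_y, k):
    # Classify each validation point by majority vote among its k nearest
    # training points (L1 distance), and return the accuracy.
    correct = 0
    for x, label in zip(val_X, val_y):
        dists = np.sum(np.abs(tr_X - x), axis=1)
        nearest = tr_y[np.argsort(dists)[:k]]
        values, counts = np.unique(nearest, return_counts=True)
        correct += values[np.argmax(counts)] == label
    return correct / len(val_y)

def crossval_score(X, y, k, n_folds=5):
    # Average validation accuracy over n_folds splits: each fold serves
    # as the validation set once, with the rest used for "training".
    folds_X = np.array_split(X, n_folds)
    folds_y = np.array_split(y, n_folds)
    scores = []
    for i in range(n_folds):
        tr_X = np.concatenate([f for j, f in enumerate(folds_X) if j != i])
        tr_y = np.concatenate([f for j, f in enumerate(folds_y) if j != i])
        scores.append(knn_accuracy(tr_X, tr_y, folds_X[i], folds_y[i], k))
    return np.mean(scores)

# Toy dataset: 50 random 16-dim "images", labeled by mean brightness.
rng = np.random.default_rng(0)
train_X = rng.integers(0, 256, size=(50, 16))
train_y = (train_X.mean(axis=1) > 128).astype(int)

# Choose the k with the best average validation accuracy.
best_k = max([1, 3, 5, 7], key=lambda k: crossval_score(train_X, train_y, k))
```

The chosen `best_k` would then be evaluated once on a held-out test set, as in Idea #3.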

Page 67:

Setting Hyperparameters

Example of 5-fold cross-validation for the value of k.

Each point is a single outcome. The line goes through the mean; bars indicate the standard deviation.

(It seems that k ≈ 7 works best for this data)

Page 68:

Scene Completion [Hays & Efros, SIGGRAPH 2007]

Pages 69–75: Scene Completion (Hays and Efros, SIGGRAPH 2007)

[Figures: input image with a missing region; nearest-neighbour scene matches (… 200 total); Context Matching; Graph cut + Poisson blending; completed results]

Page 76: L2 knn linear - cc.gatech.edu · ± . 11 ± /lqhdu &odvvlilhu /dvw 7lph +ljk ohyho lqwur wr zkdw ghhs ohduqlqj lv )dvw eulhi ri orjlvwlfv ... ± 6wloo wxqh k\shu sdudphwhuv ± 6wloo

Problems with Instance-Based Learning

• Expensive
– No learning: most of the real work is done at test time
– For every test sample, we must search through the entire dataset – very slow!
– Must use tricks like approximate nearest-neighbor search

• Doesn't work well with a large number of irrelevant features
– Distances get overwhelmed by noisy features

• Curse of dimensionality
– Distances become meaningless in high dimensions

(C) Dhruv Batra & Zsolt Kira 78
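The test-time cost described above is easy to see in code. A minimal brute-force 1-NN sketch (hypothetical toy data): every single query scans the entire training set, so prediction is O(N·D) per sample.

```python
import numpy as np

def nearest_neighbor_predict(X_train, y_train, X_test):
    """Brute-force 1-NN: for each test point, scan the ENTIRE training set.

    No work at training time; all the cost is paid at test time.
    """
    preds = []
    for x in X_test:
        # L2 distance from this query to every training point
        dists = np.sqrt(((X_train - x) ** 2).sum(axis=1))
        preds.append(y_train[np.argmin(dists)])
    return np.array(preds)

# Hypothetical toy data: two well-separated clusters
X_train = np.array([[0.0, 0.0], [0.1, 0.2], [5.0, 5.0], [5.2, 4.9]])
y_train = np.array([0, 0, 1, 1])
print(nearest_neighbor_predict(X_train, y_train,
                               np.array([[0.05, 0.1], [5.1, 5.0]])))
# -> [0 1]
```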

Page 77:

79

k-Nearest Neighbor is never used on images:

- Very slow at test time
- Distance metrics on pixels are not informative

(all three images have the same L2 distance to the one on the left)

Original Boxed Shifted Tinted

Original image is CC0 public domain

Slide Credit: Fei-Fei Li, Justin Johnson, Serena Yeung, CS 231n
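The tinted case is easy to reproduce: adding a constant to every pixel leaves the content identical, yet moves the image far away in L2. A small sketch with a hypothetical random "image":

```python
import numpy as np

rng = np.random.default_rng(0)
img = rng.integers(0, 200, size=(32, 32, 3)).astype(np.float64)
tinted = img + 40.0  # uniform tint: same content, every pixel shifted by 40

# L2 distance = sqrt(sum of squared per-pixel differences)
dist = np.sqrt(((img - tinted) ** 2).sum())
# every per-pixel difference is exactly 40, so dist = 40 * sqrt(32*32*3)
print(dist)  # large, even though both images show the same thing
```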

Page 78:

80

k-Nearest Neighbor is never used on images:

- Curse of dimensionality

Dimensions = 1, Points = 4

Dimensions = 2, Points = 4² = 16

Dimensions = 3, Points = 4³ = 64

Slide Credit: Fei-Fei Li, Justin Johnson, Serena Yeung, CS 231n
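The point counts on the slide follow from needing the same sampling density along every axis: covering each dimension with 4 points requires 4^d points in d dimensions, which explodes long before d reaches the 3072 dimensions of a CIFAR image.

```python
# Points needed to keep 4 samples per axis as dimensionality grows:
for d in range(1, 4):
    print(d, 4 ** d)
# 1 -> 4, 2 -> 16, 3 -> 64; for d = 3072 the count is astronomically large
```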

Page 79:

Curse of Dimensionality

(C) Dhruv Batra and Zsolt Kira 81Figure Credit: Kevin Murphy

Page 80:

K-Nearest Neighbors: Summary

In image classification, we start with a training set of images and labels, and must predict labels on the test set

The K-Nearest Neighbors classifier predicts labels based on nearest training examples

Distance metric and K are hyperparameters

Choose hyperparameters using the validation set; only run on the test set once at the very end!

82

Slide Credit: Fei-Fei Li, Justin Johnson, Serena Yeung, CS 231n
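The tuning recipe in the summary can be sketched directly (hypothetical toy split; the k-NN evaluation is a minimal majority-vote implementation, not a library call): try each candidate k on the validation set, keep the best, and touch the test set only once at the very end.

```python
import numpy as np

def knn_accuracy(k, X_tr, y_tr, X_va, y_va):
    """Accuracy of a brute-force k-NN classifier (majority vote) on (X_va, y_va)."""
    correct = 0
    for x, y in zip(X_va, y_va):
        dists = ((X_tr - x) ** 2).sum(axis=1)
        votes = y_tr[np.argsort(dists)[:k]]  # labels of the k nearest neighbors
        if np.bincount(votes).argmax() == y:
            correct += 1
    return correct / len(y_va)

# Hypothetical tiny split: two Gaussian clusters, tuned on validation data only
rng = np.random.default_rng(1)
X_train = np.vstack([rng.normal(0, 1, (20, 2)), rng.normal(4, 1, (20, 2))])
y_train = np.array([0] * 20 + [1] * 20)
X_val = np.vstack([rng.normal(0, 1, (10, 2)), rng.normal(4, 1, (10, 2))])
y_val = np.array([0] * 10 + [1] * 10)

best_k = max([1, 3, 5, 7],
             key=lambda k: knn_accuracy(k, X_train, y_train, X_val, y_val))
print(best_k)  # chosen WITHOUT ever looking at the test set
```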

Page 81:

This image is CC0 1.0 public domain

Neural Network

Linear classifiers

Slide Credit: Fei-Fei Li, Justin Johnson, Serena Yeung, CS 231n

Page 82:

Visual Question Answering

(C) Dhruv Batra 84

Image → Embedding (VGGNet): Convolution Layer + Non-Linearity → Pooling Layer → Convolution Layer + Non-Linearity → Pooling Layer → Fully-Connected MLP → 4096-dim

Question ("How many horses are in this image?") → Embedding (LSTM)

Both embeddings → Neural Network → Softmax over top K answers

Page 83:

50,000 training images, each image is 32x32x3

10,000 test images.

Recall CIFAR10

Slide Credit: Fei-Fei Li, Justin Johnson, Serena Yeung, CS 231n

Page 84:

Image

f(x,W) 10 numbers giving class scores

Array of 32x32x3 numbers(3072 numbers total)

parametersor weights

W

Parametric Approach

Slide Credit: Fei-Fei Li, Justin Johnson, Serena Yeung, CS 231n

Page 85:

Image

parametersor weights

W

f(x,W) 10 numbers giving class scores

Array of 32x32x3 numbers(3072 numbers total)

f(x,W) = Wx

Parametric Approach: Linear Classifier

Slide Credit: Fei-Fei Li, Justin Johnson, Serena Yeung, CS 231n

Page 86:

Image

parametersor weights

W

f(x,W) 10 numbers giving class scores

Array of 32x32x3 numbers(3072 numbers total)

f(x,W) = Wx
(10x1) = (10x3072)(3072x1)

Parametric Approach: Linear Classifier

Slide Credit: Fei-Fei Li, Justin Johnson, Serena Yeung, CS 231n

Page 87:

Image

parametersor weights

W

f(x,W) 10 numbers giving class scores

Array of 32x32x3 numbers(3072 numbers total)

f(x,W) = Wx + b
(10x1) = (10x3072)(3072x1) + (10x1)

Parametric Approach: Linear Classifier

Slide Credit: Fei-Fei Li, Justin Johnson, Serena Yeung, CS 231n
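The shape bookkeeping above is easy to check in code. A minimal sketch with hypothetical random weights: a 10x3072 matrix times a 3072-vector, plus a 10-vector bias, yields the 10 class scores.

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.random(3072)        # flattened 32x32x3 image -> (3072,)
W = rng.random((10, 3072))  # one row of weights per class
b = rng.random(10)          # one bias per class

scores = W @ x + b          # f(x,W) = Wx + b
print(scores.shape)         # (10,) -- one score per class
```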

Page 88:

Error Decomposition

(C) Dhruv Batra 90

Reality

AlexNet:

Input → 11x11 conv, 96 → Pool → 5x5 conv, 256 → Pool → 3x3 conv, 384 → 3x3 conv, 384 → 3x3 conv, 256 → Pool → FC 4096 → FC 4096 → FC 1000 → Softmax

Page 89:

Error Decomposition

(C) Dhruv Batra 91

Reality

Multi-class Logistic Regression:

Input → FC HxWx3 → Softmax

Page 90:

92

Example with an image of 4 pixels and 3 classes (cat/dog/ship)

Input image (2x2):
56 231
24   2

Stretch pixels into a column:
x = [56, 231, 24, 2]

Slide Credit: Fei-Fei Li, Justin Johnson, Serena Yeung, CS 231n

Page 91:

93

Example with an image of 4 pixels and 3 classes (cat/dog/ship)

Input image (2x2):
56 231
24   2

Stretch pixels into a column: x = [56, 231, 24, 2]

W =
0.2  -0.5  0.1   2.0
1.5   1.3  2.1   0.0
0.0   0.25 0.2  -0.3

b = [1.1, 3.2, -1.2]

Wx + b =
-96.8   (cat score)
437.9   (dog score)
61.95   (ship score)

Slide Credit: Fei-Fei Li, Justin Johnson, Serena Yeung, CS 231n
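The slide's computation can be checked directly. A small sketch using the slide's numbers (note: with the bias included, the ship score comes out to 60.75 here, slightly different from the slide's 61.95, which matches Wx for that row without the bias; cat and dog match exactly):

```python
import numpy as np

W = np.array([[0.2, -0.5, 0.1,  2.0],   # cat weights
              [1.5,  1.3, 2.1,  0.0],   # dog weights
              [0.0, 0.25, 0.2, -0.3]])  # ship weights
x = np.array([56.0, 231.0, 24.0, 2.0])  # stretched 2x2 image
b = np.array([1.1, 3.2, -1.2])

scores = W @ x + b
print(scores)  # cat: -96.8, dog: 437.9, ship: 60.75
```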

Page 92:

94

Example with an image of 4 pixels and 3 classes (cat/dog/ship)

f(x,W) = Wx

Algebraic Viewpoint

Slide Credit: Fei-Fei Li, Justin Johnson, Serena Yeung, CS 231n

Page 93:

95

Example with an image of 4 pixels and 3 classes (cat/dog/ship)

Input image (2x2):
56 231
24   2

W =
0.2  -0.5  0.1   2.0
1.5   1.3  2.1   0.0
0.0   0.25 0.2  -0.3

b = [1.1, 3.2, -1.2]

f(x,W) = Wx

Algebraic Viewpoint

Scores: [-96.8, 437.9, 61.95]

Slide Credit: Fei-Fei Li, Justin Johnson, Serena Yeung, CS 231n

Page 94:

Interpreting a Linear Classifier

96

Slide Credit: Fei-Fei Li, Justin Johnson, Serena Yeung, CS 231n

Page 95:

Interpreting a Linear Classifier: Visual Viewpoint

97

Slide Credit: Fei-Fei Li, Justin Johnson, Serena Yeung, CS 231n

Page 96:

Interpreting a Linear Classifier: Geometric Viewpoint

98

f(x,W) = Wx + b

Array of 32x32x3 numbers(3072 numbers total)

Cat image by Nikita is licensed under CC-BY 2.0

Plot created using Wolfram Cloud

Slide Credit: Fei-Fei Li, Justin Johnson, Serena Yeung, CS 231n

Page 97:

Hard cases for a linear classifier

99

Class 1: First and third quadrants

Class 2: Second and fourth quadrants

Class 1: 1 <= L2 norm <= 2

Class 2: Everything else

Class 1: Three modes

Class 2: Everything else

Slide Credit: Fei-Fei Li, Justin Johnson, Serena Yeung, CS 231n
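The first hard case (opposite quadrants) is the classic XOR pattern, and a brute-force check confirms no linear classifier solves it: over a dense grid of weights and biases, nothing classifies all four quadrant prototypes correctly.

```python
# Four prototype points, one per quadrant: class +1 = 1st/3rd, class -1 = 2nd/4th
points = [(1, 1), (-1, -1), (-1, 1), (1, -1)]
labels = [1, 1, -1, -1]

def accuracy(w1, w2, b):
    """Fraction of the four points a linear classifier sign(w1*x1 + w2*x2 + b) gets right."""
    correct = 0
    for (x1, x2), y in zip(points, labels):
        pred = 1 if w1 * x1 + w2 * x2 + b > 0 else -1
        correct += (pred == y)
    return correct / 4

grid = [i / 4 for i in range(-8, 9)]  # weights and bias swept over [-2, 2]
best = max(accuracy(w1, w2, b) for w1 in grid for w2 in grid for b in grid)
print(best)  # no line gets all 4 right -- best is 0.75
```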

Page 98:

Linear Classifier: Three Viewpoints

100

f(x,W) = Wx

Algebraic Viewpoint Visual Viewpoint Geometric Viewpoint

One template per class

Hyperplanes cutting up space

Slide Credit: Fei-Fei Li, Justin Johnson, Serena Yeung, CS 231n

Page 99:

Cat image by Nikita is licensed under CC-BY 2.0; Car image is CC0 1.0 public domain; Frog image is in the public domain

So far: Defined a (linear) score function f(x,W) = Wx + b

Example class scores for 3 images for some W:

How can we tell whether this W is good or bad?

Slide Credit: Fei-Fei Li, Justin Johnson, Serena Yeung, CS 231n

Page 100:

1. Define a loss function that quantifies our unhappiness with the scores across the training data.

2. Come up with a way of efficiently finding the parameters that minimize the loss function. (optimization)

TODO:

Cat image by Nikita is licensed under CC-BY 2.0; Car image is CC0 1.0 public domain; Frog image is in the public domain

So far: Defined a (linear) score function

Slide Credit: Fei-Fei Li, Justin Johnson, Serena Yeung, CS 231n
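As a preview of the loss-function step, here is one standard choice (the multiclass SVM hinge loss; this anticipates later material and is a sketch, not necessarily the exact loss introduced next): it penalizes every wrong class whose score comes within a margin of the correct class's score.

```python
import numpy as np

def multiclass_svm_loss(scores, correct_class, margin=1.0):
    """Multiclass SVM (hinge) loss for one example:
    sum over wrong classes j of max(0, s_j - s_correct + margin)."""
    margins = np.maximum(0.0, scores - scores[correct_class] + margin)
    margins[correct_class] = 0.0  # the correct class never counts against itself
    return margins.sum()

# Scores from the earlier cat/dog/ship example, with 'cat' (index 0) as the true class
scores = np.array([-96.8, 437.9, 61.95])
print(multiclass_svm_loss(scores, correct_class=0))
# large: both wrong classes score far above cat, so this W is "bad" for this image
```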