Linear Models: Perceptron, Logistic Regression CMSC 470 Marine Carpuat Slides credit: Jacob Eisenstein
Linear Models: Perceptron, Logistic Regression · 2018. 12. 14. · Linear Models for Multiclass Classification Feature function representation Weights. Multiclass perceptron. Properties

Sep 09, 2020

Linear Models: Perceptron, Logistic Regression

CMSC 470

Marine Carpuat

Slides credit: Jacob Eisenstein

Linear Models for Multiclass Classification

Feature function representation

Weights
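
The equations on this slide did not survive extraction. One common way to write a multiclass linear model, given here as an assumption about notation rather than a copy of the slide, scores each candidate label $y$ with a weight vector $\theta$ and a feature function $f(x, y)$, then predicts the highest-scoring label:

```latex
\hat{y} = \operatorname*{argmax}_{y \in \mathcal{Y}} \; \theta \cdot f(x, y)
        = \operatorname*{argmax}_{y \in \mathcal{Y}} \sum_{j} \theta_j \, f_j(x, y)
```

Here $\mathcal{Y}$ is the set of labels and $j$ indexes the features; both symbols are introduced for illustration.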

Multiclass perceptron
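
The algorithm box for this slide was lost in extraction. A minimal sketch of a multiclass perceptron follows, assuming bag-of-words features conjoined with the label; the names `features`, `score`, `predict`, and `train_perceptron` are hypothetical and not taken from the slides.

```python
def features(words, label):
    """Hypothetical feature function f(x, y): word counts paired with the label."""
    feats = {}
    for w in words:
        feats[(w, label)] = feats.get((w, label), 0.0) + 1.0
    feats[("<bias>", label)] = 1.0  # per-label bias feature
    return feats

def score(weights, words, label):
    """Dot product between the weights and f(x, y)."""
    return sum(weights.get(k, 0.0) * v for k, v in features(words, label).items())

def predict(weights, words, labels):
    """Return the highest-scoring label (ties go to the earliest label)."""
    return max(labels, key=lambda y: score(weights, words, y))

def train_perceptron(data, labels, epochs=10):
    """Error-driven online training: update only when the prediction is wrong."""
    weights = {}
    for _ in range(epochs):
        for words, gold in data:
            guess = predict(weights, words, labels)
            if guess != gold:
                # Move toward the gold label's features ...
                for k, v in features(words, gold).items():
                    weights[k] = weights.get(k, 0.0) + v
                # ... and away from the wrongly predicted label's features.
                for k, v in features(words, guess).items():
                    weights[k] = weights.get(k, 0.0) - v
    return weights

# Toy, made-up training set used only to exercise the code.
toy_labels = ["pos", "neg", "neu"]
toy_data = [
    (["good", "movie"], "pos"),
    (["bad", "movie"], "neg"),
    (["okay", "movie"], "neu"),
]
toy_weights = train_perceptron(toy_data, toy_labels)
```

On linearly separable data such as this toy set, the updates stop once every training example is classified correctly.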

Properties of Linear Models we’ve seen so far

Naïve Bayes

• Batch learning

• Generative model p(x,y)

• Grounded in probability

• Assumes features are independent given class

• Learning = find parameters that maximize likelihood of training data

Perceptron

• Online learning

• Discriminative model score(y|x)

• Guaranteed to converge if data is linearly separable

• But might overfit the training set

• Error-driven learning

Averaged Perceptron improves generalization
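
The slide's pseudocode is missing from the transcript. As a rough sketch, the averaged perceptron returns the mean of the weight vectors seen after every training step instead of the final one. The version below is binary (labels in {-1, +1}) with dense feature lists, and it keeps a naive running sum for clarity; these choices and the name `averaged_perceptron` are assumptions.

```python
def averaged_perceptron(data, n_features, epochs=10):
    """data: list of (x, y), x a list of numbers, y in {-1, +1}."""
    w = [0.0] * n_features      # current weights
    w_sum = [0.0] * n_features  # running sum of weights over all steps
    steps = 0
    for _ in range(epochs):
        for x, y in data:
            activation = sum(wi * xi for wi, xi in zip(w, x))
            if y * activation <= 0:  # mistake (or zero margin): perceptron update
                w = [wi + y * xi for wi, xi in zip(w, x)]
            # Accumulate after every example, whether or not an update happened.
            w_sum = [si + wi for si, wi in zip(w_sum, w)]
            steps += 1
    return [si / steps for si in w_sum]

# Made-up data; the last feature is a constant bias term.
toy_data = [([1, 0, 1], +1), ([0, 1, 1], -1)]
w_avg = averaged_perceptron(toy_data, n_features=3)
```

Weight vectors that survive many examples without a mistake contribute more to the average, which is the intuition behind the better generalization.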

Differential Calculus Refresher

• Derivatives

• Chain rule

• Convex functions

• Gradients

Logistic Regression for Binary Classification

Examples & illustrations: Graham Neubig

Perceptron & Probabilities

• What if we want a probability p(y|x)?

• The perceptron gives us a prediction y

• Let’s illustrate this with binary classification

Illustrations: Graham Neubig

The logistic function

• “Softer” function than in perceptron

• Can account for uncertainty

• Differentiable
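
To make the contrast concrete, here is a small sketch comparing a hard step function with the logistic function $\sigma(z) = 1 / (1 + e^{-z})$. The function names are hypothetical, and the two-branch form of `sigmoid` is an implementation choice to avoid overflow for very negative inputs.

```python
import math

def step(z):
    """Hard threshold, as in the perceptron's decision: no notion of uncertainty."""
    return 1.0 if z >= 0 else 0.0

def sigmoid(z):
    """Logistic function; maps any real score to a value in [0, 1]."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)  # safe: z < 0, so exp(z) cannot overflow
    return ez / (1.0 + ez)

# Scores near 0 give probabilities near 0.5; large |z| saturates toward 0 or 1.
probs = [sigmoid(z) for z in (-4.0, -1.0, 0.0, 1.0, 4.0)]
```

Unlike `step`, `sigmoid` changes smoothly with the score, which is what makes gradient-based training possible.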

Logistic regression: how to train?

• Train based on conditional likelihood

• Find parameters w that maximize conditional likelihood of all answers 𝑦𝑖 given examples 𝑥𝑖
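
Written out, and assuming the training examples are independent, the objective is the following (the slide's own equation was lost in extraction, so the exact notation is an assumption):

```latex
\hat{w} = \operatorname*{argmax}_{w} \prod_{i} P(y_i \mid x_i; w)
        = \operatorname*{argmax}_{w} \sum_{i} \log P(y_i \mid x_i; w)
```

The second form holds because the logarithm is monotonic; sums of log-probabilities are also easier to differentiate and numerically more stable than long products.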

Stochastic gradient ascent (or descent)

• Online training algorithm

• Update weights for every training example

• Move in direction given by gradient

• Size of update step scaled by learning rate
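
A minimal sketch of stochastic gradient ascent for binary logistic regression follows. It assumes labels in {0, 1} and ascends the log-likelihood, whose per-example gradient is $(y - p)\,x$; the slides' own formulation may differ, and the names `sga_logistic` and `learning_rate` are hypothetical.

```python
import math

def sigmoid(z):
    """Logistic function, written to avoid overflow for very negative z."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)

def sga_logistic(data, n_features, learning_rate=0.1, epochs=100):
    """data: list of (x, y), x a list of numbers, y in {0, 1}."""
    w = [0.0] * n_features
    for _ in range(epochs):
        for x, y in data:  # online: one update per training example
            p = sigmoid(sum(wi * xi for wi, xi in zip(w, x)))
            # Step in the gradient direction, scaled by the learning rate.
            w = [wi + learning_rate * (y - p) * xi for wi, xi in zip(w, x)]
    return w

# Made-up data; the last feature is a constant bias term.
toy_data = [([1, 0, 1], 1), ([0, 1, 1], 0)]
w = sga_logistic(toy_data, n_features=3)
p_first = sigmoid(sum(wi * xi for wi, xi in zip(w, toy_data[0][0])))
p_second = sigmoid(sum(wi * xi for wi, xi in zip(w, toy_data[1][0])))
```

Every example produces an update, and its size shrinks as the predicted probability approaches the true label; the perceptron, by contrast, updates only on mistakes.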

Gradient of the logistic function
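
The derivation on this slide is missing from the transcript. A standard version, written with $\sigma(z) = 1 / (1 + e^{-z})$ and $z = w \cdot x$ (notation assumed, not taken from the slide), is:

```latex
\frac{d\sigma}{dz} = \frac{e^{-z}}{(1 + e^{-z})^{2}} = \sigma(z)\,\bigl(1 - \sigma(z)\bigr)

\nabla_{w}\, \sigma(w \cdot x) = \sigma(w \cdot x)\,\bigl(1 - \sigma(w \cdot x)\bigr)\, x
```

The second line follows from the chain rule, since $\nabla_{w}(w \cdot x) = x$. If training maximizes the log-likelihood with labels $y \in \{0, 1\}$, the per-example gradient simplifies to $\nabla_{w} \log P(y \mid x; w) = \bigl(y - \sigma(w \cdot x)\bigr)\, x$.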

Example: Person/not-person classification problem

Given an introductory sentence in Wikipedia, predict whether the article is about a person

Example: initial update

Example: second update

How to set the learning rate?

• Various strategies

• Decay over time: $\alpha = \frac{1}{C + t}$, where $C$ is a parameter and $t$ is the number of samples

• Use held-out test set, increase learning rate when likelihood increases
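
As a small illustration of the decay schedule, the sketch below computes the learning rate after `t` samples for a constant `C`. The function name and the default `C=1.0` are assumptions chosen for the example.

```python
def decayed_learning_rate(t, C=1.0):
    """Learning rate 1 / (C + t): large early steps, smaller steps later."""
    return 1.0 / (C + t)

# Rates for the first few samples with the assumed default C = 1.0.
rates = [decayed_learning_rate(t) for t in range(5)]
```

A larger `C` starts from a smaller step size and makes the decay more gradual.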

What you should know about linear models

• Standard supervised learning set-up for text classification

• Difference between train vs. test data

• How to evaluate

• 3 examples of linear classifiers

• Naïve Bayes, Perceptron, Logistic Regression

• How to make predictions, how to train, strengths and weaknesses

• Learning as optimization: loss functions and their properties

• Difference between generative vs. discriminative classifiers

• General machine learning concepts

• Smoothing, overfitting, underfitting, regularization