Optimization Methods in Machine Learning Lectures 13-14 Katya Scheinberg Lehigh University [email protected]

Oct 11, 2020

Page 1

Optimization Methods in Machine Learning

Lectures 13-14


Katya Scheinberg Lehigh University

[email protected]

Page 2

First Order Methods

Page 3

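First-order proximal gradient methods

•  Consider: $\min_x f(x)$, where $f$ is convex and $\nabla f$ is Lipschitz continuous with constant $L$

•  Linear lower approximation: $f(y) \ge f(x) + \nabla f(x)^\top (y - x)$

•  Quadratic upper approximation: $f(y) \le f(x) + \nabla f(x)^\top (y - x) + \frac{1}{2\mu}\|y - x\|^2$ for $\mu \le 1/L$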
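As a quick numerical illustration of the two approximations, the sketch below checks both bounds for the softplus function $f(x) = \log(1 + e^x)$. The choice of function, the value L = 1/4 (which follows from $f''(x) \le 1/4$ for softplus), and the helper names `lower_bound`, `upper_bound`, and `bounds_hold` are illustrative assumptions, not taken from the lecture.

```python
import math

L = 0.25  # assumed Lipschitz constant of f' for softplus: f''(x) <= 1/4


def f(x):
    """Softplus log(1 + exp(x)), written in a numerically stable form."""
    return max(x, 0.0) + math.log1p(math.exp(-abs(x)))


def grad_f(x):
    """Derivative of softplus: the logistic sigmoid."""
    return 1.0 / (1.0 + math.exp(-x))


def lower_bound(x, y):
    """Linear lower approximation of f built at x, evaluated at y."""
    return f(x) + grad_f(x) * (y - x)


def upper_bound(x, y, mu):
    """Quadratic upper approximation of f built at x with prox parameter mu."""
    return lower_bound(x, y) + (y - x) ** 2 / (2.0 * mu)


def bounds_hold(x, y, mu=1.0 / L, tol=1e-12):
    """True if lower_bound <= f(y) <= upper_bound, up to rounding."""
    return lower_bound(x, y) - tol <= f(y) <= upper_bound(x, y, mu) + tol


# Check every pair (x, y) on a small grid.
grid = [0.5 * i for i in range(-10, 11)]
print(all(bounds_hold(x, y) for x in grid for y in grid))
```

With a prox parameter much larger than 1/L (for example `mu=100.0`), the quadratic no longer has to dominate f, which is why the condition on $\mu$ matters.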

Page 4

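First-order proximal gradient method

•  Minimize quadratic upper approximation on each iteration: $x_{k+1} = \arg\min_y \left\{ f(x_k) + \nabla f(x_k)^\top (y - x_k) + \frac{1}{2\mu}\|y - x_k\|^2 \right\} = x_k - \mu \nabla f(x_k)$

•  If $\mu \le 1/L$ then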
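A minimal sketch of one such iteration, assuming the smooth case with no nonsmooth term: the minimizer of the quadratic model is the plain gradient step, and for $\mu \le 1/L$ the model value at that point bounds the true function value from above. The test function $f(x) = \tfrac12(x_1^2 + 10 x_2^2)$ and the names `quadratic_model` and `gradient_step` are hypothetical choices for illustration.

```python
def quadratic_model(fx, grad, x, y, mu):
    """Q(x, y) = f(x) + grad.(y - x) + ||y - x||^2 / (2 * mu)."""
    d = [yi - xi for xi, yi in zip(x, y)]
    linear = sum(gi * di for gi, di in zip(grad, d))
    return fx + linear + sum(di * di for di in d) / (2.0 * mu)


def gradient_step(x, grad, mu):
    """Closed-form minimizer of the quadratic model over y."""
    return [xi - mu * gi for xi, gi in zip(x, grad)]


# Illustrative problem (an assumption): f(x) = 0.5 * (x1^2 + 10 * x2^2), so L = 10.
def f(x):
    return 0.5 * (x[0] ** 2 + 10.0 * x[1] ** 2)


def grad_f(x):
    return [x[0], 10.0 * x[1]]


x = [1.0, 1.0]
mu = 0.1  # mu = 1/L
x_next = gradient_step(x, grad_f(x), mu)
model_value = quadratic_model(f(x), grad_f(x), x, x_next, mu)
print(x_next, model_value, f(x_next))
```

At the minimizer the model value equals $f(x) - \frac{\mu}{2}\|\nabla f(x)\|^2$, so each step with $\mu \le 1/L$ decreases $f$ by at least that amount.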

Page 5

Slides from L. Vandenberghe http://www.ee.ucla.edu/~vandenbe/ee236c.html

Page 6

Complexity bound derivation outline

Page 7

Complexity of proximal gradient method

•  Minimize quadratic upper approximation on each iteration

•  If $\mu \le 1/L$, then the method finds an $\epsilon$-optimal solution in $O(L\|x_0 - x^*\|^2/\epsilon)$ iterations

Compare to $O(\log(L/\epsilon))$ for interior point methods.

Can we do better?
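The iteration count above corresponds to the standard fixed-step guarantee $f(x_k) - f^* \le L\|x_0 - x^*\|^2 / (2k)$ for $\mu = 1/L$. The sketch below checks that guarantee numerically on a small diagonal quadratic; the problem data and the helper names `gradient_descent` and `iteration_bound` are assumptions made for illustration.

```python
import math


def gradient_descent(grad_f, x0, mu, iters):
    """Return the iterates x_0, ..., x_iters of x_{k+1} = x_k - mu * grad_f(x_k)."""
    xs = [list(x0)]
    for _ in range(iters):
        x = xs[-1]
        xs.append([xi - mu * gi for xi, gi in zip(x, grad_f(x))])
    return xs


def iteration_bound(L, dist0_sq, eps):
    """Smallest k with L * ||x0 - x*||^2 / (2k) <= eps."""
    return math.ceil(L * dist0_sq / (2.0 * eps))


# Illustrative problem (an assumption): f(x) = 0.5 * sum(a_i * x_i^2),
# with minimizer x* = 0, optimal value f* = 0, and L = max(a_i).
a = [1.0, 0.1, 0.01]
L = max(a)


def f(x):
    return 0.5 * sum(ai * xi * xi for ai, xi in zip(a, x))


def grad_f(x):
    return [ai * xi for ai, xi in zip(a, x)]


x0 = [1.0, 1.0, 1.0]
dist0_sq = sum(xi * xi for xi in x0)  # ||x0 - x*||^2 because x* = 0
iterates = gradient_descent(grad_f, x0, 1.0 / L, 200)
gaps = [f(x) for x in iterates]  # f(x_k) - f*, since f* = 0
```

On well-behaved instances like this one the observed gap is usually far below the worst-case bound, so `iteration_bound` should be read as a guarantee rather than a prediction.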

Page 8

Accelerated first-order method (Nesterov '83, '00s; Beck & Teboulle '09)

•  Minimize upper approximation at an intermediate point.

•  If $\mu \le 1/L$ then

Page 9

Complexity of accelerated first-order method (Nesterov '83, '00s; Beck & Teboulle '09)

•  Minimize upper approximation at an intermediate point.

•  If $\mu \le 1/L$, then the method finds an $\epsilon$-optimal solution in $O(\sqrt{L/\epsilon})$ iterations

This method is optimal if only gradient information is used.
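A minimal sketch of the accelerated scheme for a smooth objective, assuming the momentum sequence $t_{k+1} = (1 + \sqrt{1 + 4t_k^2})/2$ from Beck & Teboulle's FISTA with the nonsmooth term set to zero. The test problem and the function name `accelerated_gradient` are illustrative assumptions. For $\mu = 1/L$, the known guarantee for this scheme is $f(x_k) - f^* \le 2L\|x_0 - x^*\|^2/(k+1)^2$.

```python
import math


def accelerated_gradient(grad_f, x0, mu, iters):
    """Gradient steps taken from an extrapolated (intermediate) point y_k."""
    x_prev = list(x0)
    y = list(x0)
    t = 1.0
    for _ in range(iters):
        g = grad_f(y)
        # Minimize the quadratic upper approximation built at y, not at x.
        x = [yi - mu * gi for yi, gi in zip(y, g)]
        t_next = 0.5 * (1.0 + math.sqrt(1.0 + 4.0 * t * t))
        beta = (t - 1.0) / t_next
        # Extrapolate past the new iterate to get the next intermediate point.
        y = [xi + beta * (xi - xpi) for xi, xpi in zip(x, x_prev)]
        x_prev, t = x, t_next
    return x_prev


# Illustrative ill-conditioned problem (an assumption):
# f(x) = 0.5 * sum(a_i * x_i^2), minimizer x* = 0, f* = 0, L = max(a_i).
a = [1.0, 0.01, 0.0001]
L = max(a)


def f(x):
    return 0.5 * sum(ai * xi * xi for ai, xi in zip(a, x))


def grad_f(x):
    return [ai * xi for ai, xi in zip(a, x)]


x0 = [1.0, 1.0, 1.0]
x_final = accelerated_gradient(grad_f, x0, 1.0 / L, 200)
```

The only change relative to plain gradient descent is where the gradient is evaluated; the per-iteration cost is still one gradient.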

Page 10

Page 11

Slides from L. Vandenberghe http://www.ee.ucla.edu/~vandenbe/ee236c.html

Page 12

FISTA method (Beck & Teboulle '09)

•  Minimize upper approximation at an intermediate point.

•  If $\mu \le 1/L$, then the method finds an $\epsilon$-optimal solution in $O(\sqrt{L/\epsilon})$ iterations

Page 13

Slides from L. Vandenberghe http://www.ee.ucla.edu/~vandenbe/ee236c.html

µ is not a prox parameter here

Page 14

Slides from L. Vandenberghe http://www.ee.ucla.edu/~vandenbe/ee236c.html

Page 15

Slides from L. Vandenberghe http://www.ee.ucla.edu/~vandenbe/ee236c.html

Page 16

Unconstrained formulation of the SVM problem

Page 17

SVM problem using Huber loss function
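To make the idea concrete, here is a sketch of one common way to smooth the hinge loss so that first-order methods with a Lipschitz gradient apply. It assumes the unconstrained objective $\frac{\lambda}{2}\|w\|^2 + \sum_i \ell(y_i w^\top x_i)$ and a "Huberized" hinge that is quadratic within $h$ of the kink at margin 1. The exact loss used in the lecture is not shown here, so the formula, the parameter `h`, and all function names are assumptions.

```python
def hinge(t):
    """Hinge loss as a function of the margin t = y * (w . x)."""
    return max(0.0, 1.0 - t)


def huber_hinge(t, h=0.5):
    """Huberized hinge: quadratic within h of the kink at t = 1, linear below."""
    if t > 1.0 + h:
        return 0.0
    if t < 1.0 - h:
        return 1.0 - t
    return (1.0 + h - t) ** 2 / (4.0 * h)


def huber_hinge_grad(t, h=0.5):
    """Derivative of huber_hinge in t; continuous, Lipschitz with constant 1/(2h)."""
    if t > 1.0 + h:
        return 0.0
    if t < 1.0 - h:
        return -1.0
    return -(1.0 + h - t) / (2.0 * h)


def svm_objective(w, data, lam, h=0.5):
    """(lam/2) * ||w||^2 + sum_i huber_hinge(y_i * (w . x_i)), data = [(x_i, y_i)]."""
    reg = 0.5 * lam * sum(wi * wi for wi in w)
    loss = sum(
        huber_hinge(y * sum(wi * xi for wi, xi in zip(w, x)), h) for x, y in data
    )
    return reg + loss
```

The smoothed loss differs from the hinge by at most $h/4$ (attained at margin 1), so a smaller `h` tracks the hinge more closely at the price of a larger Lipschitz constant for the gradient.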

Page 18

First order methods for composite functions

Page 19

Examples

•  Lasso or compressed sensing (CS)

•  Group Lasso or multiple measurement vectors (MMV)

•  Matrix Completion

•  Robust PCA

•  Sparse inverse covariance selection (SICS)
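For orientation, these problems are commonly written in the composite form "smooth term plus nonsmooth regularizer" as sketched below. The notation is an assumption chosen for this sketch ($\rho$ for the regularization weight, $G_j$ for index groups, $\Omega$ for the set of observed entries, $S$ for a sample covariance matrix), and the lecture's own formulations may differ in scaling or constraints.

```latex
\begin{align*}
\text{Lasso / CS:} \quad & \min_x \; \tfrac12 \|Ax - b\|_2^2 + \rho \|x\|_1 \\
\text{Group Lasso / MMV:} \quad & \min_x \; \tfrac12 \|Ax - b\|_2^2 + \rho \sum_{j} \|x_{G_j}\|_2 \\
\text{Matrix completion:} \quad & \min_X \; \tfrac12 \sum_{(i,j) \in \Omega} (X_{ij} - M_{ij})^2 + \rho \|X\|_* \\
\text{Robust PCA:} \quad & \min_{X, Y} \; \|X\|_* + \rho \|Y\|_1 \quad \text{s.t.} \quad X + Y = M \\
\text{SICS:} \quad & \max_{X \succ 0} \; \log\det X - \langle S, X \rangle - \rho \|X\|_1
\end{align*}
```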

Page 20

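Prox method with nonsmooth term

•  Consider: $\min_x F(x) = f(x) + g(x)$, where $f$ is convex with Lipschitz continuous gradient and $g$ is convex but possibly nonsmooth

•  Quadratic upper approximation: $F(y) \le f(x) + \nabla f(x)^\top (y - x) + \frac{1}{2\mu}\|y - x\|^2 + g(y) = Q_{f,\mu}(x, y)$ for $\mu \le 1/L$

Assume that $g(y)$ is such that the above function is easy to optimize over $y$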

Page 21

Example 1 (Lasso and SICS)

•  Minimize upper approximation function $Q_{f,\mu}(x, y)$ on each iteration

Closed form solution!

$O(n)$ effort
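The closed form in the $\ell_1$ case is the soft-thresholding (shrinkage) operator applied to the gradient step, one pass over the coordinates. The sketch below assumes the regularizer $g(y) = \rho\|y\|_1$; the names `soft_threshold` and `lasso_prox_step` are hypothetical.

```python
import math


def soft_threshold(z, tau):
    """Componentwise shrinkage: argmin_y 0.5 * (y - z)^2 + tau * |y|."""
    return [math.copysign(max(abs(zi) - tau, 0.0), zi) for zi in z]


def lasso_prox_step(x, grad, mu, rho):
    """Minimizer over y of grad.(y - x) + ||y - x||^2 / (2 * mu) + rho * ||y||_1."""
    # Completing the square shows this is shrinkage of the gradient step
    # x - mu * grad with threshold mu * rho.
    return soft_threshold([xi - mu * gi for xi, gi in zip(x, grad)], mu * rho)
```

For example, `soft_threshold([3.0, -0.5, -2.0], 1.0)` moves each entry toward zero by 1 and zeroes the entry whose magnitude is below the threshold.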

Page 22

Page 23

Example 2 (Group Lasso)

Very similar to the previous case, but with $\|\cdot\|$ instead of $|\cdot|$

Closed form solution!

$O(n)$ effort
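A sketch of the corresponding closed form, assuming the regularizer is a sum of Euclidean norms over non-overlapping index groups: each block is scaled toward zero as a whole, or set to zero when its norm is below the threshold. The names `group_soft_threshold` and `group_lasso_prox` are hypothetical.

```python
import math


def group_soft_threshold(z, tau):
    """Shrink the whole block z: argmin_y 0.5 * ||y - z||^2 + tau * ||y||."""
    norm = math.sqrt(sum(zi * zi for zi in z))
    if norm <= tau:
        return [0.0] * len(z)
    scale = 1.0 - tau / norm
    return [scale * zi for zi in z]


def group_lasso_prox(z, groups, tau):
    """Apply group_soft_threshold per index group; groups must partition the indices."""
    out = [0.0] * len(z)
    for idx in groups:
        block = group_soft_threshold([z[i] for i in idx], tau)
        for i, v in zip(idx, block):
            out[i] = v
    return out


shrunk = group_lasso_prox([3.0, 4.0, 0.3, 0.4], [[0, 1], [2, 3]], 1.0)
```

Here the first group has norm 5 and is scaled by 0.8, while the second group has norm 0.5 and is zeroed out entirely, which is how the group penalty selects or discards whole blocks.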

Page 24

Example 3 (Collaborative Prediction)

Closed form solution!

O(n^3) effort
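The closed form here is usually stated as singular value thresholding. The sketch below assumes a nuclear-norm regularizer $\rho\|Y\|_*$ and writes $Z$ for the gradient-step point; this notation is an assumption, not taken from the slide. The cubic cost comes from the singular value decomposition of an $n \times n$ matrix.

```latex
\begin{align*}
Z &= X - \mu \nabla f(X) = U \,\mathrm{diag}(\sigma_1, \dots, \sigma_n)\, V^\top, \\
\arg\min_Y \; \frac{1}{2\mu}\|Y - Z\|_F^2 + \rho \|Y\|_*
  &= U \,\mathrm{diag}\bigl(\max(\sigma_i - \mu\rho,\, 0)\bigr)\, V^\top .
\end{align*}
```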

Page 25

ISTA/Gradient prox method

•  Minimize quadratic upper approximation on each iteration

•  If $\mu \le 1/L$, then the method finds an $\epsilon$-optimal solution in $O(L/\epsilon)$ iterations
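A minimal pure-Python sketch of ISTA for the Lasso objective $\tfrac12\|Ax - b\|^2 + \rho\|x\|_1$ with a fixed step $\mu \le 1/L$. The data below are an illustrative assumption: $A$ is diagonal so that $L = 4$ and the minimizer $x^* = (2, 0.25)$ can be verified by hand. All function names are hypothetical.

```python
import math


def matvec(A, x):
    """Compute A x for a row-major list-of-lists A."""
    return [sum(aij * xj for aij, xj in zip(row, x)) for row in A]


def rmatvec(A, r):
    """Compute A^T r for a row-major list-of-lists A."""
    return [sum(A[i][j] * r[i] for i in range(len(A))) for j in range(len(A[0]))]


def soft_threshold(z, tau):
    return [math.copysign(max(abs(zi) - tau, 0.0), zi) for zi in z]


def lasso_objective(A, b, rho, x):
    r = [ri - bi for ri, bi in zip(matvec(A, x), b)]
    return 0.5 * sum(ri * ri for ri in r) + rho * sum(abs(xi) for xi in x)


def ista(A, b, rho, mu, iters):
    """Prox-gradient iterations for the Lasso objective, started at x = 0."""
    x = [0.0] * len(A[0])
    for _ in range(iters):
        residual = [ri - bi for ri, bi in zip(matvec(A, x), b)]
        grad = rmatvec(A, residual)  # gradient of the smooth part
        x = soft_threshold([xi - mu * gi for xi, gi in zip(x, grad)], mu * rho)
    return x


# Illustrative data (an assumption): L = largest eigenvalue of A^T A = 4.
A = [[1.0, 0.0], [0.0, 2.0]]
b = [3.0, 1.0]
x_hat = ista(A, b, rho=1.0, mu=0.25, iters=200)
```

Each iteration costs two matrix-vector products plus an $O(n)$ shrinkage, which is what makes the method attractive for large sparse problems.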

Page 26

Fast first-order method (Nesterov; Beck & Teboulle)

•  Minimize upper approximation at an “accelerated” point.

•  If $\mu \le 1/L$, then the method finds an $\epsilon$-optimal solution in $O(\sqrt{L/\epsilon})$ iterations
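A sketch of the accelerated variant for the Lasso objective $\tfrac12\|Ax - b\|^2 + \rho\|x\|_1$, following the constant-step FISTA update of Beck & Teboulle: the shrinkage step is taken from an extrapolated point $y_k$ instead of $x_k$. The small diagonal test problem (with $L = 4$, $x^* = (2, 0.25)$, optimal value 2.875) and the function names are illustrative assumptions.

```python
import math


def matvec(A, x):
    return [sum(aij * xj for aij, xj in zip(row, x)) for row in A]


def rmatvec(A, r):
    """Compute A^T r for a row-major list-of-lists A."""
    return [sum(A[i][j] * r[i] for i in range(len(A))) for j in range(len(A[0]))]


def soft_threshold(z, tau):
    return [math.copysign(max(abs(zi) - tau, 0.0), zi) for zi in z]


def lasso_objective(A, b, rho, x):
    r = [ri - bi for ri, bi in zip(matvec(A, x), b)]
    return 0.5 * sum(ri * ri for ri in r) + rho * sum(abs(xi) for xi in x)


def fista(A, b, rho, mu, iters):
    """FISTA with constant step mu <= 1/L, started at x = 0."""
    x_prev = [0.0] * len(A[0])
    y = list(x_prev)
    t = 1.0
    for _ in range(iters):
        residual = [ri - bi for ri, bi in zip(matvec(A, y), b)]
        grad = rmatvec(A, residual)  # gradient of the smooth part at y
        x = soft_threshold([yi - mu * gi for yi, gi in zip(y, grad)], mu * rho)
        t_next = 0.5 * (1.0 + math.sqrt(1.0 + 4.0 * t * t))
        beta = (t - 1.0) / t_next
        y = [xi + beta * (xi - xpi) for xi, xpi in zip(x, x_prev)]
        x_prev, t = x, t_next
    return x_prev


# Illustrative data (an assumption): L = 4, optimal value F* = 2.875.
A = [[1.0, 0.0], [0.0, 2.0]]
b = [3.0, 1.0]
x_hat = fista(A, b, rho=1.0, mu=0.25, iters=100)
```

Unlike ISTA, the objective values produced by this scheme need not decrease monotonically; the guarantee is on the gap $F(x_k) - F^* \le 2L\|x_0 - x^*\|^2/(k+1)^2$.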

Page 27

Practical first order algorithms using backtracking search

Page 28

Iterative Shrinkage Thresholding Algorithm (ISTA)

Nesterov '07; Beck & Teboulle, Tseng, Auslender & Teboulle '08

•  Minimize quadratic upper relaxation on each iteration

•  Using line search find $\mu_k$ such that

•  In $O(1/(\mu_{\min}\epsilon))$ iterations finds $\epsilon$-optimal solution (in practice better)
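A sketch of ISTA with a backtracking search on the prox parameter, for the Lasso objective $\tfrac12\|Ax - b\|^2 + \rho\|x\|_1$. The acceptance test assumed here is the standard sufficient-decrease condition that the smooth part at the trial point lies below its quadratic model, $f(p) \le f(x_k) + \nabla f(x_k)^\top (p - x_k) + \frac{1}{2\mu_k}\|p - x_k\|^2$. Restarting every search from the same trial value `mu0` and halving on failure are illustrative design choices, and the function names and test data are hypothetical.

```python
import math


def matvec(A, x):
    return [sum(aij * xj for aij, xj in zip(row, x)) for row in A]


def rmatvec(A, r):
    """Compute A^T r for a row-major list-of-lists A."""
    return [sum(A[i][j] * r[i] for i in range(len(A))) for j in range(len(A[0]))]


def soft_threshold(z, tau):
    return [math.copysign(max(abs(zi) - tau, 0.0), zi) for zi in z]


def lasso_objective(A, b, rho, x):
    r = [ri - bi for ri, bi in zip(matvec(A, x), b)]
    return 0.5 * sum(ri * ri for ri in r) + rho * sum(abs(xi) for xi in x)


def ista_backtracking(A, b, rho, mu0, iters, shrink=0.5):
    """ISTA where each mu_k is found by backtracking from the trial value mu0."""
    x = [0.0] * len(A[0])
    steps = []
    for _ in range(iters):
        residual = [ri - bi for ri, bi in zip(matvec(A, x), b)]
        fx = 0.5 * sum(ri * ri for ri in residual)
        grad = rmatvec(A, residual)
        mu = mu0
        while True:
            p = soft_threshold([xi - mu * gi for xi, gi in zip(x, grad)], mu * rho)
            d = [pi - xi for pi, xi in zip(p, x)]
            model = (fx + sum(gi * di for gi, di in zip(grad, d))
                     + sum(di * di for di in d) / (2.0 * mu))
            new_res = [ri - bi for ri, bi in zip(matvec(A, p), b)]
            # Accept once the smooth part is below its quadratic model
            # (small slack for rounding); this always holds for mu <= 1/L.
            if 0.5 * sum(ri * ri for ri in new_res) <= model + 1e-12:
                break
            mu *= shrink
        x = p
        steps.append(mu)
    return x, steps


# Illustrative data (an assumption): L = 4, minimizer [2, 0.25], F* = 2.875.
A = [[1.0, 0.0], [0.0, 2.0]]
b = [3.0, 1.0]
x_hat, steps = ista_backtracking(A, b, rho=1.0, mu0=1.0, iters=100)
```

Because the search restarts from `mu0` at every iteration, steps larger than $1/L$ are accepted whenever the local curvature allows, which is one reason practical performance tends to beat the worst-case bound.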

Page 29

Fast Iterative Shrinkage Thresholding Algorithm (FISTA)

Nesterov, Beck & Teboulle, Tseng

•  Minimize quadratic upper relaxation on each iteration

•  Using line search find $\mu_k \le \mu_{k-1}$ such that

•  In $O(\sqrt{1/(\mu_{\min}\epsilon)})$ iterations finds $\epsilon$-optimal solution

The condition $\mu_k \le \mu_{k-1}$ is very restrictive.

Page 30

FISTA with line search (Goldfarb and Scheinberg 2010)

•  ISTA’s complexity is $O(L/\epsilon)$ while FISTA’s is $O(\sqrt{L/\epsilon})$

•  However, FISTA’s condition $\mu_k \le \mu_{k-1}$ often slows down practical performance, and simply ignoring the condition does not help.

•  We want to modify the FISTA algorithm to relax $\mu_k \le \mu_{k-1}$ while maintaining the complexity bound, or maybe even improving it

Page 31

Find $\mu_k \le \mu_{k-1}$ such that

Cycle to find $\mu_k$

Convergence rate:

Page 32

Find $\mu_k$ such that

Goldfarb & Scheinberg 2011

This condition…

… gives this bound on the error

Page 33

FISTA with full line search (Goldfarb & Scheinberg 2011)

Find $\mu_k$ such that

Cycle to find $\mu$ and $t$