Transcript
Introduction to Optimization Theory
Lecture #7 - 10/6/20
MS&E 213 / CS 2690
Aaron Sidford
sidford@stanford.edu
Plan for Today
Recap
• Accelerated Gradient Descent (AGD)
Proof
• Approximately optimal AGD for smooth strongly convex functions
Extensions
• Non-strongly convex
• Optimal complexity
• Momentum
Thursday
• Generalizations and applications
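Recap
Regularity | Oracle | Goal | Algorithm | Iterations
$n = 1$, $f(x) \in [0,1]$, $x_* \in [0,1]$ | value | ½-optimal | anything | $\infty$
$n = 1$, $x_* \in [0,1]$, $L$-Lipschitz | value | $\epsilon$-optimal | $\epsilon$-net | $\Theta(L/\epsilon)$
$x_* \in [0,1]^n$, $L$-Lipschitz in $\|\cdot\|_\infty$ | value | $\epsilon$-optimal | $\epsilon$-net | $\Theta((L/\epsilon)^n)$
$L$-smooth and bounded | value, gradient | $\epsilon$-optimal | $\epsilon$-net | exponential
$L$-smooth | gradient | $\epsilon$-critical | gradient descent | $O\left(\frac{L[f(x_0) - f_*]}{\epsilon^2}\right)$
$L$-smooth, $\mu$-strongly convex | gradient | $\epsilon$-optimal | gradient descent | $O\left(\frac{L}{\mu}\log\frac{f(x_0) - f_*}{\epsilon}\right)$
$L$-smooth, convex | gradient | $\epsilon$-optimal | gradient descent | $O\left(\frac{L\|x_0 - x_*\|_2^2}{\epsilon}\right)$
Problem: $\min_{x \in \mathbb{R}^n} f(x)$
Today: prove and discuss improvements to $O\left(\frac{L}{\mu}\log\frac{f(x_0) - f_*}{\epsilon}\right)$ and $O\left(\frac{L\|x_0 - x_*\|_2^2}{\epsilon}\right)$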
Recap
Theorem: $f : \mathbb{R}^n \to \mathbb{R}$ is $L$-smooth and $\mu$-strongly convex (with respect to $\|\cdot\|_2$) if and only if the following hold for all $x, y$
• $f(y) \le U_x(y) := f(x) + \nabla f(x)^\top (y - x) + \frac{L}{2}\|y - x\|_2^2$
• $f(y) \ge L_x(y) := f(x) + \nabla f(x)^\top (y - x) + \frac{\mu}{2}\|y - x\|_2^2$
Goal #1: improve $O\left(\frac{L}{\mu}\log\frac{f(x_0) - f_*}{\epsilon}\right)$ to ~$O\left(\sqrt{\frac{L}{\mu}}\log\frac{f(x_0) - f_*}{\epsilon}\right)$
Approach
• Maintain point ($x_k \in \mathbb{R}^n$)
• Maintain lower bound
  • $L_k : \mathbb{R}^n \to \mathbb{R}$ s.t.
  • $L_k(x) \le f(x)$ for all $x \in \mathbb{R}^n$
• Update both in each iteration
Progress Measure
• $f(x_k) - \min_{x \in \mathbb{R}^n} L_k(x)$
• $\ge f(x_k) - L_k(x_*) \ge f(x_k) - f_*$
• There are many to choose from
• This one is intuitive and mechanical and will let us touch on ideas of other proofs
• Will lose a logarithmic factor and will explain how to remove it
Tools: Quadratic Lower Bounds
Lemma 1: $L_y(x) = f(y) + \nabla f(y)^\top (x - y) + \frac{\mu}{2}\|x - y\|_2^2 = \psi_y + \frac{\mu}{2}\|x - v_y\|_2^2$
for $\psi_y = f(y) - \frac{1}{2\mu}\|\nabla f(y)\|_2^2$ and $v_y = y - \frac{1}{\mu}\nabla f(y)$.
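Lemma 1 follows by completing the square. The short derivation below is a sketch added for reference; it uses only the definitions of $\psi_y$ and $v_y$ from the lemma.

```latex
\begin{align*}
\frac{\mu}{2}\|x - v_y\|_2^2
  &= \frac{\mu}{2}\Big\|(x - y) + \frac{1}{\mu}\nabla f(y)\Big\|_2^2 \\
  &= \frac{\mu}{2}\|x - y\|_2^2 + \nabla f(y)^\top (x - y)
     + \frac{1}{2\mu}\|\nabla f(y)\|_2^2 , \\
\psi_y + \frac{\mu}{2}\|x - v_y\|_2^2
  &= f(y) + \nabla f(y)^\top (x - y) + \frac{\mu}{2}\|x - y\|_2^2 = L_y(x).
\end{align*}
```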
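Lemma 2: If $f_1, f_2 : \mathbb{R}^n \to \mathbb{R}$ are defined for all $x \in \mathbb{R}^n$ by $f_1(x) = \psi_1 + \frac{\mu}{2}\|x - v_1\|_2^2$ and $f_2(x) = \psi_2 + \frac{\mu}{2}\|x - v_2\|_2^2$
Then for all $\alpha \in [0,1]$ we have
$f_\alpha(x) = \alpha \cdot f_1(x) + (1 - \alpha) \cdot f_2(x) = \psi_\alpha + \frac{\mu}{2}\|x - v_\alpha\|_2^2$
Where
• $v_\alpha = \alpha \cdot v_1 + (1 - \alpha) \cdot v_2$
• $\psi_\alpha = \alpha \cdot \psi_1 + (1 - \alpha) \cdot \psi_2 + \frac{\mu}{2}\alpha(1 - \alpha)\|v_1 - v_2\|_2^2$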
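A quick numerical sanity check of Lemma 2 in one dimension. This is a minimal sketch: the helper names `quad` and `combine` and the sample values are illustrative assumptions, not part of the lecture.

```python
def quad(psi, v, mu):
    """Return the 1-D function x -> psi + (mu/2) * (x - v)**2."""
    return lambda x: psi + 0.5 * mu * (x - v) ** 2


def combine(psi1, v1, psi2, v2, mu, alpha):
    """Parameters (psi_alpha, v_alpha) of alpha*f1 + (1-alpha)*f2, per Lemma 2."""
    v = alpha * v1 + (1 - alpha) * v2
    psi = (alpha * psi1 + (1 - alpha) * psi2
           + 0.5 * mu * alpha * (1 - alpha) * (v1 - v2) ** 2)
    return psi, v


# Arbitrary sample values (assumptions for illustration only).
mu, alpha = 2.0, 0.3
psi1, v1, psi2, v2 = 1.0, -1.0, 4.0, 3.0
f1, f2 = quad(psi1, v1, mu), quad(psi2, v2, mu)
psi_a, v_a = combine(psi1, v1, psi2, v2, mu, alpha)
f_a = quad(psi_a, v_a, mu)

# Largest disagreement between the convex combination and the closed form.
max_gap = max(abs(alpha * f1(x) + (1 - alpha) * f2(x) - f_a(x))
              for x in (-2.0, 0.0, 0.5, 5.0))
```

The two sides agree up to floating-point rounding, which is what the lemma predicts for any choice of the sample values.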
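Accelerated Gradient Descent (AGD)
• Initial $x_0 \in \mathbb{R}^n$, $L_0(x) = \psi_0 + \frac{\mu}{2}\|x - v_0\|_2^2$ s.t. $f(x) \ge L_0(x)$ for all $x$
• Repeat for $k = 0, 1, 2, \ldots$
  • $y_k = \alpha \cdot x_k + (1 - \alpha) \cdot v_k$ where $\alpha \in [0,1]$
  • $L_{y_k}(x) = f(y_k) + \nabla f(y_k)^\top (x - y_k) + \frac{\mu}{2}\|x - y_k\|_2^2$
  • $L_{k+1}(x) = \psi_{k+1} + \frac{\mu}{2}\|x - v_{k+1}\|_2^2 = \beta L_k(x) + (1 - \beta) L_{y_k}(x)$ where $\beta \in [0,1]$
  • $x_{k+1} = y_k - \frac{1}{L}\nabla f(y_k)$
Theorem: $L_k(x) \le f(x)$ for all $k \ge 0$ and $x \in \mathbb{R}^n$. If $\kappa = \frac{L}{\mu}$, $\alpha = \frac{\sqrt{\kappa}}{\sqrt{\kappa} + 1}$, and $\beta = 1 - \kappa^{-1/2}$, then
$f(x_{k+1}) - \psi_{k+1} \le \left(1 - \frac{1}{\sqrt{\kappa}}\right)\left(f(x_k) - \psi_k\right)$
and ~$\sqrt{\kappa}$ iterations suffice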
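The scheme can be sketched in plain Python, tracking the lower bound $L_k$ through its parameters $(\psi_k, v_k)$ using Lemmas 1 and 2. This is a sketch under stated assumptions: the function name `agd_with_lower_bound` and the diagonal quadratic test problem are hypothetical choices for illustration.

```python
import math


def agd_with_lower_bound(f, grad, x0, psi0, v0, L, mu, iters):
    """Run AGD, tracking L_k(x) = psi_k + (mu/2)||x - v_k||^2.

    Returns the gaps f(x_k) - psi_k for k = 0..iters.
    """
    sk = math.sqrt(L / mu)
    alpha, beta = sk / (sk + 1), 1 - 1 / sk
    x, v, psi = list(x0), list(v0), psi0
    gaps = [f(x) - psi]
    for _ in range(iters):
        y = [alpha * xi + (1 - alpha) * vi for xi, vi in zip(x, v)]
        g = grad(y)
        g2 = sum(gi * gi for gi in g)
        # Lemma 1: L_{y_k}(x) = psi_y + (mu/2)||x - v_y||^2
        psi_y = f(y) - g2 / (2 * mu)
        v_y = [yi - gi / mu for yi, gi in zip(y, g)]
        # Lemma 2: L_{k+1} = beta * L_k + (1 - beta) * L_{y_k}
        d2 = sum((a - b) ** 2 for a, b in zip(v, v_y))
        psi = beta * psi + (1 - beta) * psi_y + 0.5 * mu * beta * (1 - beta) * d2
        v = [beta * a + (1 - beta) * b for a, b in zip(v, v_y)]
        x = [yi - gi / L for yi, gi in zip(y, g)]  # gradient step from y_k
        gaps.append(f(x) - psi)
    return gaps


# Hypothetical test problem: f(x) = (L*x1^2 + mu*x2^2) / 2, minimized at 0.
L, mu = 100.0, 1.0
f = lambda x: 0.5 * (L * x[0] ** 2 + mu * x[1] ** 2)
grad = lambda x: [L * x[0], mu * x[1]]
x0 = [1.0, 1.0]
g0 = grad(x0)
psi0 = f(x0) - sum(gi * gi for gi in g0) / (2 * mu)  # Lemma 1 at y = x0
v0 = [xi - gi / mu for xi, gi in zip(x0, g0)]
gaps = agd_with_lower_bound(f, grad, x0, psi0, v0, L, mu, iters=50)
rate = 1 - 1 / math.sqrt(L / mu)
```

On this example each entry of `gaps` should be at most `rate` times the previous one, matching the per-iteration contraction claimed in the theorem.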
Proof?
Accelerated Gradient Descent (AGD)
• Since $L_{k+1} = \beta L_k + (1 - \beta) L_{y_k}$, Lemmas 1 and 2 give the update for the center of the lower bound:
  • $v_{k+1} = \beta v_k + (1 - \beta)\left(y_k - \frac{1}{\mu}\nabla f(y_k)\right)$
Analysis?
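Some Intuition
• Initial $x_0 \in \mathbb{R}^n$, $L_0(x) = \psi_0 + \frac{\mu}{2}\|x - v_0\|_2^2$ s.t. $f(x) \ge L_0(x)$ for all $x$
• Repeat for $k = 0, 1, 2, \ldots$
  • $y_k = \alpha \cdot x_k + (1 - \alpha) \cdot v_k$ where $\alpha = \frac{\sqrt{\kappa}}{\sqrt{\kappa} + 1}$ and $\kappa = \frac{L}{\mu}$
  • $v_{k+1} = \beta v_k + (1 - \beta)\left(y_k - \frac{1}{\mu}\nabla f(y_k)\right)$ where $\beta = 1 - \frac{1}{\sqrt{\kappa}}$
  • $x_{k+1} = y_k - \frac{1}{L}\nabla f(y_k)$
Note
• $\kappa \uparrow \Rightarrow \alpha \uparrow$ (i.e., the more we use the gradient point $x_k$)
• $\kappa \uparrow \Rightarrow \beta \uparrow$ (i.e., the less we use the new lower bound)
• $\kappa \uparrow \Rightarrow (1 - \beta)/\mu \uparrow$ (i.e., the bigger the "gradient step" for $v_{k+1}$)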
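The trends in the note can be tabulated directly. A small sketch: the helper `agd_params` is a hypothetical name, and fixing $L = 1$ while shrinking $\mu$ is an assumption about how $\kappa$ grows.

```python
import math


def agd_params(L, mu):
    """Return (alpha, beta, v_step) for kappa = L / mu."""
    sk = math.sqrt(L / mu)
    alpha = sk / (sk + 1)
    beta = 1 - 1 / sk
    v_step = (1 - beta) / mu  # step size on the gradient in the v update
    return alpha, beta, v_step


# Assumption: L is fixed at 1 and mu = 1 / kappa shrinks as kappa grows.
kappas = (1, 4, 100, 10000)
rows = [agd_params(1.0, 1.0 / kappa) for kappa in kappas]
for kappa, (alpha, beta, v_step) in zip(kappas, rows):
    print(f"kappa={kappa:>6}  alpha={alpha:.4f}  beta={beta:.4f}  v_step={v_step:.2f}")
```

All three quantities increase with $\kappa$ in this setup; note that $(1 - \beta)/\mu = 1/\sqrt{L\mu}$, so the last trend depends on $\mu$ shrinking rather than $L$ growing.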
Analysis?
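Proof Plan
Theorem: $L_k(x) \le f(x)$ for all $k \ge 0$ and $x \in \mathbb{R}^n$, and if $\kappa = \frac{L}{\mu}$, $\alpha = \frac{\sqrt{\kappa}}{\sqrt{\kappa} + 1}$, and $\beta = 1 - \frac{1}{\sqrt{\kappa}}$, then
$f(x_{k+1}) - \psi_{k+1} \le \left(1 - \frac{1}{\sqrt{\kappa}}\right)\left(f(x_k) - \psi_k\right)$
and ~$\sqrt{\kappa}$ iterations suffice
Plan (since the fact $L_k(x) \le f(x)$ is immediate)
• Upper bound $f(x_{k+1})$ (gradient descent step)
• Lower bound $\psi_{k+1}$ (lower bound combination analysis)
• Leverage choice of $y_k$ (algebra)
• Pick $\alpha$ and $\beta$ so everything cancels (more algebra)
Algorithm (for reference)
• $y_k = \alpha \cdot x_k + (1 - \alpha) \cdot v_k$
• $L_{y_k}(x) = f(y_k) + \nabla f(y_k)^\top (x - y_k) + \frac{\mu}{2}\|x - y_k\|_2^2$
• $L_{k+1}(x) = \psi_{k+1} + \frac{\mu}{2}\|x - v_{k+1}\|_2^2 = \beta L_k(x) + (1 - \beta) L_{y_k}(x)$
• $x_{k+1} = y_k - \frac{1}{L}\nabla f(y_k)$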
Upper bound
• $f(x_{k+1}) \le\ ???$
• Gradient descent!
• $f(x_{k+1}) \le f(y_k) - \frac{1}{2L}\|\nabla f(y_k)\|_2^2$
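For reference, the upper bound is the standard gradient descent progress step. The sketch below applies the smoothness inequality $f(y) \le U_x(y)$ from the recap with $x = y_k$ and $y = x_{k+1} = y_k - \frac{1}{L}\nabla f(y_k)$.

```latex
\begin{align*}
f(x_{k+1})
  &\le f(y_k) + \nabla f(y_k)^\top (x_{k+1} - y_k) + \frac{L}{2}\|x_{k+1} - y_k\|_2^2 \\
  &= f(y_k) - \frac{1}{L}\|\nabla f(y_k)\|_2^2 + \frac{1}{2L}\|\nabla f(y_k)\|_2^2 \\
  &= f(y_k) - \frac{1}{2L}\|\nabla f(y_k)\|_2^2 .
\end{align*}
```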
Proof Plan
✓ Upper bound: $f(x_{k+1}) \le f(y_k) - \frac{1}{2L}\|\nabla f(y_k)\|_2^2$
• Lower bound $\psi_{k+1}$ (lower bound combination analysis)
• Leverage choice of $y_k$ (algebra)
• Pick $\alpha$ and $\beta$ so everything cancels (more algebra)
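Lower Bound
• Apply Tool #1
  • $L_{y_k}(x) = \psi_{y_k} + \frac{\mu}{2}\|x - v_{y_k}\|_2^2$
  • $\psi_{y_k} = f(y_k) - \frac{1}{2\mu}\|\nabla f(y_k)\|_2^2$ and $v_{y_k} = y_k - \frac{1}{\mu}\nabla f(y_k)$
• Apply Tool #2
  • $\psi_{k+1} = \beta\psi_k + (1 - \beta)\psi_{y_k} + \frac{\mu}{2}\beta(1 - \beta)\|v_k - v_{y_k}\|_2^2$
• Algebra
  • $\|v_k - v_{y_k}\|_2^2 = \|v_k - y_k\|_2^2 + \frac{2}{\mu}\nabla f(y_k)^\top (v_k - y_k) + \frac{1}{\mu^2}\|\nabla f(y_k)\|_2^2$
• More algebra
  • $\psi_{k+1} \ge \beta\psi_k + (1 - \beta)\left[f(y_k) - \frac{1 - \beta}{2\mu}\|\nabla f(y_k)\|_2^2 + \beta\nabla f(y_k)^\top (v_k - y_k)\right]$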
Proof Plan
✓ Upper bound: $f(x_{k+1}) \le f(y_k) - \frac{1}{2L}\|\nabla f(y_k)\|_2^2$
✓ Lower bound: $\psi_{k+1} \ge \beta\psi_k + (1 - \beta)\left[f(y_k) - \frac{1 - \beta}{2\mu}\|\nabla f(y_k)\|_2^2 + \beta\nabla f(y_k)^\top (v_k - y_k)\right]$
• Leverage choice of $y_k$ (algebra)
• Pick $\alpha$ and $\beta$ so everything cancels (more algebra)
Choice of $y_k$
• Goal
  • Lower bound $\nabla f(y_k)^\top (v_k - y_k)$
• Note
  • $(1 - \alpha)(v_k - y_k) + \alpha(x_k - y_k) = 0$
  • $v_k - y_k = \frac{\alpha}{1 - \alpha}(y_k - x_k)$
  • (note there is an $\alpha \in [0,1]$ s.t. $\frac{\alpha}{1 - \alpha} = \gamma$ for all $\gamma > 0$)
• Convexity
  • $f(x_k) \ge f(y_k) + \nabla f(y_k)^\top (x_k - y_k)$
  • (note, this is the first time we have used convexity between two points where one of the points is not $x_*$)
• Algebra
  • $\nabla f(y_k)^\top (v_k - y_k) \ge \frac{\alpha}{1 - \alpha}\left[f(y_k) - f(x_k)\right]$
Proof Plan
✓ Upper bound: $f(x_{k+1}) \le f(y_k) - \frac{1}{2L}\|\nabla f(y_k)\|_2^2$
✓ Lower bound: $\psi_{k+1} \ge \beta\psi_k + (1 - \beta)\left[f(y_k) - \frac{1 - \beta}{2\mu}\|\nabla f(y_k)\|_2^2 + \beta\nabla f(y_k)^\top (v_k - y_k)\right]$
✓ Choice of $y_k$: $\nabla f(y_k)^\top (v_k - y_k) \ge \frac{\alpha}{1 - \alpha}\left[f(y_k) - f(x_k)\right]$
• Pick $\alpha$ and $\beta$ so everything cancels (more algebra)
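Algebra
So far (since the fact $L_k(x) \le f(x)$ is immediate)
• Upper bound: $f(x_{k+1}) \le f(y_k) - \frac{1}{2L}\|\nabla f(y_k)\|_2^2$
• Lower bound: $\psi_{k+1} \ge \beta\psi_k + (1 - \beta)\left[f(y_k) - \frac{1 - \beta}{2\mu}\|\nabla f(y_k)\|_2^2 + \beta\nabla f(y_k)^\top (v_k - y_k)\right]$
• Choice of $y_k$: $\nabla f(y_k)^\top (v_k - y_k) \ge \frac{\alpha}{1 - \alpha}\left[f(y_k) - f(x_k)\right]$
Rearranging
• $f(x_{k+1}) - \psi_{k+1} \le f(y_k) - \frac{1}{2L}\|\nabla f(y_k)\|_2^2 - \beta\psi_k - (1 - \beta)\left[f(y_k) - \frac{1 - \beta}{2\mu}\|\nabla f(y_k)\|_2^2\right] - \beta(1 - \beta)\frac{\alpha}{1 - \alpha}\left[f(y_k) - f(x_k)\right]$
  $= \beta\left[\frac{\alpha(1 - \beta)}{1 - \alpha}f(x_k) - \psi_k\right] + \beta\left[1 - \frac{\alpha(1 - \beta)}{1 - \alpha}\right]f(y_k) + \frac{1}{2}\left[\frac{(1 - \beta)^2}{\mu} - \frac{1}{L}\right]\|\nabla f(y_k)\|_2^2$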
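Cancellations
Choice of $\beta$
• $\frac{(1 - \beta)^2}{\mu} - \frac{1}{L} = 0$
• $\Rightarrow (1 - \beta)^2 = \kappa^{-1}$
• $\Rightarrow \beta = 1 - \kappa^{-1/2}$
Choice of $\alpha$
• $\frac{\alpha(1 - \beta)}{1 - \alpha} = 1 \Rightarrow \frac{\alpha}{1 - \alpha} = \frac{1}{1 - \beta} = \sqrt{\kappa}$
• $\Rightarrow \alpha = \frac{\sqrt{\kappa}}{\sqrt{\kappa} + 1}$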
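Pick $\alpha$ and $\beta$ so extra terms cancel
$f(x_{k+1}) - \psi_{k+1} \le \beta\left[\frac{\alpha(1 - \beta)}{1 - \alpha}f(x_k) - \psi_k\right] + \beta\left[1 - \frac{\alpha(1 - \beta)}{1 - \alpha}\right]f(y_k) + \frac{1}{2}\left[\frac{(1 - \beta)^2}{\mu} - \frac{1}{L}\right]\|\nabla f(y_k)\|_2^2$
$\kappa = L/\mu$, $\alpha = \frac{\sqrt{\kappa}}{\sqrt{\kappa} + 1}$, $\beta = 1 - \kappa^{-1/2}$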
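The cancellation can be checked numerically. A minimal sketch: the helper `leftover_coefficients` is a hypothetical name, and the values $L = 100$, $\mu = 1$ are arbitrary.

```python
import math


def leftover_coefficients(L, mu):
    """Coefficients in the rearranged bound for the stated alpha and beta."""
    sk = math.sqrt(L / mu)
    alpha, beta = sk / (sk + 1), 1 - 1 / sk
    c_x = alpha * (1 - beta) / (1 - alpha)       # multiplies f(x_k) inside beta[...]; want 1
    c_y = beta * (1 - c_x)                       # multiplies f(y_k); want 0
    c_g = 0.5 * ((1 - beta) ** 2 / mu - 1 / L)   # multiplies ||grad f(y_k)||^2; want 0
    return c_x, c_y, c_g


c_x, c_y, c_g = leftover_coefficients(L=100.0, mu=1.0)
```

With `c_x` equal to 1 and the other two equal to 0 (up to rounding), the bound collapses to $f(x_{k+1}) - \psi_{k+1} \le \beta\,(f(x_k) - \psi_k)$.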
Proof Plan
✓ Upper bound: $f(x_{k+1}) \le f(y_k) - \frac{1}{2L}\|\nabla f(y_k)\|_2^2$
✓ Lower bound: $\psi_{k+1} \ge \beta\psi_k + (1 - \beta)\left[f(y_k) - \frac{1 - \beta}{2\mu}\|\nabla f(y_k)\|_2^2 + \beta\nabla f(y_k)^\top (v_k - y_k)\right]$
✓ Choice of $y_k$: $\nabla f(y_k)^\top (v_k - y_k) \ge \frac{\alpha}{1 - \alpha}\left[f(y_k) - f(x_k)\right]$
✓ Pick $\alpha$ and $\beta$ so everything cancels (more algebra)
How do we obtain $L_0$?
Initial Lower Bound?
• Goal: $L_0(x) = \psi_0 + \frac{\mu}{2}\|x - v_0\|_2^2$ s.t. $f(x) \ge L_0(x)$
• Idea: $L_{x_0}(x) = f(x_0) + \nabla f(x_0)^\top (x - x_0) + \frac{\mu}{2}\|x - x_0\|_2^2$
• $L_{x_0}(x) = \psi_0 + \frac{\mu}{2}\|x - v_0\|_2^2$
  • $\psi_0 = f(x_0) - \frac{1}{2\mu}\|\nabla f(x_0)\|_2^2$
  • $v_0 = x_0 - \frac{1}{\mu}\nabla f(x_0)$
• One gradient evaluation!
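A one-dimensional illustration of the initial lower bound. This is a sketch under assumptions: the test function $f(x) = \frac{1}{2}x^2 + \log(1 + e^x)$ is a hypothetical example that is 1-strongly convex, and `initial_lower_bound` is an illustrative name.

```python
import math

mu = 1.0  # f below satisfies f''(x) = 1 + sigmoid'(x) >= 1, so it is mu-strongly convex
f = lambda x: 0.5 * x * x + math.log1p(math.exp(x))
df = lambda x: x + 1.0 / (1.0 + math.exp(-x))


def initial_lower_bound(x0):
    """Return (psi_0, v_0) from a single gradient evaluation at x0."""
    g = df(x0)
    psi0 = f(x0) - g * g / (2 * mu)
    v0 = x0 - g / mu
    return psi0, v0


psi0, v0 = initial_lower_bound(x0=2.0)
L0 = lambda x: psi0 + 0.5 * mu * (x - v0) ** 2

# Smallest value of f - L0 on a grid; it should be nonnegative (zero at x0).
grid = [i / 10 for i in range(-50, 51)]
worst = min(f(x) - L0(x) for x in grid)
```

The bound touches $f$ at $x_0$ and stays below it elsewhere, as strong convexity requires.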
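A Proof!!
• For initial $x_0 \in \mathbb{R}^n$ compute $v_0 = x_0 - \frac{1}{\mu}\nabla f(x_0)$
• Repeat for $k = 0, 1, 2, \ldots$
  • $y_k = \alpha \cdot x_k + (1 - \alpha) \cdot v_k$ where $\alpha = \frac{\sqrt{\kappa}}{\sqrt{\kappa} + 1}$ and $\kappa = \frac{L}{\mu}$
  • $v_{k+1} = \beta v_k + (1 - \beta)\left(y_k - \frac{1}{\mu}\nabla f(y_k)\right)$ where $\beta = 1 - \frac{1}{\sqrt{\kappa}}$
  • $x_{k+1} = y_k - \frac{1}{L}\nabla f(y_k)$
• Theorem: $f(x_{k+1}) - \psi_{k+1} \le \left(1 - \frac{1}{\sqrt{\kappa}}\right)\left(f(x_k) - \psi_k\right)$ for all $k \ge 0$, where each $\psi_k \le f(x_*)$ and $\psi_0 = f(x_0) - \frac{1}{2\mu}\|\nabla f(x_0)\|_2^2$
• Corollary: Can compute $\epsilon$-optimal point in $O\left(\sqrt{\kappa}\log\left(\kappa[f(x_0) - f_*]/\epsilon\right)\right)$ queries!!!
• Proof: $\|\nabla f(x_0)\|_2^2 \le 2L[f(x_0) - f_*]$ and $f(x_k) - f_* \le \left(1 - \kappa^{-1/2}\right)^k \cdot 2\kappa[f(x_0) - f_*]$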
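The corollary's iteration count can be made concrete. A sketch, assuming the bound $(1 - \kappa^{-1/2})^k \le e^{-k/\sqrt{\kappa}}$; the helper `iterations_needed` and the sample numbers are hypothetical.

```python
import math


def iterations_needed(L, mu, init_gap, eps):
    """Smallest k with exp(-k / sqrt(kappa)) * 2 * kappa * init_gap <= eps.

    init_gap stands for f(x_0) - f_*, assumed known here for illustration.
    """
    kappa = L / mu
    return max(0, math.ceil(math.sqrt(kappa) * math.log(2 * kappa * init_gap / eps)))


L, mu, init_gap, eps = 100.0, 1.0, 50.5, 1e-6
k = iterations_needed(L, mu, init_gap, eps)
# Right-hand side of the proof's bound after k iterations.
bound_after_k = (1 - 1 / math.sqrt(L / mu)) ** k * 2 * (L / mu) * init_gap
```

Here `k` grows like $\sqrt{\kappa}$ times a logarithm, compared with $\kappa$ times a logarithm for plain gradient descent.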
How to improve?
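Improved Potential Function
• For initial $x_0 \in \mathbb{R}^n$ let $v_0 = x_0$
• Repeat for $k = 0, 1, 2, \ldots$
  • $y_k = \alpha \cdot x_k + (1 - \alpha) \cdot v_k$ where $\alpha = \frac{\sqrt{\kappa}}{\sqrt{\kappa} + 1}$ and $\kappa = \frac{L}{\mu}$
  • $v_{k+1} = \beta v_k + (1 - \beta)\left(y_k - \frac{1}{\mu}\nabla f(y_k)\right)$ where $\beta = 1 - \frac{1}{\sqrt{\kappa}}$
  • $x_{k+1} = y_k - \frac{1}{L}\nabla f(y_k)$
• Theorem: $\Phi_k = f(x_k) - f_* + \frac{\mu}{2}\|v_k - x_*\|_2^2$ satisfies $\Phi_{k+1} \le \left(1 - \kappa^{-1/2}\right)\Phi_k$ for all $k \ge 0$
• Corollary: Can compute $\epsilon$-optimal point in $O\left(\sqrt{\kappa}\log\left([f(x_0) - f_*]/\epsilon\right)\right)$ queries!!!
• Proof: $\frac{\mu}{2}\|x_0 - x_*\|_2^2 \le f(x_0) - f_*$
• Proof: $f(x_k) - f_* \le \Phi_k \le \left(1 - \kappa^{-1/2}\right)^k \Phi_0 \le \left(1 - \kappa^{-1/2}\right)^k \cdot 2[f(x_0) - f_*]$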
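The potential decrease can be observed on a small example. A sketch under assumptions: `potential_trace` is an illustrative name, and the diagonal quadratic with known minimizer $x_* = 0$ is a hypothetical test problem (the potential needs $x_*$ and $f_*$, which are only available in the analysis, not to the algorithm).

```python
import math


def potential_trace(f, grad, x0, x_star, f_star, L, mu, iters):
    """Run AGD with v_0 = x_0 and return the potential after each iteration."""
    sk = math.sqrt(L / mu)
    alpha, beta = sk / (sk + 1), 1 - 1 / sk
    x, v = list(x0), list(x0)

    def phi(x, v):
        dist2 = sum((vi - si) ** 2 for vi, si in zip(v, x_star))
        return f(x) - f_star + 0.5 * mu * dist2

    trace = [phi(x, v)]
    for _ in range(iters):
        y = [alpha * a + (1 - alpha) * b for a, b in zip(x, v)]
        g = grad(y)
        v = [beta * b + (1 - beta) * (c - d / mu) for b, c, d in zip(v, y, g)]
        x = [c - d / L for c, d in zip(y, g)]
        trace.append(phi(x, v))
    return trace


# Hypothetical test problem: f(x) = (L*x1^2 + mu*x2^2) / 2 with x_* = 0, f_* = 0.
L, mu = 100.0, 1.0
f = lambda x: 0.5 * (L * x[0] ** 2 + mu * x[1] ** 2)
grad = lambda x: [L * x[0], mu * x[1]]
x0 = [1.0, 1.0]
trace = potential_trace(f, grad, x0, [0.0, 0.0], 0.0, L, mu, iters=100)
rate = 1 - 1 / math.sqrt(L / mu)
```

Each entry of `trace` should be at most `rate` times the previous one, and the initial potential is at most $2[f(x_0) - f_*]$, which removes the extra $\kappa$ inside the logarithm.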
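Momentum?
Algorithm 1 (initial $x_0 \in \mathbb{R}^n$)
• Let $v_0 = x_0$
• Repeat for $k = 0, 1, 2, \ldots$
  • $y_k = \alpha \cdot x_k + (1 - \alpha) \cdot v_k$
  • $v_{k+1} = \beta v_k + (1 - \beta)\left(y_k - \frac{1}{\mu}\nabla f(y_k)\right)$
  • $x_{k+1} = y_k - \frac{1}{L}\nabla f(y_k)$
Algorithm 2 (initial $x_0 \in \mathbb{R}^n$)
• Let $x_1 = x_0 - \frac{1}{L}\nabla f(x_0)$
• Repeat for $k = 1, 2, \ldots$
  • $y_k = x_k + \frac{\sqrt{\kappa} - 1}{\sqrt{\kappa} + 1}(x_k - x_{k-1})$
  • $x_{k+1} = y_k - \frac{1}{L}\nabla f(y_k)$
$\kappa = \frac{L}{\mu}$, $\alpha = \frac{\sqrt{\kappa}}{\sqrt{\kappa} + 1}$, and $\beta = 1 - \frac{1}{\sqrt{\kappa}}$
These algorithms are equivalent!
The $x_k$ are identical in each algorithm.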
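The equivalence can be checked numerically by running both forms side by side. A sketch: the function names and the quadratic test problem are hypothetical choices for illustration.

```python
import math


def algorithm_1(grad, x0, L, mu, iters):
    """Three-sequence form with v_0 = x_0; returns [x_0, ..., x_iters]."""
    sk = math.sqrt(L / mu)
    alpha, beta = sk / (sk + 1), 1 - 1 / sk
    x, v = list(x0), list(x0)
    xs = [x]
    for _ in range(iters):
        y = [alpha * a + (1 - alpha) * b for a, b in zip(x, v)]
        g = grad(y)
        v = [beta * b + (1 - beta) * (c - d / mu) for b, c, d in zip(v, y, g)]
        x = [c - d / L for c, d in zip(y, g)]
        xs.append(x)
    return xs


def algorithm_2(grad, x0, L, mu, iters):
    """Momentum form; returns [x_0, ..., x_iters] (assumes iters >= 1)."""
    sk = math.sqrt(L / mu)
    momentum = (sk - 1) / (sk + 1)
    prev = list(x0)
    x = [a - d / L for a, d in zip(prev, grad(prev))]
    xs = [prev, x]
    for _ in range(iters - 1):
        y = [a + momentum * (a - p) for a, p in zip(x, prev)]
        g = grad(y)
        prev, x = x, [c - d / L for c, d in zip(y, g)]
        xs.append(x)
    return xs


# Hypothetical test problem: gradient of f(x) = (L*x1^2 + mu*x2^2) / 2.
L, mu = 100.0, 1.0
grad = lambda x: [L * x[0], mu * x[1]]
xs1 = algorithm_1(grad, [1.0, 1.0], L, mu, iters=30)
xs2 = algorithm_2(grad, [1.0, 1.0], L, mu, iters=30)
max_diff = max(abs(a - b) for p, q in zip(xs1, xs2) for a, b in zip(p, q))
```

The iterates agree up to rounding because, with these parameters, $v_{k+1} = x_k + \sqrt{\kappa}\,(x_{k+1} - x_k)$, so $y_{k+1}$ reduces to the momentum expression in Algorithm 2.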
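What if not strongly convex?
Idea
• $\min_x g(x)$ for $g(x) = f(x) + \frac{\lambda}{2}\|x - x_0\|_2^2$
• $g(x)$ is $\lambda$-strongly convex
• Can compute $x_k$, an $\frac{\epsilon}{2}$-optimal point for $g$, in $O\left(\sqrt{\frac{L + \lambda}{\lambda}}\log\frac{g(x_0) - g_*}{\epsilon}\right)$ steps
• $f(x) \le g(x)$ so $g_* \ge f_*$
• $g(x_0) - g_* = f(x_0) - g_* \le f(x_0) - f_* \le \frac{L}{2}\|x_0 - x_*\|_2^2$
• $f(x_k) \le g(x_k) \le g_* + \frac{\epsilon}{2} \le f(x_*) + \frac{\lambda}{2}\|x_0 - x_*\|_2^2 + \frac{\epsilon}{2}$
• If $\lambda = \frac{\epsilon}{\|x_0 - x_*\|_2^2}$ have $\epsilon$-optimal point in $O\left(\sqrt{\frac{L\|x_0 - x_*\|_2^2}{\epsilon}}\log\frac{L\|x_0 - x_*\|_2^2}{\epsilon}\right)$ queries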
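The reduction amounts to wrapping $f$ with a quadratic regularizer. A sketch under assumptions: `regularize` is an illustrative helper, the test function is a hypothetical convex function that is flat in its second coordinate, and the distance $\|x_0 - x_*\|_2$ is assumed known so that $\lambda$ can be set.

```python
def regularize(f, grad_f, x0, lam):
    """Return g(x) = f(x) + (lam/2)||x - x0||^2 and its gradient."""
    def g(x):
        return f(x) + 0.5 * lam * sum((a - c) ** 2 for a, c in zip(x, x0))

    def grad_g(x):
        return [d + lam * (a - c) for d, a, c in zip(grad_f(x), x, x0)]

    return g, grad_g


# Hypothetical example: f depends only on x[0], so it is convex but not strongly convex.
L, eps = 100.0, 1e-2
f = lambda x: 0.5 * L * x[0] ** 2
grad_f = lambda x: [L * x[0], 0.0]
x0, x_star = [1.0, 1.0], [0.0, 1.0]  # x_star: the minimizer of f closest to x0
dist_sq = sum((a - b) ** 2 for a, b in zip(x0, x_star))
lam = eps / dist_sq
g, grad_g = regularize(f, grad_f, x0, lam)

# For this particular f, the minimizer of g is available in closed form.
x_reg = [lam / (L + lam), 1.0]
```

Running AGD on `g` (which is $(L + \lambda)$-smooth and $\lambda$-strongly convex) then yields a point whose $f$ value is within $\epsilon$ of $f_*$; here even the exact minimizer of `g` illustrates that the regularizer costs at most $\epsilon/2$.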
Problem: $\min_{x \in \mathbb{R}^n} f(x)$
Can remove the log factor by both a better reduction and a more direct algorithm (see notes)