Preconditioning in Expectation Richard Peng Joint with Michael Cohen (MIT), Rasmus Kyng (Yale), Jakub Pachocki (CMU), and Anup Rao (Yale) MIT CMU theory seminar, April 5, 2014
Transcript
Page 1:

Preconditioning in Expectation

Richard Peng

Joint with Michael Cohen (MIT), Rasmus Kyng (Yale), Jakub Pachocki (CMU), and Anup Rao (Yale)

MIT

CMU theory seminar, April 5, 2014

Page 2:

RANDOM SAMPLING

• Collection of many objects
• Pick a small subset of them

Page 3:

GOALS OF SAMPLING

• Estimate quantities
• Approximate higher dimensional objects
• Use in algorithms

Page 4:

SAMPLE TO APPROXIMATE

• ε-nets / cuttings
• Sketches
• Graphs
• Gradients

This talk: matrices

Page 5:

NUMERICAL LINEAR ALGEBRA

• Linear system in an n x n matrix
• Inverse is dense
• [Concus-Golub-O'Leary `76]: incomplete Cholesky, drop entries

Page 6:

HOW TO ANALYZE?

• Show sample is good
• Concentration bounds
• Scalar: [Bernstein `24], [Chernoff `52]
• Matrices: [AW `02], [RV `07], [Tropp `12]

Page 7:

THIS TALK

• Directly show the algorithm using samples runs well
• Better bounds
• Simpler analysis

Page 8:

OUTLINE

• Random matrices
• Iterative methods
• Randomized preconditioning
• Expected inverse moments

Page 9:

HOW TO DROP ENTRIES?

• Entry-based representation is hard
• Group entries together
• Symmetric with positive entries: adjacency matrix of a graph

Page 10:

SAMPLE WITH GUARANTEES

• Sample edges in graphs
• Goal: preserve size of all cuts
• [BK`96] graph sparsification
• Generalization of expanders

Page 11:

DROPPING ENTRIES/EDGES

• L: graph Laplacian
• 0-1 vector x: |x|_L^2 = size of the cut between the 0s and the 1s

Unit weight case: |x|_L^2 = Σ_uv (x_u – x_v)^2
Matrix norm: |x|_P^2 = x^T P x

Page 12:

DECOMPOSING A MATRIX

• Sample based on positive representations: P = Σ_i P_i, with each P_i P.S.D.
• Graphs: one P_i per edge

L = Σ_uv [ 1 -1 ; -1 1 ] (on rows/columns u and v), so |x|_L^2 = Σ_uv (x_u – x_v)^2

P.S.D.: multi-variate version of positive

Page 13:

MATRIX CHERNOFF BOUNDS

Can sample Q with O(n log n ε^-2) rescaled P_i s.t. P ≼ Q ≼ (1 + ε) P

≼: Loewner partial ordering; A ≼ B ⇔ B – A positive semidefinite

P = Σ_i P_i, with each P_i P.S.D.

Page 14:

CAN WE DO BETTER?

• Yes, [BSS `12]: O(nε^-2) is possible
• Iterative, cubic time construction
• [BDM `11]: extends to general matrices

Page 15:

DIRECT APPLICATION

For ε accuracy, need P ≼ Q ≼ (1 + ε) P
Size of Q depends inversely on ε
ε^-1 is the best that we can hope for

Find Q very close to P → solve problem on Q → return answer

Page 16:

USE INSIDE ITERATIVE METHODS

• [AB `11]: crude samples give good answers
• [LMP `12]: extensions to row sampling

Find Q somewhat similar to P → solve problem on P, using Q as a guide

Page 17:

ALGORITHMIC VIEW

• Crude approximations are ok
• But need to be efficient
• Can we use [BSS `12]?

Page 18:

SPEED UP [BSS `12]

• Expander graphs, and more
• ‘i.i.d. sampling’ variant related to the Kadison-Singer problem

Page 19:

MOTIVATION

• One-dimensional sampling: moment estimation, pseudorandom generators
• Rarely need w.h.p.
• Dimensions should be disjoint

Page 20:

MOTIVATION

• Randomized coordinate descent for electrical flows [KOSZ`13, LS`13]
• ACDM from [LS `13] improves various numerical routines

Page 21:

RANDOMIZED COORDINATE DESCENT

• Related to stochastic optimization
• Known analyses when Q = P_j
• [KOSZ`13][LS`13] can be viewed as ways of changing bases

Page 22:

OUR RESULT

For numerical routines, random Q gives the same performance as [BSS`12], in expectation

Page 23:

IMPLICATIONS

• Similar bounds to ACDM from [LS `13]
• Recursive Chebyshev iteration ([KMP`11]) runs faster
• Laplacian solvers in ~ m log^(1/2) n time

Page 24:

OUTLINE

• Random matrices
• Iterative methods
• Randomized preconditioning
• Expected inverse moments

Page 25:

ITERATIVE METHODS

• [Gauss, 1823] Gauss-Seidel iteration
• [Jacobi, 1845] Jacobi iteration
• [Hestenes-Stiefel `52] conjugate gradient

Find Q s.t. P ≼ Q ≼ 10 P
Use Q as a guide to solve problem on P

Page 26:

[RICHARDSON `1910]

x(t + 1) = x(t) + (b – Px(t))

• Fixed point: b – Px(t) = 0
• Each step: one matrix-vector multiplication

Page 27:

ITERATIVE METHODS

• Multiplication is easier than division, especially for matrices
• Use verifier to solve problem

Page 28:

1D CASE

Know: 1/2 ≤ p ≤ 1, so 1 ≤ 1/p ≤ 2

• 1 is a ‘good’ estimate
• Bad when p is far from 1
• Estimate of error: 1 – p

Page 29:

ITERATIVE METHODS

• 1 + (1 – p) = 2 – p is more accurate
• Two terms of Taylor expansion
• Can take more terms

Page 30:

ITERATIVE METHODS

Generalizes to the matrix setting:

1/p = 1 + (1 – p) + (1 – p)^2 + (1 – p)^3 + …

P^-1 = I + (I – P) + (I – P)^2 + …
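The scalar series can be checked directly; a minimal sketch (helper name hypothetical) showing the geometric truncation error:

```python
# Scalar Neumann series: for 1/2 <= p <= 1, 1/p = sum_k (1 - p)^k.
def neumann_inverse(p, terms):
    """Approximate 1/p by the first `terms` terms of sum_k (1 - p)^k."""
    total, power = 0.0, 1.0
    for _ in range(terms):
        total += power
        power *= 1.0 - p
    return total

p = 0.7
approx = neumann_inverse(p, 20)
# Truncation error is (1 - p)^terms / p: a geometric decrease.
assert abs(approx - 1.0 / p) < 1e-9
```

The matrix version behaves the same way, with |I – P|_2 playing the role of 1 – p.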

Page 31:

[RICHARDSON `1910]

x(0) = b
x(1) = (I + (I – P)) b
x(2) = (I + (I – P)(I + (I – P))) b
…
x(t + 1) = b + (I – P) x(t)

• Error of x(t): (I – P)^t b
• Geometric decrease if P is close to I
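A minimal sketch of the iteration on a made-up 2x2 matrix with eigenvalues in [1/2, 1], so the residual contracts by at least a constant factor per step:

```python
# Richardson iteration x(t+1) = x(t) + (b - P x(t)) on a small SPD matrix
# with 1/2 I <= P <= I; the error contracts by |I - P|_2 < 1 each step.
def matvec(P, x):
    return [sum(P[i][j] * x[j] for j in range(len(x))) for i in range(len(x))]

def richardson(P, b, steps):
    x = [0.0] * len(b)
    for _ in range(steps):
        r = [bi - pi for bi, pi in zip(b, matvec(P, x))]   # residual b - Px
        x = [xi + ri for xi, ri in zip(x, r)]
    return x

P = [[0.9, 0.1], [0.1, 0.8]]   # made-up instance, eigenvalues in [1/2, 1]
b = [1.0, 2.0]
x = richardson(P, b, 60)
r = [bi - pi for bi, pi in zip(b, matvec(P, x))]
assert max(abs(ri) for ri in r) < 1e-6
```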

Page 32:

OPTIMIZATION VIEW

• Quadratic potential function
• Goal: walk down to the bottom
• Direction given by gradient

Residue: r(t) = x(t) – P^-1 b
Error: |r(t)|_2^2

Page 33:

DESCENT STEPS

• Step may overshoot
• Need smooth function

Page 34:

MEASURE OF SMOOTHNESS

x(t + 1) = b + (I – P) x(t)

Note: b = PP^-1 b, so r(t + 1) = (I – P) r(t)

|r(t + 1)|_2 ≤ |I – P|_2 |r(t)|_2

Page 35:

MEASURE OF SMOOTHNESS

1/2 I ≼ P ≼ I ⇒ |I – P|_2 ≤ 1/2

• |I – P|_2: smoothness of |r(t)|_2^2
• Distance between P and I
• Related to eigenvalues of P

Page 36:

MORE GENERAL

• Convex functions
• Smoothness / strong convexity

This talk: only quadratics

Page 37:

OUTLINE

• Random matrices
• Iterative methods
• Randomized preconditioning
• Expected inverse moments

Page 38:

ILL POSED PROBLEMS

• Smoothness of directions differs
• Progress limited by the steeper parts

Example: P = [ .8 0 ; 0 .1 ]
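The diagonal example can be run directly; a small sketch (helper name hypothetical) showing the two coordinates converging at very different rates:

```python
# Plain Richardson on P = diag(0.8, 0.1): each coordinate contracts by
# (1 - P_ii) per step, so progress is limited by the shallow 0.1
# direction while the 0.8 direction converges fast.
def richardson_diag(p, b, steps):
    x = [0.0] * len(b)
    for _ in range(steps):
        x = [xi + (bi - pi * xi) for xi, pi, bi in zip(x, p, b)]
    return x

p, b = [0.8, 0.1], [1.0, 1.0]
x = richardson_diag(p, b, 10)
err = [abs(xi - bi / pi) for xi, pi, bi in zip(x, p, b)]
assert err[0] < 1e-6   # steep direction: error shrank by (0.2)^10
assert err[1] > 1.0    # shallow direction: only (0.9)^10 shrinkage
```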

Page 39:

PRECONDITIONING

• Solve similar problem Q
• Transfer steps across

Page 40:

PRECONDITIONED RICHARDSON

• Optimal step down the energy function of Q given by Q^-1
• Equivalent to solving Q^-1 P x = Q^-1 b

Page 41:

PRECONDITIONED RICHARDSON

x(t + 1) = b + (I – Q^-1 P) x(t)

Residue: r(t + 1) = (I – Q^-1 P) r(t)

|r(t + 1)|_P = |(I – Q^-1 P) r(t)|_P

Page 42:

CONVERGENCE

• If P ≼ Q ≼ 10 P, error halves in O(1) iterations
• How to find a good Q?

Improvement depends on |I – P^(1/2) Q^-1 P^(1/2)|_2
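A minimal diagonal-case sketch (made-up instance) of preconditioned Richardson, using the update x ← x + Q^-1(b – Px) with a crude Q satisfying P ≼ Q ≼ 10 P:

```python
# Diagonal preconditioned Richardson: coordinate i contracts by
# (1 - P_ii / Q_ii) per step, so any Q with P <= Q <= 10 P gives a
# constant contraction regardless of how small P_ii is.
def precond_richardson(p, q, b, steps):
    x = [0.0] * len(b)
    for _ in range(steps):
        x = [xi + (bi - pi * xi) / qi
             for xi, pi, qi, bi in zip(x, p, q, b)]
    return x

p = [0.8, 0.001]             # badly conditioned diagonal P (made up)
q = [2.0 * pi for pi in p]   # crude preconditioner: P <= Q <= 10 P
x = precond_richardson(p, q, [1.0, 1.0], 50)
err = [abs(xi - 1.0 / pi) for xi, pi in zip(x, p)]
assert max(err) < 1e-6       # both coordinates contract by 1/2 per step
```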

Page 43:

MATRIX CHERNOFF

• Take O(n log n) (rescaled) P_i with probability ∝ trace(P_i P^-1)
• Matrix Chernoff ([AW`02], [RV`07]): w.h.p. P ≼ Q ≼ 2P

P = Σ_i P_i, Q = Σ_i s_i P_i, where s has small support

Note: Σ_i trace(P_i P^-1) = n

Page 44:

WHY THESE PROBABILITIES?

• trace(P_i P^-1): matrix ‘dot product’
• If P is diagonal, e.g. [ .8 0 ; 0 .1 ]:
• trace(P_i P^-1) = 1 for all i
• Need all entries

Overhead of concentration: union bound on dimensions

Page 45:

IS CHERNOFF NECESSARY?

• P: diagonal matrix, e.g. [ 1 0 ; 0 1 ]
• Missing one entry, e.g. [ 1 0 ; 0 0 ]: unbounded approximation factor

Page 46:

BETTER CONVERGENCE?

• [Kaczmarz `37]: random projections onto small subspaces can work
• Better (expected) behavior than what matrix concentration gives!

Page 47:

HOW?

• Will still progress in good directions
• Can have (finite) badness if it is orthogonal to the goal

Page 48:

QUANTIFY DEGENERACIES

• Have some D ≼ P ‘for free’
• D = λmin(P) I (minimum eigenvalue)
• D = tree when P is a graph
• D = crude approximation / rank certificate

Example: P = [ .8 0 ; 0 .2 ], D = [ .2 0 ; 0 .1 ]

Page 49:

REMOVING DEGENERACIES

• ‘Padding’ to remove degeneracy
• If D ≼ P and 0.5 P ≼ Q ≼ P, then 0.5 P ≼ D + Q ≼ 2P

Page 50:

ROLE OF D

• Implicit in proofs of matrix Chernoff, as well as [BSS`12]
• Splitting of P in numerical analysis
• D and P can be very different

Page 51:

MATRIX CHERNOFF

• Let D ≼ 0.1 P, t = trace(PD^-1)
• Take O(t log n) samples with probability ∝ trace(P_i D^-1)
• Q ← D + (rescaled) samples
• W.h.p. P ≼ Q ≼ 2P

Page 52:

WEAKER REQUIREMENT

Q only needs to do well in some directions, on average

Page 53:

EXPECTED CONVERGENCE

There exists a constant c s.t. for any r, E[ |(I – c Q^-1 P) r|_P ] ≤ 0.99 |r|_P

• Let t = trace(PD^-1)
• Take rand[t, 2t] samples, w.p. ∝ trace(P_i D^-1)
• Add (rescaled) results to D to form Q
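The procedure above can be simulated in the diagonal case; the sketch below is a made-up instance (c = 0.5 chosen for illustration, the 0.99 target from the slide) estimating the expected one-step contraction:

```python
import random

# Diagonal-case sketch: P = diag(p) with P_i = p_i e_i e_i^T, D = diag(d),
# D <= P given "for free".  Sample coordinates with probability
# ~ trace(P_i D^-1) = p_i / d_i, add rescaled samples to D to form Q,
# then measure how much one preconditioned step shrinks the P-norm error.
def sample_preconditioner(p, d, rng):
    t = sum(pi / di for pi, di in zip(p, d))       # t = trace(P D^-1)
    probs = [(pi / di) / t for pi, di in zip(p, d)]
    s = rng.randint(int(t), int(2 * t))            # rand[t, 2t] samples
    q = list(d)                                    # start from D
    for i in rng.choices(range(len(p)), weights=probs, k=s):
        q[i] += p[i] / (s * probs[i])              # rescaled sample
    return q

def step_ratio(p, q, r, c=0.5):
    # |(I - c Q^-1 P) r|_P / |r|_P, coordinate-wise in the diagonal case
    num = sum(pi * ((1 - c * pi / qi) * ri) ** 2
              for pi, qi, ri in zip(p, q, r))
    den = sum(pi * ri ** 2 for pi, ri in zip(p, r))
    return (num / den) ** 0.5

rng = random.Random(0)
p, d, r = [1.0] * 20, [0.1] * 20, [1.0] * 20       # made-up instance
ratios = [step_ratio(p, sample_preconditioner(p, d, rng), r)
          for _ in range(200)]
assert sum(ratios) / len(ratios) < 0.99            # contracts on average
```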

Page 54:

OUTLINE

• Random matrices
• Iterative methods
• Randomized preconditioning
• Expected inverse moments

Page 55:

ASIDE

Goal: combine these analyses

Matrix Chernoff:
• f(Q) = exp(P^-1/2 (P – Q) P^-1/2)
• Show decrease in relative eigenvalues

Iterative methods:
• f(x) = |x – P^-1 b|_P
• Show decrease in distance to solution

Page 56:

SIMPLIFYING ASSUMPTIONS

• P = I (by normalization)
• tr(P_i D^-1) = 0.1, ‘unit weight’
• Expected value of picking a P_i at random: (1/t) I

Page 57:

DECREASE

Step: r’ = (I – Q^-1 P) r = (I – Q^-1) r

New error: |r’|_P = |(I – Q^-1) r|_2

Expand: |(I – Q^-1) r|_2^2 = r^T r – 2 r^T Q^-1 r + r^T Q^-2 r

Page 58:

DECREASE:

• I ≼ Q ≼ 1.1 I would imply:
• 0.9 I ≼ Q^-1
• Q^-2 ≼ I
• But also Q^-3 ≼ I, etc.
• Don’t need the 3rd moment

Page 59:

RELAXATIONS

• Only need Q^-1 and Q^-2
• By linearity, suffices to:
• Lower bound E_Q[Q^-1]
• Upper bound E_Q[Q^-2]

Page 60:

TECHNICAL RESULT

Assumption: Σ_i P_i = I, trace(P_i D^-1) = 0.1

• Let t = trace(D^-1)
• Take rand[t, 2t] uniform samples
• Add (rescaled) results to D to form Q

Then: 0.9 I ≼ E[Q^-1] and E[Q^-2] ≼ O(1) I

Page 61:

Q^-1

• 0.5 I ≼ E[Q^-1] follows from the matrix arithmetic-harmonic mean inequality ([ST`94]): (E[Q])^-1 ≼ E[Q^-1]
• Need: upper bound on E[Q^-2]
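In one dimension the arithmetic-harmonic mean inequality is just Jensen's inequality for 1/x; a quick scalar check (distribution made up):

```python
# Scalar AM-HM: E[1/Q] >= 1/E[Q] for a positive random variable Q,
# by convexity of x -> 1/x (Jensen's inequality).
qs, probs = [0.5, 1.0, 2.0], [0.25, 0.5, 0.25]      # distribution of Q
mean_inv = sum(p / q for q, p in zip(qs, probs))    # E[1/Q]
inv_mean = 1.0 / sum(p * q for q, p in zip(qs, probs))  # 1/E[Q]
assert mean_inv >= inv_mean
```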

Page 62:

E[Q^-2] ≼ O(1) I?

• Q^-2 is the gradient of Q^-1
• More careful tracking of Q^-1 gives info on Q^-2 as well!

Page 63:

TRACKING Q^-1

• Q: start from D, add [t, 2t] random (rescaled) P_i
• Track the inverse of Q under rank-1 perturbations

Sherman-Morrison formula:
(A + uv^T)^-1 = A^-1 – (A^-1 u v^T A^-1) / (1 + v^T A^-1 u)
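The formula can be sanity-checked on a small example; a self-contained 2x2 check (matrices made up), with explicit formulas and no linear-algebra library:

```python
# Sherman-Morrison: (A + u v^T)^-1
#   = A^-1 - (A^-1 u v^T A^-1) / (1 + v^T A^-1 u)
def inv2(M):
    (a, b), (c, d) = M
    det = a * d - b * c
    return [[d / det, -b / det], [-c / det, a / det]]

A = [[4.0, 1.0], [1.0, 3.0]]
u, v = [1.0, 2.0], [0.5, 1.0]

Ainv = inv2(A)
Au = [sum(Ainv[i][j] * u[j] for j in range(2)) for i in range(2)]  # A^-1 u
vA = [sum(v[i] * Ainv[i][j] for i in range(2)) for j in range(2)]  # v^T A^-1
denom = 1.0 + sum(vi * aui for vi, aui in zip(v, Au))
SM = [[Ainv[i][j] - Au[i] * vA[j] / denom for j in range(2)]
      for i in range(2)]

# Compare against the directly computed inverse of the rank-1 update
M = [[A[i][j] + u[i] * v[j] for j in range(2)] for i in range(2)]
direct = inv2(M)
assert all(abs(SM[i][j] - direct[i][j]) < 1e-9
           for i in range(2) for j in range(2))
```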

Page 64:

BOUNDING Q^-1: DENOMINATOR

Current matrix: Q_j, sample: R

• D ≼ Q_j ⇒ Q_j^-1 ≼ D^-1
• tr(Q_j^-1 R) ≤ tr(D^-1 R) ≤ 0.1 for any R

E_R[Q_{j+1}^-1] ≼ Q_j^-1 – 0.9 Q_j^-1 E[R] Q_j^-1

Page 65:

BOUNDING Q^-1: NUMERATOR

• R: random rescaled P_i
• Assumption: E[R] = (1/t) P = (1/t) I

E_R[Q_{j+1}^-1] ≼ Q_j^-1 – 0.9 Q_j^-1 E[R] Q_j^-1 ≼ Q_j^-1 – (0.9/t) Q_j^-2

Page 66:

AGGREGATION

• Q_j is also random
• Need to aggregate the choices of R into a bound on E[Q_j^-1]

E_R[Q_{j+1}^-1] ≼ Q_j^-1 – (0.9/t) Q_j^-2

D = Q_0 → Q_1 → Q_2 → …

Page 67:

HARMONIC SUMS

• Use harmonic sums of matrices
• Matrix functionals
• Similar to the Stieltjes transform in [BSS`12]
• Proxy for the –2th power
• Well behaved under expectation: E_X[HrmSum(X, a)] ≼ HrmSum(E[X], a)

HrmSum(x, a) = 1 / (1/x + 1/a)
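In one dimension, HrmSum(x, a) = ax/(a + x) is concave in x > 0, so the expectation bound is Jensen's inequality; a quick scalar check (distribution made up):

```python
# E[HrmSum(X, a)] <= HrmSum(E[X], a) for a positive scalar X, by
# concavity of x -> 1/(1/x + 1/a) and Jensen's inequality.
def hrm_sum(x, a):
    return 1.0 / (1.0 / x + 1.0 / a)

a = 2.0
xs, probs = [0.5, 4.0], [0.3, 0.7]                   # distribution of X
lhs = sum(p * hrm_sum(x, a) for x, p in zip(xs, probs))
rhs = hrm_sum(sum(p * x for x, p in zip(xs, probs)), a)
assert lhs <= rhs
```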

Page 68:

HARMONIC SUM

Initial condition + telescoping sum gives E[Q_t^-1] ≼ O(1) I

E_R[Q_{j+1}^-1] ≼ Q_j^-1 – (0.9/t) Q_j^-2

Page 69:

E[Q^-2] ≼ O(1) I

• Q^-2 is the gradient of Q^-1:
(0.9/t) Q_j^-2 ≼ Q_j^-1 – E_R[Q_{j+1}^-1]
• Summing over j: (0.9/t) Σ_{j=t}^{2t-1} Q_j^-2 ≼ E[Q_t^-1] – E[Q_{2t}^-1]
• Random j from [t, 2t] is good!

Page 70:

SUMMARY

Un-normalize:
• 0.5 P ≼ E[P Q^-1 P]
• E[P Q^-1 P Q^-1 P] ≼ 5P

One step of preconditioned Richardson: E[ |(I – c Q^-1 P) r|_P ] ≤ 0.99 |r|_P

Page 71:

MORE GENERAL

• Works for some convex functions
• Sherman-Morrison replaced by inequality, primal/dual

Page 72:

FUTURE WORK

• Expected convergence of:
• Chebyshev iteration?
• Conjugate gradient?
• Same bound without D (using pseudo-inverse)?
• Small error settings
• Stochastic optimization?
• More moments?

Page 73:

THANK YOU!

Questions?