Matrix norms Local norm regularization Constructive regularization Sub-block regularization Appendix
Constructive regularization of the random matrix norm.
Liza Rebrova
University of California Los Angeles
Structural inference in High Dimensional Models workshop, September 2018
Non-asymptotic random matrix theory framework
A = (A_ij)_{n×m}; the entries A_ij are taken from some distribution.
Usually, we have
• no specific distribution assumption
• no symmetry assumption
• high probability results (hold with probability 1− o(1))
• for large enough dimensions (all large matrices with n, m > N₀)
(Pictures: concentration of measure on the sphere; a convex set in Rⁿ. Right picture is taken from "Estimation in high dimensions" by R. Vershynin.)
Operator (spectral) norm
By definition,

‖A‖ := sup_{‖x‖₂=1} ‖Ax‖₂ = sup_{u,v ∈ S^{n−1}} |⟨Au, v⟩| = s₁(A)

Norm of the inverse:

1/‖A⁻¹‖ = inf_{‖x‖₂=1} ‖Ax‖₂ = s_n(A)
Singular values – real spectrum of the matrix
s(A) = √(eig(AᵀA)),   s₁ ≥ s₂ ≥ … ≥ s_n ≥ 0.
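These identities are easy to sanity-check numerically; a minimal numpy sketch (toy random matrix, not part of the talk):

```python
import numpy as np

rng = np.random.default_rng(0)
A = rng.standard_normal((50, 50))

# Singular values s_1 >= ... >= s_n >= 0, returned in descending order.
s = np.linalg.svd(A, compute_uv=False)

# s(A) = sqrt(eig(A^T A)); operator norm = s_1; 1/||A^{-1}|| = s_n.
eigs = np.sort(np.linalg.eigvalsh(A.T @ A))[::-1]
assert np.allclose(s, np.sqrt(eigs))
assert np.isclose(np.linalg.norm(A, 2), s[0])
assert np.isclose(1 / np.linalg.norm(np.linalg.inv(A), 2), s[-1])
```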
Key idea: spectrum stabilizes as the size of the matrices →∞
(Pictures: eigenvalues of a Wigner matrix; real vs. complex components of the eigenvalues of a Gaussian matrix. Left picture is taken from "Estimation in high dimensions" by R. Vershynin.)
What is optimal norm order?
Let A = (Aij)n×n be a square random matrix with i.i.d. entries.
             Gaussian                      Subgaussian
             for any t ≥ 0                 for any t ≥ C₀
s₁(A)        s₁ ≤ 2√n + t                  s₁ ≤ t√n
             with prob. 1 − 2e^{−t²/2}     with prob. 1 − e^{−ct²n}
             from Gordon's theorem         from Bernstein's inequality
Def.: A_ij are subgaussian if P{|A_ij| > t} ≤ C₁e^{−c₂t²} for any t > 0.

(Picture: tail decay curves; blue – gaussian, red – subgaussian, green – heavy-tailed. Picture is taken from D. Mixon's blog "Short, fat matrices".)
Not an optimal order
Light tails ((sub)gaussian, 4 finite moments): with high probability,

‖A‖ = s_max(A) ∼ √n   and   s_min(A) ∼ 1/√n.

Heavy tails (2 finite moments): with high probability,

‖A‖ = s_max(A) ≫ √n   and   s_min(A) ∼ 1/√n.

Example (‖A‖ ∼ n ≫ √n)
• Litvak-Spector: constructive example of ‖A‖ ∼ O(n^{1−β}) for any β ≥ 0, with probability at least 1/2.

• Bai-Silverstein-Yin: 4 finite moments are needed for ‖A‖ ∼ √n.
• All-ones example:

  sup_{‖x‖=1} ‖Ax‖ ≥ ‖ (1)_{n×n} · (n^{−1/2}, …, n^{−1/2})ᵀ ‖₂ = n
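The all-ones obstruction is easy to verify numerically (a toy sketch): a matrix with no cancellation pushes the norm to order n rather than √n.

```python
import numpy as np

n = 100
A = np.ones((n, n))                  # all entries equal to 1
x = np.full(n, n ** -0.5)            # unit vector (n^{-1/2}, ..., n^{-1/2})

assert np.isclose(np.linalg.norm(x), 1.0)
assert np.isclose(np.linalg.norm(A @ x), n)    # ||Ax||_2 = n
assert np.isclose(np.linalg.norm(A, 2), n)     # so ||A|| = n >> sqrt(n)
```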
Local norm regularization
Questions:
1. Can we regularize the norm by correcting just a small fraction of the entries of A?
2. What in the structure of a heavy-tailed matrix causes the norm to blow up from the "ideal" order O(√n)?

Local regularization: A ↦ Ã, such that

• Ã differs from A in a small εn × εn sub-matrix
• ‖Ã‖ ≲ √n
Theorem (with R. Vershynin, informal statement)

Let A be a large enough random square matrix with i.i.d. entries. Local regularization is possible with high probability ⇐⇒ EA_ij = 0 and EA_ij² is bounded.
Local norm regularization
Theorem (Part 1: local obstructions)

Let A = (A_ij)_{n×n} have i.i.d. entries such that EA_ij = 0, EA_ij² = 1. For any ε ∈ (0, 1/6], with probability ≥ 1 − 11e^{−εn/12} there exists an εn × εn sub-matrix A₀ ⊂ A:

‖A \ A₀‖ ≤ C_ε √n,   where C_ε = C · ln(ε⁻¹)/√ε
(Picture: A \ A₀ = A with all entries in the εn × εn block A₀ zeroed out.)
• log-optimal dependence on the block size ε
• any ε < 1 can be taken, at the cost of larger constants
• non-constructive: does not identify A₀
In every n2^{−k} × n2^{−k} submatrix of B there are at most n2^{−k−1} columns with > C₁r non-zeros.

Lemma (2, pn ≤ 4)

In every n2^{−k} × n2^{−k−1} submatrix of B there are at most n2^{−k−1} columns with > C₂r non-zeros.
1. Bernoulli matrices: after decomposition
Recall:

|A_ij| ∼ Σ_k 2^k · 1{|A_ij| ∈ (2^{k−1}, 2^k]} = Σ_k 2^k B^k

• For A_part1 = Σ_{B^k ∈ B₂ ∪ B₃} 2^k B^k use
Lemma (Norm of sparse matrices)

For any matrix Q and vectors u, v ∈ S^{n−1}, we have

‖Q‖ ≤ max_j ‖col_j(Q)‖₂ · √(max_i #(row_i(Q)))

(here the first factor is ≤ √n and the second is ≤ √(const · #(terms)))
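The lemma is easy to sanity-check numerically; a toy numpy sketch with a sparse 0/1 matrix (the two factors of the bound are computed directly):

```python
import numpy as np

rng = np.random.default_rng(2)
# Sparse random 0/1 matrix Q (toy example).
Q = (rng.random((40, 40)) < 0.1).astype(float)

max_col_norm = np.linalg.norm(Q, axis=0).max()      # max_j ||col_j(Q)||_2
max_row_support = (Q != 0).sum(axis=1).max()        # max_i #(row_i(Q))

# Lemma: ||Q|| <= max_j ||col_j(Q)||_2 * sqrt(max_i #(row_i(Q)))
assert np.linalg.norm(Q, 2) <= max_col_norm * np.sqrt(max_row_support) + 1e-9
```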
• For each B^k ∈ B₁ all rows and columns are bounded by O(np_k) ⟹ we can use the results for Bernoulli matrices
2. Heavy and light indices: Bernoulli
Using the definition ‖B‖ = sup_{u,v ∈ S^{n−1}} |Σ_ij B_ij u_i v_j|.

Light indices := {(i, j) : |u_i v_j| ≤ √p/n} for every u, v.

Split the sum:

|Σ_ij (B_ij − EB_ij) u_i v_j| ≤ |Σ_light (B_ij − EB_ij) u_i v_j| + |Σ_heavy EB_ij u_i v_j| + |Σ_heavy B_ij u_i v_j|
• Light part – bounded summands – Bernstein's concentration
• Expectation part – #(heavy indices) ≤ n/p – Cauchy-Schwarz
• Heavy part – Feige-Ofek theorem (the bound follows from a tail estimate for e(S, T) = the number of non-zero entries in an S × T sub-block)
2. Heavy and light indices: general case
Light indices := {(i, j) : |u_i v_j A_ij| ≤ √(4/n)} for every u, v.

Split the sum:

|Σ_ij A_ij u_i v_j| ≤ |Σ_light A_ij u_i v_j| + Σ_heavy |A_ij| u_i v_j
E|A_ij| ≠ 0, but we do not care: split into Bernoulli levels and use the Feige-Ofek theorem at each level!

Σ_heavy |A_ij| u_i v_j ≤ Σ_ij Σ_k 2^k B^k_ij u_i v_j ≤ Σ_k 2^k √(n p_k) ≤ √n · √(Σ_k 2^{2k} p_k) · √(#levels)

(the last step is Cauchy-Schwarz). From the second moment condition, 1 ≥ EA_ij² ≥ 0.25 Σ_k 2^{2k} p_k.

The number of levels is an extra factor – minimize it.
3. Only average levels matter
• Large entries (≳ c_ε √n) are zeroed out (they produce heavy rows)
• Small entries (≲ √(n/ln n)) are bounded separately by the Bandeira-van Handel theorem
The number of levels is at most

log₂(C c_ε n) − log₂(cn/ln n) ≤ log₂( C c_ε n · ln n / (c₁ n) ) ∼ log log n.
Note: symmetry is needed only to keep zero mean in the various truncations.
Q.E.D.
What if we want to zero out an εn × εn block only?

Need to find the most "dense" part of the matrix. It is enough to find an exceptional subset of εn columns (only) and an exceptional subset of εn rows (only), and take their intersection.
(Pictures, three steps: zeroing out the exceptional εn columns leaves a "green" part with ‖green‖ ≤ √n; zeroing out the exceptional εn rows leaves a "brown" part with ‖brown‖ ≤ √n; adding the two decompositions, everything outside the εn × εn intersection block satisfies ‖dashed‖ ≤ 2√n.)
Algorithm idea
Idea: find εn columns to replace with zeros, such that all rows and columns have bounded L₂-norms, then apply the Main Theorem.
Lemma (with K. Tikhomirov)

Let B be an n × n matrix with 0-1 entries, EB_ij = p. Then for any L ≥ 10, with probability 1 − exp(−n exp(−Lpn)): if we define

W_ij := 1, if #(row_j(B)) ≤ Lnp or B_ij = 0;   W_ij := Lnp / #(row_j(B)), otherwise,

and V_j := Π_{i=1}^n W_ij, and J := {j : V_j < 0.1}, then

|J| ≤ n exp(−Lnp)   and   Σ_{j ∈ Jᶜ} B_ij ≤ 10Lnp for any i ∈ [n].
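A minimal numpy transcription may help parse the statement. This is a sketch under my reading of the slide (in particular, I interpret #(row) as the support of the row containing the entry); `exceptional_columns` is a hypothetical helper, not the authors' implementation.

```python
import numpy as np

def exceptional_columns(B, p, L=10):
    """Sketch of the lemma's weight construction (hypothetical helper;
    #(row) is read as the support of the row containing the entry)."""
    n = B.shape[0]
    row_supp = (B != 0).sum(axis=1)                 # #(row_i(B))
    W = np.ones((n, n))
    for i in np.where(row_supp > L * n * p)[0]:
        W[i, B[i] != 0] = L * n * p / row_supp[i]   # damp heavy rows
    V = W.prod(axis=0)                              # V_j = prod_i W_ij
    return np.where(V < 0.1)[0]                     # exceptional columns J

# Two dense rows supported on the first 10 columns flag exactly those columns.
B = np.zeros((20, 20)); B[0, :10] = 1.0; B[1, :10] = 1.0
J = exceptional_columns(B, p=0.005, L=10)           # here Lnp = 1
```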
Damping: Bernoulli example

Idea: we construct a diagonal matrix of weights that regularizes each row.

(Pictures, three steps on a 5 × 5 Bernoulli matrix multiplied on the right by a diagonal matrix of weights: the 1-st row is heavy, so its non-zero entries are damped with a weight 0 < δ₁ < 1; the 2-nd row is all good and needs no damping; the 3-rd row is damped with the weight δ₁ as well, the weights on shared columns accumulating to δ₁².)
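The effect of damping is easy to see numerically: right-multiplying by a diagonal matrix of weights rescales columns, and since all weights are ≤ 1 the operator norm can only decrease. A toy numpy sketch (δ₁ = 0.5 and the damped columns are arbitrary illustrative choices):

```python
import numpy as np

# A 5 x 5 Bernoulli matrix, as in the slide's example.
B = np.array([[0, 1, 0, 0, 1],
              [0, 0, 0, 0, 0],
              [0, 1, 1, 0, 0],
              [1, 1, 0, 0, 1],
              [1, 0, 0, 0, 0]], dtype=float)

delta1 = 0.5                                # damping weight 0 < delta_1 < 1
D = np.diag([1, delta1, 1, 1, delta1])      # damp the columns hit by a heavy row

damped = B @ D                              # right-multiplication rescales columns
# Since ||D|| <= 1, damping never increases the operator norm.
assert np.linalg.norm(damped, 2) <= np.linalg.norm(B, 2)
```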
Lemma (with K. Tikhomirov)

Let B be an n × n matrix with 0-1 entries, EB_ij ≤ p. Then for any L ≥ 10, with probability 1 − exp(−n exp(−Lpn)): if we define W_ij as before, and V_j := Π_{i=1}^n W_ij, and J := {j : V_j < 0.1}, then

|J| ≤ n exp(−Lnp)   and   Σ_{j ∈ Jᶜ} B_ij ≤ 10Lnp for any i ∈ [n].
How do we use the Lemma? Split

A_ij² ≤ Σ_k q_k · 1{A_ij² ∈ (q_{k−1}, q_k]},   I_k := (q_{k−1}, q_k].

Then Σ_k q_{k−1} P{A_ij² ∈ I_k} ≤ EA_ij² = 1. Apply the Lemma to B^k with entries B^k_ij = A_ij² · 1{A_ij ∈ I_k} ∼ q_k · 1{A_ij ∈ I_k} to get

‖row_j(A_{Jᶜ})‖₂² ≲ Σ_k q_k n P{A_ij² ∈ I_k} ≤ 2n.
Quantiles and regularization process
To pass from Bernoulli matrices to the general case, we now need the p_k to be in control: not too small (for the probability estimate) and not too large (for the cardinality estimate).
Definition (2⁻ᵏ quantiles)

Denote by q_k the 2⁻ᵏ-quantiles of |A_ij|, i.e. the points such that

P{|A_ij| > q_k} = 2⁻ᵏ.

Let A^k := A · 1{A_ij ∈ (q_{k−1}, q_k]}.
Note: the quantiles q_k can be approximated by the order statistics of the A_ij (they are a free set of samples from the distribution!). We use Ã^k ∼ A^k. So, the algorithm is distribution-oblivious.
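The order-statistics approximation can be sketched in numpy (toy heavy-tailed distribution, not the talk's exact procedure): the n² observed entries themselves estimate the quantiles q_k, so no knowledge of the distribution is needed.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 1000
A = rng.standard_cauchy((n, n))      # heavy-tailed entries (toy choice)

# Estimate the 2^-k quantile q_k of |A_ij| from the order statistics of
# the entries: they are a free i.i.d. sample of the unknown distribution.
k = 3
q_k_hat = np.quantile(np.abs(A), 1.0 - 2.0 ** (-k))

# Sanity check: about a 2^-k fraction of the entries exceeds the estimate.
frac = np.mean(np.abs(A) > q_k_hat)
assert abs(frac - 2.0 ** (-k)) < 0.005
```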
Submatrix norm regularization algorithm

1. delete too-large entries
2. small entries are fine without regularization
3. for each average level k construct weights W^k_ij and V^k_j for A^k to find an exceptional subset of columns J_k: |∪_k J_k| ≤ εn/2 with high probability
4. J = J' ∪ (∪_k J_k), where J' is the subset of εn/2 columns with the largest norms
5. repeat the process for Aᵀ to find an exceptional row subset I
6. the intersection of I and J gives an εn × εn exceptional sub-matrix A₀

⟹ ‖Ã‖ = ‖A \ A₀‖ ∼ √n ln ln n by the Main Theorem.
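The steps above can be sketched as follows. This is a toy simplification (hypothetical helper names; picking the εn/2 columns/rows of largest norm stands in for the full weight/level construction of steps 1-4), not the paper's algorithm verbatim:

```python
import numpy as np

def regularize(A, eps):
    """Toy sketch of the sub-block regularization scheme: find exceptional
    columns of A and of A^T, and zero out their intersection block A0."""
    n = A.shape[0]

    def exceptional(M):
        norms = np.linalg.norm(M, axis=0)           # column l2-norms
        k = int(eps * n / 2)
        return set(np.argsort(norms)[-k:]) if k else set()

    J = exceptional(A)         # exceptional columns of A
    I = exceptional(A.T)       # exceptional rows = exceptional columns of A^T
    A_reg = A.copy()
    A_reg[np.ix_(sorted(I), sorted(J))] = 0.0       # zero out the block A0
    return A_reg, I, J

# One huge entry sits in the intersection of a heavy row and a heavy column.
A = np.eye(5); A[0, 0] = 1000.0
A_reg, I, J = regularize(A, eps=0.4)
```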
THANKS FOR YOUR ATTENTION!
Applications to community detection
A – adjacency matrix of an inhomogeneous Erdős-Rényi random graph G(n, (p_ij)): edges are still independent, but can have different probabilities p_ij. This allows us to model networks with structure = communities (clusters).

Example: stochastic block model with two communities G(n, p, s). Edges within each community have probability p; edges across communities have probability s < p. Then

EA = ( p p s s
       p p s s
       s s p p
       s s p p )
The spectral method for community detection is based on the idea:

1. the eigenstructure of EA reveals the communities,
2. eigenstructure[A] ∼ eigenstructure[EA].

So, the eigenstructure of A (observed) reveals the communities too.

Condition 2 is satisfied only for dense graphs. Idea for sparse graphs: regularize the graph locally to make ‖A − EA‖ small.
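A minimal numpy illustration of the spectral idea on a dense two-community SBM (toy parameters; real community detection pipelines use more careful spectral steps):

```python
import numpy as np

rng = np.random.default_rng(1)
n, p, s = 200, 0.5, 0.05           # within- / across-community probabilities
labels = np.repeat([0, 1], n // 2)

# Sample a symmetric adjacency matrix with EA given by the block structure.
P = np.where(labels[:, None] == labels[None, :], p, s)
U = np.triu((rng.random((n, n)) < P).astype(float), 1)
A = U + U.T

# The eigenvector of the 2nd largest eigenvalue of A approximates the
# community sign pattern (condition 2 holds here: the graph is dense).
vals, vecs = np.linalg.eigh(A)
guess = (vecs[:, -2] > 0).astype(int)
accuracy = max(np.mean(guess == labels), np.mean(guess != labels))
```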
Obstructions for random graphs

Sparse graphs: maximal expected degree d := max_ij p_ij · n ≲ log n.

[Feige-Ofek] The obstructions to ‖A − EA‖ being small are a few high-degree vertices of the graph.

For the regularization it is enough to

• Feige-Ofek: delete all high-degree vertices (degree > 10d)
• Le-Levina-Vershynin: reweight or delete some of the edges adjacent to high-degree vertices (to make all the degrees bounded by 10d)
• R.: enough to delete a small ne⁻ᵈ × ne⁻ᵈ subgraph; got a description of the "bad" subgraph (we can direct its edges so that every vertex has a finite number of outgoing edges)
Theorem (Part 2: global obstructions)

Let A be an n × n matrix with i.i.d. entries, such that

• EA_ij² ≥ M,
• |A_ij| ≤ √n almost surely.

If M = M(C, ε) is a large enough constant, then every εn × εn sub-matrix A₀ has large norm:

‖A₀‖ ≥ C√n,

with probability at least 1 − exp(−εn).

So, if we were to cut some part for regularization, we would need to cut almost everything! No εn × εn sub-matrix can survive.