Trace-Penalty Minimization for Large-scale Eigenspace Computation

Yin Zhang
Department of Computational and Applied Mathematics, Rice University, Houston, Texas, USA
(Co-authors: Zaiwen Wen, Chao Yang and Xin Liu; SJTU (Shanghai), LBL (Berkeley) and CAS (Beijing))

CAAM VIGRE Seminar, January 31, 2013

Yin Zhang (RICE) EIGPEN February, 2013 1 / 39
Rayleigh-Ritz (RR): [V,D] = eig(X'*A*X); X = X*V;
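A minimal NumPy sketch of this Rayleigh-Ritz step (illustrative only; the talk's own codes are Matlab and Fortran):

```python
import numpy as np

def rayleigh_ritz(A, X):
    # Project A onto span{X} (columns of X assumed orthonormal),
    # diagonalize the small k-by-k matrix, and rotate X into Ritz vectors.
    D, V = np.linalg.eigh(X.T @ A @ X)
    return X @ V, D

# usage: an exact invariant subspace returns exact eigenpairs
A = np.diag([1.0, 2.0, 3.0, 4.0])
X = np.eye(4)[:, :2]              # spans the eigenspace of {1, 2}
X_new, D = rayleigh_ritz(A, X)
```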
Section II. Motivation:
A Method for Larger Eigenspaces
with Richer Parallelism
What is Large Scale?
Ordinarily Large Scale:
A large and sparse matrix, say n = 1M
A small number of eigen-pairs, say k = 100
Doubly Large Scale:
A large and sparse matrix, say n = 1M
A large number of eigen-pairs, say k = 1% of n
A sequence of doubly large scale problems
Change of character as k grows: X ∈ Rn×k
Cost of RR/orth(X) ≫ cost of AX
Parallelism becomes a critical factor
Low parallelism in RR/Orth =⇒ Opportunity for new methods?
Example: DFT, Materials Science
Kohn-Sham Total Energy Minimization
min Etotal(X) s.t. XTX = I, (2.1)

where, for ρ(X) := diag(XXT),

Etotal(X) := tr(XT((1/2)L + Vion)X) + (1/2)ρTL†ρ + ρTεxc(ρ) + Erep.
Nonlinear eigenvalue problem: up to 10% of the smallest eigen-pairs.
A Main Approach: SCF — a sequence of linear eigenvalue problems
Avoid the Bottleneck
Two Types of Computation: AX and RR/orth
As k becomes large, the cost of RR/orth dominates that of AX: the bottleneck
Parallelism
AX −→ Ax1 ∪ Ax2 ∪ ... ∪ Axk. Higher parallelism.
RR/orth contains sequentiality. Lower parallelism.
Avoid bottleneck?
Do fewer RR/orth
No free lunch?
Do more BLAS3 (higher parallelism than AX)
Section III. Trace-Penalty Minimization:
Free of Orthogonalization
BLAS3-Dominated Computation
Basic Idea
Trace Minimization
min_{X∈Rn×k} {tr(XTAX) : XTX = I} (3.1)
Trace-penalty Minimization
min_{X∈Rn×k} f(X) := (1/2) tr(XTAX) + (µ/4) ‖XTX − I‖²F. (3.2)
It is well known that as µ → ∞, (3.2) =⇒ (3.1).
Quadratic Penalty Function (Courant, 1940s). This idea appears old and unsophisticated. However, ...
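The penalty objective and its gradient can be sketched in a few lines of NumPy (an illustration of (3.2), not the authors' implementation):

```python
import numpy as np

def f_and_grad(A, X, mu):
    # Trace-penalty objective f(X) = (1/2) tr(X^T A X) + (mu/4) ||X^T X - I||_F^2
    # and its gradient grad f(X) = AX + mu X (X^T X - I).
    G = X.T @ X - np.eye(X.shape[1])       # constraint residual X^T X - I
    AX = A @ X
    f = 0.5 * np.trace(X.T @ AX) + 0.25 * mu * np.linalg.norm(G, 'fro')**2
    grad = AX + mu * (X @ G)
    return f, grad

# usage: for k = 1 and mu > lambda_1, the minimizer is q1 * sqrt(1 - lambda_1/mu)
A = np.diag([1.0, 2.0, 3.0])
x = np.array([[np.sqrt(0.5)], [0.0], [0.0]])   # mu = 2, lambda_1 = 1
fval, g = f_and_grad(A, x, 2.0)
```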
“Exact” Penalty
However, µ→ ∞ is unnecessary.
Theorem (Equivalence in Eigenspace)
Problem (3.2) is equivalent to (3.1) if and only if
µ > λk. (3.3)
Under (3.3), all minimizers of (3.2) have the SVD form
X = Qk (I − Λk/µ)^{1/2} VT, (3.4)
where Qk consists of k eigenvectors associated with a set of k smallest eigenvalues that form the diagonal matrix Λk, and V ∈ Rk×k is any orthogonal matrix.
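A quick numerical sanity check of (3.4) (illustrative sizes; the matrix and seed are arbitrary): the SVD form with V = I should make the penalty gradient vanish whenever µ > λk.

```python
import numpy as np

n, k = 6, 2
rng = np.random.default_rng(0)
B = rng.standard_normal((n, n))
A = B @ B.T                                # symmetric PSD test matrix
lam, Q = np.linalg.eigh(A)                 # eigenvalues in ascending order
mu = lam[k - 1] + 1.0                      # satisfies mu > lambda_k as in (3.3)

# the minimizer form (3.4) with V = I
X = Q[:, :k] @ np.sqrt(np.eye(k) - np.diag(lam[:k]) / mu)

# gradient of the penalty function: AX + mu X (X^T X - I)
grad = A @ X + mu * X @ (X.T @ X - np.eye(k))
```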
Fewer Saddle Points
Original Model: min{tr(XTAX) : XTX = I, X ∈ Rn×k }
One minimum/maximum subspace (discounting multiplicity).
All k-dimensional eigenspaces are saddle points.
However, for the penalty model:
Theorem
Let f(X) be the penalty function associated with parameter µ > 0.
1. For µ ∈ (λk, λn), f(X) has a unique minimum, no maximum.
2. For µ ∈ (λk, λk+p), where λk+p is the smallest eigenvalue > λk, a rank-k stationary point must be a minimizer, as defined in (3.4).
In a sense, the penalty model is much stronger.
Error Bounds between Optimality Conditions
First order condition
Our penalty model: 0 = ∇f(X) := AX + µX(XTX − I);
Original model: 0 = R(X) := AY(X) − Y(X)(Y(X)TAY(X)), where Y(X) is an orthonormal basis of span{X}.
Lemma
Let ∇f(X) (with µ > λk) and R(X) be defined as above. Then
‖R(X)‖F ≤ σmin(X)^{-1} ‖∇f(X)‖F, (3.5)
where σmin(X) is the smallest singular value of X. Moreover, for any global minimizer X̄ and any ε > 0, there exists δ > 0 such that whenever ‖X − X̄‖F ≤ δ,
‖R(X)‖F ≤ (1 + ε)/√(1 − λk/µ) · ‖∇f(X)‖F. (3.6)
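The bound (3.5) is easy to probe numerically. A small NumPy sketch (test matrix and parameters are arbitrary): form R(X) from a thin QR basis of X and compare it against the penalty gradient.

```python
import numpy as np

def residual_and_grad(A, X, mu):
    # R(X) = A Y - Y (Y^T A Y) for an orthonormal basis Y of span{X},
    # and grad f(X) = AX + mu X (X^T X - I) from the penalty model.
    Y, _ = np.linalg.qr(X)
    R = A @ Y - Y @ (Y.T @ A @ Y)
    grad = A @ X + mu * X @ (X.T @ X - np.eye(X.shape[1]))
    return R, grad

# usage: check (3.5) at a random full-rank X
rng = np.random.default_rng(1)
M = rng.standard_normal((8, 8))
A = M + M.T
X = rng.standard_normal((8, 3))
R, g = residual_and_grad(A, X, 4.0)
smin = np.linalg.svd(X, compute_uv=False)[-1]
```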
Condition Number
Condition Number of the Hessian at Solution
κ(∇²f(X)) := λmax(∇²f(X))/λmin(∇²f(X))
Determining factor for asymptotic convergence rate of gradient methods
Lemma
Let X̄ be a global minimizer of (3.2) with µ > λk. The condition number of the Hessian at X̄ satisfies
κ(∇²f(X̄)) ≥ max(2(µ − λ1), λn − λ1) / min(2(µ − λk), λk+1 − λk). (3.7)
In particular, the above holds as an equality for k = 1.
Gradient methods may encounter slow convergence at the end.
Generalizations
Generalized eigenvalue problems: XTX = I → XTBX = I
Keep out undesired subspace: UTX = 0 (UT U = I)
Trace Minimization with Subspace Constraint
min_{X∈Rn×k} {tr(XTAX) : XTBX = I, UTX = 0}
Trace-Penalty Formulation
min_X (1/2) tr(XTQTAQX) + (µ/4) ‖XTQTBQX − I‖²F
where Q = I − UUT (so QX = X − U(UTX)).
With changes of variables, all results still hold.
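The projection Q never needs to be formed explicitly; a sketch of the matrix-free application QX = X − U(UᵀX):

```python
import numpy as np

def apply_Q(U, X):
    # Apply Q = I - U U^T to a block X without forming the n-by-n matrix Q;
    # the result is orthogonal to span{U} (U assumed orthonormal).
    return X - U @ (U.T @ X)

# usage: the projected block satisfies U^T (QX) = 0
rng = np.random.default_rng(0)
U, _ = np.linalg.qr(rng.standard_normal((6, 2)))
X = rng.standard_normal((6, 3))
QX = apply_Q(U, X)
```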
Algorithms for Trace-Penalty Minimization
Gradient Methods:
X ← X − α∇f(X), where ∇f(X) = AX + µX(XTX − I)
First Order Condition:
∇f(X) = 0 ⇔ AX = µX(I − XTX)
2 Types of Computations for ∇f(X):
1. AX: O(k · nnz(A))
2. X(XTX): O(k²n) (BLAS3)
(2) dominates (1) whenever k ≳ nnz(A)/n
Gradient methods require NO RR/Orth
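A minimal sketch of this orthogonalization-free iteration (the fixed step size is a simplification of my own; the talk uses a nonmonotone line search with BB initial steps):

```python
import numpy as np

def eigpen_gd(A, X, mu, alpha, iters):
    # Fixed-step gradient descent on the trace-penalty function (3.2):
    # X <- X - alpha * (AX + mu X (X^T X - I)).  No RR/orth inside the loop.
    I = np.eye(X.shape[1])
    for _ in range(iters):
        X = X - alpha * (A @ X + mu * X @ (X.T @ X - I))
    return X

# usage: k = 2 smallest eigenpairs of a small diagonal test matrix
A = np.diag([1.0, 2.0, 10.0, 11.0])
rng = np.random.default_rng(0)
X0, _ = np.linalg.qr(rng.standard_normal((4, 2)))
X = eigpen_gd(A, X0, mu=3.0, alpha=0.01, iters=5000)   # mu > lambda_2 = 2
```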
Gradient Method
Preserve Full Rank
Lemma
Let X^{j+1} be generated by
X^{j+1} = X^j − αj ∇f(X^j)
from a full-rank X^j. Then X^{j+1} is rank deficient only if 1/αj is one of the k generalized eigenvalues of the problem
[(X^j)T ∇f(X^j)] u = λ [(X^j)T(X^j)] u.
On the other hand, if αj < σmin(X^j)/‖∇f(X^j)‖2, then X^{j+1} is full rank.
Combined with previous results, there is a high probability of getting a global minimizer by using gradient-type methods.
Gradient Methods (Cont’d)
X^{j+1} = X^j − αj ∇f(X^j)
Step Size α
Non-monotone line search (Grippo 1986, Zhang-Hager 2004)
Initial BB step:
αj = argmin_α ‖S^j − α Y^j‖²F = tr((S^j)T Y^j) / ‖Y^j‖²F,
where S^j = X^j − X^{j−1} and Y^j = ∇f(X^j) − ∇f(X^{j−1}).
Many other choices
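The BB formula above translates directly into code; a short sketch (illustrative, with arbitrary test data):

```python
import numpy as np

def bb_step(X_prev, X, G_prev, G):
    # BB step size: alpha = argmin_a ||S - a Y||_F^2 = tr(S^T Y) / ||Y||_F^2,
    # with S = X^j - X^{j-1} and Y = grad f(X^j) - grad f(X^{j-1}).
    S = X - X_prev
    Y = G - G_prev
    return np.trace(S.T @ Y) / np.linalg.norm(Y, 'fro')**2

# usage: for a quadratic with gradient G = lam * X, the BB step is 1/lam
rng = np.random.default_rng(0)
Xp = rng.standard_normal((5, 2))
Xc = rng.standard_normal((5, 2))
lam = 4.0
alpha = bb_step(Xp, Xc, lam * Xp, lam * Xc)
```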
Current Algorithm
Framework:
1. Pre-process: scaling, shifting, preconditioning
2. Penalty parameter µ: dynamically adjusted
3. Gradient iterations: main operations X(XTX) and AX
4. RR Restart: computing Ritz-pairs and restarting
(Further steps possible, but NOT used in comparison)
5. Deflation: working on desired subspaces only
6. Chebyshev Filter: improving accuracy
Enhancement: RR Restarting
RR Steps return Ritz-pairs for given subspaces
1. Orthogonalization: Q ∈ orth(X)
2. Eigenvalue decomposition: QTAQ = VΣVT
3. Ritz-pairs: QV and diag(Σ)
RR Steps ensure accurate terminations
RR Steps can accelerate convergence
Very few RR Steps are used
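The three RR steps above, plus a residual-based termination check, can be sketched as follows (an illustration; the tolerance and test data are my own choices):

```python
import numpy as np

def rr_restart(A, X, tol=1e-3):
    # One RR restart: orthogonalize, solve the small projected eigenproblem,
    # and return Ritz pairs plus a convergence flag based on their residuals.
    Q, _ = np.linalg.qr(X)               # 1. Q = orth(X)
    S, V = np.linalg.eigh(Q.T @ A @ Q)   # 2. Q^T A Q = V diag(S) V^T
    Z = Q @ V                            # 3. Ritz vectors QV, Ritz values S
    res = np.linalg.norm(A @ Z - Z * S, axis=0) / np.maximum(1.0, np.abs(S))
    return Z, S, bool(np.all(res < tol))

# usage: an exact invariant subspace passes the check immediately
A = np.diag([1.0, 2.0, 3.0, 4.0])
rng = np.random.default_rng(0)
X = np.eye(4)[:, :2] @ rng.standard_normal((2, 2))
Z, S, done = rr_restart(A, X)
```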
Section IV. Numerical Results and Conclusion
Pilot Tests in Matlab
Matrix: delsq(numgrid(’S’,102)); size: n = 10000; tol = 1e-3
[Figure: CPU Time in Seconds vs. Number of Eigenvalues (50 to 500) for eigs, lobpcg, and eigpen. (a) with “-singleCompThread”; (b) without “-singleCompThread”.]
Experiment Environment
Running Platform:
A single node of a Cray XE6 supercomputer (NERSC)
Two 12-core AMD ‘MagnyCours’ 2.1-GHz processors
32 GB shared memory
System and language:
Cray Linux Environment version 3
Fortran + OpenMP
All 24 cores are used unless otherwise specified
Solvers Tested
ARPACK
LOBPCG
EIGPEN
Relative Error Measurements
Let x1, x2, ..., xk be the computed Ritz vectors, and θi the Ritz values.
Eigenvectors:
res_i = ‖Axi − θi xi‖2 / max(1, |θi|)
Eigenvalues:
eθ = max_i |θi − λi| / max(1, |λi|)
Trace:
etrace = |Σi θi − Σi λi| / max(1, |Σi λi|)
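These three measures are straightforward to compute; a NumPy sketch (Ritz values and reference eigenvalues are both assumed sorted ascending):

```python
import numpy as np

def error_measures(A, X, theta, lam):
    # Relative residuals per Ritz pair, max relative eigenvalue error,
    # and relative trace error, as defined on this slide.
    res = np.linalg.norm(A @ X - X * theta, axis=0) / np.maximum(1.0, np.abs(theta))
    e_theta = np.max(np.abs(theta - lam) / np.maximum(1.0, np.abs(lam)))
    e_trace = abs(theta.sum() - lam.sum()) / max(1.0, abs(lam.sum()))
    return res, e_theta, e_trace

# usage: exact eigenpairs give zero errors
A = np.diag([1.0, 3.0, 5.0])
X = np.eye(3)[:, :2]
theta = np.array([1.0, 3.0])
lam = np.array([1.0, 3.0])
res, e_theta, e_trace = error_measures(A, X, theta, lam)
```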
Test Matrices
UF Sparse Matrix Collection: PARSEC group (Y.K. Zhou et al.)
Matrix size n, sparsity nnz (#eigen-pairs nev for Test 1)