ROVIBRATIONAL SPECTROSCOPY CALCULATIONS USING
A WEYL-HEISENBERG WAVELET BASIS AND
CLASSICAL PHASE SPACE TRUNCATION
by
RICHARD LUZI LOMBARDINI, B.S., M.S.
A DISSERTATION
IN
PHYSICS
Submitted to the Graduate Faculty of Texas Tech University in
Partial Fulfillment of the Requirements for
the Degree of
DOCTOR OF PHILOSOPHY
Approved
Bill Poirier Chairperson of the Committee
Greg Gellene
Wallace Glab
Thomas Gibson
Accepted
John Borrelli Dean of the Graduate School
August, 2006
Copyright 2006, Richard Luzi Lombardini
ACKNOWLEDGMENTS
This was my first attempt exploring the unknowns of theoretical physics and
chemistry. Any successes that I achieved were only possible through the patience and
guidance of my advisor Bill Poirier. My parents, Barry and Leila Lombardini, gave me
unconditional encouragement that was vital to the completion of this
degree. Peers and colleagues in the chemistry department, Jason McAfee, Justin
Rajesh Rajian, and Buddhadev Maiti, especially those who were my cellmates in the
Poirier lab, Sean Xiao, Junkai Xie, Wenwu Chen, Jason Montgomery, Akbar Salam,
and Corey Trahan, made the experience enjoyable and less frustrating since we were
“all in the same boat”. Some of these calculations (Chapter IV) were done on “Jazz,”
a 350-node computing cluster operated by the Mathematics and Computer Science
Division at Argonne National Laboratory, and I am grateful to the staff at Argonne
for their technical help. Some of the staff (Srirangam Addepalli and David Chaffin)
at the Texas Tech High Performance Computing Center were very helpful with issues
involving parallel programming. Last of all, I would like to recognize and thank
my committee members (the three G’s), Greg Gellene, Wallace Glab, and Thomas Gibson.
By themselves, weakly bound or “floppy” molecular systems are very interesting in
that they defy notions of “typical” molecular behavior. However, the particular study
of clusters, i.e., 2–1000 monomers (either atoms or molecules) bound together by
van der Waals forces or hydrogen bonding, opens new doors for exploring physical
chemistry at the fundamental level, because the cluster size can be controlled and
varied. Experimental techniques for synthesizing clusters of variable size [1], in
conjunction with high-resolution spectroscopy at frequencies ranging from the
microwave to the ultraviolet, have allowed experimentalists to work closely with
theorists. For theorists, the struggle is obviously not size control per se but the
severe size limitations on accurate calculations, which stem from the inherent
anharmonicities of the weak bonds (to be addressed in more detail later).
Within the realm of cluster research, one field of study involves vibrationally
induced dynamical phenomena in hydrogen-bonded molecular complexes, which are
important for gaining a better understanding of energy transfer from reactants to
products in chemical reactions. Such investigations offer insight into the
fine line that separates time-independent spectroscopy from time-dependent chemical
dynamics. The hydrogen fluoride dimer, (HF)_2, is a good example; here, the main
focus lies on vibrational predissociation, more specifically, the transfer of energy from
the excited HF monomer vibration to the weak intermolecular bond, which eventually
breaks. To gain the full picture of the process, both theoretical [2] and experimental [3]
methods traditionally used for bound states are needed, in conjunction with scattering
calculation techniques [4] and experimental setups [5,6] conducive to measuring
dissociation probabilities.
Another area of interest, for which clusters are ideally suited, is the study of
how microscopic properties evolve into macroscopic behavior, since clusters serve as a
type of transitional form of matter, intermediate between independent units and bulk
matter.7,8 One example is the study of first order phase transitions. At the cluster
level, systems go through phase “changes” rather than phase transitions. One key
difference is that melting and freezing points are not identical.9 For the range of tem-
peratures in between, the system can hop back and forth between phases, similar to
the dynamics of coexisting isomers. For rare gas clusters, especially those comprised
of lighter atoms (such as Ne), quantum effects can have a great influence on these
unusual “phases,” sparking the development of efficient theoretical quantum statistical
techniques for many-body systems [10]. In principle, as one increases the size of the
cluster, the phase changes should approach the familiar first order phase transitions
of statistical mechanics. This bridge has never been crossed, either theoretically or
experimentally, because typical bulk sizes (or sizes at which these macroscopic
physical properties start to appear) are so much larger than the cluster sizes
that one can directly manipulate. The same can be said for ion solvation, which
is examined at the molecular level with cluster-size solvent shells in the gas phase
(very different from solution). Stace [11], however, argues that one can extract useful
thermodynamic information for the bulk counterpart from studying individual
ion-solvent intermolecular interactions [12].
Strictly on the theoretical front, the anharmonic behavior of the large amplitude
motion of cluster systems, due to multiple shallow minima separated by low isomeriza-
tion and dissociation barriers on the potential surface, renders accurate spectroscopic
calculations of energy levels extremely challenging. Despite the simplification offered
by the Born-Oppenheimer approximation, in which the electronic degrees of freedom
(DOF’s) can be effectively removed, the numerical solution of the nuclear
time-independent Schrödinger equation becomes computationally expensive very quickly
with respect to increasing cluster size, since, for the most part, the nuclear DOF’s
must be treated as being coupled. The traditional picture of treating the system
approximately as decoupled single-DOF vibrational oscillators, and subsequent nor-
mal mode analysis, breaks down, due to the large anharmonicity which cannot be
regarded as a small perturbation.
Basis set methods are the approach of choice throughout this dissertation. With
these methods, the coupling of all DOF’s translates to the need for a rather large matrix
representation (or equivalently, a basis set of large size N) of the nuclear Hamiltonian
operator, Ĥ, in order to numerically calculate accurate energy values and wave
functions of Ĥ. Diagonalization of the corresponding N × N matrix representation,
H (ordinarily the computational bottleneck), directly provides the linear expansion
coefficients of the eigenfunctions of Ĥ in terms of the N basis functions. With
sufficiently large N, these numerical solutions approximate the true wave functions closely,
at least towards the bottom of the spectrum; the corresponding eigenvalues of the matrix H
should, in the same sense, be close to, yet never smaller than [13], the true eigenvalues
of Ĥ. If there is any separability in the Hamiltonian or decoupling of DOF’s, the
Hamiltonian can be broken up into smaller parts, each of which can be represented by
smaller matrices (since there are fewer DOF’s) and solved separately.
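The variational behavior described above can be illustrated with a minimal Python sketch. The model system (a 1 DOF quartic oscillator with ħ = m = ω = 1), the harmonic oscillator basis, and the function names are illustrative assumptions and are not taken from this dissertation. The position matrix is built in a slightly larger basis before truncation so that the N × N block is the exact projection of the Hamiltonian, in which case the computed eigenvalues can only decrease toward the true values as N grows.

```python
import numpy as np

def quartic_hamiltonian(n_basis, lam=1.0):
    """Matrix of H = p^2/2 + x^2/2 + lam*x^4 in the first n_basis
    harmonic oscillator eigenfunctions (hbar = m = omega = 1)."""
    m = n_basis + 2                       # two extra functions make x^4 exact after truncation
    off = np.sqrt(np.arange(1, m) / 2.0)  # <n|x|n+1> = sqrt((n+1)/2)
    x = np.diag(off, 1) + np.diag(off, -1)
    x4 = np.linalg.matrix_power(x, 4)[:n_basis, :n_basis]
    h0 = np.diag(np.arange(n_basis) + 0.5)  # harmonic part is diagonal
    return h0 + lam * x4

def lowest_levels(n_basis, k=3, lam=1.0):
    """Lowest k eigenvalues from direct diagonalization of the N x N matrix."""
    return np.linalg.eigvalsh(quartic_hamiltonian(n_basis, lam))[:k]

if __name__ == "__main__":
    for n in (5, 10, 20, 40):
        print(n, lowest_levels(n))
```

Increasing N in the loop shows the lowest eigenvalues settling from above, which is the sense in which the matrix eigenvalues bound the true ones.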
The worst, but most typical, case of no decoupling may require one to be very
innovative when choosing an appropriate coordinate system and basis set, with the
purpose of making the computations tractable. For optimal coordinate systems, Bačić
and Light [14] suggest four crucial criteria: the coordinates should (1) have boundary
conditions that cover all of the configuration space that the system can possibly
span; (2) exploit the highest symmetry of the system; (3) be orthogonal, so
that the kinetic energy operator has the fewest possible cross terms; and
(4) if possible, be chosen such that there is the least amount of
coupling between vibrational modes (or DOF’s). Throughout this dissertation, Cartesian
coordinates are used (with a slight exception in the 1 DOF case of Chapter
II); these fully satisfy conditions (1) and (3). Requirements (2) and (4) are not
dealt with, although they are worth exploring in future work. Although curvilinear
coordinates could successfully address (4), they are, in general, more difficult to use
and lack the universality of Cartesian coordinates. As discussed in Ref. [15], one does
not have to deal with variable coordinate limits and boundary conditions, cross terms
in the kinetic energy operator, and other system and/or coordinate specific adjust-
ments when using Cartesian coordinates. Despite the benefits, the foremost reason
these are used here is that the Weyl-Heisenberg wavelets or “weylets”, which are the
basis functions of choice throughout this dissertation, are at present only defined in
Cartesian coordinates.
This dissertation mainly focuses on basis sets and the computational techniques
that accompany them, with the expectation that the combined effort can lead to
successful calculations of weakly bound cluster systems that would otherwise not be
possible, using conventional methodologies. A key metric in any discussion of basis
set methods is the basis set efficiency K/N , where N is the number of basis functions
needed to compute K eigenvalues to a desired level of accuracy. The next section
provides a discussion of the basis set efficiency for the present basis set ideas, leading
up to a justification for the use of phase space truncated weylets.
1.1 Basis Sets
Despite their nonorthogonality, one popular choice of basis is the real Gaussian
functions in configuration (position) space. Gaussians of 1 DOF are simple in that
each function depends upon only two parameters, width and center. Also, contribut-
ing to their popularity is the convenience they provide in calculating Hamiltonian
matrix elements: the kinetic and overlap matrix elements have analytical representa-
tions, and the potential matrix elements may always be obtained using Gauss-Hermite
quadrature16—typically requiring very few quadrature points, due to the good local-
ization and lack of oscillations of the Gaussian function. The localization property
allows for two further benefits. First, in 1 DOF at least, one can effectively target
eigenstates of an arbitrary Hamiltonian below a chosen energy value. Second, the lo-
calization results in effectively sparse (most matrix elements nearly zero) Hamiltonian
and overlap matrices, allowing for the application of fast computational techniques.
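The ingredients listed above (analytical overlap and kinetic matrices, Gauss-Hermite quadrature for the potential, and a nonorthogonal basis) can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions: equal-width normalized Gaussians exp[-a(x - x_i)^2] on an evenly spaced grid, units with ħ = m = 1, and a harmonic test potential. The function name, grid, and width parameter are hypothetical choices, not values used in this dissertation. The nonorthogonality is handled here by canonical orthogonalization of the overlap matrix.

```python
import numpy as np
from numpy.polynomial.hermite import hermgauss

def gaussian_basis_levels(potential, centers, a, n_quad=12):
    """Energies of H = p^2/2 + V(x) in normalized Gaussians exp(-a (x - x_i)^2)."""
    xi, xj = np.meshgrid(centers, centers, indexing="ij")
    d = xi - xj                           # separations between Gaussian centers
    c = 0.5 * (xi + xj)                   # centers of the product Gaussians
    s = np.exp(-0.5 * a * d**2)           # analytic overlap matrix
    t = s * 0.5 * a * (1.0 - a * d**2)    # analytic kinetic energy matrix
    # Potential matrix: the product of two Gaussians is a Gaussian at c,
    # so Gauss-Hermite quadrature applies after the shift x = c + u/sqrt(2a).
    nodes, weights = hermgauss(n_quad)
    pts = c[..., None] + nodes / np.sqrt(2.0 * a)
    v = s * (potential(pts) @ weights) / np.sqrt(np.pi)
    # Canonical orthogonalization: X = U s^(-1/2) gives X^T S X = 1.
    s_vals, s_vecs = np.linalg.eigh(s)
    x = s_vecs / np.sqrt(s_vals)
    return np.linalg.eigvalsh(x.T @ (t + v) @ x)

if __name__ == "__main__":
    grid = np.linspace(-5.0, 5.0, 21)
    print(gaussian_basis_levels(lambda x: 0.5 * x**2, grid, a=2.0)[:4])
```

For the harmonic test potential the quadrature is exact with only a few points, which reflects the remark above that few quadrature points are typically needed.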
In addition to the above inherent characteristics, the literature discusses improve-
ments upon the efficiency of the Gaussian basis set, i.e., basis optimization schemes
for selectively choosing the center and width of each Gaussian function, sometimes
guided by physical arguments. Earlier works17,18 developed accurate algorithms for
choosing parameters of a single DOF Gaussian basis, in accordance with a balance
of semiclassical (SC) arguments and basis overlap criteria. The ultimate goal was
to achieve optimal efficiency while avoiding linear dependency issues. The SC meth-
ods produce a basis of narrow Gaussian functions densely distributed in regions of
high momentum (low potential) and wide functions sparsely distributed in regions
of low momentum (high potential) in configuration space. Unfortunately, the strict
SC-Gaussian methods do not work well in more than 1 DOF, due to a lack of either
straightforward applicability [17] or optimal convergence [18]. Refs. [17] and [19]-[22]
address this issue and propose new approximate SC methods with [19,20,22] or without [17,21]
additional non-SC techniques. Other sources [23,24] report alternatives to SC thought
in Gaussian basis development.
The latter techniques are critical for achieving respectable efficiencies on systems
involving high energy, heavy particle dynamics. Other methods have been developed
using different basis functions such as nonlocal plane waves,25,26 configuration space
grid functions,27–29 or wave functions of solvable Hamiltonian systems.30 The cor-
responding methodologies involve either point transformations of the coordinates of
the basis functions,27,28,30 the variational principle,29 or a combination of both.25,26
Although impressive efficiencies have been achieved at low DOF’s, all of these
approaches have compromising issues at high DOF’s. The methods of Refs. [25]-[29],
for instance, exhibit exponential decay of the efficiency as the number of DOF’s
increases, which is an inherent problem of direct product basis sets (DPB’s) [14,24,29,31].
Cargo and Littlejohn show that their proposed canonical transformation does not
produce efficiencies at higher DOF’s comparable to the results obtained in their 1
DOF Morse oscillator example [30].
In this dissertation, a different basis methodology is used—the only one currently
known that formally defeats exponential scaling. The method hinges upon a phase
space picture that is applied to both the representational basis set, and the desired
eigenstates of the target application system. It exploits the fact that the simple
phase space picture used becomes exact in the Wigner-Weyl (WW) sense,32,33 in the
large basis limit. More specifically, in order to decide upon an optimal basis set, one
must have a clear picture of how the wave functions, |θi〉, of H are represented on
phase space. The set of K orthonormal |θi〉’s that lie below some energy Emax span
a subspace of the Hilbert space that can be represented by the projection operator
\[ \hat{\rho}_K = \sum_{i=1}^{K} |\theta_i\rangle\langle\theta_i| . \tag{1.1} \]
The WW phase space representation of this operator (labeled “ρK”) is a probability
distribution function that is well-contained within the classically allowed region, R,
i.e. the region enclosed by the energy surface H = Emax, where H is the classical phase
space Hamiltonian function corresponding to the operator Ĥ under the WW mapping. The phase
space picture presumes that Emax is quantized such that the possible volumes of R
are K(2πħ)^f, where f is the number of DOF’s. In other words, each |θi〉 corresponds
to a non-overlapping region of volume equal to that of a Planck cell. This picture
becomes more accurate as K increases; in this limit, ρK approaches a uniform
value of one within R, and zero outside [34].
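The Planck cell counting argument can be checked on a simple case. The sketch below is a minimal illustration, assuming an isotropic f-DOF harmonic oscillator with ħ = ω = 1 (a model chosen here for convenience, not one singled out in the text). For this system the region H ≤ Emax is a 2f-dimensional ball of radius √(2Emax), so its volume divided by (2πħ)^f is Emax^f / f!, which can be compared with the exact number of quantum states below Emax. The function names are hypothetical.

```python
from math import comb, factorial, floor

def exact_state_count(e_max, f):
    """Number of f-DOF isotropic harmonic oscillator states with E <= e_max.
    Energies are E = n_1 + ... + n_f + f/2 (hbar = omega = 1)."""
    m = floor(e_max - f / 2)          # largest allowed total number of quanta
    return comb(m + f, f) if m >= 0 else 0

def phase_space_estimate(e_max, f):
    """Classical volume of the region H <= e_max in units of (2 pi hbar)^f."""
    return e_max**f / factorial(f)

if __name__ == "__main__":
    for e_max in (20, 200, 2000):
        k = exact_state_count(e_max, 3)
        print(e_max, k, k / phase_space_estimate(e_max, 3))
```

The printed ratio approaches one as Emax (and hence K) increases, in line with the statement that the picture becomes more accurate in the large K limit.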
The phase space picture, as discussed above, can be applied to any type of repre-
sentational basis set, provided the basis is orthogonal. It can also be generalized for
nonorthogonal basis sets, although in this case, the picture must be modified in subtle
ways. For instance, complex-valued or phase space Gaussians (PSG’s), at first glance,
seem to be very good candidates for basis functions, especially under the guidance of
the phase space picture. First, their average momentum and position values, which
are also the parameters that distinguish one PSG from another, are simply the cen-
ters of their real-valued WW Gaussian representations. Each PSG can be mapped to
another by Weyl-Heisenberg phase space translation operators, hence they are also
referred to as Weyl-Heisenberg coherent states. Second, the WW representation of
each function is well-localized within a Planck cell region around the center. How-
ever, this property does not necessarily extend to a finite collection of PSG’s on phase
space. This can be best explained by introducing the projection operator ρN which
represents the finite set of PSG’s, |φi〉, i.e.,
\[ \hat{\rho}_N = \sum_{i,j=1}^{N} [S^{-1}]_{ij} |\phi_i\rangle\langle\phi_j| \tag{1.2} \]
where each element of the overlap matrix, S, is given by Sij = 〈φi|φj〉. Unlike
Eq. (1.1), Eq. (1.2) has cross terms due to the nonorthogonality of the PSG’s. The
magnitudes of these terms are directly related to the degree of nonorthogonality of
the PSG’s and also signify the degree of “collective” delocalization. Consequently,
ρN can have significant probability far from the centers of all N individual PSG’s,
depending upon how closely these are bunched together.
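The role of the overlap matrix in Eq. (1.2) can be explored numerically. The following Python sketch is an illustration under stated assumptions: 1 DOF, ħ = 1, unit-width PSG's of the form π^(-1/4) exp[-(x - q0)^2/2 + i p0 x], centers on a square lattice with one function per cell of area 2πħ, and overlaps evaluated by simple grid quadrature. The function name and grid parameters are hypothetical. As more lattice sites are included, the smallest eigenvalue of S can only decrease, so the entries of the inverse overlap matrix grow and the cross terms in the projection become more important.

```python
import numpy as np

def psg_overlap_matrix(n_side, x_max=20.0, n_grid=4001):
    """Overlap matrix S_ij = <phi_i|phi_j> for phase space Gaussians on an
    n_side x n_side square lattice with cell area 2*pi (hbar = 1)."""
    x = np.linspace(-x_max, x_max, n_grid)
    dx = x[1] - x[0]
    spacing = np.sqrt(2.0 * np.pi)                 # lattice constant in q and p
    idx = np.arange(n_side) - (n_side - 1) / 2     # lattice centered on the origin
    qq, pp = np.meshgrid(idx * spacing, idx * spacing, indexing="ij")
    q0, p0 = qq.ravel(), pp.ravel()
    # Columns of phi are the PSG's sampled on the quadrature grid.
    phi = np.pi**-0.25 * np.exp(-0.5 * (x[:, None] - q0)**2 + 1j * p0 * x[:, None])
    return (phi.conj().T @ phi) * dx

if __name__ == "__main__":
    for n in (3, 5, 7):
        s = psg_overlap_matrix(n)
        print(n * n, np.linalg.eigvalsh(s)[0])     # smallest eigenvalue of S
```

Nearest-neighbor overlaps in this arrangement have magnitude exp(-π/2), roughly 0.21, so no single pair is close to linearly dependent; the difficulty is collective.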
Thus, although individual PSG’s are well-localized, collectively they need not be.
This has important ramifications for the phase space truncation scheme [35–37], i.e. the
method used to restrict the representational basis in order to achieve the highest
possible efficiency. This method is extremely simple: retain only those basis
functions whose centers lie within or near the region occupied by ρK. In order to be effective, however, this
requires that ρN be well-localized about the basis phase space centers, which in turn
requires that the basis |φi〉 be orthogonal. The ideal basis should therefore be both
localized and orthogonal—precisely the two defining properties of weylets,35–37 as will
be discussed later.
First though, we find it useful to continue our discussion of PSG’s. The set of all
PSG’s, a.k.a. “coherent states,” comprises an infinitely overcomplete family of vectors
in the Hilbert space, that nevertheless satisfies a certain resolution of the identity.38
The completeness aspect reinforces the idea of using PSG’s for the representation of
arbitrary quantum states; however, overcompleteness leads to a new drawback, i.e.
linear dependence, quite distinct from the issue of nonorthogonality and collective
nonlocality. To remedy this, one work39 suggests the use of subsets of PSG’s (in the
1 DOF case) with centers lying on a line or circle in phase space. The goal is to
limit the basis expansion by choosing from a pool of functions grouped together by some
simple criterion. Other ideas, within the confines of one-dimensional manifolds, have
ranged from the random placement of PSG centers forming an irregular polygon,40
to their selective placement along the classical energy surface H = E of the system in
question, in order to get efficient representations of wave functions at energies close
to E.41
On the other hand, a discrete rectilinear lattice of PSG’s41 would be easier to
handle than the constructs mentioned above, particularly for multi-DOF applica-
tions. The most popular lattice arrangement is the von Neumann lattice42 where
there is one PSG per rectangular (or “hypercubical”) Planck cell. This density is
“critical” with respect to providing completeness without linear dependence, and is
denoted “d=1”. Use of the von Neumann lattice has spread to the fields of condensed
matter [43,44], quantum optics [45–47], and molecular physics [41] for the representation
of arbitrary quantum states. Von Neumann lattices have also gained notice outside of quantum
mechanics, in the realm of communication theory involving signal decomposition and
transmission [48]. The completeness of these functions has been well-established [49–51],
and minimal expansions using truncated sets of lattice functions have been shown to
be robust45 and extremely efficient46 in representing harmonic oscillator and squeezed
states.
These findings support the use of von Neumann lattice functions as a basis for
the calculation of energy eigenvalues and wave functions. Unfortunately, Davis and
Heller41 found that the convergence of the eigenvalues was slow with respect to the
number of basis functions for arbitrary systems not resembling that of the above-
mentioned harmonic oscillator or squeezed state (which are special cases), in effect,
because of the collective nonlocality problem discussed earlier. They showed that
improved performance could be achieved by increasing the lattice density, i.e., d > 1,
but this introduces near linear dependencies into the basis, and further delocalizes ρN .
Most importantly, however, d > 1 implies exponential scaling of basis set efficiency
with respect to the number of DOF’s.
Poirier [35,36] applied the Löwdin canonical orthogonalization procedure [52] to the
set of single DOF (d = 1) von Neumann lattice PSG’s in the hopes of improving
efficiency, by introducing orthogonality into the basis. Unfortunately, the resultant
orthogonalized basis functions are no longer individually well-localized, even when
the localization is optimized. In fact, Balian [53] and Low [54] proved that no critically
dense lattice of states supported by the Weyl-Heisenberg group can satisfy the
properties of orthogonality and good phase space localization simultaneously.
Wilson55 developed a simple trick for eluding the Balian-Low “no go” theorem. By
using bimodal basis functions (in the single DOF case), consisting of both positive
and negative momentum components in a symmetric fashion, and simultaneously
working on a doubly dense lattice, d = 2, he was able to construct a complete,
orthonormal lattice representation for which all basis functions decay exponentially
in phase space. Daubechies et al. [56] applied Wilson’s idea to a set of tight frame
functions, each composed of an expansion of doubly dense PSG’s. In this dissertation,
we label these types of functions as Weyl-Heisenberg wavelets, or weylets, “Weyl-
Heisenberg” because they are transformed into each other via the operators of the
Weyl-Heisenberg group, and “wavelet” because, in the 1 DOF case, there are two
parameters or quantum numbers (signifying the center of the weylet on 2-dimensional
phase space) needed to label each function in the set.
Using the ideas of Wilson and Daubechies, Poirier [35–37] then derived the optimally
localized weylet basis in 1 DOF, as well as an efficient numerical scheme for its
construction, rendering weylets computationally practical for multi-DOF bound state
calculations. The Poirier weylets are extremely well-suited to the phase space truncation
scheme [35–37] mentioned previously. The weylets are easily extended to f DOF’s,
where, approximately, one can think of the weylet ρN as a group of 2f-dimensional
“blocks” (or Planck cells) that do not overlap and are concentric with the N
individual weylets. This uniform region, R′ [with a volume of N(2πħ)^f], becomes
a more accurate picture of ρN as N increases, following the same principle as
that described in Ref. [34]. In practice, the truncation scheme involves keeping those
N blocks whose phase space centers have classical energies less than some chosen
parameter Ecut, which is usually set equal to, or slightly larger than,
the energy Emax, the maximum cutoff of the K wave functions of interest. Wasted
space, or inefficiency (N > K), arises only from those lattice blocks on the periphery
that only partially overlap R. Overall, the efficiency does not decay exponentially
as the number of DOF’s increases, unlike DPB methods, and approaches perfection (K/N → 1) in
the large K and N limit, since R and R′ begin to resemble each other on phase space.
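The selection rule just described (keep the lattice blocks whose centers have classical energy below Ecut) is simple enough to sketch directly. The Python example below is a simplified illustration, not the weylet construction itself: it assumes 1 DOF, a plain square lattice with one center per cell of area 2πħ (the actual weylets use bimodal functions on a doubly dense lattice), and a Morse potential with made-up parameters. All names and numerical values are hypothetical.

```python
import numpy as np

def truncate_lattice(hamiltonian, e_cut, q_range, p_range, hbar=1.0):
    """Return the (q, p) lattice centers whose classical energy is below e_cut."""
    spacing = np.sqrt(2.0 * np.pi * hbar)      # assumed square cells of area 2*pi*hbar
    q = np.arange(q_range[0], q_range[1] + spacing, spacing)
    p = np.arange(p_range[0], p_range[1] + spacing, spacing)
    qq, pp = np.meshgrid(q, p, indexing="ij")
    keep = hamiltonian(qq, pp) < e_cut         # classical energy at each block center
    return np.column_stack([qq[keep], pp[keep]])

def morse_hamiltonian(q, p, depth=12.0, alpha=0.2, mass=1.0):
    """Classical Morse oscillator H(q, p) with illustrative parameters."""
    return p**2 / (2.0 * mass) + depth * (1.0 - np.exp(-alpha * q))**2

if __name__ == "__main__":
    centers = truncate_lattice(morse_hamiltonian, e_cut=6.0,
                               q_range=(-10.0, 30.0), p_range=(-6.0, 6.0))
    print(len(centers), "basis functions retained")
```

The number of retained centers plays the role of N, and in this simplified picture it tracks the area of the classically allowed region divided by 2πħ.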
Phase space truncation and maximal phase space localization are what distin-
guish weylets from the very popular DPB’s, which warrants further discussion. Like
DPB’s, the multi-DOF weylets are products of single DOF functions in each of the
coordinates, i.e.,
\[ \varphi_{\mathbf{i}}(\mathbf{q}) = \prod_{j=1}^{f} \varphi_{i_j}(q_j) \tag{1.3} \]
where i = (i_1, i_2, ..., i_f) and q = (q_1, q_2, ..., q_f) (although in general, DPB’s do not
necessarily use identical functions for each DOF). For the DPB case, because basis
truncation is applied to each DOF independently, the corresponding R′ (approximate
WW region on phase space representative of the DPB set) adopts a “cylindrical”
shape, i.e.,
\[ R' = R'^{(1)} \otimes R'^{(2)} \otimes \cdots \otimes R'^{(f)} \tag{1.4} \]
where each R′^(j) corresponds to a 2-dimensional phase space region representing a set of
N_j single-DOF basis functions, so that N = ∏_{j=1}^{f} N_j. If one were to attempt to mold R′
to resemble the ρK region R, which is not cylindrical, then there would be significant
wasted space in the “corners,” corresponding to extra basis functions in ρN. The
result is exponential scaling of K/N with f [24,29], even if one optimally determines
the individual R′^(j) to produce a product region R′ that most efficiently covers R [29].
On the other hand, the phase space truncation scheme35–37 precisely removes those
problematic regions.
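The cost of the “corners” can be made concrete with a small counting exercise. The Python sketch below is an illustration only, assuming f identical harmonic oscillator DOF's labeled by quantum numbers (n_1, ..., n_f); it is not a weylet calculation, and the function name is hypothetical. A direct product basis truncates each n_j independently at the same maximum, whereas a correlated truncation keeps only functions whose total number of quanta does not exceed that maximum.

```python
from itertools import product

def basis_sizes(max_quanta, f):
    """Return (direct product size, correlated-truncation size) for f
    harmonic oscillator DOF's with at most max_quanta quanta."""
    full = 0
    pruned = 0
    for n in product(range(max_quanta + 1), repeat=f):
        full += 1                      # every combination is in the direct product
        if sum(n) <= max_quanta:       # energy-like cutoff removes the corners
            pruned += 1
    return full, pruned

if __name__ == "__main__":
    for f in range(1, 6):
        full, pruned = basis_sizes(6, f)
        print(f, full, pruned, full / pruned)
```

The ratio of the two sizes grows rapidly with f, which is the wasted space that independent per-DOF truncation cannot remove.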
Despite the poor efficiency at many DOF’s, DPB’s are still very popular because
they are convenient, and in some cases necessary, e.g., for the discrete variable rep-
resentation method (DVR).57–62 The DVR method uses a configuration space grid
representation which has the tremendous advantage that potential matrix elements
can be computed without the need for costly integrations. For DPB’s that give sparse
matrices H, iterative eigensolvers can be used such as the Lanczos method—although,
as pointed out by Dawes and Carrington,63 nobody has been able to advance beyond
four-atom molecules using DPB iterative methods, which has been an ongoing area
of study for over ten years.64
The “build and prune” approach [63], an idea that has been around for over 20
years [65–67], represents a substantial improvement to DPB’s, vis-à-vis advancing theoretical
spectroscopic analysis beyond DPB limits. Functions within the DPB set that
negligibly contribute to the target states are pruned away, resulting in a correlated
truncation of DPB functions (i.e. truncations for individual DOF’s are no longer
independent of each other). The present weylet approach with phase space truncation
is thus another build and prune method. However, the weylet version is the only
such strategy currently known to defeat exponential scaling.
Proving their worth, the weylet calculations have already been applied successfully to
model systems up to 15 DOF’s and beyond [37], which is a record using direct matrix
diagonalization techniques.
1.2 Organization of Dissertation
There are three parts to the body of this dissertation, each of which is complete in
that it contains all the explanations and background necessary to stand as an
independent work. The first part, Chapter II, describes the first application
of the phase space truncated weylets to a real molecular system, the weakly bound
neon dimer in its ground electronic state. The majority of the chapter addresses
the technical details needed to efficiently calculate the matrix representation of an
arbitrary potential up to 3 DOF’s in the weylet basis; the development of the kinetic
matrix is comparatively minor because the matrix is sparse and the elements can
be found via analytical expressions.
The subsequent part (Chapter III) uses projection operators that are reflective of
the system to customize the individual weylet functions themselves (as opposed to just
their truncation), resulting in a new nonorthogonal basis representation with large
improvements in accuracy and efficiency. The mathematical ideas for the implemen-
tation of the operators are inspired by works of Bracken, Doebner, and Wood.68,69 As
a preliminary work, the new method is only applied to model systems up to 4 DOF’s
in the hopes that, with future developmental efforts, it could be used for real systems.
Chapter IV, the last part of this dissertation, presents a new iterative method for
diagonalizing large sparse symmetric matrices. We test the approach by applying it
to harmonic oscillator systems up to 6 DOF’s represented in the truncated weylet
basis. The approach is based on subspace iteration, which is very suitable for
parallelization (more so than Lanczos) if a simple preprocessing procedure is performed
on the matrix beforehand. The ideas for the preprocessing come from single-particle
density matrix purification schemes used in ground-state electronic-structure calcu-
lations.70,71 In addition to addressing the computational performance and scalability
of the method, this chapter also reports the tremendous efficiency of the truncated
weylet basis at very large N .
CHAPTER II
ROVIBRATIONAL SPECTROSCOPY CALCULATIONS OF Ne2∗
2.1 Introduction
Weakly bound molecular systems exhibiting large amplitude motion or “floppy”
behavior have been of longstanding interest among theorists and experimentalists, despite
technical challenges caused by low dissociation energies, large bond lengths, and sub-
stantial anharmonicities. For theoreticians interested in computing exact rovibra-
tional spectra for such systems, these challenges manifest as extremely costly “direct”
matrix diagonalizations.16 Two basic strategies have been developed to deal with this
situation: (1) optimize the representational basis set for the particular system, in
effect directly reducing the matrix size, N ; (2) apply iterative methods which, due
to sparsity or other reasons, need not store the complete matrix. Both of these ap-
proaches are directly amenable to the most convenient and commonly used choice of
basis representation, i.e., direct product basis sets (DPB’s) where the basis functions
are separable products in the coordinates.14,31 In particular, the discrete variable
representation method (DVR)57–62 is a popular configuration space grid represen-
tation based on DPB’s. The potential-optimized DVR methods,72,73 including the
maximally-efficient variety, the phase space optimized DVR,29,74 are, as implied by
the name, examples of (1) above, whereas the sparsity of the multidimensional DVR
matrix representation of the Hamiltonian implies that these are also ideally suited to
(2) above.64,75,76
∗Reproduced with permission of the American Institute of Physics from “Rovibrational
spectroscopy calculations of neon dimer using a phase space truncated Weyl-Heisenberg wavelet basis”
by R. Lombardini and B. Poirier. Journal of Chemical Physics, Vol. 124, pp. 144107 (with minor
alterations and additions). Copyright 2006 by the American Institute of Physics.
Despite such advances, all DPB and associated DVR methods are still characterized
by exponential scaling with respect to the number of degrees of freedom
(DOF’s) [24,29]. In other words,
\[ \frac{N}{K} \propto e^{cf} \tag{2.1} \]
where K is the number of eigenvalues computed to a given accuracy level, and f is the
number of DOF’s. The positive exponent c can be minimized via DPB optimization,
but never reduced to 0.
Eliminating exponential scaling requires a non-DPB method. In a recent series
of papers,35–37 one of the authors has introduced a promising new non-DPB method
based on symmetrized orthogonal Weyl-Heisenberg wavelets, or “weylets.” Though
offering many of the advantages of a DPB, when combined with a phase space trun-
cation scheme (Sec. 2.2.1), the weylet representation can be shown not only to defeat
exponential scaling, but also to approach perfect efficiency (K/N → 1) in the large N
limit, regardless of dimensionality. It has already been used to extend direct matrix
eigenvalue calculations for model systems to 15 DOF’s and beyond.37 Borrowing from
the basis truncation scheme used here, another exact quantum method has recently
been developed and applied to similar model systems;63 however it does not satisfy
perfect asymptotic efficiency.
Neither of the two methods described above has been previously applied to real
molecular applications, due primarily to difficulties in representing an arbitrary po-
tential energy operator in the truncated basis representation. In this chapter, we
present an efficient numerical method for achieving this important goal in the case of
a weylet representation. Although formally the resultant weylet potential matrix is
in general dense, in practice, the optimized phase space locality of the weylet basis
ensures that many matrix elements are essentially zero. Since the multidimensional
weylet kinetic energy matrix is also formally sparse, this enables the use of sparse
iterative matrix techniques,77,78 thus greatly increasing the matrix sizes N that may
be considered. This is especially important for higher dimensional calculations, for
which it has been observed that K increases much faster than linearly with N.37 For
simplicity, however, only direct matrix diagonalization methods are employed here.
In this chapter, we apply the weylet method for the first time to a real molecular
application, using the new potential matrix element evaluation technique (discussed
in more detail below). In particular, the bound rovibrational energy levels of Ne2 (the
simplest neon cluster system) in its ground electronic state ($^1\Sigma_g^+$) are computed
in the full 3 DOF Cartesian coordinate representation. The dimensionality is not
especially large; however, the goal is to demonstrate feasibility of the weylet approach,
in anticipation of future, more challenging applications. The calculation is, at any
rate, of non-trivial difficulty, owing to the weakly-bound and long-range character of
the interaction.
Van der Waals complex systems such as rare gas clusters have gained much atten-
tion in recent years. Statistically, clusters (even with as few as seven atoms)79 exhibit
a coexistence of phases over a range of system temperatures and energies, serving as
a prototype for bona fide bulk matter phase transitions, and also solvation. Dynam-
ically, the long-range but small-magnitude van der Waals and dispersion forces in-
volved result in very interesting behavior totally unlike traditional covalently-bonded
molecules—although they are now known to play an important ancillary role even for
covalent systems. In particular, serving as the ultimate floppy/anharmonic molecules,
clusters are not well described by the conventional equilibrium geometry/normal mode
analysis, and thus require exact quantum treatment for their elucidation. This has
motivated the development of accurate ab initio80,81 and semi-empirical82–87 poten-
tials, and a number of experimental studies.82,88,89 In particular, the Aziz potentials
(semi-empirical) have had much success in reproducing macroscopic and microscopic
properties close to experiment for dilute neon gas,85,86 as well as highly compressed
neon solid.87
From a dimensional scaling standpoint, neon clusters are also useful as a purely
computational benchmark. In particular, the near-pair-wise nature of the interaction
renders it quite convenient to expand the dimensionality simply by adding more neon
atoms without the need to develop a full-blown potential energy surface (PES) beyond
Ne2, or perhaps Ne3. However, as the main focus of the present work is to establish
feasibility of the weylet method for real molecular systems, only the dimer will be
considered here, using a simple Lennard-Jones (LJ) model. The resultant computed
rovibrational energy levels may be directly compared with those of a previous Carte-
sian calculation.15 Though not quite as accurate as the Aziz potential, the simple LJ
model does provide semi-quantitative accuracy for rare gas cluster systems, as amply
demonstrated by previous theoretical investigations.90–93
The Cartesian coordinate aspect of the present study bears discussion. Despite
the reduction in dimensionality obtained by separating vibrational and rotational mo-
tions, the Cartesian rovibrational approach offers many computational advantages,15
particularly with respect to dimensional scaling investigations relevant to clusters.
However, the primary motivation for the present weylet study is simply that weylet
basis sets have not yet been defined for non-Cartesian coordinates35–37 (although pre-
vious work by Johnson and coworkers94 strongly suggests that such a generalization
can be achieved). Note that the weylet approach is not limited to cluster systems;
indeed, in many respects, such systems constitute a “worst-case scenario” for a weylet
representation, owing to concave phase space regions that favor a more traditional
affine wavelet approach,94–96 and to shallow potential wells that present relatively
small regions of available phase space.
The new potential matrix element evaluation method is essentially a Gauss-
Hermite quadrature scheme,16 exploiting the fact that each weylet basis function
can be explicitly decomposed as a sum of Gaussians. Since the product of two Gaus-
sians is also a Gaussian, one can use standard Gauss-Hermite quadrature techniques
to evaluate the requisite potential energy integrations. Since a potentially large num-
ber of operations is involved, especially for the multidimensional case, a number of
“tricks” are introduced to reduce to a bare minimum the computational (CPU) ef-
fort required to set up the matrices. For simplicity, and because the dimensionality
considered here is only three, one trick that is not implemented is the sequential
summation and truncation idea described in Ref. [37].
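As a rough illustration of the quadrature idea (a minimal sketch, not the implementation used for the results reported here), the following evaluates a 1 DOF “modulated Gaussian” integral of the form ∫ exp(−ρ²) cos(bρ) V((ρ + shift)/a) dρ by Gauss-Hermite quadrature. The function name, the argument names `freq` and `shift`, and the default number of points are assumptions introduced for the example; the potential is passed in as an arbitrary callable.

```python
import numpy as np

def modulated_gaussian_integral(potential, freq, shift, a, n_points=40):
    """Approximate  int exp(-rho^2) cos(freq*rho) V((rho + shift)/a) d rho.

    Gauss-Hermite nodes and weights absorb the exp(-rho^2) factor, so only
    the oscillatory factor and the potential are sampled at the nodes.
    """
    nodes, weights = np.polynomial.hermite.hermgauss(n_points)
    x = (nodes + shift) / a          # map quadrature variable back to position
    return float(np.sum(weights * np.cos(freq * nodes) * potential(x)))

# Sanity check with V = 1, for which the integral is known in closed form:
#   int exp(-rho^2) cos(b*rho) d rho = sqrt(pi) * exp(-b^2 / 4)
freq = 2.0
approx = modulated_gaussian_integral(lambda x: np.ones_like(x), freq,
                                     shift=0.0, a=1.6)
exact = np.sqrt(np.pi) * np.exp(-freq**2 / 4.0)
```

For a singular potential such as the Lennard-Jones form, the callable would have to include whatever short-range regularization is adopted, since the quadrature nodes sample the potential directly.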
The remainder of this chapter is organized as follows. Section 2.2 will briefly sum-
marize the development of the phase space truncated weylet approach, as documented
more thoroughly in Refs. [36] and [37]. Section 2.3 discusses in detail the application
of the method to both a 1 DOF radial implementation, and a full 3 DOF Carte-
sian version for dimer systems. A majority of the description involves the explicit
recipe used to generate Hamiltonian matrices in the weylet representation. Section
2.4 presents results, followed by discussion.
2.2 Theoretical Background
2.2.1 Phase Space Truncation
The starting point of the phase space truncation scheme is the uniformly mixed
ensemble—the projection operator ρK spanned by the K lowest energy eigenstates,
|φi〉, of the system Hamiltonian, i.e.,
\[ \rho_K = \sum_{i=1}^{K} |\phi_i\rangle\langle\phi_i| . \tag{2.2} \]
It can be shown29,34 that the Wigner-Weyl phase space representation32,33,97,98 of ρK
terms. The multidimensional generalization of diamond truncation would reduce this
to two correlated summations,37 one each for the primed and unprimed indices. How-
ever, the exponential decay of the cmn indices, together with the product form of the
summand in Eq. (2.28), imply that a further correlation across primed and unprimed
indices may also be applied. This results in the above “simplicial” summation scheme,
for which the number of summand terms is reduced by something like 12! [or (4f)!
in general], thus avoiding exponential scaling.
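Equation (2.19) generalizes to a sum of eight terms,
\[ \left[V^{\psi}\right]_{\mathbf{u},\mathbf{v},\mathbf{u}',\mathbf{v}'} = \left(\frac{1}{\sqrt{\pi}}\right)^{3} e^{-\frac{\pi}{4}\mathbf{u}^{\Delta}\cdot\mathbf{u}^{\Delta}} \sum_{k_1,k_2,k_3=0}^{1} (-1)^{k_1+k_2+k_3}\, b\!\left(\mathbf{u},\mathbf{v},\mathbf{u}',(-1)^{\mathbf{k}}\mathbf{v}'\right), \tag{2.29} \]
where $(-1)^{\mathbf{k}}\mathbf{v}' = ((-1)^{k_1}v'_1, (-1)^{k_2}v'_2, (-1)^{k_3}v'_3)$, and
\[ b(\mathbf{u},\mathbf{v},\mathbf{u}',\mathbf{v}') = \int_{-\infty}^{\infty}\int_{-\infty}^{\infty}\int_{-\infty}^{\infty} e^{-\boldsymbol{\rho}\cdot\boldsymbol{\rho}} \prod_{j=1}^{3} \cos\!\left[v^{\Delta}_j \sqrt{\pi}\,\rho_j - \zeta_j\right] V\!\left(\frac{\boldsymbol{\rho} + \frac{\sqrt{\pi}}{2}\mathbf{u}^{+}}{a}\right) d\rho_1\, d\rho_2\, d\rho_3 . \tag{2.30} \]
The other quantities above are defined as follows: $\mathbf{u}^{\Delta} = (\mathbf{u} - \mathbf{u}')$; $\mathbf{v}^{\Delta} = (\mathbf{v} - \mathbf{v}')$;
$\mathbf{u}^{+} = (\mathbf{u} + \mathbf{u}')$; $\boldsymbol{\rho} = (a\mathbf{x} - \frac{\sqrt{\pi}}{2}\mathbf{u}^{+})$; $\zeta_j = (\pi/2)(u^{\Delta}_j v^{+}_j + v^{\Delta}_j)$.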
2.3.5 Numerical Implementation: 3 DOF Case (Cartesian Ne2)
For the most part, the 1 DOF numerical recipe provided in Sec. 2.3.3 generalizes
in straightforward fashion to the 3 DOF case. However, insofar as there are
substantial differences, these will be addressed here.
Regarding Step 1, there are in principle not one, but three different aspect ratio
parameters, ax, ay, and az. Given the spherical symmetry of the rovibrational Ne2
system however, it is clear all three should have the same value in this case. Similar
comments apply to the position offset parameter, ε. Regarding weylet basis trunca-
tion, the simple Hmid < Ecut would be suitable for most applications, but does not
work especially well in the present case, for reasons described in Sec. 2.4.2, where an
alternate prescription is described.
For Step 2, all of the time-saving tricks from Sec. 2.3.3 may be applied, as well
as additional symmetry considerations. For instance, the Eq. (2.30) integration is
invariant with respect to permutations of the three components, (x, y, z), which can
be exploited to reduce CPU time and storage. This is particularly important, given
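that the 3 DOF integrations now involve $Q^3$, rather than $Q$, quadrature points. Note
that each of the $\zeta_j$ quantities in Eq. (2.30) independently determines whether its
corresponding sinusoidal factor is a sine or a cosine. Consequently, there are in
principle not two, but $2^f = 8$ different integration tables analogous to Eqs. (2.21) and
(2.22). However, permutation symmetry enables us to reduce these to just $(f+1) = 4$
tables, $B_{c^3}$, $B_{c^2 s}$, $B_{c s^2}$, and $B_{s^3}$, with obvious notation, e.g.
\[ \left[B_{c^2 s}\right]_{\mathbf{u}^{+},\mathbf{v}^{\Delta}} = \int_{-\infty}^{\infty}\int_{-\infty}^{\infty}\int_{-\infty}^{\infty} e^{-\boldsymbol{\rho}\cdot\boldsymbol{\rho}} \cos\!\left(v^{\Delta}_1 \sqrt{\pi}\,\rho_1\right) \cos\!\left(v^{\Delta}_2 \sqrt{\pi}\,\rho_2\right) \sin\!\left(v^{\Delta}_3 \sqrt{\pi}\,\rho_3\right) \times V\!\left(\frac{\boldsymbol{\rho} + \frac{\sqrt{\pi}}{2}\mathbf{u}^{+}}{a}\right) d\rho_1\, d\rho_2\, d\rho_3 . \tag{2.31} \]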
Permutation symmetry can also be applied to the individual tables themselves, resulting
in an $f! = 6$-fold reduction in the number of $B_{c^3}$ and $B_{s^3}$ table elements that
must be computed, and a $(f-1)!\,1! = 2$-fold reduction for the $B_{c^2 s}$ and $B_{c s^2}$ tables.
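The table counting above can be checked by brute-force enumeration. The sketch below is purely illustrative: the string labels ("c" for cosine, "s" for sine) and the helper name are conventions chosen for the example. It lists all 2^f sine/cosine patterns, groups them into permutation classes, and counts the coordinate permutations that leave each class representative unchanged.

```python
from itertools import permutations, product

F = 3  # number of degrees of freedom

# All 2**F assignments of cosine ("c") or sine ("s") to the F coordinates.
patterns = list(product("cs", repeat=F))

# Patterns that differ only by a permutation of coordinates share one table.
table_classes = sorted({"".join(sorted(p)) for p in patterns})

def index_symmetry_order(pattern):
    """Count coordinate permutations that map the pattern onto itself."""
    n = len(pattern)
    return sum(1 for perm in permutations(range(n))
               if all(pattern[perm[i]] == pattern[i] for i in range(n)))

symmetry_orders = {name: index_symmetry_order(name) for name in table_classes}
```

Here `table_classes` contains the four labels corresponding to the c³, c²s, cs², and s³ tables, and `symmetry_orders` gives the reduction factor available within each table.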
Symmetry can also be used to restrict the range of relevant table values beyond
Eq. (2.23), although in this context, it is rotational rather than permutation symmetry
that is responsible. Thus, in addition to
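\[ 0 \le u^{+}_j \le u^{+}_{\max} ; \quad 0 \le v^{\Delta}_j \le v^{\Delta}_{\max} \quad \text{for all } j, \tag{2.32} \]
one may also apply “spherical” truncation,
\[ |\mathbf{u}^{+}| \le u^{+}_{\max} ; \quad |\mathbf{v}^{\Delta}| \le v^{\Delta}_{\max}, \tag{2.33} \]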
where the vector lengths are now computed in the usual 2-norm sense. The validity
of Eq. (2.33) is established in Appendix A. Note that as in the 1 DOF case, the
$u^{+}_j = 0$ and/or $v^{\Delta}_j = 0$ table elements are identically zero when j corresponds to a
sine function in the Eq. (2.30) integrand.
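To give a feel for the savings offered by the spherical criterion of Eq. (2.33) relative to the box criterion of Eq. (2.32), the following sketch simply counts index vectors under each rule. It assumes non-negative integer-valued indices, and the function names are hypothetical; it says nothing about the additional permutation symmetry discussed above.

```python
from itertools import product

def count_box(n_max, f=3):
    """Index vectors with 0 <= n_j <= n_max for every component."""
    return (n_max + 1) ** f

def count_sphere(n_max, f=3):
    """Index vectors in the same octant whose 2-norm does not exceed n_max."""
    return sum(1 for idx in product(range(n_max + 1), repeat=f)
               if sum(i * i for i in idx) <= n_max * n_max)
```

For f = 3 and large `n_max`, the ratio `count_sphere(n_max) / count_box(n_max)` tends towards the octant volume ratio π/6, i.e., roughly half of the box entries are discarded.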
Finally, we comment that as in the 1 DOF case, the Eq. (2.29) integral decreases
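extremely rapidly with $\mathbf{u}^{\Delta}\cdot\mathbf{u}^{\Delta} = |\mathbf{u}^{\Delta}|^2$. As discussed in Appendix A, the result is
that $\left[V^{\psi}\right]_{\mathbf{u},\mathbf{v},\mathbf{u}',\mathbf{v}'}$ may be taken to be zero except when
\[ |\mathbf{u}^{\Delta}| \le u^{\Delta}_{\max}. \tag{2.34} \]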
As in the 1 DOF case, this leads to reduced computation and increased sparsity,
although the effect is much more pronounced for higher dimensionalities.
2.4 Results and Discussion
2.4.1 Results for Radial Ne2 (1 DOF Case)
As discussed previously, the simple H(qs, pt) ≤ Ecut basis truncation criterion
works extremely well for most molecular systems,35–37 but is expected to be less ef-
ficient for Ne2. In other respects as well, weakly-bound systems with long-range
interactions present a “worst-case scenario” for the present weylet approach—thus
providing another solid (if slightly perverse) motivation for the present study. The
reasons for this are two-fold. First, the weakly-bound aspect implies that K is small
even up to the dissociation threshold, thus placing us far from the large K limit where
K/N → 1. Indeed, Ne2 has only two vibrational levels. Second, the long-range inter-
action implies concave, rather than convex, phase space regions for sufficiently high
Emax—thus favoring conventional affine wavelets over weylets.35–37,94 As mentioned
before, with Emax at the dissociation threshold as is the case here, any Ecut > Emax
results in a phase space region in the continuum, with infinite extent (although in
practice, this need not cause a difficulty).
To ameliorate the above situation somewhat, and because the 1 DOF calculations
are so inexpensive, we have opted for a more labor-intensive, but rigorously optimal
basis truncation scheme. Specifically, a weylet is discarded if the resultant computed
vibrational eigenvalues for J = 0 agree with those of a more accurate reference cal-
culation15 to within some desired accuracy. Weylet block pairs are thus “whittled
away” from an initial large rectangular lattice, starting from the top and bottom
rows and working inwards towards the q axis. For the particular aspect ratio parameter
value a = 1.6 a.u. (choice explained below), this procedure was applied to Ne2 at an
accuracy level equal to 2% of the well depth ε (i.e., ≈ 0.4949 cm−1), and again for
0.2% and 0.02% of ε. The resultant truncated basis sets are indicated schematically in
Fig. 2.5. Note that even for the present worst-case application, the resultant pattern
of blocks conforms roughly to the H(q, p) ≤ 0 region, thus validating the phase space
truncation idea. The resultant computed eigenvalues are presented in Table 2.2.
To optimize with respect to the other two weylet basis parameters, a and ε, one
can repeat the above procedure for many different values, and determine which yields
the smallest basis size N . We have performed such an optimization for a, but have
simply taken ε = 1/2 throughout (resulting in half-integer values for both s and t
indices). For the 2% accuracy calculation, this led to the optimal choice a = 1.6 a.u.,
resulting in a basis size N = 10.
The above studies were performed primarily to assess basis efficiencies associated
with a given level of accuracy (the three benchmark values chosen above correspond
to those used in previous calculations35–37). However, we have also performed a much
more accurate reference calculation, using N = 133 weylets with regions lying within
the phase space rectangle 0 ≤ x ≤ 21 a.u., −20 ≤ p ≤ 20 a.u. This basis was used
for all J values that support bound levels (J ≤ 9), resulting in a determination of
all rovibrational bound states to an accuracy of 10−4 cm−1 or better. The results are
presented in Table 2.4, Column 3.
The remaining convergence parameters were chosen with a view towards ensuring
that these do not contribute appreciably to numerical error, even for the highest
accuracy reference calculation. In particular, the parameter values $u^{+}_{\max} = 30$,
$v^{\Delta}_{\max} = 8$, and $u^{\Delta}_{\max} = 5$ were found to be converged to 10−7 cm−1 or better for both computed
eigenvalues. Similarly, Q = 100 led to 10−6 cm−1 convergence. The results were also
found to be very insensitive to rcut, with both eigenvalues changing by only
2× 10−4 cm−1 over the range 3.8 ≤ rcut ≤ 5.0. The value rcut = 4.6 a.u. was used for
all subsequent calculations.
2.4.2 Results for Cartesian Ne2 (3 DOF Case)
The 3 DOF Cartesian calculations of the Ne2 rovibrational states make no attempt
to exploit rotational symmetry or degeneracy—unlike a previous calculation.15 As a
consequence, the required basis size N is much too large for the optimal quantum
truncation scheme to be applied. Nevertheless, we can still exploit the results of
Sec. 2.4.1 for the 3 DOF calculation, as described below.
The first step is to define a trial phase space region in 1 DOF, |p| ≤ pmax(r), which
encloses the centers of only the non-discarded weylet pair blocks corresponding to the
optimal, 2% accurate, N = 10 basis from Fig. 2.5. The functional form used is
\[ p_{\max}(r) = \chi \sqrt{1 - \frac{(r-\eta)^2}{\omega^2}}\; e^{-\tau |r-\xi|}, \tag{2.35} \]
where the five parameters allow for flexibility of the shape: χ provides the vertical
height, η the location of the ellipse center along the position axis, ω the horizontal
axis length, τ the degree of exponential decay, and ξ the position of the start of the
exponential decay. The numerical assignments χ = 6.2, η = 8.6, $\omega^2$ = 24.6, τ =
0.186, and ξ = 5.4 (all in a.u.) are found to work well; the above parameter
choice has been somewhat optimized to minimize the volume of the resultant phase
space region, which is presented in Fig. 2.6.
Once a suitable pmax(r) is constructed as above, radial symmetry is used to extrude
this region into the full six-dimensional phase space of the Cartesian system, by
replacing p → |p|, and r → |x|. Only those multidimensional weylets whose centers
lie within the new region are retained for the 3 DOF calculation—i.e., those that
satisfy |pt| ≤ pmax(|rs|). Using the same a and ε values as in the 1 DOF case, this
results in a 3 DOF basis of N = 3480 weylet functions.
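The retention test can be sketched as follows, using the initial parameter values quoted above. Two assumptions are made and should be treated as such: the exponential factor in Eq. (2.35) is taken to multiply the square root (rather than sit inside it), and the region is taken to be empty wherever the elliptical factor becomes negative. The function names are hypothetical, and the mapping from lattice indices to phase space centers is not reproduced here, so the centers are passed in directly.

```python
import math

# Initial region parameters quoted in the text (all in a.u.).
CHI, ETA, OMEGA_SQ, TAU, XI = 6.2, 8.6, 24.6, 0.186, 5.4

def p_max(r, chi=CHI, eta=ETA, omega_sq=OMEGA_SQ, tau=TAU, xi=XI):
    """Momentum bound of the trial region at radial distance r."""
    ellipse = 1.0 - (r - eta) ** 2 / omega_sq
    if ellipse <= 0.0:
        return 0.0  # assumption: nothing is retained outside the ellipse support
    return chi * math.sqrt(ellipse) * math.exp(-tau * abs(r - xi))

def center_is_retained(x, p):
    """Apply |p| <= p_max(|x|) to a Cartesian phase space center (x, p)."""
    r = math.sqrt(sum(c * c for c in x))
    p_len = math.sqrt(sum(c * c for c in p))
    bound = p_max(r)
    return bound > 0.0 and p_len <= bound
```

Because only the lengths |x| and |p| enter, the test is automatically invariant under rotations of the position and momentum vectors, which is the sense in which the 1 DOF region is extruded into six dimensions.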
The above basis was anticipated to yield rovibrational energy levels within the
desired 2% accuracy level, but in fact fell somewhat short of this goal. To improve
accuracy, the phase space region was enlarged via simultaneous variation of χ, ω, and
τ . Also, in practice it was found that including the interior weylet functions (with
centers in the potential cap region) substantially improves accuracy, at the cost of
adding only around 1200–2500 additional basis functions. Table 2.3 indicates the
resulting convergence of the two pure vibrational level energies. The largest of these
calculations (N = 24 392) computed all 125 rovibrational states to within the desired
2% tolerance (using the fully converged 1 DOF calculations of Sec. 2.4.1 and Ref.
[15] as reference). The computed energies are presented in Table 2.4. For the above
calculations, all of the remaining parameters were converged to within 10−2 cm−1—
i.e., still substantially smaller than the basis truncation error. The values $u^{\Delta}_{\max} = 4$,
$v^{\Delta}_{\max} = 6$, Q = 17, and rcut = 4.9 a.u. were used for all calculations (except for the
last row, where both $u^{\Delta}_{\max}$ and Q were increased by one, to 5 and 18, for better convergence),
whereas $u^{+}_{\max}$ was varied as per Table 2.3, Column 5.
Of the four computational steps described in Sec. 2.3.3, Step 3 was generally found
to be the bottleneck, due to the Eq. (2.28) summation. The choice mmax = 6 results
in 2625 summand terms per potential matrix element. In comparison, mmax = 4
and mmax = 2 require 313 and 25 summand terms, respectively. As the computed
eigenvalues were found to differ only by around 0.02 cm−1 between mmax = 6 and
mmax = 4, the latter was used for the results presented in Tables 2.3 and 2.4. To
improve numerical performance, some effort was made towards finding the “leanest”
(i.e., least time consuming) calculation that computes all vibrational levels to within
the 2% tolerance. Using region parameters χ = 7.5, η = 8.6, ω2 = 35.0, τ =
0.160, and ξ = 5.4, and shaving off some of the high-momentum weylets in the
potential cap area, a suitable N = 10 896 basis was obtained. With the additional
parameter choices $u^{+}_{\max} = 19$, $v^{\Delta}_{\max} = 4$, $u^{\Delta}_{\max} = 3$, Q = 10, and mmax = 2, the total
time (for all steps) required on a Compaq Alpha 1200 MHz CPU was found to be 1.2
hours, with 99/125 rovibrational levels computed to within the strict 2% tolerance,
and the remaining 26 to a comparable level of accuracy.
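The summand counts quoted above can be reproduced by a simple counting rule, offered here only as a consistency check. Since Eq. (2.28) is not reproduced in this excerpt, the rule is an inference rather than a statement of the actual summation limits: if the 4f = 12 summation indices are treated as integers whose absolute values sum to at most mmax/2, the number of allowed index vectors is the number of lattice points in a 12-dimensional cross-polytope. The function name is hypothetical.

```python
from math import comb

def l1_ball_lattice_points(dim, radius):
    """Number of integer vectors n in Z^dim with sum(|n_i|) <= radius.

    Choose k nonzero components (comb(dim, k)), their signs (2**k), and
    distribute the magnitudes (comb(radius, k)).
    """
    return sum(2 ** k * comb(dim, k) * comb(radius, k)
               for k in range(min(dim, radius) + 1))

# Counts for m_max = 2, 4, 6 under the assumed rule (radius = m_max / 2).
counts = [l1_ball_lattice_points(12, m_max // 2) for m_max in (2, 4, 6)]
```

Under this assumption the list `counts` matches the three values quoted in the text, and the polynomial (rather than exponential) growth with dimension is apparent from the closed-form sum.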
2.4.3 Discussion
For reasons discussed in Secs. 2.1 and 2.4.1, the present Ne2 application presents
a “worst-case scenario” (apart from the low dimensionality) for the weylet basis ap-
proach. By any measure, even using more established optimized methodologies,15 the
3 DOF Cartesian Ne2 system presents a very challenging numerical calculation. It is
therefore reassuring to discover that the weylet approach is nevertheless competitive.
In comparing with Ref. [15] for instance, one finds that the basis sizes and especially
CPU times required to compute most of the 125 rovibrational states are greatly re-
duced; however, the computed eigenvalue errors are also substantially larger, to the
extent that two-to-three digits of accuracy are lost.
For the lean 3 DOF calculation, the resultant basis efficiency K/N = 99/10 896 ≈ 0.01
is not especially large. The corresponding efficiency value for the 3 DOF isotropic
harmonic oscillator system, for instance (Table IV of Ref. [37]), is around 0.25,
although the comparison is somewhat biased against the Ne2 case because of the
way that accurate eigenvalues are counted. In any event, there are two important
causes that underlie this “efficiency gap”: (1) small and concave-shaped phase space
region; (2) large singularity “hole” in the potential cap region, due to use of Carte-
sian coordinates, and to large Ne2 equilibrium separation. Note that quadrature error
can definitely be ruled out as a major cause (Sec. 2.4). In practice, very few molec-
ular applications should exhibit such low weylet efficiencies in 3 DOF’s, since any
modification to the above (i.e., greater well depth, shorter-range interaction, smaller
equilibrium separation, or use of non-Cartesian coordinates) would greatly improve
performance. Efficiencies for typical systems—especially deeply-bound systems at
energies substantially below dissociation—are likely to be much closer to harmonic
oscillator values.37
Future efforts will investigate other molecular systems more amenable to the
weylet approach, including those at higher dimensionalities. In this regard, even
larger neon clusters such as Ne3 are much more favorable than Ne2, apart from the
increased dimensionality. The reason is that for purposes of studying solvation, or
the liquid-solid phase change, the energy range of interest extends only up to the first
isomerization threshold (i.e., the energy of just one bond), which for $\mathrm{Ne}_N$ with N > 2 is far
below the dissociation threshold.
Note that the simple Gaussian quadrature scheme as employed here—though
shown to be remarkably effective for the three-dimensional application considered—
nevertheless reintroduces exponential scaling, in that the total number of quadrature
points grows as $Q^f$. The present procedure, however, is unnecessarily wasteful, in that
no attempt is made to remove the “corner” region quadrature points. This could eas-
ily be achieved via spherical truncation, as can be justified using an argument similar
to that of Appendix A. We did not bother to do so here because the integral evalua-
tions comprised only a small fraction of the total CPU time. At high dimensionalities,
quadrature integral evaluation could in principle become the computational bottle-
neck, although the spherical truncation remedy described above—which incidentally,
is applicable even for non-spherically-symmetric potentials—reduces CPU effort ex-
ponentially with increasing f . Alternatively, if the desired accuracy level is not too
high, Monte Carlo integration techniques could be used—applied in phase space so
as to replace oscillatory “modulated” Gaussians with ordinary Gaussians. Thus, the
CPU cost associated with matrix initialization need not become the bottleneck at
large dimensionalities.
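The claim that removing “corner” points reduces effort exponentially with f can be illustrated with a purely geometric estimate. The sketch below compares the volume of an f-dimensional ball with that of its bounding cube; it assumes, for illustration only, that the quadrature grid fills a cube and that the truncation radius equals half the cube edge. Actual Gauss-Hermite nodes are not uniformly spaced, so the numbers are indicative rather than exact.

```python
import math

def ball_to_cube_volume_ratio(f):
    """Volume of the unit-radius f-ball divided by that of the cube [-1, 1]^f."""
    return math.pi ** (f / 2) / (math.gamma(f / 2 + 1) * 2 ** f)

# Fraction of the cube occupied by the inscribed ball for a few dimensions.
ratios = {f: ball_to_cube_volume_ratio(f) for f in (1, 2, 3, 6, 12)}
```

The ratio is π/6 for f = 3 and falls off faster than exponentially with f, which is the sense in which the retained fraction of a product grid shrinks at high dimensionality.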
Further improvements and modifications to the weylet method will also be ex-
plored, e.g. non-Cartesian coordinates, sparse matrix methods, and those discussed
in Sec. IV C of Ref. [37]. Clearly, however, the most important area for improvement
is increasing the level of accuracy that can be obtained with the weylet approach.
Using projection operator methods (to customize weylet basis functions for given ap-
plications), we have recently taken large strides in this area, as will be reported in
future publications.
Table 2.1. Parameter values for the Gaussian cap of Eq. (2.13), with rcut = 4.6 a.u.
J αJ (a.u.) βJ (a.u.) γJ (10−4 a.u.)
0 3.1600 0.3758 -1.0607
1 3.1562 0.3757 -1.0406
2 3.1485 0.3756 -1.0005
3 3.1370 0.3753 -0.9404
4 3.1218 0.3750 -0.8603
5 3.1029 0.3746 -0.7601
6 3.0805 0.3741 -0.6400
7 3.0547 0.3736 -0.4998
8 3.0257 0.3729 -0.3397
9 2.9935 0.3722 -0.1597
Table 2.2. Ground and first excited vibrational (J = 0) level energies for 1 DOF
radial Ne2, computed from quantum truncated weylet basis (Fig. 2.5)
for three different target accuracy levels, measured in units of the well
depth. The literature energy values (Ref. [15]) are −14.0245 cm−1 and
−2.6834 cm−1, respectively.
% well depth basis size N ground (cm−1) excited (cm−1)
2% 10 -13.8710 -2.3519
0.2% 26 -13.9789 -2.6346
0.02% 36 -14.0200 -2.6786
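As a simple arithmetic check of Table 2.2, the tabulated energies can be compared against the literature values and the corresponding tolerances. The well depth used below is inferred from the statement in Sec. 2.4.1 that 2% of the well depth is approximately 0.4949 cm−1; the variable names are chosen for this example only.

```python
# Literature (Ref. [15]) ground and first excited level energies, in cm^-1.
REFERENCE_LEVELS = (-14.0245, -2.6834)

# Well depth in cm^-1, inferred from "2% of the well depth ~ 0.4949 cm^-1".
WELL_DEPTH = 0.4949 / 0.02

# (fraction of well depth, basis size N, (ground, excited)) from Table 2.2.
TABLE_2_2 = [
    (0.02, 10, (-13.8710, -2.3519)),
    (0.002, 26, (-13.9789, -2.6346)),
    (0.0002, 36, (-14.0200, -2.6786)),
]

def worst_error(levels):
    """Largest absolute deviation from the reference levels, in cm^-1."""
    return max(abs(e - ref) for e, ref in zip(levels, REFERENCE_LEVELS))

# One entry per table row: (N, worst error, tolerance).
report = [(n, worst_error(levels), frac * WELL_DEPTH)
          for frac, n, levels in TABLE_2_2]
within_tolerance = [err <= tol for _, err, tol in report]
```

Each row of the table meets its stated tolerance for both levels, with the excited level being the limiting one in every case.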
Table 2.3. Convergence of computed ground and first excited vibrational (J = 0)
level energies for 3 DOF Cartesian Ne2, with respect to increasing phase
space region volume. Columns 1–3: parameter values used to specify
phase space region in Eq. (2.35) (the values η = 8.6 and ξ = 5.4 are held
constant). Column 4: resultant 3 DOF phase-space-truncated basis size,
N (including weylets in potential cap region). Column 5: $u^{+}_{\max}$ value
required for convergence to within 10−2 cm−1. Columns 6 and 7: computed
level energies; ‘*’ indicates those lying within desired 2% error tolerance.
χ ω2 τ N u+max ground (cm−1) excited (cm−1)
6.2 24.6 0.186 5200 18 −13.0457 −0.1094
6.8 29.4 0.174 7464 19 −13.6977∗ −1.4897
7.4 34.2 0.162 10 992 20 −13.8204∗ −1.8764
7.7 36.6 0.156 12 768 20 −13.8581∗ −2.0180
8.0 39.0 0.150 15 120 20 −13.8718∗ −2.1568
8.9 46.2 0.132 24 392 22 −14.0288∗ −2.3599∗
Table 2.4. All 125 rovibrational bound states of Ne2, as computed using the 3 DOF
weylet basis of Table 2.3, Row 6. Parentheses indicate numerical degeneracies.
Last two columns: corresponding reference 1 DOF results from
Sec. 2.4.1 (Column 3) and Ref. [15] (Column 4).
[Figure 2.1: lattice schematics; panels (b) and (c) shown, axes q and p.]
Figure 2.1. Schematic indicating phase space partitionings associated with various
weylet basis sets in 1 DOF. Dots represent phase space centers (q, p) for
unsymmetrized [i.e., f(x)-type] weylet functions. A single symmetrized
[ϕ(x)-type] weylet is indicated by the ‘?’: (a) critically dense weylets,
(c) doubly-dense weylets of present work, ? = $\varphi_{\frac{3}{2}\frac{3}{2}}(x)$.
[Figure 2.2: three panels, labeled t = 1/2, t = 3/2, and t = 5/2; horizontal axis x.]
Figure 2.2. Plots of six different symmetrized 1 DOF weylets, ϕst(x) vs. x, for
a = 0.5 a.u. Each plot corresponds to a different t value as indicated,
with both s = 1/2 (left) and s = 11/2 (right) weylets represented. Larger
t values are associated with increasingly oscillatory behavior.
[Figure 2.3: plot of V (cm−1) versus r (a.u.).]
Figure 2.3. The Lennard-Jones potential used for the neon dimer.
Figure 2.5. Schematic indicating optimal quantum truncation of a = 1.6 a.u. weylet
basis functions used to compute bound vibrational (J = 0) level energies
for 1 DOF radial Ne2 system. Thick solid/dashed/dotted lines enclose
basis used to achieve 2%/0.2%/0.02% well-depth error tolerance. The
thin solid line encloses the phase space region corresponding to all vibrational
states up to the dissociation threshold, i.e. H(q, p) ≤ Emax = 0.
[Figure 2.6: plot of p (a.u.) versus r (a.u.).]
Figure 2.6. Schematic indicating phase space regions used to truncate 3 DOF a =
1.6 a.u. weylet basis functions to compute rovibrational bound states of
Ne2. Innermost solid line encloses the phase space region corresponding
to the dissociation threshold (Fig. 2.5). Next concentric solid line
corresponds to Eq. (2.35) with parameters from Table 2.3, Row 1, and
outermost concentric solid line to Table 2.3, Row 6. Inclusion of the cap
region weylets corresponds to the dashed line extensions.
CHAPTER III
CUSTOMIZED PHASE SPACE REGION OPERATORS APPLIED TO BASIS
SETS
3.1 Introduction
For molecular bound state calculations, the choice of basis directly determines
the computational effort in solving the quantum Hamiltonian. More specifically,
the efficiency, K/N , where N represents the number of basis functions needed to
calculate K eigenvalues at a desired accuracy, has a strong dependence on the degree
of correlation between the basis functions and the target system. This relationship
has prompted research in basis optimization which focuses on the maximization of
the efficiency and thus, ultimately, the reduction of CPU effort and memory usage.
Symmetrized Weyl-Heisenberg wavelets, or “weylets,” comprise a type of univer-
sal orthonormal basis that can be effectively used to represent any bound molecular
system with an efficiency approaching perfection (K/N −→ 1) in the large N and
K limit, regardless of system dimensionality. Although the weylet basis functions
themselves are universal, the truncation of the basis set, achieved using a phase
space truncation scheme,35–37 is what tailors the method to individual systems. In
contrast, all direct product basis sets (DPB’s),14,31 even those for which individual
basis functions are optimized via self-consistent field or other techniques,29,99–101 ex-
hibit exponential reduction in efficiency with f , the number of degrees of freedom
(DOF’s).24,29 Consequently, the weylet approach has been applied to direct matrix
eigenvalue calculations for model systems of 15 DOF’s and beyond,37 far beyond what
would be feasible using a DPB. Recently, the method was successfully applied to a
real molecular system, Ne2 (in Cartesian coordinates).102 Although Ne2 presents a
“worst-case scenario” for the weylet method, in that f and K are small and the calculation
includes states near the dissociation threshold, the method is still competitive with other
state-of-the-art exact quantum dynamics methods.15
Phase space truncation of the weylet basis is effective because individual weylet
functions have good phase space localization, and are orthogonal. Achieving both
properties together is nontrivial,53,54 but can be achieved using a momentum-symmetrization
modification first introduced by Wilson55 and Daubechies et al.56 In 1 DOF, one starts
with a twofold overcomplete set of coherent states (CS’s), which are derived from phase
space Gaussians and arranged on a doubly-dense lattice (i.e., two CS’s per Planck
cell) in phase space. Provided the lattice of CS’s constitutes a tight frame, a particular
linear combination of positive and negative momentum CS pairs then yields a
complete 1 DOF orthonormal weylet basis, |ϕi〉 (the general f DOF case is addressed
in Sec. 3.2.1). Poirier35–37 later refined the approach by constructing maximally phase
space localized weylets, in a computationally tractable manner for quantum dynamic
calculations.
To understand phase space truncation, we must first introduce two projection
operators:
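\[ \hat{\rho}_N = \sum_{i=1}^{N} |\varphi_i\rangle\langle\varphi_i| \tag{3.1} \]
and
\[ \hat{\rho}_K = \Theta[E_{\max} - \hat{H}(\hat{q}, \hat{p})] . \tag{3.2} \]
The Wigner-Weyl (WW) phase space representation32,33,97,98 of $\hat{\rho}_K$ (simply labeled
as “$\rho_K$”) [see Fig. 3.2(a)] is a function that oscillates about unity in the classically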
allowed region (QC) of phase space where H ≤ Emax, and is damped exponentially
outside this region. It can be shown that
\[ \rho_K \approx \rho^{QC}_K = \Theta[E_{\max} - H(q, p)] \tag{3.3} \]
in the large K limit, i.e., ρK can be associated with the classically allowed region of
phase space, R.34
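The association between the classically allowed region and the number of states can be illustrated numerically. The sketch below is not taken from the present work: it uses the standard semiclassical estimate that each state occupies a phase space area of 2πħ per DOF, sets ħ = 1, and uses a 1 DOF harmonic oscillator with unit mass and frequency as a test case. The function names and grid parameters are hypothetical.

```python
import math

def classical_state_count(hamiltonian, e_max, q_range, p_range, step=0.02):
    """Estimate K as (area of H(q, p) <= e_max) / (2*pi), with hbar = 1.

    The area is obtained by counting midpoints of a uniform grid that fall
    inside the classically allowed region.
    """
    q_lo, q_hi = q_range
    p_lo, p_hi = p_range
    n_q = int(round((q_hi - q_lo) / step))
    n_p = int(round((p_hi - p_lo) / step))
    inside = 0
    for i in range(n_q):
        q = q_lo + (i + 0.5) * step
        for j in range(n_p):
            p = p_lo + (j + 0.5) * step
            if hamiltonian(q, p) <= e_max:
                inside += 1
    return inside * step * step / (2.0 * math.pi)

def harmonic(q, p):
    """Harmonic oscillator Hamiltonian with unit mass and frequency."""
    return 0.5 * (p * p + q * q)

E_MAX = 10.0
k_classical = classical_state_count(harmonic, E_MAX, (-5.0, 5.0), (-5.0, 5.0))
# Exact quantum levels are n + 1/2; count those not exceeding E_MAX.
k_quantum = sum(1 for n in range(100) if n + 0.5 <= E_MAX)
```

For this test case the phase space estimate and the exact level count agree closely, which is the 1 DOF version of the statement that region volume is proportional to the number of states.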
Similarly, each single DOF weylet |ϕi〉 can be associated with a (momentum-
symmetrized) pair of blocks (corresponding to the CS pairs mentioned earlier), centered
on the lattice sites, which on truncation, comprise a region R′ [see Fig. 3.1(a)].
If the block size is small (i.e., N and K large), then R′ closely resembles R and
N ≈ K, as the region volumes are proportional to basis size. In any event, it is the
blocks in the boundary of R′ that are the leading cause of inefficiency, since they
overlap R only partially. This effect is more pronounced at larger dimensionalities
for a given K value, so that in practice, the limiting difficulty of the weylet method
is the level of computed accuracy that can be achieved, rather than dimensionality
per se.
The main purpose of this chapter is to address this limitation by customizing the
individual weylet functions (i.e., not just their truncation) for particular applications.
Consider that the basis ρK |ϕi〉 = |ϕ′i〉, rather than |ϕi〉 itself, can in principle result
in an exact calculation of the lowest K eigenvalues. In the phase space picture,
this projection effectively transforms Fig. 3.1(a) to Fig. 3.1(b) which is seen to yield
R = R′ even when N and K are not large. Note that the peripheral basis functions
are most affected by the projection transformation.
In practice, the above picture is complicated by additional concerns. First, the
|ϕ′i〉 are not orthogonal—though this is easily remedied via orthogonalization52 or
direct solution of the generalized eigenvalue problem
\[ \mathbf{H}\vec{v} = E\,\mathbf{S}\vec{v} , \tag{3.4} \]
where H and S are the Hamiltonian and overlap matrices, respectively, in the nonorthog-
onal basis representation, and (E,~v) is the (eigenvalue,eigenvector) pair. More im-
portantly, |ϕ′i〉 might in principle be linearly dependent or nearly so, and in fact this
is almost certain to cause numerical instabilities if |ϕi〉 is a “random” basis or even
DPB, at sufficiently large f or N . Use of phase space truncated weylets as the start-
ing basis, |ϕi〉, almost completely alleviates this difficulty, however. Finally, ρK is not
known a priori—though ρQCK is known, and is likely to constitute a worthy substitute.
Fortunately, the mathematical development of ρQCK , or what we call the “phase space
region operator” (PSRO), and its action on arbitrary functions, has been extensively
studied by Bracken, Doebner, and Wood.68,69 With their insights, we have developed
48
very efficient projected basis sets, ρQCK |ϕi〉 = |ϕ(1)
i 〉 (the “(1)” superscript will be ex-
plained later), which are shown to greatly increase the accuracy levels that can be
obtained in weylet calculations.
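As an illustration of what “direct solution” of Eq. (3.4) can involve, one standard route is a Cholesky reduction to an ordinary symmetric eigenvalue problem. The factor L, the transformed matrix $\tilde{H}$, and the vector $\vec{u}$ are notation introduced here for this sketch only; the reduction assumes S is positive definite, and it is not necessarily the procedure used for the calculations in this chapter.

```latex
\begin{aligned}
S &= L L^{T}, \\
\tilde{H} &= L^{-1} H \, L^{-T}, \\
\tilde{H}\,\vec{u} &= E\,\vec{u}, \qquad \vec{v} = L^{-T}\vec{u} .
\end{aligned}
```

Substituting $\vec{v} = L^{-T}\vec{u}$ into Eq. (3.4) and multiplying from the left by $L^{-1}$ recovers the third line. When the basis is nearly linearly dependent, S has very small eigenvalues and L becomes ill-conditioned, which is the numerical instability referred to above.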
A second idea explained in this chapter involves the use of momentum-symmetrized
Gaussians (SG’s), |ψi〉, rather than weylets, |ϕi〉, as the initial basis. SG’s centered on
the same lattice sites as their weylet counterparts span nearly the same subspace, and
individual SG’s are nearly orthogonal due to the momentum symmetrization [35–37,41].
Moreover, the PSRO projected subspaces of the SG’s and weylets are much closer still,
as compared to the unprojected case. Although $|\varphi_i^{(1)}\rangle$ is still anticipated to be more
efficient than $|\psi_i^{(1)}\rangle$, the latter are far more convenient to work with numerically.
Our results indicate impressive improvements in efficiency for PSRO-modified
weylets, $|\varphi_i^{(1)}\rangle$, and SG’s, $|\psi_i^{(1)}\rangle$, on a wide range of model systems and
dimensionalities. The most noticeable improvements are in cases where N and K are small,
as expected. Moreover, the efficiencies of the two projected basis sets are nearly the
same, with SG’s actually more efficient than weylets in some cases, making them a
competitive basis if one can develop inexpensive techniques for the PSRO modification.
We have also found that multiple applications of the PSRO result in further
increases in efficiency for both basis sets, up to a point. The rationale here is that
higher powers of $\rho_K^{QC}$ are more nearly idempotent, and therefore presumably closer to
the exact projection operator ρK.
The remainder of this chapter is organized as follows. The next section presents a
brief description of the weylets and SG’s in the single and general f DOF cases (3.2.1).
Also, the theoretical development of the PSRO is discussed (3.2.2) and applied to the
special case of the harmonic oscillator (3.2.3). Section 3.3 provides the details of
the numerical application of the PSRO to both basis sets. Section 3.4 presents the
results, followed by a discussion (3.4.4) of all the data presented, and possible future
developments.
3.2 Theoretical Background
3.2.1 Weylets and Momentum-Symmetrized Gaussians (SG’s)
A complete analysis of the construction of the weylets and SG’s is documented
in Refs. [35]-[37] and Ref. [102], respectively; thus, only the mathematical form of the
basis functions will be presented in this chapter, along with a brief description. First,
the SG’s for the single DOF case in ħ = 1 units (as will be presumed throughout this
chapter) have the form:
\[ \psi_{st}(q) = \left(\frac{4a^2}{\pi}\right)^{1/4} \cos\!\left[ t a \sqrt{\pi} \left( q - (s + 1/2)\,\frac{\sqrt{\pi}}{a} \right) \right] e^{-a^2 \left( q - s\sqrt{\pi}/a \right)^2/2} . \tag{3.5} \]
The previous index i is replaced with the two indices, s and t, signifying lattice sites
$(q_s, \pm p_t)$ where the unsymmetrized pair of phase space Gaussians are centered. The
lattice sites are specified by $q_s = (s/a)\sqrt{\pi}$ and $p_t = t a \sqrt{\pi}$, with the parameter a
related to the “aspect ratio” of the lattice. Momentum symmetrization requires t to
be restricted to positive half integers, i.e., t = {0.5, 1.5, 2.5, . . .}, but
s = {. . . , ε − 1, ε, ε + 1, . . .} for any real ε.
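The SG of Eq. (3.5) and its lattice site can be written down directly; the following is a minimal sketch in Python rather than the Mathematica used for the calculations in this chapter. The function names (`lattice_site`, `sg`, `overlap`) and the simple rectangle-rule quadrature are illustrative choices made here, not part of the original work.

```python
import math

def lattice_site(s, t, a=1.0):
    """Phase space lattice site (q_s, p_t) for lattice indices s and t."""
    root_pi = math.sqrt(math.pi)
    return (s / a) * root_pi, t * a * root_pi

def sg(q, s, t, a=1.0):
    """Momentum-symmetrized Gaussian psi_st(q) of Eq. (3.5), hbar = 1."""
    q_s, p_t = lattice_site(s, t, a)
    phase_origin = (s + 0.5) * math.sqrt(math.pi) / a
    norm = (4.0 * a * a / math.pi) ** 0.25
    return (norm * math.cos(p_t * (q - phase_origin))
            * math.exp(-0.5 * a * a * (q - q_s) ** 2))

def overlap(s1, t1, s2, t2, a=1.0):
    """Rectangle-rule estimate of the overlap integral of two SG's."""
    center = 0.5 * (lattice_site(s1, t1, a)[0] + lattice_site(s2, t2, a)[0])
    half_width = 12.0 / a      # many Gaussian widths; assumes nearby sites
    step = 0.005 / a           # fine enough for the modest t values used here
    n = int(round(2.0 * half_width / step))
    total = 0.0
    for k in range(n + 1):
        q = center - half_width + k * step
        total += sg(q, s1, t1, a) * sg(q, s2, t2, a)
    return total * step

# Normalization, and the overlap of two SG's on the same q_s with adjacent t.
print(overlap(0.5, 0.5, 0.5, 0.5), overlap(0.5, 0.5, 0.5, 1.5))
```

For half-integer t the cosine phase in Eq. (3.5) makes each SG normalized, and SG's on neighboring lattice sites have small or vanishing overlaps, which is the near orthogonality attributed to the momentum symmetrization.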
The 1 DOF weylet functions can themselves be expanded into SG’s,
\[ \varphi_{st}(q) = \sum_{m,n=-m_{\max}}^{m_{\max}} (-1)^{(n/2 + mt)}\, c_{mn}\, \psi_{s+m,\,t+n}(q) \tag{3.6} \]
where m and n are even integers and $c_{mn}$ are coefficients listed in Refs. [35] and [36].
The $c_{mn}$ decay exponentially with respect to |m| + |n|; thus, in practice, one can apply
a correlated “diamond” summation, $\sum_{|m|+|n| \le m_{\max}}$ [35,36]. In this chapter, the bound on
the summation is chosen to be $m_{\max} \le 6$, outside of which all $c_{mn}$ have magnitudes
less than $10^{-6}$.
For the general f DOF case, the SG’s and weylets are products of the 1 DOF
functions:
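\[ \psi_{\mathbf{s},\mathbf{t}}(\mathbf{q}) = \prod_{j=1}^{f} \psi_{s_j t_j}(q_j) \tag{3.7} \]
\[ \varphi_{\mathbf{s},\mathbf{t}}(\mathbf{q}) = \prod_{j=1}^{f} \varphi_{s_j t_j}(q_j) \tag{3.8} \]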
where $\mathbf{q} = (q_1, q_2, \ldots, q_f)$, $\mathbf{s} = (s_1, s_2, \ldots, s_f)$, and $\mathbf{t} = (t_1, t_2, \ldots, t_f)$. Each weylet
of Eq. (3.8) is approximately represented by groups of $2^f$ blocks, with centers at
$(q_{s_1}, p_{\pm t_1}, q_{s_2}, p_{\pm t_2}, \ldots, q_{s_f}, p_{\pm t_f})$, each of volume $\pi^f$ [see Fig. 3.1(a)]. Thus, the set
of $2^f N$ blocks, R′, has a total volume of $N(2\pi)^f$, and similarly, the set of K target
eigenstates, R, has a total volume of $K(2\pi)^f$. The SG’s of Eq. (3.7) follow the same
design, except that individual functions correspond to phase space “spheres”, rather
than blocks. The spheres overlap slightly, reflecting nonorthogonality of the SG basis,
and also leading to a somewhat lower efficiency of the SG’s compared to the weylets.
3.2.2 Phase Space Region Operator (PSRO)
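In the f DOF case, the action of an arbitrary smooth operator $\hat{A}$ on the basis
function $|\vartheta_{\mathbf{s},\mathbf{t}}\rangle$ is
\[ (\hat{A}\vartheta_{\mathbf{s},\mathbf{t}})(\mathbf{q}) = \int \langle \mathbf{q}|\hat{A}|\mathbf{q}'\rangle \langle \mathbf{q}'|\vartheta_{\mathbf{s},\mathbf{t}}\rangle \, d^f q' \tag{3.9} \]
where $\vartheta_{\mathbf{s},\mathbf{t}}$ represents either the weylets, $\varphi_{\mathbf{s},\mathbf{t}}$, or SG’s, $\psi_{\mathbf{s},\mathbf{t}}$. The term $\langle \mathbf{q}|\hat{A}|\mathbf{q}'\rangle$ in
Eq. (3.9) is known as the “configuration kernel” of $\hat{A}$, and can be represented by the
expression:
\[ \langle \mathbf{q}|\hat{A}|\mathbf{q}'\rangle = \frac{1}{(2\pi)^f} \int A\left[ (\mathbf{q} + \mathbf{q}')/2, \mathbf{p} \right] e^{i\mathbf{p}\cdot(\mathbf{q}-\mathbf{q}')} \, d^f p \tag{3.10} \]
where A is the result of the WW mapping of $\hat{A}$ [32,33,97,98].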
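The observable of interest is the PSRO, i.e.,
\[ A(\mathbf{q},\mathbf{p}) = \rho_K^{QC}(\mathbf{q},\mathbf{p}) = \Theta\left[ E_{\max} - H(\mathbf{q},\mathbf{p}) \right] = \Theta\left[ E_{\max} - \sum_{j=1}^{f} \frac{p_j^2}{2m_j} - V(\mathbf{q}) \right] \tag{3.11} \]
for a system with a quantum Hamiltonian in the kinetic-plus-potential form (H =
T + V) in Cartesian coordinates. Using Eqs. (3.9) and (3.10), the PSRO-modified
basis functions become
\[ (\rho_K^{QC}\vartheta_{\mathbf{s},\mathbf{t}})(\mathbf{q}) = \frac{1}{(2\pi)^f} \int\!\!\int \rho_K^{QC}\left[ (\mathbf{q} + \mathbf{q}')/2, \mathbf{p} \right] e^{i\mathbf{p}\cdot(\mathbf{q}-\mathbf{q}')} \langle \mathbf{q}'|\vartheta_{\mathbf{s},\mathbf{t}}\rangle \, d^f p \, d^f q' . \tag{3.12} \]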
Further simplification is possible for f = 1 [68,69], as shown below. First, Eq. (3.11)
can be rewritten as
\[ \rho_K^{QC}(q,p) = \Theta\left[ E_{\max} - p^2/(2m) - V(q) \right] = \begin{cases} 1 & x_{\min} \le q \le x_{\max} \ \text{and} \ -p_{\max}(q) \le p \le p_{\max}(q) \\ 0 & \text{otherwise} \end{cases} \tag{3.13} \]
where $p_{\max}(q) = \sqrt{2m\left[ E_{\max} - V(q) \right]}$. The parameters $x_{\min}$ and $x_{\max}$ define the
boundaries where $p_{\max}(q)$ is real and $p_{\max}(x_{\min}) = p_{\max}(x_{\max}) = 0$. At $E_{\max}$ equal to the
dissociation energy of the bound system, one or both of the parameters extend to infinity,
and, in practice, finite bounds need to be chosen when using $\rho_K^{QC}$ in computations.
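Plugging Eq. (3.13) into (3.10), the 1 DOF configuration kernel of the PSRO can be
reduced to:
\[ \langle q|\rho_K^{QC}|q'\rangle = \begin{cases} \dfrac{1}{2\pi} \displaystyle\int_{-p_{\max}[(q+q')/2]}^{p_{\max}[(q+q')/2]} e^{ip(q-q')} \, dp & x_{\min} \le \frac{q+q'}{2} \le x_{\max} \\ 0 & \text{otherwise} \end{cases} \]
\[ \phantom{\langle q|\rho_K^{QC}|q'\rangle} = \begin{cases} \dfrac{\sin\left[ (q-q')\, p_{\max}\!\left( \frac{q+q'}{2} \right) \right]}{\pi (q-q')} & 2x_{\min} \le (q + q') \le 2x_{\max} \\ 0 & \text{otherwise.} \end{cases} \tag{3.14} \]
Finally, placing Eq. (3.14) into Eq. (3.9), the 1 DOF version of Eq. (3.12) simplifies
to
\[ (\rho_K^{QC}\vartheta_{st})(q) = \int_{2x_{\min}-q}^{2x_{\max}-q} \frac{\sin\left[ (q-q')\, p_{\max}\!\left( \frac{q+q'}{2} \right) \right]}{\pi (q-q')} \, \vartheta_{st}(q') \, dq' . \tag{3.15} \]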
3.2.3 PSRO for the Harmonic Oscillator (HO)
Consider the multidimensional isotropic harmonic oscillator (HO), where the masses
and frequencies are all equal to unity in atomic units, i.e., $m_j = \omega_j = 1$ a.u. for
j = 1, . . . , f so that $H(\mathbf{q},\mathbf{p}) = (1/2)\sum_{j=1}^{f} (p_j^2 + q_j^2)$. The spherical symmetry of this
system renders an exact analytical solution of Eq. (3.12) possible. The QC phase
space region $R = \{ (\mathbf{q},\mathbf{p}) \mid 0 \le \sum_{j=1}^{f} [p_j^2 + q_j^2] \le 2E_{\max} \}$ [where $\rho_K^{QC}(\mathbf{q},\mathbf{p}) = 1$] is a
2f-dimensional hypersphere centered at the phase space origin.
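The operator $\rho_K^{QC}$ can always be written in the form
\[ \rho_K^{QC} = \sum_i w_i |\Phi_i\rangle\langle\Phi_i| \tag{3.16} \]
where $w_i$ are the eigenvalues of $\rho_K^{QC}$ and $|\Phi_i\rangle$ the corresponding eigenfunctions. We
then have
\[ (\rho_K^{QC}\vartheta_{\mathbf{s},\mathbf{t}})(\mathbf{q}) = \sum_i w_i \langle\Phi_i|\vartheta_{\mathbf{s},\mathbf{t}}\rangle \Phi_i(\mathbf{q}) . \tag{3.17} \]
For the isotropic HO system, the $w_i$’s and corresponding $|\Phi_i\rangle$’s are known analytically,
as are the overlaps $\langle\Phi_i|\vartheta_{\mathbf{s},\mathbf{t}}\rangle$.
As shown in Refs. [68] and [69], the eigenfunctions of $\rho_K^{QC}$ are also those of ρK,
i.e., the HO eigenstates (see Appendix B),
\[ |\Phi_i\rangle = |\mathbf{n}\rangle = |n_1\rangle \otimes |n_2\rangle \otimes \ldots \otimes |n_f\rangle \tag{3.18} \]
where $n_j$ is a nonnegative integer representing the quantum excitation of the jth DOF
of the HO. The eigenvalues can be determined by
\[ w_{\mathbf{n}}(E_{\max}) = \langle \mathbf{n}|\rho_K^{QC}|\mathbf{n}\rangle . \tag{3.19} \]
Using the WW formalism, Eq. (3.19) becomes
\[ w_{\mathbf{n}}(E_{\max}) = \int \rho_K^{QC}(\mathbf{q},\mathbf{p}) \, W_{\mathbf{n}}(\mathbf{q},\mathbf{p}) \, d^f q \, d^f p = \int_R W_{\mathbf{n}}(\mathbf{q},\mathbf{p}) \, d^f q \, d^f p \tag{3.20} \]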
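where $W_{\mathbf{n}}(\mathbf{q},\mathbf{p})$ represents the WW phase space representation [32,33,97,98] of the pure
state density operator of each HO eigenfunction, i.e., $|\mathbf{n}\rangle\langle\mathbf{n}|$. The analytical expression
is
\[ W_{\mathbf{n}}(\mathbf{q},\mathbf{p}) = \prod_{j=1}^{f} W_{n_j}(q_j, p_j) \tag{3.21} \]
where
\[ W_{n_j}(q_j, p_j) = \frac{(-1)^{n_j}}{\pi} L_{n_j}\left[ 2(q_j^2 + p_j^2) \right] e^{-(q_j^2 + p_j^2)} \tag{3.22} \]
and $L_{n_j}$ is a Laguerre polynomial of degree $n_j$.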
The above equations imply that the eigenvalues $w_{\mathbf{n}}(E_{\max})$ depend only upon
$n_S = \sum_{j=1}^{f} n_j$ instead of $\mathbf{n}$ itself, i.e., $w_{\mathbf{n}}(E_{\max}) = w_{n_S}(E_{\max})$, which is proven in Refs. [68]
and [69] (see Appendix C). Upon integration of Eq. (3.20), a recurrence relationship
can be derived [69]:
\[ w_{n_S+1}(E_{\max}) - w_{n_S}(E_{\max}) = \frac{(-1)^{n_S+1} (n_S)!}{2^{f-1} (n_S + f)!} \, e^{-2E_{\max}} (4E_{\max})^f L^f_{n_S}(4E_{\max}) \tag{3.23} \]
and
\[ w_0(E_{\max}) = \frac{1}{(f-1)!} \int_0^{2E_{\max}} t^{f-1} e^{-t} \, dt = P(f, 2E_{\max}) \tag{3.24} \]
where $L^f_{n_S}$ is the associated Laguerre polynomial, and $P(f, 2E_{\max})$ is the (regularized)
incomplete gamma function. The closed form of Eq. (3.23) is
\[ w_{n_S}(E_{\max}) = P(f, 2E_{\max}) + \sum_{k=0}^{n_S-1} \frac{(-1)^{k+1} k!}{2^{f-1} (k + f)!} \, e^{-2E_{\max}} (4E_{\max})^f L^f_k(4E_{\max}) \tag{3.25} \]
for $n_S > 0$. As shown previously, the radius of the hyperspherical region R is $\sqrt{2E_{\max}}$.
Given that R has both a volume $V_{2f} = K(2\pi)^f$ and
\[ V_{2f} = \frac{(2\pi E_{\max})^f}{f!} , \tag{3.26} \]
one can derive a useful direct relation between $E_{\max}$ and K:
\[ E_{\max} = (K f!)^{1/f} \tag{3.27} \]
which is useful because, in practice, K is a more convenient parameter to work with than $E_{\max}$.
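Equations (3.23), (3.24), and (3.27) are straightforward to evaluate numerically. The sketch below is illustrative only: the function names are chosen here, the associated Laguerre polynomials are generated with the standard three-term recurrence, and the incomplete gamma function is evaluated with its finite-sum form for integer f.

```python
import math

def emax_from_k(k, f):
    """Eq. (3.27): E_max of the isotropic HO region that holds K states."""
    return (k * math.factorial(f)) ** (1.0 / f)

def assoc_laguerre(n, alpha, x):
    """Associated Laguerre polynomial L_n^alpha(x) by three-term recurrence."""
    prev, curr = 0.0, 1.0
    for k in range(n):
        prev, curr = curr, ((2 * k + 1 + alpha - x) * curr
                            - (k + alpha) * prev) / (k + 1)
    return curr

def psro_eigenvalues(e_max, f, n_max):
    """w_{n_S}(E_max) for n_S = 0..n_max from Eqs. (3.23) and (3.24)."""
    x = 2.0 * e_max
    # Regularized incomplete gamma function P(f, x) for integer f, Eq. (3.24).
    w = [1.0 - math.exp(-x) * sum(x ** j / math.factorial(j) for j in range(f))]
    for n in range(n_max):
        step = ((-1) ** (n + 1) * math.factorial(n)
                / (2 ** (f - 1) * math.factorial(n + f))
                * math.exp(-x) * (2.0 * x) ** f * assoc_laguerre(n, f, 2.0 * x))
        w.append(w[-1] + step)           # recurrence of Eq. (3.23)
    return w

# Usage: PSRO eigenvalues for the K = 6, 1 DOF case of Fig. 3.1.
e_max = emax_from_k(6, 1)
for n_s, w in enumerate(psro_eigenvalues(e_max, 1, 8)):
    print(n_s, w)
```

The eigenvalues stay close to one for states well inside R and fall off for states near and beyond its boundary; unlike the eigenvalues of the exact projector ρK, they are not restricted to 0 and 1, and they can be slightly negative or exceed unity.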
Finally, the PSRO used in the calculation of the K HO eigenvalues and eigenfunctions
in the decomposed form of Eq. (3.16) is
\[ \rho_K^{QC} = \sum_{n_S \le n_{\max}} w_{n_S}(E_{\max}) \, |\mathbf{n}\rangle\langle\mathbf{n}| \tag{3.28} \]
where the sum above includes all states $|\mathbf{n}\rangle$ that have $n_S \le n_{\max}$. The nonnegative
integer parameter $n_{\max}$ is chosen such that a desired level of convergence is achieved
in the final calculation.
3.3 Numerical Implementation
3.3.1 Morse Oscillator (1 DOF)
For 1 DOF realistic potentials, we resort to explicit numerical integration of
Eq. (3.15). Although the application to large K and N is computationally expensive,
the small problem sizes of this study are appropriate as the focus is to determine
whether this method can achieve significant increases in efficiency. Future projects
may involve the development of new time and memory saving techniques or approxi-
mations (similar to Sec. 3.3.2) to enhance application of this PSRO method to larger
problems.
We choose to examine the Morse oscillator, for which
\[ H(q,p) = \frac{p^2}{2} + D\left( e^{-2\kappa q} - 2e^{-\kappa q} \right) . \tag{3.29} \]
The parameters, D = 12.000 and κ = 0.2041241, are chosen so that R has a volume
of 48π a.u. at dissociation energy $E_{\max} = 0$, thus signifying that there are 24 bound
states (K = 24). For comparison, the eigenvalues $w_i$ or energy values of the bound
states can be analytically determined [103]:
\[ w_i = -D + \kappa\sqrt{2D}\left( i - \frac{1}{2} \right) - \frac{\kappa^2}{2}\left( i - \frac{1}{2} \right)^2 \tag{3.30} \]
where i = 1, . . . , 24.
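The Morse quantities used in this chapter can be checked with a few lines of Python. This is a sketch under stated assumptions: `energy_enclosing` uses the classical relation that a region holding k states has phase space area 2πk (ħ = 1), which reproduces the 48π volume at dissociation quoted above, and `turning_points` solves V(q) = E for the potential of Eq. (3.29). The function names are illustrative.

```python
import math

D = 12.000          # well depth (a.u.), from the text
KAPPA = 0.2041241   # range parameter (a.u.), from the text

def morse_level(i, d=D, kappa=KAPPA):
    """Bound-state energy w_i of Eq. (3.30), i = 1, 2, ..."""
    n = i - 0.5
    return -d + kappa * math.sqrt(2.0 * d) * n - 0.5 * kappa ** 2 * n ** 2

def energy_enclosing(k, d=D, kappa=KAPPA):
    """Energy whose classical region has area 2*pi*k (k states, hbar = 1)."""
    return -d + kappa * math.sqrt(2.0 * d) * k - 0.5 * kappa ** 2 * k ** 2

def turning_points(e_max, d=D, kappa=KAPPA):
    """Classical turning points (x_min, x_max) where V(q) = e_max < 0."""
    root = math.sqrt(1.0 + e_max / d)
    return -math.log(1.0 + root) / kappa, -math.log(1.0 - root) / kappa

# Usage: the K = 6 region examined in Sec. 3.4.1.
e_max = energy_enclosing(6)
print(e_max, turning_points(e_max), [morse_level(i) for i in range(1, 7)])
```

With the parameters above, the K = 6 energy and turning points come out close to the values $E_{\max}$ = −6.7500, $x_{\min}$ = −2.4871, and $x_{\max}$ = 5.3058 a.u. quoted in Sec. 3.4.1.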
The PSRO-modified weylets and SG’s are numerically computed using Mathematica.
A set of points for q spaced at 0.02 a.u. increments between boundaries $X^{(1)}_{\min}$
and $X^{(1)}_{\max}$ is used to define each of the N modified basis functions, $\vartheta^{(1)}_{st}(q)$, using
Eq. (3.15). The boundaries, $X^{(1)}_{\min}$ and $X^{(1)}_{\max}$, span slightly beyond the PSRO region
because each projected function, $\vartheta^{(1)}_{st}(q)$, extends outside the [$x_{\min}$, $x_{\max}$] range, although
the extension does decay rapidly. In practice, $x_{\min} - X^{(1)}_{\min}$ and $X^{(1)}_{\max} - x_{\max}$ are chosen
to be approximately between 1 and 2 a.u., which allows sufficient convergence of the
elements of H and S.
For functions resulting from the application of the PSRO operator p > 1 times,
$\vartheta^{(p)}_{st}(q) = [(\rho_K^{QC})^p \, \vartheta_{st}](q)$, one needs to determine beforehand all appropriate
boundaries for all modified basis functions from $\vartheta^{(1)}_{st}$ to $\vartheta^{(p)}_{st}$. This is done in a reverse
fashion by first choosing sufficient boundaries, $X^{(p)}_{\min}$ and $X^{(p)}_{\max}$, for $\vartheta^{(p)}_{st}(q)$, i.e.,
$x_{\min} - X^{(p)}_{\min}$ and $X^{(p)}_{\max} - x_{\max}$ are between 1 and 2 a.u., and then using the following
equations for b = p, p − 1, . . . , 2, which follow from the integration limits of Eq. (3.15):
\[ X^{(b-1)}_{\min} = 2x_{\min} - X^{(b)}_{\max} \tag{3.31} \]
\[ X^{(b-1)}_{\max} = 2x_{\max} - X^{(b)}_{\min} \tag{3.32} \]
One finds that the functions of lower modification need larger boundaries, i.e.,
$X^{(b)}_{\min} < X^{(b')}_{\min}$ and $X^{(b)}_{\max} > X^{(b')}_{\max}$ for b < b′.
3.3.2 Morse/Harmonic Oscillator (2 DOF)
For systems where f > 1, the Eq. (3.12) integrations are rather costly, even for
f = 2, for which four-dimensional integrals are needed. In such cases however, one
may apply a separable PSRO modification to greatly reduce the computational cost.
In this section, we apply this idea to the 2 DOF Morse/HO system,
\[ H(x', y', p_x, p_y) = \frac{1}{2}\left( p_x^2 + p_y^2 \right) + \frac{(x')^2}{2} + D\left( e^{-2\kappa y'} - 2e^{-\kappa y'} + 1 \right) , \tag{3.33} \]
which becomes coupled via rotation of the coordinates:
\[ x = x' \cos 10^\circ + y' \sin 10^\circ \tag{3.34} \]
and
\[ y = -x' \sin 10^\circ + y' \cos 10^\circ . \tag{3.35} \]
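Instead of using the $\rho_K^{QC}(x, y, p_x, p_y)$ PSRO which is coupled, one applies the
separable approximation $\rho^{QC}_{K_x}(x, p_x)\,\rho^{QC}_{K_y}(y, p_y)$ where $K_x K_y \ge K$. Since the $\vartheta_{\mathbf{s},\mathbf{t}}$ basis is
also separable, one obtains
\[ (\rho_K^{QC}\vartheta_{\mathbf{s},\mathbf{t}})(x, y) = \int_{2x_{\min}-x}^{2x_{\max}-x} \frac{\sin\left[ (x - x')\, p_{x\max}\!\left( \frac{x+x'}{2} \right) \right]}{\pi (x - x')} \, \vartheta_{s_x t_x}(x') \, dx' \times \int_{2y_{\min}-y}^{2y_{\max}-y} \frac{\sin\left[ (y - y')\, p_{y\max}\!\left( \frac{y+y'}{2} \right) \right]}{\pi (y - y')} \, \vartheta_{s_y t_y}(y') \, dy' \]
\[ \phantom{(\rho_K^{QC}\vartheta_{\mathbf{s},\mathbf{t}})(x, y)} = \vartheta^{(1)}_{s_x t_x}(x) \, \vartheta^{(1)}_{s_y t_y}(y) \tag{3.36} \]
where $p_{x\max}(x) = \sqrt{2[E_{\max} - V_x(x)]}$ and $p_{y\max}(y) = \sqrt{2[E_{\max} - V_y(y)]}$. Any of a
number of techniques may be used to obtain suitable 1 DOF marginal potentials ($V_x$
and $V_y$), with the primary criterion being that $\rho^{QC}_{K_x}\rho^{QC}_{K_y}$ resemble $\rho_K^{QC}$ as closely as
possible. For this chapter, we use the method of Ref. [74].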
The marginal potentials, $V_x$ and $V_y$, resulting from the optimization [74] can be found
by simply minimizing the original potential with respect to y and x, respectively [19,104],
i.e.,
\[ V_x(x) = \min_y \left[ V(x, y) \right] \tag{3.37} \]
and
\[ V_y(y) = \min_x \left[ V(x, y) \right] . \tag{3.38} \]
No adjustments need to be made for the above equations, since min[V(x, y)] = 0.
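A brute-force version of Eqs. (3.37) and (3.38) is a one-dimensional scan for each value of the retained coordinate. The sketch below makes several assumptions that should be read as such: the Morse parameters are taken to be those of Sec. 3.3.1, the rotation is inverted exactly as Eqs. (3.34) and (3.35) are written, and the scan ranges, grid density, and function names are arbitrary illustrative choices rather than the optimization procedure of Ref. [74].

```python
import math

D, KAPPA = 12.0, 0.2041241       # assumed equal to the 1 DOF Morse parameters
THETA = math.radians(10.0)       # rotation angle of Eqs. (3.34)-(3.35)

def potential(x, y):
    """Potential of Eq. (3.33) expressed in the rotated coordinates (x, y)."""
    xp = x * math.cos(THETA) - y * math.sin(THETA)   # inverse of Eq. (3.34)
    yp = x * math.sin(THETA) + y * math.cos(THETA)   # inverse of Eq. (3.35)
    morse = D * (math.exp(-2.0 * KAPPA * yp) - 2.0 * math.exp(-KAPPA * yp) + 1.0)
    return 0.5 * xp ** 2 + morse

def marginal_x(x, y_lo=-10.0, y_hi=20.0, n=3000):
    """V_x(x) of Eq. (3.37): minimum of V(x, y) over a grid in y."""
    step = (y_hi - y_lo) / n
    return min(potential(x, y_lo + k * step) for k in range(n + 1))

def marginal_y(y, x_lo=-15.0, x_hi=15.0, n=3000):
    """V_y(y) of Eq. (3.38): minimum of V(x, y) over a grid in x."""
    step = (x_hi - x_lo) / n
    return min(potential(x_lo + k * step, y) for k in range(n + 1))

# Usage: marginal potentials at a few sample points.
print([round(marginal_x(x), 4) for x in (-2.0, 0.0, 2.0)])
print([round(marginal_y(y), 4) for y in (-2.0, 0.0, 4.0)])
```

Because each marginal potential is a lower envelope of V, the product region $R_x \times R_y$ built from $V_x$ and $V_y$ always contains R.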
The classically allowed region corresponding to the separable PSRO has a “cylindrical”
shape composed of the product of 2 two-dimensional phase space regions,
$R_x \times R_y$. This region contains “corners” not present in the nonzero region of $\rho_K^{QC}$.
Thus, the separable PSRO is different from, and less effective than $\rho_K^{QC}$, in the sense
that it fails to smooth out all of the peripheral lattice states due to the wasted space
of the corners.
3.3.3 Harmonic Oscillator (HO)
For the HO system, one does not need to bother with numerical integrations
needed for the realistic cases presented in Secs. 3.3.1 and 3.3.2. Instead, an analytical
representation of Eq. (3.12), as presented in Sec. 3.2.3, can be used; thus, one can
avoid the numerical errors coming from the computationally expensive integrations.
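Ultimately, we want to represent the HO Hamiltonian in the PSRO-modified SG
$|\psi^{(1)}_{\mathbf{s},\mathbf{t}}\rangle$ or weylet $|\varphi^{(1)}_{\mathbf{s},\mathbf{t}}\rangle$ basis. Using Eq. (3.28), the orthonormality property of the
HO eigenfunctions, i.e., $\langle \mathbf{n}|\mathbf{n}'\rangle = \delta_{n_1 n_1'}\delta_{n_2 n_2'} \ldots \delta_{n_f n_f'}$, and the eigenvalue relationship
$H|\mathbf{n}\rangle = (n_S + f/2)|\mathbf{n}\rangle$, the Hamiltonian matrix elements in the modified SG basis are
given by
\[ \left[ H_{\psi^{(1)}} \right]_{\mathbf{s},\mathbf{t},\mathbf{s}',\mathbf{t}'} = \langle \psi^{(1)}_{\mathbf{s},\mathbf{t}} | H | \psi^{(1)}_{\mathbf{s}',\mathbf{t}'} \rangle = \sum_{n_S \le n_{\max}} \left[ w_{n_S}(E_{\max}) \right]^2 \left( n_S + \frac{f}{2} \right) \prod_{j=1}^{f} \left( \langle \psi_{s_j t_j} | n_j \rangle \langle n_j | \psi_{s_j' t_j'} \rangle \right) . \tag{3.39} \]
To solve Eq. (3.4), we also need the overlap matrix
\[ \left[ S_{\psi^{(1)}} \right]_{\mathbf{s},\mathbf{t},\mathbf{s}',\mathbf{t}'} = \langle \psi^{(1)}_{\mathbf{s},\mathbf{t}} | \psi^{(1)}_{\mathbf{s}',\mathbf{t}'} \rangle = \sum_{n_S \le n_{\max}} \left[ w_{n_S}(E_{\max}) \right]^2 \prod_{j=1}^{f} \left( \langle \psi_{s_j t_j} | n_j \rangle \langle n_j | \psi_{s_j' t_j'} \rangle \right) . \tag{3.40} \]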
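Similar equations apply for the weylet basis. The overlaps are given explicitly as
follows:
\[ \langle \psi_{st} | n \rangle = \frac{\pi^{n/2}}{\sqrt{2^{n-1} n!}} \, e^{-\frac{\pi}{4}(s^2 + t^2)} \, \mathrm{Re}\left[ (s + it)^n e^{-i(\pi/2) t (s+1)} \right] \tag{3.41} \]
and
\[ \langle \varphi_{st} | n \rangle = \sum_{m,n'=-m_{\max}}^{m_{\max}} (-1)^{(n'/2 + mt)} c_{mn'} \langle \psi_{s+m,\,t+n'} | n \rangle , \tag{3.42} \]
which follows from Eq. (3.6); the summation index n′ is primed to distinguish it from the
HO quantum number n. In the above expressions, we have chosen the parameters a = 1 a.u.,
and ε = 1/2. The generalization for $|\psi^{(p)}_{\mathbf{s},\mathbf{t}}\rangle$ is obtained by replacing
$[w_{n_S}(E_{\max})]^2$ with $[w_{n_S}(E_{\max})]^{2p}$ in Eqs. (3.39) and (3.40).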
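The closed-form overlap of Eq. (3.41) can be cross-checked against direct quadrature of the SG of Eq. (3.5) with the HO eigenfunctions. The following Python sketch does so for a = 1 a.u.; the function names, the rectangle-rule quadrature, and its step size are illustrative assumptions, and the HO eigenfunctions are generated with the standard normalized Hermite recurrence.

```python
import cmath
import math

def sg_ho_overlap(s, t, n):
    """Closed form <psi_st|n> of Eq. (3.41), valid for a = 1 a.u."""
    prefactor = math.pi ** (n / 2.0) / math.sqrt(2.0 ** (n - 1) * math.factorial(n))
    envelope = math.exp(-0.25 * math.pi * (s * s + t * t))
    phase = complex(s, t) ** n * cmath.exp(-0.5j * math.pi * t * (s + 1.0))
    return prefactor * envelope * phase.real

def ho_eigenfunction(n, q):
    """Normalized 1 DOF HO eigenfunction (m = omega = hbar = 1) by recurrence."""
    prev, curr = 0.0, math.pi ** -0.25 * math.exp(-0.5 * q * q)
    for k in range(n):
        prev, curr = curr, (math.sqrt(2.0 / (k + 1)) * q * curr
                            - math.sqrt(k / (k + 1.0)) * prev)
    return curr

def sg(q, s, t):
    """SG of Eq. (3.5) with a = 1."""
    root_pi = math.sqrt(math.pi)
    return ((4.0 / math.pi) ** 0.25
            * math.cos(t * root_pi * (q - (s + 0.5) * root_pi))
            * math.exp(-0.5 * (q - s * root_pi) ** 2))

def sg_ho_overlap_numeric(s, t, n, half_width=10.0, step=0.005):
    """Rectangle-rule estimate of the same overlap integral."""
    center = 0.5 * s * math.sqrt(math.pi)   # where the product of Gaussians peaks
    steps = int(round(2.0 * half_width / step))
    total = 0.0
    for k in range(steps + 1):
        q = center - half_width + k * step
        total += sg(q, s, t) * ho_eigenfunction(n, q)
    return total * step

# Usage: compare the two evaluations for a few (s, t, n) combinations.
for s, t, n in ((0.5, 0.5, 0), (0.5, 1.5, 1), (-1.5, 0.5, 2)):
    print(s, t, n, sg_ho_overlap(s, t, n), sg_ho_overlap_numeric(s, t, n))
```

In an actual calculation only the closed form would be used, with the products over j in Eqs. (3.39) and (3.40) assembled from these 1 DOF overlaps.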
3.4 Results and Discussion
3.4.1 Results for Morse Oscillator (1 DOF)
For the 1 DOF Morse oscillator system, we optimized our calculations for the
lowest K = 6 eigenstates. These are sufficiently far below dissociation that the QC
eigenstate region R ($E_{\max}$ = −6.7500 a.u.) is convex, suitable for the SG’s and
weylets, yet the last few eigenstates are high enough to clearly exhibit anharmonic
behavior (note the shape of R in Fig. 3.3). The N = 9 basis functions are chosen to
sufficiently cover R and are each modified by the PSRO $\rho_6^{QC}$. In practice, a suitable
basis size N depends strongly on K. If N is insufficiently larger than K, then the
PSRO is ineffective at computing all K desired states to sufficient accuracy. On the
other hand, if N is too large, then the overlap matrix S is ill-conditioned (eigenvalues
of S are too small), preventing generalized eigenvalue routines from solving Eq. (3.4).
The weylet basis functions were chosen as in Fig. 3.3. The selection is very similar
to what would be obtained via the phase space truncation criterion used in Secs. 3.4.2
and 3.4.3. We chose a = 1.8282 a.u. such that the heights of the lattice cells equal the
maximum extent of R in the positive and negative momentum direction. Also, the
position (horizontal) shift of the rectangles is adjusted (ε = −0.0273 a.u.) so that
the right edge of the block furthest along the position axis (in the positive direction)
corresponds to the right boundary $x_{\max}$ = 5.3058 a.u. of R. The left boundary
$x_{\min}$ = −2.4871 a.u. of R extends slightly further than one basis function; thus, we
added a ninth basis function on the negative side for sufficient coverage.
We considered both the SG’s and weylets as basis sets, as well as their PSRO-modified
versions, up to p = 3. The triple modified functions, $\vartheta^{(3)}_{st}(q)$, lie between
chosen boundaries of $X^{(3)}_{\min} = -4$ and $X^{(3)}_{\max} = 7$ a.u., sufficiently outside of the PSRO
bounds, $x_{\min}$ and $x_{\max}$. By Eqs. (3.31) and (3.32), the bounds of the b < p functions
are $X^{(2)}_{\min} = -11.9743$, $X^{(1)}_{\min} = -19.5859$, $X^{(2)}_{\max} = 14.6116$, and $X^{(1)}_{\max} = 22.5859$ a.u.
We report in Table 3.1 the absolute errors between the calculated and analytical values
for all basis set types except for the p = 3 case, since these show no improvement
over the p = 2 case.
In both the weylet and SG case, the greatest reduction in the absolute error from
the unmodified to the modified case occurs in the lowest eigenstates. For example,
a near-four-order-of-magnitude improvement in accuracy is shown for the lowest two
eigenvalues of the p = 2 weylets, $|\varphi^{(2)}_{st}\rangle$, and SG’s, $|\psi^{(2)}_{st}\rangle$, relative to the
corresponding unmodified basis sets, $|\varphi_{st}\rangle$ and $|\psi_{st}\rangle$. The $|\varphi^{(1)}_{st}\rangle$ and $|\psi^{(1)}_{st}\rangle$ sets show accuracy
improvements ranging from 1-3 orders of magnitude for all 6 targeted eigenvalues.
In general, both the SG’s and weylets follow the same pattern vis-a-vis the PSRO
modification, and exhibit comparable accuracies for the same p value, even in the
unmodified case, p = 0. This argues strongly in favor of using SG basis functions
for practical applications. The main conclusion, though, is that PSRO modification is
extremely effective for either basis set, with the largest relative error for the targeted
K = 6 energies being only around $3 \times 10^{-5}$ for p = 2. Table 3.1 also indicates errors
for the remaining N − K = 3 eigenvalues, which show only small improvements and
sometimes loss of accuracy under the PSRO modification, thus demonstrating the ability
of the method to single out just the desired K eigenvalues from the others.
3.4.2 Results for Morse/Harmonic Oscillator (2 DOF)
We developed a separable PSRO that corresponds to a product region, $R_x \times R_y$,
where the two components are the projections of R, the classically allowed region of the
lowest K = 14 eigenstates ($E_{\max}$ = 5.0700 a.u.) of the 2 DOF Morse/HO system, onto the
(x, $p_x$) and (y, $p_y$) planes [74], respectively. We applied the PSRO to
each of the SG basis functions (N = 49) selected by the basis truncation criterion
$H(q_s, p_t) \le E_{cut}$ = 9.0000 a.u. [35–37]. In parts (a) and (b) of Fig. 3.4, the inner curves
(solid line) denote $R_x$ and $R_y$, respectively, with boundaries of $x_{\min}$ = −3.1911 and
$x_{\max}$ = 3.1797 a.u. for the former and $y_{\min}$ = −2.4942 and $y_{\max}$ = 5.0820 a.u. for
the latter. The projections of R′, the classically allowed region of the SG basis set
(actually, more reflective of the weylets), are highlighted by the dotted lines and, in
a fashion similar to PSRO regions, are labeled as $R'_x$ and $R'_y$. The parameters of the
SG basis set were chosen to produce optimal computed eigenvalues for the unmodified
basis set and are listed as follows: $a_x = a_y = 0.7979$, $\varepsilon_x = 0.4064$, and $\varepsilon_y = 0.0926$
a.u.
The analytical eigenvalues of the Morse/harmonic oscillator (column 2 of Table 3.2)
are equal to the sum of the Morse [Eq. (3.30)] and 1 DOF harmonic eigenvalues,
exactly as in the uncoupled case [Eq. (3.33)], since the coupling is only due to a
rotation in the (x, y) plane. Improvements of up to two orders of magnitude are shown
for the modified versus unmodified case. Note that there is not a clear separation between
the first 14 eigenvalues and the remaining 6 reported in Table 3.2, as there is in Table 3.1,
although the accuracy improvements diminish rapidly beyond the K = 14 cutoff. This
behavior is due to the artificial separability used in the 2 DOF PSRO. The method
is nevertheless quite effective at improving the accuracy of the desired eigenvalues.
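The analytical column of Table 3.2 can be regenerated from the sum rule stated above. The Python sketch below assumes that the Morse parameters are those of Sec. 3.3.1 (D = 12.000, κ = 0.2041241), that the HO has unit frequency, and that the Morse levels are measured from the bottom of the well, as implied by the +D shift in Eq. (3.33); the function name and the loop limits are illustrative.

```python
import math

D, KAPPA = 12.0, 0.2041241   # assumed equal to the 1 DOF Morse parameters

def morse_ho_levels(count, n_morse=24, n_ho=40):
    """Lowest `count` analytical eigenvalues of the 2 DOF Morse/HO system."""
    omega = KAPPA * math.sqrt(2.0 * D)
    levels = []
    for i in range(1, n_morse + 1):
        n = i - 0.5
        morse = omega * n - 0.5 * KAPPA ** 2 * n ** 2   # Eq. (3.30) shifted by +D
        for k in range(n_ho):
            levels.append(morse + k + 0.5)              # plus the 1 DOF HO level
    return sorted(levels)[:count]

# Usage: the 20 levels reported in Table 3.2.
for index, level in enumerate(morse_ho_levels(20), start=1):
    print(index, round(level, 5))
```

Under these assumptions the sorted sums agree with the “exact” column of Table 3.2 to the number of digits printed there.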
3.4.3 Results for Harmonic Oscillator (HO)
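The isotropic HO up to 4 DOF’s was solved using 6 different basis sets: $|\varphi_{\mathbf{s},\mathbf{t}}\rangle$,
$|\varphi^{(1)}_{\mathbf{s},\mathbf{t}}\rangle$, $|\varphi^{(2)}_{\mathbf{s},\mathbf{t}}\rangle$, $|\psi_{\mathbf{s},\mathbf{t}}\rangle$, $|\psi^{(1)}_{\mathbf{s},\mathbf{t}}\rangle$, and $|\psi^{(2)}_{\mathbf{s},\mathbf{t}}\rangle$ (Tables 3.3-3.6). In all cases, the basis truncation
criterion $H(q_s, p_t) \le E_{cut}$ [35–37] was used to obtain a representational basis of size N.
We also chose K = N, i.e., we set the volume of the PSRO region R to equal that of
the basis region R′ [Fig. 3.1(a)].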
In Fig. 3.2, we can see, pictorially, how the PSRO affects the weylets in the 1
DOF case. Part (a) shows the WW phase space representation of the projection
operator containing the lowest 6 HO eigenstates. Part (b) gives the same setup for
the 6 weylets chosen as a basis to solve the HO Hamiltonian [the QC region of (b)
is the collective region of blocks in Fig. 3.1(a)], and part (c) represents the same
weylet set under a single PSRO modification. The modified weylets show concentric
ring-like features much like the HO eigenstates resulting in a considerable amount
of improvement in efficiency over the unmodified set (Table 3.3)—although, one can
still observe residual rectilinear weylet features in the slight box-like cornering of the
rings.
The analytical eigenvalues of the isotropic HO are ($n_S$ + f/2), with degeneracy
\[ \deg(n_S, f) = \binom{n_S + f - 1}{f - 1} \tag{3.43} \]
for f > 1 (the eigenvalues are of course nondegenerate for f = 1). They are compared
to the actual calculated eigenvalues using the different basis sets, and Tables 3.3-3.7
report how many of the computed values fall within relative accuracies of $2 \times 10^{-2}$,
$2 \times 10^{-3}$, $2 \times 10^{-4}$, and so on (which correspond to error tolerances of (0.2)f, (0.02)f,
(0.002)f, . . . , respectively [37]). Table 3.7 stands apart from the other HO tables in
that it reports the effect on basis set efficiency of increasing p, the power of the
PSRO operator.
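The bookkeeping behind Tables 3.3-3.7 can be sketched in Python as follows. The function names are illustrative, and `count_accurate` assumes that “relative accuracy” means |computed − exact| / exact with computed and exact eigenvalues paired in ascending order; the text does not spell out the pairing, so this is an assumption.

```python
import math

def degeneracy(n_s, f):
    """Eq. (3.43): number of isotropic HO states with total excitation n_s."""
    return math.comb(n_s + f - 1, f - 1)

def exact_spectrum(count, f):
    """Lowest `count` analytical eigenvalues n_s + f/2, repeated by degeneracy."""
    levels = []
    n_s = 0
    while len(levels) < count:
        levels.extend([n_s + f / 2.0] * degeneracy(n_s, f))
        n_s += 1
    return levels[:count]

def count_accurate(computed, exact, rel_tol):
    """Number of eigenvalues whose relative error is at most rel_tol."""
    pairs = zip(sorted(computed), sorted(exact))
    return sum(1 for c, e in pairs if abs(c - e) <= rel_tol * abs(e))

# Usage with made-up "computed" values for a 2 DOF example.
exact = exact_spectrum(4, 2)
print(exact, count_accurate([1.0, 2.05, 2.0, 3.3], exact, 2e-2))
```

The computed values in the usage line are invented purely to exercise the counting and do not come from any table in this chapter.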
For this analysis, we define the efficiency to be L/N, where L is the number
of eigenvalues actually computed to a certain relative accuracy, as opposed to the
number of targeted eigenstates, K, used to define the PSRO (K = N for the HO
case). This efficiency for f = 2 at $2 \times 10^{-4}$ relative accuracy is plotted against N in
Fig. 3.5, which demonstrates similar efficiency curves (apart from a shift) for all six
basis types. Although $|\varphi^{(p)}_{\mathbf{s},\mathbf{t}}\rangle$ is now substantially more efficient than $|\psi^{(p)}_{\mathbf{s},\mathbf{t}}\rangle$, the effect
is still no greater than that of the PSRO modification itself, in that $|\varphi_{\mathbf{s},\mathbf{t}}\rangle$ and $|\psi^{(1)}_{\mathbf{s},\mathbf{t}}\rangle$
are comparable. Note the large increase in efficiency from N = 176 to 2 340; this
“efficiency cliff” [37] shifts to the right with increasing dimensionality, and is the leading
cause of accuracy restrictions on weylet calculations at higher dimensionalities. The
fact that the PSRO improvements are comparable on and off the cliff is thus highly
encouraging.
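As a small worked example of the efficiency L/N, the following Python lines use the $2 \times 10^{-4}$ row of Table 3.7 (2 DOF HO, N = 2 340, modified weylets with the PSRO applied p times). The variable and function names are illustrative.

```python
def efficiency(num_accurate, basis_size):
    """Efficiency L/N: accurately computed eigenvalues per basis function."""
    return num_accurate / basis_size

N_BASIS = 2340
# L at 2e-4 relative accuracy for p = 3, 5, 7, 9, 11 (Table 3.7).
ACCURATE_AT_2E4 = {3: 1343, 5: 1633, 7: 1778, 9: 1870, 11: 1956}

efficiencies = {p: efficiency(count, N_BASIS) for p, count in ACCURATE_AT_2E4.items()}
for p, value in sorted(efficiencies.items()):
    print(p, round(value, 3))
```

The efficiency grows monotonically with p over this range, consistent with the expectation that higher powers of the PSRO are closer to the exact projector.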
Fig. 3.6 clearly shows that the efficiencies (2 × 10−2 accuracy) of all basis types
do not decay exponentially as the DOF’s increase (N ≈ 10 000 is held constant).
The results of the weylets and the SG’s are very similar in these plots: the weylets
are slightly better than the SG’s in the unmodified case, and for both the single and
double PSRO-modified sets, the results are essentially identical (modified weylets are
not shown). These similarities are only true at the 2× 10−2 accuracy, whereas higher
accuracies display significant improvement of the weylets over SG’s, as shown in the
previous analysis. The most important result is that the efficiency of the modified
functions decreases significantly more slowly with increasing f than that of the unmodified
functions; thus, the benefits of PSRO modifications are expected to be substantially
greater at larger dimensionality.
In general, the PSRO projection operator significantly improves the efficiency of
both the SG’s and weylets as demonstrated by all tables and plots. Also, the
modifications produce bases that allow for the calculation of a large number of eigenvalues
at extremely high accuracies, otherwise not attainable. For example, in the 2 DOF
case (Table 3.4), there are L = 757 (N = 20 864) eigenvalues that fall within the
relative accuracy of $2 \times 10^{-12}$ using the $|\varphi^{(2)}_{\mathbf{s},\mathbf{t}}\rangle$ basis. Even the less efficient basis $|\psi^{(2)}_{\mathbf{s},\mathbf{t}}\rangle$
is impressive at L = 305 under similar conditions. A particularly surprising result is to
be found in Table 3.7, for which over 1 000 eigenvalues are computed to $2 \times 10^{-7}$
accuracy, using the $|\varphi^{(11)}_{\mathbf{s},\mathbf{t}}\rangle$ basis of only N = 2 340 functions. Thus, the PSRO to the
11th power is very close to the actual projection operator ρK.
3.4.4 Discussion
In conclusion, the PSRO modification of the weylets and SG’s produces new
nonorthogonal bases with efficiencies far better than the unmodified sets, which is especially
useful in cases where the latter are limited in accuracy, or when K and N are small.
For realistic 1 DOF systems, one can successfully apply the PSRO to either the weylet
or SG basis via numerical integration to gain large improvements in accuracy, which
for the most part is not computationally expensive. However, the same computations
do become a serious problem at larger dimensionalities. The favorable symmetry of
the model HO system allowed the analysis of higher DOF calculations, which would
otherwise not be possible, showing that efficiency and accuracy improvements get
even better as one increases f. This promising result provides motivation to explore
algorithms that would reduce the cost of computations involving real systems of multiple
DOF’s, for example, the separable PSRO of Sec. 3.3.2. For large basis sizes N and number
of PSRO modifications p, the computational effort becomes expensive, but is clearly
worth exploring further. One idea for possible future work involves the application
of Monte Carlo integration methods to the phase space integration in Eq. (3.12).
Table 3.1. The absolute differences between computed and analytical [Eq. (3.30)] values (in a.u.) for the lowest 9 energy levels of the 1 DOF Morse potential, using various basis sets of N = 9 functions in each case.
Table 3.2. Lowest 20 eigenvalues of the 2 DOF Morse/HO system (in a.u.). Analytical eigenvalues are in column 2, whereas columns 3 and 4 present absolute differences between computed (with indicated basis of N = 49 functions) and analytical values.
State index    exact       |ψs,t〉      |ψ(1)s,t〉
 1             0.994 79    0.110 71    0.007 23
 2             1.953 12    0.160 49    0.007 92
 3             1.994 79    0.137 23    0.019 68
 4             2.869 79    0.182 10    0.011 07
 5             2.953 12    0.152 55    0.013 31
 6             2.994 79    0.287 10    0.047 11
 7             3.744 79    0.109 67    0.003 07
 8             3.869 79    0.264 59    0.025 60
 9             3.953 12    0.228 14    0.007 93
10             3.994 79    0.512 73    0.130 02
11             4.578 12    0.276 80    0.022 38
12             4.744 79    0.233 47    0.016 40
13             4.869 79    0.498 17    0.048 03
14             4.953 12    0.468 67    0.063 08
15             4.994 79    0.786 11    0.368 03
16             5.369 79    0.612 01    0.049 81
17             5.578 12    0.495 62    0.042 01
18             5.744 79    0.461 22    0.042 82
19             5.869 79    0.608 98    0.093 21
20             5.953 12    0.786 84    0.336 79
Table 3.3. Results for the 1 DOF isotropic HO system computed using six different basis sets. The values under the different basis columns indicate the number of eigenvalues computed to the relative accuracy indicated in column 3.
Ecut (a.u.)   N     accuracy      |ϕst〉   |ϕ(1)st〉   |ϕ(2)st〉   |ψst〉   |ψ(1)st〉   |ψ(2)st〉
4.0           6     2 × 10−2      2       5          6          2       6          6
                    2 × 10−3      0       3          5          0       3          5
                    2 × 10−4      0       1          2          0       0          3
                    2 × 10−5      0       0          1          0       0          1
110.0         108   2 × 10−2      84      97         104        81      97         104
                    2 × 10−4      53      75         88         48      54         83
                    2 × 10−6      31      47         50         22      37         47
                    2 × 10−8      17      20         29         7       18         20
                    2 × 10−10     4       6          13         0       3          5
                    2 × 10−12     0       1          2          0       0          0
177.0         180   2 × 10−2      144     157        172        139     156        172
                    2 × 10−4      105     118        131        97      108        121
                    2 × 10−6      74      93         98         61      83         92
                    2 × 10−8      52      60         75         35      54         58
                    2 × 10−10     28      35         53         17      28         36
                    2 × 10−12     6       24         27         7       13         21
                    2 × 10−14     0       7          7          1       5          5
Table 3.4. Number of accurately computed eigenvalues for the 2 DOF isotropic HO system using six different basis sets (consult Table 3.3 for further details).
Table 3.5. Number of accurately computed eigenvalues for the 3 DOF isotropic HO system using six different basis sets (consult Table 3.3 for further details).
Table 3.6. Number of accurately computed eigenvalues for the 4 DOF isotropic HO system using six different basis sets (consult Table 3.3 for further details).
Table 3.7. Number of accurately computed eigenvalues for the 2 DOF isotropic HO system at Ecut = 70.0 a.u. (N = 2 340) using modified weylet basis with the PSRO applied from 3 to 11 times. Beyond p = 11, numerical difficulties arise in the diagonalization of the Hamiltonian matrix.
accuracy      p = 3    p = 5    p = 7    p = 9    p = 11
2 × 10−2      2 161    2 204    2 218    2 226    2 242
2 × 10−3      1 772    1 933    2 039    2 088    2 121
2 × 10−4      1 343    1 633    1 778    1 870    1 956
2 × 10−5      792      1 142    1 439    1 610    1 719
2 × 10−6      441      690      986      1 218    1 385
2 × 10−7      246      392      627      835      1 056
2 × 10−8      71       142      302      507      689
2 × 10−9      17       28       106      256      417
2 × 10−10     1        4        24       86       223
2 × 10−11     0        0        1        28       92
[Figure 3.1: panels (a) and (b), each plotting p (a.u.) against q (a.u.) over the range −4 to 4.]
Figure 3.1. Classically allowed (QC) region for the weylets and HO PSRO in the 1 DOF case. The basis truncation criterion $H(q_s, p_t) \le E_{cut}$ = 4.0 a.u. gives the weylet basis of N = 6 functions (Table 3.3). In part (a), the QC region of each weylet is represented by two squares each with volume π a.u. symmetrically placed about the q axis. The two squares with small unfilled circles represent the weylet $|\varphi_{1/2,3/2}\rangle$. The larger unfilled circle is the QC region R corresponding to the PSRO $\rho_K^{QC}$ (K = N = 6). The same weylets modified by the exact projection ρK are approximately given in part (b).
[Figure 3.2: panels (a), (b), and (c), each a surface plot of $\rho_6(q, p)$ (a.u.) against q (a.u.) and p (a.u.) over the range −4 to 4.]
Figure 3.2. Wigner-Weyl representation (1 DOF case) of projection operators consisting of three different sets of N = 6 (a) HO eigenstates; (b) weylets; (c) PSRO-modified weylets. Unlike what is stated in Sec. 3.1, all plots oscillate about (1/2π), instead of unity, due to a discrepant normalization. In part (b), there are quantum interference fringes that emerge from the momentum symmetrization. The modified weylets in part (c) do not have these fringes and adopt the ring-like pattern of the true HO eigenstates, rendering them a more efficient basis for representing the HO system (see Table 3.3).
[Figure 3.3: p (a.u.) plotted against q (a.u.), with q from −4 to 6 and p from −4 to 4.]
Figure 3.3. Classically allowed region for the 1 DOF weylets and Morse PSRO. The egg-shaped region R represents the 1 DOF Morse system at $E_{\max}$ = −6.7500 a.u. There are K = 6 eigenstates at that energy (area inside is 12π a.u.). Filled circles indicate the centers of the N = 9 basis functions used for the calculation of the 6 corresponding eigenvalues (Table 3.1).
[Figure 3.4: panel (a) plots $p_x$ (a.u.) against x (a.u.); panel (b) plots $p_y$ (a.u.) against y (a.u.).]
Figure 3.4. Classically allowed region of the separable PSRO for the 2 DOF Morse/HO system. $R_x$ and $R_y$ are shown in parts (a) and (b), respectively, as the inner “solid” curve. $R'_x$ and $R'_y$, reflective of the basis set (N = 49) of SG’s, are also shown in parts (a) and (b) outlined by the “dotted” lines.
0 10000 20000N
0
0.2
0.4
0.6
Effi
cien
cy L
/N
Figure 3.5. Efficiency versus N for the 2 DOF HO at 2 × 10−4 relative accuracy(Table 3.4). The “solid” lines correspond to the weylets either unmodi-fied (circles), single PSRO-modified (squares), or double PSRO-modified(triangles). The “dashed” lines represent the SG’s.
[Figure 3.6 plot: efficiency L/N versus DOF.]
Figure 3.6. Efficiency versus DOF’s at N ≈ 10 000 held constant for the HO system. The three data points for each line, starting from the left 2 DOF point, represent basis sizes of N = 9 956, 9 552, and 12 720, with L at relative accuracy of 2 × 10⁻². The labels for the plots are the same as that in Fig. 3.5. Only the SG’s are plotted for the modified cases since they are very similar to the weylets.
CHAPTER IV
PARALLEL PREPROCESSED SUBSPACE ITERATION METHOD
4.1 Introduction
For iterative eigenvalue solvers, a common goal is to create a subspace invariant
under the N × N matrix A in question (assumed to be real, symmetric, and sparse
throughout this chapter). Although this subspace is spanned by a basis of select
eigenvectors of A, one need not have any prior knowledge of the eigenvectors or the
corresponding eigenvalues in order to iteratively converge towards the target invariant
subspace (ISUB). In the physics/chemistry community, the ISUB is often represented
as a density or projection matrix ρ, a uniformly mixed ensemble of the appropriate
eigenvectors.
In this chapter, an N × d rectangular matrix containing column vectors (not
necessarily eigenvectors) spanning the ISUB of dimension d will be referred to as S. If
the column vectors are orthonormal, then one can easily project out a smaller d × d
real and symmetric matrix C (known as a Rayleigh matrix), i.e.,
\[ S^{T} A S = C , \tag{4.1} \]
where T designates the transpose. The advantage gained is that the much smaller C
matrix (d ≪ N) can then be numerically diagonalized using standard direct eigenvalue
techniques to obtain the eigenvalues of A that correspond to the eigenvectors of
A contained within the ISUB. Alternatively, one might choose not to orthogonalize
successive column vectors of S. In this case, one must solve the generalized eigenvalue
problem
\[ C\vec{x} = \lambda M \vec{x} , \tag{4.2} \]
where C comes from Eq. (4.1) (no longer referred to as a Rayleigh matrix), $M = S^{T} S$
is the overlap matrix, and $(\lambda, \vec{x})$ is the (eigenvalue, eigenvector) pair.
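As an illustration of Eqs. (4.1) and (4.2), the following is a minimal NumPy sketch rather than the implementation used in this work. The function name `ritz_values`, the small test matrix, and the Cholesky-based reduction of the generalized problem to a standard one are all assumptions made for the example.

```python
import numpy as np

def ritz_values(A, S):
    """Eigenvalues of A restricted to span(S), for non-orthogonal columns of S."""
    C = S.T @ A @ S            # projected matrix, Eq. (4.1)
    M = S.T @ S                # overlap matrix
    L = np.linalg.cholesky(M)  # M = L L^T (requires S to have full rank)
    # Reduce C x = lambda M x to a standard symmetric problem for L^{-1} C L^{-T}.
    C_std = np.linalg.solve(L, np.linalg.solve(L, C).T).T
    return np.linalg.eigvalsh(C_std)

# Hypothetical example: a symmetric 6 x 6 matrix with known eigenvalues 1..6.
rng = np.random.default_rng(0)
Q, _ = np.linalg.qr(rng.standard_normal((6, 6)))
A = Q @ np.diag([1.0, 2.0, 3.0, 4.0, 5.0, 6.0]) @ Q.T
# Non-orthogonal columns spanning the ISUB of the three lowest eigenvectors.
mix = np.array([[1.0, 0.5, 0.2],
                [0.0, 1.0, 0.3],
                [0.0, 0.0, 1.0]])
S = Q[:, :3] @ mix
lowest = ritz_values(A, S)
```

Because the columns of S span an exact ISUB in this example, the generalized problem returns the corresponding eigenvalues of A even though the columns are not orthogonal.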
In practice, one is often unable to obtain an exact ISUB, but rather an approximate
ISUB through numerical means. For example, a common technique is the Lanczos
method where the approximate ISUB (or Krylov subspace) is
\[ K_w = \mathrm{span}(\vec{b}, A\vec{b}, A^{2}\vec{b}, \ldots, A^{w-1}\vec{b}) , \tag{4.3} \]
where $\vec{b}$ is an initial column vector (usually random) and w is the dimension of the
subspace. If one orthonormalizes the vectors spanning Kw in between successive
matrix-vector products, and then combines the vectors after all w − 1 iterations
to make a new N × w matrix, Kw−1 (subscript denotes the number of iterations),
then the Rayleigh matrix, C, is simply found via Eq. (4.1) with Kw−1 replacing S.
Particular to the Lanczos method, C has the favorable property of being tridiagonal.
Not only is the Rayleigh matrix easier to diagonalize as a result, but a clever algorithm
can be implemented requiring minimal storage of only four vectors throughout the
iterations: the growing main and adjacent diagonal of C and two adjacent column
vectors of Kw−1.78
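The storage pattern described above can be sketched as follows. This is a minimal illustration with no re-orthogonalization, assuming a dense NumPy matrix for brevity; the function name `lanczos_tridiagonal` and the breakdown tolerance are assumptions, not part of the original work.

```python
import numpy as np

def lanczos_tridiagonal(A, b, w):
    """Return the tridiagonal Rayleigh matrix for the Krylov subspace K_w."""
    q_prev = np.zeros_like(b)
    q = b / np.linalg.norm(b)
    alpha, beta = [], []          # main and adjacent diagonals of C
    for k in range(w):
        z = A @ q                 # one matrix-vector product per step
        alpha.append(float(q @ z))
        z = z - alpha[-1] * q - (beta[-1] if beta else 0.0) * q_prev
        norm = np.linalg.norm(z)
        if k == w - 1 or norm < 1e-12:
            break                 # finished, or an invariant subspace was reached
        beta.append(float(norm))
        q_prev, q = q, z / norm   # only two Lanczos vectors are kept
    return np.diag(alpha) + np.diag(beta, 1) + np.diag(beta, -1)

# Hypothetical example: a 6 x 6 matrix with eigenvalues 1..6 and w = N.
A = np.diag(np.arange(1.0, 7.0))
T = lanczos_tridiagonal(A, np.ones(6), 6)
```

Only `alpha`, `beta`, `q_prev`, and `q` persist between iterations, which corresponds to the four stored vectors mentioned in the text.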
Mathematically, the Lanczos algorithm above produces orthogonal vectors span-
ning the Krylov subspace; due to finite numerical precision however, in practice,
orthogonality is compromised after successive iterations, leading to computed eigen-
values with extra multiplicities, known as “spurious” eigenvalues.105 An occasional
re-orthogonalization of all vectors in the Krylov subspace can remedy this, and there
are methods such as selective106 and partial re-orthogonalization107,108 that do this
efficiently. Unfortunately, all of the column vectors of Kw−1 need to be stored, instead
of just the four mentioned above, which may thus become a major issue.
Wu and Simon have addressed the storage problem using two separate strategies.
First, they have developed a thick-restart Lanczos method109 where after the memory
is completely filled by the growing Krylov subspace, the Ritz vectors (or eigenfunctions
of the Rayleigh matrix at that point) are calculated, and all are used to develop a
starting point for a new and more accurate Krylov subspace, which replaces the old.
Second, they have developed a parallel algorithm, PLANSO,110 where the Lanczos
vectors are uniformly and conformally mapped among compute nodes, and all of
the linear algebra operations needed in the Lanczos algorithm are parallelized to
accommodate the distribution. Popular sparse and parallel packages can be used to
interface with PLANSO in order to handle sparse operations such as matrix-vector
multiplication.
In principle, there are numerous strategies one might develop for parallelizing the
Lanczos method. For the most part, however, these only address the parallelization
of the linear operations required of individual Lanczos iterations, i.e., the computation
of multiple vectors is non-parallelizable, owing to the sequential nature of the Krylov subspace
methods. Block Lanczos methods111 [as opposed to the “vector” Lanczos method of
Eq. (4.3)], where each iteration involves a group of orthonormal vectors instead of
a single vector, do offer a design conducive to parallelization at the vector level,
although the effectiveness of the parallelization is limited by the size of the blocks,
which in practice is relatively small. In order to fully take advantage of the bene-
fits offered by parallelization at the vector level, one must completely eliminate the
sequential aspect of the iterations. Thus, we chose a different iterative method that
offers this, the subspace iteration (SI) method, where the approximate ISUB is
\[ Z_d = \mathrm{span}(A^{r}\vec{b}_1, A^{r}\vec{b}_2, \ldots, A^{r}\vec{b}_d) . \tag{4.4} \]
The set of column vectors $(\vec{b}_1, \vec{b}_2, \ldots, \vec{b}_d)$ must, at least, be linearly independent or else
all of the vectors spanning Zd (which comprise the N × d matrix Zr) might converge
to the same eigenvector of A as r increases.
In practice, the total number of matrix-vector products of the SI method, r × d,
exceeds that of the vector Lanczos, w−1, when achieving similar eigenvalue accuracies
of A, but parallelizing Zr such that each column vector is calculated on separate nodes,
effectively reduces the number of matrix-vector products to r, which is considerably
smaller than w − 1. Parallelization of the block Lanczos methods can achieve similar
savings as the SI method only if the block size of the former is the same as d, although
this introduces problematic memory issues for the block Lanczos methods, since their
approximate ISUB has a large w × d dimension.
The SI method also offers other key advantages over both vector and block Lanczos
methods. For example, one has better control over the dimension of the approximate
ISUB. In other words, one directly chooses d in the SI case, and independently ad-
justs the accuracy via the iteration number r, whereas in the vector and block Lanczos
cases, the dimensionality and number of iterations are both dependent upon the same
parameter w. In practice, good iteration number and dimension parameter values re-
quire a careful balance of factors. Another advantage of the SI method over Lanczos,
in particular the vector Lanczos, is that the former does not become inefficient in
cases where the eigenvalue spectrum is degenerate or nearly degenerate. This situation
is common in spectroscopic rovibrational calculations, particularly when there is
symmetry.15,112,113 The block Lanczos methods do correctly account for degeneracy
as long as the individual block size dimensions are greater than or equal to the extent
of the degeneracy. Last, in the SI method, the approximate ISUB approaches
the exact ISUB when not considering numerical error issues, i.e.,
\[ Z_r \longrightarrow S \quad \text{as} \quad r \longrightarrow \infty , \tag{4.5} \]
which is not the case for either the vector or block Lanczos methods.
Although the SI method is clearly a natural candidate for parallelization, the in-
corporation of vector orthogonalization (either after every matrix-vector product, or
just occasionally) does require substantial internode communication. In this chapter,
we present a way to preprocess A such that a satisfactory ISUB can be iteratively
computed without the need for costly orthogonalizations. Communication is only re-
quired after the final iteration, in order to calculate the elements of C using Eq. (4.1)
(Zr replaces S), as well as the elements of M (column vectors of Zr are nonorthogo-
nal) and to initialize both C and M on a single node for direct diagonalization using
Eq. (4.2). These preprocessing ideas are inspired by single-particle density matrix pu-
rification (DMP) schemes used in ground-state electronic-structure calculations.70,71
In the DMP context, the matrices are often sufficiently small as to allow direct matrix-
matrix multiplications to be performed, e.g., PRISM (Parallel Research on Invariant
Subspace Methods).114,115 For many SI applications, however, e.g. quantum dynam-
ics, the matrices involved are large and sparse, and therefore not amenable to direct
matrix-matrix products. Instead, our approach uses matrix-vector products, which
are far less costly, and also enable efficient sparse matrix techniques to be employed.
In this chapter, we apply the new preprocessed SI method to several model sys-
tems: isotropic (3 and 6 degrees of freedom or DOF’s) and anisotropic (3 DOF)
uncoupled harmonic oscillators (HO’s). We represent the Hamiltonian operator H
of these systems using a type of Weyl-Heisenberg wavelet (or “weylet”) basis chosen
under the guidance of a phase space truncation scheme35–37 which gives N×N sparse
matrix representations, H, of the system. In all cases, the eigenvalues of H are cal-
culated from an approximate ISUB of dimension d = 6 000, and a determination is
made of how many of these fall within a certain relative accuracy level. Although the
6 000 × 6 000 matrix C is diagonalized directly, the expectation is that N ≫ 6 000,
so that the overall calculation is still extremely efficient.
In practice, we find that the 6 000 computed eigenvalues match those of H with
increasing accuracy towards the lower end of the spectrum. The greatest errors are
towards the high end of the spectrum, as is typical for subspace methods. However,
the SI method enables one to improve the accuracy of all d = 6 000 eigenvalues,
in principle to arbitrary precision, simply by increasing r, because of Eq. (4.5). In
practice, we find limitations on the maximum value of r that can be applied. In such
cases, we can still improve the accuracy of the desired eigenvalues arbitrarily, simply
by increasing d (at the expense of using more memory on the nodes), as per other
subspace methods.
The new SI method is also found to be extremely scalable. With respect to
storage requirements, the iteration vectors are found to require most of the memory,
but these can be evenly distributed over all available nodes. With respect to CPU
operations, the bottleneck is the iterative generation of the ISUB vectors, which
exhibits near perfect parallel speedup, as it requires no internode communication.
The subsequent C matrix creation and initialization steps also parallelize efficiently,
although communication is involved. Having applied the new method in this chapter
to matrices as large as N ≈ 10⁶, we find it to be very effective even in its present
incarnation, although there is still ample room for future improvement and fine tuning.
4.2 Theoretical Background
The SI method is considered to be a variation of the “power method,” for which,
instead of focusing on the attainment of one eigenvector, one desires to find a group
of eigenvectors corresponding to some region of the eigenvalue spectrum. For the
SI case presented in the previous section [Eq. (4.4)], the fully converged Zr (r −→∞) would be the space spanned by the d eigenvectors of A that have the largest
eigenvalue moduli. Traditional SI methods orthogonalize the ISUB vectors after each
matrix-vector product via a QR decomposition or a modified Gram-Schmidt. This
prevents Zd from losing dimensionality, i.e., the new Zr with orthogonal column
vectors maintains full rank d. In addition, the projected matrix C found by Eq. (4.1)
is a Rayleigh matrix, and the eigenvalues are found via direct standard eigenvalue
techniques instead of using Eq. (4.2).
This approach is preferable when r and d are small, otherwise it is too compu-
tationally expensive. Our strategy, on the other hand, is to preprocess A such that
the orthogonalizations may be avoided altogether, thus making it possible to easily
parallelize the complete algorithm. We are not currently able to eliminate the rank
problems completely; however, using the present preprocessing scheme, we can ac-
curately retrieve a significant portion of the eigenvalue spectrum of A. Also, since
our chief interest lies in calculating rovibrational energies of small molecules (A now
becomes the Hamiltonian matrix H), our preprocessing scheme focuses on the lower,
most accurate portion of the eigenvalue spectrum.
There are two steps in the preprocessing. First, the rows and columns of H are
reordered so as to place diagonal elements of H in ascending order, moving from
the top-left corner towards the bottom-right. The second step, commonly known as
“scaling”, is the following simple adjustment of H:
\[ H' = \gamma(\mu I - H) + I , \tag{4.6} \]
where
\[ \gamma = \min\left[ \frac{1}{\varepsilon_{\max} - \mu}, \frac{1}{\mu - \varepsilon_{\min}} \right] . \]
The N × N matrix I is the identity, and µ is known as the “chemical potential”
in DMP papers71 (the significance will be addressed later). The parameters εmax
and εmin are the approximate largest and smallest eigenvalues, respectively, found via
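Gershgorin’s formulas,116
\[ \varepsilon_{\max} = \max_{i} \Big[ H_{ii} + \sum_{j \neq i} |H_{ij}| \Big] \tag{4.7} \]
and
\[ \varepsilon_{\min} = \min_{i} \Big[ H_{ii} - \sum_{j \neq i} |H_{ij}| \Big] . \tag{4.8} \]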
No eigenvalues of H stray outside of the above boundaries.
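A minimal sketch of the Gershgorin bounds and the scaling of Eq. (4.6) is given below, assuming a small dense NumPy matrix for readability (the actual calculations use sparse storage). The function names, the 4 × 4 test matrix, and the value μ = 2.5 are hypothetical.

```python
import numpy as np

def gershgorin_bounds(H):
    """Return (eps_min, eps_max) from Eqs. (4.7) and (4.8)."""
    diag = np.diag(H)
    radii = np.sum(np.abs(H), axis=1) - np.abs(diag)  # off-diagonal row sums
    return float(np.min(diag - radii)), float(np.max(diag + radii))

def scale(H, mu):
    """Return H' = gamma (mu I - H) + I, Eq. (4.6)."""
    eps_min, eps_max = gershgorin_bounds(H)
    gamma = min(1.0 / (eps_max - mu), 1.0 / (mu - eps_min))
    identity = np.eye(H.shape[0])
    return gamma * (mu * identity - H) + identity

# Hypothetical example with mu placed between the second and third eigenvalues.
H = np.array([[1.0, 0.2, 0.0, 0.0],
              [0.2, 2.0, 0.2, 0.0],
              [0.0, 0.2, 3.0, 0.2],
              [0.0, 0.0, 0.2, 4.0]])
mu = 2.5
H_scaled = scale(H, mu)
```

In this example, the eigenvalues of `H_scaled` lie between 0 and 2, and those above 1 correspond to the eigenvalues of `H` below `mu`.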
The eigenvalues of H ′ range between 0 and 2, with those between 1 and 2 cor-
responding to the desired eigenvalues of H below µ. The µ parameter should be
chosen so that d (dimension of ISUB) eigenvalues of H ′ lie within the latter range.
In practice, one first decides on d, and then determines a suitable µ value after a few
trial runs starting from a reasonable initial guess. The matrix H ′ = A is then used
to obtain the approximate ISUB via Eq. (4.4) with the initial vectors $\vec{b}_i$ chosen to
be the orthonormal unit vectors $\vec{z}_i$ (with the ith component equal to 1, and all other
components 0). The approximate ISUB can therefore be written $(H')^{r} Z_0 = Z_r$, where
r is the number of iterations, and $Z_0 = (\vec{z}_1, \vec{z}_2, \ldots, \vec{z}_d)$. Mathematically, as r −→ ∞,
Zr −→ S, as mentioned before. Only matrix-vector products are performed so that
one has only to deal with the d column vectors of matrix Zr and the sparse matrix
H ′. In contrast, matrix-matrix products of H ′ would lead to a dense N ×N matrix
that is too large to store on each node. The rationale here is that the Zr contribution
from the subspace of eigenvectors corresponding to the eigenvalues of H ′ (the original
matrix H has the same eigenvectors) between 0 and 1 will dissipate exponentially
with respect to r. By the same token, the [1, 2] contribution becomes increasingly
prominent with increasing r (see Appendix D).
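The preprocessing and iteration described in this section can be sketched serially as follows. This is an illustrative sketch under stated assumptions, not the parallel code of this work: it uses dense NumPy arrays, a hypothetical 40 × 40 test matrix, a hand-picked μ = 6.5, and a Cholesky reduction for the generalized eigenvalue problem. The name `preprocessed_si` is likewise an assumption.

```python
import numpy as np

def preprocessed_si(H, d, mu, r):
    """Approximate the d lowest eigenvalues of H without orthogonalization."""
    n = H.shape[0]
    # First preprocessing step: reorder so the diagonal is ascending.
    order = np.argsort(np.diag(H), kind="stable")
    H = H[np.ix_(order, order)]
    # Second preprocessing step: Gershgorin bounds and scaling, Eqs. (4.6)-(4.8).
    diag = np.diag(H)
    radii = np.sum(np.abs(H), axis=1) - np.abs(diag)
    eps_min, eps_max = np.min(diag - radii), np.max(diag + radii)
    gamma = min(1.0 / (eps_max - mu), 1.0 / (mu - eps_min))
    H_scaled = gamma * (mu * np.eye(n) - H) + np.eye(n)
    # Iterate Z_r = (H')^r Z_0, with Z_0 the first d unit vectors.
    Z = np.eye(n)[:, :d]
    for _ in range(r):
        Z = H_scaled @ Z
    # Project the original H and solve C x = lambda M x via Cholesky of M.
    C = Z.T @ H @ Z
    M = Z.T @ Z
    L = np.linalg.cholesky(M)
    C_std = np.linalg.solve(L, np.linalg.solve(L, C).T).T
    return np.linalg.eigvalsh(C_std)

# Hypothetical test matrix: ascending diagonal, weak nearest-neighbor coupling.
n, d = 40, 6
H = np.diag(np.arange(1.0, n + 1)) + 0.1 * (np.eye(n, k=1) + np.eye(n, k=-1))
exact = np.linalg.eigvalsh(H)[:d]
ritz_r0 = preprocessed_si(H, d, mu=6.5, r=0)
ritz_r20 = preprocessed_si(H, d, mu=6.5, r=20)
err_r0 = np.max(np.abs(ritz_r0 - exact))
err_r20 = np.max(np.abs(ritz_r20 - exact))
```

Comparing `err_r0` with `err_r20` shows the effect of the iterations on this small example; for larger r the overlap matrix may become too ill-conditioned for the Cholesky step, which mirrors the rank problem discussed in the text.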
Since the column vectors of Zr are not orthogonal, the eigenvalues are obtained
from Eqs. (4.1) and (4.2). This requires that the overlap matrix M be positive
definite, which is true formally if Zr is of full rank, but can cause numerical instabilities
if the smallest M eigenvalues are too close to zero. The reordering of H discussed
earlier, together with the choice $\vec{b}_i = \vec{z}_i$, is designed to ameliorate this difficulty. It
is certainly found to be very successful in this regard; however, one is still limited in
the number of iterations, r, that can be used without numerical instabilities arising
in the generalized eigenvalue solution routines. This limitation, in turn, restricts the
accuracy that can be achieved for a given value of d.
4.3 Parallel and Numerical Implementation
In this section, we describe the numerical algorithm in detail, including paral-
lelization. The implementation can be broken down into 7 sequential steps as follows:
1. Construct H using the truncated weylet basis.
2. Preprocess H to get the new matrix H ′.
3. Distribute the d vectors $\vec{z}_i$ of Z0 across nodes; duplicate H ′ across nodes.
4. Perform r iterations in parallel, in order to compute Zr = (H ′)rZ0.
5. Calculate the elements of C and M while keeping the d Zr vectors distributed
among the nodes.
6. Send all of the C and M matrix elements to a single node.
7. Solve the generalized eigenvalue problem of Eq. (4.2) on a single node.
In this study, we only look at separable systems, e.g., isotropic and anisotropic
uncoupled HO’s, where the Hamiltonian operator is
\[ H = \frac{1}{2} \sum_{j=1}^{f} \left( \frac{p_j^{2}}{m_j} + m_j \omega_j^{2} q_j^{2} \right) \tag{4.9} \]
(all of the masses are set to unity, i.e., mj = 1 for j = 1, . . . , f). The 1 DOF weylet
lattice basis (h = 1 assumed throughout chapter) used to represent H is
\[ \varphi_{st}(q) = \sum_{|m|+|n| \le 6} (-1)^{(n/2 + mt)} c_{mn} \psi_{uv}(q) , \tag{4.10} \]
where
\[ \psi_{uv}(q) = \left( \frac{4a^{2}}{\pi} \right)^{1/4} \cos\left[ v a \sqrt{\pi} \left( q - (u + 1/2) \frac{\sqrt{\pi}}{a} \right) \right] e^{-a^{2} (q - u\sqrt{\pi}/a)^{2}/2} , \tag{4.11} \]
u = s + m, v = t + n, m and n are even integers, cmn are coefficients with values
reported in Ref. [36], a is related to the “aspect ratio,” s (half-integer) is the position
index of the weylet block on phase space, and t (positive half-integer) is the momen-
tum parameter. For f DOF’s, the basis consists of products of the 1 DOF functions,
i.e.,
\[ \varphi_{\mathbf{s},\mathbf{t}}(\mathbf{q}) = \prod_{j=1}^{f} \varphi_{s_j t_j}(q_j) , \tag{4.12} \]
where q = (q1, q2, . . . , qf ), s = (s1, s2, . . . , sf ), and t = (t1, t2, . . . , tf ).
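Step 1, the creation of the Hamiltonian matrix H, is very quick since the matrix
elements are analytical. The 1 DOF kinetic energy matrix has elements derived from
\[ \langle \varphi_{st} | p^{2} | \varphi_{s't'} \rangle = \sum_{|m|+|n| \le 6} \; \sum_{|m'|+|n'| \le 6} (-1)^{(n/2 + mt + n'/2 + m't')} c_{mn} c_{m'n'} \langle \psi_{uv} | p^{2} | \psi_{u'v'} \rangle \tag{4.13} \]
with
\[ \langle \psi_{uv} | p^{2} | \psi_{u'v'} \rangle = \frac{\pi a^{2}}{2} \left[ h(u, v, u', v') - h(u, v, u', -v') \right] , \tag{4.14} \]
and
\[ h(u, v, u', v') = e^{-(\pi/4)(u_{\Delta}^{2} + v_{\Delta}^{2})} \left[ \frac{1}{2} \left( v_{+}^{2} - u_{\Delta}^{2} + \frac{2}{\pi} \right) \cos\zeta - u_{\Delta} v_{+} \sin\zeta \right] . \tag{4.15} \]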
The “∆” and “+” subscripts indicate the difference and addition, respectively, of the
bra and ket indices, e.g., u∆ = u− u′ and v+ = v + v′. Note that under a change of
sign of v′ [i.e., for the last term of Eq. (4.14)], v∆ becomes v+ and vice-versa. The
phase quantity ζ is given by
\[ \zeta(u, v, u', v') = \frac{\pi}{2} \left( u_{\Delta} v_{+} + v_{\Delta} \right) , \tag{4.16} \]
and since u∆, v∆, and v+ are always integers, the trigonometric quantities in Eq. (4.15)
are always ±1 or 0. The 1 DOF potential energy matrix elements are obtained from
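\[ \langle \psi_{uv} | q^{2} | \psi_{u'v'} \rangle = \frac{\pi}{2a^{2}} \left[ \tau(u, v, u', v') - \tau(u, v, u', -v') \right] , \tag{4.17} \]
and
\[ \tau(u, v, u', v') = e^{-(\pi/4)(u_{\Delta}^{2} + v_{\Delta}^{2})} \left[ \frac{1}{2} \left( u_{+}^{2} - v_{\Delta}^{2} + \frac{2}{\pi} \right) \cos\zeta + v_{\Delta} u_{+} \sin\zeta \right] . \tag{4.18} \]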
With regard to the full f DOF matrix representation, the separability of H results
in a sparse matrix H, especially for large f . The full f DOF kinetic energy matrix is
given by
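\[ \frac{1}{2} \langle \varphi_{\mathbf{s},\mathbf{t}} | p_1^{2} + p_2^{2} + \ldots + p_f^{2} | \varphi_{\mathbf{s}',\mathbf{t}'} \rangle = \frac{1}{2} \sum_{(i,j,\ldots,k)=(1,2,\ldots,f)} \left( \langle \varphi_{s_i t_i} | p_i^{2} | \varphi_{s'_i t'_i} \rangle \, \delta_{s_j s'_j} \delta_{t_j t'_j} \cdots \delta_{s_k s'_k} \delta_{t_k t'_k} \right) \tag{4.19} \]
where the summation is over all cyclic permutations of (1, 2, . . . , f), δ is the Kronecker
delta function, and $\langle \varphi_{s_i t_i} | p_i^{2} | \varphi_{s'_i t'_i} \rangle$ is defined in Eqs. (4.13)-(4.15). The potential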
energy matrix elements adhere to a similar sparse form as Eq. (4.19) except that q’s
are put in the place of p’s.
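The Kronecker-delta structure of Eq. (4.19) can be illustrated with a Kronecker-sum construction. The sketch below assumes a full direct-product basis and an arbitrary dense 3 × 3 block for each DOF, neither of which corresponds to the truncated weylet basis used here; the function name `separable_matrix` is hypothetical.

```python
import numpy as np

def separable_matrix(one_dof_blocks):
    """Sum over DOFs of I x ... x h_i x ... x I (Kronecker products)."""
    sizes = [h.shape[0] for h in one_dof_blocks]
    dim = int(np.prod(sizes))
    total = np.zeros((dim, dim))
    for i, h in enumerate(one_dof_blocks):
        term = np.ones((1, 1))
        for j, n in enumerate(sizes):
            # Identity (Kronecker deltas) on every DOF except the ith.
            term = np.kron(term, h if j == i else np.eye(n))
        total += term
    return total

# Hypothetical dense 1 DOF block, used for all three DOFs.
h = np.array([[2.0, 1.0, 1.0],
              [1.0, 3.0, 1.0],
              [1.0, 1.0, 4.0]])
H3 = separable_matrix([h, h, h])
nonzeros_per_row = np.count_nonzero(H3, axis=1)
```

Even though each 1 DOF block is fully dense, every row of the 27 × 27 matrix contains only 7 nonzero elements, since an element vanishes whenever the bra and ket indices differ in more than one DOF.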
To maximize storage efficiency and to speed up linear algebra operations needed
later, the row-indexed sparse storage mode16 is used for H. The nonzero elements
of H are stored directly in a 1-dimensional array (SA) accompanied by another 1-
dimensional array (IJA) of integer parameters containing the matrix positions of the
nonzero elements. Both of the arrays contain G + 1 elements, where G is the number
of nonzero elements (the extra storage slot is needed for the storage mode).
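A small pure-Python sketch of a row-indexed layout with arrays of length G + 1 is shown below. It assumes zero-based indices and that all diagonal elements are stored; the function names and the 3 × 3 example are hypothetical, and the index conventions may differ in detail from those of Ref. 16.

```python
def to_row_indexed(dense):
    """Build (sa, ija) for a square matrix given as a list of rows."""
    n = len(dense)
    sa = [dense[i][i] for i in range(n)] + [0.0]  # diagonal, then one unused slot
    ija = [0] * (n + 1)
    ija[0] = n + 1                                # where row 0 off-diagonals start
    for i in range(n):
        for j in range(n):
            if i != j and dense[i][j] != 0.0:
                sa.append(dense[i][j])            # off-diagonal value
                ija.append(j)                     # its column index
        ija[i + 1] = len(sa)                      # where the next row starts
    return sa, ija

def matvec(sa, ija, x):
    """Multiply the stored sparse matrix by the vector x."""
    n = ija[0] - 1
    y = [sa[i] * x[i] for i in range(n)]
    for i in range(n):
        for k in range(ija[i], ija[i + 1]):
            y[i] += sa[k] * x[ija[k]]
    return y

dense = [[2.0, 0.0, 1.0],
         [0.0, 3.0, 0.0],
         [1.0, 0.0, 4.0]]
sa, ija = to_row_indexed(dense)
y = matvec(sa, ija, [1.0, 2.0, 3.0])
```

Here G = 5 (three diagonal and two off-diagonal elements), so both arrays have 6 entries.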
The good phase space localization of the weylets, along with the orthonormality,
allows for a phase space truncation scheme which defeats exponential scaling. In
other words, the efficiency K/N , where N represents the number of basis functions
(order of H) needed to calculate K eigenvalues at a desired accuracy, does not decay
exponentially with respect to f , the number of DOF’s of the system in question.35–37
More specifically, the truncation involves selecting only those weylets, whose center
coordinates on phase space when plugged into the classical Hamiltonian expression,
produce values Hmid below some energy cutoff Ecut. The justification of the scheme is
based upon the quasiclassical approximation which treats the set of weylets as a lattice
of 2f -dimensional blocks partitioning phase space, and the target K eigenstates of H
are represented as a uniform region. This approximation improves as both K and N
increase.34
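The truncation rule can be sketched for uncoupled HO's as follows. The phase space centers used here, q = s√π/a and p = t a√π, are an assumption read off from the Gaussian center and cosine frequency of Eq. (4.11); the function name, the index range `max_index`, and the example cutoffs are hypothetical.

```python
import itertools
import math

def truncated_basis(omegas, aspects, e_cut, max_index=10):
    """Return index tuples ((s1, t1), ..., (sf, tf)) with H_mid < E_cut."""
    half_ints = [k + 0.5 for k in range(-max_index, max_index)]
    one_dof = []
    for w, a in zip(omegas, aspects):
        cells = []
        for s in half_ints:
            for t in (h for h in half_ints if h > 0):
                q = s * math.sqrt(math.pi) / a   # assumed position center
                p = t * math.sqrt(math.pi) * a   # assumed momentum center
                energy = 0.5 * (p * p + (w * q) ** 2)
                if energy < e_cut:               # 1 DOF energies are nonnegative
                    cells.append(((s, t), energy))
        one_dof.append(cells)
    basis = []
    for combo in itertools.product(*one_dof):
        if sum(energy for _, energy in combo) < e_cut:
            basis.append(tuple(index for index, _ in combo))
    return basis

basis_1dof = truncated_basis([1.0], [1.0], e_cut=1.0)
basis_2dof = truncated_basis([1.0, 1.0], [1.0, 1.0], e_cut=2.0)
```

The retained set is not a direct product of 1 DOF sets in general, since the cutoff is applied to the total classical energy.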
Due to the sparsity of H, the preprocessing in step 2 is very quick, as well, and
the sparsity is preserved exactly in the new matrix H ′ (which fully replaces H in the
1-dimensional arrays). The reordering of H and the calculation in Eq. (4.6) using
Gershgorin’s formulas in Eqs. (4.7) and (4.8) take full advantage of the sparsity
and the row-indexed sparse storage mode. This allows the step to be an insignificant
contribution to the total CPU time required.
In step 3, the 1-dimensional array SA(1 : G + 1) containing the elements of H ′,
along with the integer array IJA(1 : G + 1) are copied onto each of the g compute
nodes. The strategy is that the iterations, i.e., H ′Zr−1 = Zr, needed to create the
ISUB will be done in parallel by distributing the column vectors of Z0 as equally as
possible over the g nodes. Simultaneously, each node performs matrix-vector products
with H ′ and its assigned vectors. In practice, instead of apportioning Z0 at the start
of the process, one can simply divvy up the first d columns of H ′ which is equivalent
to Z1.
For clarity, we assign a number y for each node ranging from 0 to g − 1. For
communication purposes, as shown in Fig. 4.1, the nodes are arranged in a loop, with
the column vectors stored sequentially around the loop. If d > g, then the distribution
will continue looping around in the same manner. With this arrangement, node y
will have either αy = [[d/g]] + 1 vectors ([[ ]] denotes the “greatest integer less than
or equal to,” also known as the “floor” function) if y + 1 ≤ mod(d, g) = d − g[[d/g]]; otherwise,
αy = [[d/g]] vectors.
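The looped distribution can be written in a few lines. In this sketch the function names are hypothetical, and vectors are labeled from 1 to d as in the text.

```python
def vectors_on_node(y, d, g):
    """1-based indices of the column vectors stored on node y (0 <= y < g)."""
    return list(range(y + 1, d + 1, g))

def alpha(y, d, g):
    """Number of vectors on node y, following the floor-function rule."""
    return d // g + 1 if y + 1 <= d % g else d // g

# Hypothetical example: d = 12 vectors looped over g = 5 nodes.
node0 = vectors_on_node(0, 12, 5)
counts = [alpha(y, 12, 5) for y in range(5)]
```

For these values, node 0 holds vectors 1, 6, and 11, and the counts per node are 3, 3, 2, 2, and 2.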
Step 4 involves the r matrix-vector products between H ′ and the αy vectors on each
node y done in parallel. Multiple trial runs are needed to find the “largest” number
of iterations, r, the point before the numerical generalized eigenvalue solver fails (or
eigenvalues of M are too close to zero) due to loss of rank. One can incorporate a
singular value decomposition method if one exceeds r; although, one finds that no
accuracy is gained beyond the largest iteration point.
After the r iterations are completed, step 5, the construction of matrices C and
M , is effected. The simplest method would be to transfer all of the dense vectors of Zr
to a single node; although, it is more than likely that each node does not have enough
memory to handle Zr. Instead, an efficient communication scheme is employed with
[[g/2]] stages.
Before any communication in step 5, two important steps are implemented. First,
H ′ is “unpreprocessed” back to H which is used for the calculation of C. Second,
all matrix elements of C and M corresponding to the local set of αy vectors are
calculated. For example, in Fig. 4.1, node y = 0 contains column vectors $\vec{z}_1^{(r)}$, $\vec{z}_6^{(r)}$,
and $\vec{z}_{11}^{(r)}$ of Zr. Matrix elements (1, 1), (6, 6), (11, 11), (6, 1), (11, 1), and (11, 6),
where the first and second numbers in each set (i, j) are the row and column of C and
M , are obtained by the matrix products $(\vec{z}_i^{(r)})^{T} H \vec{z}_j^{(r)}$ and $(\vec{z}_i^{(r)})^{T} \vec{z}_j^{(r)}$, respectively.
Note that since C and M are symmetric, we only need to calculate those elements in
the lower (or upper) triangular portion of the matrices.
In the first stage of the communication, node y sends αy vectors, one at a time, to
node mod(y + 1, g), and node y receives αmod(y−1,g) vectors from node mod(y − 1, g).
Immediately, after each vector is received, all of the possible matrix elements between
the transferred vector and the local vectors belonging to the receiving node are calculated
and stored. For example, in Fig. 4.1, node 1 will receive $\vec{z}_1^{(r)}$ from node 0 and
will calculate elements (2,1), (7,1), and (12,1), followed by the immediate deletion of
$\vec{z}_1^{(r)}$ in order to save memory. Vectors $\vec{z}_6^{(r)}$ and $\vec{z}_{11}^{(r)}$ from node 0 undergo the same
process.
The second stage involves node y sending αy vectors to mod(y+2, g) and receiving
αmod(y−2,g) vectors from node mod(y − 2, g). In general, one can conclude for stage v
that y sends αy vectors to mod(y + v, g) and receives αmod(y−v,g) vectors from node
mod(y − v, g). When dealing with an odd number of nodes, this generalization is
true for all [[g/2]] stages. On the other hand, for even g, the generalization is true
except for the last stage g/2. As shown in Fig. 4.2, the nodes in each of the g/2 pairs
take turns in sending a single vector to the other. In the first “substage”, one vector
from one of the nodes in the pair will be sent to the other and will be combined
with all vectors present on the receiving end for the calculation of all possible matrix
elements. Next, the sender and receiver switch roles, and one vector is sent in the
other direction, to be combined with vectors that have not been sent in previous
substages. Fig. 4.2 explicitly shows the pattern for the d = 13 and g = 6 case.
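The staged scheme can be checked with a serial simulation that only counts which matrix elements are formed, without any actual message passing. This is a sketch of one reading of the text: the function name `simulate_schedule` is hypothetical, and the alternation order within the last stage for even g is an assumption.

```python
from collections import Counter

def simulate_schedule(d, g):
    """Count how often each lower-triangle element (i, j), i >= j, is computed."""
    local = [list(range(y + 1, d + 1, g)) for y in range(g)]
    counts = Counter()

    def record(a, b):
        counts[(max(a, b), min(a, b))] += 1

    # Local elements, computed before any communication.
    for vectors in local:
        for i, a in enumerate(vectors):
            for b in vectors[: i + 1]:
                record(a, b)

    for stage in range(1, g // 2 + 1):
        if g % 2 == 0 and stage == g // 2:
            # Last stage for even g: paired nodes alternate, one vector at a time.
            for y in range(g // 2):
                pending = [list(local[y]), list(local[y + g // 2])]
                sender = 0
                while pending[0] and pending[1]:
                    sent = pending[sender].pop(0)
                    for other in pending[1 - sender]:
                        record(sent, other)
                    sender = 1 - sender
        else:
            # Node y receives every vector held by node (y - stage) mod g.
            for y in range(g):
                for sent in local[(y - stage) % g]:
                    for own in local[y]:
                        record(sent, own)
    return counts

counts_13_6 = simulate_schedule(13, 6)
```

For the d = 13 and g = 6 case, the simulation produces each of the 91 lower-triangle elements exactly once, which is the property the alternating substages are meant to guarantee.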
It is important that we tally all of the arrays needed for each node in this algorithm
such that, based on the known limitations of memory on each node, we can determine
the minimum bound of g—any number of nodes larger than or equal to the minimum
of g can be used for the successful handling of the target problem. Each node will
contain SA(1 : G + 1) and IJA(1 : G + 1) which should not be a significant memory
consumer due to the sparsity of H; although, the size of G (number of nonzero
elements), in our chosen model problems, does grow at a slightly faster than linear
rate with N and could overstep the memory limitations for very large N . In the
future, we may investigate ways to distribute these arrays among the nodes. At the
present, however, the largest memory consumer is the set of vectors that make up Zr
which we will denote as the 2-dimensional array vec(1 : N, 1 : αy), specific for the
node y. Obviously, as one increases g, then αy is reduced; thus, we have direct control
of the size of vec by varying g. Each node also needs an additional column vector
recvec(1 : N) which is the vector that each node receives during the aforementioned
communication steps for the purpose of calculating matrix elements of C and M .
Last, elements from the two matrices need to be stored on each node. Fortunately,
these are also distributed almost evenly among the nodes, though, in step 6, all such
elements will be transferred to one node.
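The tally above can be turned into a rough estimate of the minimum g. The sketch below assumes 8-byte reals and 4-byte integers, counts only the arrays named in the text, and ignores the gathering of C and M onto one node in step 6; the function names and the example values of G and memory per node are hypothetical.

```python
def ceil_div(a, b):
    """Integer ceiling of a / b for positive integers."""
    return -(-a // b)

def bytes_per_node(N, d, G, g, real_bytes=8, int_bytes=4):
    """Rough storage estimate for the most heavily loaded node."""
    sparse = (G + 1) * (real_bytes + int_bytes)         # SA and IJA
    vectors = N * (ceil_div(d, g) + 1) * real_bytes     # vec plus recvec
    # Lower triangles of C and M, shared roughly evenly among the nodes.
    elements = 2 * ceil_div(d * (d + 1) // 2, g) * real_bytes
    return sparse + vectors + elements

def minimum_nodes(N, d, G, memory_bytes, g_max=100000):
    """Smallest g whose estimated per-node storage fits in memory_bytes."""
    for g in range(1, g_max + 1):
        if bytes_per_node(N, d, G, g) <= memory_bytes:
            return g
    return None

# Hypothetical example: N = 10^6, d = 6 000, G = 5 x 10^7, 2 GiB per node.
g_min = minimum_nodes(10**6, 6000, 5 * 10**7, 2 * 1024**3)
```

Since the estimate does not increase with g, the first value of g that fits is the minimum bound referred to in the text, under these assumptions.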
For step 5, a safe shift algorithm117 is used on each node to implement the commu-
nication in a timely fashion. There are two key elements in the method that add flow
control to the message passing, which is important, especially when communicating
large messages. First, a nonblocking receive command in the code is posted before
the send command. This is known as “preposting” the message, i.e., the nodes are
ready to receive any message before anything is sent. Second, a small message is sent
in the reverse direction which is known as a “permission to send” (PTS) message.
The receiver sends a small PTS message to the sender after the nonblocking receive
command (or the preposting) which opens up the pathway for the communication of
the actual data.
Finally, step 7 is performed, i.e., the generalized eigenvalue problem [Eq. (4.2)].
This is done on one node using the LAPACK subroutine DSYGV. In the future, if
we want to recover a large number of eigenvalues, i.e., d is large, then we will need
to incorporate a parallel dense linear algebra solver, such as specified in the PRISM
project.114,115
4.4 Results and Discussion
First, we considered the f = 3 DOF isotropic case, where ω1 = ω2 = ω3 = 1
[Eq. (4.9)]. For the Hamiltonian matrix H of order N = 36 083, we chose d =
6 000. For comparison, the exact eigenvalues of the HO Hamiltonian operator, H,
are $\big( \sum_{j=1}^{f} n_j \big) + f/2$, where nj is the nonnegative integer signifying the energy level
of the jth DOF of the HO. The degeneracy for each level is
\[ \deg(n_S, f) = \binom{n_S + f - 1}{f - 1} \tag{4.20} \]
for f > 1, where $n_S = \sum_{j=1}^{f} n_j$; the eigenvalues are nondegenerate for f = 1.
Table 4.1 reports the number of eigenvalues, K, out of the lowest d = 6 000,
computed to various relative accuracies (2 × 10⁻², 2 × 10⁻³, 2 × 10⁻⁴, etc., correspond to
error tolerances of (0.2)f , (0.02)f , (0.002)f ,. . ., in a.u. as per Ref. [37]). Eigenvalues
from both Eq. (4.2) (computed using the proposed parallel algorithm of Sec. 4.3) and
the 36 083 × 36 083 matrix H (using the LAPACK DSYEV subroutine) are considered,
with errors taken relative to the exact analytical eigenvalues of H (described above).
At relative accuracies 2 × 10⁻⁵ or better, there is a perfect match between the two
calculations, indicating that the proposed method introduces no substantial error
beyond those of H itself. For relative accuracies of 2 × 10⁻⁴ and above, the small
discrepancies indicate that the r value chosen (r = 37) is insufficiently large to achieve
exact agreement for the highest eigenvalues. The value r = 37 is the largest that can
be used without encountering numerical instabilities in the eigensolver routines due to
the loss of full rank. Though quite small, this value is nevertheless sufficient for convergence,
in the sense of achieving nearly the full accuracy of H itself throughout the spectrum.
Lanczos, for instance, would not achieve anywhere near the performance of Table 4.1.
The 3 DOF system is studied further as shown in Fig. 4.3. The numbers of
eigenvalues, K, that have relative accuracies of 2 × 10⁻², 2 × 10⁻⁴, 2 × 10⁻⁶, and
2 × 10⁻⁸ with regard to the eigenvalues of H, are plotted against N for fixed d = 6 000.
As N increases, the largest iteration number r increases, as well, e.g., at N = 21 976,
36 083, 49 840, 104 912, 207 320, and 416 840, the values of r are 33, 37, 43, 59,
75, and 95, respectively. The 2 × 10⁻² relative accuracy curve is basically flat at
all values of N , and nearly equal to the full d = 6 000. Clearly, a large basis size
N is not required to compute the desired eigenvalues to this low level of accuracy.
However, the method becomes increasingly useful for higher accuracy calculations,
for which much larger N values are required, but CPU effort (since d is still 6 000)
increases only modestly. In general, K increases with N as expected, but only up
to a point, beyond which K is essentially flat, due to the limitations on r. For the
higher accuracy curves, the point at which the curve flattens is at a larger N . For
example, at 2 × 10⁻⁶ and 2 × 10⁻⁸ the flattening does not occur until N = 104 912
and 207 320, respectively.
We also looked at the 3 DOF anisotropic case where ω1 = √2, ω2 = √3, and
ω3 = √5 a.u. [Eq. (4.9)]. For the isotropic case, the aspect ratios for each DOF
(a1, a2, and a3) are all unity, but in the anisotropic case, for optimal efficiency of the
weylet basis, a1 = √ω1, a2 = √ω2, and a3 = √ω3. The eigenvalues of the Hamiltonian
operator, H, are non-degenerate and equal to $\sum_{j=1}^{f} \omega_j (n_j + 1/2)$.
Comparing between Figs. 4.3 and 4.4, we note that, although the isotropic and
anisotropic cases demonstrate similar patterns with respect to the different relative
accuracies, the isotropic calculation is slightly more accurate. For example, at 2 × 10⁻⁴
and 2 × 10⁻⁶ the curves in the isotropic case reach a maximum number of eigenvalues
at around K = 4 000 and 2 700, respectively; whereas, the same curves in the
anisotropic case flatten out at K = 3 500 and 2 400. However, we have found that
incorporating coupling into the problem does not affect the efficiency of the weylet
basis. More specifically, we did a small study (data not reported) on a coupled
anisotropic system [obtained by adding −ε(q1q2 + q1q3 + q2q3) to Eq. (4.9) with f = 3
and the coupling parameter ε = 0.1] and found scarcely any difference in the number
of accurately computed eigenvalues. The coupling does reduce sparsity somewhat,
however, resulting in increased computation time.
We have also applied the parallel algorithm to the 6 DOF isotropic uncoupled HO
system at fixed d = 6 000. Fig. 4.5 indicates that two of the curves ($2 \times 10^{-3}$ and
$2 \times 10^{-4}$) have monotonic behavior over the N range considered, i.e., it would be
fruitful to consider basis sets even larger than $N = 10^6$ here, which is not surprising,
though we have not done so. With a C of order d = 6 000 representing H at
N = 977 789, we were able to extract an impressive 4 419 eigenvalues at $2 \times 10^{-3}$
relative accuracy and 1 184 at $2 \times 10^{-4}$.
Another strategy for increasing accuracy that we have not considered explicitly
is to increase d. With larger d, more nodes will be needed in order to accommodate
increased memory requirements; although, it also turns out that the largest number
of iterations, r, is smaller. While this reduces CPU effort somewhat, it also nullifies
to some extent the gain in accuracy. The loss of rank problem might also be remedied
by the addition of some useful steps in the algorithm. We feel that incorporating an
orthogonalization scheme at strategic moments during the iteration step 4 could help
Zr to achieve full rank. This would lead to C more accurately reflecting H. This
scheme would require some costly communication similar to that of step 5, although
this would not be required after every iteration but possibly after every 10 or 20
iterations or so. For Hamiltonian matrices of size N larger than what was considered
in this chapter, we believe that this modification of the proposed method would be
worth investigating.
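The proposed modification can be sketched in Python as a generic block power iteration with periodic QR re-orthogonalization. This is only an illustration under stated assumptions: it is a serial toy version, not the parallel algorithm of this chapter, and the names `subspace_iteration` and `reorth_every` are hypothetical.

```python
import numpy as np

def subspace_iteration(H, d, r, reorth_every=10, seed=0):
    """Apply H to a block of d random vectors r times, re-orthogonalizing
    the block every `reorth_every` iterations (and after the last one),
    then return the eigenvalues of the d x d projected matrix."""
    rng = np.random.default_rng(seed)
    Z = rng.standard_normal((H.shape[0], d))
    for it in range(1, r + 1):
        Z = H @ Z
        if it % reorth_every == 0 or it == r:
            # QR keeps the columns numerically independent (full rank).
            Z, _ = np.linalg.qr(Z)
    C = Z.T @ H @ Z  # projection of H onto the iterated subspace
    return np.linalg.eigvalsh(C)
```

Because the columns are orthonormal at the end, the projected problem is an ordinary symmetric eigenproblem; without any orthogonalization a generalized problem with an overlap matrix would be needed instead.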
Table 4.1. Number of accurately computed eigenvalues, K, out of the lowest d = 6 000 that fall within some relative accuracy (column 1) for the 3 DOF isotropic HO (N = 36 083) using either Eq. (4.2) at r = 37 (column 2) or the direct diagonalization of H (column 3). By comparing columns 2 and 3, one can assess the additional error introduced by the parallel algorithm.

accuracy              parallel    direct
$2 \times 10^{-2}$    5 654       6 000
$2 \times 10^{-3}$    4 790       6 000
$2 \times 10^{-4}$    3 577       4 008
$2 \times 10^{-5}$    1 653       1 653
$2 \times 10^{-6}$      585         585
$2 \times 10^{-7}$       59          59
$2 \times 10^{-8}$        6           6
[Figure 4.1: node communication diagram, not reproduced here. In the g = 5 case, nodes 0 to 4 store columns {1, 6, 11}, {2, 7, 12}, {3, 8, 13}, {4, 9}, and {5, 10}; in the g = 6 case, nodes 0 to 5 store columns {1, 7, 13}, {2, 8}, {3, 9}, {4, 10}, {5, 11}, and {6, 12}.]
Figure 4.1. Node communication setup, with the first and second columns representing the g = 5 and 6 cases, respectively, both with d = 13, where g represents the number of nodes and d is the dimension of the approximate ISUB, $Z_r$. The numbers inside the boxes (representing nodes) indicate which column vectors of the N × d matrix $Z_r$ are stored accordingly, and the numbers outside are the labels y of the nodes. The arrows designate the communication direction in each stage (first stage is in the top row). For the g = 6 case, the last stage has communication in both directions between the pairs of nodes, which is further explained in Fig. 4.2.
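The column assignments shown in the figure are consistent with a cyclic (round-robin) layout, in which column c is stored on node (c − 1) mod g. A minimal Python sketch of that layout follows; the helper name `columns_on_node` is hypothetical, and the cyclic rule is inferred from the d = 13 examples rather than stated as the general rule of the algorithm.

```python
def columns_on_node(y, g, d):
    """1-based column indices of Z_r held by node y, assuming a cyclic
    layout of d columns over g nodes (nodes are labeled 0 .. g-1)."""
    return list(range(y + 1, d + 1, g))

# Layouts for the two cases of the figure (d = 13).
layout_g5 = {y: columns_on_node(y, 5, 13) for y in range(5)}
layout_g6 = {y: columns_on_node(y, 6, 13) for y in range(6)}
```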
[Figure 4.2: diagram of the two-way exchange between node pairs, not reproduced here. Node pair {0, 3} holds columns {1, 7, 13} and {4, 10}; pair {1, 4} holds {2, 8} and {5, 11}; pair {2, 5} holds {3, 9} and {6, 12}.]
Figure 4.2. Node communication setup for the last stage (g = 6) with d = 13 (from the example in Fig. 4.1). The pairs of nodes {0, 3}, {1, 4}, and {2, 5} go through a two-way communication process. The circles highlight those specific vectors that are sent in the direction of the arrow and involved in the element calculations of C and M. For example, in the first step of the {0, 3} node pair, $\vec{z}^{\,(r)}_1$ is sent, followed by the calculation of elements (4, 1) and (10, 1).
[Figure 4.3: plot not reproduced here. Horizontal axis: N × 100,000 (0 to 4); vertical axis: number of eigenvalues (0 to 6000).]
Figure 4.3. Number of eigenvalues, K, at a relative accuracy versus N for the 3 DOF isotropic HO (d = 6 000). In general, the number of accurate eigenvalues increases with the growth of N. The solid line represents the number of eigenvalues with a relative accuracy of $2 \times 10^{-2}$, dotted line $2 \times 10^{-4}$, dashed line $2 \times 10^{-6}$, and the long dashed line represents the most accurate eigenvalues at $2 \times 10^{-8}$.
[Figure 4.4: plot not reproduced here. Horizontal axis: N × 100,000 (0 to 4); vertical axis: number of eigenvalues (0 to 6000).]
Figure 4.4. Number of eigenvalues, K, at a relative accuracy versus N for the 3 DOF anisotropic HO (d = 6 000). The setup is the same as that reported in Fig. 4.3.
[Figure 4.5: plot not reproduced here. Horizontal axis: N × 100,000 (0 to 10); vertical axis: number of eigenvalues (0 to 6000).]
Figure 4.5. Number of eigenvalues, K, at a relative accuracy versus N for the 6 DOF isotropic HO (d = 6 000). The solid line represents the number of eigenvalues at $2 \times 10^{-2}$ relative accuracy. The dotted and dashed lines reflect higher accuracies at $2 \times 10^{-3}$ and $2 \times 10^{-4}$, respectively.
REFERENCES
[1] G. Scoles, D. Bassi, U. Buck, and D. C. Laine, eds., Atomic and Molecular Beam Methods (Oxford University Press, Oxford, 1988).
[2] G. W. M. Vissers, G. C. Groenenboom, and A. van der Avoird, J. Chem. Phys. 119, 277 (2003).
[3] R. E. Miller, Acc. Chem. Res. 23, 10 (1990).
[4] G. W. M. Vissers, G. C. Groenenboom, and A. van der Avoird, J. Chem. Phys. 119, 286 (2003).
[5] D. C. Dayton, K. W. Jucks, and R. E. Miller, J. Chem. Phys. 90, 2631 (1989).
[6] E. J. Bohac, M. D. Marshall, and R. E. Miller, J. Chem. Phys. 96, 6681 (1992).
[7] B. M. Smirnov, Clusters and Small Particles (Springer, New York, 2000).
[8] M. L. Mandich, AMO Physics Handbook (American Institute of Physics, 1996), chap. Clusters, p. 452.
[9] R. S. Berry, J. Jellinek, and G. Natanson, Phys. Rev. A 30, 919 (1984).
[10] P. A. Frantsuzov, D. Meluzzi, and V. A. Mandelshtam, Phys. Rev. Lett. 96, 113401 (2006).
[11] A. Stace, Science 294, 1292 (2001).
[12] P. Kebarle, Annu. Rev. Phys. Chem. 28, 445 (1977).
[13] J. K. L. MacDonald, Phys. Rev. 43, 830 (1933).
[14] Z. Bacic and J. C. Light, Annu. Rev. Phys. Chem. 40, 469 (1989).
[15] J. Montgomery and B. Poirier, J. Chem. Phys. 119, 6609 (2003).
[16] W. H. Press et al., Numerical Recipes in Fortran 77: The Art of Scientific Computing (Cambridge University Press, Cambridge, England, 2001), 2nd ed.
[17] I. P. Hamilton and J. C. Light, J. Chem. Phys. 84, 306 (1986).
[18] Z. Bacic and J. C. Light, J. Chem. Phys. 85, 4594 (1986).
[19] Z. Bacic and J. C. Light, J. Chem. Phys. 86, 3065 (1987).
[20] Z. Bacic, D. Watt, and J. C. Light, J. Chem. Phys. 89, 947 (1988).
[21] A. C. Peet, J. Chem. Phys. 90, 4363 (1989).
[22] S. Garashchuk and J. C. Light, J. Chem. Phys. 114 (2001).
[23] T. Gonzalez-Lezana, J. Rubayo-Soneira, S. Miret-Artes, F. A. Gianturco, G. Delgado-Barrio, and P. Villarreal, J. Chem. Phys. 110, 9000 (1999).
[24] B. Poirier and J. C. Light, J. Chem. Phys. 113, 211 (2000).
[25] F. Gygi, Phys. Rev. B 48, 11692 (1993).
[26] F. Gygi, Phys. Rev. B 51, 11190 (1995).
[27] E. Fattal, R. Baer, and R. Kosloff, Phys. Rev. E. 53, 1217 (1996).
[28] V. Kokoouline, O. Dulieu, R. Kosloff, and F. Masnou-Seeuws, J. Chem. Phys. 110, 9865 (1999).
[29] B. Poirier and J. C. Light, J. Chem. Phys. 111, 4869 (1999).
[30] M. Cargo and R. G. Littlejohn, Phys. Rev. E 65, 026703 (2002).
[31] J. M. Bowman, Comp. Phys. Commun., Special Issue on “Molecular Vibrations” 51, 225 (1988).
[32] H. Weyl, Z. Phys. 46, 1 (1928).
[33] E. Wigner, Phys. Rev. 40, 749 (1932).
[34] B. Poirier, Found. Phys. 30, 1191 (2000).
[35] B. Poirier, J. Theo. Comput. Chem. 2, 65 (2003).
[36] B. Poirier and A. Salam, J. Chem. Phys. 121, 1690 (2004).
[37] B. Poirier and A. Salam, J. Chem. Phys. 121, 1704 (2004).
[38] J. R. Klauder and B.-S. Skagerstam, Coherent States: Applications in Physics and Mathematical Physics (World Scientific Publishing Co., Singapore, 1985).
[39] S. Szabo, P. Adam, J. Janszky, and P. Domokos, Phys. Rev. A 53 (1996).
[40] A. Kenfack, J. M. Rost, and A. M. Ozorio de Almeida, J. Phys. B: At. Mol. Opt. Phys. 37, 1645 (2004).
[41] M. J. Davis and E. J. Heller, J. Chem. Phys. 71, 3383 (1979).
[42] J. von Neumann, Mathematical Foundations of Quantum Mechanics (Princeton University Press, New Jersey, 1932).
[43] M. Boon and J. Zak, Phys. Rev. B 18, 6744 (1978).
[44] J. Zak, J. Phys. A: Math. Gen. 34, 1063 (2001).
[45] L. K. Stergioulas and A. Vourdas, J. Mod. Opt. 45 (1998).
[46] L. K. Stergioulas, V. S. Vassiliadis, and A. Vourdas, J. Phys. A: Math. Gen. 32 (1999).
[47] L. M. Arevalo Aguilar and H. Moya-Cessa, Phys. Scr. 70, 14 (2004).
[48] D. Gabor, J. Inst. Electr. Engin. 93, 429 (1946).
[49] V. Bargmann, P. Butera, L. Girardello, and J. R. Klauder, Rep. Math. Phys. 2, 221 (1971).
[50] A. M. Perelomov, Theor. Math. Phys. 6, 156 (1971).
[51] H. Bacry, A. Grossmann, and J. Zak, Phys. Rev. B 12, 1118 (1975).
[52] P.-O. Lowdin, Adv. Phys. 5, 1 (1956).
[53] R. Balian, C. R. Acad. Sc. Paris 292, 1357 (1981).
[54] F. Low, Complete Sets of Wave-Packets (World Scientific, Singapore, 1985), pp. 17–22.
[55] K. G. Wilson, Generalized wannier functions, Cornell University preprint, 1987.
[56] I. Daubechies, S. Jaffard, and J. Journe, SIAM J. Math. Anal. 22, 554 (1991).
[57] D. O. Harris, G. G. Engerholm, and W. D. Gwinn, J. Chem. Phys. 43, 1515 (1965).
[58] A. S. Dickinson and P. R. Certain, J. Chem. Phys. 49, 4209 (1968).
[59] J. V. Lill, G. A. Parker, and J. C. Light, Chem. Phys. Lett. 89, 483 (1982).
[60] R. W. Heather and J. C. Light, J. Chem. Phys. 79, 147 (1983).
[61] J. V. Lill, G. A. Parker, and J. C. Light, J. Chem. Phys. 85, 900 (1986).
[62] J. C. Light and T. Carrington Jr., Adv. Chem. Phys. 114, 263 (2000).
[63] R. Dawes and T. Carrington, Jr., J. Chem. Phys. 122, 134101 (2005).
[64] M. J. Bramley and T. Carrington, Jr., J. Chem. Phys. 99, 8519 (1993).
[65] S. Carter and N. C. Handy, Comput. Phys. Rep. 5, 115 (1986).
[66] L. Halonen, D. W. Noid, and M. S. Child, J. Chem. Phys. 78, 2803 (1983).
[67] L. Halonen and M. S. Child, J. Chem. Phys. 79, 4355 (1983).
[68] A. J. Bracken, H.-D. Doebner, and J. G. Wood, Phys. Rev. Lett. 83, 3758 (1999).
[69] J. G. Wood, Ph.D. thesis in physics, University of Queensland, St. Lucia 4072, Australia (2003).
[70] R. McWeeny, Rev. Mod. Phys. 32, 335 (1960).
[71] A. H. R. Palser and D. E. Manolopoulos, Phys. Rev. B 58, 12 (1998).
[72] J. Echave and D. C. Clary, Chem. Phys. Lett. 190, 225 (1992).
[73] H. Wei and T. Carrington, Jr., J. Chem. Phys. 97, 3029 (1992).
[74] W. Bian and B. Poirier, J. Theo. Comput. Chem. 2, 583 (2003).
[75] F. L. Quere and C. Leforestier, J. Chem. Phys. 92, 247 (1990).
[76] B. Poirier and T. Carrington, Jr., J. Chem. Phys. 114, 9254 (2001).
[77] G. H. Golub and C. F. Van Loan, Matrix Computations (Johns Hopkins University Press, Baltimore, 1989).
[78] Y. Saad, Iterative Methods for Sparse Linear Systems (PWS, Boston, 2000).
[79] R. L. Johnston, Atomic and Molecular Clusters (Taylor and Francis, London, 2002).
[80] R. J. Gdanitz, Chem. Phys. Lett. 348, 67 (2001).
[81] S. M. Cybulski and R. R. Toczylowski, J. Chem. Phys. 111, 10520 (1999).
[82] A. Wuest and F. Merkt, J. Chem. Phys. 118, 8807 (2003).
[83] G. C. Maitland, Mol. Phys. 26, 513 (1973).
[84] R. J. Le Roy, M. L. Klein, and I. J. McGee, Mol. Phys. 28, 587 (1974).
[85] R. A. Aziz, W. J. Meath, and A. R. Allnatt, Chem. Phys. 78, 295 (1983).
[86] R. A. Aziz, W. J. Meath, and A. R. Allnatt, Chem. Phys. 85, 491 (1984).
[87] R. A. Aziz and M. J. Slaman, Chem. Phys. 130, 187 (1989).
[88] Y. Tanaka and K. Yoshina, J. Chem. Phys. 57, 2964 (1972).
[89] Y. Tanaka and W. C. Walker, J. Chem. Phys. 74, 2760 (1981).
[90] D. N. Timms, A. C. Evans, M. Boninsegni, D. M. Ceperley, J. Mayers, and R. O. Simmons, J. Phys.: Condens. Matter 8, 6665 (1996).
[91] Q. Wang and J. K. Johnson, Fluid Phase Equil. 132, 93 (1997).
[92] J. P. Hansen and J. J. Weis, Phys. Rev. 188, 314 (1969).
[93] D. Thirumalai, R. W. Hall, and B. J. Berne, J. Chem. Phys. 81, 2523 (1984).
[94] B. R. Johnson, J. L. Mackey, and J. L. Kinsey, J. Comp. Phys. 168, 356 (2001).
[95] J. P. Modisette, P. Nordlander, J. L. Kinsey, and B. R. Johnson, Chem. Phys. Lett. 250, 485 (1996).
[96] A. Maloney, J. L. Kinsey, and B. R. Johnson, J. Chem. Phys. 117, 3548 (2002).
[97] J. E. Moyal, Proc. Cambridge Phil. Soc. 45, 99 (1949).
[98] M. Hillery, R. F. O’Connell, M. O. Scully, and E. P. Wigner, Phys. Rep. 106, 121 (1984).
[99] G. C. Carney, L. L. Sprandel, and C. W. Kern, Adv. Chem. Phys. 37, 305 (1978).
[100] J. M. Bowman, J. Chem. Phys. 68, 608 (1978).
[101] J. M. Bowman, Acc. Chem. Res. 19, 202 (1986).
[102] R. Lombardini and B. Poirier, J. Chem. Phys. 124, 144107 (2006).
[103] S. Flugge, Practical Quantum Mechanics (Springer-Verlag, New York, 1971), vol. 1, p. 94.
[104] H. Chen, S. Liu, and J. C. Light, J. Chem. Phys. 110, 168 (1999).
[105] C. C. Paige, J. Inst. Math. Appl. 10, 373 (1972).
[106] B. N. Parlett and D. S. Scott, Math. Comp. 33, 217 (1979).
[107] H. D. Simon, Ph.D. thesis, University of California, Berkeley (1982).
[108] H. D. Simon, Math. Comp. 42, 115 (1984).
[109] K. Wu and H. D. Simon, Tech. Rep. 41412, Lawrence Berkeley National Laboratory (1998).
[110] K. Wu and H. D. Simon, Tech. Rep. 41284, Lawrence Berkeley National Laboratory (1997).
[111] A. Ruhe, Math. Comput. 33, 680 (1979).
[112] X.-G. Wang and T. Carrington, Jr., J. Chem. Phys. 114, 1473 (2001).
[113] R. Chen and H. Guo, J. Chem. Phys. 114, 1467 (2001).
[114] C. Bischof, S. Huss-Lederman, X. Sun, and A. Tsao, in Scalable Parallel Libraries Conference (IEEE Computer Society, Washington, DC, 1994), pp. 123–131.
[115] L. Auslander and A. Tsao, Adv. Appl. Math. 13, 253 (1992).
[116] J. H. Wilkinson, The Algebraic Eigenvalue Problem (Oxford University, London, 1965).
[117] M. P. Sears, (private communication).
[118] R. G. Littlejohn, Phys. Rep. 138, 193 (1986).
[119] R. Simon, E. C. G. Sudarshan, and N. Mukunda, Phys. Rev. A 36, 3868 (1987).
[120] I. Gradshteyn and I. Ryzhik, eds., Table of Integrals, Series, and Products (Academic Press, San Diego, 2000).
[121] J. W. Demmel, Applied Numerical Linear Algebra (SIAM, Philadelphia, 1997).
APPENDIX A
JUSTIFICATION OF SPHERICAL TRUNCATION CONDITION
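To justify the rotational symmetry implicit in Eqs. (2.33) and (2.34), only the
unsymmetrized Gaussian representation will be addressed (for simplicity); however, from
Eq. (2.10), it is clear that the conclusions drawn here also apply to the symmetrized
case. The doubly dense unsymmetrized 3 DOF Gaussians are given by

$$ g_{\mathbf{u}\mathbf{v}}(\mathbf{x}) = \left(\frac{a^2}{\pi}\right)^{3/4} e^{-i\frac{\pi}{2}\mathbf{u}\cdot\mathbf{v}}\, e^{i\sqrt{\pi}a\,\mathbf{v}\cdot\mathbf{x}}\, e^{-\frac{a^2}{2}\left(\mathbf{x}-\mathbf{u}\frac{\sqrt{\pi}}{a}\right)^2}, \tag{A.1} $$

from which the potential matrix elements are found to be

$$ \left[V^{g}\right]_{\mathbf{u},\mathbf{v},\mathbf{u}',\mathbf{v}'} = \left(\frac{a^2}{\pi}\right)^{3/2} e^{-i\frac{\pi}{2}(\mathbf{u}\cdot\mathbf{v}-\mathbf{u}'\cdot\mathbf{v}')}\, e^{-\frac{\pi}{4}(\mathbf{u}_\Delta)^2} \int V(\mathbf{x})\, e^{-i\sqrt{\pi}a\,\mathbf{x}\cdot\mathbf{v}_\Delta}\, e^{-a^2\left(\mathbf{x}-\frac{\sqrt{\pi}}{2a}\mathbf{u}_+\right)^2} d\mathbf{x}. \tag{A.2} $$

By taking the absolute value of the Eq. (A.2) integrand, one obtains an upper limit
on the absolute value of the integral:

$$ \left|\left[V^{g}\right]_{\mathbf{u},\mathbf{v},\mathbf{u}',\mathbf{v}'}\right| \le \left(\frac{a^2}{\pi}\right)^{3/2} e^{-\frac{\pi}{4}(\mathbf{u}_\Delta)^2} \int V(\mathbf{x})\, e^{-a^2\left(\mathbf{x}-\frac{\sqrt{\pi}}{2a}\mathbf{u}_+\right)^2} d\mathbf{x}. \tag{A.3} $$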
Since the potential V (x) = V (x · x) is rotationally symmetric, it is obvious that
Eq. (A.3) is invariant with respect to any rotation of the vectors u∆ or u+; thus, a
spherical truncation is applicable for these parameters.
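For $\mathbf{v}_\Delta$, the rotational invariance can be deduced from the momentum space
representation, in which the 3 DOF Gaussians are found to be

$$ g_{\mathbf{u}\mathbf{v}}(\mathbf{p}) = \frac{1}{(\pi a^2)^{3/4}}\, e^{i\frac{\pi}{2}\mathbf{u}\cdot\mathbf{v}}\, e^{-i\frac{\sqrt{\pi}}{a}\mathbf{u}\cdot\mathbf{p}}\, e^{-\frac{1}{2a^2}\left(\mathbf{p}-\mathbf{v}\sqrt{\pi}a\right)^2}. \tag{A.4} $$

The potential energy operator is represented as a function of the momentum
Laplacian, i.e., $V = V(-\nabla^2_p)$. For simplicity, we consider only the first-order $-\nabla^2_p$ term,
although similar conclusions can be drawn for all other orders as well. The upper
limit of the $-\nabla^2_p$ matrix element integral is found to be

$$ \left|\langle g^{(2)}_{\mathbf{u}\mathbf{v}}|(-\nabla^2_p)|g^{(2)}_{\mathbf{u}'\mathbf{v}'}\rangle\right| \le \frac{1}{(\pi a^2)^{3/2}}\, e^{-\frac{\pi}{4}(\mathbf{v}_\Delta)^2} \int e^{-\frac{1}{a^2}\left(\mathbf{p}-\frac{\sqrt{\pi}}{2}a\mathbf{v}_+\right)^2}\, \zeta\!\left[(\mathbf{p}-\mathbf{v}'a\sqrt{\pi}),\mathbf{u}'\right] d\mathbf{p}, \tag{A.5} $$

where

$$ \zeta\left[\mathbf{p}_\Delta,\mathbf{u}'\right] = \sqrt{\left(\frac{1}{a^4}(\mathbf{p}_\Delta)^2 - \frac{3}{a^2} - \frac{\pi}{a^2}(\mathbf{u}')^2\right)^2 + \frac{\pi}{a^6}(\mathbf{u}'\cdot\mathbf{p}_\Delta)^2}. \tag{A.6} $$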
Since ζ is invariant with respect to simultaneous rotation of all vectors, the same
must be true of the integrand in Eq. (A.5), and also the entire right hand side. Thus,
spherical truncation in v∆ may also be applied.
APPENDIX B
EIGENFUNCTIONS OF HARMONIC OSCILLATOR PSRO
This section proves that the PSRO $\hat{\rho}^{QC}_K$ associated with 2f-dimensional
hyperspheres centered at the origin has the same eigenfunctions as the quantum HO
Hamiltonian. The first part of the proof describes a specific symplectic matrix and
its corresponding operator formalism known as a metaplectic operator. Next, it is
shown that the PSRO and its Wigner-Weyl (WW) phase space representation $\rho^{QC}_K$ are
invariant under this metaplectic/symplectic transformation, which ultimately
determines the eigenfunctions of the PSRO.
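Consider a subgroup of SO(2f) that consists of the set of 2f-dimensional rotations
$R(\Theta)$, where $\Theta = (\theta_1, \theta_2, \ldots, \theta_f)$, that all have a block diagonal matrix form, i.e.,

$$ R(\Theta) = R(\theta_1) \oplus R(\theta_2) \oplus \cdots \oplus R(\theta_f) = \begin{pmatrix} R(\theta_1) & & & 0 \\ & R(\theta_2) & & \\ & & \ddots & \\ 0 & & & R(\theta_f) \end{pmatrix} \tag{B.1} $$

where

$$ R(\theta_i) = \begin{pmatrix} \cos(\theta_i) & \sin(\theta_i) \\ -\sin(\theta_i) & \cos(\theta_i) \end{pmatrix} $$

and $\theta_i \in [0, 2\pi)$.
If we use the subgroup to act on 2f-dimensional phase space, i.e.,

$$ R(\Theta)\,\vec{z} = \vec{z}\,' \tag{B.2} $$

and $\vec{z} = (q_1, p_1, \ldots, q_f, p_f)^T$,
then $R(\Theta)$ can be thought of as a composition of f 2-dimensional counterclockwise
rotations about the origin, each rotating a pair of phase space axes $(q_j, p_j)$ by $\theta_j$, where
j = 1, . . . , f.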
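One important property of these rotation matrices is that they are symplectic,
i.e., $R(\Theta) \in Sp(2f, \mathbb{R})$, since they satisfy the equation

$$ \left[R(\Theta)\right]^T \beta\, R(\Theta) = \beta \tag{B.3} $$

where

$$ \beta = \begin{pmatrix} 0 & 1 & & & \\ -1 & 0 & & & \\ & & \ddots & & \\ & & & 0 & 1 \\ & & & -1 & 0 \end{pmatrix} $$

and $\left[R(\Theta)\right]^T$ is the transpose of $R(\Theta)$. Symplectic matrices are known in classical
mechanics as transformations of canonical coordinates that leave the Poisson bracket
invariant. This property carries over to quantum mechanics, where the commutators
of the operators corresponding to the canonical coordinates, i.e.,

$$ \left[(\hat{\vec{z}})_i, (\hat{\vec{z}})_j\right]_Q = i\left[\beta\right]_{ij} \tag{B.4} $$

where $\hat{\vec{z}} = (\hat{q}_1, \hat{p}_1, \ldots, \hat{q}_f, \hat{p}_f)^T$,
are invariant under these transformations. Thus, one can write

$$ \left[(\hat{\vec{z}})'_i, (\hat{\vec{z}})'_j\right]_Q = \left[(\hat{\vec{z}})_i, (\hat{\vec{z}})_j\right]_Q \tag{B.5} $$

where $(\hat{\vec{z}})'_i = \left[R(\Theta)\right]_{ij} (\hat{\vec{z}})_j$
and repeated indices are implied to be summed.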
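The symplectic condition (B.3) is easy to confirm numerically. The short Python check below builds the block diagonal $R(\Theta)$ and $\beta$ for f = 3; the angle values and the helper names `rotation_block` and `block_diag` are arbitrary choices made for this illustration.

```python
import numpy as np

def rotation_block(theta):
    """2 x 2 block R(theta_i) as defined below Eq. (B.1)."""
    c, s = np.cos(theta), np.sin(theta)
    return np.array([[c, s], [-s, c]])

def block_diag(blocks):
    """Assemble 2 x 2 blocks into a block diagonal matrix."""
    n = 2 * len(blocks)
    out = np.zeros((n, n))
    for k, b in enumerate(blocks):
        out[2 * k:2 * k + 2, 2 * k:2 * k + 2] = b
    return out

thetas = [0.3, 1.1, 2.5]  # arbitrary test angles
R = block_diag([rotation_block(t) for t in thetas])
beta = block_diag([np.array([[0.0, 1.0], [-1.0, 0.0]])] * len(thetas))

# Eq. (B.3): R^T beta R = beta
is_symplectic = np.allclose(R.T @ beta @ R, beta)
```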
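For every symplectic matrix there corresponds a unitary operator, which in our
case will be denoted as $\hat{U}[R(\Theta)]$, parameterized by $R(\Theta)$, where

$$ \hat{U}[R(\Theta)]\, (\hat{\vec{z}})_i\, \hat{U}[R(\Theta)]^{-1} = \left[R(\Theta)\right]_{ij} (\hat{\vec{z}})_j . \tag{B.6} $$

These unitary operators are known as metaplectic operators and are thoroughly
reviewed in Ref. [118]. Based on the definition of $R(\Theta)$ in (B.1), the corresponding
metaplectic operator is

$$ \hat{U}[R(\Theta)] = e^{\frac{i}{2}\Theta\cdot(\hat{\mathbf{q}}^2+\hat{\mathbf{p}}^2)} = e^{\frac{i}{2}\theta_1(\hat{q}_1^2+\hat{p}_1^2)} \otimes e^{\frac{i}{2}\theta_2(\hat{q}_2^2+\hat{p}_2^2)} \otimes \cdots \otimes e^{\frac{i}{2}\theta_f(\hat{q}_f^2+\hat{p}_f^2)} . \tag{B.7} $$

The last expression can be verified by plugging it into (B.6) and using the
Baker-Hausdorff lemma. For example,

$$ \begin{aligned} \hat{U}[R(\Theta)]\, \hat{q}_j\, \hat{U}[R(\Theta)]^{-1} &= e^{\frac{i}{2}\theta_j(\hat{q}_j^2+\hat{p}_j^2)}\, \hat{q}_j\, e^{-\frac{i}{2}\theta_j(\hat{q}_j^2+\hat{p}_j^2)} \\ &= \hat{q}_j + \frac{(i\theta_j/2)}{1!}\left[\hat{q}_j^2+\hat{p}_j^2, \hat{q}_j\right]_Q + \frac{(i\theta_j/2)^2}{2!}\left[\hat{q}_j^2+\hat{p}_j^2, \left[\hat{q}_j^2+\hat{p}_j^2, \hat{q}_j\right]_Q\right]_Q + \ldots \\ &= \hat{q}_j \cos(\theta_j) + \hat{p}_j \sin(\theta_j) \end{aligned} \tag{B.8} $$

which is in agreement with the right hand side of (B.6). One can also substitute $\hat{q}_j$
with $\hat{p}_j$ and get

$$ \hat{U}[R(\Theta)]\, \hat{p}_j\, \hat{U}[R(\Theta)]^{-1} = -\hat{q}_j \sin(\theta_j) + \hat{p}_j \cos(\theta_j) \tag{B.9} $$

also satisfying the symplectic/metaplectic relationship.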
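Relations (B.8) and (B.9) can also be checked with finite matrices in the HO number basis. The sketch below assumes units with ħ = 1 and unit frequency, so that $\hat{q}^2 + \hat{p}^2 = 2\hat{n} + 1$ and the 1 DOF metaplectic operator is diagonal with entries $e^{i\theta(n + 1/2)}$; the basis size and the angle are arbitrary illustration values.

```python
import numpy as np

n_states = 12   # assumed truncation of the number basis
theta = 0.7     # arbitrary rotation angle

# Annihilation operator: a|n> = sqrt(n)|n-1>
a = np.diag(np.sqrt(np.arange(1, n_states)), k=1)
q = (a + a.T) / np.sqrt(2)
p = (a - a.T) / (1j * np.sqrt(2))

# U = exp[(i/2) theta (q^2 + p^2)], built directly from q^2 + p^2 = 2n + 1
n = np.arange(n_states)
U = np.diag(np.exp(1j * theta * (n + 0.5)))
U_inv = U.conj().T

q_rot = U @ q @ U_inv   # compare with Eq. (B.8)
p_rot = U @ p @ U_inv   # compare with Eq. (B.9)
```

Because U is diagonal in this basis, the comparison holds exactly even for the truncated matrices.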
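Let us consider the case where f = 1 and the rotation parameter for the single
DOF is very small, i.e., $|\theta| \ll 1$. The rotation matrix can be written approximately
as

$$ R(\theta) = \begin{pmatrix} \cos(\theta) & \sin(\theta) \\ -\sin(\theta) & \cos(\theta) \end{pmatrix} = e^{i\theta J} \approx 1 + i\theta J \tag{B.10} $$

where

$$ J = \begin{pmatrix} 0 & -i \\ i & 0 \end{pmatrix} $$

is the generator of the SO(2) group. The corresponding infinitesimal metaplectic
operator is

$$ \hat{U}[R(\theta)] = e^{\frac{i}{2}\theta(\hat{q}^2+\hat{p}^2)} \approx 1 + \frac{i}{2}\theta(\hat{q}^2+\hat{p}^2) \tag{B.11} $$

which approximately transforms a 1 DOF PSRO by the equation

$$ \hat{U}[R(\theta)]\, \hat{\rho}^{QC}_K(\hat{q},\hat{p})\, \hat{U}[R(\theta)]^{-1} \approx \hat{\rho}^{QC}_K(\hat{q},\hat{p}) + \frac{i}{2}\theta\left[\hat{q}^2+\hat{p}^2, \hat{\rho}^{QC}_K(\hat{q},\hat{p})\right]_Q . \tag{B.12} $$

Our goal is to show how the WW representation of the corresponding PSRO,
$\rho^{QC}_K(q,p)$, transforms via (B.12). Let’s first look at the WW transform in two forms
relating the PSRO and its WW representation:

$$ \rho^{QC}_K(q,p) = \frac{1}{2\pi}\int \langle q - \tfrac{1}{2}q'|\, \hat{\rho}^{QC}_K(\hat{q},\hat{p})\, |q + \tfrac{1}{2}q'\rangle\, e^{iq'p}\, dq' \tag{B.13} $$

$$ \rho^{QC}_K(q,p) = \frac{1}{2\pi}\int \langle p - \tfrac{1}{2}p'|\, \hat{\rho}^{QC}_K(\hat{q},\hat{p})\, |p + \tfrac{1}{2}p'\rangle\, e^{-ip'q}\, dp' \tag{B.14} $$

where the matrix elements inside the integrals are configuration kernels defined in
Eq. (3.10). Using (B.13) and (B.14), one can easily find the WW representations
corresponding to $[\hat{q}^2, \hat{\rho}^{QC}_K(\hat{q},\hat{p})]_Q$ and $[\hat{p}^2, \hat{\rho}^{QC}_K(\hat{q},\hat{p})]_Q$, respectively:

$$ \left[\hat{q}^2, \hat{\rho}^{QC}_K(\hat{q},\hat{p})\right]_Q \longrightarrow 2iq\,\frac{\partial}{\partial p}\rho^{QC}_K(q,p) \tag{B.15} $$

$$ \left[\hat{p}^2, \hat{\rho}^{QC}_K(\hat{q},\hat{p})\right]_Q \longrightarrow -2ip\,\frac{\partial}{\partial q}\rho^{QC}_K(q,p) . \tag{B.16} $$

Thus, the right-hand side of Eq. (B.12) transforms as

$$ \begin{aligned} \hat{\rho}^{QC}_K(\hat{q},\hat{p}) + \frac{i}{2}\theta\left[\hat{q}^2+\hat{p}^2, \hat{\rho}^{QC}_K(\hat{q},\hat{p})\right]_Q &\longrightarrow \rho^{QC}_K(q,p) + p\theta\,\frac{\partial}{\partial q}\rho^{QC}_K(q,p) - q\theta\,\frac{\partial}{\partial p}\rho^{QC}_K(q,p) \\ &\approx \rho^{QC}_K(q + p\theta,\, p - q\theta) \\ &= \rho^{QC}_K\left(\left[1 + i\theta J\right]_{1j}\vec{z}_j,\ \left[1 + i\theta J\right]_{2j}\vec{z}_j\right) \\ &\approx \rho^{QC}_K\left(\left[R(\theta)\right]_{1j}\vec{z}_j,\ \left[R(\theta)\right]_{2j}\vec{z}_j\right) \end{aligned} \tag{B.17} $$

where $\vec{z} = (q, p)^T$.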
In the 1 DOF case, we find that the infinitesimal metaplectic transformation of the
PSRO is equivalent to the infinitesimal symplectic transformation of the coordinates
of the corresponding WW representation. If we now consider the rotation parameter θ
to be finite, then applying N infinitesimal rotations repeatedly, each with parameter
θ/N (where N is large), is the same as applying the actual metaplectic operator
$\hat{U}[R(\theta)]$ and symplectic matrix $R(\theta)$ since

$$ \hat{U}[R(\theta)] = \lim_{N\to\infty}\left(1 + \frac{i}{2}\frac{\theta}{N}(\hat{q}^2+\hat{p}^2)\right)^N = e^{\frac{i}{2}\theta(\hat{q}^2+\hat{p}^2)} \tag{B.18} $$

$$ R(\theta) = \lim_{N\to\infty}\left(1 + i\frac{\theta}{N}J\right)^N = e^{i\theta J} . \tag{B.19} $$

Thus, our argument can be extended to transformations that are not infinitesimal.
Also, it is obvious that the relationship is valid when carried over to the f DOF case
dealing with the rotations of (B.1) and (B.7); therefore, we can conclude that

$$ \hat{U}[R(\Theta)]\, \hat{\rho}^{QC}_K(\hat{\mathbf{q}},\hat{\mathbf{p}})\, \hat{U}[R(\Theta)]^{-1} \longrightarrow \rho^{QC}_K\left(\left[R(\Theta)\right]_{1j}(\vec{z})_j,\ \left[R(\Theta)\right]_{2j}(\vec{z})_j,\ \ldots,\ \left[R(\Theta)\right]_{(2f)j}(\vec{z})_j\right). \tag{B.20} $$

In general, this mapping is true for any symplectic/metaplectic pair, and a formal
proof of this can be seen in Ref. [119].
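The limit in (B.19) can be illustrated numerically: the product of N small rotations approaches the finite rotation matrix as N grows. In the Python sketch below, the angle and the list of N values are arbitrary choices, and the function name `product_of_small_rotations` is hypothetical.

```python
import numpy as np

theta = 0.9  # arbitrary finite rotation angle
J = np.array([[0.0, -1.0j], [1.0j, 0.0]])  # generator J of Eq. (B.10)
exact = np.array([[np.cos(theta), np.sin(theta)],
                  [-np.sin(theta), np.cos(theta)]])

def product_of_small_rotations(theta, steps):
    """(1 + i (theta/steps) J)^steps, the finite-N form of Eq. (B.19)."""
    step = np.eye(2) + 1j * (theta / steps) * J
    return np.linalg.matrix_power(step, steps)

# Largest elementwise deviation from R(theta) for increasing N.
errors = [np.abs(product_of_small_rotations(theta, N) - exact).max()
          for N in (10, 100, 1000, 10000)]
```

The deviation shrinks roughly in proportion to 1/N, consistent with the first-order nature of each infinitesimal step.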
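For the HO case, $\rho^{QC}_K(\mathbf{q},\mathbf{p})$ is a 2f-dimensional hypersphere centered at the origin,
which is invariant under rotations about the center, i.e.,

$$ \rho^{QC}_K\left(\left[R(\Theta)\right]_{1j}(\vec{z})_j,\ \left[R(\Theta)\right]_{2j}(\vec{z})_j,\ \ldots,\ \left[R(\Theta)\right]_{(2f)j}(\vec{z})_j\right) = \rho^{QC}_K(\mathbf{q},\mathbf{p}) . \tag{B.21} $$

Since the WW transform is isomorphic, one can deduce from (B.20) and (B.21) that

$$ \hat{U}[R(\Theta)]\, \hat{\rho}^{QC}_K(\hat{\mathbf{q}},\hat{\mathbf{p}})\, \hat{U}[R(\Theta)]^{-1} = \hat{\rho}^{QC}_K(\hat{\mathbf{q}},\hat{\mathbf{p}}) ; \tag{B.22} $$

thus, $\hat{U}[R(\Theta)]$ and $\hat{\rho}^{QC}_K(\hat{\mathbf{q}},\hat{\mathbf{p}})$ commute and share the same eigenfunctions. From (B.7), we see
that the eigenfunctions of $\hat{U}[R(\Theta)]$ are the HO eigenfunctions $|\mathbf{n}\rangle$; therefore, the
eigenfunctions of $\hat{\rho}^{QC}_K(\hat{\mathbf{q}},\hat{\mathbf{p}})$ are $|\mathbf{n}\rangle$ as well.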
APPENDIX C
EIGENVALUES OF HARMONIC OSCILLATOR PSRO
This section proves the relationship $w_{\mathbf{n}}(E_{\max}) = w_{n_S}(E_{\max})$, where $n_S = \sum_{j=1}^{f} n_j$,
and provides an analytical expression for the eigenvalue. The following derivation is
very similar to the proof presented in Ref. [69], although there are slight deviations.
First, we want to find an analytical expression for the single DOF eigenvalue
$w^{(1)}_n(E_{\max})$, which will be useful later in the proof. We start by plugging Eqs. (3.21)
and (3.22) into (3.20) for the 1 DOF case, i.e.,

$$ w^{(1)}_n(E_{\max}) = \int_R W_n(q,p)\, dq\, dp = \frac{(-1)^n}{\pi}\int_R L_n[2(q^2+p^2)]\, e^{-(q^2+p^2)}\, dq\, dp \tag{C.1} $$

where R is a disk of radius $\sqrt{2E_{\max}}$. The next logical step is to introduce polar
coordinates $r = \sqrt{q^2+p^2}$ and $\phi$ such that the region of integration or disk $R =
\{(r,\phi) \mid 0 \le r \le \sqrt{2E_{\max}},\ 0 \le \phi \le 2\pi\}$; thus,

$$ w^{(1)}_n(E_{\max}) = \frac{(-1)^n}{\pi}\int_0^{2\pi}\int_0^{\sqrt{2E_{\max}}} L_n(2r^2)\, e^{-r^2}\, r\, dr\, d\phi . \tag{C.2} $$

After integrating out the angle and substituting $2r^2$ with the variable t, Eq. (C.2)
simplifies to

$$ w^{(1)}_n(E_{\max}) = \frac{(-1)^n}{2}\int_0^{4E_{\max}} L_n(t)\, e^{-t/2}\, dt . \tag{C.3} $$

From Ref. [120], there is a useful Laguerre polynomial identity:

$$ L_n(t) = \frac{d}{dt}\left[L_n(t) - L_{n+1}(t)\right] . \tag{C.4} $$

After plugging this equation into (C.3) and integrating by parts, we arrive at the
expression
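$$ \begin{aligned} w^{(1)}_n(E_{\max}) &= \frac{(-1)^n}{2}\, e^{-2E_{\max}}\left[L_n(4E_{\max}) - L_{n+1}(4E_{\max})\right] \\ &\quad + \frac{1}{2}\left(\frac{(-1)^n}{2}\int_0^{4E_{\max}} L_n(t)\, e^{-t/2}\, dt + \frac{(-1)^{n+1}}{2}\int_0^{4E_{\max}} L_{n+1}(t)\, e^{-t/2}\, dt\right) \\ &= \frac{(-1)^n}{2}\, e^{-2E_{\max}}\left[L_n(4E_{\max}) - L_{n+1}(4E_{\max})\right] + \frac{1}{2}\left(w^{(1)}_n(E_{\max}) + w^{(1)}_{n+1}(E_{\max})\right) \end{aligned} \tag{C.5} $$

which simplifies into the recurrence relationship

$$ w^{(1)}_{n+1}(E_{\max}) = w^{(1)}_n(E_{\max}) + (-1)^{n+1} e^{-2E_{\max}}\left[L_n(4E_{\max}) - L_{n+1}(4E_{\max})\right] . \tag{C.6} $$

Given that $w^{(1)}_0 = 1 - e^{-2E_{\max}}$ [look at (C.3) where $L_0(t) = 1$], the closed form of the
last equation is

$$ w^{(1)}_n(E_{\max}) = 1 - 2e^{-2E_{\max}}\left[\sum_{k=0}^{n-1}(-1)^k L_k(4E_{\max}) + \frac{(-1)^n}{2} L_n(4E_{\max})\right] \tag{C.7} $$

for n > 0.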
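As a sanity check on the closed form, the Python sketch below evaluates (C.3) by Gauss-Legendre quadrature and compares it with (C.7). The function names and the number of quadrature nodes are assumptions of this illustration; NumPy's `lagval` is used to evaluate $L_n$.

```python
import numpy as np
from numpy.polynomial.laguerre import lagval
from numpy.polynomial.legendre import leggauss

def laguerre(n, t):
    """Laguerre polynomial L_n(t)."""
    return lagval(t, [0.0] * n + [1.0])

def w1_quadrature(n, e_max, nodes=80):
    """Eq. (C.3) evaluated by Gauss-Legendre quadrature on [0, 4*e_max]."""
    x, wts = leggauss(nodes)
    t = 2.0 * e_max * (x + 1.0)  # map [-1, 1] onto [0, 4*e_max]
    integral = 2.0 * e_max * np.sum(wts * laguerre(n, t) * np.exp(-t / 2.0))
    return (-1) ** n / 2.0 * integral

def w1_closed_form(n, e_max):
    """Eq. (C.7); for n = 0 the empty sum gives 1 - exp(-2*e_max)."""
    x = 4.0 * e_max
    s = sum((-1) ** k * laguerre(k, x) for k in range(n))
    return 1.0 - 2.0 * np.exp(-2.0 * e_max) * (s + (-1) ** n / 2.0 * laguerre(n, x))
```

For example, `w1_quadrature(3, 1.5)` and `w1_closed_form(3, 1.5)` should agree to quadrature precision.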
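Let’s now go back to the f DOF case where (C.1) is now

$$ w_{\mathbf{n}}(E_{\max}) = \int_R W_{\mathbf{n}}(\mathbf{q},\mathbf{p})\, d^f q\, d^f p = \int_R \prod_{j=1}^{f} \frac{(-1)^{n_j}}{\pi} L_{n_j}[2(q_j^2+p_j^2)]\, e^{-(q_j^2+p_j^2)}\, dq_j\, dp_j . \tag{C.8} $$

We can use polar coordinates $\mathbf{r} = (r_1, \ldots, r_f)$ and $\boldsymbol{\phi} = (\phi_1, \ldots, \phi_f)$ where $q_j = r_j \cos\phi_j$
and $p_j = r_j \sin\phi_j$. The region of integration or hypersphere is

$$ R = \Big\{(\mathbf{r},\boldsymbol{\phi}) \;\Big|\; 0 \le r_1 \le \sqrt{2E_{\max}},\ 0 \le r_2 \le \sqrt{2E_{\max} - r_1^2},\ \ldots,\ 0 \le r_f \le \sqrt{2E_{\max} - r_1^2 - r_2^2 - \cdots - r_{f-1}^2},\ 0 \le \phi_1 \le 2\pi,\ \ldots,\ 0 \le \phi_f \le 2\pi \Big\}; $$

thus, (C.8) can be written as

$$ \begin{aligned} w_{\mathbf{n}}(E_{\max}) = 2^f (-1)^{n_1+\cdots+n_f} &\int_0^{\sqrt{2E_{\max}}} L_{n_1}(2r_1^2)\, e^{-r_1^2}\, r_1 \int_0^{\sqrt{2E_{\max}-r_1^2}} L_{n_2}(2r_2^2)\, e^{-r_2^2}\, r_2 \cdots \\ &\times \int_0^{\sqrt{2E_{\max}-r_1^2-\cdots-r_{f-1}^2}} L_{n_f}(2r_f^2)\, e^{-r_f^2}\, r_f\, dr_f \cdots dr_2\, dr_1 \end{aligned} \tag{C.9} $$

where all of the angles have been integrated to give $(2\pi)^f$.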
In order to solve for an analytical representation of this integral, we will first look
at the $r_{f-1}$ and $r_f$ contributions, the two innermost integrals, i.e.,

$$ 2^2 (-1)^{n_{f-1}+n_f} \int_0^{\sqrt{2E_{\max}-r_1^2-\cdots-r_{f-2}^2}} L_{n_{f-1}}(2r_{f-1}^2)\, e^{-r_{f-1}^2}\, r_{f-1} \int_0^{\sqrt{2E_{\max}-r_1^2-\cdots-r_{f-1}^2}} L_{n_f}(2r_f^2)\, e^{-r_f^2}\, r_f\, dr_f\, dr_{f-1} . \tag{C.10} $$

From Eq. (C.2), we notice that we have a 1 DOF eigenvalue for the $r_f$ contribution:

$$ 2(-1)^{n_{f-1}} \int_0^{\sqrt{2E_{\max}-r_1^2-\cdots-r_{f-2}^2}} L_{n_{f-1}}(2r_{f-1}^2)\, e^{-r_{f-1}^2}\, r_{f-1}\; w^{(1)}_{n_f}\!\left[E_{\max} - \tfrac{1}{2}(r_1^2 + \cdots + r_{f-1}^2)\right] dr_{f-1} . \tag{C.11} $$

Replacing the eigenvalue with (C.7) and substituting $2r_{f-1}^2$ with the variable u, the
previous expression becomes

$$ \begin{aligned} \frac{(-1)^{n_{f-1}}}{2} \int_0^{2(2E_{\max}-r_1^2-\cdots-r_{f-2}^2)} L_{n_{f-1}}(u)\, e^{-u/2} \Bigg(1 &- 2e^{-(2E_{\max}-r_1^2-\cdots-r_{f-2}^2-u/2)} \\ &\times \Bigg[\sum_{k=0}^{n_f-1} (-1)^k L_k[2(2E_{\max}-r_1^2-\cdots-r_{f-2}^2)-u] \\ &\quad + \frac{(-1)^{n_f}}{2} L_{n_f}[2(2E_{\max}-r_1^2-\cdots-r_{f-2}^2)-u]\Bigg]\Bigg)\, du . \end{aligned} \tag{C.12} $$

Noticing that the first term is also a 1 DOF eigenvalue and using the Laguerre
identity [120]

$$ \int_0^t L_m(u)\, L_n(t-u)\, du = L_{m+n}(t) - L_{m+n+1}(t), \tag{C.13} $$

we get the expression

$$ \begin{aligned} w^{(1)}_{n_{f-1}}(E'_{\max}) &- (-1)^{n_{f-1}} e^{-2E'_{\max}} \left(\sum_{k=0}^{n_f-1} (-1)^k \left(L_{n_{f-1}+k}(4E'_{\max}) - L_{n_{f-1}+k+1}(4E'_{\max})\right)\right) \\ &- \frac{(-1)^{n_{f-1}+n_f}}{2}\, e^{-2E'_{\max}} \left(L_{n_{f-1}+n_f}(4E'_{\max}) - L_{n_{f-1}+n_f+1}(4E'_{\max})\right) \end{aligned} \tag{C.14} $$

where $E'_{\max} = E_{\max} - \tfrac{1}{2}(r_1^2 + \cdots + r_{f-2}^2)$. Replacing the first term with (C.7) and
then combining summations, one can recognize, using (C.7) in reverse, that (C.14)
simplifies to

$$ \frac{1}{2}\left(w^{(1)}_{n_{f-1}+n_f}\!\left[E_{\max} - \tfrac{1}{2}(r_1^2 + \cdots + r_{f-2}^2)\right] + w^{(1)}_{n_{f-1}+n_f+1}\!\left[E_{\max} - \tfrac{1}{2}(r_1^2 + \cdots + r_{f-2}^2)\right]\right). \tag{C.15} $$

Finally, plugging (C.15) into our original expression (C.9), we get
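$$ \begin{aligned} w_{\mathbf{n}}(E_{\max}) = 2^{f-2} (-1)^{n_1+\cdots+n_{f-2}} &\int_0^{\sqrt{2E_{\max}}} L_{n_1}(2r_1^2)\, e^{-r_1^2}\, r_1 \int_0^{\sqrt{2E_{\max}-r_1^2}} L_{n_2}(2r_2^2)\, e^{-r_2^2}\, r_2 \cdots \int_0^{\sqrt{2E_{\max}-r_1^2-\cdots-r_{f-3}^2}} L_{n_{f-2}}(2r_{f-2}^2)\, e^{-r_{f-2}^2}\, r_{f-2} \\ &\times \frac{1}{2}\left(w^{(1)}_{n_{f-1}+n_f}\!\left[E_{\max} - \tfrac{1}{2}(r_1^2 + \cdots + r_{f-2}^2)\right] + w^{(1)}_{n_{f-1}+n_f+1}\!\left[E_{\max} - \tfrac{1}{2}(r_1^2 + \cdots + r_{f-2}^2)\right]\right) dr_{f-2} \cdots dr_2\, dr_1 . \end{aligned} \tag{C.16} $$

Using steps (C.11)-(C.15) for each eigenvalue term, we can eliminate the innermost
integral ($r_{f-2}$ contribution) to give the simplified expression

$$ \frac{1}{2^2}\left(w^{(1)}_{n_{f-2}+n_{f-1}+n_f}(E''_{\max}) + 2w^{(1)}_{n_{f-2}+n_{f-1}+n_f+1}(E''_{\max}) + w^{(1)}_{n_{f-2}+n_{f-1}+n_f+2}(E''_{\max})\right) \tag{C.17} $$

where $E''_{\max} = E_{\max} - \tfrac{1}{2}(r_1^2 + \cdots + r_{f-3}^2)$. If one repeats this process in order to
eliminate all of the integrals in Eq. (C.9), the final equation becomes

$$ w_{\mathbf{n}}(E_{\max}) = \frac{1}{2^{f-1}}\left(\sum_{k=0}^{f-1} \binom{f-1}{k}\, w^{(1)}_{n_S+k}(E_{\max})\right). \tag{C.18} $$

Note that $w_{\mathbf{n}}(E_{\max})$ depends only on $n_S$; thus, we can write $w_{\mathbf{n}}(E_{\max}) = w_{n_S}(E_{\max})$.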
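A short Python sketch of the final result (C.18), built on the closed form (C.7), is given below. The function names `w1` and `w_multi` are hypothetical. As an independent check, the ground state values for f = 2 and f = 3 can be compared with the integral of a normalized Gaussian over a ball of squared radius $2E_{\max}$ in 4 and 6 dimensions, which gives $1 - (1 + 2E_{\max})e^{-2E_{\max}}$ and $1 - (1 + 2E_{\max} + 2E_{\max}^2)e^{-2E_{\max}}$, respectively.

```python
import math
from numpy.polynomial.laguerre import lagval

def w1(n, e_max):
    """Single DOF eigenvalue from the closed form, Eq. (C.7)."""
    x = 4.0 * e_max

    def lag(k):
        # Laguerre polynomial L_k evaluated at x = 4*e_max
        return float(lagval(x, [0.0] * k + [1.0]))

    s = sum((-1) ** k * lag(k) for k in range(n))
    return 1.0 - 2.0 * math.exp(-2.0 * e_max) * (s + (-1) ** n / 2.0 * lag(n))

def w_multi(ns, e_max):
    """f DOF eigenvalue from Eq. (C.18); depends on ns only through sum(ns)."""
    f, n_s = len(ns), sum(ns)
    total = sum(math.comb(f - 1, k) * w1(n_s + k, e_max) for k in range(f))
    return total / 2 ** (f - 1)
```

Any two sets of quantum numbers with the same sum, e.g. (1, 0, 2) and (0, 3, 0), give the same value by construction.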
APPENDIX D
JUSTIFICATION OF PREPROCESSING AND SUBSPACE ITERATION
METHOD
The symmetric and real matrix, $H'$, can be diagonalized via some orthogonal
similarity transformation, i.e.,

$$ H' = F \Lambda F^T , \tag{D.1} $$

where the columns of F are the orthonormal eigenvectors of $H'$, and Λ is a diagonal
matrix containing the corresponding eigenvalues. To simplify the proof, we arrange
the columns of F such that the eigenvalues in Λ are listed in descending order, starting
from the top-left corner ($H'$ is similarly reordered). Thus, Λ can be rewritten as

$$ \Lambda = \begin{pmatrix} \lambda_u & 0 \\ 0 & \lambda_l \end{pmatrix} \tag{D.2} $$

where $\lambda_u$ is the d × d diagonal matrix that contains the eigenvalues ranging from 1 to
2 and $\lambda_l$ is the (N − d) × (N − d) diagonal matrix with the remaining eigenvalues ranging
from 0 to 1.
Since $F^T F = I$, then

$$ (H')^r Z_0 = F \Lambda^r F^T Z_0 = F \begin{pmatrix} (\lambda_u)^r & 0 \\ 0 & (\lambda_l)^r \end{pmatrix} F^T Z_0 . \tag{D.3} $$

We can rewrite Eq. (D.3) as

$$ (H')^r Z_0 = F \begin{pmatrix} V^{d\times d} \\ W^{(N-d)\times d} \end{pmatrix} \tag{D.4} $$
where each row of $V^{d\times d}$ and $W^{(N-d)\times d}$ is a multiple of a diagonal element of $(\lambda_u)^r$
and $(\lambda_l)^r$, respectively. We can also write F in a split form, i.e.,

$$ F = \begin{pmatrix} F_u^{N\times d} & F_l^{N\times(N-d)} \end{pmatrix} \tag{D.5} $$

where the column eigenvectors of $F_u^{N\times d}$ and $F_l^{N\times(N-d)}$ correspond to the eigenvalues
in $\lambda_u$ and $\lambda_l$, respectively.
With Eqs. (D.4) and (D.5), a useful expression can be realized:

$$ (H')^r Z_0 = F_u^{N\times d}\, V^{d\times d} + F_l^{N\times(N-d)}\, W^{(N-d)\times d} . \tag{D.6} $$

Since the elements in $(\lambda_l)^r$ approach 0 as $r \to \infty$, then

$$ (H')^r Z_0 \longrightarrow F_u^{N\times d}\, V^{d\times d} \tag{D.7} $$

as the iteration number r increases. Since $V^{d\times d}$ has full rank, the space spanned
by the column vectors of $F_u^{N\times d} V^{d\times d}$ is the same as that spanned by $F_u^{N\times d}$. Thus,
after many iterations, one approaches an ISUB spanned by the eigenvectors that
correspond to eigenvalues of $H'$ that range between 1 and 2. These eigenvectors are
exactly the same as those of the original matrix H with eigenvalues less than µ. This
derivation is based upon ideas obtained from Ref. [121].
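The decomposition (D.6) and the limit (D.7) can be illustrated with a small synthetic matrix. In the Python sketch below, the dimensions, the spectrum, and the random starting block are all assumptions chosen for illustration; the matrix is built directly from a prescribed F and Λ rather than from any Hamiltonian of this work.

```python
import numpy as np

rng = np.random.default_rng(0)
N, d, r = 30, 5, 200  # assumed toy dimensions and iteration count

# Orthonormal eigenvectors F and a descending spectrum split at 1.
F, _ = np.linalg.qr(rng.standard_normal((N, N)))
lam = np.concatenate([np.linspace(2.0, 1.1, d),       # lambda_u in (1, 2]
                      np.linspace(0.9, 0.0, N - d)])  # lambda_l in [0, 1)
H = (F * lam) @ F.T  # H' = F Lambda F^T, Eq. (D.1)
Z0 = rng.standard_normal((N, d))

def split_blocks(r):
    """Blocks V (d x d) and W ((N-d) x d) of Eq. (D.4) after r iterations."""
    coeff = (lam ** r)[:, None] * (F.T @ Z0)  # Lambda^r F^T Z0
    return coeff[:d], coeff[d:]

V, W = split_blocks(r)
# The W block becomes negligible relative to V, which is Eq. (D.7).
ratio = np.linalg.norm(W) / np.linalg.norm(V)
```

The faster the eigenvalues in $\lambda_l$ decay relative to those in $\lambda_u$, the fewer iterations are needed before `ratio` becomes negligible.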