Likelihood Approximation With Parallel Hierarchical Matrices For Large Spatial Datasets
A. Litvinenko, Y. Sun, M. Genton, D. Keyes, CEMSE, KAUST

HIERARCHICAL LIKELIHOOD APPROXIMATION

Goal: To improve estimation of unknown statistical parameters in a spatial soil moisture field.
How?: By reducing the linear algebra cost from $O(n^3)$ to $O(n \log n)$.

Let $Z$ be a mean-zero, stationary and isotropic Gaussian process with a Matérn covariance, observed at $n$ irregularly spaced locations. Let $\mathbf{Z} = (Z(s_1), \ldots, Z(s_n))^T \sim N(0, C(\theta))$, where $\theta \in \mathbb{R}^q$ is an unknown parameter vector of interest, $C_{ij}(\theta) = \mathrm{cov}(Z(s_i), Z(s_j)) = C(\lVert s_i - s_j \rVert, \theta)$, and

$$C(r) := C_\theta(r) = \frac{2\sigma^2}{\Gamma(\nu)} \left(\frac{r}{2\ell}\right)^{\nu} K_\nu\!\left(\frac{r}{\ell}\right), \qquad \theta = (\sigma^2, \nu, \ell)^T,$$

is the Matérn covariance function. The MLE of $\theta$ is obtained by maximizing the Gaussian log-likelihood function:

$$\mathcal{L}(\theta) = -\frac{n}{2}\log(2\pi) - \frac{1}{2}\log\lvert C(\theta)\rvert - \frac{1}{2}\mathbf{Z}^T C(\theta)^{-1}\mathbf{Z}.$$

We approximate $C \approx \tilde{C}$ in the H-matrix format with cost and storage $O(kn \log n)$, $k \ll n$, and obtain a cheap approximation $\mathcal{L}(\theta) \approx \tilde{\mathcal{L}}(\theta; k)$.

| Operation | Sequential complexity | Parallel complexity (shared memory, $p$ processors) |
| building($\tilde{C}$) | $O(n \log n)$ | $\frac{O(n \log n)}{p} + O(\lvert V(T) \setminus L(T) \rvert)$ |
| storage($\tilde{C}$) | $O(kn \log n)$ | $O(kn \log n)$ |
| $\tilde{C}z$ | $O(kn \log n)$ | $\frac{O(kn \log n)}{p} + \frac{n}{\sqrt{p}}$ |
| H-Cholesky | $O(k^2 n \log^2 n)$ | $\frac{O(n \log n)}{p} + O\!\left(\frac{k^2 n \log^2 n}{n^{1/d}}\right)$ |

Figure: Daily soil moisture, Mississippi basin.
Figure: Box-plots of the covariance length $\ell$ for different H-matrix ranks $k \in \{3, 7, 9\}$, $\ell = 0.0334$. Axes: H-matrix rank; cov. length.
Figure: $\log\lvert\tilde{C}\rvert$, $z^T \tilde{C}^{-1} z$ and $-\tilde{\mathcal{L}}$ plotted against $\ell$, for $\nu = 0.325$, $\sigma^2 = 0.98$. Moisture, $n = 66049$, rank $k = 11$.

PARALLEL HIERARCHICAL MATRICES (HACKBUSCH, KRIEMANN '05)

Advantages of approximating $C$ by $\tilde{C}$: the H-approximation is cheap; storage and matrix-vector product cost $O(kn \log n)$; LU and inverse cost $O(k^2 n \log^2 n)$; efficient parallel implementations exist.
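As a point of reference for the costs quoted above, the exact Gaussian log-likelihood can be evaluated with a dense Cholesky factorization at $O(n^3)$ cost. The sketch below does this in plain Python for the special case $\nu = 1/2$, where the Matérn covariance in the parameterization above reduces to $C(r) = \sigma^2 \exp(-r/\ell)$ because $K_{1/2}(x) = \sqrt{\pi/(2x)}\,e^{-x}$. It is a dense baseline only, not the H-matrix method of the poster and not HLIBPro code. The function names (`exp_cov`, `cov_matrix`, `cholesky`, `log_likelihood`) and the sample points are hypothetical and chosen for illustration.

```python
import math

def exp_cov(r, sigma2, ell):
    """Matern covariance for nu = 1/2: sigma2 * exp(-r / ell)."""
    return sigma2 * math.exp(-r / ell)

def cov_matrix(points, sigma2, ell):
    """Dense covariance matrix with entries C(||s_i - s_j||)."""
    return [[exp_cov(math.dist(p, q), sigma2, ell) for q in points]
            for p in points]

def cholesky(a):
    """Lower-triangular factor low with low * low^T = a (dense, O(n^3))."""
    n = len(a)
    low = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = a[i][j] - sum(low[i][m] * low[j][m] for m in range(j))
            low[i][j] = math.sqrt(s) if i == j else s / low[j][j]
    return low

def log_likelihood(z, points, sigma2, ell):
    """Gaussian log-likelihood -n/2 log(2 pi) - 1/2 log|C| - 1/2 z^T C^{-1} z."""
    n = len(z)
    low = cholesky(cov_matrix(points, sigma2, ell))
    # log|C| = 2 * sum_i log(low_ii), since C = low * low^T
    log_det = 2.0 * sum(math.log(low[i][i]) for i in range(n))
    # Forward substitution: solve low * v = z, so that z^T C^{-1} z = v^T v
    v = []
    for i in range(n):
        v.append((z[i] - sum(low[i][m] * v[m] for m in range(i))) / low[i][i])
    quad = sum(x * x for x in v)
    return -0.5 * n * math.log(2.0 * math.pi) - 0.5 * log_det - 0.5 * quad

if __name__ == "__main__":
    # Hypothetical toy data: three locations in the plane and their observations
    points = [(0.0, 0.0), (0.5, 0.0), (0.0, 1.0)]
    z = [0.3, -0.1, 0.8]
    print(log_likelihood(z, points, sigma2=0.98, ell=0.64))
```

In the H-matrix setting, the dense factor is replaced by an approximate H-Cholesky factor, so that both the log-determinant and the quadratic form are obtained at the almost linear cost listed in the table.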
Figure: (1st) Matérn H-matrix approximation for the moisture example, $n = 8000$, $\varepsilon = 10^{-3}$, $\ell = 0.64$, $\nu = 0.325$, $\sigma^2 = 0.98$; 29.3 MB vs. 488.3 MB for dense storage, set-up time 0.4 sec. (2nd) Cholesky factor $L$, with accuracy $\varepsilon = 10^{-8}$ in each block; 4.8 sec., storage 52.8 MB. (3rd) Distribution across $p$ processors. (4th) Kronecker product of H-matrices, $n = 381K$. (5th) Discretization of the Mississippi basin, $[-84.8^\circ, -72.9^\circ] \times [32.446^\circ, 43.4044^\circ]$.
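The dense storage figure in the caption can be checked with a short calculation. The sketch below assumes 8-byte double-precision entries and 1 MB = $2^{20}$ bytes; neither is stated on the poster, but together they reproduce the quoted 488.3 MB for $n = 8000$. The helper names are hypothetical.

```python
def dense_storage_mb(n, bytes_per_entry=8):
    """Storage of a dense n x n matrix in MB (assumed: 1 MB = 2**20 bytes)."""
    return n * n * bytes_per_entry / 2**20

def compression_rate(compressed_mb, dense_mb):
    """Fraction of the dense storage saved by the compressed format."""
    return 1.0 - compressed_mb / dense_mb

if __name__ == "__main__":
    dense = dense_storage_mb(8000)
    print(f"dense: {dense:.1f} MB")                              # dense: 488.3 MB
    print(f"saved: {100 * compression_rate(29.3, dense):.1f} %")  # saved: 94.0 %
```

Under these assumptions, the H-matrix with $\varepsilon = 10^{-3}$ needs about 6% of the dense storage for this example.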
NUMERICAL EXAMPLES

H-matrix approximation, $\nu = 0.5$, domain $G = [0, 1]^2$, $\lVert \tilde{C}_{(0.25,\,0.75)} \rVert_2 = \{212, 568\}$, $n = 16049$. KLD denotes the Kullback-Leibler divergence.

| $k$ | KLD, $\ell = 0.25$ | KLD, $\ell = 0.75$ | $\lVert C - \tilde{C} \rVert_2$, $\ell = 0.25$ | $\lVert C - \tilde{C} \rVert_2$, $\ell = 0.75$ | $\lVert \tilde{C}\tilde{C}^{-1} - I \rVert_2$, $\ell = 0.25$ | $\lVert \tilde{C}\tilde{C}^{-1} - I \rVert_2$, $\ell = 0.75$ |
| 10 | 2.6e-3 | 0.2 | 7.7e-4 | 7.0e-4 | 6.0e-2 | 3.1 |
| 50 | 3.4e-13 | 5e-12 | 2.0e-13 | 2.4e-13 | 4e-11 | 2.7e-9 |

Computing time and number of iterations for maximization of the log-likelihood $\tilde{\mathcal{L}}(\theta; k)$, $n = 66049$.

| $k$ | size, GB | $\tilde{C}$ set-up time, s | compute $\tilde{\mathcal{L}}$, s | maximizing, s | # iters |
| 10 | 1 | 7 | 115 | 1994 | 13 |
| 20 | 1.7 | 11 | 370 | 5445 | 9 |
| dense | 38 | 42 | 657 | $\infty$ | - |

Moisture data. We used adaptive rank arithmetic with $\varepsilon = 10^{-4}$ for each block of $\tilde{C}$ and $\varepsilon = 10^{-8}$ for each block of $\tilde{C}^{-1}$. The number of processing cores is 40.

| $n$ | $\tilde{C}$: compr. rate, % | $\tilde{C}$: time, sec. | $\tilde{C}$: size, MB | $\tilde{L}\tilde{L}^T$: time, sec. | $\tilde{L}\tilde{L}^T$: size, MB | $\lVert I - (\tilde{L}\tilde{L}^T)^{-1}\tilde{C} \rVert_2$ | inverse: time, sec. | inverse: size, MB | $\lVert I - \tilde{C}^{-1}\tilde{C} \rVert_2$ |
| 10000 | 86 | 0.9 | 106 | 4.1 | 109 | 7.7e-6 | 44 | 230 | 7.8e-5 |
| 30000 | 92.5 | 4.3 | 515 | 25 | 557 | 1.1e-3 | 316 | 1168 | 1.1e-1 |

$n = 512K$, accuracy inside each block $10^{-8}$: matrix set-up 261 sec., compression rate 99.98% (0.4 GB against 2006 GB). H-LU is done in 843 sec. and required 5.8 GB RAM; inversion LU error $2 \cdot 10^{-3}$.

Figure: (1st) $-\mathcal{L}$ vs. $\nu$; (2nd) with nuggets $\{0.01, 0.005, 0.001\}$ for Gaussian covariance, $n = 2000$, $k = 14$, $\sigma^2 = 1$; (3rd) zoom of the 2nd figure; (4th) box-plots for $\nu$ vs. number of locations $n$.

REFERENCES AND ACKNOWLEDGEMENTS

[1] B. N. Khoromskij, A. Litvinenko, H. G. Matthies, Application of hierarchical matrices for computing the Karhunen-Loève expansion, Computing, Vol. 84, Issue 1-2, pp. 49-67, 2008.
[2] Y. Sun, M. Stein, Statistically and computationally efficient estimating equations for large spatial datasets, JCGS, 2016.
[3] A. Litvinenko, M. Genton, Y. Sun, D. Keyes, H-matrix techniques for approximating large covariance matrices and estimating its parameters, PAMM 16 (1), 731-732, 2016.
[4] W. Nowak, A. Litvinenko, Kriging and spatial design accelerated by orders of magnitude: combining low-rank covariance approximations with FFT-techniques, J. Mathematical Geosciences, Vol. 45, N4, pp. 411-435, 2013.

Work supported by SRI-UQ and ECRC, KAUST. Thanks to Ronald Kriemann for HLIBPro.