Ill-conditioned matrices symbolic inversion in Desktop Grid of CAS Maxima services.
Vladimir V. Voloshinov
Center of Grid-technologies and Distributed Computing, Institute of System Analysis RAS, http://dcs.isa.ru
Moscow, 2009. Supported by RFBR, grant #08-07-00430-a.
# 4 CAS (Computer Algebra System) Maxima (1)
Started at the Massachusetts Institute of Technology by Prof. William Schelter. Since 1998, distributed under the GNU Public License.
The core is GCL (GNU Common Lisp); runs on Windows and Linux; single-threaded Lisp interpreter.
Has almost the same basic «symbolic» capabilities as Maple and Mathematica: differentiation & integration, series, ODE solving, matrices and linear algebra, polynomials, sets, lists, tensors...
Hilbert matrices
A well-known example of ill-conditioned matrices: the condition number of H_N grows exponentially w.r.t. N.
Matrices H_N of the form H_N = {h_{m,n}}, m, n = 1, …, N, where
  h_{m,n} = 1/(m+n−1) = ∫₀¹ t^(m−1)⋅t^(n−1) dt,
  cond(H_N) = ∥H_N∥⋅∥H_N⁻¹∥ ~ e^(3.5⋅N)
Values of cond(H_N) for some N (calculated exactly in Maxima):
N     cond(H_N)
10    1.6⋅10^13
50    1.5⋅10^74
70    5.5⋅10^104
100   4.1⋅10^150
150   1.2⋅10^227
«Traditionally», «well-conditioned» matrices should have cond less than 1000.
H_N is the Gram matrix of the basis polynomials in L2[0,1].
∥A_{N×N}∥ = ( Σ_{m,n=1}^{N} A_{m,n}² )^{1/2}
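The exponential growth of cond(H_N) can be checked outside Maxima as well. A minimal Python sketch (helper names `hilbert`, `invert_exact`, `frobenius` are illustrative, not from the talk) builds H_N over the rationals, inverts it exactly by Gauss-Jordan elimination, and estimates cond(H_10) in the Frobenius norm defined above:

```python
from fractions import Fraction
from math import sqrt

def hilbert(n):
    """Hilbert matrix H_N with exact rational entries h_{m,n} = 1/(m+n-1)."""
    return [[Fraction(1, m + k + 1) for k in range(n)] for m in range(n)]

def invert_exact(a):
    """Gauss-Jordan inversion over the rationals: no rounding error at all."""
    n = len(a)
    # augment [A | E_N] and reduce to [E_N | A^-1]
    aug = [row[:] + [Fraction(i == j) for j in range(n)] for i, row in enumerate(a)]
    for col in range(n):
        # any nonzero pivot suffices in exact arithmetic
        piv = next(r for r in range(col, n) if aug[r][col] != 0)
        aug[col], aug[piv] = aug[piv], aug[col]
        p = aug[col][col]
        aug[col] = [x / p for x in aug[col]]
        for r in range(n):
            if r != col and aug[r][col] != 0:
                f = aug[r][col]
                aug[r] = [x - f * y for x, y in zip(aug[r], aug[col])]
    return [row[n:] for row in aug]

def frobenius(a):
    """Frobenius norm, as on the slide: (sum of squared entries)^(1/2)."""
    return sqrt(sum(float(x) * float(x) for row in a for x in row))

H = hilbert(10)
Hinv = invert_exact(H)           # exact: H * Hinv is literally the unit matrix
cond = frobenius(H) * frobenius(Hinv)  # ~1.6e13, same order as the table entry for N=10
```

Exact arithmetic is the whole point: in double precision the inversion of H_N degrades quickly, while the rational result is exact for any N (at the cost of huge integers).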
# 10
Let M be an [N×N] matrix with LU-factorization {L, U, P}: L lower-triangular, U upper-triangular, P a permutation matrix, L⋅U = P⋅M. Let E_N be the unit matrix. To obtain M⁻¹, solve (by forward and backward substitution): L⋅U⋅X = P⋅E_N ⇒ X = M⁻¹.
Maxima has functions lu_factor(M) (Gaussian elimination) and lu_backsub(LU, B) (LU is the value returned by lu_factor(M)) to solve L⋅U⋅X = P⋅B for any rectangular matrix B[N×M]:
  invert_by_lu(M) := lu_backsub(lu_factor(M), E_N).
If E_N is sliced vertically into K submatrices
  E_N = [ E_N^(1:n₁), E_N^(n₁+1:n₂), …, E_N^(n_{K−1}+1:n_K) ],  0 = n₀ < n₁ < n₂ < … < n_K = N,
then a parallel call of lu_backsub(LU, E_N^(n_{k−1}+1:n_k)) gives K sets of inverse columns M⁻¹[n_{k−1}+1:n_k].
K = N means parallel calculation of the columns of M⁻¹.
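The same scheme can be sketched in Python with exact rationals. Here `lu_factor` and `lu_backsub` are simplified stand-ins for the Maxima functions of the same name (not their real implementations), and `invert_by_lu` slices E_N into K contiguous column blocks solved concurrently:

```python
from concurrent.futures import ThreadPoolExecutor
from fractions import Fraction

def lu_factor(m):
    """LU with partial pivoting: returns (LU, perm); LU packs unit-lower L and U."""
    n = len(m)
    lu = [[Fraction(x) for x in row] for row in m]
    perm = list(range(n))
    for c in range(n):
        piv = max(range(c, n), key=lambda r: abs(lu[r][c]))
        lu[c], lu[piv] = lu[piv], lu[c]
        perm[c], perm[piv] = perm[piv], perm[c]
        for r in range(c + 1, n):
            lu[r][c] /= lu[c][c]
            for k in range(c + 1, n):
                lu[r][k] -= lu[r][c] * lu[c][k]
    return lu, perm

def lu_backsub(fac, b_cols):
    """Solve L.U.x = P.b for every column b in b_cols (forward + backward substitution)."""
    lu, perm = fac
    n = len(lu)
    out = []
    for b in b_cols:
        pb = [b[perm[i]] for i in range(n)]
        y = [Fraction(0)] * n
        for i in range(n):                # forward: L.y = P.b
            y[i] = pb[i] - sum(lu[i][j] * y[j] for j in range(i))
        x = [Fraction(0)] * n
        for i in reversed(range(n)):      # backward: U.x = y
            x[i] = (y[i] - sum(lu[i][j] * x[j] for j in range(i + 1, n))) / lu[i][i]
        out.append(x)
    return out

def invert_by_lu(m, k_parts=4):
    """Factor once, then solve K vertical slices of E_N 'in parallel'."""
    n = len(m)
    fac = lu_factor(m)
    cols = [[Fraction(i == j) for i in range(n)] for j in range(n)]  # columns of E_N
    step = -(-n // k_parts)                                          # ceil(n / K)
    blocks = [cols[s:s + step] for s in range(0, n, step)]
    with ThreadPoolExecutor(k_parts) as pool:
        parts = pool.map(lambda blk: lu_backsub(fac, blk), blocks)
    inv_cols = [c for part in parts for c in part]
    return [[inv_cols[j][i] for j in range(n)] for i in range(n)]
```

In the real Desktop Grid setting each block would go to a separate Maxima service instead of a local thread; the slicing and reassembly logic is the same.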
Possible speedup reasoning for «LU-inversion» of HN (1)
# 11
Possible speedup reasoning for «LU-inversion» of HN (2)
Durations of the two phases of inversion: LU-factorization (Gaussian elimination) and calculation of the inverse matrix's columns.
Load balancing implementation at Phase II (inv. columns calc)
(Diagram: the client application keeps column-block tasks k1, … in a thread-safe task queue; worker threads take tasks from the queue, send them to Maxima services, «4. return» the result, then «5. remove(k1)» the completed task.)
It turned out that this heuristic yields a Phase II schedule at most ~1.5 times longer than the optimal one (in the deterministic, post factum task-assignment problem formulation).
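The greedy scheme above can be sketched with Python's standard thread-safe queue; `run_workers` and `worker_fn` are illustrative names, and a real deployment would forward each task to a remote Maxima service instead of calling a local function:

```python
import queue
import threading

def run_workers(tasks, worker_fn, n_workers=4):
    """Greedy load balancing: workers pull the next pending task from a shared
    thread-safe queue, so faster workers automatically take more tasks."""
    q = queue.Queue()
    for t in tasks:
        q.put(t)
    results = {}
    lock = threading.Lock()

    def worker():
        while True:
            try:
                t = q.get_nowait()      # take next task, or stop when queue is drained
            except queue.Empty:
                return
            r = worker_fn(t)            # stand-in for: send task to a Maxima service, wait
            with lock:
                results[t] = r          # record result; the task is gone from the queue
            q.task_done()

    threads = [threading.Thread(target=worker) for _ in range(n_workers)]
    for th in threads:
        th.start()
    for th in threads:
        th.join()
    return results
```

No schedule is computed in advance: the queue itself balances the load, which is exactly why the result can lag behind the post factum optimal schedule.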
# 19
Results of tests of distributed inversion via LU-decomposition
«Collection» of the inverse matrix's columns (as files) takes ~2.5% of the total scenario duration (including network data transmission).
Speedup of invert_by_lu @ DesktopGrid
(Chart: duration, sec, vs. experiment number for five values of N; series: inv@G, @G+collect.)
Observed speedup: 220% – 230%.
# 20
Block processing is more flexible for a subsequent parallel and recursive algorithm, because the calculation of A⁻¹ and S⁻¹ and the matrix multiplications may be parallelized as well.
Speedup evaluation is on the next slide.
Matrix inversion by Schur complement (1)
There is another inversion approach, based on «block decomposition» and the Schur complement.
# 21
Let M[N×N] be divided into four [N/2×N/2] blocks, M = [[A, U], [V, B]]. The cost of «parallel» calculation of the inverse matrix's blocks (the symbol «||» marks steps performed in parallel) may be evaluated as follows.
Matrix inversion by Schur complement (2)
A⁻¹ => ~O((N/2)³)
VA⁻¹ || A⁻¹U => ~O((N/2)³)
VA⁻¹U = (VA⁻¹)U = V(A⁻¹U) => ~O((N/2)³)
S = B − VA⁻¹U => ~O((N/2)²)
S⁻¹ => ~O((N/2)³)
S⁻¹(VA⁻¹) || (A⁻¹U)S⁻¹ => ~O((N/2)³)
A⁻¹US⁻¹VA⁻¹ = (A⁻¹US⁻¹)(VA⁻¹) => ~O((N/2)³)
A⁻¹ + A⁻¹US⁻¹VA⁻¹ => ~O((N/2)²)
So, speedup: 4N³ / (3N³ + 2N²) ≈ 4/3 (≈130% for large N).
S = B − VA⁻¹U (the Schur complement),
M⁻¹ = [ A⁻¹ + A⁻¹US⁻¹VA⁻¹   −A⁻¹US⁻¹ ]
      [ −S⁻¹VA⁻¹             S⁻¹      ]
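The steps above can be sketched in Python with exact rational arithmetic. Block names follow the slide (M = [[A, U], [V, B]]), while `mat`, `mul`, `add`, and `inv` are hypothetical helpers, not part of the talk's software:

```python
from fractions import Fraction

def mat(rows):
    return [[Fraction(x) for x in row] for row in rows]

def mul(a, b):
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]

def add(a, b, sign=1):
    return [[x + sign * y for x, y in zip(ra, rb)] for ra, rb in zip(a, b)]

def inv(a):
    """Plain Gauss-Jordan inverse, exact over the rationals."""
    n = len(a)
    aug = [row[:] + [Fraction(i == j) for j in range(n)] for i, row in enumerate(a)]
    for c in range(n):
        p = next(r for r in range(c, n) if aug[r][c] != 0)
        aug[c], aug[p] = aug[p], aug[c]
        d = aug[c][c]
        aug[c] = [x / d for x in aug[c]]
        for r in range(n):
            if r != c and aug[r][c] != 0:
                f = aug[r][c]
                aug[r] = [x - f * y for x, y in zip(aug[r], aug[c])]
    return [row[n:] for row in aug]

def schur_inverse(A, U, V, B):
    """M = [[A, U], [V, B]];  S = B - V.A^-1.U;
    M^-1 = [[A^-1 + A^-1.U.S^-1.V.A^-1, -A^-1.U.S^-1],
            [-S^-1.V.A^-1,              S^-1        ]]"""
    Ai = inv(A)
    VAi, AiU = mul(V, Ai), mul(Ai, U)           # these two may run in parallel
    S = add(B, mul(VAi, U), sign=-1)            # Schur complement
    Si = inv(S)
    SiVAi, AiUSi = mul(Si, VAi), mul(AiU, Si)   # parallel as well
    top_left = add(Ai, mul(AiUSi, VAi))
    top_right = [[-x for x in row] for row in AiUSi]
    bot_left = [[-x for x in row] for row in SiVAi]
    return top_left, top_right, bot_left, Si
```

Each `mul` and `inv` call corresponds to one cubic step in the cost list above; a recursive version would invert A and S by the same routine on N/4 blocks.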
# 22
Speedup of "four-block" inversion (210% - 230%)
(Chart: duration, sec, vs. 10 experiments {N, dim(A)}; series: inv, Schur.)
Preliminary results of Schur complement approach (standalone «simulation»)