Using LU Decomposition to Optimize the modconcen.m Routine Matt Tornowske April 1, 2002
Dec 27, 2015
Review of Gaussian Elimination Find the augmented matrix of the system of
linear equations. Find an echelon form of the augmented matrix
using elementary row operations. Find the system of equations corresponding to
the echelon form. Use back substitution to arrive at the solution.
Example using Gaussian Elimination
Solve the following system of linear equations using the methodof Gaussian Elimination:
Starting with the augmented matrix create zeros below the pivot in the first column:
Gaussian Elimination : Echelon Form
At this stage we create a zero only below the pivot:
The corresponding system of equations is:
Gaussian Elimination : Back Substitution
We now solve the system using back substitution.
Since we have x4 = 2 we substitute this into the row above:
Substituting x4 = 2 and x3 = -5 into the first equation:
And we’re left with:
Computational Cost of Gaussian Elimination for an n x n Matrix
For multiplication:
For addition:
Total operations:
For 1532 x 1532 matrix:
Method of LU Decomposition Find the LU decomposition of the
matrix Solve LY=B using forward
substitution Solve UX=Y by back substitution Hopefully, we save computational
time
Finding the LU Factors The LU Factorization is done by finding
the Upper and Lower triangle matrices of an original matrix.
The upper matrix, U, is the reduced echelon form of the original matrix, A.
The lower matrix, L, is the inverse of the elementary matrices obtained from the operations to obtain U.
LU decomposition: Getting U
Solve the following system of equations using LU decomposition:
First reduce the associated matrix into upper triangular form:
LU Decomposition: Elementary Matrices
The elementary matrices that correspond to these row operations are:
The inverses of these matrices are:
LU Decomposition: Solving LUX = B
We now solve the given system LUX = B by solving the two subsystems LY = B and UX = Y:
This lower triangular system has solution y1 = -1, y2 = 7, y3 = 2.
The solution to the upper triangular system is x1 = 5, x2 = -8, x3 = -1, which happens to be the solution to A.
Computational Cost of LU Decomposition for an n x n Matrix
Total number of arithmetic operations to solve a system of equations using LU decomposition is exactly the same as Gaussian Elimination.
The method is the most efficient method available for solving many systems all having the same matrix of coefficients, when the right hand sides are not all available in advance.
Since the factorization only has to be done once, it uses (4n3 – 3n2 – n)/6 operations.
The solutions of the two triangular systems can be carried out in 2n2 – n operations compared to (4n3 + 9n2 – 7n)/6 operations for full Gaussian elimination each time.
Ideally, for 1528 x 1528 matrix, this is a savings of nearly 99.8% per system!
Permutation Matrix Our LU decomposition won’t work
properly if there are any pivots needed We can use the lu() function in Matlab to
create a permutation matrix P to keep track of these pivots
We’ll simply change the algorithm slightly to solve LY=PB instead of LY=B
Implementation in Matlab We use the Matlab lu() function to
give us our L, U, and P matrices We will adjust the code in Jennifer’s
modified concentration routine to use LU factorization.
Modified Concentration Routine with LU Implemented
% do the LU factorization here.[L,U,P]=lu(w);clear w;for m=1:ntimes m rhs = zeros(1528,1); for k = 1:1528 N = nb(k,find((nbor(k,:) == 1))); W = nb(k,find((nbor(k,:) == 3))); S = nb(k,find((nbor(k,:) == 5))); E = nb(k,find((nbor(k,:) == 7))); if(~isempty(N) & ~isempty(W) & ~isempty(S) &
~isempty(E))
In the last outer for block (modifications in red):
Modified Routine, continued
ux = scaledvelocity(k,1); uy = scaledvelocity(k,2); rhs(k,1) = -concentration(k,m)/deltat; else rhs(k,1) = 0; end end %xjen=w\rhs; replace this line with %backward/forward substitution rhs=P*rhs; xjen2=L\rhs; xjen2=U\xjen2;
Modified Routine, continued
concentration(:,m+1)=xjen2;end
Note: We use the left matrix divide instead ofinverse operation. While some improvement canBe gained by performing some inversions outsideof the for loop, the increase in speed can be faroffset in the larger memory requirements.
Performance Gains on Pentium 4 1400 Mhz with 384 MB of RAM
With deltat=0.5, old routine took 73 seconds while the new routine took 11 seconds
Adjusting deltat for 0.25, old = 145 seconds, new = 17 seconds
Deltat for .1, old = 350 seconds, new = 38 seconds