College of Saint Benedict and Saint John's University
DigitalCommons@CSB/SJU
Honors Theses, 1963-2015, Honors Program
4-2015

Parallel Preconditioners for Finite Element Computations
Emily Furst, College of Saint Benedict/Saint John's University

Follow this and additional works at: https://digitalcommons.csbsju.edu/honors_theses
Part of the Computer Sciences Commons, and the Mathematics Commons

Recommended Citation: Furst, Emily, "Parallel Preconditioners for Finite Element Computations" (2015). Honors Theses, 1963-2015. 66. https://digitalcommons.csbsju.edu/honors_theses/66

This Thesis is brought to you for free and open access by DigitalCommons@CSB/SJU. It has been accepted for inclusion in Honors Theses, 1963-2015 by an authorized administrator of DigitalCommons@CSB/SJU. For more information, please contact [email protected].
In order to solve the one-dimensional heat equation using the Crank-Nicolson scheme of the finite
element method, a Matlab file was modified to fit our specific problem [6]. This file used two
other Matlab files obtained from the same source: one performed LU-factorization of tridiagonal
matrices, and the other solved linear systems whose coefficient matrices had been LU-factored.
The approximation used by the Crank-Nicolson scheme is
$$-\frac{\alpha}{2\Delta x^2}\,u^{k+1}_{i-1} + \left(\frac{1}{\Delta t} + \frac{\alpha}{\Delta x^2}\right)u^{k+1}_{i} - \frac{\alpha}{2\Delta x^2}\,u^{k+1}_{i+1} = \frac{\alpha}{2\Delta x^2}\,u^{k}_{i-1} + \left(\frac{1}{\Delta t} - \frac{\alpha}{\Delta x^2}\right)u^{k}_{i} + \frac{\alpha}{2\Delta x^2}\,u^{k}_{i+1}$$
where for our heat equation α = 1/10, all values on the left side of the equation are unknown, and
values on the right side are known. This equation is converted into the following linear system
$$\begin{pmatrix} a & b & 0 & \cdots & 0 \\ c & a & b & \cdots & 0 \\ 0 & \ddots & \ddots & \ddots & \vdots \\ \vdots & & \ddots & \ddots & b \\ 0 & 0 & \cdots & c & a \end{pmatrix}\begin{pmatrix} u^{k+1}_{1} \\ u^{k+1}_{2} \\ \vdots \\ u^{k+1}_{N} \end{pmatrix} = \begin{pmatrix} d_1 \\ d_2 \\ \vdots \\ d_N \end{pmatrix}$$
where
$$a = \frac{1}{\Delta t} + \frac{\alpha}{\Delta x^2}, \qquad b = c = -\frac{\alpha}{2\Delta x^2}, \qquad d_i = -c\,u^{k}_{i-1} + \left(\frac{1}{\Delta t} + b + c\right)u^{k}_{i} - b\,u^{k}_{i+1}$$
[7]. The Matlab program then solves this system of equations using LU-factorization. The system
is solved repeatedly, once per time step, until the desired time is reached. This program was run
in the same manner as the finite difference program: the same values of ∆x and ∆t were used,
and the point u(3, 10) was recorded for each discretization.
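The modified Matlab file is not reproduced here, but the computation it performs can be sketched compactly. The following Python sketch (an illustration, not the thesis code) assumes the setup used throughout this section, namely a rod of length 6, α = 1/10, homogeneous Dirichlet boundaries, and the initial condition u(x, 0) = 6x − x², and replaces the two LU-factorization files with an inline Thomas-algorithm solve, which is LU-factorization specialized to tridiagonal systems:

```python
def crank_nicolson_heat(L=6.0, alpha=0.1, dx=0.25, dt=0.25, t_final=10.0):
    """Solve u_t = alpha * u_xx on [0, L] with u(0,t) = u(L,t) = 0 and
    u(x, 0) = 6x - x^2, using the Crank-Nicolson scheme: one tridiagonal
    (Thomas-algorithm) solve per time step."""
    N = int(round(L / dx)) - 1              # number of interior nodes
    x = [(i + 1) * dx for i in range(N)]
    u = [6.0 * xi - xi * xi for xi in x]    # initial condition

    a = 1.0 / dt + alpha / dx ** 2          # diagonal entry
    b = -alpha / (2.0 * dx ** 2)            # off-diagonal entries (b = c)

    for _ in range(int(round(t_final / dt))):
        # Known right-hand side: d_i = -c u_{i-1} + (1/dt + b + c) u_i - b u_{i+1}
        d = []
        for i in range(N):
            um = u[i - 1] if i > 0 else 0.0       # Dirichlet boundary
            up = u[i + 1] if i < N - 1 else 0.0   # Dirichlet boundary
            d.append(-b * um + (1.0 / dt + 2.0 * b) * u[i] - b * up)
        # Thomas algorithm: forward elimination ...
        cp = [0.0] * N
        dp = [0.0] * N
        cp[0] = b / a
        dp[0] = d[0] / a
        for i in range(1, N):
            m = a - b * cp[i - 1]
            cp[i] = b / m
            dp[i] = (d[i] - b * dp[i - 1]) / m
        # ... then back substitution gives the new interior values.
        u[N - 1] = dp[N - 1]
        for i in range(N - 2, -1, -1):
            u[i] = dp[i] - cp[i] * u[i + 1]
    return x, u
```

With ∆x = ∆t = 1/4 this should reproduce the corresponding u(3, 10) entry of the finite element results (≈ 7.034); because the scheme is implicit, no stability restriction ties ∆t to ∆x.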
5.5 Developing a Method for Updating the deal.II Software
After developing an understanding of finite methods for solving partial differential equations, a
method for updating the deal.II software was developed. To begin, an example using Epetra
objects, an ML preconditioner, and an AztecOO solver was taken from the Trilinos examples
directory. The example generated a three dimensional Laplacian matrix and solved a test linear
problem obtained from the gallery object. The problem size could be changed to any value specified
(line 61 of MLAztecOO.cpp in Appendix) so long as the value was a perfect cube (as the problem
was dealing with a three dimensional matrix). The next step was to modify the program so that a
Belos solver was used instead of an AztecOO solver. This was a relatively simple replacement using
the Trilinos documentation for Belos available online as a guide. The next step was to determine
how to translate Epetra objects into Xpetra objects. Xpetra had a method to accomplish this
for vector objects. The vectors simply had to be wrapped as reference-counted pointers (RCPs)
before being passed into the method (lines 115-122 of MueLuBelos.cpp in Appendix). However,
the translation of the matrix was a bit more involved. First, the graph of the matrix (containing
the information describing locations of nonzero values) had to be copied into an empty Xpetra
object. Then, the values had to be copied one row at a time into the Xpetra object (lines 100-
111 of MueLuBelos.cpp in Appendix). Once this was accomplished, a MueLu preconditioner and
Belos solver could be set up to solve the linear problem. After the three examples were working
correctly, a timer was added to output the total runtime and solving time for each example. The
examples were named MLAztecOO.cpp, MLBelos.cpp, and MueLuBelos.cpp.
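The C++ sources live in the Appendix; to illustrate just the translation pattern, here is a plain-Python analogue (not Trilinos code; the row-of-dicts representation is a stand-in for a sparse matrix class) of the two-phase copy described above: first the graph of nonzero locations, then the values, one row at a time.

```python
def translate_matrix(src_rows):
    """Copy a sparse matrix, stored as one {column: value} dict per row,
    into a new matrix the way MueLuBelos.cpp translates Epetra to Xpetra:
    first the graph (nonzero locations), then the values, row by row."""
    # Phase 1: copy the graph describing where the nonzero values live.
    graph = [set(row) for row in src_rows]
    # Phase 2: copy the values one row at a time -- the sequential loop
    # that dominates setup time as the problem size grows.
    dst_rows = [dict() for _ in src_rows]
    for i, cols in enumerate(graph):
        for col in cols:
            dst_rows[i][col] = src_rows[i][col]
    return dst_rows

# Small example: a 1D Laplacian stencil.
n = 5
laplacian = [{i: 2.0, **({i - 1: -1.0} if i > 0 else {}),
              **({i + 1: -1.0} if i < n - 1 else {})} for i in range(n)]
copied = translate_matrix(laplacian)
```

Because every row is visited in a single sequential loop, the cost grows linearly with the number of rows, which is the source of the setup overhead discussed in the results below.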
In order to test these examples, the following problem sizes were selected: 50³, 120³, 190³, 260³,
330³, and 400³. These were selected based on initial time trials to determine a range of problem
sizes that would yield a wide range of total-time results. The examples were compiled using a
basic Makefile obtained with the MLAztecOO.cpp file from the Trilinos examples directory. On
each problem size, each example was run 10 times on a node of Melchior. For each problem size,
two scatter plots, one of total time and one of solve time, were then created from these results.
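The trials were launched by hand; to make the procedure concrete, a small harness along these lines could automate them. This is an illustrative sketch, not part of the thesis tooling: the binary path and the command-line size argument are hypothetical (the actual examples hard-code the size at line 61 of MLAztecOO.cpp), and only the perfect-cube constraint mirrors the real setup.

```python
import subprocess
import time

def is_perfect_cube(n):
    """Return True if n == k**3 for some positive integer k."""
    k = round(n ** (1.0 / 3.0))
    return k > 0 and k ** 3 == n

def run_trials(binary, size, trials=10):
    """Run `binary size` repeatedly, returning wall-clock times in seconds.

    Both the binary path and the size argument are hypothetical stand-ins
    for the hand-run Trilinos examples described above."""
    if not is_perfect_cube(size):
        raise ValueError("problem size must be a perfect cube")
    times = []
    for _ in range(trials):
        start = time.perf_counter()
        subprocess.run([binary, str(size)], check=True)
        times.append(time.perf_counter() - start)
    return times
```

For example, run_trials("./MLAztecOO", 50 ** 3) would yield the ten total-time samples behind one column of a scatter plot.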
6 Results
6.1 One-Dimensional Heat Equation Results
6.1.1 Analytical Solution
Figure 1: An overlay of the graph of the computed analytical solution at time t = 0 and the graph of the specified initial conditions, u(x, 0) = 6x − x².
Figure 2: The graph of the computed analytical solution at time t = 10.
Figure 3: The graph of the computed analytical solution over the period of time t = 0 to t = 25.
By analyzing Figure 1, it can be seen that the analytical solution to the one dimensional heat
equation obtained using separation of variables,
$$u(x, t) = \sum_{n=0}^{\infty} \frac{288}{(2n+1)^3\pi^3}\, e^{-\frac{(2n+1)^2\pi^2 t}{360}} \sin\!\left(\frac{(2n+1)\pi x}{6}\right)$$
is correct at time t = 0. Further, by looking at both Figure 2 and Figure 3, it can be seen that
the function is behaving as expected. That is, as there is no heat source or sink, the heat along
the rod is expected to dissipate over time.
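For t > 0 the exponential factor makes the series converge very quickly, so a short partial sum is enough for numerical work. As a sketch (not thesis code), the reference value u(3, 10) used in the tables below can be reproduced directly from the series:

```python
import math

def u_analytical(x, t, terms=50):
    """Partial sum of the separation-of-variables series for the 1D heat
    equation on [0, 6] with alpha = 1/10, u(0,t) = u(6,t) = 0, and
    u(x, 0) = 6x - x^2."""
    total = 0.0
    for n in range(terms):
        m = 2 * n + 1                          # only odd modes survive
        coeff = 288.0 / (m ** 3 * math.pi ** 3)
        decay = math.exp(-m ** 2 * math.pi ** 2 * t / 360.0)
        total += coeff * decay * math.sin(m * math.pi * x / 6.0)
    return total
```

Here u_analytical(3, 10) agrees with the value 7.03211 used as the reference line in Figures 11-13.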
6.1.2 Finite Difference Results
Figure 4: The graph of the solution computed using the FTCS scheme with ∆x = 1 and ∆t = 1 at time t = 10.

Figure 5: The graph of the solution computed using the FTCS scheme with ∆x = 1/2 and ∆t = 1 at time t = 10.
Figure 6: The graph of the solution computed using the FTCS scheme with ∆x = 1/4 and ∆t = 1/4 at time t = 10.

Figure 7: The graph of the solution computed using the FTCS scheme with ∆x = 1/8 and ∆t = 1/16 at time t = 10.

Figure 8: The graph of the solution computed using the FTCS scheme with ∆x = 1/16 and ∆t = 1/64 at time t = 10.
Figure 9: The graph of the solution computed using the FTCS scheme with ∆x = 1/32 and ∆t = 1/256 at time t = 10.

Figure 10: The graph of the solution computed using the FTCS scheme with ∆x = 1/64 and ∆t = 1/1024 at time t = 10.
Figure 11: The graph compares the values of the analytical solution and the values of the solution obtained using the finite difference method at x = 3 and t = 10, i.e. u(3, 10). The constant line is the value obtained from the analytical solution, and the line decreasing in value corresponds to the values obtained from the finite difference method for decreasing ∆x.
Table 1: Results of Finite Difference

    ∆x          ∆t          u(3, 10)
    1           1           7.11653
    1/2         1           7.08331
    1/4         1/4         7.08334
    1/8         1/16        7.06539
    1/16        1/64        7.05089
    1/32        1/256       7.04207
    1/64        1/1024      7.03723
    Analytical  -           7.03211
Having verified the analytical solution to the one dimensional heat equation, the accuracy
of the FTCS scheme of the finite difference method can be analyzed. Figures 4-10 show that
as both the number of time steps and the number of points on the problem space increase, the
approximation of the method converges to the analytical solution. It should be noted that this
method does not maintain zero temperature at the endpoints. However, as the problem is further
discretized, the end point values do tend back towards zero. Figure 11 shows this convergence by
plotting the values for u(3, 10) obtained from the FTCS scheme against the value obtained from the
analytical solution. While the approximation achieved at ∆x = 1/64 and ∆t = 1/1024 is very close to
the analytical solution (roughly 0.005 off), it was not possible to refine the problem further and get a
better approximation without causing the Mathematica software to crash. Thus, while the FTCS
scheme of the finite difference method is simple and easy to implement, it is not a realistic method
when either a large problem is being solved or a very close approximation is required.
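For comparison with the figures above, the FTCS update is simply u(i, k+1) = u(i, k) + r(u(i+1, k) − 2u(i, k) + u(i−1, k)) with r = α∆t/∆x². The Python sketch below (an illustration, not the thesis's Mathematica code, and unlike it, this version pins the endpoints to zero) also checks the explicit scheme's stability requirement r ≤ 1/2, which every (∆x, ∆t) pair in Table 1 satisfies with r ≤ 0.4. That requirement is why ∆t had to shrink quadratically as ∆x was refined.

```python
def ftcs_heat(L=6.0, alpha=0.1, dx=0.5, dt=1.0, t_final=10.0):
    """Forward-time, centered-space (FTCS) scheme for u_t = alpha * u_xx
    on [0, L] with u(0,t) = u(L,t) = 0 and u(x, 0) = 6x - x^2."""
    r = alpha * dt / dx ** 2
    if r > 0.5:
        raise ValueError("FTCS unstable: alpha*dt/dx^2 must be <= 1/2")
    n = int(round(L / dx)) + 1                 # grid points including ends
    x = [i * dx for i in range(n)]
    u = [6.0 * xi - xi * xi for xi in x]       # initial condition
    u[0] = u[-1] = 0.0                         # Dirichlet boundaries
    for _ in range(int(round(t_final / dt))):
        new = u[:]                             # boundaries stay zero
        for i in range(1, n - 1):
            new[i] = u[i] + r * (u[i + 1] - 2.0 * u[i] + u[i - 1])
        u = new
    return x, u
```

With ∆x = 1/2 and ∆t = 1 (r = 0.4), u(3, 10) lands near the corresponding Table 1 entry; halving ∆x with ∆t fixed at 1 would give r = 1.6 and the sketch refuses to run, mirroring why the table's ∆t values fall off so quickly.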
6.1.3 Finite Element Method Results
Figure 12: The graph compares the values of the analytical solution and the values of the solution obtained using the Crank-Nicolson finite element method at x = 3 and t = 10, i.e. u(3, 10). The solutions are obtained from the Matlab implementation of this method. The constant line is the value obtained from the analytical solution, and the line decreasing in value corresponds to the values obtained from the finite element method for decreasing ∆x.
Table 2: Results of Finite Element Method

    ∆x          ∆t          u(3, 10)
    1           1           7.05911
    1/2         1           7.0396
    1/4         1/4         7.0340
    1/8         1/16        7.0326
    1/16        1/64        7.0322
    1/32        1/256       7.0321
    1/64        1/1024      7.0321
    Analytical  -           7.03211
Figure 13: The graph compares the value of the analytical solution, the values of the solution obtained using the Crank-Nicolson finite element method, and the values of the solution obtained using the FTCS finite difference scheme at x = 3 and t = 10, i.e. u(3, 10). The solutions from the finite element method are obtained from the Matlab implementation of this method. The constant line is the value obtained from the analytical solution, and the lines decreasing in value correspond to the values obtained from each method for decreasing ∆x.
When looking at the results of the Crank-Nicolson scheme of the finite element method, specifically
Figure 13, it becomes clear that this method provides a much better approximation than
the finite difference approach. It converged to the solution much more quickly than the finite
difference method did. A coarser discretization of the problem was able to produce much better
results for u(3, 10) than the same discretization using FTCS. As a result, the Crank-Nicolson
scheme of the finite element method appears to be an adequate choice for problems where close
approximations are required.
6.2 Small Scale Example Results
Figure 14: A scatterplot of 10 time trials for MLAztecOO.cpp, MLBelos.cpp, and MueLuBelos.cpp with a problem size of 50³. The measured time is the total time to run, including problem setup and solving. This plot shows the range in time performance for each file and the comparative time to complete between each file.

Figure 15: A scatterplot of 10 time trials for MLAztecOO.cpp, MLBelos.cpp, and MueLuBelos.cpp with a problem size of 50³. The measured time is the amount of time taken to solve the problem; time to set up is not included. This plot shows the range in solving time performance for each file and the comparative time to solve between each file.

Figure 16: A scatterplot of 10 time trials for MLAztecOO.cpp, MLBelos.cpp, and MueLuBelos.cpp with a problem size of 120³. The measured time is the total time to run, including problem setup and solving. This plot shows the range in time performance for each file and the comparative time to complete between each file.

Figure 17: A scatterplot of 10 time trials for MLAztecOO.cpp, MLBelos.cpp, and MueLuBelos.cpp with a problem size of 120³. The measured time is the amount of time taken to solve the problem; time to set up is not included. This plot shows the range in solving time performance for each file and the comparative time to solve between each file.

Figure 18: A scatterplot of 10 time trials for MLAztecOO.cpp, MLBelos.cpp, and MueLuBelos.cpp with a problem size of 190³. The measured time is the total time to run, including problem setup and solving. This plot shows the range in time performance for each file and the comparative time to complete between each file.

Figure 19: A scatterplot of 10 time trials for MLAztecOO.cpp, MLBelos.cpp, and MueLuBelos.cpp with a problem size of 190³. The measured time is the amount of time taken to solve the problem; time to set up is not included. This plot shows the range in solving time performance for each file and the comparative time to solve between each file.

Figure 20: A scatterplot of 10 time trials for MLAztecOO.cpp, MLBelos.cpp, and MueLuBelos.cpp with a problem size of 260³. The measured time is the total time to run, including problem setup and solving. This plot shows the range in time performance for each file and the comparative time to complete between each file.

Figure 21: A scatterplot of 10 time trials for MLAztecOO.cpp, MLBelos.cpp, and MueLuBelos.cpp with a problem size of 260³. The measured time is the amount of time taken to solve the problem; time to set up is not included. This plot shows the range in solving time performance for each file and the comparative time to solve between each file.

Figure 22: A scatterplot of 10 time trials for MLAztecOO.cpp, MLBelos.cpp, and MueLuBelos.cpp with a problem size of 330³. The measured time is the total time to run, including problem setup and solving. This plot shows the range in time performance for each file and the comparative time to complete between each file.

Figure 23: A scatterplot of 10 time trials for MLAztecOO.cpp, MLBelos.cpp, and MueLuBelos.cpp with a problem size of 330³. The measured time is the amount of time taken to solve the problem; time to set up is not included. This plot shows the range in solving time performance for each file and the comparative time to solve between each file.

Figure 24: A scatterplot of 10 time trials for MLAztecOO.cpp, MLBelos.cpp, and MueLuBelos.cpp with a problem size of 400³. The measured time is the total time to run, including problem setup and solving. This plot shows the range in time performance for each file and the comparative time to complete between each file.

Figure 25: A scatterplot of 10 time trials for MLAztecOO.cpp, MLBelos.cpp, and MueLuBelos.cpp with a problem size of 400³. The measured time is the amount of time taken to solve the problem; time to set up is not included. This plot shows the range in solving time performance for each file and the comparative time to solve between each file.
Next, we analyze the results of the proposed method of updating the deal.II software. Overall,
there is a trend of Belos performing slightly better as a solver than AztecOO. Further, MueLu
paired with Belos on every problem size performed better than the other two implementations in
terms of solving time. This trend can especially be seen in Figures 15, 17, 19, and 25. ML with
AztecOO and ML with Belos performed very similarly concerning both solve time and total time.
In general there was a trend of ML with Belos performing slightly better than ML with AztecOO
in terms of both solving and total time. However, Figures 21, 23, 24, and 25 show some instances
in which ML with AztecOO performed better than ML with Belos on certain runs. These results
are in line with what was expected. However, concerning total time, MueLu paired with Belos took
almost five times as long to complete as either of the other two implementations. When looking at
the source code it became clear what was causing the significant increase in time. The translation
of the Epetra matrix to an Xpetra object (lines 104-110 of MueLuBelos.cpp in Appendix) is being
executed one row at a time in a for loop. As the problem sizes increase, this very quickly becomes
an issue. Some form of parallelism or a better method needs to be implemented to resolve this
issue before deal.II can realistically implement the proposed method of integrating MueLu into its
software.
7 Conclusion
1. Performance of Finite Methods
This work confirmed that finite numerical methods find adequate approximations to tradi-
tional analytical solutions of differential equations. Further, it can be noted that the finite
difference, forward-in-time scheme is not the most efficient method, as the number of time
steps required increases rapidly as the discretization of the problem space becomes more re-
fined, resulting in long run times. The finite element method using the Crank-Nicolson scheme
converged to the analytical solution much more quickly than the finite difference method and
was able to give a better approximation with a coarser discretization of the problem space
than was achieved using the finite difference method.
2. Proposed Method of Updating the deal.II Software
While the small scale examples showed a decrease in total-time performance when utilizing
MueLu over ML, I predict that with added parallelism in the setup phase and with larger,
more complex examples, time performance will improve. There is currently too much
overhead created by translating the objects into Xpetra objects. Specifically, I believe that
the current method of transferring the values of the matrix one row at a time is contributing
the most to the total run time. Looking at just the time used for solving the problem shows
that MueLu combined with Belos does perform better than ML and AztecOO together or
ML and Belos together.
This thesis explored methods of solving partial differential equations that can be implemented
with a computer. As computers cannot handle infinite amounts of points on a solution, it is
necessary to discretize the solution space and solve using a finite numerical method. This work
was able to show the accuracy of several such methods on the one-dimensional heat equation with
Dirichlet boundary conditions. Further, a method of updating the deal.II software was proposed
and its potential benefits were explored by analyzing time performance on small scale examples.
7.1 Future Work
In the future, a method should be developed for decreasing the overhead created by translating
the Epetra matrix to an Xpetra matrix. As each iteration of the current for loop accomplishing
this translation is independent of the others, I believe that it could easily be parallelized in
order to increase performance. Further, if a parallel or distributed method were implemented in
the Xpetra class to translate Epetra or Tpetra matrices, even better performance could be seen.
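As a concrete, purely illustrative sketch of that decomposition (in Python rather than the C++ the change would actually require), the rows can be split into disjoint chunks with each chunk copied by a separate worker. Note that in CPython the global interpreter lock prevents a real speedup for this pure-Python loop; the point is only that the chunks share no data and need no synchronization.

```python
from concurrent.futures import ThreadPoolExecutor

def copy_chunk(src_rows, dst_rows, start, stop):
    """Copy rows [start, stop); each chunk touches a disjoint slice of
    dst_rows, so chunks can run concurrently without locking."""
    for i in range(start, stop):
        dst_rows[i] = dict(src_rows[i])

def parallel_copy(src_rows, workers=4):
    """Partition the row copy across workers, mirroring how the Xpetra
    translation loop could be split across threads or MPI ranks."""
    n = len(src_rows)
    if n == 0:
        return []
    dst_rows = [None] * n
    chunk = (n + workers - 1) // workers    # ceil(n / workers)
    with ThreadPoolExecutor(max_workers=workers) as pool:
        futures = [pool.submit(copy_chunk, src_rows, dst_rows, s,
                               min(s + chunk, n))
                   for s in range(0, n, chunk)]
        for f in futures:
            f.result()                      # re-raise any worker exception
    return dst_rows
```

In a C++ or MPI setting the same partitioning would let each rank translate its locally owned rows, which is exactly the independence the for loop already exhibits.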
The next step along this course of work would be to implement the suggested changes in
the deal.II software. As shown in the small scale examples, the code needs to be written so as
to encapsulate or wrap the existing objects as Xpetra objects so that they are compatible with
MueLu preconditioners. As hardware improves in accordance with Moore's Law, it is important
to focus on maintaining and improving software as new technologies emerge.
References
[1] The Trilinos Project. trilinos.org. Documentation and source code for the Trilinos Project.
[2] W. Bangerth, T. Heister, L. Heltai, G. Kanschat, M. Kronbichler, M. Maier, B. Turcksin,
and T. D. Young. The deal.II library, version 8.1. arXiv preprint,
http://arxiv.org/abs/1312.2266v4, 2013.
[3] John R. Cannon. The One-Dimensional Heat Equation. Addison-Wesley Publishing Com-
pany, Inc., 1984.
[4] D. DeTurck. Math 241: Solving the heat equation. 2012. Course notes for Math 241 at
University of Pennsylvania.
[5] Gene H. Golub and Charles F. Van Loan. Matrix Computations. Johns Hopkins University
Press, 2012.
[6] G. Recktenwald. Heatcn.m. Matlab code implementing the Crank-Nicolson scheme for solving
the 1D heat equation.
[7] G. Recktenwald. Finite-difference approximations to the heat equation. 2011.
[8] Yousef Saad. Iterative Methods for Sparse Linear Systems. Society for Industrial and Applied
Mathematics, 2003.
[9] P. Seshaiyar. Finite-difference method for the 1d heat equation. 2012. Course notes for
Math679 at George Mason University.
[10] I. M. Smith, D. V. Griffiths, and L. Margetts. Programming the Finite Element Method.
John Wiley & Sons, 2013.
[11] R. Vichnevetsky. Computer Methods for Partial Differential Equations, Volume 1. Prentice-
Hall, Inc., 1981.
[12] O. C. Zienkiewicz, R. L. Taylor, and J. Z. Zhu. The Finite Element Method: Its Basis and
Fundamentals.