Laplace-Example with MPI and PETSc

Rolf Rabenseifner, [email protected]
University of Stuttgart, High-Performance Computing-Center Stuttgart (HLRS), www.hlrs.de
Höchstleistungsrechenzentrum Stuttgart

Laplace Example

• Compute the steady temperature distribution for given temperatures on a boundary,
• i.e., solve the Laplace partial differential equation (PDE)
    –∆u(x,y) = –[ ∂²u/∂x² + ∂²u/∂y² ] = 0  on Ω ⊂ R²
• with boundary condition u(x,y) = φ(x,y) on ∂Ω
• area Ω = [xmin,xmax] x [ymin,ymax]
• Compare:
  – Chap. [6] A Heat-Transfer Example with MPI
  – Explicit time-step integration of the unsteady heat conduction ∂u/∂t = ∆u
• Can be calculated in the receiving process using the halo_pos (= local index)
  and the corresponding global column minus the start value in the sending process.
• m, n   dimensions of the physical problem
• i, j   index in physics (0..m–1, 0..n–1, without boundary)
• I      global row index in Laplace matrix and vector (0 .. nm–1)
• J      global column index in the Laplace matrix (0 .. nm–1)
  – process-local data: start .. end1–1
• I_loc  local row index in Laplace matrix and vector (and in halo)
• J_loc  local column index in the Laplace matrix
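As an illustration of these index mappings, a minimal sketch in C (not part of the course skeleton; the concrete values of m, n and start are made up for the example):

    #include <stdio.h>

    int main(void)
    {
      int m = 4, n = 4;        /* physical dimensions                      */
      int i = 2, j = 1;        /* physics index (0..m-1, 0..n-1)           */
      int start = 8;           /* assumed: first locally owned global row  */

      int I     = i*n + j;     /* global row index, 0 .. n*m-1             */
      int I_loc = I - start;   /* local row index on the owning process    */

      printf("(i,j)=(%d,%d) -> I=%d, I_loc=%d\n", i, j, I, I_loc);
      return 0;
    }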
• Your working directory: ~/CG/<nr>
• Choose your task: <task>
• Fetch your skeleton: cp ~/CG/skel/cg_<task>.c .
• Add your code, compile, run and test it (correct result? same as the serial result?)
• If your task works:
  – extract your part (from /*== task_ii begin ==*/ to /*== task_ii end ==*/) into cgp<task>.c
• Advanced exercise: Implement the communication-optimized distribution
  – in a copy of your cg_<task>.c
  – compare execution time: 1-dim decomposition / 2-dim load-optimal / 2-dim comm.-optimal
• When all groups have finished, everyone can check the total result with:
  – ls -l ../*/cgp*.c
  – cat ../00/cgp00.c ../*/cgp01.c ../*/cgp02.c ../*/cgp03.c ../*/cgp04.c ../*/cgp05.c
        ../*/cgp12.c ../00/cgp13.c ../00/cgp14.c ../*/cgp15.c ../00/cgp16.c > cg_all.c
  – duplicate parts must be selected by hand (<nr> instead of *)
  – missing parts may also be fetched from ../source/parts/cgp<task>.c
  – Compile and run cg_all.c
• Do not modify any lines outside of your task segment.
• Compile-time options [default]:
  -Dserial            — compile without MPI and without distribution [parallel]
• Run-time options [default]:
  -m <m>              — vertical dimension of physical heat area [4]
  -n <n>              — horizontal dimension … [4]
  -imax <iter_max>    — maximum number of iterations in the CG solver [500]
  -eps <epsilon>      — abort criterion of the solver for the residual vector [1e-6]
  -twodims            — choose 2-dimensional domain decomposition [1-dim]
  -mprocs <m_procs>   — choose number of processes, vertical (-twodims needed)
  -nprocs <n_procs>   — … and horizontal [given by MPI_Dims_create]
  -prtlev 0|1|2|3|4|5 — printing and debug level [1]:
      1 = only || result – exact solution || and partial result matrix
      2 = and residual norm after each iteration
      3 = and result of physical heat matrix
      4 = and all vector and matrix information in 1st iteration
      5 = and in all iterations
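A possible invocation combining some of these run-time options (the launcher, executable name and process count are assumptions, not prescribed by the exercise):

    mpirun -np 4 ./cg_all -m 80 -n 80 -twodims -imax 1000 -eps 1e-6 -prtlev 2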
Initialization of matrix A

 55: /* When using MatCreate(), the matrix format can be specified at runtime.
        Also, the parallel partitioning of the matrix is determined by PETSc at runtime.
        Performance tuning note: For problems of substantial size, preallocation of matrix
        memory is crucial for attaining good performance. Since preallocation is not
        possible via the generic matrix creation routine MatCreate(), we recommend for
        practical problems instead to use the creation routine for a particular matrix
        format, e.g.,
          MatCreateMPIAIJ()  – parallel AIJ (compressed sparse row)
          MatCreateMPIBAIJ() – parallel block AIJ
        See the matrix chapter of the users manual for details. */
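For illustration, a preallocated creation of the parallel AIJ matrix for this 5-point stencil might look like the following sketch. It assumes the PETSc 2.x calling sequence of MatCreateMPIAIJ() and is not part of heat_petsc.c:

    /* Sketch only: each row has at most 5 nonzeros, of which at most 2 can fall
       into off-process column blocks for a contiguous row distribution. */
    MatCreateMPIAIJ(PETSC_COMM_WORLD,
                    PETSC_DECIDE, PETSC_DECIDE,   /* local sizes: let PETSc decide      */
                    m*n, m*n,                     /* global sizes                       */
                    5, PETSC_NULL,                /* d_nz: nonzeros per row, diag block */
                    2, PETSC_NULL,                /* o_nz: nonzeros per row, off-diag   */
                    &A);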
 69: MatCreate(PETSC_COMM_WORLD,PETSC_DECIDE,PETSC_DECIDE,m*n,m*n,&A);
 70: MatSetFromOptions(A);

 73: /* Currently, all PETSc parallel matrix formats are partitioned by contiguous
        chunks of rows across the processors. Determine which rows of the matrix
        are locally owned. */
 77: MatGetOwnershipRange(A,&Istart,&Iend);

 92: for (I=Istart; I<Iend; I++) {
 93:   v = -1.0;  i = I/n;  j = I - i*n;
 94:   if (i>0)   { J = I - n; MatSetValues(A,1,&I,1,&J,&v,INSERT_VALUES); }
 95:   if (i<m-1) { J = I + n; MatSetValues(A,1,&I,1,&J,&v,INSERT_VALUES); }
 96:   if (j>0)   { J = I - 1; MatSetValues(A,1,&I,1,&J,&v,INSERT_VALUES); }
 97:   if (j<n-1) { J = I + 1; MatSetValues(A,1,&I,1,&J,&v,INSERT_VALUES); }
 98:   v = 4.0;     MatSetValues(A,1,&I,1,&I,&v,INSERT_VALUES); }
– We form one vector from scratch and then duplicate it as needed.
– When using VecCreate(), VecSetSizes() and VecSetFromOptions() in this example,
  we specify only the vector's global dimension; the parallel partitioning is
  determined at runtime.
– When solving a linear system, the vectors and matrices MUST be partitioned
  accordingly. PETSc automatically generates appropriately partitioned matrices
  and vectors when MatCreate() and VecCreate() are used with the same communicator.
– The user can alternatively specify the local vector and matrix dimensions when
  more sophisticated partitioning is needed (replacing the PETSC_DECIDE argument
  in the VecSetSizes() statement below).
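The creation statements these comments refer to are not reproduced on the slide; a minimal sketch of the usual PETSc pattern (vector names u, b, x as used later in the example) would be:

    /* Sketch of the creation pattern described above (not copied from the slide) */
    VecCreate(PETSC_COMM_WORLD,&u);     /* exact solution vector                   */
    VecSetSizes(u,PETSC_DECIDE,m*n);    /* only the global dimension is specified  */
    VecSetFromOptions(u);               /* allow run-time selection of vector type */
    VecDuplicate(u,&b);                 /* right-hand side with the same layout    */
    VecDuplicate(u,&x);                 /* solution vector with the same layout    */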
145: PetscOptionsHasName(PETSC_NULL,"-random_exact_sol",&flg);
146: if ( ! flg) {
       VecGetOwnershipRange(b,&Istart,&Iend);
       h = 1.0 / (m+1);
       for (I=Istart; I<Iend; I++) {
         v = 0;  i = I/n;  j = I - i*n;
         if (i==0)   v = v + /* u(-1,j): */ h * 0;
         if (i==m-1) v = v + /* u(m, j): */ h * (m+1);
         if (j==0)   v = v + /* u(i,-1): */ h * (i+1);
         if (j==n-1) v = v + /* u(i, n): */ h * (i+1);
         if (v != 0) VecSetValues(b,1,&I,&v,INSERT_VALUES);
         v = /* u(i,j): */ h * (i+1);  VecSetValues(u,1,&I,&v,INSERT_VALUES);
       }
     }
173: /* - - - - - Create the linear solver and set various options - - - - - */
177: /* Create linear solver context */
179: SLESCreate(PETSC_COMM_WORLD,&sles);

182: /* Set operators. Here the matrix that defines the linear system
        also serves as the preconditioning matrix. */
185: SLESSetOperators(sles,A,A,DIFFERENT_NONZERO_PATTERN);

188: /* Set linear solver defaults for this problem (optional).
        – By extracting the KSP (Krylov subspace method) and PC (preconditioner)
          contexts from the SLES context, we can then directly call any KSP and PC
          routines to set various options.
        – The following two statements are optional; all of these parameters could
          alternatively be specified at runtime via SLESSetFromOptions(). All of
          these defaults can be overridden at runtime, as indicated below. */
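The two optional statements referred to in the comment are not reproduced here; with the SLES interface of PETSc 2.x they typically look like the following sketch (the rtol value is the one quoted on the run-time options slide; the exact statements in heat_petsc.c may differ):

    /* Sketch: extract KSP/PC contexts and set problem-specific defaults */
    SLESGetKSP(sles,&ksp);                              /* Krylov solver context   */
    SLESGetPC(sles,&pc);                                /* preconditioner context  */
    PCSetType(pc,PCJACOBI);                             /* e.g., Jacobi as default */
    KSPSetTolerances(ksp,1.e-2/((m+1)*(n+1)),1.e-50,
                     PETSC_DEFAULT,PETSC_DEFAULT);      /* rtol, atol, dtol, maxit */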
 29: #include "petscda.h"

251: PetscOptionsHasName(PETSC_NULL,"-view_sol_x",&flg);
252: if (flg) {   /* view solution grid in an X window */
253:   PetscScalar *xx;  DA da;
       /* ... (remainder of the X-window viewing code not shown on the slide) */
283: /* Check the error */
285: VecAXPY(&neg_one,u,x);
286: VecNorm(x,NORM_2,&norm);
287: /* Optional: scale the norm:  norm *= sqrt(1.0/((m+1)*(n+1))); */

290: /* Print convergence information. PetscPrintf() produces a single print statement
        from all processes that share a communicator. An alternative is PetscFPrintf(),
        which prints to a file. */
294: PetscPrintf(PETSC_COMM_WORLD,"Norm of error %A iterations %d\n",norm,its);
297: /* Free work space. All PETSc objects should be destroyed when they are no longer needed. */
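The destroy calls themselves are not shown on the slide; with the PETSc 2.x interface they would look like this sketch:

    SLESDestroy(sles);
    VecDestroy(u);  VecDestroy(x);  VecDestroy(b);
    MatDestroy(A);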
305: /* Always call PetscFinalize() before exiting a program. This routine
        – finalizes the PETSc libraries as well as MPI
        – provides summary and diagnostic information if certain runtime
          options are chosen (e.g., -log_summary). */
310: PetscFinalize();
  3: static char help[] = "Solves a linear system in parallel with SLES: Compute steady \n\
  4:   temperature distribution for given temperatures on a boundary.\n\
  5: Input parameters include:\n\
  6:   -random_exact_sol : use a random exact solution vector\n\
  7:   -view_exact_sol   : write exact solution vector to stdout\n\
  8:   -view_sol_serial  : write solution grid to stdout (1 item/line)\n\
  9:   -view_sol         : write solution grid to stdout (as matrix)\n\
 10:   -view_sol_x -draw_pause 3 : view solution x in an X window\n\
 11:   -view_mat_x -draw_pause 3 : view matrix A in an X window\n\
 12:   -m <mesh_x>       : number of mesh points in x-direction\n\
 13:   -n <mesh_y>       : number of mesh points in y-direction\n";
...
 46: PetscInitialize(&argc, &args, (char *)0, help);
-ksp_rtol <rtol>     convergence criterion, set by the program to 1.e-2/((m+1)*(n+1))
-pc_type <type>      e.g., bjacobi (block Jacobi), asm (additive Schwarz)
-sub_pc_type <type>  e.g., jacobi (Jacobi), sor (SOR), ilu (incomplete LU)
-ksp_monitor         prints an estimate of the l2-norm of the residual at each iteration
-sles_view           prints information on the chosen KSP (solver) and PC (preconditioner)
-log_summary         prints statistical data
-options_table       prints all used options
-options_left        prints the options table and unused options
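As an illustration, a run combining several of these options might look as follows (the launcher, process count and executable path are assumptions; -ksp_type is a standard PETSc option not listed above):

    mpirun -np 4 ./heat_petsc -m 10 -n 10 \
        -ksp_type gmres -pc_type bjacobi -sub_pc_type ilu \
        -ksp_monitor -sles_view -options_left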
 1 #! /bin/csh
 2 #
 3 # Sample script: Experimenting with linear solver options.
 4 # Can be used with, e.g., petsc/src/sles/examples/tutorials/ex2.c
 5 # or heat_petsc.c
 6 #
 7 set appl='./heat_petsc'                  # path of binary
 8 set options='-ksp_monitor -sles_view -log_summary -options_table -options_left -m 10 -n 10'
 9 set num='0'
10 foreach np (1 2 4 8)                     # number of processors
11   foreach ksptype (gmres bcgs tfqmr)     # Krylov solver
12     set pctypes_parallel='bjacobi asm'   # parallel preconditioners
13     set pctypes_serial='ilu'             # non-parallel preconditioners
14     if ($np == 1) then
15       set pctype_list="$pctypes_serial $pctypes_parallel"
16     else
17       set pctype_list="$pctypes_parallel"
18     endif
19     foreach pctype ($pctype_list)
20-49    ... (see next slide)
50     end  # for pctype
51   end  # for ksptype
52 end  # for np
The PETSc Makefile System is located in $PETSC_DIR/bmake.
This directory has subdirectories for each supported platform.
If you want to customize your installation, you have to edit the following files:
• $PETSC_DIR/bmake/$PETSC_ARCH/packages
  – locations of all needed packages
• $PETSC_DIR/bmake/$PETSC_ARCH/variables
  – definitions of compilers, linkers, etc.
# $Id: packages,v 1.63 2001/10/10 18:50:03 balay Exp $
#
# This file contains site-specific information. The definitions below
# should be changed to match the locations of libraries at your site.
#
# The following naming convention is used:
#   XXX_LIB     - location of library XXX
#   XXX_INCLUDE - directory for include files needed for library XXX
#
# Location of BLAS and LAPACK.
# See $PETSC_DIR/docs/installation.html for information on retrieving them.
#
BLASLAPACK_LIB = -L/home/petsc/software/blaslapack/linux

# ---------------------------------------------------------------------------
# Locations of OPTIONAL packages. Comment out those you do not have.
# ---------------------------------------------------------------------------
#
# Location of X-windows software
X11_INCLUDE    =
X11_LIB        = -L/usr/X11R6/lib -lX11
PETSC_HAVE_X11 = -DPETSC_HAVE_X11
#
# Location of MPE
# If using MPICH version 1.1.2 or higher use the flag
#   -DPETSC_HAVE_MPE_INITIALIZED_LOGGING
#MPE_INCLUDE = -I/home/petsc/mpich-1.1.1/mpe
#MPE_LIB    = -L/home/petsc/mpich-1.1.1/lib/LINUX/ch_p4
Summary
• Laplace equation: –∆u(x,y) = 0 on Ω ⊂ R², with Ω = [xmin,xmax] x [ymin,ymax]
• Boundary condition: u(x,y) given on ∂Ω
• Discretization: –u(i–1,j) – u(i,j–1) + 4·u(i,j) – u(i,j+1) – u(i+1,j) = 0
  for i=0..m–1, j=0..n–1
• 4 boundaries: i=–1, i=m, j=–1, j=n
• New ordering: (i,j), i=0..m–1, j=0..n–1  →  I = 0..mn–1 (I = i·n + j)
• Matrix equation: Au = b, with A a sparse matrix, b based on u on ∂Ω,
  u the solution on Ω–∂Ω
• Example with n=m=4, with solution & boundary u(x,y) := x
• Linear Equation Solver (SLES) with PETSc
  – MatSetValues(A, 1,&I, 1,&J, &v, INSERT_VALUES);
  – VecSetValues(b, 1,&I, &v, INSERT_VALUES);
  – SLESSetOperators(sles, A, A, DIFFERENT_NONZERO_PATTERN);
  – SLESSolve(sles, b, x, &its);   x is the solution vector, ordered with I = 0..mn–1
  – printing x in ordering (i,j), i=0..m–1, j=0..n–1 (transposed)
• If you want to compile:
  – cp ../source/heat_petsc.c ../source/Makefile ./
  – setenv PETSC_DIR ...   or   export PETSC_DIR=...
  – setenv PETSC_ARCH ...  or   export PETSC_ARCH=...
  – make BOPT=O heat_petsc
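For reference, an application makefile driven by "make BOPT=O heat_petsc" typically follows the pattern below; this is only a sketch assuming the PETSc 2.x bmake conventions, and the include path and library variable may differ for your installation:

    # Sketch of a minimal PETSc 2.x application makefile (assumed conventions)
    CFLAGS  =

    include ${PETSC_DIR}/bmake/common/base

    heat_petsc: heat_petsc.o chkopts
    	-${CLINKER} -o heat_petsc heat_petsc.o ${PETSC_SLES_LIB}
    	${RM} heat_petsc.o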