The Portable Extensible Toolkit for Scientific Computing

Matthew Knepley

Mathematics and Computer Science Division, Argonne National Laboratory
Computation Institute, University of Chicago

PETSc Tutorial, Jackson School of Geosciences
University of Texas at Austin, TX, September 7, 2011
M. Knepley (UC) PETSc JSG ’11 1 / 199
Getting Started with PETSc
Outline
1 Getting Started with PETSc
  What is PETSc?
  Who uses PETSc?
  Stuff for Windows
  How can I get PETSc?
  How do I Configure PETSc?
  How do I Build PETSc?
  How do I run an example?
  How do I get more help?
2 PETSc Integration
3 Common PETSc Usage
4 Advanced PETSc
5 Future Plans
6 Conclusions
Getting Started with PETSc What is PETSc?
Unit Objectives
Introduce PETSc
Download, Configure, Build, and Run an Example
Empower students to learn more about PETSc
What I Need From You
Tell me if you do not understand
Tell me if an example does not work
Suggest better wording or figures
Followup problems at [email protected]
Developing parallel, nontrivial PDE solvers that deliver high performance is still difficult and requires months (or even years) of concentrated effort.

PETSc is a toolkit that can ease these difficulties and reduce the development time, but it is not a black-box PDE solver, nor a silver bullet. — Barry Smith
You want to think about how you decompose your data structures, how you think about them globally. [...] If you were building a house, you’d start with a set of blueprints that give you a picture of what the whole house looks like. You wouldn’t start with a bunch of tiles and say, “Well, I’ll put this tile down on the ground, and then I’ll find a tile to go next to it.” But all too many people try to build their parallel programs by creating the smallest possible tiles and then trying to have the structure of their code emerge from the chaos of all these little pieces. You have to have an organizing principle if you’re going to survive making your code parallel.
Portable to any parallel system supporting MPI, including:
  Tightly coupled systems: Cray XT6, BG/Q, NVIDIA Fermi, K Computer
  Loosely coupled systems, such as networks of workstations: IBM, Mac, iPad/iPhone, PCs running Linux or Windows
PETSc History
  Begun September 1991
  Over 60,000 downloads since 1995 (version 2)
  Currently 400 per month
PETSc Funding and Support
  Department of Energy: SciDAC, MICS Program, AMR Program, INL Reactor Program
  National Science Foundation: CIG, CISE, Multidisciplinary Challenge Program
Timeline
[Timeline figure, 1991–2010: PETSc-1, PETSc-2, and PETSc-3 releases alongside MPI-1 and MPI-2, with developers joining over time: Barry, Bill, Lois, Satish, Dinesh, Hong, Kris, Matt, Victor, Dmitry, Lisandro, Jed, Shri, Peter]
The PETSc Team
Bill Gropp Barry Smith Satish Balay
Jed Brown Matt Knepley Lisandro Dalcin
Hong Zhang Mark Adams Toby Isaac
Getting Started with PETSc Who uses PETSc?
PetFMM: 2D/3D domains; automatic load balancing; variety of kernels; optimized with templates
PetRBF: variety of RBFs; uses PETSc solvers; scalable preconditioner
Parallelism: MPI, GPU
— Cruz, Yokota, Barba, Knepley
Vortex Method (t = 100)
Incompressible flow; Gaussian vortex blobs; high Re
Gravity Anomaly Modeling
Potential solution: kernel of the inverse problem; needs an optimal algorithm
Implementations: direct summation, FEM, FMM
Parallelism: MPI; 4000+ cores; all methods scalable
— May, Knepley
FEniCS-Apps: Rheagen
Rheologies: Maxwell, Grade 2, Oldroyd-B
Stabilization: DG, SUPG, EVSS, DEVSS, Macroelement
Automation: FIAT (elements), FFC (weak forms)
— Terrel
Real-time Surgery
Brain surgery: elastic deformation; overlaid on MRI; guides the surgeon
Laser thermal therapy: PDE-constrained optimization; per-patient calibration; thermal inverse problem
— Warfield, Ferrant, et al.
…infrastructure [1, 6] inherent to the control system relies critically on the precise real-time orchestration of large-scale parallel computing, high-speed data transfer, a diode laser, dynamic imaging, visualizations, inverse-analysis algorithms, registration, and mesh generation. We demonstrated that this integrated technology has significant potential to facilitate a reliable minimally invasive treatment modality that delivers a precise, predictable and controllable thermal dose prescribed by oncologists and surgeons. However, MR guided LITT (MRgLITT) has just recently entered into patient use [4] and substantial translational research and validation is needed to fully realize the potential of this technology [20, 23] within a clinical setting. The natural progression of the computer driven MRgLITT technology will begin with prospective pre-treatment planning. Future innovations on the delivery side will likely involve combining robotic manipulation of fiber location within the applicator as well as multiple treatment applicators firing simultaneously.
Figure 1: 3D volume rendering of in vivo MR-guided LITT delivery in a canine model of prostate (labels: 2D Slice, Catheter Entry, Prostate, Thermal Field, Skin). Contrast-enhanced T1-W MR images have been volume rendered to better visualize the relationship of the target volume and applicator trajectory to the surrounding anatomy. As displayed, the subject was stabilized in the supine position with legs upward. A stainless steel stylet was used to insert the laser catheter consisting of a 700 µm core diameter, 1 cm diffusing-tip silica fiber within a 2 mm diameter water-cooled catheter (light blue cylinder). A volume rendering of the multi-planar thermal images (in degrees Celsius) is registered and fused with the 3D anatomy to visualize the 3D volume of therapy while an axial slice cut from the principal treatment plane demonstrates a 2D representation of the local heating in that slice. The full field of view shown is 240 mm x 240 mm (scale on image in mm).
— Fuentes, Oden, et al.
Getting Started with PETSc Stuff for Windows
Questions for Windows Users
Have you installed cygwin? You need the python, make, and build-utils packages.
Will you use the GNU compilers? If not, remove link.exe. If using the MS compilers, check them from a cmd window and use win32fe.
Which MPI will you use? You can use --with-mpi=0. If MS, you need to install MPICH2. If GNU, you can use --download-mpich.
Getting Started with PETSc How can I get PETSc?
Downloading PETSc
The latest tarball is on the PETSc site: http://www.mcs.anl.gov/petsc/download
There is a Debian package (aptitude install petsc-dev)
The full development repository is open to the public: https://bitbucket.org/petsc/petsc/
Why is this better?
  You can clone to any release (or any specific changeset)
  You can easily roll back changes (or releases)
  You can get fixes from us the same day
Just clone the development repository:
git clone http://bitbucket.org/petsc/petsc.git
git clone -b v3.4.4 petsc petsc-3.4.4
or
Unpack the tarball:
tar xzf petsc.tar.gz
Exercise 1
Download and Unpack PETSc!
Getting Started with PETSc How do I Configure PETSc?
Configuring PETSc
Set $PETSC_DIR to the installation root directory
Run the configuration utility
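As an illustrative sketch (the compiler names and download flags are typical of the 3.x configure system and are assumptions; adapt them to your machine):

```shell
# Set PETSC_DIR to the unpacked source tree
export PETSC_DIR=$HOME/petsc
cd $PETSC_DIR

# A debug build that downloads its own MPI and BLAS/LAPACK
# (assumes GNU compilers are on the PATH)
./configure PETSC_ARCH=linux-gnu-debug \
  --with-cc=gcc --with-fc=gfortran \
  --download-mpich --download-f-blas-lapack
```

Configure writes its decisions to a log, which is the first thing to inspect (and attach to bug reports) when something goes wrong.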
Exercise 2
Configure your downloaded PETSc.
Getting Started with PETSc How do I Build PETSc?
Building PETSc
There is now One True Way to build PETSc:
  make
  make install, if you configured with --prefix
  Check the build when done with make test
Can build multiple configurations:
  PETSC_ARCH=linux-fast make
  Libraries are in $PETSC_DIR/$PETSC_ARCH/lib/
The complete log for each build is in $PETSC_DIR/$PETSC_ARCH/conf/make.log
  ALWAYS send this with bug reports
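The multi-configuration workflow might look like this in practice (the arch name linux-fast is illustrative):

```shell
# Build the default configuration, then check it
make
make test

# Build a second, optimized configuration side by side
PETSC_ARCH=linux-fast make

# Each configuration keeps its own libraries and build log
ls  $PETSC_DIR/linux-fast/lib/
cat $PETSC_DIR/linux-fast/conf/make.log
```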
Getting Started with PETSc How do I run an example?
Can create new communicators by splitting existing ones
Every PETSc object has a communicator
Set PETSC_COMM_WORLD to put all of PETSc in a subcomm

Point-to-point communication: happens between two processes (as in MatMult())
Reduction or scan operations: happen among all processes (as in VecDot())
Alternative Memory Models
Single process (address space) model: OpenMP and threads in general; Fortran 90/95 and compiler-discovered parallelism; the system manages memory and (usually) thread scheduling; named variables refer to the same storage
Single name space model: HPF, UPC; Global Arrays; Titanium; variables refer to the coherent values (distribution is automatic)
Distributed memory (shared nothing): message passing; named variables in different processes are unrelated
Common Viewing Options
Gives a text representation: -vec_view
Generally views subobjects too: -snes_view
Can visualize some objects: -mat_view draw::
Alternative formats: -vec_view binary:sol.bin:, -vec_view ::matlab, -vec_view socket
Sometimes provides extra information: -mat_view ::ascii_info, -mat_view ::ascii_info_detailed
Use -help to see all options
Common Monitoring Options
Display the residual: -ksp_monitor (graphically: -ksp_monitor_draw)
Can disable dynamically: -ksp_monitor_cancel
Does not display subsolvers: -snes_monitor
Can use the true residual: -ksp_monitor_true_residual
Can display different subobjects: -snes_monitor_residual, -snes_monitor_solution, -snes_monitor_solution_update, -snes_monitor_range, -ksp_gmres_krylov_monitor
Can display the spectrum: -ksp_monitor_singular_value
include $PETSC_DIR/conf/variables
include $PETSC_DIR/conf/rules

To get the project ready-made:
hg clone http://petsc.cs.iit.edu/petsc/tutorials/SimpleTutorial newsim
Getting Started with PETSc How do I get more help?
Fundamental objects representing solutions, right-hand sides, and coefficients
Each process locally owns a subvector of contiguous global data
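The contiguous ownership just described can be sketched in plain Python. This mimics the usual even block partition (the first N % P processes own one extra entry); it is an illustration, not PETSc's actual code:

```python
def ownership_range(N, P, rank):
    """Return the (low, high) bounds of the contiguous subvector
    owned by `rank` when N global entries are split over P processes."""
    base, extra = divmod(N, P)
    # the first `extra` ranks each own one additional entry
    low = rank * base + min(rank, extra)
    high = low + base + (1 if rank < extra else 0)
    return low, high

# 10 global entries over 3 processes
print([ownership_range(10, 3, r) for r in range(3)])  # [(0, 4), (4, 7), (7, 10)]
```

In PETSc itself, VecGetOwnershipRange() returns exactly this kind of (low, high) pair.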
PETSc Integration Vector Algebra
Vector Algebra
How do I create vectors?
VecCreate(MPI_Comm, Vec*)
VecSetSizes(Vec, PetscInt n, PetscInt N)
VecSetType(Vec, VecType typeName)
VecSetFromOptions(Vec)
Can set the type at runtime
A PETSc Vec
Supports all vector space operations: VecDot(), VecNorm(), VecScale()
Has a direct interface to the values: VecGetArray(), VecGetArrayF90()
Has unusual operations: VecSqrtAbs(), VecStrideGather()
Communicates automatically during assembly
Has customizable communication (PetscSF, VecScatter)
Parallel Assembly: Vectors and Matrices
Processes may set an arbitrary entry: must use the proper interface
Entries need not be generated locally ("local" meaning the process on which they are stored)
PETSc automatically moves data if necessary: this happens during the assembly phase
Vector Assembly
A three-step process:
  Each process sets or adds values
  Begin communication to send values to the correct process
  Complete the communication

VecSetValues(Vec v, PetscInt n, PetscInt rows[], PetscScalar values[], InsertMode mode)

Mode is either INSERT_VALUES or ADD_VALUES
Two phases allow overlap of communication and computation:
VecAssemblyBegin(Vec v)
VecAssemblyEnd(Vec v)
One Way to Set the Elements of a Vector
VecGetSize(x, &N);
MPI_Comm_rank(PETSC_COMM_WORLD, &rank);
if (rank == 0) {
  val = 0.0;
  for (i = 0; i < N; ++i) {
    VecSetValues(x, 1, &i, &val, INSERT_VALUES);
    val += 10.0;
  }
}
/* These routines ensure that the data is
   distributed to the other processes */
VecAssemblyBegin(x);
VecAssemblyEnd(x);
A Better Way to Set the Elements of a Vector
VecGetOwnershipRange(x, &low, &high);
val = low * 10.0;
for (i = low; i < high; ++i) {
  VecSetValues(x, 1, &i, &val, INSERT_VALUES);
  val += 10.0;
}
/* No data will be communicated here */
VecAssemblyBegin(x);
VecAssemblyEnd(x);
Selected Vector Operations
Function Name                                     Operation
VecAXPY(Vec y, PetscScalar a, Vec x)              y = y + a*x
VecAYPX(Vec y, PetscScalar a, Vec x)              y = x + a*y
VecWAXPY(Vec w, PetscScalar a, Vec x, Vec y)      w = a*x + y
VecScale(Vec x, PetscScalar a)                    x = a*x
VecCopy(Vec x, Vec y)                             y = x
VecPointwiseMult(Vec w, Vec x, Vec y)             w_i = x_i * y_i
VecMax(Vec x, PetscInt *idx, PetscScalar *r)      r = max x_i
VecShift(Vec x, PetscScalar r)                    x_i = x_i + r
VecAbs(Vec x)                                     x_i = |x_i|
VecNorm(Vec x, NormType type, PetscReal *r)       r = ||x||
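To make the semantics concrete, here is a plain-Python sketch of a few of these operations on lists (the names mirror the PETSc calls; this is illustration only, not the PETSc API):

```python
def axpy(y, a, x):            # VecAXPY:  y <- y + a*x
    for i in range(len(y)):
        y[i] += a * x[i]

def aypx(y, a, x):            # VecAYPX:  y <- x + a*y
    for i in range(len(y)):
        y[i] = x[i] + a * y[i]

def waxpy(w, a, x, y):        # VecWAXPY: w <- a*x + y
    for i in range(len(w)):
        w[i] = a * x[i] + y[i]

def pointwise_mult(w, x, y):  # VecPointwiseMult: w_i <- x_i * y_i
    for i in range(len(w)):
        w[i] = x[i] * y[i]

y = [1.0, 2.0]
axpy(y, 2.0, [10.0, 20.0])
print(y)  # [21.0, 42.0]
```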
Working With Local Vectors
It is sometimes more efficient to directly access the local storage of a Vec.
PETSc allows you to access the local storage with
VecGetArray(Vec, double *[])
You must return the array to PETSc when you finish:
VecRestoreArray(Vec, double *[])
This allows PETSc to handle data structure conversions.
Commonly, these routines are fast and do not involve a copy.
VecGetArray in C
Vec          v;
PetscScalar *array;
PetscInt     n, i;

VecGetArray(v, &array);
VecGetLocalSize(v, &n);
PetscSynchronizedPrintf(PETSC_COMM_WORLD,
  "First element of local array is %f\n", array[0]);
PetscSynchronizedFlush(PETSC_COMM_WORLD);
for (i = 0; i < n; ++i) {
  array[i] += (PetscScalar) rank;
}
VecRestoreArray(v, &array);
VecGetArray in F77
#include "finclude/petsc.h"

      Vec            v
      PetscScalar    array(1)
      PetscOffset    offset
      PetscInt       n, i
      PetscErrorCode ierr

      call VecGetArray(v, array, offset, ierr)
      call VecGetLocalSize(v, n, ierr)
      do i = 1, n
        array(i+offset) = array(i+offset) + rank
      end do
      call VecRestoreArray(v, array, offset, ierr)
VecGetArray in F90
#include "finclude/petsc.h90"

      Vec v
      PetscScalar, pointer :: array(:)
      PetscInt       n, i
      PetscErrorCode ierr

      call VecGetArrayF90(v, array, ierr)
      call VecGetLocalSize(v, n, ierr)
      do i = 1, n
        array(i) = array(i) + rank
      end do
      call VecRestoreArrayF90(v, array, ierr)
VecGetArray in Python
with v as a:
    for i in range(len(a)):
        a[i] = 5.0 * i
DMDAVecGetArray in C
DM             da;
Vec            v;
DMDALocalInfo *info;
PetscScalar  **x, **f;

DMDAVecGetArray(da, v, &x);
for (j = info->ys; j < info->ys + info->ym; ++j) {
  for (i = info->xs; i < info->xs + info->xm; ++i) {
    u   = x[j][i];
    uxx = (2.0*u - x[j][i-1] - x[j][i+1]) * hydhx;
    uyy = (2.0*u - x[j-1][i] - x[j+1][i]) * hxdhy;
    f[j][i] = uxx + uyy;
  }
}
DMDAVecRestoreArray(da, v, &x);
No one data structure is appropriate for all problems
  Blocked and diagonal formats provide performance benefits
  PETSc has many formats
  Makes it easy to add new data structures

Assembly is difficult enough without worrying about partitioning
  PETSc provides parallel assembly routines
  High performance still requires making most operations local
  However, programs can be incrementally developed
  MatPartitioning and MatOrdering can help

Matrix decomposition in contiguous chunks is simple
  Makes interoperation with other codes easier
  For other orderings, PETSc provides “Application Orderings” (AO)
N. M. Nachtigal, S. C. Reddy, and L. N. Trefethen, “How fast are nonsymmetric matrix iterations?”, SIAM J. Matrix Anal. Appl., 13, pp. 778–795, 1992.
Anne Greenbaum, Vlastimil Ptak, and Zdenek Strakos, “Any Nonincreasing Convergence Curve is Possible for GMRES”, SIAM J. Matrix Anal. Appl., 17(3), pp. 465–469, 1996.
PETSc Integration Algebraic Solvers
Solver Types
Explicit: field variables are updated using local neighbor information
Semi-implicit: some subsets of variables are updated with global solves, others with direct local updates
Implicit: most or all variables are updated in a single global solve
Linear Solvers: Krylov Methods

Using PETSc linear algebra, just add:
KSPSetOperators(KSP ksp, Mat A, Mat M, MatStructure flag)
KSPSolve(KSP ksp, Vec b, Vec x)

Can access subobjects:
KSPGetPC(KSP ksp, PC *pc)

Preconditioners must obey the PETSc interface (basically just the KSP interface)
Can change the solver dynamically from the command line: -ksp_type bicgstab
Nonlinear Solvers
Using PETSc linear algebra, just add:
SNESSetFunction(SNES snes, Vec r, residualFunc, void *ctx)
SNESSetJacobian(SNES snes, Mat A, Mat M, jacFunc, void *ctx)
SNESSolve(SNES snes, Vec b, Vec x)

Can access subobjects:
SNESGetKSP(SNES snes, KSP *ksp)

Can customize subobjects from the command line, e.g. set the subdomain preconditioner to ILU with -sub_pc_type ilu
Basic Solver Usage
Use SNESSetFromOptions() so that everything is set dynamically
Set the type: use -snes_type (or take the default)
Set the preconditioner: use -npc_snes_type (or take the default)
Override the tolerances: use -snes_rtol and -snes_atol
View the solver to make sure you have the one you expect: use -snes_view
For debugging, monitor the residual decrease: use -snes_monitor, and -ksp_monitor to see the underlying linear solver
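Putting those options together, a run might look like this (the executable name ex5 and the process count are placeholders):

```shell
mpiexec -n 4 ./ex5 \
  -snes_rtol 1e-8 -snes_atol 1e-12 \
  -snes_monitor -ksp_monitor \
  -snes_view
```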
3rd Party Solvers in PETSc
Complete table of solvers
1 Sequential LU
  ILUDT (SPARSEKIT2, Yousef Saad, U of MN)
  EUCLID & PILUT (Hypre, David Hysom, LLNL)
  ESSL (IBM)
  SuperLU (Jim Demmel and Sherry Li, LBNL)
  Matlab
  UMFPACK (Tim Davis, U. of Florida)
  LUSOL (MINOS, Michael Saunders, Stanford)
The PETSc Mesh class is a topology interface.
  Unstructured grid interface
  Arbitrary topology and element shape
  Supports partitioning, distribution, and global orders
PETSc Integration More Abstractions
Higher Level Abstractions
The PETSc DM class is a hierarchy interface.
  Supports multigrid: PCMG combines it with a multigrid preconditioner
  Abstracts the logic of multilevel methods

The PetscSection class is a helper class for data layout.
  Functions over unstructured grids
  Arbitrary layout of degrees of freedom
  Enables distribution and assembly
3 Ways To Use PETSc
User manages all topology (just use Vec and Mat): all indexing is user managed
PETSc manages a single topology (use DM):
  DMDA manages structured grids using (i, j, k) indexing
  DMMesh manages unstructured grids using PetscSection for indexing
  Communication is set up automatically
  Use KSPSetDM() and SNESSetDM() to notify the solver
PETSc manages a hierarchy (use PCMG): only automated for DMDA
Common PETSc Usage
Outline
1 Getting Started with PETSc
2 PETSc Integration
3 Common PETSc Usage
  Principles and Design
  Debugging PETSc
  Profiling PETSc
  Serial Performance
Common PETSc Usage Debugging PETSc
Correctness Debugging
Automatic generation of tracebacks
Detecting memory corruption and leaks
Optional user-defined error handlers
Interacting with the Debugger
Launch the debugger: -start_in_debugger [gdb,dbx,noxterm], -on_error_attach_debugger [gdb,dbx,noxterm]
Attach the debugger only to some parallel processes: -debugger_nodes 0,1
Set the display (often necessary on a cluster): -display khan.mcs.anl.gov:0.0
Debugging Tips
Put a breakpoint in PetscError() to catch errors as they occur
PETSc tracks memory overwrites at both ends of arrays
  The CHKMEMQ macro causes a check of all allocated memory
  Track memory overwrites by bracketing them with CHKMEMQ
PETSc checks for leaked memory
  Use PetscMalloc() and PetscFree() for all allocation
  Print unfreed memory on PetscFinalize() with -malloc_dump
Simply the best tool today is valgrind
  It checks memory access, cache performance, memory usage, etc.
  http://www.valgrind.org
  Need --trace-children=yes when running under MPI
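For example, a two-process run under valgrind might look like this (the executable name and its options are placeholders):

```shell
mpiexec -n 2 valgrind --trace-children=yes --leak-check=full \
  ./ex5 -snes_monitor
```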
with PETSc.LogStage('Fluid Stage') as fluidStage:
    # All operations will be aggregated in fluidStage
    fluid.solve()

with PETSc.LogEvent('Reconstruction') as recEvent:
    # All operations are timed in recEvent
    reconstruct(sol)
    # Flops are logged to recEvent
    PETSc.Log.logFlops(user_event_flops)
Common PETSc Usage Profiling PETSc
Adding A Logging Class
static int CLASS_ID;

PetscLogClassRegister(&CLASS_ID, "name");

The class ID identifies a class uniquely
Must initialize before creating any objects of this type
Matrix Memory Preallocation
PETSc sparse matrices are dynamic data structures: you can add additional nonzeros freely
Dynamically adding many nonzeros
  requires additional memory allocations
  requires copies
  can kill performance
Memory preallocation provides
  the freedom of dynamic data structures
  good performance
The easiest solution is to replicate the assembly code
  Remove the computation, but preserve the indexing code
  Store the set of columns for each row
Call the preallocation routines for all datatypes
  MatSeqAIJSetPreallocation()
  MatMPIAIJSetPreallocation()
  Only the relevant data will be used
Output:
[proc #] Matrix size: %d X %d; storage space: %d unneeded, %d used
[proc #] Number of mallocs during MatSetValues() is %d
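The "replicate the assembly code" idea can be sketched in plain Python for a 1D Laplacian with a 3-point stencil (illustrative only; real code would pass the resulting array to MatSeqAIJSetPreallocation()):

```python
def preallocation_counts(N):
    """Nonzeros per row for a 1D 3-point stencil on N unknowns:
    run the indexing logic of assembly, but skip the numerics."""
    nnz = []
    for i in range(N):
        # the columns row i would touch: i-1, i, i+1, clipped to the domain
        cols = [j for j in (i - 1, i, i + 1) if 0 <= j < N]
        nnz.append(len(cols))
    return nnz

print(preallocation_counts(5))  # [2, 3, 3, 3, 2]
```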
Exercise 8
Return to Exercise 7 and add more profiling.

Update to the next revision: hg update -r3
Build, run, and look at the profiling report:
  make ex5
  ./bin/ex5 -use_coords -log_summary
Add a new stage for setup
Add a new event for FormInitialGuess() and log the flops
Build it again and look at the profiling report
Common PETSc Usage Serial Performance
Importance of Computational Modeling
Without a model, performance measurements are meaningless!

Before a code is written, we should have a model of
  computation
  memory usage
  communication
  bandwidth
  achievable concurrency
This allows us to
  verify the implementation
  predict scaling behavior
Complexity Analysis
The key performance indicator, which we will call the balance factor β, is the ratio of flops executed to bytes transferred. We will designate the unit flop/byte as the Keyes.

Using the peak flop rate rpeak, we can get the required bandwidth Breq for an algorithm:

  Breq = rpeak / β    (1)

Using the peak bandwidth Bpeak, we can get the maximum flop rate rmax for an algorithm:

  rmax = β Bpeak    (2)
Performance Caveats
The peak flop rate rpeak on modern CPUs is attained through the use of a SIMD multiply-accumulate instruction on special 128-bit registers. SIMD MAC operates in the form of 4 simultaneous operations (2 adds and 2 multiplies):

  c1 = c1 + a1 * b1    (3)
  c2 = c2 + a2 * b2    (4)

You will miss peak by the corresponding number of operations you are missing. In the worst case, you are reduced to 25% efficiency if your algorithm performs naive summation or products.

Memory alignment is also crucial when using SSE: the instructions used to load and store from the 128-bit registers throw very costly alignment exceptions when the data is not stored in memory on 16-byte (128-bit) boundaries.
Analysis of BLAS axpy()
~y ← α~x + ~y
For vectors of length N and b-byte numbers, we have
  Computation: 2N flops
  Memory access: (3N + 1) b bytes
Thus, our balance factor β = 2N / ((3N + 1) b) ≈ 2/(3b) Keyes
Analysis of BLAS axpy()
~y ← α~x + ~y
For Matt’s laptop, rpeak = 1700 MF/s implies that
  Breq = 2550 b MB/s
which is much greater than Bpeak.

Bpeak = 1122 MB/s implies that
  rmax = 748/b MF/s
which is 5.5% of rpeak (for b = 8).
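These numbers follow directly from equations (1) and (2); a quick check in Python, using the figures quoted above and b = 8 bytes for double precision:

```python
rpeak = 1700.0            # peak flop rate, MF/s
Bpeak = 1122.0            # peak memory bandwidth, MB/s
b = 8                     # bytes per double

beta = 2.0 / (3.0 * b)    # axpy balance factor, 2/(3b) Keyes
Breq = rpeak / beta       # bandwidth needed to sustain rpeak
rmax = beta * Bpeak       # flop rate the memory system can feed

print(Breq)               # about 20400 MB/s, far above Bpeak
print(rmax, rmax / rpeak) # about 93.5 MF/s, 5.5% of rpeak
```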
STREAM Benchmark
Simple benchmark program measuring sustainable memory bandwidth
Prototypical operation is Triad (WAXPY): w = y + αx
Measures the memory bandwidth bottleneck (much below peak)
Datasets outstrip cache

N data, N² computation
Nonlinear evaluation (Picard, FAS, exact polynomial solvers): N data, N^k computation
Performance Tradeoffs
We must balance storage, bandwidth, and cycles
Assembled operator action
Trades cycles and storage for bandwidth in the application
Unassembled operator action
Trades bandwidth and storage for cycles in the application
For high orders, storage is impossible
Can make use of the FErari decomposition to save calculation
Could store element matrices to save cycles
Partial assembly gives even finer control over tradeoffs
Also allows introduction of parallel costs (load balance, . . . )
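As a concrete instance of the assembled side of this tradeoff: a CSR sparse matvec streams roughly 12 bytes per stored nonzero (an 8-byte value plus a 4-byte column index) for 2 flops, so it is bandwidth-bound just like axpy. A rough sketch; the byte counts are illustrative assumptions that ignore vector and row-pointer traffic:

```python
def csr_matvec_balance(value_bytes=8, index_bytes=4):
    # 2 flops (multiply + add) per nonzero vs. bytes streamed per nonzero
    return 2.0 / (value_bytes + index_bytes)

beta = csr_matvec_balance()     # 1/6 flop per byte
r_max = 1122.0 * 2.0 / 12.0     # MF/s at the laptop's B_peak from earlier
print(r_max)                    # 187.0
```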
Advanced PETSc
Outline
1 Getting Started with PETSc
2 PETSc Integration
3 Common PETSc Usage
4 Advanced PETSc
SNES
DMDA
5 Future Plans
6 Conclusions
Advanced PETSc SNES
Outline
4 Advanced PETSc
SNES
DMDA
Flow Control for a PETSc Application
[Diagram: the Main Routine performs Application Initialization, then hands control to PETSc's Timestepping Solvers (TS), which drive the Nonlinear Solvers (SNES); SNES calls the Linear Solvers (KSP) with their Preconditioners (PC), and calls back into user code for Function Evaluation and Jacobian Evaluation; results flow to Postprocessing.]
SNES Paradigm
The SNES interface is based upon callback functions:
FormFunction(), set by SNESSetFunction()
FormJacobian(), set by SNESSetJacobian()
When PETSc needs to evaluate the nonlinear residual F(x),
the solver calls the user’s function
the user function gets application state through the ctx variable
PETSc never sees application data
Topology Abstractions
DMDA
Abstracts Cartesian grids in any dimension
Supports stencils, communication, reordering
Nice for simple finite differences
DMMesh
Abstracts general topology in any dimension
Also supports partitioning, distribution, and global orders
Allows arbitrary element shapes and discretizations
Assembly Abstractions
DM
Abstracts the logic of multilevel (multiphysics) methods
Manages allocation and assembly of local and global structures
Interfaces to the PCMG solver
PetscSection
Abstracts functions over a topology
Manages allocation and assembly of local and global structures
Will merge with DM somehow
SNES Function
User provided function calculates the nonlinear residual:
PetscErrorCode (*func)(SNES snes, Vec x, Vec r, void *ctx)
x: The current solution
r: The residual
ctx: The user context passed to SNESSetFunction()
Use this to pass application information, e.g. physical constants
SNES Jacobian
User provided function calculates the Jacobian:
PetscErrorCode (*func)(SNES snes, Vec x, Mat *J, Mat *M, void *ctx)
x: The current solution
J: The Jacobian
M: The Jacobian preconditioning matrix (possibly J itself)
ctx: The user context passed to SNESSetJacobian()
Use this to pass application information, e.g. physical constants
Alternatively, you can use
a matrix-free finite difference approximation, -snes_mf
a finite difference approximation with coloring, -snes_fd
SNES Variants
Picard iteration
Line search/Trust region strategies
Quasi-Newton
Nonlinear CG/GMRES
Nonlinear GS/ASM
Nonlinear Multigrid (FAS)
Variational inequality approaches
Finite Difference Jacobians
PETSc can compute and explicitly store a Jacobian via 1st-order FD
Dense
Activated by -snes_fd
Computed by SNESDefaultComputeJacobian()
Sparse via colorings (default)
Coloring is created by MatFDColoringCreate()
Computed by SNESDefaultComputeJacobianColor()
Can also use matrix-free Newton-Krylov via 1st-order FD
Activated by -snes_mf without preconditioning
Activated by -snes_mf_operator with user-defined preconditioning
Uses preconditioning matrix from SNESSetJacobian()
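What -snes_fd does can be sketched in a few lines: each Jacobian column is a forward difference of the residual. This is a hedged toy version; PETSc's actual implementation chooses the differencing parameter carefully and, in the colored variant, perturbs many columns at once:

```python
def fd_jacobian(F, x, h=1e-6):
    # J[i][j] ~= (F(x + h*e_j)[i] - F(x)[i]) / h
    n, F0 = len(x), F(x)
    J = [[0.0] * n for _ in range(n)]
    for j in range(n):
        xp = list(x)
        xp[j] += h           # perturb one coordinate direction
        Fj = F(xp)
        for i in range(n):
            J[i][j] = (Fj[i] - F0[i]) / h
    return J

# F(x) = [x0^2 - 1, x0*x1]; the exact Jacobian at (2, 3) is [[4, 0], [3, 2]]
J = fd_jacobian(lambda x: [x[0] ** 2 - 1.0, x[0] * x[1]], [2.0, 3.0])
```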
SNES Example: Driven Cavity
Velocity-vorticity formulation
Flow driven by lid and/or buoyancy
Logically regular grid
Parallelized with DMDA
Finite difference discretization
Authored by David Keyes
typedef struct {        /*----- basic application data -----*/
  PetscReal lid_velocity;
  PetscReal prandtl;
  PetscReal grashof;
  PetscBool draw_contours;
} AppCtx;

PetscErrorCode Residual(SNES snes, Vec X, Vec F, void *ptr)
{
  AppCtx *user = (AppCtx *) ptr;
  /* local starting and ending grid points */
  PetscInt    istart, iend, jstart, jend;
  PetscScalar *f;   /* local vector data */
  PetscReal   grashof = user->grashof;
  PetscReal   prandtl = user->prandtl;
  PetscErrorCode ierr;

  /* Code to communicate nonlocal ghost point data */
  VecGetArray(F, &f);
  /* Code to compute local function components */
  VecRestoreArray(F, &f);
  return 0;
}
PetscErrorCode ResLocal(DMDALocalInfo *info, PetscScalar **x, PetscScalar **f, void *ctx)
{
  for (j = info->ys; j < info->ys+info->ym; ++j) {
    for (i = info->xs; i < info->xs+info->xm; ++i) {
      u   = x[j][i];
      uxx = (2.0*u - x[j][i-1] - x[j][i+1])*hydhx;
      uyy = (2.0*u - x[j-1][i] - x[j+1][i])*hxdhy;
      f[j][i].u     = uxx + uyy - .5*(x[j+1][i].omega - x[j-1][i].omega)*hx;
      f[j][i].v     = uxx + uyy + .5*(x[j][i+1].omega - x[j][i-1].omega)*hy;
      f[j][i].omega = uxx + uyy +
        (vxp*(u - x[j][i-1].omega) + vxm*(x[j][i+1].omega - u))*hy +
        (vyp*(u - x[j-1][i].omega) + vym*(x[j+1][i].omega - u))*hx -
        0.5*grashof*(x[j][i+1].temp - x[j][i-1].temp)*hy;
      f[j][i].temp  = uxx + uyy + prandtl*
        ((vxp*(u - x[j][i-1].temp) + vxm*(x[j][i+1].temp - u))*hy +
         (vyp*(u - x[j-1][i].temp) + vym*(x[j+1][i].temp - u))*hx);
    }
  }
}
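The structure of such a local evaluation (loop over the owned patch, apply a stencil, special-case the boundary) is easy to see in a stripped-down analogue. A sketch for the plain Laplacian residual on an n×n grid with hydhx = hxdhy = 1, not the driven-cavity physics above:

```python
def residual(x):
    # 5-point stencil residual for u_xx + u_yy with Dirichlet boundary rows
    n = len(x)
    f = [[0.0] * n for _ in range(n)]
    for j in range(n):
        for i in range(n):
            if i == 0 or j == 0 or i == n - 1 or j == n - 1:
                f[j][i] = x[j][i]              # boundary: identity rows
            else:
                u = x[j][i]
                uxx = 2.0 * u - x[j][i-1] - x[j][i+1]
                uyy = 2.0 * u - x[j-1][i] - x[j+1][i]
                f[j][i] = uxx + uyy
    return f

# u(i, j) = i is harmonic, so all interior residuals vanish
f = residual([[float(i) for i in range(4)] for _ in range(4)])
```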
DMDA is a topology interface on structured grids
Handles parallel data layout
Handles local and global indices
DMDAGetGlobalIndices() and DMDAGetAO()
Provides local and global vectors
DMGetGlobalVector() and DMGetLocalVector()
Handles ghost values coherence
DMGlobalToLocalBegin/End() and DMLocalToGlobalBegin/End()
Advanced PETSc DMDA
Residual Evaluation
The DM interface is based upon local callback functions:
FormFunctionLocal()
FormJacobianLocal()
Callbacks are registered using
SNESSetDM(), TSSetDM()
DMSNESSetFunctionLocal(), DMTSSetJacobianLocal()
When PETSc needs to evaluate the nonlinear residual F(x),
each process evaluates the local residual
PETSc assembles the global residual automatically
Uses the DMLocalToGlobal() method
Ghost Values
To evaluate a local function f(x), each process requires
its local portion of the vector x
its ghost values, bordering portions of x owned by neighboring processes
info: All layout and numbering information
x: The current solution
J: The Jacobian
ctx: The user context passed to DMDASNESSetJacobianLocal()
The local DMDA function is activated by calling
DMDASNESSetJacobianLocal(dm, ljac, &ctx)
Bratu Jacobian Evaluation
PetscErrorCode JacLocal(DMDALocalInfo *info, PetscScalar **x, Mat jac, void *ctx)
{
  for (j = info->ys; j < info->ys + info->ym; j++) {
    for (i = info->xs; i < info->xs + info->xm; i++) {
      row.j = j; row.i = i;
      if (i == 0 || j == 0 || i == mx-1 || j == my-1) {
        v[0] = 1.0;
        MatSetValuesStencil(jac, 1, &row, 1, &row, v, INSERT_VALUES);
      } else {
        v[0] = -(hx/hy); col[0].j = j-1; col[0].i = i;
        v[1] = -(hy/hx); col[1].j = j;   col[1].i = i-1;
        v[2] = 2.0*(hy/hx + hx/hy)
               - hx*hy*lambda*PetscExpScalar(x[j][i]);
        col[2].j = j; col[2].i = i;
        v[3] = -(hy/hx); col[3].j = j;   col[3].i = i+1;
        v[4] = -(hx/hy); col[4].j = j+1; col[4].i = i;
        MatSetValuesStencil(jac, 1, &row, 5, col, v, INSERT_VALUES);
      }
    }
  }
}
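The five stencil values written for one interior point can be tabulated directly. A sketch assuming a square cell (hx = hy = h, so hy/hx = hx/hy = 1), which is a simplifying assumption, not the general code above:

```python
import math

def bratu_row(h, lam, u):
    # south, west, center, east, north entries for one interior row
    # of the Bratu Jacobian: -Laplacian - h^2 * lam * exp(u) on the diagonal
    diag = 2.0 * (1.0 + 1.0) - h * h * lam * math.exp(u)
    return [-1.0, -1.0, diag, -1.0, -1.0]

v = bratu_row(h=0.5, lam=0.0, u=0.0)   # lam = 0: plain 5-point Laplacian
```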
A DMDA contains topology, geometry, and (sometimes) an implicit Q1 discretization.
It is used as a template to create
Vectors (functions)
Matrices (linear operators)
DMDA Vectors
The DMDA object contains only layout (topology) information
All field data is contained in PETSc Vecs
Global vectors are parallel
Each process stores a unique local portion
DMCreateGlobalVector(DM da, Vec *gvec)
Local vectors are sequential (and usually temporary)
Each process stores its local portion plus ghost values
DMCreateLocalVector(DM da, Vec *lvec)
Includes ghost and boundary values!
Updating Ghosts
A two-step process enables overlapping computation and communication:
DMGlobalToLocalBegin(da, gvec, mode, lvec)
gvec provides the data
mode is either INSERT_VALUES or ADD_VALUES
lvec holds the local and ghost values
DMGlobalToLocalEnd(da, gvec, mode, lvec)
Finishes the communication
The process can be reversed with DMLocalToGlobalBegin/End().
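The semantics of the global-to-local scatter can be sketched for a 1-D grid with stencil width 1, faking two processes with index ranges (a hypothetical helper, not the PETSc API):

```python
def global_to_local(gvec, ranges, width=1):
    # Each "process" gets its owned entries plus ghost copies of the
    # neighboring owned values (INSERT_VALUES semantics).
    n = len(gvec)
    return [gvec[max(lo - width, 0):min(hi + width, n)]
            for lo, hi in ranges]

g = [0.0, 1.0, 2.0, 3.0, 4.0, 5.0]
l0, l1 = global_to_local(g, [(0, 3), (3, 6)])
print(l0)  # [0.0, 1.0, 2.0, 3.0]  owned 0..2 plus ghost entry 3
print(l1)  # [2.0, 3.0, 4.0, 5.0]  ghost entry 2 plus owned 3..5
```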
DMDA Stencils
Both the box stencil and star stencil are available.
[Figure: ghost regions exchanged between neighboring processes for the Box stencil (corner neighbors included) and the Star stencil (no corners).]
Setting Values on Regular Grids
PETSc provides
MatSetValuesStencil(Mat A, m, MatStencil idxm[], n, MatStencil idxn[], PetscScalar values[], InsertMode mode)
Each row or column is actually a MatStencil
This specifies grid coordinates and a component if necessary
For unstructured grids, one can imagine these being vertices
The values are the same logically dense block in row/col
Creating a DMDA
DMDACreate2d(comm, bdX, bdY, type, M, N, m, n, dof, s, lm[], ln[], DMDA *da)
Several varieties of preconditioners can be supported:
Block Jacobi or Block Gauss-Seidel
Schur complement
Block ILU (approximate coupling and Schur complement)
Dave May’s implementation of Elman-Wathen type PCs,
which only require actions of individual operator blocks
Notice also that we may have any combination of
“canned” PCs (ILU, AMG)
PCs needing special information (MG, FMM)
custom PCs (physics-based preconditioning, Born approximation)
Finite Element Integrator And Tabulator by Rob Kirby
http://www.fenics.org/fiat
FIAT understands
Reference element shapes (line, triangle, tetrahedron)
Quadrature rules
Polynomial spaces
Functionals over polynomials (dual spaces)
Derivatives
Users can build arbitrary elements by specifying the Ciarlet triple (K, P, P′)
FIAT is part of the FEniCS project, as is the PETSc Sieve module
FFC is a compiler for variational forms by Anders Logg.
Here is a mixed-form Poisson equation:
a((τ,w), (σ,u)) = L((τ,w))   ∀(τ,w) ∈ V

where

a((τ,w), (σ,u)) = ∫Ω τ·σ − (∇·τ) u + w (∇·σ) dx

L((τ,w)) = ∫Ω w f dx
Future Plans FEniCS Tools
FFC: Mixed Poisson
shape = "triangle"

BDM1 = FiniteElement("Brezzi-Douglas-Marini", shape, 1)
DG0 = FiniteElement("Discontinuous Lagrange", shape, 0)

element = BDM1 + DG0
(tau, w) = TestFunctions(element)
(sigma, u) = TrialFunctions(element)

a = (dot(tau, sigma) - div(tau)*u + w*div(sigma))*dx

f = Function(DG0)
L = w*f*dx
FFC
Here is a discontinuous Galerkin formulation of the Poisson equation:
a(v, u) = L(v)   ∀v ∈ V

where

a(v, u) = ∫Ω ∇u · ∇v dx
  + Σ_S ∫S −⟨∇v⟩ · [[u]]_n − [[v]]_n · ⟨∇u⟩ + (α/h) v u dS
  + ∫∂Ω −∇v · [[u]]_n − [[v]]_n · ∇u + (γ/h) v u ds

L(v) = ∫Ω v f dx
FFC: DG Poisson
DG1 = FiniteElement("Discontinuous Lagrange", shape, 1)
v = TestFunction(DG1)
u = TrialFunction(DG1)
f = Function(DG1)
g = Function(DG1)
n = FacetNormal("triangle")
h = MeshSize("triangle")
a = dot(grad(v), grad(u))*dx
  - dot(avg(grad(v)), jump(u, n))*dS
  - dot(jump(v, n), avg(grad(u)))*dS
  + alpha/h*dot(jump(v, n), jump(u, n))*dS
  - dot(grad(v), jump(u, n))*ds
  - dot(jump(v, n), grad(u))*ds
  + gamma/h*v*u*ds