Numerical methods for on-line power system load flow analysis ...

Energy Syst (2010) 1: 273–289 DOI 10.1007/s12667-010-0013-6

ORIGINAL PAPER

Numerical methods for on-line power system load flow analysis

Siddhartha Kumar Khaitan · James D. McCalley · Mandhapati Raju

Received: 4 February 2010 / Accepted: 20 April 2010 / Published online: 6 May 2010 © Springer-Verlag 2010

Abstract The Newton-Raphson method is the most widely accepted load flow solution algorithm. However, LU factorization remains a computationally challenging task in meeting the real-time needs of the power system. This paper proposes the application of very fast multifrontal direct linear solvers to the linear system sub-problem of power system real-time load flow analysis, utilizing state-of-the-art algorithms for ordering and preprocessing. Additionally, the unsymmetric multifrontal method for LU factorization and the highly optimized Intel® Math Kernel Library BLAS are used. Two state-of-the-art multifrontal algorithms for unsymmetric matrices, namely UMFPACK V5.2.0 and sequential MUMPS 4.8.3 (“Multifrontal Massively Parallel Solver”), are customized for AC power system Newton-Raphson based load flow analysis. The multifrontal solvers are compared against the state-of-the-art sparse Gaussian elimination based HSL sparse solver MA48. This study evaluates the performance of the above multifrontal solvers in terms of number of entries in the factors, computational time, number of floating-point operations and memory, in the context of load flow solution on nine systems including very large real power systems. The results of the performance evaluation are reported. The proposed method achieves a significant reduction in computational time.

This work was supported in part by the U.S. Department of Energy Consortium for Electric Reliability Technology Solutions (CERTS).

S.K. Khaitan · J.D. McCalley
Department of Electrical and Computer Engineering, Iowa State University, Iowa, IA 50011, USA
e-mail: [email protected]

J.D. McCalley
e-mail: [email protected]

M. Raju
Optimal Inc., Plymouth, MI 48312, USA
e-mail: [email protected]




Keywords Load flow analysis · Linear solvers · Multifrontal methods · MUMPS · UMFPACK · MA48

1 Introduction

Load flow solutions are among the most frequently performed power network calculations for steady state operating conditions of the power system. In the operational context, for load flow simulations to be of prime value to operators, they must be performed in a computationally efficient manner. Over the past few decades several methods have been developed to perform load flow simulations in a reliable, versatile and computationally efficient manner with optimal storage requirements [1, 38–42]. However, the Newton-Raphson (NR) methods are the most widely accepted solution methods due to their robustness and fast (quadratic) convergence characteristics. The core of the iterative NR method is the solution of a system of linear equations, which accounts for almost 90% of the solution time. Solving the linear system sub-problem of the NR method involves the computationally challenging task of LU factorization at every iteration. Sparse Gaussian elimination and matrix inversion techniques have been used to reduce the computational burden. Any further reduction of computational time is advantageous for a heavily used tool like load flow simulation, as it allows power engineers and operators to perform more simulations for different scenarios. Fast decoupled methods have been developed for real-time load flow solutions. They require a single, simpler LU factorization followed by forward elimination and backward substitution on a much smaller matrix for successive iterations. However, the fast decoupled methods can fail to converge for systems under stressed operating conditions and/or with high R/X ratios, and one then has to resort to full NR load flow simulations. To reduce the computational burden and the wall clock time of the load flow solution, multifrontal methods [2] are proposed in this paper.

On Windows platforms, multifrontal methods were used for power flow almost a decade ago in reference [3], which implements UMFPACK V2 [4] with default ordering. No information is provided there regarding the BLAS (Basic Linear Algebra Subprograms), which provide standard building blocks for performing basic vector and matrix operations. The focus of that paper was to promote automatic code differentiation in power flow algorithms, and it was stated that the resulting code was slower than conventional packages with a hand-coded Jacobian. Since then, there has been considerable advancement in the area of multifrontal methods (including algorithmic improvements in UMFPACK [4–9]) to make them highly computationally efficient. Currently, there is an abundance of advanced algorithms for ordering schemes to reduce fill-in, and for post-ordering and restructuring of elimination trees. Efficient and optimized elimination trees, preordering strategies, reduced working storage and reduction in indirect memory access give higher speed and performance [10–12].

Reference [13] proposes FPGA hardware technology for the solution of sparse equations and compares the result for the solution phase alone with multifrontal methods on a Linux platform. The ordering was done using MATLAB functions. Reference [14] uses MUMPS, a parallel multifrontal solver [15–21], on a Linux platform for the symmetric indefinite matrices that arise in the solution of the security constrained AC-OPF problem. We are not aware of any other multifrontal implementations for power system AC load flow solutions with unsymmetric Jacobian matrices. Specifically, there is no instance in the open literature of the application of MUMPS to the power flow solution with unsymmetric Jacobian matrices.

The objective of this paper is to aid security assessment through enhanced computational speed for the solution of the linear system of equations that arises in the NR solution of the power system load flow. To achieve the desired computational speed, this paper implements sparse direct solvers based on the unsymmetric multifrontal methods MUMPS [15–21] and UMFPACK [4–9]. Both MUMPS and UMFPACK are freely available in the public domain. MUMPS allows external interfacing of up-to-date sequential ordering software such as METIS [22], PORD [23] and SCOTCH [24], and provides several internal orderings such as AMD [25], QAMD [26] and AMF [26], whereas in UMFPACK the preferred ordering for unsymmetric matrices is COLAMD [4–9]. In this study all of these orderings except SCOTCH have been interfaced with MUMPS, and their performance is evaluated in terms of number of entries in the factors, computational time, number of floating-point operations and memory. The highly optimized Intel® Math Kernel Library BLAS is used, since the BLAS are critical for performance [27]. The results are compared against the state-of-the-art sparse Gaussian elimination solver MA48 [28, 29], which is available free to researchers. This study is the first of its kind to implement the latest unsymmetric multifrontal solvers with different choices of ordering schemes, highly optimized BLAS and an intelligent choice of the phases executed within a sparse solver, saving computational time to allow for real-time security assessment.

Section 2 provides a brief background on the numerical formulation of the load flow problem, the solution of nonlinear equations and the linear solvers. Section 3 presents an overview of the multifrontal solvers MUMPS [15–21] and UMFPACK [4–9] proposed in this paper for the linear system sub-problem of the NR load flow solution. Section 4 discusses implementation issues regarding the solvers. Section 5 contains the results and the discussion, and Sect. 6 concludes.

2 Numerical methods

Efficient algorithms and hardware, together or independently, can offer great advantage in achieving the desired speed for real-time operations and security assessment. In this paper, we focus on algorithmic improvements to achieve high computational gain for full NR load flow solutions.

We divide our load flow software into three parts, namely (1) the preprocessor, (2) the Newton-Raphson solver and (3) the output assembler. The preprocessor includes the user interface, assembly of the admittance matrix and building of the Jacobian. Our studies indicate that about 90–95% of the computational time is spent in the NR solver. The NR solver involves numerical analysis techniques which are broadly classified into two categories, namely (1) solution of nonlinear equations and (2) solution of linear equations.


2.1 Non linear equation solution

The equations developed below use the same notation as in [1]. Consider the set of equations

$$
\begin{aligned}
y_1 &= f_1(x_1, x_2, x_3, \ldots, x_n) \\
y_2 &= f_2(x_1, x_2, x_3, \ldots, x_n) \\
&\;\;\vdots \\
y_n &= f_n(x_1, x_2, x_3, \ldots, x_n)
\end{aligned}
\tag{1}
$$

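Expanding the first equation about the initial solution $[x_1(0), x_2(0), \ldots, x_n(0)]$ and retaining only the first-order terms, we get:

$$
y_1 = f_1[x_1(0), x_2(0), \ldots, x_n(0)]
+ \left.\frac{\partial f_1}{\partial x_1}\right|_{x_1 = x_1(0)} [x_1 - x_1(0)]
+ \left.\frac{\partial f_1}{\partial x_2}\right|_{x_2 = x_2(0)} [x_2 - x_2(0)]
+ \cdots
+ \left.\frac{\partial f_1}{\partial x_n}\right|_{x_n = x_n(0)} [x_n - x_n(0)]
\tag{2}
$$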

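After expanding the other equations similarly and expressing the result in matrix form, we get:

$$
\begin{bmatrix} y_1 \\ y_2 \\ \vdots \\ y_n \end{bmatrix}
=
\begin{bmatrix}
f_1[x_1(0), x_2(0), \ldots, x_n(0)] \\
f_2[x_1(0), x_2(0), \ldots, x_n(0)] \\
\vdots \\
f_n[x_1(0), x_2(0), \ldots, x_n(0)]
\end{bmatrix}
+
\begin{bmatrix}
\dfrac{\partial f_1}{\partial x_1} & \dfrac{\partial f_1}{\partial x_2} & \cdots & \dfrac{\partial f_1}{\partial x_n} \\
\dfrac{\partial f_2}{\partial x_1} & \dfrac{\partial f_2}{\partial x_2} & \cdots & \dfrac{\partial f_2}{\partial x_n} \\
\vdots & \vdots & & \vdots \\
\dfrac{\partial f_n}{\partial x_1} & \dfrac{\partial f_n}{\partial x_2} & \cdots & \dfrac{\partial f_n}{\partial x_n}
\end{bmatrix}
\begin{bmatrix}
x_1 - x_1(0) \\ x_2 - x_2(0) \\ \vdots \\ x_n - x_n(0)
\end{bmatrix}
\tag{3}
$$

Equation (3) can be expressed in compact form as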

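$$
y = f[x(0)] + J(0)\,[x - x(0)] \tag{4}
$$

where $J(0)$ is the Jacobian matrix of partial derivatives evaluated at $x(0)$. Solving (4) for $x$ gives

$$
x = x(0) + J(0)^{-1}\,[y - f\{x(0)\}] \tag{5}
$$

In recursive form using iteration count $i$, the above equation becomes

$$
x_{i+1} = x_i + J_i^{-1}\,[y - f(x_i)] \tag{6}
$$

This is the Newton-Raphson solution formulation.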
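The recursion in (6) can be illustrated with a minimal Python sketch. The two-equation test system, the names `solve_linear`, `newton_raphson`, `f_example` and `jac_example`, and the tolerance are assumptions made for illustration and are not taken from the authors' simulator. The small dense elimination routine merely stands in for the sparse solvers discussed later.

```python
def solve_linear(A, b):
    """Solve A x = b by dense Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]  # augmented matrix
    for k in range(n):
        p = max(range(k, n), key=lambda r: abs(M[r][k]))  # pivot row
        M[k], M[p] = M[p], M[k]
        for r in range(k + 1, n):
            m = M[r][k] / M[k][k]
            for c in range(k, n + 1):
                M[r][c] -= m * M[k][c]
    x = [0.0] * n
    for k in reversed(range(n)):
        s = sum(M[k][c] * x[c] for c in range(k + 1, n))
        x[k] = (M[k][n] - s) / M[k][k]
    return x


def newton_raphson(f, jac, y, x0, tol=1e-10, max_iter=20):
    """Iterate x_{i+1} = x_i + J_i^{-1} [y - f(x_i)] as in equation (6)."""
    x = list(x0)
    for iteration in range(max_iter + 1):
        mismatch = [yi - fi for yi, fi in zip(y, f(x))]
        if max(abs(m) for m in mismatch) < tol:
            return x, iteration
        dx = solve_linear(jac(x), mismatch)  # J_i dx = y - f(x_i)
        x = [xi + di for xi, di in zip(x, dx)]
    raise RuntimeError("Newton-Raphson did not converge")


# Hypothetical test system: y1 = x1^2 + x2^2, y2 = x1 * x2
def f_example(x):
    return [x[0] ** 2 + x[1] ** 2, x[0] * x[1]]


def jac_example(x):
    return [[2 * x[0], 2 * x[1]], [x[1], x[0]]]


solution, iterations = newton_raphson(f_example, jac_example, [5.0, 2.0], [2.5, 0.5])
print(solution, iterations)
```

Note that the linear solve is the only expensive step inside the loop, which is why the rest of the paper concentrates on it.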

2.2 Problem formulation

The fundamental bus bar equations at node $k$ are given below:

$$
I_k = \sum_{j=1}^{n} Y_{kj} V_j \tag{7}
$$

$$
S_k^{*} = V_k^{*} I_k \tag{8}
$$

where $I_k$ is the injected current at node $k$, $V_k$ is the voltage at node $k$, $Y_{kj}$ is the admittance between nodes $k$ and $j$ and $S_k$ is the complex power. Substituting (7) in (8), the complex power can be written as

$$
S_k = f(V_k) \tag{9}
$$

Equation (9) has the same form as (1) and thus can be solved using the Newton-Raphson method. The set of nonlinear algebraic equations in (9) is linearized at each iteration, and the unknowns are updated as in (6). For the load flow problem, (3) is written in short form as

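$$
\begin{bmatrix} \Delta P \\ \Delta Q \end{bmatrix}
=
\begin{bmatrix} J_1 & J_2 \\ J_3 & J_4 \end{bmatrix}
\begin{bmatrix} \Delta V / V \\ \Delta\theta \end{bmatrix}
\tag{10}
$$

where $\Delta V$ and $\Delta\theta$ are the correction variables for voltage magnitude and voltage phase angle, and $\Delta P$ and $\Delta Q$ are the real and reactive power mismatches. The Newton iterations terminate when the largest absolute element of the mismatch vector is less than a pre-specified tolerance. Computation of the correction vector $[\Delta V, \Delta\theta]^T$ requires the solution of the set of linear equations given by (10), which is discussed in the next section.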
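As a hedged illustration of how the left-hand side of (10) is obtained, the following Python sketch evaluates (7) and (8) for a given voltage vector and forms the mismatches. The two-bus lossless line, the function names and the specified injections are hypothetical, and the form $S_k = V_k I_k^{*}$ used in the code is simply the complex conjugate of (8).

```python
import math


def injected_power(Y, V):
    """Return S_k = V_k * conj(I_k), with I_k = sum_j Y_kj V_j (equations (7), (8))."""
    n = len(V)
    S = []
    for k in range(n):
        I_k = sum(Y[k][j] * V[j] for j in range(n))  # equation (7)
        S.append(V[k] * I_k.conjugate())             # conjugate of equation (8)
    return S


def mismatches(Y, V, P_spec, Q_spec):
    """Return (dP, dQ): specified minus calculated real and reactive injections."""
    S = injected_power(Y, V)
    dP = [P_spec[k] - S[k].real for k in range(len(V))]
    dQ = [Q_spec[k] - S[k].imag for k in range(len(V))]
    return dP, dQ


# Hypothetical two-bus system joined by a lossless line of reactance 0.1 p.u.
y_line = 1 / 0.1j
Y_bus = [[y_line, -y_line], [-y_line, y_line]]
V_bus = [complex(1.0, 0.0), complex(math.cos(-0.1), math.sin(-0.1))]

dP, dQ = mismatches(Y_bus, V_bus, P_spec=[1.0, -1.0], Q_spec=[0.0, 0.0])
print(dP, dQ)
```

The Newton iteration would stop once the largest absolute entry of `dP` and `dQ` falls below the chosen tolerance.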

2.3 Linear solver

The most computationally intensive step in the solution of the load flow is the solution of the sparse system of linear equations. For the above power system load flow problem formulation, the Jacobian matrix J is highly sparse and has good structural symmetry but poor numerical symmetry. We use this fact to gain computational efficiency for real-time load flow and security assessment by employing multifrontal sparse linear solvers with an optimized BLAS library and effective ordering strategies to reduce fill-in.

A typical sparse solver consists of four distinct phases in the solution of a linear system:

1. The preprocessing and ordering step, which reduces fill-in and exploits special structures such as block triangular form.

2. The symbolic factorization step, which determines the nonzero structures of the factors L and U, sets up the data structures for storing the nonzeros of L and U and allocates memory for them.

3. The numerical factorization step, which takes the numerical values as input and computes L and U via factorization on multiple small but dense “fronts” (frontal matrices), with pivoting to maintain numerical stability.

4. The solve step, which performs forward and backward substitutions to solve the linear system.

All these steps can be called separately or in combination. This can be exploited to save computational effort over subsequent iterations in the solution of a sequence of linear systems. For example, if the structure of the matrix does not change between iterations, the analysis steps (ordering and symbolic factorization) need to be performed only once. Similarly, if the coefficient matrix does not change, both the analysis and the factorization steps can be skipped. The first three steps are the most time consuming, and an effective strategy is incorporated to select only the required steps during the iterations. This results in significant computational saving and is discussed towards the end of Sect. 5.
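The benefit of calling the phases separately can be sketched in Python with a small dense LU routine. This is only an analogue under stated assumptions: the functions `lu_factor` and `lu_solve` are hypothetical, operate on dense lists and have no ordering or symbolic phase, whereas the solvers in this paper work on sparse data structures. The point illustrated is that one numerical factorization can serve several solve steps.

```python
def lu_factor(A):
    """Numerical factorization: return packed LU factors and the row permutation."""
    n = len(A)
    LU = [row[:] for row in A]
    perm = list(range(n))
    for k in range(n):
        p = max(range(k, n), key=lambda r: abs(LU[r][k]))  # partial pivoting
        if LU[p][k] == 0.0:
            raise ZeroDivisionError("matrix is singular")
        LU[k], LU[p] = LU[p], LU[k]
        perm[k], perm[p] = perm[p], perm[k]
        for r in range(k + 1, n):
            LU[r][k] /= LU[k][k]              # multiplier, stored in L
            for c in range(k + 1, n):
                LU[r][c] -= LU[r][k] * LU[k][c]
    return LU, perm


def lu_solve(LU, perm, b):
    """Solve step: forward substitution with L, then backward substitution with U."""
    n = len(b)
    x = [b[perm[i]] for i in range(n)]        # apply the row permutation to b
    for i in range(n):
        x[i] -= sum(LU[i][j] * x[j] for j in range(i))
    for i in reversed(range(n)):
        x[i] = (x[i] - sum(LU[i][j] * x[j] for j in range(i + 1, n))) / LU[i][i]
    return x


A = [[4.0, -1.0, 0.0], [-1.0, 4.0, -1.0], [0.0, -1.0, 4.0]]
factors, perm = lu_factor(A)                  # performed once
for rhs in ([1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [3.0, 2.0, 3.0]):
    print(lu_solve(factors, perm, rhs))       # only the cheap solve step is repeated
```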

In the following sections we give a brief overview of the two multifrontal solvers and the different ordering schemes.

3 Multifrontal methods

The multifrontal method, a generalization of the frontal method, was originally developed for symmetric systems [19]. Subsequently, an unsymmetric multifrontal algorithm [5] was developed for general sparse unsymmetric matrices. An overview of the theory and mathematical foundation of multifrontal methods is given in [2] and [30]. Multifrontal methods reorganize the task of factorizing a large sparse matrix into a sequence of partial factorizations of smaller dense frontal matrices, and make full use of high performance computer architectures by invoking the Level 3 Basic Linear Algebra Subprograms (BLAS) library. Thus the memory requirement is heavily reduced and the computing speed is greatly enhanced.
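The partial factorization of a single dense frontal matrix can be sketched as follows. This Python fragment is an illustrative assumption rather than the algorithm of MUMPS or UMFPACK: it eliminates the first `k` variables of a small dense front without pivoting and returns the remaining Schur complement (often called the contribution block), which a multifrontal method would pass on to the parent front.

```python
def partial_factor(front, k):
    """Eliminate the first k pivots of a dense front (no pivoting, for illustration).

    Returns the packed factors and the contribution block, i.e. the Schur
    complement F22 - F21 * inv(F11) * F12 of the uneliminated variables.
    """
    F = [row[:] for row in front]
    m = len(F)
    for p in range(k):
        pivot = F[p][p]
        if pivot == 0.0:
            raise ZeroDivisionError("zero pivot; a real solver would pivot here")
        for r in range(p + 1, m):
            F[r][p] /= pivot                   # column of L
            for c in range(p + 1, m):
                F[r][c] -= F[r][p] * F[p][c]   # rank-1 update of the trailing block
    contribution = [row[k:] for row in F[k:]]
    return F, contribution


# Hypothetical 3x3 front in which only the first variable is fully assembled
front = [[4.0, 1.0, 2.0],
         [1.0, 3.0, 0.0],
         [2.0, 0.0, 5.0]]
factors, contribution = partial_factor(front, 1)
print(contribution)
```

In an optimized solver the trailing update is carried out as one dense matrix-matrix operation, which is where the Level 3 BLAS are used.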

In the present study, MUMPS 4.8.3 and UMFPACK V5.2.0 are used as the multifrontal engines for the solution of (10).

3.1 MUMPS

MUMPS 4.8.3 (“Multifrontal Massively Parallel Solver”), written in Fortran 90, is a package based on multifrontal algorithms for solving systems of linear equations of the form in (10), where J is a square sparse matrix that can be unsymmetric, symmetric positive definite or general symmetric. It performs a direct factorization $A = LU$ or $A = LDL^T$, depending on the symmetry of the matrix. MUMPS is primarily a parallel distributed solver designed for computational efficiency; it exploits parallelism arising both from sparsity in the matrix A and from dense factorization kernels. However, the analysis phase is sequential. The parallel version of MUMPS requires MPI [31] for message passing and makes use of the BLAS, BLACS and ScaLAPACK [32] libraries. The sequential version relies only on BLAS. The MUMPS solver provides error analysis, iterative refinement, scaling of the original matrix, out-of-core capability, detection of null pivots, a basic estimate of rank deficiency and null space basis, and computation of a Schur complement matrix along with the factorization. MUMPS has several built-in ordering algorithms and provides a tight interface to external ordering packages such as PORD, SCOTCH and METIS, as well as the possibility for the user to input a given ordering. PORD is provided as part of the MUMPS package. METIS and SCOTCH need to be linked separately to the MUMPS solver.


3.1.1 BLAS

The BLAS (Basic Linear Algebra Subprograms) provide standard building blocks for performing basic vector and matrix operations. The Level 1 BLAS perform scalar, vector and vector-vector operations, the Level 2 BLAS perform matrix-vector operations, and the Level 3 BLAS perform matrix-matrix operations. The BLAS are efficient, portable and widely available, and they are commonly used in the development of high quality linear algebra software. ATLAS, ACML and the Intel® Math Kernel Library BLAS are examples of BLAS libraries.
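The three BLAS levels can be made concrete with a plain Python sketch. The function names below echo the conventional BLAS routine names (axpy, gemv, gemm), but the signatures are simplified assumptions and the code is in no way an optimized BLAS; it only shows what kind of operation each level covers.

```python
def axpy(alpha, x, y):
    """Level 1 (vector-vector): return alpha * x + y."""
    return [alpha * xi + yi for xi, yi in zip(x, y)]


def gemv(A, x):
    """Level 2 (matrix-vector): return A x."""
    return [sum(a * xi for a, xi in zip(row, x)) for row in A]


def gemm(A, B):
    """Level 3 (matrix-matrix): return A B."""
    columns = list(zip(*B))
    return [[sum(a * b for a, b in zip(row, col)) for col in columns] for row in A]


print(axpy(2, [1, 2], [3, 4]))
print(gemv([[1, 2], [3, 4]], [1, 1]))
print(gemm([[1, 2], [3, 4]], [[0, 1], [1, 0]]))
```

For n-by-n operands, the Level 3 operation performs on the order of n^3 arithmetic operations on n^2 data, which is why optimized implementations can reuse data in cache and why dense frontal matrices map well onto them.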

3.1.2 Ordering methods

A fill-reducing ordering is crucial for the efficient numerical solution of sparse linear systems. The goal of the preordering is to find a permutation matrix P such that the subsequent factorization has minimum fill-in. However, this problem is NP-complete, so heuristics are used. Over the years two main classes of heuristics have evolved, namely (a) minimum-degree (MD) based heuristics and (b) graph-partitioning (GP) based heuristics. MD-based heuristics are local greedy heuristics that reorder the columns of a sparse matrix such that the column with the fewest nonzeros at a given stage of factorization is the next one to be eliminated at that stage.

The minimum degree ordering algorithm produces factors with relatively low fill-in on a wide range of matrices. Because of this, the algorithm has received much attention and research over the past three decades to improve its run time and quality. The multiple minimum-degree (MMD) algorithm and the approximate minimum-degree (AMD) algorithm represent the state of the art in MD-based heuristics.
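A minimal Python sketch of the greedy minimum-degree idea is given below. It assumes a structurally symmetric matrix represented by its undirected adjacency graph and uses exact degrees with label-based tie breaking; it is not AMD or MMD, and the function names are hypothetical. The second function counts the fill-in edges produced by a given elimination order so that orderings can be compared.

```python
def minimum_degree_order(adj):
    """Greedy ordering: repeatedly eliminate a vertex of smallest current degree."""
    g = {v: set(nbrs) for v, nbrs in adj.items()}
    order = []
    while g:
        v = min(g, key=lambda u: (len(g[u]), u))   # ties broken by label
        nbrs = g.pop(v)
        for u in nbrs:
            g[u].discard(v)
            g[u] |= nbrs - {u}                      # neighbours become a clique
        order.append(v)
    return order


def count_fill(adj, order):
    """Count the new edges (fill-in) created when eliminating in the given order."""
    g = {v: set(nbrs) for v, nbrs in adj.items()}
    fill = 0
    for v in order:
        nbrs = sorted(g.pop(v))
        for u in nbrs:
            g[u].discard(v)
        for i, a in enumerate(nbrs):
            for b in nbrs[i + 1:]:
                if b not in g[a]:
                    g[a].add(b)
                    g[b].add(a)
                    fill += 1
    return fill


# Hypothetical "arrow" structure: vertex 0 is coupled to every other vertex
star = {0: {1, 2, 3, 4}, 1: {0}, 2: {0}, 3: {0}, 4: {0}}
print(count_fill(star, [0, 1, 2, 3, 4]))            # natural order
print(count_fill(star, minimum_degree_order(star)))  # minimum-degree order
```

Eliminating the hub vertex first couples all of its neighbours, whereas the greedy order postpones it and creates no fill for this structure.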

GP-based heuristics regard the sparse matrix as the adjacency matrix of a graph and follow a divide-and-conquer strategy to label the nodes of the graph by partitioning it into smaller subgraphs.

Nested dissection is another effective method of finding an elimination ordering. The algorithm uses a divide-and-conquer strategy on the graph. Removal of a set of vertices (a separator) splits the graph into two new graphs, on which the dissection may be performed recursively. The results for the two parts may then be combined to obtain the ordering of the entire graph.

A new class of hybrid ordering packages, which combine the above two approaches to produce better and more robust orderings, has been developed in the recent past. The hybrid orderings are sometimes called incomplete nested dissection. These algorithms combine top-down multilevel nested dissection with bottom-up minimum degree (AMD or MMD) or minimum fill (MMF) local heuristics. A more general class of hybrid schemes is known as multi-section ordering. PORD (Paderborn Ordering tool) uses multi-section ordering.

MUMPS offers built-in approximate minimum degree (AMD) and approximate minimum fill (AMF) ordering algorithms, a nested dissection (ND) ordering, and the option of an externally computed ordering supplied by the user. MUMPS also provides for external interfacing of SCOTCH, PORD and METIS. SCOTCH is a hybrid of nested dissection and AMD. PORD implements a tighter coupling between nested dissection and approximate minimum fill, whereas METIS uses multilevel nested dissection and minimum degree. The degree of coupling depends on the partitioner. AMF, SCOTCH and PORD are tightly coupled in the sense that the assembly tree is obtained as an output of the partitioner. In the case of METIS only the ordering is obtained; it is thus said to be loosely coupled.

3.2 UMFPACK

UMFPACK consists of a set of ANSI/ISO C routines for solving unsymmetric sparse linear systems using the unsymmetric multifrontal method. It requires the matrix to be input in compressed sparse column format; a matrix assembled in sparse triplet form can be converted to this format. A more detailed discussion of the solver can be found in [4–9]. UMFPACK has the option to use BLAS for high performance. For general unsymmetric matrices, the multifrontal method offers a significant performance advantage over more conventional factorization schemes [33].
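The relationship between the triplet form and the compressed sparse column form can be sketched in Python. The function below is a hypothetical helper written for illustration, not a UMFPACK routine; it assumes zero-based indices and sums duplicate entries, which is a common convention when a matrix is assembled from element contributions.

```python
def triplet_to_csc(n_rows, n_cols, rows, cols, vals):
    """Convert (row, col, value) triplets to compressed sparse column arrays."""
    summed = {}
    for i, j, v in zip(rows, cols, vals):
        if not (0 <= i < n_rows and 0 <= j < n_cols):
            raise IndexError("entry outside the matrix")
        summed[(j, i)] = summed.get((j, i), 0.0) + v   # duplicates are added

    col_ptr = [0] * (n_cols + 1)
    row_idx, data = [], []
    for (j, i) in sorted(summed):                      # column-major, rows sorted
        row_idx.append(i)
        data.append(summed[(j, i)])
        col_ptr[j + 1] += 1
    for j in range(n_cols):                            # counts -> start offsets
        col_ptr[j + 1] += col_ptr[j]
    return col_ptr, row_idx, data


# Hypothetical 3x3 matrix [[1, 0, 2], [0, 3, 0], [4, 0, 5]]; entry (2, 2) given twice
rows = [0, 2, 1, 0, 2, 2]
cols = [0, 0, 1, 2, 2, 2]
vals = [1.0, 4.0, 3.0, 2.0, 2.0, 3.0]
print(triplet_to_csc(3, 3, rows, cols, vals))
```

Column `j` occupies positions `col_ptr[j]` to `col_ptr[j + 1] - 1` of `row_idx` and `data`, so the storage is proportional to the number of nonzeros.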

4 Implementation

Our simulator is written in C++. The MUMPS solver is a set of Fortran 90 routines, whereas the UMFPACK solver is a set of ANSI/ISO C routines and MA48 is a set of Fortran 77 routines. Because of the language differences, interface driver routines were written and libraries were built both for the solvers and for the ordering algorithms (METIS and PORD). MUMPS is written primarily as a parallel solver for Linux and is not distributed for Windows. Significant effort was spent on building sequential libraries for Windows, on which our current simulator is developed. Similarly, the METIS source code is distributed with a makefile for building libraries on Linux, and only binaries are distributed for Windows; libraries were therefore built for Windows. Both MUMPS and UMFPACK were linked to the highly optimized BLAS libraries from the Intel® Math Kernel Library.

5 Results and discussion

Both the multifrontal solvers and MA48 are tested on nine systems of varying sizes. Table 1 gives the details of these nine systems in terms of the order of the Jacobian and the number of nonzeros in the Jacobian.

The IEEE systems IEEE9, IEEE30, IEEE39 and the Bench system are from the PSS/e example collections. The IEEE RTS 96 is from [34, 35]. System 1 is from [36]. The US Utility systems are from real US grid systems for which we obtained the Jacobian of the power flow for our tests. The results obtained with the three solvers were compared with those obtained from PSS/e and were found to be exactly the same for all the systems except the US Utility systems, for which we had only the Jacobian and which were therefore not tested with PSS/e. For these systems, however, the solutions of the Jacobian system with an arbitrary right-hand side were found to be exactly the same with all the solvers.

Table 1 Test systems with Jacobian order and number of nonzeros

| System | Order of Jacobian | No. of non-zeros |
| --- | --- | --- |
| IEEE9 | 14 | 130 |
| IEEE RTS 96 | 36 | 401 |
| IEEE30 | 53 | 616 |
| IEEE39 | 67 | 725 |
| Bench | 14508 | 188120 |
| System 1 | 18540 | 141630 |
| System 2 | 32356 | 262624 |
| System 3 | 44102 | 328346 |
| System 4 | 87520 | 602245 |

Experiments have been done to study the effect of different ordering routines on the performance of the MUMPS solver, in terms of both memory and computational time, for the solution of the system of linear equations arising in the load flow solution, for one iteration. The results for three representative systems, namely System 4, System 3 and System 1, are shown below, as they have widely different system sizes and hence widely different Jacobian sizes. Table 2 shows the performance comparison of the MUMPS solver for the different ordering schemes in terms of the wall clock computational time and memory on System 4.

Table 2 Performance comparison for System 4

| Ordering | Computational time (seconds) | Peak memory (MB) |
| --- | --- | --- |
| AMD | 1.25 | 730 |
| AMF | 1.28 | 720 |
| PORD | 2.33 | 720 |
| METIS | 2.15 | 720 |
| QAMD | 1.26 | 720 |

Table 3 Performance comparison for System 3

| Ordering | Computational time (seconds) | Peak memory (MB) |
| --- | --- | --- |
| AMD | 0.657 | 670 |
| AMF | 0.658 | 680 |
| PORD | 1.281 | 690 |
| METIS | 1.079 | 680 |
| QAMD | 0.657 | 670 |

It is clear from Table 2 that AMD results in the least computational time and PORD in the maximum; the difference is almost a factor of two. The computational times for AMD and QAMD are nearly identical and very close to that of AMF. In terms of peak memory requirement all the orderings perform similarly.

Table 3 shows the performance comparison of the MUMPS solver for the different ordering schemes in terms of the wall clock computational time and memory for System 3. From Table 3 it is clear that AMD again results in the least computational time and PORD in the maximum. The computational time difference is almost a factor of two. The computational times for AMD and QAMD are again similar, and that of AMF is almost the same. In terms of peak memory requirement all the orderings behave similarly.

Table 4 Performance comparison for System 1

| Ordering | Computational time (seconds) | Peak memory (MB) |
| --- | --- | --- |
| AMD | 0.25 | 660 |
| AMF | 0.265 | 660 |
| PORD | 0.485 | 660 |
| METIS | 0.422 | 660 |
| QAMD | 0.266 | 660 |

Table 5 Performance comparison for System 4

| Ordering | No. of entries in factors (L + U) | FLOPS (×10^6) |
| --- | --- | --- |
| AMD | 1259168 | 29.15 |
| AMF | 1251490 | 28.01 |
| PORD | 1427512 | 43.72 |
| METIS | 1478748 | 42.61 |
| QAMD | 1259168 | 29.15 |

Similarly, Table 4 shows the results for System 1. Once again the conclusions are the same as for the above two systems.

The conclusion from Tables 2, 3 and 4 is that the AMD and AMF orderings give comparable results; however, AMD gives the best computational performance for systems of all sizes for the MUMPS solver for load flow solution.

Experiments have also been done to study the effect of different ordering routines on the performance of the MUMPS solver in terms of the total number of floating point operations (FLOPS), which directly impacts the computational time, and in terms of the number of nonzero entries in the factors, based on which the memory is allocated. Table 5 shows the performance comparison of the MUMPS solver for the different ordering schemes in terms of the FLOPS and the number of entries in the factors for System 4.

It is clear from Table 5 that AMF performs the minimum number of FLOPS and has the minimum number of entries in the factors, whereas METIS is worst in terms of the number of entries in the factors. QAMD and AMD are the next best.

Table 6 shows the performance comparison of the MUMPS solver for the different ordering schemes in terms of the FLOPS and the number of entries in the factors for System 3. We can see from Table 6 that AMF and AMD are close to each other; AMF performs the minimum number of FLOPS and has the minimum number of entries in the factors, whereas METIS is worst in terms of the number of entries in the factors and PORD has the maximum number of FLOPS. Table 7 shows the performance comparison of the MUMPS solver for the different ordering schemes in terms of the FLOPS and the number of entries in the factors for System 1. We can see from Table 7 that AMF and AMD are again close to each other; AMD performs the minimum number of FLOPS and has the minimum number of entries in the factors, whereas METIS is worst both in terms of FLOPS and number of entries in the factors.

Table 6 Performance comparison for System 3

| Ordering | No. of entries in factors (L + U) | FLOPS (×10^6) |
| --- | --- | --- |
| AMD | 799874 | 32.03 |
| AMF | 791450 | 29.61 |
| PORD | 916946 | 33.50 |
| METIS | 930874 | 32.61 |
| QAMD | 799874 | 32.03 |

Table 7 Performance comparison for System 1

| Ordering | No. of entries in factors (L + U) | FLOPS (×10^6) |
| --- | --- | --- |
| AMD | 272302 | 5.28 |
| AMF | 278082 | 5.61 |
| PORD | 303278 | 8.06 |
| METIS | 322964 | 8.58 |
| QAMD | 272302 | 5.28 |

Table 8 Performance comparison for systems in Table 1

| System | MUMPS (seconds) | UMFPACK (seconds) | MA48 (seconds) |
| --- | --- | --- | --- |
| Bench | 0.25 | 0.390 | 0.48 |
| System 1 | 0.25 | 0.422 | 0.587 |
| System 2 | 0.5 | 0.891 | 3.297 |
| System 3 | 0.656 | 1.09 | 3.659 |
| System 4 | 1.25 | 2.215 | 5.297 |

In further analysis only the result based on the AMF or AMD ordering (whichever gives the best result) for MUMPS is presented. Table 8 shows the performance results for the systems in Table 1 with the MUMPS, UMFPACK and MA48 solvers in terms of the computational time for the solution of the system of linear equations arising in the load flow solution, for one iteration. The time shown is the total time for the four phases in seconds. For the first four systems in Table 1 (the small test systems) all the solvers take almost the same time and there is no computational gain. The time is of the order of milliseconds or less, and hence the results are not shown for these systems. However, for large practical systems the choice of algorithm has a large impact on the computational time for the load flow solution.

Table 9 gives the speed-up comparison of the three solvers. Column 2 is the speed-up of MUMPS over UMFPACK, column 3 is the speed-up of MUMPS over MA48, and column 4 is the speed-up of UMFPACK over MA48. MUMPS is about 1.56–1.78 times faster than UMFPACK for load flow solutions, and UMFPACK is about 1.23–3.7 times faster than MA48. MUMPS is about 4.2–6.6 times faster than MA48 for the three largest systems. The results are presented for a single iteration, so the overall saving for each load flow solution, and hence for all the load flow solutions performed in a day, would be substantial with the optimized MUMPS solver.

Table 9 Speed-up comparison for systems in Table 1

| System | UMFPACK/MUMPS | MA48/MUMPS | MA48/UMFPACK |
| --- | --- | --- | --- |
| Bench | 1.56 | 1.92 | 1.231 |
| System 1 | 1.688 | 2.348 | 1.391 |
| System 2 | 1.782 | 6.594 | 3.700 |
| System 3 | 1.662 | 5.577744 | 3.357 |
| System 4 | 1.772 | 4.2376 | 2.391 |

Table 10 Computational time in seconds for numeric and solution phases for systems in Table 1

| System | Numeric | Solve | Total |
| --- | --- | --- | --- |
| Bench | 0.124 | 0.015 | 0.139 |
| System 1 | 0.125 | 0.031 | 0.156 |
| System 2 | 0.250 | 0.031 | 0.281 |
| System 3 | 0.329 | 0.047 | 0.376 |
| System 4 | 0.641 | 0.094 | 0.735 |
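The speed-up figures of Table 9 follow directly from the timings of Table 8, as the short Python sketch below shows. The dictionary layout and the function name `speedups` are illustrative assumptions; the timing values are copied from Table 8.

```python
# Total solution times in seconds per system: (MUMPS, UMFPACK, MA48), from Table 8
TIMES = {
    "Bench":    (0.25, 0.390, 0.48),
    "System 1": (0.25, 0.422, 0.587),
    "System 2": (0.5, 0.891, 3.297),
    "System 3": (0.656, 1.09, 3.659),
    "System 4": (1.25, 2.215, 5.297),
}


def speedups(system):
    """Return (UMFPACK/MUMPS, MA48/MUMPS, MA48/UMFPACK) time ratios."""
    mumps, umfpack, ma48 = TIMES[system]
    return umfpack / mumps, ma48 / mumps, ma48 / umfpack


for name in TIMES:
    print(name, ["%.3f" % r for r in speedups(name)])
```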

Experiments were also done with UMFPACK and MA48 regarding memory, number of entries in the factors and FLOP counts, but the results are not presented because the overall computational time for both these solvers is much greater than for MUMPS. To the operator or the power system engineer the wall clock time is the most important factor.

Thus MUMPS offers computational efficiency over UMFPACK and MA48 for the problem in context. It can be seen as a highly feasible tool both for the on-line operational context and for planning studies. It offers a tight interface with a number of external orderings and a number of built-in orderings. It also has the capability to automatically select the best ordering among the built-in orderings and the externally coupled orderings for the particular problem at hand. This would be especially important as the structure of the power system changes. In this study AMD seems to perform the best.

The above results were presented for one iteration, and therefore all four phases of the sparse linear solver, namely the ordering, symbolic factorization, numeric factorization and solve phases, were performed. However, for successive iterations the first two phases need not be performed, as the structure of the Jacobian remains the same unless the topology of the power system changes. Also, if we use an approximate Jacobian for the next iteration by not updating it, the numeric phase can also be avoided and only the solve phase is required, which would result in a large saving. However, the number of iterations required for convergence would increase slightly.
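The trade-off of not updating the Jacobian can be illustrated on a scalar analogue. The Python sketch below is an assumption-laden toy, not the authors' load flow code: it solves the hypothetical equation x^2 - 2 = 0 once with the slope re-evaluated at every iteration and once with the slope frozen at its initial value, and reports the iteration counts.

```python
def newton_scalar(f, dfdx, x0, freeze_slope=False, tol=1e-10, max_iter=100):
    """Newton iteration on a scalar equation f(x) = 0.

    With freeze_slope=True the derivative is evaluated only at x0, which mimics
    reusing an old Jacobian factorization instead of refactorizing.
    """
    x = x0
    slope = dfdx(x0)
    for iteration in range(max_iter + 1):
        residual = f(x)
        if abs(residual) < tol:
            return x, iteration
        if not freeze_slope:
            slope = dfdx(x)        # corresponds to a new numeric factorization
        x -= residual / slope      # corresponds to the solve phase
    raise RuntimeError("iteration did not converge")


def f(x):
    return x * x - 2.0


def dfdx(x):
    return 2.0 * x


root_full, it_full = newton_scalar(f, dfdx, 1.5)
root_frozen, it_frozen = newton_scalar(f, dfdx, 1.5, freeze_slope=True)
print(it_full, it_frozen)
```

In this toy case the frozen variant needs more iterations, each of which is cheaper. Whether the exchange pays off for a given power system depends on the relative cost of the numeric and solve phases and on how far the operating point moves between iterations.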

In Table 10, column 2 shows the time in seconds for the numeric phase alone with the MUMPS solver. Column 3 shows the time in seconds for the solve phase alone with the MUMPS solver. Column 4 is the sum of the numeric and the solve phase times in seconds with the MUMPS solver.

Table 11 Comparison of numeric and solution phase times with total computational time as in Table 8

| System | MUMPS/Numeric | MUMPS/Solve | MUMPS/Total |
| --- | --- | --- | --- |
| Bench | 2.016 | 8.267 | 1.799 |
| System 1 | 2 | 4.032 | 1.603 |
| System 2 | 2 | 8.064 | 1.779 |
| System 3 | 1.994 | 7 | 1.745 |
| System 4 | 1.950 | 6.819 | 1.701 |

Table 12 Convergence characteristics and robustness to non-typical network parameters

| R/X | 1 | 2 | 4 | 8 |
| --- | --- | --- | --- | --- |
| Iteration | 2 | 2 | 2 | 2 |

Table 11 compares the total computational time as given in Table 8 with the individual numeric phase time, the solve phase time and the combined numeric and solve phase time as given in Table 10.

Column 2 is the ratio of the total computational time by MUMPS for all four phases to the time for the numeric phase alone. It can be seen that the numeric phase takes almost half (50%) of the total time. Column 3 is the ratio of the total computational time by MUMPS for all four phases to the time for the solve phase alone. The solve phase is almost 8 times faster than the overall solution. When possible, performing just the solve phase would therefore result in a large computational gain. Column 4 is the ratio of the total computational time by MUMPS for all four phases to the time taken for the combined numeric and solve phases. It is observed that this ratio is almost 1.7, and hence all the following iterations would be about 1.7 times faster than the first one, provided the structure of the Jacobian is unchanged.
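The ratios discussed above can be recomputed from Tables 8 and 10 with a short Python sketch. The dictionary layout and the function name `phase_ratios` are illustrative assumptions, and the timings are copied from the tables as printed. Recomputed this way, total/numeric and total/(numeric + solve) reproduce columns 2 and 4 of Table 11, while the printed column 3 values coincide with numeric/solve; total/solve itself evaluates to roughly 8 to 17 for these timings.

```python
# MUMPS times in seconds per system: (total from Table 8, numeric, solve from Table 10)
PHASES = {
    "Bench":    (0.25, 0.124, 0.015),
    "System 1": (0.25, 0.125, 0.031),
    "System 2": (0.5, 0.250, 0.031),
    "System 3": (0.656, 0.329, 0.047),
    "System 4": (1.25, 0.641, 0.094),
}


def phase_ratios(system):
    """Return (total/numeric, total/solve, total/(numeric+solve), numeric/solve)."""
    total, numeric, solve = PHASES[system]
    return (total / numeric,
            total / solve,
            total / (numeric + solve),
            numeric / solve)


for name in PHASES:
    print(name, ["%.3f" % r for r in phase_ratios(name)])
```

The third ratio is the one that matters for successive Newton iterations on an unchanged network structure, since only the numeric and solve phases are then repeated.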

Apart from the high speed of the software package, there are other factors which are very important with respect to the quality of power flow software. In keeping with the main focus of the paper (enhancing computational speed), it was assumed that a good load flow package would have good convergence features, the ability to model tap changers and robustness to non-typical network parameters. Thus our goal in the paper was to enhance speed assuming that the above factors are satisfied.

At the same time, our research-grade load flow software package has the ability to model tap changers and to deal with high R/X network parameters. This is confirmed by a simulation on a sample system from DigSilent [37], shown in Fig. 1. The 11-bus, 7-transformer system was subjected to different R/X ratios. The ratio was increased from the original value up to 8 times that value to ensure a high R/X ratio. In all cases the system converged in 2 iterations. The results are reported in Table 12. The first row gives the R/X ratio as a multiple of the original value, where 1 refers to the original ratio. The second row gives the number of iterations required for convergence at each R/X ratio. The results were compared with those of the DigSilent software and were found to be exactly the same. The results establish good convergence and robustness characteristics. Convergence characteristics were further compared with PSS/e and found comparable for the nine systems reported in the paper.


Fig. 1 Sample system from DigSilent [37] for simulating high R/X


6 Conclusions

Very fast on-line computational capability for real-time security assessment of disturbances and proposed corrective actions is an important attribute of power system analysis. Of all the power system analysis tools, load flow solutions are the most routinely run, both in operation and in planning. NR load flow solutions are the most widely accepted but suffer from a heavy LU factorization computational burden. Our particular interest is the on-line deployment of load flow solutions based on full NR, and solvers based on multifrontal methods are proposed for the linear system sub-problem to gain computational efficiency. This paper proposes the application of state-of-the-art algorithms for ordering and preprocessing, the unsymmetric multifrontal method for LU factorization, and the highly optimized Intel® Math Kernel Library BLAS. Two state-of-the-art multifrontal algorithms, UMFPACK V5.2.0 and sequential MUMPS 4.8.3 (“Multifrontal Massively Parallel Solver”), are implemented and their performance is thoroughly investigated for power systems of all practical sizes. The multifrontal methods are compared against the state-of-the-art sparse Gaussian elimination solver MA48 from the HSL library. This paper also presents a thorough investigation of the impact of different ordering schemes on the performance of MUMPS, with respect to computational time, memory, number of entries in the factors and floating-point operations, for the problem of load flow solution. The proposed multifrontal methods (MUMPS and UMFPACK), with appropriate orderings, optimized BLAS and intelligent use of the factorization phases, are highly appealing for enhancing the computational efficiency of power system load flow simulation, especially as the system size grows.
Computational speed is a main concern in security assessment, together with data measurement, network processing, and modeling. Decreasing the computational time of any heavily used tool is advantageous because it increases the operators’ ability to take quick, apt and accurate action and to analyze more scenarios. In the context of on-line deployment of load flow solutions based on full Newton-Raphson, multifrontal methods, in particular MUMPS, may be viewed as a fundamentally important enabling technology.
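The effect of the ordering scheme on the number of entries in the factors can be seen even on a contrived example. The sketch below is an illustration under stated assumptions: it uses the column orderings exposed by SciPy's `splu` (SuperLU), not the ordering packages coupled with MUMPS in this study, and a hypothetical "arrowhead" matrix whose dense first row and column cause heavy fill-in when eliminated first. The helper names `arrowhead` and `factor_entries` are illustrative.

```python
import numpy as np
import scipy.sparse as sp
from scipy.sparse.linalg import splu


def arrowhead(n):
    """Diagonally dominant matrix with a dense first row and first column."""
    dense = np.zeros((n, n))
    dense[0, :] = 1.0
    dense[:, 0] = 1.0
    np.fill_diagonal(dense, 2.0 * n)
    return sp.csc_matrix(dense)


def factor_entries(matrix, ordering):
    """Number of stored entries in L and U for a given column ordering."""
    lu = splu(matrix, permc_spec=ordering)
    return lu.L.nnz + lu.U.nnz


A = arrowhead(200)
fill = {name: factor_entries(A, name) for name in ("NATURAL", "COLAMD")}
print("entries in A:", A.nnz)
print("entries in L + U by ordering:", fill)
```

With the natural ordering the dense row and column are eliminated first and the remaining block fills in, whereas a fill-reducing ordering postpones them; the same mechanism, on a much larger scale, drives the differences in factor size, memory and floating-point operations between orderings.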

References

1. Powell, L.: Power System Load Flow Analysis. McGraw-Hill, New York (2004)

2. Khaitan, S., McCalley, J., Chen, Q.: Multifrontal solver for online power system time domain simulation. IEEE Trans. Power Syst. 23, 4 (2008)

3. Orfanogianni, T., Bacher, R.: Using automatic code differentiation in power flow algorithms. IEEE Trans. Power Syst. 14, 1 (1999)

4. Davis, T., Duff, I.: UMFPACK version 2.0: unsymmetric-pattern multifrontal package (1995). See: http://www.cis.ufl.edu/-davis

5. Davis, T., Duff, I.: A combined unifrontal/multifrontal method for unsymmetric sparse matrices. ACM Trans. Math. Softw. 25, 1–19 (1997)

6. Davis, T.: Algorithm 832: UMFPACK - an unsymmetric-pattern multifrontal method. ACM Trans. Math. Softw. 30, 196–199 (2004)

7. Davis, T.: A column pre-ordering strategy for the unsymmetric-pattern multi-frontal method. ACM Trans. Math. Softw. 30, 165–195 (2004)

8. Davis, T., Amestoy, P., Duff, I.: Algorithm 837: AMD, an approximate minimum degree ordering algorithm. ACM Trans. Math. Softw. 30, 381–388 (2004)


9. Davis, T., Gilbert, J., Larimore, E.: Algorithm 836: COLAMD, an approximate column minimum degree ordering algorithm. ACM Trans. Math. Softw. 30, 377–380 (2004)

10. Ashcraft, C., Grimes, R.: The influence of relaxed supernode partitions on the multifrontal method. ACM Trans. Math. Softw. 15, 291–309 (1989)

11. Heath, M., Raghavan, P.: A Cartesian parallel nested dissection algorithm. SIAM J. Matrix Anal. Appl. 16, 235–253 (1995)

12. Gupta, A., Gustavson, F., Joshi, M., Karypis, G., Kumar, V.: PSPASES: an efficient and parallel sparse direct solver. In: Yang, T. (ed.) Kluwer Int. Series in Engineering and Science, vol. 515. Kluwer Academic, Dordrecht (1999)

13. Johnson, J., Vachranukunkiet, P., Tiwari, S., Nagvajara, P., Nwankpa, C.: Performance analysis of loadflow computation using FPGA. In: Proc. of 15th Power Systems Computation Conference 2005

14. Zaoui, F., Fliscounakis, S.: A direct approach for the security constrained optimal power flow problem. In: Power Systems Conference and Exposition 2006

15. Amestoy, P., Duff, I.: Vectorization of a multiprocessor multifrontal code. Int. J. Supercomput. Appl. 3, 41–59 (1989)

16. Amestoy, P., Duff, I., L’Excellent, J., Koster, J.: A fully asynchronous multifrontal solver using distributed dynamic scheduling. SIAM J. Matrix Anal. Appl. 23, 15–41 (2001)

17. Amestoy, P., Puglisi, C.: An unsymmetrized multifrontal LU factorization. SIAM J. Matrix Anal. Appl. 24, 553–569 (2002)

18. Amestoy, P., Duff, I., Puglisi, C.: Multifrontal QR factorization in a multiprocessor environment. Numer. Linear Algebra Appl. 3, 275–300 (1996)

19. Duff, I., Reid, J.: The multifrontal solution of indefinite sparse symmetric linear systems. ACM Trans. Math. Softw. 9, 302–325 (1983)

20. Duff, I., Reid, J.: The multifrontal solution of unsymmetric sets of linear systems. SIAM J. Sci. Stat. Comput. 5, 633–641 (1984)

21. Guermouche, A., L’Excellent, J.-Y.: Constructing memory-minimizing schedules for multifrontal methods. ACM Trans. Math. Softw. 32(1), 17–32 (2006)

22. Karypis, G., Kumar, V.: METIS - A Software Package for Partitioning Unstructured Graphs, Partitioning Meshes, and Computing Fill-Reducing Orderings of Sparse Matrices - Version 4.0. University of Minnesota (1998)

23. Schulze, J.: Towards a tighter coupling of bottom-up and top-down sparse matrix ordering methods. BIT 41(4), 800–841 (2001)

24. Pellegrini, F.: SCOTCH 5.0 User’s guide. Technical Report, LaBRI, Université Bordeaux I (August 2007)

25. Amestoy, P.R., Davis, T.A., Duff, I.S.: An approximate minimum degree ordering algorithm. SIAM J. Matrix Anal. Appl. 17, 886–905 (1996)

26. Richardy, G.: Coupling MUMPS and ordering software. CERFACS report WN/PA/02/24 (January 2002)

27. Dongarra, J., Du Croz, J., Hammarling, S.: A set of level 3 basic linear algebra subprograms. ACM Trans. Math. Softw. 16, 1–17 (1990)

28. Duff, I.S., Reid, J.K.: The design of MA48, a code for the direct solution of sparse unsymmetric linear systems of equations. ACM Trans. Math. Softw. 22, 187–226 (1996)

29. Li, X.: Direct solvers for sparse matrices. Available online http://crd.lbl.gov/~xiaoye/SuperLU/SparseDirectSurvey.pdf (2006)

30. Liu, J.: The multifrontal method for sparse matrix solution: theory and practice. SIAM Rev. 34, 82–109 (1992)

31. Snir, M., Otto, S.W., Huss-Lederman, S., Walker, D.W., Dongarra, J.: MPI: The Complete Reference. MIT Press, Cambridge (1996)

32. Blackford, L.S., Choi, J., Cleary, A., D’Azevedo, E., Demmel, J., Dhillon, I., Dongarra, J., Hammarling, S., Henry, G., Petitet, A., Stanley, K., Walker, D., Whaley, R.C.: ScaLAPACK Users’ Guide. SIAM Press, Philadelphia (1997)

33. Gupta, A.: Recent advances in direct methods for solving unsymmetric sparse systems of linear equations. IBM Research Report, RC 22039 (98933) April 20 (2001)

34. The reliability test system task force of the application of probability methods subcommittee. IEEE reliability test system. IEEE Trans. Power Apparatus Syst. PAS-98, 2047–2054 (1979)

35. The reliability test system task force of the application of probability methods subcommittee. The IEEE reliability test system - 1996. IEEE Trans. Power Syst. 14(3), 1010–1018 (1999)

36. http://www.powerworld.com/DocumentLibrary.asp


37. http://www.digsilent.de/

38. Nanda, J., Lai, L.L., Ma, J.T., Rajkumar, N., Nanda, A., Prasad, M.: A novel approach to computational efficient algorithms for transmission loss and line flow formulations. Int. J. Electr. Power Energy Syst. 555–560 (1999)

39. Li, M., Zhao, Q., Luh, P.B.: Decoupled load flow and its feasibility in systems with dynamic topology. In: PES’09, 26–30 July 2009, pp. 1–8 (2009)

40. Jean-Jumeau, R., Chiang, H.-D.: Parameterizations of the load-flow equations for eliminating ill-conditioning load flow solutions. IEEE Trans. Power Syst. 3(3), 1004–1012 (1993)

41. Dasgupta, K., Swarup, K.S.: Distributed fast decoupled load flow analysis. In: POWERCON, pp. 1–6 (2008)

42. Nanda, J., Bijwe, P.R., Henry, J., Bapi Raju, V.: General purpose fast decoupled power flow. IEE Proc. Gen. Trans. Dist. 139(2), 87–92 (1992)
