2010-11-15 Slide 1
PRINCIPLES OF CIRCUIT SIMULATION
Lecture 9. Linear Solver: LU Solver and Sparse Matrix
Guoyong Shi, PhD, [email protected]
School of Microelectronics, Shanghai Jiao Tong University, Fall 2010
What causes accuracy problems?
• Ill-conditioning: the A matrix is close to singular
• Round-off error: relative magnitudes of the entries are too big
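A standard numerical-analysis illustration (not from the slides) of how round-off interacts with pivoting: take the 2x2 matrix with $a_{11} = 10^{-20}$ and $a_{12} = a_{21} = a_{22} = 1$. Eliminating without pivoting gives the multiplier $l_{21} = 10^{20}$ and $u_{22} = 1 - 10^{20} \approx -10^{20}$ in double precision, so the information carried by $a_{22}$ is completely lost and the computed factors correspond to a noticeably different matrix. Exchanging the two rows first gives $l_{21} = 10^{-20}$ and $u_{22} = 1 - 10^{-20} \approx 1$, which is accurate. The pivoting strategies on the following slides address exactly this.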
2. Complete Pivoting (row and column interchange):
Choose r and s as the smallest integers such that
$|a_{rs}^{(k)}| = \max_{i = k,\dots,n;\; j = k,\dots,n} |a_{ij}^{(k)}|$
[Figure: the search area is the remaining submatrix, rows k to n and columns k to n; the already-factored parts of the array hold L and U.]
2010-11-15 Lecture 9 slide 28
Pivoting Strategy 3
3. Threshold Pivoting:
a. Apply partial pivoting (row exchange) only if
   $|a_{kk}^{(k)}| < \varepsilon\, |a_{rk}^{(k)}|$,  where  $|a_{rk}^{(k)}| = \max_{j = k,\dots,n} |a_{jk}^{(k)}|$
b. Apply complete pivoting (row and column exchange) only if
   $|a_{kk}^{(k)}| < \varepsilon\, |a_{rs}^{(k)}|$,  where  $|a_{rs}^{(k)}| = \max_{i = k,\dots,n;\; j = k,\dots,n} |a_{ij}^{(k)}|$
($\varepsilon$ is user specified.)  Implemented in Spice 3f4.
[Figure: L/U partition of the working matrix with the pivot candidate at row r, column s.]
2010-11-15 Lecture 9 slide 29
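To make the threshold test above concrete, here is a minimal C sketch (not the actual Spice 3f4 code) of the pivot-acceptance rule at elimination step k, assuming a dense n-by-n working matrix stored row-major with 0-based indices and a user-specified threshold eps:

    #include <math.h>

    /* Illustrative sketch of threshold (partial) pivoting at step k.
     * 'a' is a dense n-by-n matrix, row-major; eps is user specified. */
    static int choose_pivot_row(const double *a, int n, int k, double eps)
    {
        int i, r = k;
        double col_max = 0.0;

        for (i = k; i < n; i++)                    /* partial search: column k */
            if (fabs(a[i*n + k]) > col_max) {
                col_max = fabs(a[i*n + k]);
                r = i;
            }

        /* Rule (a): exchange rows only if the diagonal candidate is "too small"
         * relative to the best entry in its column.  Rule (b) is analogous but
         * compares against the maximum of the whole remaining block. */
        if (fabs(a[k*n + k]) < eps * col_max)
            return r;     /* row r is swapped with row k before eliminating */
        return k;         /* diagonal is acceptable: no exchange            */
    }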
Variants of LU Factorization
• Doolittle Method
• Crout Method
• Motivated by directly filling in the L/U elements in the storage space of the original matrix "A".
$A = LU$:
$L = \begin{bmatrix} 1 & 0 & 0 & \cdots & 0 \\ l_{21} & 1 & 0 & \cdots & 0 \\ l_{31} & l_{32} & 1 & \cdots & 0 \\ \vdots & & & \ddots & \\ l_{n1} & l_{n2} & l_{n3} & \cdots & 1 \end{bmatrix}, \quad
U = \begin{bmatrix} u_{11} & u_{12} & u_{13} & \cdots & u_{1n} \\ 0 & u_{22} & u_{23} & \cdots & u_{2n} \\ 0 & 0 & u_{33} & \cdots & u_{3n} \\ & & & \ddots & \vdots \\ 0 & 0 & 0 & \cdots & u_{nn} \end{bmatrix}, \quad
A = \begin{bmatrix} a_{11} & a_{12} & a_{13} & \cdots & a_{1n} \\ a_{21} & a_{22} & a_{23} & \cdots & a_{2n} \\ a_{31} & a_{32} & a_{33} & \cdots & a_{3n} \\ \vdots & & & & \vdots \\ a_{n1} & a_{n2} & a_{n3} & \cdots & a_{nn} \end{bmatrix}$
Reuse the storage: L (unit diagonal implied) and U overwrite A in the same array.
2010-11-15 Lecture 9 slide 30
Variants of LU Factorization
(Same decomposition $A = LU$ and the same shared storage as on the previous slide.)
Hence we need a sequential method that processes the rows and columns of A in a certain order, so that processed rows/columns are not used in the later processing.
2010-11-15 Lecture 9 slide 31
Doolittle Method – 1
With $A = LU$ (L unit lower triangular, U upper triangular), first solve the 1st row of U, i.e., U(1, :):
$(u_{11}\;\; u_{12}\;\; u_{13}\;\; \cdots\;\; u_{1n}) = (a_{11}\;\; a_{12}\;\; a_{13}\;\; \cdots\;\; a_{1n})$
Keep this row.
2010-11-15 Lecture 9 slide 32
Doolittle Method – 2
Then solve the 1st column of L, i.e., L(2:n, 1)
From $A = LU$, the 1st column gives
$\begin{pmatrix} a_{21} \\ a_{31} \\ \vdots \\ a_{n1} \end{pmatrix} = u_{11} \begin{pmatrix} l_{21} \\ l_{31} \\ \vdots \\ l_{n1} \end{pmatrix}$,  with  $u_{11} = a_{11}$,
so $l_{i1} = a_{i1} / u_{11}$ for $i = 2, \dots, n$.
2010-11-15 Lecture 9 slide 33
Doolittle Method – 3
Solve the 2nd row of U, i.e., U(2, 2:n).  Row 2 of $A = LU$ gives
$(a_{22}\;\; a_{23}\;\; \cdots\;\; a_{2n}) = l_{21}\,(u_{12}\;\; u_{13}\;\; \cdots\;\; u_{1n}) + (u_{22}\;\; u_{23}\;\; \cdots\;\; u_{2n})$,
so $u_{2j} = a_{2j} - l_{21} u_{1j}$ for $j = 2, \dots, n$.
(The markers (1), (2), (3) on the slide indicate the order in which these pieces are computed.)
2010-11-15 Lecture 9 slide 34
Doolittle Method – 4
The computation order of the Doolittle Method in the shared L\U storage of A: rows of U and columns of L are filled alternately,
U(1, :) → L(2:n, 1) → U(2, 2:n) → L(3:n, 2) → U(3, 3:n) → ...
i.e., steps 1, 3, 5, ... produce rows of U and steps 2, 4, 6, ... produce columns of L.
2010-11-15 Lecture 9 slide 35
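To make the Doolittle ordering above concrete, here is a minimal dense-matrix sketch in C (illustrative only: no pivoting, all pivots assumed nonzero) that overwrites A with L\U in a single array, in the alternating row/column order shown on the preceding slides:

    /* In-place Doolittle factorization of a dense n-by-n matrix 'a' (row-major,
     * 0-based).  Afterwards the upper triangle (including the diagonal) holds U
     * and the strict lower triangle holds L (unit diagonal implied). */
    static void doolittle_lu(double *a, int n)
    {
        for (int k = 0; k < n; k++) {
            /* row k of U:  u_kj = a_kj - sum_{m<k} l_km * u_mj,  j = k..n-1 */
            for (int j = k; j < n; j++) {
                double s = a[k*n + j];
                for (int m = 0; m < k; m++)
                    s -= a[k*n + m] * a[m*n + j];
                a[k*n + j] = s;
            }
            /* column k of L:  l_ik = (a_ik - sum_{m<k} l_im * u_mk) / u_kk */
            for (int i = k + 1; i < n; i++) {
                double s = a[i*n + k];
                for (int m = 0; m < k; m++)
                    s -= a[i*n + m] * a[m*n + k];
                a[i*n + k] = s / a[k*n + k];
            }
        }
    }

The outer index k plays the role of the step counters 1, 2, 3, ... on the slide: each pass fills one row of U and then one column of L in the same array.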
Crout Method
• Similar to the Doolittle Method, but starts from the 1st column (Doolittle starts from the 1st row).
$A = LU$, where L is lower triangular (keeping its diagonal) and U is unit upper triangular:
$L = \begin{bmatrix} l_{11} & 0 & 0 & \cdots & 0 \\ l_{21} & l_{22} & 0 & \cdots & 0 \\ l_{31} & l_{32} & l_{33} & \cdots & 0 \\ \vdots & & & \ddots & \\ l_{n1} & l_{n2} & l_{n3} & \cdots & l_{nn} \end{bmatrix}, \quad
U = \begin{bmatrix} 1 & u_{12} & u_{13} & \cdots & u_{1n} \\ 0 & 1 & u_{23} & \cdots & u_{2n} \\ 0 & 0 & 1 & \cdots & u_{3n} \\ & & & \ddots & \vdots \\ 0 & 0 & 0 & \cdots & 1 \end{bmatrix}$
The diagonals of U are normalized!
The computation order of the Crout Method: columns of L and rows of U are filled alternately,
L(1:n, 1) → U(1, 2:n) → L(2:n, 2) → U(2, 3:n) → ...
i.e., steps 1, 3, 5, ... produce columns of L and steps 2, 4, 6, ... produce rows of U.
2010-11-15 Lecture 9 slide 36
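For comparison, a corresponding Crout sketch in C under the same assumptions as the Doolittle sketch above (dense storage, no pivoting, nonzero pivots); here the diagonal stays with L and the diagonal of U is implicitly 1:

    /* In-place Crout factorization: same shared storage, but now the strict
     * upper triangle holds U (unit diagonal implied) and the lower triangle,
     * including the diagonal, holds L. */
    static void crout_lu(double *a, int n)
    {
        for (int k = 0; k < n; k++) {
            /* column k of L:  l_ik = a_ik - sum_{m<k} l_im * u_mk,  i = k..n-1 */
            for (int i = k; i < n; i++) {
                double s = a[i*n + k];
                for (int m = 0; m < k; m++)
                    s -= a[i*n + m] * a[m*n + k];
                a[i*n + k] = s;
            }
            /* row k of U:  u_kj = (a_kj - sum_{m<k} l_km * u_mj) / l_kk */
            for (int j = k + 1; j < n; j++) {
                double s = a[k*n + j];
                for (int m = 0; m < k; m++)
                    s -= a[k*n + m] * a[m*n + j];
                a[k*n + j] = s / a[k*n + k];
            }
        }
    }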
Storage of LU Factorization
• In a sparse-matrix implementation, this type of storage requires increasing memory space because of fill-ins during the factorization.
[Figure: intermediate matrices A^(3) and A^(4), each kept in the same array, with the finished part of U above the diagonal and the finished part of L below it.]
Using only one 2-dimensional array!
2010-11-15 Lecture 9 slide 37
Summary
• LU factorization has been used in virtually all circuit simulators
  – Good for multiple RHS and sensitivity calculation
• Pivoting is required to handle zero diagonals and to improve numerical accuracy
  – Partial pivoting (row exchange): tradeoff between accuracy and efficiency
  – Matrix condition number is used to analyze the effect of round-off errors and numerical stability
2010-11-15 Slide 38
PRINCIPLES OF CIRCUIT SIMULATION
Part 2. Programming Techniques for Sparse Matrices
2010-11-15 Lecture 9 slide 39
Outline
• Why Sparse Matrix Techniques?
• Sparse Matrix Data Structure
• Markowitz Pivoting
• Diagonal Pivoting for MNA Matrices
• Modified Markowitz Pivoting
• How to Handle Sparse RHS
• Summary
2010-11-15 Lecture 9 slide 40
Why Sparse Matrix?
Motivation:
– n = 10^3 equations
– Complexity of Gaussian elimination ~ O(n^3)
– n = 10^3 → ~10^9 flop operations (≈ 10 sec on a 1 GHz computer); storage ~10^6 words
Exploiting sparsity:
– MNA: ~3 nonzeros per row
– Gaussian elimination can then reach complexity ~ O(n^1.1) – O(n^1.5) (empirical complexity)
2010-11-15 Lecture 9 slide 41
Sparse Matrix Programming
• Use a linked-list data structure
  – to avoid storing zeros
  – used to be hard before the 1980s: in Fortran!
• Avoid trivial operations: 0·x = 0, 0 + x = x
• Two kinds of zero
  – Structural zeros – always 0, independent of numerical operations
  – Numerical zeros – entries that merely happen to evaluate to 0
Data Structure in Sparse 1.3
• Sparse 1.3 – written by Ken Kundert, 1985–1988, then a PhD student at Berkeley, later with Cadence Design Systems, Inc.
Each element record stores (value, row, col) and is linked along its row and along its column. For the example matrix with nonzeros 1.0 at (1,1), 1.2 at (1,2), 1.5 at (2,2), 2.1 at (3,1), and 1.7 at (3,3):
  FirstInRow[1] → (1.0, 1, 1) → (1.2, 1, 2)      diag[1]
  FirstInRow[2] → (1.5, 2, 2)                     diag[2]
  FirstInRow[3] → (2.1, 3, 1) → (1.7, 3, 3)       diag[3]
  FirstInCol[1], FirstInCol[2], FirstInCol[3] link the same records column-wise.
2010-11-15 Lecture 9 slide 48
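A hypothetical C sketch of such an orthogonally linked element record (the field and type names are illustrative, not the actual Sparse 1.3 declarations):

    /* One nonzero of the matrix, linked along its row and its column. */
    struct element {
        double          value;        /* numerical value of the nonzero        */
        int             row, col;     /* 1-based position in the matrix        */
        struct element *next_in_row;  /* next nonzero to the right in this row */
        struct element *next_in_col;  /* next nonzero below in this column     */
    };

    struct sparse_matrix {
        int              n;            /* matrix dimension                      */
        struct element **first_in_row; /* first_in_row[i]: head of row i's list */
        struct element **first_in_col; /* first_in_col[j]: head of col j's list */
        struct element **diag;         /* diag[i]: quick access to a(i,i)       */
    };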
ASTAP Data Structure
• ASTAP is an IBM simulator using STA (Sparse Tableau Analysis).
Example matrix (rows/columns 1–3):
  [ 1.0  1.2  0   ]
  [ 0    1.5  0   ]
  [ 2.1  0    1.7 ]
Values are stored row-wise:
  Row Pointers: 1 3 4 6
  Col Indices:  1 2 2 1 3
  Values:       1.0 1.2 1.5 2.1 1.7
Row Pointers point to the beginning of each row inside Col Indices; nonzeros in the same row are indexed contiguously by their column indices.
Used by many iterative sparse solvers.
2010-11-15 Lecture 9 slide 49
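A small self-contained C sketch of the same 3-by-3 example in this row-wise (compressed-row) storage, together with a matrix–vector product; 0-based indices are used here, whereas the slide uses 1-based pointers:

    #include <stdio.h>

    /* The example matrix [1.0 1.2 0; 0 1.5 0; 2.1 0 1.7], stored row-wise. */
    static const int    row_ptr[] = { 0, 2, 3, 5 };        /* start of each row    */
    static const int    col_idx[] = { 0, 1, 1, 0, 2 };     /* column of each value */
    static const double values[]  = { 1.0, 1.2, 1.5, 2.1, 1.7 };

    /* y = A * x for a compressed-row matrix with n rows. */
    static void spmv(int n, const int *rp, const int *ci, const double *v,
                     const double *x, double *y)
    {
        for (int i = 0; i < n; i++) {
            y[i] = 0.0;
            for (int p = rp[i]; p < rp[i + 1]; p++)
                y[i] += v[p] * x[ci[p]];
        }
    }

    int main(void)
    {
        double x[3] = { 1.0, 1.0, 1.0 }, y[3];
        spmv(3, row_ptr, col_idx, values, x, y);
        printf("%g %g %g\n", y[0], y[1], y[2]);   /* expected: 2.2 1.5 3.8 */
        return 0;
    }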
Key Loops in a SPICE Program
The circuit equations  $C \frac{dx}{dt} = f(x, t)$  are integrated with an implicit method; with step h (backward Euler) this gives, at each time point, the nonlinear system
$C x_{n+1} = C x_n + h\, f(x_{n+1}, t_{n+1})$.
Newton-Raphson linearizes it around the current iterate $x_{n+1}^{(k)}$:
$\left[ C - h \frac{\partial f}{\partial x} \right] \left( x_{n+1}^{(k+1)} - x_{n+1}^{(k)} \right) = -\left[ C x_{n+1}^{(k)} - C x_n - h\, f\big(x_{n+1}^{(k)}, t_{n+1}\big) \right]$,
where the Jacobian  $A = \frac{\partial f\big(x_{n+1}^{(k)}, t_{n+1}\big)}{\partial x}$  is re-evaluated at each iteration.
The nested loops:
  Newton-Raphson (at point x) → Invoke linear solver → x := x + Δx   (inner loop)
  Update stamps related to time → t := t + Δt                        (outer loop)
2010-11-15 Lecture 9 slide 50
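A tiny runnable C illustration of these two nested loops, using a hypothetical scalar example  C dx/dt = f(x)  with C = 1 and f(x) = -x^3; the "linear solve" degenerates to a scalar division here, whereas a circuit simulator performs a sparse LU solve at the same spot:

    #include <stdio.h>
    #include <math.h>

    int main(void)
    {
        const double C = 1.0, h = 0.1;
        double x = 1.0;                                  /* initial condition      */
        for (double t = 0.0; t < 1.0; t += h) {          /* outer loop: time       */
            double xn = x;                               /* x_n                    */
            for (int k = 0; k < 50; k++) {               /* inner loop: Newton     */
                double f    = -x * x * x;                /* f(x)                   */
                double dfdx = -3.0 * x * x;              /* df/dx                  */
                double res  = C * x - C * xn - h * f;    /* nonlinear residual     */
                double J    = C - h * dfdx;              /* "matrix" C - h df/dx   */
                double dx   = -res / J;                  /* invoke "linear solver" */
                x += dx;                                 /* x := x + dx            */
                if (fabs(dx) < 1e-12) break;             /* Newton converged       */
            }
        }
        printf("x(1.0) ~ %g\n", x);
        return 0;
    }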
Linear Solves in Simulation
[Figure: time axis with discrete time points.]
At each time point, Ax = b has to be solved many times:
  Newton-Raphson (at point x) → Invoke linear solver → x := x + Δx,
  then update stamps related to time and advance t := t + Δt.
2010-11-15 Lecture 9 slide 51
Structure of Matrix Stamps
• In circuit simulation, the matrix being solved repeatedly has the same structure;
• only some entries vary at different frequency or time points.
C = constant, T = time varying, X = nonlinear (varying even at the same time point)
[Figure: a typical matrix structure in which a few entries are marked X, some are T, and the rest are C.]
2010-11-15 Lecture 9 slide 52
Strategies for Efficiency
• Utilizing the structural information can greatly improve the solving efficiency.
• Strategies:
  – Weighted Markowitz product
  – Reuse the LU factorization
  – Iterative solver (by conditioning)
  – ...
2010-11-15 Lecture 9 slide 53
A Good (Sparse) LU Solver
Properties of a good LU solver:
• Should have a good column ordering algorithm.
• With a good column ordering, partial (row) pivoting would be enough!
• Should have an ordering/elimination separated design:
  – i.e., ordering is separated from elimination.
  – SuperLU does this, but Sparse 1.3 doesn't.
2010-11-15 Lecture 9 slide 54
Optimal Ordering is NP-hard
• The ordering has a significant impact on the memory and computational requirements of the later stages.
• However, finding the optimal ordering for A (in the sense of minimizing fill-in) has been proven to be NP-complete.
• Heuristics must be used for all but simple (or specially structured) cases.
M. R. Garey and D. S. Johnson, Computers and Intractability: A Guide to the Theory of NP-Completeness, W. H. Freeman, New York, 1979.
2010-11-15 Lecture 9 slide 55
Column Ordering
Why important?
• A good column ordering greatly reduces the number of fill-ins, resulting in a vast speedup.
• However, searching for a pivot with minimum degree at each step (as in Sparse 1.3) is not efficient.
• Best to get a good ordering before elimination (e.g., SuperLU), but not easy!
2010-11-15 Lecture 9 slide 56
Available Ordering Algorithms
SuperLU uses the following algorithms:
• Multiple Minimum Degree (MMD) applied to the structure of A^T A. – Mostly good
• Multiple Minimum Degree (MMD) applied to the structure of A^T + A. – Mostly good
• Column Approximate Minimum Degree (COLAMD). – Mostly not good!
2010-11-15 Lecture 9 slide 57
Summary
• Exploiting sparsity reduces CPU time and memory
• The Markowitz algorithm reflects a good tradeoff between overhead (computation of MP) and savings (fewer fill-ins)
• Use weighted Markowitz to account for different types of element stamps in nonlinear dynamic circuit simulation
• Consider sparse RHS and selective unknowns for speedup
2010-11-15 Lecture 9 slide 58
No-turn-in Exercise
• Spice3f4 contains a solver called Sparse 1.3 (in src/lib/sparse)
• This is an independent solver that can be used outside Spice3f4.
• Download the sparse package from the course web page (sparse.tar.gz) (or ask the TA).
• Find the test program called "spTest.c".
• Modify this program if necessary so that you can run the solver.
• Create some test matrices to test the sparse solver.
• Compare the solved results to those from MATLAB.
2010-11-15 Lecture 9 slide 59
Software
• Sparse 1.3 is in C and was programmed by Dr. Ken Kundert (fellow of Cadence; architect of Spectre).
• Source code is available from http://www.netlib.org/sparse/
• SparseLib++ is in C++ and comes from NIST. The authors are J. Dongarra, A. Lumsdaine, R. Pozo, and K. Remington.
• See "A Sparse Matrix Library in C++ for High Performance Architectures", Proc. of the Second Object Oriented Numerics Conference, pp. 214-218, 1994.
• The paper and the C++ source code are available from http://math.nist.gov/sparselib%2b%2b/
References
1. G. Dahlquist and A. Björck, Numerical Methods (translated by N. Anderson), Prentice-Hall, Englewood Cliffs, New Jersey, 1974.
2. W. J. McCalla, Fundamentals of Computer-Aided Circuit Simulation, Kluwer Academic Publishers; Chapter 3, "Sparse Matrix Methods".
3. Albert Ruehli (Ed.), Circuit Analysis, Simulation and Design, North-Holland, 1986; K. Kundert, "Sparse Matrix Techniques".
4. J. Dongarra, A. Lumsdaine, R. Pozo, and K. Remington, "A Sparse Matrix Library in C++ for High Performance Architectures," Proc. of the Second Object Oriented Numerics Conference, pp. 214-218, 1994.