Implementation of Polymorphic Matrix Inversion using Viva
Arvind Sudarsanam, Dasu Aravind, Utah State University

Jan 17, 2016

Page 1: Implementation of Polymorphic Matrix Inversion using Viva

Implementation of Polymorphic Matrix Inversion using Viva

Arvind Sudarsanam, Dasu Aravind

Utah State University

Page 2: Implementation of Polymorphic Matrix Inversion using Viva

Sudarsanam, MAPLD2005/171

Overview

- Problem definition
- Matrix inverse algorithm
- Types of polymorphism
- Design set-up
- Hardware design flow (for LU decomposition)
- Results
- Conclusions

Page 3: Implementation of Polymorphic Matrix Inversion using Viva

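Problem Definition

Given a 2-D matrix, A[N][N],

$$
A = \begin{bmatrix}
A[1,1] & A[1,2] & A[1,3] & \cdots & A[1,N] \\
A[2,1] & A[2,2] & A[2,3] & \cdots & A[2,N] \\
A[3,1] & A[3,2] & A[3,3] & \cdots & A[3,N] \\
\vdots & \vdots & \vdots & \ddots & \vdots \\
A[N,1] & A[N,2] & A[N,3] & \cdots & A[N,N]
\end{bmatrix}
$$

determine the inverse matrix A^-1, defined by

$$
A \times A^{-1} = I
$$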

Page 4: Implementation of Polymorphic Matrix Inversion using Viva

Algorithm flow

Step 1: LU Decomposition
Matrix A is split into two triangular matrices, L and U.

For i = 1:N
    For j = i+1:N
        A(j,i) = A(j,i)/A(i,i);
        A(j,(i+1):N) = A(j,(i+1):N) - A(j,i)*A(i,(i+1):N);
    End For j
End For i
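The loop above can be tried out in software. The following is a minimal Python sketch, assuming zero-based indexing, a matrix stored as a list of lists, and no pivoting (as in the slide's loop, every A(i,i) must be nonzero). The function names `lu_decompose_in_place` and `split_lu` are hypothetical and are not part of Viva.

```python
def lu_decompose_in_place(a):
    """Overwrite the square matrix `a` with its packed LU factors.

    Afterwards the strict lower triangle holds the multipliers of L
    (unit diagonal implied) and the upper triangle holds U.
    No pivoting is done, so each pivot a[i][i] must be nonzero.
    """
    n = len(a)
    for i in range(n):
        for j in range(i + 1, n):
            a[j][i] = a[j][i] / a[i][i]        # multiplier, stored as L[j][i]
            for c in range(i + 1, n):          # update the rest of row j
                a[j][c] -= a[j][i] * a[i][c]
    return a


def split_lu(a):
    """Extract explicit L (unit lower) and U (upper) from the packed result."""
    n = len(a)
    lower = [[a[r][c] if c < r else (1.0 if c == r else 0.0) for c in range(n)]
             for r in range(n)]
    upper = [[a[r][c] if c >= r else 0.0 for c in range(n)] for r in range(n)]
    return lower, upper


packed = lu_decompose_in_place([[4.0, 3.0], [6.0, 3.0]])
lower, upper = split_lu(packed)
print(lower)   # unit lower-triangular factor
print(upper)   # upper-triangular factor
```

Storing L and U in the same array mirrors the in-place update of A in the slide's loop.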

Page 5: Implementation of Polymorphic Matrix Inversion using Viva

Algorithm flow

Step 2: Inverse computation for triangular matrices
L^-1 and U^-1 are computed using a variation of Gaussian elimination.

For i = 1:N
    For j = i+1:N
        Linv(j,i+1:N) = Linv(j,i+1:N) - L(j,i)*Linv(i,i+1:N);
    End For j
End For i
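As an illustration of this step, here is a minimal Python sketch of triangular inversion by row elimination. It assumes that the inverse starts as the identity matrix, that L has a unit diagonal, and that indexing is zero-based; the function names are hypothetical. Under those assumptions row i of Linv is nonzero only in columns up to i, so the sketch updates that column range, which differs from the column range written in the loop above. The treatment of U (scale by the diagonal, then eliminate upward) is likewise an assumption about what the "variation" involves.

```python
def identity(n):
    return [[1.0 if r == c else 0.0 for c in range(n)] for r in range(n)]


def invert_unit_lower(l):
    """Invert a unit lower-triangular matrix by forward row elimination."""
    n = len(l)
    linv = identity(n)
    for i in range(n):
        for j in range(i + 1, n):
            for c in range(i + 1):             # row i is nonzero only in columns 0..i
                linv[j][c] -= l[j][i] * linv[i][c]
    return linv


def invert_upper(u):
    """Invert an upper-triangular matrix by backward row elimination."""
    n = len(u)
    uinv = identity(n)
    for i in range(n - 1, -1, -1):
        for c in range(i, n):                  # normalise row i by the diagonal entry
            uinv[i][c] /= u[i][i]
        for j in range(i):                     # clear column i in the rows above
            for c in range(i, n):
                uinv[j][c] -= u[j][i] * uinv[i][c]
    return uinv


linv = invert_unit_lower([[1.0, 0.0, 0.0], [2.0, 1.0, 0.0], [3.0, 4.0, 1.0]])
uinv = invert_upper([[4.0, 3.0], [0.0, -1.5]])
```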

Page 6: Implementation of Polymorphic Matrix Inversion using Viva

Algorithm flow

Step 3: Matrix multiplication
U^-1 and L^-1 are multiplied together to generate A^-1 (A^-1 = U^-1 x L^-1).

For i = 1:N
    For j = 1:N
        For k = 1:N
            Ainv[i,j] = Ainv[i,j] + Uinv[i,k]*Linv[k,j]
        End For k
    End For j
End For i
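Since A = L x U, the inverse is A^-1 = U^-1 x L^-1, and each entry of the product is a sum over an inner index k. A minimal Python sketch of that accumulation follows, checked on a 2x2 example; the function name `multiply` and the sample values are illustrative assumptions, not taken from the slides.

```python
def multiply(uinv, linv):
    """Return the product uinv x linv using the three nested loops i, j, k."""
    n = len(uinv)
    ainv = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            for k in range(n):
                ainv[i][j] += uinv[i][k] * linv[k][j]
    return ainv


# A = [[4, 3], [6, 3]] factors as L = [[1, 0], [1.5, 1]] and U = [[4, 3], [0, -1.5]]
a = [[4.0, 3.0], [6.0, 3.0]]
linv = [[1.0, 0.0], [-1.5, 1.0]]
uinv = [[0.25, 0.5], [0.0, -2.0 / 3.0]]

ainv = multiply(uinv, linv)
check = multiply(a, ainv)      # should be close to the identity matrix
```

Because U^-1 is upper triangular and L^-1 is lower triangular, many of the terms in the inner sum are zero; the sketch does not exploit that.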

Page 7: Implementation of Polymorphic Matrix Inversion using Viva

Types of Polymorphism

The following parameters can be varied for the input matrix:

- Data type: variable precision, signed/unsigned, and float
- Information rate: rate at which input arrives into, and leaves, the system (pipelining/parallelism)
- Order tensor: matrix size (16x16, 32x32, etc.)

Page 8: Implementation of Polymorphic Matrix Inversion using Viva

Polymorphism and Viva

Viva supports polymorphic hardware implementation, much as software programming languages support polymorphic code.

A large library of polymorphic arithmetic, control and memory modules is available.

Page 9: Implementation of Polymorphic Matrix Inversion using Viva

Data Type Polymorphism

[Figure; the only recoverable label is "Polymorphic"]

Page 10: Implementation of Polymorphic Matrix Inversion using Viva

Information Rate Polymorphism

Clock speed can be changed based on the input data rate

This ‘Mul’ unit is a truly polymorphic object. Based on the input list size, the Viva compiler will generate the required number of parallel multiplier units. The number of parallel units will be denoted as ‘K’.
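A loose software analogy may help here. The Python sketch below is only an illustration: one definition of a hypothetical `mul` function accepts lists of any length (the length plays the role of ‘K’) and any numeric element type. It does not model how the Viva compiler generates hardware.

```python
from fractions import Fraction


def mul(xs, ys):
    """Element-wise product of two equal-length lists.

    The list length stands in for 'K'; in hardware each pair would go to its
    own multiplier unit, whereas this sketch simply loops over the pairs.
    """
    if len(xs) != len(ys):
        raise ValueError("input lists must have the same length")
    return [x * y for x, y in zip(xs, ys)]


ints = mul([1, 2, 3, 4], [5, 6, 7, 8])            # K = 4, integer data
floats = mul([0.5, 1.5], [2.0, 4.0])              # K = 2, floating-point data
exact = mul([Fraction(1, 3)], [Fraction(3, 4)])   # K = 1, exact rational data
print(ints, floats, exact)
```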

Page 11: Implementation of Polymorphic Matrix Inversion using Viva

Order Tensor Polymorphism

Value of ‘N’ set at run time

Page 12: Implementation of Polymorphic Matrix Inversion using Viva

Design Flow – Top level block diagram

[Block diagram. Blocks shown: Central Control Unit (CCGU); memory units for A, L, U, L^-1, U^-1 and A^-1; loop units for LU Decompose, Inverse of L, Inverse of U, and U^-1 x L^-1. One input is labelled "From Files".]

Page 13: Implementation of Polymorphic Matrix Inversion using Viva

Design Flow

Main Step  Operation      Sub Step  Sub Module
1          Initialize     0         Generate address
                          1         Write A onto BRAM
2          LU Decompose   0         Generate ‘i’, ‘j’, ‘k’
                          1         Read A[j,i], A[j,()]…
                          2         Compute new A[j,()]
                          3         Write A[j,()], A[j,i]
3          A2LU Convert   0         Generate ‘j’, ‘k’
                          1         Read A[j,()]
                          2         Compute L[j,()], U[j,()]
                          3         Write L[j,()], U[j,()]

Page 14: Implementation of Polymorphic Matrix Inversion using Viva

Design Flow

Main Step  Operation   Sub Step  Sub Module
4          L inverse   0         Generate ‘i’, ‘j’, ‘k’
                       1         Read L[j,()], L^-1[j,()]…
                       2         Compute new L^-1[j,()]
                       3         Write L^-1[j,()]
5          U inverse   0         Generate ‘i’, ‘j’, ‘k’
                       1         Read U[j,()], U^-1[j,()]…
                       2         Compute U^-1[j,()]
                       3         Write U^-1[j,()]
6          A inverse   0         Generate ‘i’, ‘j’, ‘k’
                       1         Read L[i,()], U[j,()]
                       2         Compute Ainv[i,j,()]
                       3         Update Ainv[i,j]

Page 15: Implementation of Polymorphic Matrix Inversion using Viva

Hardware Design Set-up

Hardware:

PE6 (Xilinx 2V6000 FPGA) of the Starbridge Hypercomputer, connected to an Intel x86 processor. (66 MHz / 33,768 Slices)

Software:

Viva 2.3, developed at Starbridge Systems

Page 16: Implementation of Polymorphic Matrix Inversion using Viva

Implementation – LU Decomposition

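[Block diagram. The Loop Unit supplies i, j, k to the Address Generation Unit and to the Computation Unit. The Memory Unit supplies A[j,()], A[i,()], A[j,i] and A[i,i] to the Computation Unit, which returns A[j,()] and A[j,i].]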

Page 17: Implementation of Polymorphic Matrix Inversion using Viva

Loop Unit - Functionality

Given the order of the matrix ‘N’ and the parallelism to be supported ‘K’, the following loop structure needs to be generated.

For i = 1 to N
    For k = ((i-1)/K)*K to N+1-K in steps of K
        For j = i to N
            Generate(i,k,j);
        End j
    End k
End i
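The index sequence can be enumerated in software to see how many (i,k,j) triples the loop unit produces. The Python sketch below assumes that the division in the lower bound of k is integer division, that both loop bounds are inclusive, and that k is a zero-based column offset while i and j are one-based (the lower bound evaluates to 0 for i = 1). The generator name `loop_indices` is hypothetical.

```python
def loop_indices(n, k_par):
    """Yield (i, k, j) in the order of the loop structure above.

    n     -- matrix order 'N'
    k_par -- parallelism 'K' (number of elements per block)
    """
    for i in range(1, n + 1):
        k_start = ((i - 1) // k_par) * k_par            # first block touched by pivot i
        for k in range(k_start, n + 1 - k_par + 1, k_par):
            for j in range(i, n + 1):
                yield (i, k, j)


triples = list(loop_indices(16, 8))
print(len(triples), triples[0], triples[-1])
```

For N = 16 and K = 8 the first eight pivots visit two column blocks each and the remaining eight visit one, so later pivots skip blocks that lie entirely to the left of the pivot column.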

Page 18: Implementation of Polymorphic Matrix Inversion using Viva

Loop Unit - Architecture

A simple register-based implementation is shown. The overall latency is 2 clock cycles.

Page 19: Implementation of Polymorphic Matrix Inversion using Viva

Memory Unit - Distribution

A[1,1:8]  A[1,9:16]  A[1,17:24]  A[1,25:32]  …
A[2,1:8]  A[2,9:16]  A[2,17:24]  …
A[3,1:8]  A[3,9:16]  …
A[4,1:8]  …
…

Each entry (for example A[1,1:8]) is one block.

Page 20: Implementation of Polymorphic Matrix Inversion using Viva

Memory Unit - Architecture

BRAM memories are used to store data internally. (The matrix is expected to fit into the BRAMs. The maximum value of N is 128.)

There are ‘K’ individual BRAMs, each of size [(NxN)/K] x (variable data size).

The ‘K’ values in each block of the matrix are distributed over the ‘K’ BRAMs. This results in a single clock access time for internal memory.

A[j,()] and A[j,i] are fetched one after the other on every iteration.

The overall latency was found to be 3 clock cycles.
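To see why spreading a block over ‘K’ BRAMs gives single-cycle access, consider one possible mapping from a matrix element to a BRAM and an address within it. The Python sketch below assumes zero-based indices, a row-major ordering of blocks, and that the column position inside a block selects the BRAM; the slides do not spell out the actual mapping, and the function name `locate` is hypothetical.

```python
def locate(row, col, n, k_par):
    """Map a zero-based element (row, col) to (bram_index, address).

    Assumed layout: element c of every block lives in BRAM c, and blocks
    are numbered row by row, so each BRAM holds (n * n) // k_par words.
    """
    blocks_per_row = n // k_par
    bram_index = col % k_par
    address = row * blocks_per_row + col // k_par
    return bram_index, address


n, k_par = 16, 8
# One block: row 2, columns 8..15
block = [locate(2, c, n, k_par) for c in range(8, 16)]
banks = [bank for bank, _ in block]
addresses = {addr for _, addr in block}
print(banks, addresses)
```

All eight elements of the block fall in different BRAMs at the same address, so they can be read in one access.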

Page 21: Implementation of Polymorphic Matrix Inversion using Viva

Address Generation - Functionality

Inputs: i, j, k from the Loop Unit

Outputs:
- Address in the BRAM for the A[j,()] and A[i,()] blocks of data
- Address in the BRAM of A[j,i] and A[i,i]

The computations have been organized in such a way that A[i,()] needs to be fetched only once for processing a complete column of blocks.

Thus, only one port is required to access both A[i,()] and A[j,()].

Page 22: Implementation of Polymorphic Matrix Inversion using Viva

Address Generation - Architecture

Shifts are used instead of multipliers; N and K are assumed to be powers of 2. (Latency = 1 clock cycle)
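When N and K are powers of 2, multiplying or dividing by them reduces to a bit shift. The Python sketch below compares a multiply/divide form of a block address with a shift form. The address formula itself (row-major block numbering with zero-based indices) is an assumption made for illustration, and both function names are hypothetical.

```python
def block_address_mul(row, col_offset, n, k_par):
    """Block address using a multiplication and a division."""
    return row * (n // k_par) + col_offset // k_par


def block_address_shift(row, col_offset, log2_n, log2_k):
    """Same address using shifts only; valid when N and K are powers of 2."""
    return (row << (log2_n - log2_k)) + (col_offset >> log2_k)


# Compare both forms for N = 16 (2**4) and K = 8 (2**3)
matches = all(
    block_address_mul(r, c, 16, 8) == block_address_shift(r, c, 4, 3)
    for r in range(16)
    for c in range(0, 16, 8)
)
print(matches, block_address_shift(2, 8, 4, 3))
```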

Page 23: Implementation of Polymorphic Matrix Inversion using Viva

Computation Units - Functionality

Inputs:
- A[j,()] and A[i,()] blocks from the BRAM unit
- A[j,i] and A[i,i] from the BRAM unit
- Indices i, j, k from the loop unit

Output: The modified A[j,()] block and the A[j,i] value.

Three steps are performed:
1. Modify A[i,()] based on the loop indices
2. Perform computations: Divide, Multiply, Subtract
3. Include A[j,i] on A[j,()] if required
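One way to read the three steps is sketched below in Python. The interpretation is an assumption: step 1 is taken to mask the pivot-row entries in columns at or left of the pivot, and step 3 is taken to write the multiplier into the block when the pivot column falls inside it. Indices are zero-based and the function name `update_block` is hypothetical.

```python
def update_block(a_j_block, a_i_block, a_ji, a_ii, i, k):
    """Return (new A[j, k:k+K] block, multiplier) for one LU update.

    i -- zero-based pivot column
    k -- zero-based column offset of the block
    """
    size = len(a_j_block)
    # Step 1: mask pivot-row entries in columns <= i so those columns stay unchanged
    masked = [a_i_block[c] if k + c > i else 0.0 for c in range(size)]
    # Step 2: divide, multiply, subtract
    multiplier = a_ji / a_ii
    new_block = [a_j_block[c] - multiplier * masked[c] for c in range(size)]
    # Step 3: store the multiplier in column i when that column lies in this block
    if k <= i < k + size:
        new_block[i - k] = multiplier
    return new_block, multiplier


new_block, m = update_block(
    [6.0, 3.0, 9.0, 12.0],   # row j, columns 0..3
    [4.0, 3.0, 2.0, 8.0],    # pivot row i, columns 0..3
    6.0, 4.0, i=0, k=0,
)
print(new_block, m)
```

In hardware the ‘K’ multiply-subtract operations of step 2 would run side by side; the list comprehension here only emulates that element by element.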

Page 24: Implementation of Polymorphic Matrix Inversion using Viva

Computation Units – Architecture (K=8)

Page 25: Implementation of Polymorphic Matrix Inversion using Viva

Results for LUD – Slice Counts (N=16)

List size  Fix16      Fix32        Float
4          1862 (8)   7305 (32)    5012 (12)
8          3731 (16)  14472 (64)   9802 (24)
16         7502 (32)  29018 (128)  19024 (48)

The number of ROM multipliers used is shown in brackets.

Page 26: Implementation of Polymorphic Matrix Inversion using Viva

Results for LUD – Time Taken (in cycles)

List size  Fix16  Fix32  Float
4          1212   1276   1232
8          590    654    610
16         279    343    299
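Two patterns in these cycle counts can be checked with a few lines of arithmetic: doubling the list size roughly halves the cycle count, and Fix32 and Float exceed Fix16 by a fixed number of cycles at every list size. The Python sketch below uses only the figures from the table; the variable names are illustrative.

```python
cycles = {
    "Fix16": {4: 1212, 8: 590, 16: 279},
    "Fix32": {4: 1276, 8: 654, 16: 343},
    "Float": {4: 1232, 8: 610, 16: 299},
}

# Gain in cycles each time the list size doubles
gains = {
    dtype: (by_size[4] / by_size[8], by_size[8] / by_size[16])
    for dtype, by_size in cycles.items()
}

# Extra cycles relative to Fix16 at each list size
extra_fix32 = [cycles["Fix32"][s] - cycles["Fix16"][s] for s in (4, 8, 16)]
extra_float = [cycles["Float"][s] - cycles["Fix16"][s] for s in (4, 8, 16)]

for dtype, (g1, g2) in gains.items():
    print(f"{dtype}: x{g1:.2f} (4 to 8), x{g2:.2f} (8 to 16)")
print(extra_fix32, extra_float)
```

A constant offset of this kind would be consistent with a fixed extra latency per data type rather than a per-iteration cost, though the slides do not state the cause.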

Page 27: Implementation of Polymorphic Matrix Inversion using Viva

Time taken Vs Size of Matrix (Fix16, K = 8)

Size of the matrix  Time taken (in cycles)
16x16               590
32x32               4348
64x64               33528
128x128             264688 (3970320 ns)

Equivalent ‘C’ code (N=128, Fix16) on an Intel Centrino 1.5 GHz takes O(M*N^3) time, approximately 702545*M ns, where ‘M’ is the number of cycles per iteration (about 30). This gives a speed-up of roughly M/6 for the hardware implementation.
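The figures on this slide can be cross-checked with simple arithmetic. The Python sketch below derives the clock period implied by the 128x128 entry, the growth in cycle count each time N doubles, and the speed-up for M = 30. All inputs come from the slide; the variable names are illustrative.

```python
cycles = {16: 590, 32: 4348, 64: 33528, 128: 264688}

# 264688 cycles are quoted as 3970320 ns, which fixes the clock period
period_ns = 3970320 / 264688               # 15 ns per cycle, about 66.7 MHz

# Growth factor each time N doubles; O(N^3) work would approach a factor of 8
sizes = sorted(cycles)
growth = [cycles[b] / cycles[a] for a, b in zip(sizes, sizes[1:])]

# Speed-up over the 'C' estimate of 702545*M ns with M of about 30
m = 30
speedup = 702545 * m / 3970320

print(period_ns, [round(g, 2) for g in growth], round(speedup, 2))
```

The growth factors rise toward 8 as N increases, in line with the cubic cost of LU decomposition, and the speed-up for M = 30 comes out slightly above 5, matching the M/6 estimate.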

Page 28: Implementation of Polymorphic Matrix Inversion using Viva

Conclusions

- A polymorphic design for matrix inverse was implemented:
  - Data type: Float/Fix16/Fix32
  - Information rate (K): 4/8/16
  - Order tensor (N): 16/32/64/128
- Viva’s effectiveness in polymorphic implementation was evaluated.
- Hardware design flow and results were shown for LU Decomposition.

Page 29: Implementation of Polymorphic Matrix Inversion using Viva

Lessons learned

- Pseudo polymorphism: Some of the polymorphic objects in the Viva library are pseudo polymorphic, e.g. the floating point and fixed point implementations of the adder unit.
- Need for a timing analysis tool: It was difficult to compute the delays associated with each block in the Viva library.
- Fix32 vs Float: The division unit in the Viva library is optimized for floating point and not for fixed point (as shown in the results).