Final Presentation: Annual Project (Part A), Winter Semester 5772 (2011/12)


Students: Dan Hofshi, Shai Shachrur Supervisor: Mony Orbach

INS/GPS navigation system using RPF

Implemented with Bluespec HDL.

Using Xilinx Virtex5 FPGA

Intro

1. Abstract
2. Algorithm reminder
3. Previous projects background
4. Solution approaches
5. Detailed information on the final implementation
6. Summary

Abstract

This project is part of a continuing effort to implement an RPF-based navigation system in the High Speed Digital Systems Laboratory at the Technion. The project and the algorithm were initially written by Professor Yaakov Oshman and Mark Koifman from the Faculty of Aerospace Engineering.

Before our project, another group of students tested and simulated the algorithm in a C++ environment and verified its functionality [1].

Later, a group of several students designed the algorithm blocks to run on several Altera FPGAs simultaneously, because the hardware resource requirements exceeded the capacity of a single FPGA.

[1] "Gps computer program", by Neta Galil and Moti Perets, Winter 2010.

Reminder – The Algorithm Principle of operation

[Figure: a visual demonstration of particle filter navigation, excluding the data correction process. Panel label: Measurement update.]

Previous projects information & conclusions

Information was retrieved on the timing and memory (space) complexity of each algorithm block and on its parameters (data bus widths, number of particles, mathematical implementation of certain blocks).

Implementing a particle filter on a single FPGA requires fundamental thinking about how to parallelize the algorithm or how to reduce its mathematical complexity.

A particle filter project is too big to be designed by a single group without a proper structural design in advance.

The memory complexity requires the use of an external memory.

First Approach for solution

Trying to reduce mathematical complexity: failed

Algorithm phase                  Quaternion                  Euler
                                 Trig. calc.   Multipliers   Trig. calc.   Multipliers
Initialization                   8             12            2             0
Propagation                      0             24            6             13
Measurement update               This phase is identical for both
State vector revaluation         17, 3 (only two values appear in the source)
N-effective calculation          This phase is identical for both
Covariance matrix calculation    3             8             0             0
Re-sampling                      This phase is identical for both
Regularization                   8             12            0             0
Re-weight                        This phase is identical for both

First approach for solution

At first sight the saving looks very convincing: 53*N multipliers and 11*N trigonometric calculations could be saved simply by using Euler angles throughout the algorithm run. A closer look at the algorithm calculations, however, reveals many singularity cases that Euler angles cannot resolve without causing the algorithm to diverge.

We therefore chose to continue the project with the current, verified algorithm using quaternion calculations.
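The kind of singularity referred to above can be illustrated with the standard kinematic relation between body rates and Euler angle rates. The sketch below is only an illustration: it assumes the common yaw-pitch-roll (Z-Y-X) convention, which the slides do not state, and the function names are hypothetical.

```python
import math

def euler_rate_matrix(roll, pitch):
    """Matrix mapping body rates (p, q, r) to Euler angle rates
    (roll_dot, pitch_dot, yaw_dot). Z-Y-X convention is an assumption."""
    sr, cr = math.sin(roll), math.cos(roll)
    tp, cp = math.tan(pitch), math.cos(pitch)
    return [
        [1.0, sr * tp, cr * tp],
        [0.0, cr, -sr],
        [0.0, sr / cp, cr / cp],
    ]

def largest_entry(matrix):
    """Largest absolute matrix entry, used here as a simple blow-up indicator."""
    return max(abs(v) for row in matrix for v in row)

# The entries grow without bound as the pitch angle approaches 90 degrees.
for pitch_deg in (0.0, 45.0, 89.0, 89.99):
    m = euler_rate_matrix(math.radians(30.0), math.radians(pitch_deg))
    print(f"pitch = {pitch_deg:6.2f} deg, largest entry = {largest_entry(m):.1f}")
```

A unit quaternion has no such division by cos(pitch), which is consistent with the decision to keep the quaternion formulation.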

Second approach for solution: from sequential to parallel implementation

The sequential algorithm blocks:

Initialization: creates a new set of N particles.
Propagation: uses the INS data to propagate the particles in time.
Measurement update: uses the GPS data to give a weight to each particle.
Normalization: normalizes all particle weights to a total sum of 1.
Covariance matrix calculation
Re-sampling
Regularization
Effective number of particles check
Re-weight

[Flowchart labels: Good, Bad, Routine operation, Data correction, State vector revaluation, To user]
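As a reminder of what the normalization and effective number of particles blocks compute, here is a minimal sketch. It assumes the usual estimate N_eff = 1 / sum(w_i^2); the slides do not give the formula, and the threshold ratio is a hypothetical tuning value.

```python
def normalize(weights):
    """Scale the weights so that they sum to 1."""
    total = sum(weights)
    return [w / total for w in weights]

def effective_particles(normalized_weights):
    """Usual particle filter estimate: N_eff = 1 / sum(w_i^2)."""
    return 1.0 / sum(w * w for w in normalized_weights)

def needs_data_correction(weights, threshold_ratio=0.5):
    # threshold_ratio is a hypothetical value, not taken from the project.
    w = normalize(weights)
    return effective_particles(w) < threshold_ratio * len(w)

print(needs_data_correction([1.0, 1.0, 1.0, 1.0]))    # uniform weights
print(needs_data_correction([100.0, 0.1, 0.1, 0.1]))  # one dominant particle
```

With uniform weights N_eff equals the number of particles, so the check passes; when one particle dominates, N_eff drops towards 1 and the data correction branch is taken.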

Second approach for solution

With a proper parallelization of the algorithm, the number of sequential blocks can be reduced from 9 to 5, which makes an implementation on the desired single FPGA realistically feasible.

Tools and hardware

Taking the Xilinx Virtex5 FPGA as the given board for this task, we defined the rest of the working tools:

Bluespec HDL.
Bluespec GUI (compiler, simulator).
DDR2 SDRAM external memory.
XUPV5-110T development environment.

Project goals

Learning Bluespec and pointing out the advantages and drawbacks of the language.

Designing, building and simulating the top-level design of the complete algorithm infrastructure, allowing each of the algorithm blocks to be designed in the future by individual groups.

Describing clearly the future tasks needed to accomplish the project.

Why Bluespec

The Bluespec language syntax fits today's large-scale digital system design methodologies, with special regard to parallel design.

It is an interesting new design methodology.

Introduction to Bluespec

Bluespec SystemVerilog, or Bluespec for short, is a relatively new high-level HDL. It is designed to express high-level hardware constructs in an easy and highly parameterized way. The syntax lets you concentrate on the high-level details of the design and brings the way you write closer to the way you think. "Methods" define an abstract, user-defined interface that is translated into Verilog inputs and outputs; "rules" define a group of abstract operations that is translated into combinational logic.

Atomicity

A Bluespec rule is considered an atomic operation: once a rule has fired, its operation cannot be interrupted until the rule has finished its logic.


[Figure: structure of Bluespec modules. Each module exposes methods and contains rules, registers, FIFOs, memory components and other submodules.]

The Parallelized algorithm

Stage 5 – normalization using the same module as stage 2

Stage 1

Initialization: sequentially randomizes N particles according to the GPS and INS data.

Only writes to the main memory.

[Diagram labels: Normalization, Particle Memory]

Stage 2

Propagation: $X_{k+1}^i = f(X_k^i, \mathrm{Ins})$

Measurement update: $X_{k+1}^i = f(X_k^i, \mathrm{Gps})$

State vector revaluation: $S_k = f(X_k^i : 1 \le i \le N)$

Each of the three modules above performs a sequential calculation that requires a single particle at a time, so all three modules can work in parallel.

Each particle is first propagated and then cascaded to the next two modules in parallel.

The measurement update rules fire only when GPS data is valid.

This stage already reads all N particles from the memory. To save memory accesses, the measurement update module therefore also prepares the total weight for the next stage.

[Diagram blocks: Memory, Propagation, State Vector Revaluation, Measurement Update]
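The single pass over the particles described for stage 2 can be sketched in software as follows. This is a behavioural illustration only: the one-dimensional state, the motion model, the Gaussian likelihood and all function names are placeholders, and using the prior weights for the state estimate is an assumption.

```python
import math

def propagate(x, ins_delta):
    # Placeholder motion model: shift the state by the INS increment.
    return x + ins_delta

def measurement_weight(x, gps, sigma=1.0):
    # Placeholder Gaussian likelihood of the GPS reading given the state.
    return math.exp(-0.5 * ((x - gps) / sigma) ** 2)

def stage2(particles, ins_delta, gps=None):
    """One pass over (state, weight) pairs: propagate, then feed the
    propagated particle to the weight update and the state estimate."""
    total_weight = 0.0   # prepared here for the following normalization
    prior_total = 0.0
    weighted_sum = 0.0   # accumulator for the state vector revaluation
    out = []
    for x, w in particles:
        x_new = propagate(x, ins_delta)
        # The measurement update only acts when a GPS sample is valid.
        w_new = w * measurement_weight(x_new, gps) if gps is not None else w
        total_weight += w_new
        prior_total += w
        weighted_sum += w * x_new
        out.append((x_new, w_new))
    return out, total_weight, weighted_sum / prior_total

particles = [(0.0, 0.25), (1.0, 0.25), (2.0, 0.25), (3.0, 0.25)]
updated, total, estimate = stage2(particles, ins_delta=1.0, gps=2.0)
print(total, estimate)
```

Because the total weight is accumulated while the particles stream through, the next stage can normalize without an extra read pass over the memory.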

Stage 3

Normalization: $X_k^i = f(X_k^i, W_{tot})$

Resampling: $X_{k+1}^i = f(X_k^m : 0 \le m \le i)$

Covariance matrix: $S_k = \sum_i f(X_k^i)$

Covariance matrix square root: $D_k = f(S_k)$

The covariance matrix calculation takes too long to meet the timing constraints if it is performed in sequence after normalization.

In any case, the normalization module cascades the data so that the covariance matrix square root is prepared in parallel with the re-sampling module.

At the end of normalization, if the data correction process is not needed, the process stops and the data is flushed.

[Diagram blocks: Memory, Normalization, Resampling, Covariance calculation, Memory 1, Memory 2, Matrix Memory]
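The resampling relation above depends only on particles up to index i, which is what a resampler driven by a running weight sum looks like. The slides do not name the resampling scheme; the sketch below uses systematic resampling purely as an assumed example, with hypothetical names.

```python
import random

def systematic_resample(states, weights, rng=random):
    """Systematic resampling driven by the cumulative weight sum.
    The weights must already be normalized to a total of 1."""
    n = len(states)
    step = 1.0 / n
    pointer = rng.random() * step   # single random offset in [0, 1/n)
    resampled = []
    cumulative = 0.0
    for state, weight in zip(states, weights):
        cumulative += weight
        # Emit one copy of this particle for every pointer it covers.
        while pointer < cumulative and len(resampled) < n:
            resampled.append(state)
            pointer += step
    # Guard against floating-point round-off in the final cumulative sum.
    while len(resampled) < n:
        resampled.append(states[-1])
    return resampled

print(systematic_resample(["a", "b", "c", "d"], [0.0, 0.0, 1.0, 0.0]))
```

Each output particle is decided as soon as the cumulative sum passes the pointer, so the particles can be streamed through once without random access to the memory.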

Stage 4

Regularization: $X_{k+1}^i = f(X_k^i, D_k, R)$, where $R$ is a randomized vector.

Reweight: $X_{k+1}^i = f(X_k^i, \mathrm{Gps})$

Regularization uses the pre-prepared covariance matrix square root and the resampling data, and cascades the results to Reweight.

As in stage 2, in order to save memory access cycles, Reweight prepares the total weight for stage 5, Normalization.

[Diagram blocks: Memory, Normalization, Resampling, Regularization, Memory 2, Matrix Memory]
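To make the roles of the covariance square root D and the randomized vector R concrete, here is a minimal sketch of a regularization step. The slides only state that the step is a function of the particle, D and R; the Cholesky factorization, the Gaussian random vector and the bandwidth parameter are assumptions borrowed from the usual regularized particle filter formulation.

```python
import math
import random

def cholesky(matrix):
    """Lower-triangular D with D * D^T = matrix (symmetric positive definite)."""
    n = len(matrix)
    d = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = sum(d[i][k] * d[j][k] for k in range(j))
            if i == j:
                d[i][j] = math.sqrt(matrix[i][i] - s)
            else:
                d[i][j] = (matrix[i][j] - s) / d[j][j]
    return d

def regularize(state, d, bandwidth, rng=random):
    """X_new = X + bandwidth * D * R, with R a standard normal random vector."""
    n = len(state)
    r = [rng.gauss(0.0, 1.0) for _ in range(n)]
    return [state[i] + bandwidth * sum(d[i][k] * r[k] for k in range(i + 1))
            for i in range(n)]

d = cholesky([[4.0, 2.0], [2.0, 3.0]])
print(regularize([1.0, 2.0], d, bandwidth=0.1, rng=random.Random(0)))
```

Because D is lower triangular, each output element needs one row of D, which matches a memory that is read one matrix row at a time.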

Stage 5

Normalization: the same module as in stage 2 is operating.

Quaternion to Euler and back

These operations are implemented as a separate module that can be cascaded into the data path wherever needed.
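For reference, the two conversions can be sketched as below. The convention (Z-Y-X yaw-pitch-roll angles, scalar-first unit quaternion) is an assumption, since the slides do not specify the one used in the project, and the function names are hypothetical.

```python
import math

def euler_to_quaternion(roll, pitch, yaw):
    """Z-Y-X Euler angles (radians) to a scalar-first unit quaternion."""
    cr, sr = math.cos(roll / 2), math.sin(roll / 2)
    cp, sp = math.cos(pitch / 2), math.sin(pitch / 2)
    cy, sy = math.cos(yaw / 2), math.sin(yaw / 2)
    return (
        cr * cp * cy + sr * sp * sy,
        sr * cp * cy - cr * sp * sy,
        cr * sp * cy + sr * cp * sy,
        cr * cp * sy - sr * sp * cy,
    )

def quaternion_to_euler(q):
    """Scalar-first unit quaternion to (roll, pitch, yaw) in radians."""
    w, x, y, z = q
    roll = math.atan2(2 * (w * x + y * z), 1 - 2 * (x * x + y * y))
    # Clamp to guard against round-off pushing the argument outside [-1, 1].
    sin_pitch = max(-1.0, min(1.0, 2 * (w * y - z * x)))
    pitch = math.asin(sin_pitch)
    yaw = math.atan2(2 * (w * z + x * y), 1 - 2 * (y * y + z * z))
    return roll, pitch, yaw

angles = (0.1, -0.4, 1.2)
print(quaternion_to_euler(euler_to_quaternion(*angles)))
```

The round trip recovers the original angles as long as the pitch stays away from +/-90 degrees.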

A word about timing

Roughly choosing a 150 MHz clock (a period of about 6.67 ns):

Total time = 6.67 [ns] x 36 x 30,000 = 7.2 ms

Number of clocks   Throughput              Unit
30,000             1                       Propagation
                   1                       Measurement update
17 x 30,000        1                       Normalization
                   Follows Normalization   Re-sampling
                   1/17                    Covariance calculation
17 x 30,000        1/17                    Regularization
                   1                       Re-weight
30,000             1                       Normalization
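The total above can be checked with a few lines of arithmetic. The factor 36 is read here as the sum of the per-particle clock counts of the four timed stages (1 + 17 + 17 + 1); the constant names are hypothetical.

```python
CLOCK_HZ = 150e6      # chosen clock frequency
PARTICLES = 30_000    # number of particles used in the estimate
SLOW_FACTOR = 17      # clocks per particle in the two slower stages

clock_period_s = 1.0 / CLOCK_HZ
clocks_per_particle = 1 + SLOW_FACTOR + SLOW_FACTOR + 1   # = 36
total_clocks = clocks_per_particle * PARTICLES
total_time_ms = total_clocks * clock_period_s * 1e3

print(f"{total_clocks} clocks, {total_time_ms:.2f} ms")
```

Changing PARTICLES or CLOCK_HZ shows directly how the cycle time scales with the number of particles and the clock rate.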

Particles memories

Bluespec enables the user to encapsulate Verilog code behind Bluespec methods.

The particle memory controller was designed with 3 different memory spaces:

1. Main memory: holds the normal quaternion particles used in the routine operation of the algorithm.

2. Second and third memories: designed to keep particles in their Euler-angle form for the data correction process.

The particle memory is accessed as N sequential particles and is address independent: a Start signal is asserted at the beginning of each stage, and the inner design of the controller is responsible for the addresses.

Read commands are issued in advance by the memory controller to hide the data acquisition delay. The data is stored in a local FIFO.

The main memory controller is available at the top-level design.

Note: the optional DDR2 burst write/read mode transfers 4*128 bits of data, which should fit the above tasks.

Covariance matrix memory

The covariance matrix calculation is a set of 17^2 multiplications per particle, whose results are accumulated into the complete matrix.

To meet the timing constraints, a full matrix row is calculated at once, so a data bus of 17*56 bits must be opened to the memory.

For this reason an "add" method (instead of write) is provided by the covariance matrix memory; it adds an entire row to the matrix sum.

The read command is done element by element.

The covariance matrix memories are available only within the covariance matrix top module.

The covariance matrix memory is implemented in the Virtex5 internal block RAMs.
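The "add a whole row, read a single element" interface can be modelled in software as follows. This is a behavioural sketch only, not the Bluespec module; the class and function names are hypothetical, and treating the input as a per-particle deviation vector is an assumption.

```python
N = 17  # matrix dimension, from the 17 x 17 covariance matrix

class CovarianceMemory:
    """Behavioural model: rows are accumulated, elements are read."""

    def __init__(self, size=N):
        self.rows = [[0.0] * size for _ in range(size)]

    def add_row(self, row_index, values):
        # "add" instead of "write": the whole row is summed into the matrix.
        row = self.rows[row_index]
        for col, value in enumerate(values):
            row[col] += value

    def read(self, row_index, col_index):
        # Reads are done element by element.
        return self.rows[row_index][col_index]

def accumulate_particle(memory, deviation):
    """Add one particle's outer product, one matrix row per step."""
    for i, di in enumerate(deviation):
        memory.add_row(i, [di * dj for dj in deviation])

mem = CovarianceMemory(size=2)
accumulate_particle(mem, [1.0, 2.0])
accumulate_particle(mem, [3.0, -1.0])
print(mem.read(0, 0), mem.read(0, 1), mem.read(1, 1))
```

With 17 elements of 56 bits each, one such row corresponds to the 17*56 = 952-bit bus mentioned above.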

Covariance matrix square root memory

The same as the covariance matrix memory, except that:

The write method is done element by element [row, col].

The read method is done per matrix row (17*56 bits).

Modules design

The user receives an empty module with predesigned interfaces (methods) and inner FIFOs and registers that contain all the data needed for a single-particle calculation of the relevant algorithm phase.

In some cases, when a timing constraint forces a certain register size or a certain data flow, the data already arrives with the correct size and in the correct sequence.

[Figure: single module block, for future individual design. Blocks: In FIFO, data registers, operational rule or inner modules, Out FIFO.]
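The empty module skeleton described above (input FIFO, operational rule, output FIFO) can be mimicked in software to show the intended data flow. This is a Python behavioural model rather than Bluespec code, and every name in it is hypothetical.

```python
from collections import deque

class StageModule:
    """Behavioural model of one block: In FIFO -> rule -> Out FIFO."""

    def __init__(self, operation, out_depth=2):
        self.in_fifo = deque()
        self.out_fifo = deque()
        self.out_depth = out_depth
        self.operation = operation   # stands in for the operational rule

    def put(self, particle):         # input method
        self.in_fifo.append(particle)

    def rule_can_fire(self):
        # Like a guarded rule: needs input data and room for the result.
        return bool(self.in_fifo) and len(self.out_fifo) < self.out_depth

    def step(self):
        """One clock: fire the rule at most once, as an indivisible action."""
        if not self.rule_can_fire():
            return False
        self.out_fifo.append(self.operation(self.in_fifo.popleft()))
        return True

    def get(self):                   # output method
        return self.out_fifo.popleft()

module = StageModule(lambda particle: particle * 2, out_depth=1)
module.put(1)
module.put(2)
print(module.step(), module.step())   # second step stalls: Out FIFO is full
```

A future designer would only replace the operation, leaving the FIFOs and the interface methods unchanged.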

Future tasks

Project B:

Understanding and creating a Bluespec wrapper that encapsulates a DDR2 memory controller for the Xilinx Virtex5 FPGA.

Writing the Bluespec memory controller for the sequential particles memory.

Simulating the controller.

Future generations:

Writing all of the algorithm's inner modules according to the necessary constraints described in the final report.

The modules can be written in Verilog and encapsulated in Bluespec.

Summary

Top-down design processes are easier to implement in Bluespec, as no time scheduling is required.

Bluespec's encapsulation capabilities allow fast parallelization, simulation and test-bench construction for large systems, even if parts are already written in Verilog.

The current BSV structure operates properly; the inner design of each module can be written and simulated with the same code.

The number of particles in the algorithm can be changed without harming the algorithm's operation.

The END