Dec 19, 2015

Aerospace Data Storage and Processing Systems

Implementation of High-Rate JPEG2000 Coding on a Virtex-2 Pro Reconfigurable Computing Board

Presented by Damon Van Buren

SEAKR Engineering

MAPLD 2004

Submission 133

The Sensor Bandwidth Problem

Commercial satellite imaging systems are experiencing growth in imaging capability...

• Higher resolution: < 1 m
• Larger images: > 10k image width and height
• More spectral components
– Panchromatic
– Red/Green/Blue
– Multi-spectral

Improved capabilities are leading to high sensor data rates
• Data output rates > 2 Gbps for some systems

Providing storage and downlink bandwidth for the data is becoming a significant challenge for system designers

• The largest data recorders can store less than 20 minutes of data at 2 Gbps
• Downlinks must be several hundred Mbps to downlink 15 minutes of data in under an hour
• Data storage and high-bandwidth downlinks require lots of power

By reducing the amount of image data, compression provides a solution to the bandwidth problem!
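The storage and downlink figures on this slide can be checked with a little arithmetic. The sketch below is only a back-of-the-envelope illustration; the function names are made up for this example, and it assumes decimal units (1 Gbps = 1e9 bits per second) and no protocol overhead.

```python
GBPS = 1e9  # bits per second, decimal units (an assumption of this sketch)


def recorded_bits(rate_bps: float, minutes: float) -> float:
    """Bits produced by a sensor running at rate_bps for the given time."""
    return rate_bps * minutes * 60


def required_downlink_bps(data_bits: float, window_minutes: float) -> float:
    """Minimum downlink rate needed to move data_bits within the window."""
    return data_bits / (window_minutes * 60)


sensor_rate = 2 * GBPS
data_15_min = recorded_bits(sensor_rate, 15)        # 1.8e12 bits
downlink = required_downlink_bps(data_15_min, 60)   # 5e8 bits per second

print(f"15 min at 2 Gbps: {data_15_min / 8e9:.0f} GB")
print(f"Downlink to empty it in 60 min: {downlink / 1e6:.0f} Mbps")
```

Fifteen minutes at 2 Gbps comes to 225 GB, and emptying that in one hour needs 500 Mbps, which agrees with the "several hundred Mbps" quoted above.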

Desired Compressor Features

Real Time
• Compression must be performed in real time, prior to storage.
• High throughput (> 2 Gbps)

Excellent Performance in Lossy and Lossless Modes
• Purchasers of satellite imagery are sensitive to reductions in image quality caused by lossy compression.
• Scientific users prefer undistorted data (bit true).

Space-Qualified
• Must survive hazards of launch and space operation, including radiation.

Low Risk
• Satellite imaging companies seek high reliability solutions.

Low Cost
• Commercial customers require cost effective solutions.

Flexible
• The ability to support varying compression ratios and contents would allow more effective use of available storage and bandwidth.

JPEG2000 Algorithm

JPEG2000 is an excellent choice for satellite image compression.
• Latest still image compression standard from the JPEG committee

Meets two key requirements for satellite image compression:
• Excellent performance in both lossy and lossless modes.
– ~1.7 to 1 lossless compression for typical satellite imagery - 70% improvement!
– Visually lossless compression > 2 to 1 - 100% improvement in storage and downlink performance.
• Very flexible:
– Many options for compressed images.

Other advantages:
• International Standard
• Wavelet based
– High quality lossy images with compression ratios > 100:1
• Packet oriented
– Allows random access to the compressed code stream.
– Makes compressed data more robust in the presence of bit errors.
– Allows selection of image quality, spatial region, resolution, and color component after compression.
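The "70%" and "100%" improvement figures follow directly from the compression ratios. As a minimal sketch (the helper names are illustrative, not from the presentation): a ratio of r:1 lets the same storage or downlink budget carry r times as much imagery, which is a gain of (r - 1) x 100%.

```python
def capacity_gain_percent(compression_ratio: float) -> float:
    """Extra image data that fits in the same storage or downlink budget."""
    return (compression_ratio - 1.0) * 100.0


def compressed_fraction(compression_ratio: float) -> float:
    """Size of the compressed data relative to the original."""
    return 1.0 / compression_ratio


for ratio in (1.7, 2.0):
    print(f"{ratio}:1 -> {capacity_gain_percent(ratio):.0f}% more data, "
          f"{compressed_fraction(ratio) * 100:.0f}% of original size")
```

Under this reading, 1.7:1 lossless coding gives about 70% more imagery per recorder, and 2:1 doubles it.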

JPEG2000 Implementation Challenges

JPEG2000 is a very complex algorithm.
• More Features = More Complexity.

Operation intensive
• Several hundred operations per pixel, because each bit must be processed many times for the wavelet transform, entropy coding, MQ coding, packet generation, etc.

Complex
• Many different stages to produce compressed output.
– Wavelet transform.
– Quantization.
– Context generation.
– Arithmetic coding.
– Packet generation.
• Many parameters must be tracked individually for each code block (64x64).

Memory intensive
• Each pixel must be accessed many times, so many small buffers are needed to get good throughput.

Few processors are capable of implementing JPEG2000 at high rates!

High-Performance Processing Using Xilinx FPGAs

Xilinx FPGAs have many advantages for fast parallel processing:
• Millions of gates.
• System clocks of several hundred MHz.
• High speed I/O
– 622 Mbps LVDS
– Multi-Gigabit serial I/O
• Hundreds of internal block RAMs.
• Hundreds of internal 18 bit multipliers.

Xilinx FPGAs are available in space-qualified versions:
• Radiation testing is complete on the Virtex and Virtex-II devices.
– ~200 kRad total dose, latchup immune.
• Radiation testing to begin on the Virtex-II Pro devices soon.

Xilinx FPGAs are very flexible, reducing risk:
• May be re-programmed an effectively unlimited number of times.
• Configurations may be uploaded at any time during the mission to fix errors or add new capability.

Xilinx FPGAs are the best solution for fast compression in space!

Challenges for Xilinx Use in Space

The effects of radiation in spacecraft electronics are well known.
• Caused primarily by charged particles.
• May cause permanent damage over time by ionizing SiO2 (total dose).
• May also cause errors in digital logic by upsetting registers (single event effects).
• Mitigation techniques are used to reduce or eliminate the effect of radiation upsets.
– Triple Modular Redundancy (TMR) uses voting to select the correct output from 3 separate instances of the design.

Mitigation of radiation effects in SRAM-based FPGAs presents an additional challenge:
• As with other digital electronics, the functional logic of the device is susceptible to upset, however...
• Another layer of logic (configuration logic) controls the routing of the part, giving the device its capability to be reprogrammed to perform different functions.
• Configuration logic is also susceptible to radiation upsets.

Xilinx FPGAs require system level mitigation strategies in addition to the device level mitigation techniques (such as TMR) that are commonly used for space electronics.
• Configuration data must be continuously re-written, or scrubbed using a read-and-correct approach.
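The TMR voting mentioned above can be illustrated with a bitwise majority function. This is a software sketch of the general technique only; in an FPGA the voter is implemented in logic, and the presentation does not describe SEAKR's voter design. The name `tmr_vote` is hypothetical.

```python
def tmr_vote(a: int, b: int, c: int) -> int:
    """Bitwise majority vote across three redundant copies of a word."""
    return (a & b) | (a & c) | (b & c)


golden = 0b1011_0010
upset = golden ^ 0b0000_1000   # one copy suffers a single-bit upset

# A single corrupted copy is outvoted by the other two.
voted = tmr_vote(golden, upset, golden)
print(f"{voted:08b}")
```

The vote masks an upset in any one copy, but it cannot repair the copy itself, and an upset in the configuration logic can persist until it is rewritten, which is why the slide pairs TMR with scrubbing.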

SEAKR’s RCC Board Processing Solutions

SEAKR has developed a line of Reconfigurable Computing (RCC) products based on Xilinx FPGAs.
• RCC 1 – 4x Virtex 1000s
• RCC 2 – 4x Virtex II 6000s
• RCC 3 (NTRCC) – 4x Virtex II Pro 70/100s

Boards include system-level upset mitigation (scrub) for the Xilinx devices.
• Configuration data is continuously read and checked for errors.
• Errors are corrected by overwriting the corrupted frames, without interrupting the operation of the device.

Other devices on board employ radiation mitigation strategies as well:
• Radiation hardened
• EDAC

Boards also have dedicated resources to support high-performance processing:
• High speed I/O.
• External memories.

Industry standard form-factor: 6U Compact PCI.
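The read-and-correct scrub described on this slide can be sketched as a loop over configuration frames. This is a simplified software model under stated assumptions: frames are modelled as byte arrays, errors are detected by direct comparison with a golden copy (the presentation does not say how the board detects them), and `scrub_pass` is a hypothetical name.

```python
from typing import List


def scrub_pass(device_frames: List[bytearray], golden_frames: List[bytes]) -> List[int]:
    """Compare each frame with its golden copy and rewrite only frames that differ.

    Returns the indices of the frames that were corrected.
    """
    corrected = []
    for index, golden in enumerate(golden_frames):
        if bytes(device_frames[index]) != golden:
            device_frames[index][:] = golden   # overwrite just the corrupted frame
            corrected.append(index)
    return corrected


golden = [bytes([i] * 8) for i in range(4)]       # stand-in for stored configuration data
device = [bytearray(frame) for frame in golden]   # stand-in for the live configuration
device[2][5] ^= 0x10                              # simulate a single-event upset in frame 2

fixed = scrub_pass(device, golden)
print("corrected frames:", fixed)
```

Only the damaged frame is rewritten, which mirrors the slide's point that correction does not interrupt the rest of the device.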

Network RCC (NTRCC)

Four Xilinx XC2VP70-6FF1704 FPGA Co-Processors (COPs)
• Design compatible with XC2VP100-6FF1706 and V2P-X

(4) banks of 1Mx36 Quad Data Rate (QDR) SRAMs for each COP

512MB of DDRII Shared SDRAM memory for prototype
• 1GB of 128M x 64 EDAC (R-S) Protected DDRII SDRAM shared memory (19.2 Gbps @ 150 MHz) using 1 Gbit memory

Network IF
• (2) parallel 16 bit RapidIO ports to front panel (8 Gbps)
• (1) 4x3.125 Gbps serial port to front panel (>10 Gbps)
• 4x3.125 Gbps ports from NIC to each COP (>10 Gbps)
• 4x3.125 Gbps ports from each COP to each neighbor COP (>10 Gbps)

Shared Data Buses
• COP Interconnect Bus (~4.224 Gbps)
• cPCI 32 bit 33 MHz

Read and write COP configurations via cPCI

Extended 6U form factor

Configuration RAM SEU detection and correction
• DDRII SDRAM on configuration controller for shadow config program storage

Non-Volatile memory for 16 different configurations (1 Gbit Flash)
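Several of the bandwidth figures on this slide can be reproduced from bus width and clock rate. The sketch below is illustrative arithmetic, not board documentation: the double-data-rate factor for the DDRII memory is standard, but the 64 bit x 66 MHz combination for the COP Interconnect Bus is only one guess that happens to match ~4.224 Gbps, and the 8b/10b line coding on the serial links is likewise an assumption.

```python
def parallel_bus_gbps(width_bits: int, clock_mhz: float, transfers_per_clock: int = 1) -> float:
    """Peak throughput of a parallel bus in Gbps."""
    return width_bits * clock_mhz * transfers_per_clock / 1000.0


# 64-bit DDRII data bus at 150 MHz, two transfers per clock.
ddr2_gbps = parallel_bus_gbps(64, 150, transfers_per_clock=2)

# 32-bit, 33 MHz cPCI bus.
cpci_gbps = parallel_bus_gbps(32, 33)

# ASSUMPTION: a 64-bit bus at 66 MHz is one combination that gives ~4.224 Gbps.
cop_bus_gbps = parallel_bus_gbps(64, 66)

# Four serial lanes at 3.125 Gbps each: raw line rate, and payload rate
# if the links use 8b/10b coding (ASSUMPTION, not stated in the source).
serial_raw_gbps = 4 * 3.125
serial_payload_gbps = serial_raw_gbps * 8 / 10

print(f"DDRII: {ddr2_gbps:.1f} Gbps, cPCI: {cpci_gbps:.3f} Gbps, COP bus: {cop_bus_gbps:.3f} Gbps")
print(f"Serial: {serial_raw_gbps:.1f} Gbps raw, {serial_payload_gbps:.1f} Gbps payload")
```

The DDRII result matches the 19.2 Gbps quoted on the slide, and the 12.5 Gbps raw serial rate is consistent with the ">10 Gbps" figures.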

Network RCC Block Diagram

NTRCC Layout

24 Layer board

MicroVias, blind vias, via-in-pad

High speed 3.125 Gbps Serial links

82 pages of schematic capture

10 weeks of PCB layout time

Implementation of the JPEG2000 Algorithm

The JPEG2000 core has been in development for over a year.
• Eventual target data rate 600 Mbps/device.
• Written in VHDL.
• Simulations performed in ModelSim.
• Synthesis in Synplify Pro.

Targeted to the NTRCC-R in summer ’04.
• Targeted to a reduced version of the NTRCC with a single coprocessor.
• Take advantage of improved external memory throughput.
• Ultimately use the high-speed serial I/O to move image information on the board.

Designed for high throughput.
• Cycle efficient coding style.
• Highly parallel design.
• Pipelined architecture.
• Rolling wavelet transform.

Designed for flexible output file format.
• Output is divided into quality layers for easy selection of compression ratio.

JPEG2000 Block Diagram

JPEG2000 Coding Steps

Image is broken into tiles.

Tiles are wavelet transformed.
• 5/3 reversible or 9/7 irreversible, also user defined.
• Selectable number of transform levels.
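To make the reversible 5/3 option concrete, here is a minimal one-dimensional, single-level sketch of the integer lifting form of that transform, with symmetric extension at the signal edges. It is a software illustration only, restricted to even-length inputs, and says nothing about how the VHDL core or its rolling transform is structured; the function names are chosen for this example.

```python
from typing import List, Tuple


def dwt53_forward(x: List[int]) -> Tuple[List[int], List[int]]:
    """One level of the reversible 5/3 lifting transform on an even-length signal."""
    n = len(x)
    assert n >= 2 and n % 2 == 0
    half = n // 2
    # Predict step: detail (high-pass) coefficients from the odd samples.
    high = []
    for i in range(half):
        left = x[2 * i]
        right = x[2 * i + 2] if 2 * i + 2 < n else x[n - 2]   # symmetric extension
        high.append(x[2 * i + 1] - ((left + right) >> 1))
    # Update step: smooth (low-pass) coefficients from the even samples.
    low = []
    for i in range(half):
        d_prev = high[i - 1] if i > 0 else high[0]            # symmetric extension
        low.append(x[2 * i] + ((d_prev + high[i] + 2) >> 2))
    return low, high


def dwt53_inverse(low: List[int], high: List[int]) -> List[int]:
    """Undo dwt53_forward exactly by reversing the lifting steps."""
    half = len(low)
    n = 2 * half
    x = [0] * n
    for i in range(half):
        d_prev = high[i - 1] if i > 0 else high[0]
        x[2 * i] = low[i] - ((d_prev + high[i] + 2) >> 2)
    for i in range(half):
        left = x[2 * i]
        right = x[2 * i + 2] if 2 * i + 2 < n else x[n - 2]
        x[2 * i + 1] = high[i] + ((left + right) >> 1)
    return x


samples = [10, 12, 14, 200, 13, 11, 9, 7]
low, high = dwt53_forward(samples)
restored = dwt53_inverse(low, high)
print(low, high, restored == samples)
```

Because every step uses integer arithmetic and is undone in reverse order, the round trip is bit-exact, which is what makes this filter suitable for lossless coding.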

Each subband from the transform is further broken up into code blocks (typically 32x32 or 64x64) for entropy coding.

Each code block is entropy coded, starting from the top bit plane and working down.

• The current bit of each pixel is passed to an arithmetic coder, along with context information.

• The MQ encoder takes advantage of any skewing of the probability for each context, and adapts contexts as the coding progresses.
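The "top bit plane down" order can be illustrated by slicing coefficient magnitudes into bit planes. This sketch shows only the scan order; the real coder also handles signs, builds a context for every bit from its neighbours, and makes several passes per plane, none of which is modelled here. The function name is illustrative.

```python
from typing import Iterator, List, Tuple


def bit_planes_msb_first(coeffs: List[int]) -> Iterator[Tuple[int, List[int]]]:
    """Yield (plane_index, bits) for the coefficient magnitudes, top plane first."""
    magnitudes = [abs(c) for c in coeffs]
    top = max(magnitudes).bit_length() - 1 if any(magnitudes) else 0
    for plane in range(top, -1, -1):
        yield plane, [(m >> plane) & 1 for m in magnitudes]


planes = list(bit_planes_msb_first([5, -3, 0, 12]))
for plane, bits in planes:
    print(plane, bits)
```

Sending the most significant planes first is what allows a code stream to be truncated for a lower-quality but still usable image.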

Packets are formed by combining the entropy coder outputs from a single resolution.

Tile parts are formed from all the packets in a given bit plane.

JPEG2000 Architecture Drivers

To achieve high data rates, the processing must be parallelized as much as possible.

The “tall pole in the tent” is the arithmetic coding, because the coding of a single data bit with its context can take several clock cycles.

Significance propagation coding is also a challenge, because each coefficient must be accessed many times, as each bit plane is processed.

Other operations, such as wavelet transform, code block loading, and packet generation are much more efficient, and require fewer parallel paths.

A pipelined architecture with many entropy coders in parallel was used to achieve the required throughput.

Architecture Description

Processes 256x256 tiles.

Pipelined architecture, using separate external memories for image, tile, and compressed data storage.

19 Entropy coders working in parallel to improve throughput, one for each code block.
• 64x64 code blocks.

FIFO buffering between the stages improves data flow efficiency.

A rolling wavelet transform is used to reduce memory accesses and improve efficiency.

Entropy coder outputs are formed into layers, giving each tile a progressive output format.

Tile parts are interleaved as the image tiles are processed.

Performs lossy or lossless compression.
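The figure of 19 entropy coders, one per code block, is consistent with a 256x256 tile split into 64x64 code blocks if the wavelet transform uses three decomposition levels. The number of levels is not stated in the presentation, so treat it as an assumption of this sketch; `count_code_blocks` is a hypothetical helper.

```python
import math


def count_code_blocks(tile_size: int, levels: int, block_size: int) -> int:
    """Count code blocks in a square tile after a dyadic wavelet decomposition."""
    total = 0
    size = tile_size
    for _ in range(levels):
        size = math.ceil(size / 2)                       # subband side at this level
        per_subband = math.ceil(size / block_size) ** 2
        total += 3 * per_subband                         # HL, LH and HH subbands
    total += math.ceil(size / block_size) ** 2           # final LL subband
    return total


for levels in (2, 3, 4):
    print(levels, "levels ->", count_code_blocks(256, levels, 64), "code blocks")
```

With three levels, the three 128x128 subbands contribute 12 blocks, the three 64x64 subbands 3, the three 32x32 subbands 3 partially filled blocks, and the final LL subband 1, for 19 in total.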

NTRCC-R Implementation Results

The JPEG2000 encoder was targeted to the V2Pro 70 FPGA on the NTRCC-R.
• Lossless or Lossy compression.
• Data precision up to 13 bits.

Simulation and Routing Results:
• Slices: 30043 out of 33088, 90%
• Block RAMs: 148 out of 328, 45%
• Max system clock ~43 MHz without optimization.

Hardware Throughput:
• ~140 Mbps w/ 33 MHz clock (depending on image).
• ~180 Mbps w/ 43 MHz clock.
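The utilization percentages and the throughput per clock cycle can be derived from the numbers on this slide. The following is plain arithmetic on the reported values; the helper names are made up for the example.

```python
def utilization_percent(used: int, available: int) -> float:
    """Share of a device resource that the design occupies."""
    return 100.0 * used / available


def bits_per_clock(throughput_mbps: float, clock_mhz: float) -> float:
    """Input bits compressed per clock cycle."""
    return throughput_mbps / clock_mhz


slice_util = utilization_percent(30043, 33088)
bram_util = utilization_percent(148, 328)
rate_33 = bits_per_clock(140, 33)
rate_43 = bits_per_clock(180, 43)

print(f"Slices {slice_util:.1f}%, block RAMs {bram_util:.1f}%")
print(f"{rate_33:.2f} bits/clock at 33 MHz, {rate_43:.2f} bits/clock at 43 MHz")
```

Both operating points work out to roughly 4.2 input bits per clock, which suggests that throughput tracks clock rate closely in the current design.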

JPEG2000 Floorplan

The Pro 70 Device is quite full!

Planned Improvements

Optimize design to hit 66 MHz.
• Un-optimized design will operate at up to 43 MHz.
• Use of asynchronous FIFOs will allow optimal clocking of various parts of the design.

Improve pipelining of code block loader and wavelet transform.
• Allow “autonomous” operation of each stage, so that operations take place as soon as input data and output buffers are ready.

Make use of additional QDR SRAMs available to each coprocessor by creating separate buffers for wavelet transform and packetizer output.
• NTRCC has 4 QDR memories for each coprocessor.

Arithmetic coder bypass.
• Arithmetic coder requires > 2 cycles per bit coded, on average.

9/7 wavelet transform with quantization.
• Use of the 9/7 wavelet results in better SNR and max error performance for lossy compression.

Add RapidIO serial interface to Network Interface Chip (NIC).
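A rough projection shows how far the clock-rate improvement alone might go. This sketch assumes throughput scales linearly with clock frequency, which the presentation does not claim, and uses the ~180 Mbps at 43 MHz measurement and the 600 Mbps per-device target quoted elsewhere in the slides.

```python
def scale_throughput(measured_mbps: float, measured_mhz: float, target_mhz: float) -> float:
    """Project throughput to a new clock, ASSUMING linear scaling with frequency."""
    return measured_mbps * target_mhz / measured_mhz


projected_66 = scale_throughput(180, 43, 66)
remaining_factor = 600 / projected_66

print(f"Projected at 66 MHz: ~{projected_66:.0f} Mbps")
print(f"Further speed-up needed for 600 Mbps: ~{remaining_factor:.1f}x")
```

Under that assumption, 66 MHz would give roughly 276 Mbps, so the pipelining, memory, and arithmetic coder bypass items in this list would together need to contribute about another factor of two.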

Conclusions

The JPEG2000 core is expected to provide a valuable option for satellite imagery systems.
• Compression will result in a dramatic improvement in system performance.
• Lossless compression will allow ~70% more image data to be stored and downlinked by a system.
• Lossy compression will allow even greater improvements.

NTRCC hardware is an excellent platform for the compressor.
• High bandwidth interconnect and I/O (several Gbps).
• High bandwidth external memories.
• Excellent processing capability with the Virtex-II Pro devices.

The sky’s the limit!
• Target rate of 600 Mbps per device appears to be a realistic goal.
• Some improvements are left to be made to the clock rate and pipelining of the design.
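As a closing sanity check, the per-device target can be related to the > 2 Gbps sensor rate quoted earlier. The sketch assumes the image stream can be divided evenly among devices with no I/O or partitioning overhead, which is an idealization and not a claim from the presentation.

```python
import math


def devices_needed(sensor_mbps: float, per_device_mbps: float) -> int:
    """Devices required to keep up with the sensor, ASSUMING ideal load sharing."""
    return math.ceil(sensor_mbps / per_device_mbps)


at_target = devices_needed(2000, 600)    # target rate of 600 Mbps per device
at_measured = devices_needed(2000, 180)  # measured ~180 Mbps at 43 MHz

print(f"At 600 Mbps/device: {at_target} devices")
print(f"At 180 Mbps/device: {at_measured} devices")
```

At the target rate, four devices would cover a 2 Gbps sensor, which is the number of coprocessors on a full NTRCC board; at the measured rate, twelve would be needed.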