Fast JPEG Coding on the GPU

Fyodor Serzhenko, Fastvideo, Dubna, Russia

Victor Podlozhnyuk, NVIDIA, Santa Clara, CA

© Fastvideo, 2011

Fast JPEG Codec on the GPU - Nvidiadeveloper.download.nvidia.com/GTC/PDF/GTC2012/PresentationPDF/S... · JPEG Codecs: GPU vs. CPU Performance summary for the fastest JPEG codecs Accusoft

Oct 08, 2018



Key Points

We implemented a GPU JPEG codec that outperforms every CPU codec and hardware IP core compared in this talk

Many applications using JPEG can benefit from our codec


High Speed Imaging

Data Path for High Speed Camera (500 – 1000 fps)

Camera data rate from 600 MB/s to 2400 MB/s.

Problem: how to record 1 hour or more?

Possible Solutions

RAID, SSD, online compression on FPGA / DSP / CPU / GPU

The fastest solution: JPEG compression on GPU

Data path: Camera → external cables → PCI-E frame grabber → host → storage
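To put the recording problem in numbers, here is a minimal Python sketch of the storage arithmetic. It assumes decimal units (1 TB = 10^6 MB) and borrows CR = 13 from the benchmark table later in the talk; the function name `storage_tb` is our own illustration, not part of any Fastvideo API.

```python
SECONDS_PER_HOUR = 3600
MB_PER_TB = 1_000_000  # assumption: decimal units, 1 TB = 10**6 MB


def storage_tb(rate_mb_s: float, hours: float, compression_ratio: float = 1.0) -> float:
    """Disk space (TB) needed to record a stream of rate_mb_s for `hours`.

    compression_ratio = 1.0 means raw (uncompressed) recording.
    """
    total_mb = rate_mb_s * hours * SECONDS_PER_HOUR / compression_ratio
    return total_mb / MB_PER_TB


for rate in (600, 2400):  # camera data rates from this slide, MB/s
    raw = storage_tb(rate, 1.0)
    jpeg = storage_tb(rate, 1.0, compression_ratio=13)  # CR from the benchmark table
    print(f"{rate} MB/s: raw {raw:.2f} TB/h, JPEG (CR=13) {jpeg:.2f} TB/h")
```

At the top camera rate, one raw hour is about 8.64 TB, versus roughly 0.66 TB after JPEG compression, provided the encoder can sustain at least 2400 MB/s.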


Why JPEG

Popular open compression standard

Good image quality at 10x-20x compression ratio

Moderate computational complexity


Main Stages of Baseline JPEG Algorithm

Source image upload → RGB→YUV transform → split into 8x8 blocks

→ 2D DCT → quantization → zig-zag

→ RLE + DPCM → Huffman coding → bitstream download
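The first stages of this pipeline can be sketched per 8x8 block in plain Python. This is a naive, unoptimized illustration, not the FVJPEG implementation. It uses the JFIF full-range RGB→YCbCr formula for the "RGB→YUV" step, a direct 2D DCT-II, the example luminance table from Annex K of the JPEG standard, and libjpeg-style quality scaling. That scaling is an assumption about how a "quality %" maps to a table. All function names are hypothetical.

```python
import math

# Example luminance quantization table (JPEG standard, Annex K, Table K.1).
BASE_LUMA_Q = [
    [16, 11, 10, 16, 24, 40, 51, 61],
    [12, 12, 14, 19, 26, 58, 60, 55],
    [14, 13, 16, 24, 40, 57, 69, 56],
    [14, 17, 22, 29, 51, 87, 80, 62],
    [18, 22, 37, 56, 68, 109, 103, 77],
    [24, 35, 55, 64, 81, 104, 113, 92],
    [49, 64, 78, 87, 103, 121, 120, 101],
    [72, 92, 95, 98, 112, 100, 103, 99],
]

# Zig-zag scan order as (row, col): walk the anti-diagonals, alternating direction.
ZIGZAG = sorted(
    ((r, c) for r in range(8) for c in range(8)),
    key=lambda rc: (rc[0] + rc[1], rc[0] if (rc[0] + rc[1]) % 2 else -rc[0]),
)


def rgb_to_ycbcr(r, g, b):
    """JFIF full-range RGB -> YCbCr (the 'RGB->YUV' stage) for 8-bit samples."""
    y = 0.299 * r + 0.587 * g + 0.114 * b
    cb = -0.168736 * r - 0.331264 * g + 0.5 * b + 128
    cr = 0.5 * r - 0.418688 * g - 0.081312 * b + 128
    return y, cb, cr


def scaled_qtable(quality):
    """libjpeg-style scaling of the base table; quality 50 returns it unchanged."""
    quality = min(max(quality, 1), 100)
    scale = 5000 // quality if quality < 50 else 200 - 2 * quality
    return [[min(max((q * scale + 50) // 100, 1), 255) for q in row] for row in BASE_LUMA_Q]


def dct_8x8(block):
    """Naive 2D DCT-II of an 8x8 block of 8-bit samples, after the -128 level shift."""
    out = [[0.0] * 8 for _ in range(8)]
    for u in range(8):
        for v in range(8):
            cu = math.sqrt(0.5) if u == 0 else 1.0
            cv = math.sqrt(0.5) if v == 0 else 1.0
            s = 0.0
            for x in range(8):
                for y in range(8):
                    s += ((block[x][y] - 128)
                          * math.cos((2 * x + 1) * u * math.pi / 16)
                          * math.cos((2 * y + 1) * v * math.pi / 16))
            out[u][v] = 0.25 * cu * cv * s
    return out


def quantize(coeffs, qtable):
    """Divide each coefficient by its table entry and round half away from zero."""
    return [[int(math.copysign(math.floor(abs(coeffs[u][v] / qtable[u][v]) + 0.5),
                               coeffs[u][v]))
             for v in range(8)] for u in range(8)]


def encode_block_front_end(block, quality=50):
    """DCT -> quantization -> zig-zag; returns 64 integers, DC coefficient first."""
    q = quantize(dct_8x8(block), scaled_qtable(quality))
    return [q[r][c] for r, c in ZIGZAG]
```

For a flat block of value 136, the level-shifted DC coefficient is 64 and every AC coefficient is zero. At quality 50, `encode_block_front_end([[136] * 8 for _ in range(8)])` therefore yields 4 (that is, 64 / 16) followed by 63 zeros. A GPU version would replace the O(N^4) loop with a separable or fast DCT and process many blocks in parallel.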


JPEG Codecs: GPU vs. CPU

Performance summary for the fastest JPEG codecs

JPEG codec (Q = 50%, CR = 13) | Encode, MB/s | Decode, MB/s
Fastvideo FVJPEG + GTX 680 | 5200 | 4500
Fastvideo FVJPEG + GTX 580 | 3500 | 3500
Intel IPP-7.0 + Core i7 3770 | 680 | 850
Intel IPP-7.0 + Core i7 920 | 430 | 600
Vision Experts VXJPG 1.4 (*) | 500 | --
Accusoft PICTools Photo (*) | 250 | 380

(*) as reported by the manufacturer


Best JPEG encoder IP Cores

Results as reported by the manufacturers

JPEG IP core | Encode, MB/s
Cast Inc. JPEG-E | 750
Alma-Tech SVE-JPEG-E | 500
Visengi JPEG Encoder | 405


JPEG Encoding Rates for GPU & CPU

Chart: JPEG compression throughput (MB/s, 0 to 7000) vs. quality level (0% to 100%) for GTX 680 + FVJPEG, GTX 580 + FVJPEG, GT 555M + FVJPEG, GT 240 + FVJPEG, Core i7 3770 + IPP-7 and Core i7 920 + IPP-7.


JPEG Encoding Time for GeForce GTX 580

Chart: time for JPEG compression stages (ms, scale up to 10 ms) at quality levels 10%, 25%, 50%, 75%, 95% and 100%, broken down into Host-to-Device, DCT/Quant/Zig, RLE+DPCM, Huffman and Device-to-Host.


JPEG Encoding Time for GeForce GTX 680

Chart: time for JPEG compression stages (ms, scale up to 10 ms) at quality levels 10%, 25%, 50%, 75%, 95% and 100%, broken down into Host-to-Device, DCT/Quant/Zig, RLE+DPCM, Huffman and Device-to-Host.


DCT and Entropy Encoding (GeForce 580)

Chart: throughput of JPEG encoding stages (GB/s, 0 to 50) vs. quality level (0% to 100%) for DCT, RLE+DPCM and Huffman.
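The RLE+DPCM stage turns each zig-zag-ordered block into the symbols that the Huffman stage then encodes. Below is a minimal Python sketch of the baseline JPEG rules, with hypothetical function names. The DC coefficient is coded as its difference from the previous block's DC (DPCM). The 63 AC coefficients become (run, size, value) triples, with ZRL standing for 16 zeros and EOB for the trailing zeros. The Huffman table lookup itself is omitted.

```python
from typing import List, Tuple

ZRL = (15, 0, 0)  # run of 16 zeros
EOB = (0, 0, 0)   # end of block: all remaining AC coefficients are zero


def size_category(v: int) -> int:
    """Bits needed for |v| (the JPEG 'size' category); 0 for v == 0."""
    return abs(v).bit_length()


def amplitude_bits(v: int) -> Tuple[int, int]:
    """Extra bits written after the Huffman code, as (bits, length).

    Negative values are stored as v - 1 truncated to `length` bits (ones' complement).
    """
    s = size_category(v)
    return (v if v >= 0 else v + (1 << s) - 1), s


def dpcm_dc(dc_values: List[int]) -> List[int]:
    """DC differences for one restart interval; the predictor starts at 0.

    A real encoder also resets the predictor to 0 after every restart marker.
    """
    prev, diffs = 0, []
    for dc in dc_values:
        diffs.append(dc - prev)
        prev = dc
    return diffs


def rle_ac(zz: List[int]) -> List[Tuple[int, int, int]]:
    """Run-length code the AC part (zz[1:]) of one zig-zag-ordered block."""
    symbols, run = [], 0
    for v in zz[1:]:
        if v == 0:
            run += 1
            continue
        while run > 15:          # runs longer than 15 need ZRL symbols
            symbols.append(ZRL)
            run -= 16
        symbols.append((run, size_category(v), v))
        run = 0
    if run:                      # trailing zeros collapse into one EOB
        symbols.append(EOB)
    return symbols
```

For example, `dpcm_dc([10, 12, 9])` gives `[10, 2, -3]`, and `rle_ac([4, 0, 0, 5] + [0] * 60)` gives `[(2, 3, 5), (0, 0, 0)]`. Each triple's (run, size) pair is what gets Huffman-coded, followed by the raw amplitude bits.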


JPEG Decoding

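Huffman decoding is inherently sequential: where a codeword starts is known only after the previous codeword has been decoded, and no good parallel algorithm is known

Restart markers are a standard feature supported by all decoders

Fully parallel JPEG decoding is still possible: with restart markers, each restart interval can be decoded independently

Currently supported restart intervals (in MCUs): 0, 1, 2, 4, 8, 16, 32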
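Restart markers are byte-aligned, and they cannot occur inside entropy-coded data, because every 0xFF data byte is followed by a stuffed 0x00. A decoder can therefore locate every restart interval with a cheap byte scan before doing any Huffman decoding, then hand the intervals to parallel workers. Below is a minimal Python sketch. It assumes `scan` holds only the entropy-coded data of one scan (between the SOS header and EOI) and contains no fill bytes. Huffman decoding of each interval is not shown, and the function names are hypothetical.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import List


def split_at_restart_markers(scan: bytes) -> List[bytes]:
    """Split entropy-coded data at RST0..RST7 markers (0xFFD0..0xFFD7)."""
    segments, start, i = [], 0, 0
    while i < len(scan) - 1:
        if scan[i] == 0xFF and 0xD0 <= scan[i + 1] <= 0xD7:
            segments.append(scan[start:i])   # one independent restart interval
            i += 2
            start = i
        elif scan[i] == 0xFF and scan[i + 1] == 0x00:
            i += 2                           # stuffed 0xFF data byte, not a marker
        else:
            i += 1
    segments.append(scan[start:])
    return segments


def unstuff(segment: bytes) -> bytes:
    """Undo byte stuffing: 0xFF 0x00 -> 0xFF."""
    return segment.replace(b"\xff\x00", b"\xff")


def prepare_intervals(scan: bytes) -> List[bytes]:
    # Each interval starts with reset DC predictors, so intervals can be
    # processed in any order; a real decoder would Huffman-decode them here.
    with ThreadPoolExecutor() as pool:
        return list(pool.map(unstuff, split_at_restart_markers(scan)))
```

To gauge the available parallelism, take the 7216 x 5408 grayscale test image used in the benchmarks. In a single-component scan, one MCU is one 8x8 block, so the image has 902 x 676 = 609,752 MCUs. A restart interval of 8 then yields 76,219 independent segments, which is plenty of independent work for a GPU.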


JPEG Decoding Rates for GPU & CPU

Chart: JPEG decompression throughput (MB/s, 0 to 7000) vs. quality level (0% to 100%) for GTX 680 + FVJPEG, GTX 580 + FVJPEG, GT 555M + FVJPEG, GT 240 + FVJPEG, Core i7 3770 + IPP-7 and Core i7 920 + IPP-7.


JPEG Decoding Time for GeForce GTX 580

Chart: time for JPEG decompression stages (ms, scale up to 9 ms) at quality levels 10%, 25%, 50%, 75%, 95% and 100%, broken down into Host-to-Device, IDCT/Quant/Zig, RLE+DPCM, Huffman and Device-to-Host.


JPEG Decoding Time for GeForce GTX 680

Chart: time for JPEG decompression stages (ms, scale up to 9 ms) at quality levels 10%, 25%, 50%, 75%, 95% and 100%, broken down into Host-to-Device, IDCT/Quant/Zig, RLE+DPCM, Huffman and Device-to-Host.


Getting More Speed-up

GPUs with PCI-Express 3.0 interface

Concurrent copy and execution

Multi-GPU computing
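Concurrent copy and execution pays off because PCI-Express transfers and kernels can overlap across a batch of images. The back-of-the-envelope Python model below assumes an idealized three-stage pipeline. In it, the host-to-device copy, the compute, and the device-to-host copy of different images all run at the same time. In CUDA this would use streams and asynchronous copies, and whether both copy directions can overlap depends on the GPU's number of copy engines. The stage times are made-up illustrative values, not measurements from these slides.

```python
def batch_time_ms(n_images: int, h2d_ms: float, kernel_ms: float, d2h_ms: float,
                  overlap: bool) -> float:
    """Total time to push n_images through copy-in, compute and copy-out."""
    stages = (h2d_ms, kernel_ms, d2h_ms)
    if not overlap:
        # Each image runs all three stages back to back.
        return n_images * sum(stages)
    # Idealized pipeline: the first image takes the full latency, then one image
    # completes every max(stages) ms, because the slowest stage is the bottleneck.
    return sum(stages) + (n_images - 1) * max(stages)


# Hypothetical per-image stage times (ms), chosen only to illustrate the model.
serial = batch_time_ms(100, 3.0, 4.0, 1.0, overlap=False)
pipelined = batch_time_ms(100, 3.0, 4.0, 1.0, overlap=True)
print(serial, pipelined)
```

With these inputs, the pipelined batch takes about half as long as the serial one (404 ms vs. 800 ms). The achievable gain is bounded by how evenly the transfer and compute stages are balanced.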


Applications to 3D rendering

• Modern 3D applications work with increasingly high-resolution data sets

• JPEG is a standard storage format for color maps

• Decoding JPEG on the CPU has major drawbacks:

  • CPU-based decoding can be unacceptably slow, even with partial GPU acceleration

  • Transferring the raw decoded image, or intermediate decoding results, over PCI-Express is much more expensive than transferring the compressed JPEG

• JPEG decoding on the GPU solves both problems: only compressed data crosses PCI-Express, and the decoding itself runs at GPU speed


Applications to JPEG Imaging for Web

• Server-side image scaling to fit client devices.

• Thumbnail generation for large image databases.

Problem: how to process hundreds of millions of images per day?

Method outline

• Load images from the database into host memory

• Image decompression → resize → compression

• Store the final images in the database or send them to users
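Sizing such a service starts with simple arithmetic. The daily volume fixes the sustained rate, and the per-GPU rate for the whole decompress → resize → compress chain fixes how many GPUs are needed. Here is a sketch in Python. The per-GPU rate of 500 images/s is a hypothetical input, not a figure from these slides, and the model ignores peak-hour load.

```python
import math

SECONDS_PER_DAY = 24 * 3600


def required_images_per_s(images_per_day: float) -> float:
    """Average sustained rate needed to keep up with the daily volume."""
    return images_per_day / SECONDS_PER_DAY


def gpus_needed(images_per_day: float, images_per_s_per_gpu: float) -> int:
    """GPUs needed at the average rate (no headroom for peaks)."""
    return math.ceil(required_images_per_s(images_per_day) / images_per_s_per_gpu)


rate = required_images_per_s(100_000_000)   # 100 million images per day
print(f"{rate:.0f} images/s, {gpus_needed(100_000_000, 500)} GPUs at 500 images/s each")
```

One hundred million images per day averages out to roughly 1157 images per second. At the assumed per-GPU rate, that load fits on three GPUs, before any allowance for traffic peaks.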


Conclusion

Fast image coding on the GPU is a reality

Modern GPUs can efficiently run many non-floating-point algorithms, such as the RLE and Huffman stages of JPEG


Future Work

SDK for FVJPEG codec for Windows / Linux

Optimized JPEG, MJPEG, JPEG2000

Multi-GPU computing

Custom software design


Questions?

Fyodor Serzhenko

Victor Podlozhnyuk

More info at www.fastcompression.com


PCs & Laptop for testing

ASUS P6T Deluxe V2 LGA1366, X58, Core i7 920, 2.67 GHz, DDR-III 6 GB, GPU GeForce GTX 580 or GeForce GT 240

ASUS P8Z77-PRO, Z77, Core i7 3770, 3.4 GHz, DDR-III 8 GB, GPU GeForce GTX 680 (cc = 3.0, 1536 cores)

OS Windows-7, 64-bit, CUDA 4.1, driver 296.10

Laptop

ASUS N55S, Core i5 2430M, DDR-III 6 GB

GeForce GT 555M (cc = 2.1, 144 cores)

OS Windows-7, 64-bit, CUDA 4.1, driver 296.10


Baseline JPEG parameters for testing

8-bit grayscale images

Compression quality from 10% to 100%

Default static quantization and Huffman tables

Test image: 7216 x 5408, 8-bit, CR = 12.8

8-thread encode/decode option for CPU

Conclusion: with these parameters, the CPU and GPU codecs perform the same calculations.
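These parameters also fix the amount of work per image. The short Python calculation below works it out for the test image using the Q = 50% encode rates from the benchmark table. It assumes 1 byte per pixel, and it assumes that MB/s in the tables means 10^6 bytes of uncompressed input per second.

```python
WIDTH, HEIGHT = 7216, 5408   # test image, 8-bit grayscale: 1 byte per pixel
CR = 12.8                    # compression ratio of the test image
MB = 1_000_000               # assumption: 1 MB = 10**6 bytes

raw_bytes = WIDTH * HEIGHT
jpeg_bytes = raw_bytes / CR


def encode_time_ms(throughput_mb_s: float) -> float:
    """Time to encode one test image, assuming throughput is measured on raw input."""
    return raw_bytes / (throughput_mb_s * MB) * 1000


print(f"raw {raw_bytes} bytes, JPEG about {jpeg_bytes:.0f} bytes")
for name, rate in [("FVJPEG + GTX 680", 5200), ("IPP-7.0 + Core i7 3770", 680)]:
    print(f"{name}: {encode_time_ms(rate):.1f} ms per image")
```

Under these assumptions, the 39 MB test image compresses to about 3 MB. It takes roughly 7.5 ms to encode on the GTX 680 versus roughly 57.4 ms with IPP on the Core i7 3770.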


Test image