The FFT on a GPU Graphics Hardware 2003 July 27, 2003 Kenneth Moreland Edward Angel Sandia National Labs U. of New Mexico Sandia is a multiprogram laboratory operated by Sandia Corporation, a Lockheed Martin Company, for the United States Department of Energy’s National Nuclear Security Administration under contract DE-AC04-94AL85000.

The FFT on a GPU

Graphics Hardware 2003

July 27, 2003

Kenneth Moreland, Sandia National Labs
Edward Angel, U. of New Mexico

Sandia is a multiprogram laboratory operated by Sandia Corporation, a Lockheed Martin Company, for the United States Department of Energy’s National Nuclear Security Administration under contract DE-AC04-94AL85000.


Overview

• Introduction
  – Motivation, FFT review.
• FFT Techniques
  – Exploitable FFT properties.
• Implementation
• Results
  – Performance, applications, conclusions.


Motivation

• The Fourier transform is a principal tool for digital image processing.
  – Filtering.
  – Correction.
  – Compression.
  – Classification.
  – Generation.
• As such, should not our graphics hardware support such a tool?


The Discrete Fourier Transform

• Converts data in the spatial or temporal domain into frequencies the data comprise.

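$$\mathcal{F}\{f(x)\} = F(u) = \frac{1}{N} \sum_{x=0}^{N-1} f(x)\, W_N^{ux}$$

$$\mathcal{F}^{-1}\{F(u)\} = f(x) = \sum_{u=0}^{N-1} F(u)\, W_N^{-ux}$$

$$W_N = e^{-2\pi j / N}$$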
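As a rough illustration of the transform pair on this slide, here is a minimal direct (quadratic-time) sketch in Python. It assumes the convention $W_N = e^{-2\pi j/N}$, with the $1/N$ factor applied in the forward direction only; the function names `dft` and `idft` are chosen for this example and do not come from the slides.

```python
import cmath


def dft(f):
    """Direct forward DFT, scaled by 1/N (assumed convention)."""
    n = len(f)
    w = cmath.exp(-2j * cmath.pi / n)  # W_N
    return [sum(f[x] * w ** (u * x) for x in range(n)) / n for u in range(n)]


def idft(F):
    """Direct inverse DFT, unscaled, so that idft(dft(f)) returns f."""
    n = len(F)
    w = cmath.exp(-2j * cmath.pi / n)  # W_N
    return [sum(F[u] * w ** (-u * x) for u in range(n)) for x in range(n)]


signal = [1.0, 2.0, 0.0, -1.0]
spectrum = dft(signal)
recovered = idft(spectrum)
```

With this scaling, `spectrum[0]` is the mean of the input, and `recovered` matches `signal` up to floating-point rounding.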


The Discrete Fourier Transform

• 2D transform can be computed by applying the transform in one direction, then the other.

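DFT:

$$\mathcal{F}\{f(x,y)\} = F(u,v) = \frac{1}{MN} \sum_{y=0}^{N-1} \sum_{x=0}^{M-1} f(x,y)\, W_M^{ux}\, W_N^{vy}$$

IDFT:

$$\mathcal{F}^{-1}\{F(u,v)\} = f(x,y) = \sum_{v=0}^{N-1} \sum_{u=0}^{M-1} F(u,v)\, W_M^{-ux}\, W_N^{-vy}$$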
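The separability claim can be checked numerically. The sketch below assumes the image is stored as a list of N rows of M values (`image[y][x]`) and uses the same scaled 1D convention as the 1D definition; `dft`, `dft2_separable`, and `dft2_direct` are names made up for this example.

```python
import cmath


def dft(f):
    """Direct 1D forward DFT, scaled by 1/len(f) (assumed convention)."""
    n = len(f)
    return [
        sum(f[x] * cmath.exp(-2j * cmath.pi * u * x / n) for x in range(n)) / n
        for u in range(n)
    ]


def dft2_separable(image):
    """Transform every row, then every column of the row results."""
    rows = [dft(row) for row in image]
    cols = [dft(list(col)) for col in zip(*rows)]  # zip(*rows) walks columns
    return [list(row) for row in zip(*cols)]       # transpose back to [v][u]


def dft2_direct(image):
    """Double-sum 2D DFT used only as a reference."""
    n, m = len(image), len(image[0])
    return [
        [
            sum(
                image[y][x]
                * cmath.exp(-2j * cmath.pi * u * x / m)
                * cmath.exp(-2j * cmath.pi * v * y / n)
                for y in range(n)
                for x in range(m)
            ) / (m * n)
            for u in range(m)
        ]
        for v in range(n)
    ]


image = [[1.0, 2.0, 3.0, 4.0], [0.0, -1.0, 0.5, 2.0]]
separable = dft2_separable(image)
direct = dft2_direct(image)
```

Both functions return the same N-by-M array of complex values up to rounding; the separable version does so with far fewer multiplications.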


The Fast Fourier Transform

• Divide and Conquer Algorithm
  – Input sequence is divided into subsequences consisting of values from even and odd indices, respectively.

$$F(u) = F_e(u) + W_N^u F_o(u)$$

$$f_e(x) = f(2x), \qquad f_o(x) = f(2x+1)$$
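The even/odd split can be written directly as a recursive function. This is a minimal sketch, assuming a power-of-two input length and $W_N = e^{-2\pi j/N}$; it leaves out the overall scale factor, which can be applied in a separate step. The names `fft` and `naive_dft` are illustrative only.

```python
import cmath


def fft(f):
    """Recursive radix-2 FFT (unscaled); len(f) must be a power of two."""
    n = len(f)
    if n == 1:
        return list(f)
    f_even = fft(f[0::2])  # transform of f_e(x) = f(2x)
    f_odd = fft(f[1::2])   # transform of f_o(x) = f(2x + 1)
    half = n // 2
    out = []
    for u in range(n):
        w = cmath.exp(-2j * cmath.pi * u / n)  # W_N^u
        # The half-length transforms are periodic with period N/2.
        out.append(f_even[u % half] + w * f_odd[u % half])
    return out


def naive_dft(f):
    """Unscaled direct DFT used only as a reference."""
    n = len(f)
    return [
        sum(f[x] * cmath.exp(-2j * cmath.pi * u * x / n) for x in range(n))
        for u in range(n)
    ]


samples = [1.0, 2.0, 0.0, -1.0, 3.0, 0.5, -2.0, 4.0]
fast = fft(samples)
slow = naive_dft(samples)
```

`fast` and `slow` agree up to rounding; the recursive version needs on the order of N log N operations instead of N squared.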


Index Magic

• Do not use recursion.
  – Use dynamic programming: iterate over entire array computing all values for each recursive depth together, like mergesort.
• Indexing is non-obvious.
  – Unlike mergesort, recursive step does not divide array into contiguous chunks.
  – At any iteration, what partition does a given index belong to, and where can one find the applicable values of the sub-partitions?


Index Magic

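• Common solution: rearrange data by reversing the bits of indices.
  – FFT can occur with contiguous partitions.
  – Requires an extra data copy.
• Our solution: determine indexing in place.

[Equation garbled in the transcript. Recoverable structure: $A_i[n]$ is the sum of one entry of $A_{i-1}$ and $W_{2^i}^{u}$ times a second entry of $A_{i-1}$, where both indices are built from $n$, $u$, and $N/2^i$, and $u = n \;\mathrm{div}\; (N/2^i)$.]

Note that the paper has a typo.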
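To make the idea of computing indices in place concrete, here is a sketch of an iterative FFT that never reorders the input by bit reversal. The storage layout is an assumption made for this example and may differ from the formula on the slide: after iteration $i$, entry $n$ holds frequency $u = n \;\mathrm{div}\; (N/2^i)$ of the sub-sequence whose samples have index $n \bmod (N/2^i)$ modulo $N/2^i$. The wrap-around with `% n` is part of this assumed layout. The function names are hypothetical.

```python
import cmath


def fft_indexed(f):
    """Iterative unscaled FFT without bit-reversal reordering."""
    n = len(f)
    stages = n.bit_length() - 1
    assert 1 << stages == n, "length must be a power of two"
    prev = [complex(v) for v in f]       # A_0 is the input, in natural order
    for i in range(1, stages + 1):
        stride = n >> i                  # N / 2^i
        cur = [0j] * n                   # second buffer, written this pass
        for idx in range(n):
            u = idx // stride            # frequency index within the partition
            w = cmath.exp(-2j * cmath.pi * u / (1 << i))  # W_{2^i}^u
            even = (idx + u * stride) % n
            odd = (idx + u * stride + stride) % n
            cur[idx] = prev[even] + w * prev[odd]
        prev = cur
    return prev                          # A_{log2 N}[u] = F(u), natural order


def naive_dft(f):
    """Unscaled direct DFT used only as a reference."""
    n = len(f)
    return [
        sum(f[x] * cmath.exp(-2j * cmath.pi * u * x / n) for x in range(n))
        for u in range(n)
    ]


samples = [1.0, 2.0, 0.0, -1.0, 3.0, 0.5, -2.0, 4.0]
indexed = fft_indexed(samples)
reference = naive_dft(samples)
```

Each output entry of a pass depends only on its own index and on two reads from the previous buffer, which is the access pattern a per-pixel program needs.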


Fourier Symmetry of Real Sequences

• In general, even for real functions, the frequency spectra contain imaginary values.
  – Captures magnitude and phase shift of sinusoids.
• Brute force FFT doubles computation and storage costs.
• But Fourier transforms of real functions have symmetry.
  – $F(u) = F^*(N-u)$ for all $u$.
  – Values at $F(0)$ and $F(N/2)$ are real (because they are conjugates of themselves).


Fourier Transform of Real Functions

• Pick two functions, let them be f(x) and g(x).
• Let h(x) = f(x) + j g(x).
  – Note that there is no loss of information.
• Can perform the FFT of h in half the time of performing the brute force FFT of f and g individually.
  – Simply point to one row of the image as real components and another as imaginary components.

[Figure: image with two rows labeled f and g.]


Untangling Fourier Transform Pairs

• Fourier transform is linear.
  – H(u) = F(u) + j G(u)
• We can “untangle” using symmetry of F and G.
  – Add and subtract H(u) and H(N – u) to cancel out conjugate terms of F and G.

$$H(u) + H(N-u) = F(u) + F(N-u) + j\,[G(u) + G(N-u)] = 2F_R(u) + 2j\,G_R(u)$$

$$H(u) - H(N-u) = 2j\,F_I(u) - 2\,G_I(u)$$


Untangling Fourier Transform Pairs

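$$F_R(u) = \tfrac{1}{2}\left[H_R(u) + H_R(N-u)\right]$$

$$F_I(u) = \tfrac{1}{2}\left[H_I(u) - H_I(N-u)\right]$$

$$G_R(u) = \tfrac{1}{2}\left[H_I(u) + H_I(N-u)\right]$$

$$G_I(u) = -\tfrac{1}{2}\left[H_R(u) - H_R(N-u)\right]$$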
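The tangle and untangle steps can be tried end to end on two short real sequences. The sketch below assumes an unscaled direct DFT as the transform and treats $H(N)$ as $H(0)$ when $u = 0$; `dft` and `untangle` are names invented for this example.

```python
import cmath


def dft(f):
    """Unscaled direct DFT; stands in for the FFT in this sketch."""
    n = len(f)
    return [
        sum(f[x] * cmath.exp(-2j * cmath.pi * u * x / n) for x in range(n))
        for u in range(n)
    ]


def untangle(H):
    """Split H = F + jG into F and G, where f and g were both real."""
    n = len(H)
    F, G = [], []
    for u in range(n):
        a = H[u]
        b = H[(n - u) % n]  # H(N - u); index wraps so that H(N) = H(0)
        F.append(complex(0.5 * (a.real + b.real), 0.5 * (a.imag - b.imag)))
        G.append(complex(0.5 * (a.imag + b.imag), -0.5 * (a.real - b.real)))
    return F, G


f = [1.0, 4.0, -2.0, 0.5]
g = [3.0, 0.0, 1.0, -1.0]
H = dft([complex(a, b) for a, b in zip(f, g)])  # one transform of h = f + jg
F, G = untangle(H)
```

`F` and `G` match the transforms of `f` and `g` computed separately, so one complex transform serves two real inputs.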


Packing Transforms of Real Functions

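• We can store Fourier transform in an array the same size as the input.
  – Throw away conjugate duplicates.
  – Throw away imaginary values known to be zero.

[Figure: packed array indexed 0 to N−1, with tick labels at N/2−1, N/2, and N/2+1; one region is labeled Real Values and the other Imaginary Values.]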
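A small sketch shows why N real numbers are enough. The exact slot order here is an assumption for illustration, not taken from the slide: real parts of $F(0)$ through $F(N/2)$ go first, followed by the imaginary parts of $F(1)$ through $F(N/2-1)$. The names `pack` and `unpack` are hypothetical, and N is assumed even.

```python
import cmath


def dft(f):
    """Unscaled direct DFT used to produce a spectrum with conjugate symmetry."""
    n = len(f)
    return [
        sum(f[x] * cmath.exp(-2j * cmath.pi * u * x / n) for x in range(n))
        for u in range(n)
    ]


def pack(F):
    """Store the spectrum of a real sequence in len(F) real numbers."""
    half = len(F) // 2
    packed = [F[u].real for u in range(half + 1)]   # Re F(0) .. Re F(N/2)
    packed += [F[u].imag for u in range(1, half)]   # Im F(1) .. Im F(N/2 - 1)
    return packed


def unpack(packed):
    """Rebuild the full complex spectrum from the packed array."""
    n = len(packed)
    half = n // 2
    F = [0j] * n
    F[0] = complex(packed[0], 0.0)        # imaginary part known to be zero
    F[half] = complex(packed[half], 0.0)  # imaginary part known to be zero
    for u in range(1, half):
        value = complex(packed[u], packed[half + u])
        F[u] = value
        F[n - u] = value.conjugate()      # discarded conjugate duplicate
    return F


signal = [1.0, 2.0, 0.0, -1.0, 3.0, 0.5, -2.0, 4.0]
spectrum = dft(signal)
packed = pack(spectrum)
restored = unpack(packed)
```

`packed` has the same length as `signal`, and `restored` reproduces `spectrum` up to rounding.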


Column-wise FFT

• We have two columns with real values.
  – Use same “tangled” approach.
• All other columns are complex numbers.
  – Use regular FFT.

[Figure: packed array with two columns labeled Real and the remaining columns labeled Paired for Complex.]


Packing 2D Transforms of Real Functions

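• Rows transformed from complex values are already packed appropriately.
• The two rows transformed from real values are untangled and packed to follow suit.

[Figure: packed 2D transform. Horizontal axis indexed 0 to M−1 with ticks at M/2−1, M/2, and M/2+1; vertical axis indexed 0 to N−1 with ticks at 1, N/2−1, N/2, and N/2+1; regions labeled Real Values and Imaginary Values.]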


Available Resources

• nVidia GeForce FX 5800 Ultra.
  – Full 32-bit floating point pipeline and frame buffers.
  – Fully programmable vertex and fragment units.
• Cg
  – High level language for vertex and fragment programs.
• Traditional CPU: 1.7 GHz Intel Xeon
  – Freely available high performance FFT implementations.


Implementation

• Using a SIMD model for parallel computation.
  – Draw quadrilateral parallel to screen.
  – Rasterizer invokes the same fragment program “in parallel” over all pixels covered by the quadrilateral.
  – Inputs and outputs depend on the location of the pixel on which the fragment program is running.
• We require many rendering passes.
  – Use “render to texture” extension.
  – Use two frame buffers: one for retrieving values of the last pass and one for storing results of the current computation.
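The two-buffer, multi-pass pattern can be mimicked on the CPU. This is only a schematic sketch under simplifying assumptions: buffers are 1D Python lists rather than textures, a "fragment program" is any function of the read buffer and a pixel index, and the names `run_passes`, `scale_by`, and `reverse` are invented for the example.

```python
def run_passes(initial, fragment_programs):
    """Apply each program to every pixel, ping-ponging between two buffers."""
    read_buf = list(initial)            # results of the previous pass
    write_buf = [0.0] * len(initial)    # target of the current pass
    for program in fragment_programs:
        for pixel in range(len(read_buf)):
            # Same program for every pixel; only the pixel location differs.
            write_buf[pixel] = program(read_buf, pixel)
        read_buf, write_buf = write_buf, read_buf  # swap roles for next pass
    return read_buf


def scale_by(factor):
    """Build a program that multiplies each value by a constant."""
    return lambda src, pixel: src[pixel] * factor


def reverse(src, pixel):
    """Program that reads from the mirrored location (a dependent lookup)."""
    return src[len(src) - 1 - pixel]


result = run_passes([1.0, 2.0, 3.0, 4.0], [scale_by(0.5), reverse])
```

A program never reads from the buffer it is writing, which is why two buffers are needed: the `reverse` pass would corrupt its own input if it ran in a single buffer.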


Implementation

[Figure: pipeline diagram connecting Images and Frequency Spectra, drawn once with Scale stages and once with Pass stages. Stage labels: FFT, Untangle. Buffer labels: Real F, Real G, Imag. F, Imag. G, Real Tangled, Imaginary Tangled, Real Untangled, Imaginary Untangled, and R, F / I, F / R, G / I, G.]


Fragment Programs

• Written in Cg, compiled for GeForce FX.

Program     Arithmetic instructions   Texture instructions
FFT         27                        3
Untangle    4                         2
Scale       1                         1
Tangle      1                         2
Pass        0                         1
Multiply    66                        4


Applications

• Digital image filtering.


Applications

• Texture generation.

• Volume rendering.


Performance

• Computation speed: 2.5 GigaFLOPS
• Texture read rate: 3.4 GB/sec

Image Size   Rendering Rate (Hz)   Arithmetic (sec)   Texture Lookup (sec)
1024²        0.37                  1.9                0.6
512²         1.6                   0.44               0.13
256²         6.7                   0.09               0.03
128²         25                    0.01               0.007


Conclusions

• The Fourier transform on the GPU has many potential applications.
• A well-established FFT on the CPU (FFTW) still has an edge over the GPU implementation.
  – Both the software and the hardware of the GPU are first generation.
  – Room for improvement.


Questions?