FAST MAP PROJECTION ON CUDA

Yanwei Zhao

Institute of Computing Technology

Chinese Academy of Sciences

July 29, 2011

Outline

Map Projection
  Establishes the relationship between two different coordinate systems:
  geographical coordinates → planar Cartesian map-space coordinate system
  Involves complicated and time-consuming arithmetic operations
  A fast answer with the desired accuracy is preferable to a slow exact answer
  It needs to be accelerated for interactive GIS scenarios

GPGPU (general-purpose computing on graphics processing units)
  GPGPU is a young area of research
  Advantages of the GPU: flexibility, processing power, low cost
  GPGPU targets applications other than 3D graphics
  The GPU accelerates the critical path of the application

CUDA (Compute Unified Device Architecture)
  NVIDIA's parallel computing architecture
  C-based programming language and development toolkit
  Advantages:
    Programmers can focus on the important issues rather than on an unfamiliar language
    Efficient parallel code can be written without graphics APIs

Characteristics of map projection
  A huge number of coordinates to handle
  Complex arithmetic operations
  The requirement of a real-time response

Our proposal
  Use the new CUDA technology on the GPU
  Take the Universal Transverse Mercator (UTM) projection as an example
  Performance:
    Speedup of 6x to 8x (including transfer time)
    Speedup of 70x to 90x (excluding transfer time)
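
To give a sense of the per-coordinate arithmetic that a UTM projection involves, here is a minimal CPU-side sketch in Python using the standard transverse Mercator series. The slides do not show the formulas or name a datum, so the WGS84 constants, the function names and the choice of series are assumptions for illustration, not the author's kernel.

```python
import math

# WGS84 ellipsoid constants (an assumption; the slides do not name a datum).
A_AXIS = 6378137.0                 # semi-major axis in metres
F = 1.0 / 298.257223563            # flattening
E2 = F * (2.0 - F)                 # first eccentricity squared
EP2 = E2 / (1.0 - E2)              # second eccentricity squared
K0 = 0.9996                        # UTM scale factor on the central meridian


def meridian_arc(lat):
    """Distance along the meridian from the equator to latitude lat (radians)."""
    e4, e6 = E2 * E2, E2 * E2 * E2
    return A_AXIS * (
        (1 - E2 / 4 - 3 * e4 / 64 - 5 * e6 / 256) * lat
        - (3 * E2 / 8 + 3 * e4 / 32 + 45 * e6 / 1024) * math.sin(2 * lat)
        + (15 * e4 / 256 + 45 * e6 / 1024) * math.sin(4 * lat)
        - (35 * e6 / 3072) * math.sin(6 * lat)
    )


def utm_forward(lat_deg, lon_deg, central_meridian_deg):
    """Project one geographic coordinate to UTM easting/northing in metres."""
    lat = math.radians(lat_deg)
    dlon = math.radians(lon_deg - central_meridian_deg)
    sin_lat, cos_lat, tan_lat = math.sin(lat), math.cos(lat), math.tan(lat)

    n = A_AXIS / math.sqrt(1 - E2 * sin_lat * sin_lat)  # prime vertical radius
    t = tan_lat * tan_lat
    c = EP2 * cos_lat * cos_lat
    a = dlon * cos_lat

    x = K0 * n * (a + (1 - t + c) * a**3 / 6
                  + (5 - 18 * t + t * t + 72 * c - 58 * EP2) * a**5 / 120)
    y = K0 * (meridian_arc(lat) + n * tan_lat * (
        a * a / 2 + (5 - t + 9 * c + 4 * c * c) * a**4 / 24
        + (61 - 58 * t + t * t + 600 * c - 330 * EP2) * a**6 / 720))

    easting = x + 500000.0                             # false easting
    northing = y if lat_deg >= 0 else y + 10000000.0   # false northing (south)
    return easting, northing
```

Every coordinate is projected independently of the others, which is what makes the problem a natural fit for one-thread-per-coordinate execution.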


Algorithm framework
  1. Open the shapefile
  2. Read the coordinates of all features
  3. Copy the data from CPU to GPU global memory
  4. Execute the kernel function (on the GPU, in Block 0 … Block m)
  5. Copy the result from GPU to CPU
  6. Free the device memory
  7. Save or display the result

Task assignment: striped partitioning and matrix distribution

Striped partitioning
  Define the number of blocks and threads: Block_num, Thread_num
  CUDA built-in variables: gridDim, blockDim
  Number of geographic features: fn
  Features handled by each block: fn / gridDim.x

[Figure: the relationship between blocks and features (Block 0, Block 1, …, Block m shown with feature 0, feature 1, …, feature m, then feature m+1, feature m+2, …, feature 2m) and the relationship between threads and coordinates (thread 0, thread 1, …, thread n shown with coord 0, coord 1, …, coord n of a feature)]

Outer loop (blocks and features):
  Block → Feature[i], i = blockIdx.x * (fn / gridDim.x)   (1)
  Block → next Feature[k], k = i + fn / gridDim.x   (2)
Inner loop (threads and coordinates):
  thread → coord[j], j = threadIdx.x   (1)
  thread → next coord[k], k = j + Thread_num   (2)
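
The two loops can be emulated on the CPU to check which coordinates each thread would touch. The sketch below is a Python stand-in, not CUDA code. It assumes that each block owns a contiguous chunk of features starting at blockIdx.x * (fn / gridDim.x); this is one reading of the outer-loop rule on the slide, and rounding the chunk size up is an added assumption. The inner loop follows the slide directly: a thread starts at j = threadIdx.x and steps by Thread_num. The function name is hypothetical.

```python
def striped_assignment(feature_sizes, block_num, thread_num):
    """Emulate which (feature, coord) pairs each (block, thread) visits."""
    fn = len(feature_sizes)
    # Features per block, rounded up so a remainder is not dropped (assumption).
    per_block = (fn + block_num - 1) // block_num
    visits = {}
    for block_idx in range(block_num):          # plays the role of blockIdx.x
        first = block_idx * per_block           # i = blockIdx.x * (fn / gridDim.x)
        last = min(first + per_block, fn)
        for thread_idx in range(thread_num):    # plays the role of threadIdx.x
            work = []
            for i in range(first, last):        # outer loop over features
                j = thread_idx                  # j = threadIdx.x
                while j < feature_sizes[i]:     # inner loop over coordinates
                    work.append((i, j))
                    j += thread_num             # k = j + Thread_num
            visits[(block_idx, thread_idx)] = work
    return visits
```

For example, with feature sizes [5, 3, 8, 2, 7], two blocks and four threads, thread 1 of block 0 visits coordinate 1 of features 0 and 1 and coordinates 1 and 5 of feature 2. In each step, neighbouring threads of a block touch neighbouring coordinates of the same feature.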


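Matrix distribution
  Define the number of blocks and threads: grid(br, bc), block(tr, tc)
  Each block handles k features, where:
    $k = \dfrac{fn + \text{gridDim.x} \times \text{gridDim.y} - 1}{\text{gridDim.x} \times \text{gridDim.y}}$   (1)
  Feature[i]:
    $\text{blockIdx.y} \times \text{gridDim.x} \times k \le i < \text{blockIdx.y} \times \text{gridDim.x} \times k + k$   (2)
  Each thread handles s coordinates, where:
    $s = \dfrac{\text{feature}[i].\text{size} + \text{blockDim.x} \times \text{blockDim.y} - 1}{\text{blockDim.x} \times \text{blockDim.y}}$
  coord[j]:
    $\text{threadIdx.y} \times \text{blockDim.x} \times s \le j < \text{threadIdx.y} \times \text{blockDim.x} \times s + s$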
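
The matrix distribution can be emulated on the CPU in the same spirit. This Python sketch is written under stated assumptions: k and s are computed with rounding-up division, and blocks and threads are linearized as blockIdx.y * gridDim.x + blockIdx.x and threadIdx.y * blockDim.x + threadIdx.x. The blockIdx.x and threadIdx.x terms are an addition of the sketch so that blocks (and threads) in the same row receive distinct ranges; the function names are hypothetical.

```python
def ceil_div(a, b):
    """Integer division rounded up."""
    return (a + b - 1) // b


def matrix_assignment(feature_sizes, grid, block):
    """Emulate which (feature, coord) pairs each 2-D (block, thread) visits."""
    gx, gy = grid                      # gridDim.x, gridDim.y
    bx, by = block                     # blockDim.x, blockDim.y
    fn = len(feature_sizes)
    k = ceil_div(fn, gx * gy)          # features per block
    visits = {}
    for block_y in range(gy):
        for block_x in range(gx):
            block_id = block_y * gx + block_x      # linear block id (assumption)
            for i in range(block_id * k, min(block_id * k + k, fn)):
                s = ceil_div(feature_sizes[i], bx * by)   # coords per thread
                for thread_y in range(by):
                    for thread_x in range(bx):
                        thread_id = thread_y * bx + thread_x
                        lo = thread_id * s
                        hi = min(lo + s, feature_sizes[i])
                        key = (block_x, block_y, thread_x, thread_y)
                        visits.setdefault(key, []).extend(
                            (i, j) for j in range(lo, hi))
    return visits
```

Under this mapping each thread owns a run of s consecutive coordinates, so two neighbouring threads start s elements apart rather than one element apart.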


Experiment Environment
  Hardware:
    CPU: Intel Core2 Duo E8500 at 3.18 GHz with 2 GB of main memory
    GPU: NVIDIA GeForce 9800 GTX+ graphics card with 512 MB of memory, 128 CUDA cores and 16 multiprocessors
  Software:
    Microsoft Windows XP Pro SP2
    Microsoft Visual Studio 2005
    NVIDIA driver 2.2, CUDA SDK 2.2 and CUDA toolkit 2.2

The degree of data parallelism
  Total CPU time = initialization and file reading time + serial projection time
  More than 90 percent of the map projection work can be parallelized.
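
As a rough illustration of what a parallel fraction of about 90 percent implies, Amdahl's law gives an upper bound on the overall speedup. This is a generic textbook bound and not an analysis from the talk; the value 0.9 is taken from the slide, while the speedups of the parallel part used below are hypothetical.

```python
def amdahl_speedup(parallel_fraction, parallel_speedup):
    """Overall speedup when only parallel_fraction of the run time is accelerated."""
    serial_fraction = 1.0 - parallel_fraction
    return 1.0 / (serial_fraction + parallel_fraction / parallel_speedup)


# Hypothetical inputs: 90 percent parallel work, parallel part 80 times faster.
overall = amdahl_speedup(0.9, 80.0)

# Limit when the parallel part takes no time at all.
limit = amdahl_speedup(0.9, float("inf"))
```

With 90 percent of the work parallelized, the overall speedup stays below 10x however fast the parallel part becomes, so the serial file reading and initialization bound the end-to-end gain.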

Comparing with CPU
  Configuration: Block_num = 64, Thread_num = 512
  Total time = map projection time + data transfer time
  Considering the total time, the GPU version is 6x to 8x faster than the CPU version.
  Comparing only the map projection time, the speedup is 70x to 90x.
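
The two figures differ only in whether the data transfer time is counted on the GPU side. A minimal Python sketch of both ratios follows; the timings in it are made up for illustration and are not measurements from the talk, and the function name is hypothetical.

```python
def speedups(cpu_projection_ms, gpu_kernel_ms, transfer_ms):
    """Return (kernel-only speedup, speedup including data transfer)."""
    kernel_only = cpu_projection_ms / gpu_kernel_ms
    total = cpu_projection_ms / (gpu_kernel_ms + transfer_ms)
    return kernel_only, total


# Made-up timings in milliseconds, chosen only to show the effect of transfer.
kernel_only, total = speedups(cpu_projection_ms=800.0,
                              gpu_kernel_ms=10.0,
                              transfer_ms=100.0)
```

When the transfer takes much longer than the kernel itself, as in these made-up numbers, the end-to-end speedup is an order of magnitude lower than the kernel-only speedup.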

The performance of different task assignments
  Striped partitioning: Block_num = 64, Thread_num = 512
  Matrix distribution: dim_grid(32,32) = 32*32 blocks, dim_block(256,256) = 256*256 threads
  Speedup:
    Striped: 6x to 8x
    Matrix: 4x to 6x

[Figure: global memory access patterns of the two task assignments. Striped: Block 0, Block 1, …, Block m-1, with threads 0, 1, …, n-1 mapped to consecutive global memory locations. Matrix: Grid 0 holds blocks B(0,0) … B(m,m), each block holds threads t(0,0) … t(n,n), and a row spans blockDim.x*gridDim.x elements of global memory.]
Striped: all threads in the block access consecutive memory.
Matrix: it can only ensure that each row of threads in the block handles consecutive data.
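
The difference between the two access patterns can be made concrete by listing the element offsets that the threads of one block touch in a single step. The Python sketch below assumes a row-major 2-D layout with row length blockDim.x * gridDim.x for the matrix case, based on the label in the figure. The striped case follows the stride-by-Thread_num rule. All function names are hypothetical.

```python
def striped_addresses(feature_offset, thread_num, step):
    """Offsets read by threads 0..thread_num-1 of a block in one loop step."""
    return [feature_offset + step * thread_num + t for t in range(thread_num)]


def matrix_addresses(block_idx, block_dim, grid_dim_x):
    """Offsets read by a 2-D block, one list per row of threads."""
    bx, by = block_idx
    dx, dy = block_dim
    pitch = dx * grid_dim_x            # row length: blockDim.x * gridDim.x
    rows = []
    for ty in range(dy):
        row_start = (by * dy + ty) * pitch + bx * dx
        rows.append([row_start + tx for tx in range(dx)])
    return rows


def is_contiguous(offsets):
    """True when the offsets form one run of consecutive integers."""
    return all(b - a == 1 for a, b in zip(offsets, offsets[1:]))


striped = striped_addresses(feature_offset=100, thread_num=8, step=0)
rows = matrix_addresses(block_idx=(1, 0), block_dim=(4, 2), grid_dim_x=3)
whole_block = [offset for row in rows for offset in row]
```

Here `striped` is one consecutive run, each entry of `rows` is consecutive on its own, and `whole_block` is not, because successive rows of threads are one full row length apart in memory.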


Conclusion and Future work
  Conclusion:
    Implemented a fast map projection method on CUDA-enabled GPUs
    High speed-up compared to the CPU-based method
    The power of modern GPUs can considerably speed up computation in the field of geoscience, e.g. DEM-based spatial interpolation and raster-based spatial analysis
  Future work:
    GPU implementation of other GIS applications

Thank you! Q & A

Yanwei Zhao

Institute of Computing Technology

Contact: zhaoyanwei@ict.ac.cn
