Page 1:

CS4402 – Parallel Computing

Lecture 7

Parallel Graphics – More Fractals

Scheduling

Page 2:

FRACTALS

Page 3:

Fractals

A fractal is a set of points such that:

- it has infinite detail at every point (its fractal dimension exceeds its topological dimension).

- it satisfies self-similarity: any part of the fractal is similar to the whole fractal.

Generating a fractal is an iterative process:

- start from P0

- iteratively generate P1=F(P0), P2=F(P1), …, Pn=F(Pn-1), …

P0 is a set of initial points

F is a transformation:

Geometric transformations: translations, rotations, scaling, …

Non-linear coordinate transformations.

Page 4:

We work with two rectangular areas.

The user space:

- Real coordinates (x,y)

- Bounded by [xMin,xMax] × [yMin,yMax]

The screen space:

- Integer coordinates (i,j)

- Bounded by [0,w-1] × [0,h-1]

- Is upside down: the Oy axis points downward

How to squeeze the user space into the screen space?

How to translate (x,y) into (i,j)?

Points vs Pixels
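A minimal sketch of the translation in both directions (not the slides' code; the later slides use the equivalent STEP increment). The bounds xMin, xMax, yMin, yMax and the sizes w, h are those defined above; dividing by w-1, h-1 and flipping Oy is one common convention:

/* pixel (i,j) -> user-space point (x,y) */
void pixel_to_point(int i, int j,
                    double xMin, double xMax, double yMin, double yMax,
                    int w, int h, double *x, double *y)
{
    *x = xMin + i * (xMax - xMin) / (w - 1);
    *y = yMax - j * (yMax - yMin) / (h - 1);   /* flip: Oy points downward */
}

/* user-space point (x,y) -> pixel (i,j) */
void point_to_pixel(double x, double y,
                    double xMin, double xMax, double yMin, double yMax,
                    int w, int h, int *i, int *j)
{
    *i = (int)((x - xMin) / (xMax - xMin) * (w - 1));
    *j = (int)((yMax - y) / (yMax - yMin) * (h - 1));
}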

Page 5:

Julia Sets – Self-Squaring Fractals

Consider the generating function F(z) = z² + c, where z, c ∈ ℂ.

Sequence of complex numbers: z0 ∈ ℂ and zn+1 = zn² + c.

Chaotic behaviour, but two attractors for |zn|: 0 and +∞.

For a given c ∈ ℂ, the Julia set Jc contains all the points z0 whose orbit stays bounded.

Page 6:

Julia Sets – Algorithm

Inputs:

c ∈ ℂ a complex number; [xmin,xmax] × [ymin,ymax] a region in the plane.

Niter a number of iterations for orbits; R a threshold for the attractor.

Output: Jc the Julia set of c

Algorithm

For each pixel (i,j) on the screen

translate (i,j) into (x,y)

construct z0 = x + i*y;

find the orbit of z0 [first Niter elements]

if (all the orbit points are under the threshold) draw (x,y)

Page 7:

for(i=0; i<width; i++) for(j=0; j<width; j++){

    int k = 0;

    // construct the orbit of z for the point (i,j)
    z.re = XMIN + i*STEP;
    z.im = YMIN + j*STEP;
    for (k=0; k < NUMITER; k++) {
        z = func(z,c);                 // one step: z = z^2 + c
        if (CompAbs(z) > R) break;     // the orbit escaped the threshold
    }

    // test if the orbit stayed bounded
    if (k > NUMITER-1) {
        MPE_Draw_point(graph, i, j, MPE_YELLOW); MPE_Update(graph);
    }
    else {
        MPE_Draw_point(graph, i, j, MPE_RED); MPE_Update(graph);
    }
}
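The loop above calls func and CompAbs without giving their bodies; a minimal sketch, assuming a struct with re/im fields (the type name Complex is an assumption, not the slides' declaration):

#include <math.h>

typedef struct { double re, im; } Complex;

/* one generator step: F(z) = z^2 + c */
Complex func(Complex z, Complex c)
{
    Complex r;
    r.re = z.re * z.re - z.im * z.im + c.re;
    r.im = 2.0 * z.re * z.im + c.im;
    return r;
}

/* modulus |z| of a complex number */
double CompAbs(Complex z)
{
    return sqrt(z.re * z.re + z.im * z.im);
}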

Page 8:

Julia Sets – || Algorithm

Remark 1.

The double for loop on (i,j) can be split between processors, e.g.

uniform block or cyclic on i.

uniform block or cyclic on j.

There is no communication at all between processors, therefore this is an embarrassingly || computation.

Remark 2.

All processors draw a block of the fractal or several rows on the XGraph.

P rank knows the area to draw.

Page 9:

// uniform block on i; the comments give the other partitionings
for(i=rank*width/size; i<(rank+1)*width/size; i++) for(j=0; j<width; j++){
// for(i=rank; i<width; i+=size) for(j=0; j<width; j++){                      // cyclic on i
// for(i=0; i<width; i++) for(j=rank*width/size; j<(rank+1)*width/size; j++){ // uniform block on j
// for(i=0; i<width; i++) for(j=rank; j<width; j+=size){                      // cyclic on j

    int k = 0;

    // construct the orbit of z for the point (i,j)
    z.re = XMIN + i*STEP;
    z.im = YMIN + j*STEP;
    for (k=0; k < NUMITER; k++) {
        z = func(z,c);
        if (CompAbs(z) > R) break;
    }

    // test if the orbit stayed bounded
    if (k > NUMITER-1) {
        MPE_Draw_point(graph, i, j, MPE_YELLOW); MPE_Update(graph);
    }
    else {
        MPE_Draw_point(graph, i, j, MPE_RED); MPE_Update(graph);
    }
}


Page 12:

The Mandelbrot Set

THE MANDELBROT FRACTAL IS AN INDEX FOR JULIA FRACTALS

The Mandelbrot set contains all the points c ∈ ℂ such that

the sequence z0 = 0, zn+1 = zn² + c has a bounded orbit.

Inputs: [xmin,xmax] × [ymin,ymax] a region in the plane.

Niter a number of iterations for orbits; R a threshold for the attractor.

Output: M the Mandelbrot set.

Algorithm

For each (x,y) in [xmin,xmax] × [ymin,ymax]

c = x + i*y;

find the orbit of z0 = 0 while under the threshold.

if (all the orbit points are under the threshold) draw c = (x,y)

Page 13:

for(i=0; i<width; i++) for(j=0; j<width; j++){

    int k = 0;

    // construct the point c
    c.re = XMIN + i*STEP;
    c.im = YMIN + j*STEP;

    // construct the orbit of 0
    z.re = z.im = 0;
    for (k=0; k < NUMITER; k++) {
        z = func(z,c);
        if (CompAbs(z) > R) break;
    }

    // test if the orbit stayed bounded
    if (k > NUMITER-1) {
        MPE_Draw_point(graph, i, j, MPE_YELLOW); MPE_Update(graph);
    }
    else {
        MPE_Draw_point(graph, i, j, MPE_RED); MPE_Update(graph);
    }
}

Page 14:

The Mandelbrot Set – || Algorithm

Remark 1.

The double for loop on (i,j) can be split between processors, e.g.

uniform block or cyclic on i.

uniform block or cyclic on j.

There is no communication at all between processors, therefore this is an embarrassingly || computation.

Remark 2.

When the orbit escapes to infinity after k steps, we can draw the pixel (i,j) with the k-th color from a palette (see the sketch below).

Similarly colored bands contain points with the same escape behaviour.
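A hedged sketch of this coloring, replacing the two-color test in the earlier loops. Only MPE_Draw_point, MPE_Update and the variable k come from the slides; the palette array colors[0..NUMCOLORS-1] of MPE color values and MPE_BLACK for interior points are assumptions:

// after the orbit loop, k is the escape step, or NUMITER if the orbit stayed bounded
if (k > NUMITER-1) {
    MPE_Draw_point(graph, i, j, MPE_BLACK);              // bounded orbit: in the set
}
else {
    MPE_Draw_point(graph, i, j, colors[k % NUMCOLORS]);  // escaped at step k: k-th palette color
}
MPE_Update(graph);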


Page 16:

Fractal and Prime Numbers

Prime numbers can generate fractals. Remarks:

- If p > 5 is prime then p % 5 is 1, 2, 3 or 4.

- The values 1, 2, 3, 4 represent directions to move, e.g. left, right, up, down.

- The fractal has the sizes w and h.

Step 1. Initialise a matrix of colors with 0.

Step 2. For each number p > 5

    If p is prime then

        if (p%5==1) x = (x-1) % w;

        if (p%5==2) x = (x+1) % w;

        if (p%5==3) y = (y-1) % h;

        if (p%5==4) y = (y+1) % h;

    Increase the color of (x,y)

Step 3. Draw the pixels with the color matrix.

Page 17:

Simple Remarks

The prime number set is infinite; furthermore, it has no pattern.

prime: 3, 5, 7, 11, 13, 17, 19, 23, 29, 31, 37, …

move: 3, 0, 2, 1, 3, 2, 4, 3, 4, 1, 2, …

The set of moves satisfies:

- it does not have any pattern; the moves are quite random.

- the numbers of 1-s, 2-s, 3-s and 4-s moves are quite similar, hence the walk stays near the centre and the central pixels are reached more often.

The primality testing inside the for loop is the most expensive operation.
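The prime-walk code on the next page calls isPrime without defining it; a minimal trial-division sketch, one possibility rather than the slides' implementation:

/* return 1 if p is prime, 0 otherwise; O(sqrt(p)) trial division */
int isPrime(long p)
{
    if (p < 2)      return 0;
    if (p % 2 == 0) return p == 2;
    for (long d = 3; d * d <= p; d += 2)
        if (p % d == 0) return 0;
    return 1;
}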

Page 18:

// initialise the matrix with 0
for(i=0; i<width; i++) for(j=0; j<width; j++) map[i][j] = 0;

// start from the image centre
posX = posY = width/2;

// traverse the odd numbers 2*i+1
for(i=0; i<n; i++){

    if(isPrime(2*i+1)){

        // move to a new position on the map and increment it
        move = (2*i+1) % 5;
        if (move==1) posX = (posX-1+width) % width;   // +width keeps the index non-negative
        if (move==2) posX = (posX+1) % width;
        if (move==3) posY = (posY-1+width) % width;
        if (move==4) posY = (posY+1) % width;

        map[posY][posX]++;
    }
}

Page 19:

Parallel Computation: Simple Remarks

Processor rank gets some primes to test, using some partitioning.

Processor rank therefore traverses the pixels according to its own moves.

Processor rank has to work with its own matrix map.

The maps must be reduced on processor 0 to find the total number of hits.

Page 20:

Parallel Computation: Simple Remarks

The parallel computation of processor rank follows the steps:

1. Initialise the matrix map.

2. For each prime number assigned to rank do

1. Find the move and go to a new location

2. Increment the map

3. Reduce the matrix map.

4. If processor 0 then draw the map.

Page 21:

Splitting Loops

How to split the sequential loop if we have size processors?

Maths: n iterations & size processors ⇒ n/size iterations per processor.

for(i=0; i<n; i++){

    // body of loop
    loop_body(data, i);
}

Page 22:

Splitting Loops in Similar Blocks

P rank gets the iterations rank*n/size, rank*n/size+1,…, (rank+1)*n/size-1

for(i=rank*n/size; i<(rank+1)*n/size; i++){

    // acquire the data for this iteration
    loop_body(data, i);
}

[diagram: P rank owns the consecutive block rank*n/size … (rank+1)*n/size-1]

Page 23:

Splitting Loops in Cycles

P rank gets the iterations rank, rank+size, rank+2*size,….

for(i=rank; i<n; i+=size){

    // acquire the data for this iteration
    loop_body(data, i);
}

[diagram: P rank owns every size-th iteration]

Page 24:

Splitting Loops in Variable Blocks

P rank gets the iterations l[rank], l[rank]+1,…, u[rank]

for(i=l[rank]; i<=u[rank]; i++){

    // acquire the data for this iteration
    loop_body(data, i);
}

[diagram: P rank owns the block l[rank] … u[rank]]

Page 25:

// initialise the matrix with 0
for(i=0; i<width; i++) for(j=0; j<width; j++) map[i][j] = 0;

// start from the image centre
posX = posY = width/2;

// traverse this processor's block of the numbers 2*i+1
for(i=rank*n/size; i<(rank+1)*n/size; i++){

    if(isPrime(p=2*i+1)){

        // move to a new position on the map and increment it
        move = p % 5;
        if (move==1) posX = (posX-1+width) % width;
        if (move==2) posX = (posX+1) % width;
        if (move==3) posY = (posY-1+width) % width;
        if (move==4) posY = (posY+1) % width;

        map[posY][posX]++;
    }
}

// sum the local maps into globalMap on processor 0
MPI_Reduce(&map[0][0], &globalMap[0][0], width*width, MPI_LONG, MPI_SUM, 0,
           MPI_COMM_WORLD);

if(rank==0){
    for(i=0; i<width; i++) for(j=0; j<width; j++)
        MPE_Draw_point(graph, i, j, colors[globalMap[i][j]]);
}


Page 27:

Scheduling

Page 28:

Parallel Loops

Parallel loops represent the main source of parallelism.

Consider a system with p processors P1, P2, …, Pp and the loop:

for i=1, n do

call loop_body(i)

end for

Scheduling Problem:

Map the iterations {1,2,…,n} onto processors so that:

- the execution time is minimal.

- the execution times per processors are balanced.

- the processors' idle time is minimal.

Page 29:

Parallel Loops

Suppose that the workload of loop_body is known and given by w1, w2, …, wn.

For processor Pj the set of iterations is Sj = {i1, i2, …, ik}, so that:

- the execution time of processor Pj is T(Pj) = ∑ {wi : i ∈ Sj}

- the execution time of the parallel loop is T = max {T(Pj) : j = 1, 2, …, p}.

For example, with n = 4, workloads w = (1, 2, 3, 4) and p = 2, the partition S1 = {1, 4}, S2 = {2, 3} gives T(P1) = T(P2) = 5, hence T = 5 and a perfect balance.

Static Scheduling: the partition is found at compile time.

Dynamic Scheduling: the partition is found at run time.

Page 30:

Data Dependency

A dependency exists between program statements when the order of statement execution affects the results of the program.

A data dependency results from multiple uses of the same location(s) in storage by different tasks: the data written by one task is the input of another.

Dependencies are important to parallel programming because they are one of the primary inhibitors to parallelism.

Loops with data dependencies cannot be scheduled.

Example: The following for loop contains a data dependency: iteration i reads a[i-1], which iteration i-1 writes, so the iterations cannot run independently.

for i=1, n do

    a[i] = a[i-1] + 1

end for

Page 31:

Load Balancing

Load balancing refers to the practice of distributing work among processors so that all processors are kept busy all of the time.

If all the processor execution times are the same then a perfect load balance is achieved.

Load imbalance is the most important overhead of parallel computation; it arises whenever two processors' execution times differ.


Page 34:

Useful Rules:

- If the workloads are similar then use static uniform block scheduling.

- If the workloads increase/decrease then use static cyclic scheduling.

- If the workloads are known and simple then use them to guide the load balance (see Balanced Workload Block Scheduling below).

- If the workloads are not known then use dynamic methods, e.g. the master-worker sketch below.
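A hedged sketch of one dynamic method over the generic loop of Page 21: processor 0 hands out one iteration at a time on demand, so faster workers simply request more often. The message tags and the loop_body(data, i) call are illustrative assumptions, not the slides' code:

if (rank == 0) {
    // master: tag 1 carries work, tag 0 tells a worker to stop
    int next = 0, stopped = 0, req;
    MPI_Status st;
    while (stopped < size - 1) {
        MPI_Recv(&req, 1, MPI_INT, MPI_ANY_SOURCE, 0, MPI_COMM_WORLD, &st);
        if (next < n) {
            MPI_Send(&next, 1, MPI_INT, st.MPI_SOURCE, 1, MPI_COMM_WORLD);
            next++;
        } else {
            MPI_Send(&next, 1, MPI_INT, st.MPI_SOURCE, 0, MPI_COMM_WORLD);
            stopped++;
        }
    }
} else {
    // worker: ask for an iteration, run it, repeat until told to stop
    int i;
    MPI_Status st;
    while (1) {
        MPI_Send(&rank, 1, MPI_INT, 0, 0, MPI_COMM_WORLD);
        MPI_Recv(&i, 1, MPI_INT, 0, MPI_ANY_TAG, MPI_COMM_WORLD, &st);
        if (st.MPI_TAG == 0) break;
        loop_body(data, i);
    }
}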

Page 35:

Balanced Workload Block Scheduling

w1, w2, …, wn are the workloads of the iterations:

- the total workload is w1 + w2 + … + wn

- the average per processor is W = (w1 + w2 + … + wn) / size

Each processor gets consecutive iterations:

- l = l[rank] and u = u[rank] – the lower and upper indices of the block

- the block workload satisfies w_l + w_(l+1) + … + w_u ≈ W
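A hedged greedy sketch for computing such blocks when the workloads are known; the arrays l and u match the variable-block loop of Page 24, everything else is an assumption:

/* cut 0..n-1 into size consecutive blocks whose workloads are each close
   to the average W: close block proc once the running sum reaches its share */
void balanced_blocks(double *w, int n, int size, int *l, int *u)
{
    double total = 0.0, sum = 0.0;
    int proc = 0, i;

    for (i = 0; i < n; i++) total += w[i];
    double W = total / size;

    l[0] = 0;
    for (i = 0; i < n; i++) {
        sum += w[i];
        if (sum >= (proc + 1) * W && proc < size - 1) {
            u[proc] = i;          /* close the current block */
            l[proc + 1] = i + 1;  /* open the next one */
            proc++;
        }
    }
    u[size - 1] = n - 1;
}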

Page 36:

Balanced Workload Block Scheduling

Simple to work with integrals. Let w(i) be the workload of iteration i:

- the average workload per processor is W = (1/size) · ∫₀ⁿ w(i) di

- each processor's block [id_x, id_(x+1)] must carry this average: ∫ from id_x to id_(x+1) of w(i) di = W

- summing the first x+1 blocks gives ∫ from 0 to id_(x+1) of w(i) di = (x+1)·W, which determines each boundary id_(x+1) from the previous one.
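A worked example (not on the slides): take a workload that grows linearly, w(i) = i. Then ∫₀ⁿ w(i) di = n²/2, so W = n²/(2·size), and the boundary condition ∫ from 0 to id_x of w(i) di = x·W becomes id_x²/2 = x·n²/(2·size), i.e. id_x = n·√(x/size). For n = 100 and size = 4 the boundaries are 0, 50, 71, 87, 100: the first processor gets half of the iterations because they are the cheapest ones.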


Page 42:

Granularity

Granularity is the ratio of computation to communication.

Periods of computation are typically separated from periods of communication by synchronization events.

Fine-grain Parallelism: Relatively small amounts of computational work are done between communication events. It facilitates load balancing, but implies high communication overhead and less opportunity for performance enhancement.

Coarse-grain Parallelism: Relatively large amounts of computational work are done between communication/synchronization events. It is harder to load balance efficiently.