CSCI-455/552 Introduction to High Performance Computing Lecture 11


Jan 03, 2016


mikayla-vargas


CSCI-455/552

Introduction to High Performance Computing

Lecture 11


Slides for Parallel Programming Techniques & Applications Using Networked Workstations & Parallel Computers, 2nd ed., by B. Wilkinson & M. Allen, © 2004 Pearson Education Inc. All rights reserved.

Bucket Sort

One “bucket” is assigned to hold the numbers that fall within each of m regions of the interval. The numbers in each bucket are then sorted using a sequential sorting algorithm.

Sequential sorting time complexity: O(n log(n/m)), for n numbers and m buckets.

Works well if the original numbers are uniformly distributed across a known interval, say 0 to a - 1.
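A minimal sketch of the sequential version, assuming integer keys in 0 to a - 1 and m equal-width regions. The function name `bucket_sort` and the use of Python's built-in `sorted` as the per-bucket sequential sort are illustrative choices, not taken from the slides.

```python
def bucket_sort(numbers, a, m):
    """Sort integers in the range 0..a-1 using m equal-width buckets."""
    buckets = [[] for _ in range(m)]
    for x in numbers:
        # Region index: bucket i holds the values in [i*a/m, (i+1)*a/m).
        buckets[x * m // a].append(x)
    result = []
    for bucket in buckets:
        # Any sequential sort works here; sorted() stands in for it.
        result.extend(sorted(bucket))
    return result


example = bucket_sort([29, 3, 17, 8, 25, 11, 0, 31], a=32, m=4)
```

With uniformly distributed input each bucket holds about n/m numbers, which is where the log(n/m) factor in the complexity comes from.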


Parallel Version of Bucket Sort

Simple approach

Assign one processor to each bucket.


Further Parallelization

Partition the sequence into m regions, one region for each of the p processors (so m = p).

Each processor maintains p “small” buckets and separates the numbers in its region into its own small buckets.

The small buckets are then emptied into p final buckets for sorting, which requires each processor to send one small bucket to each of the other processors (bucket i to processor i).
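The three steps above can be sketched as a sequential simulation. This is only an illustration under stated assumptions: the p "processors" are plain loop iterations, keys are integers in 0 to a - 1, and the name `parallel_bucket_sort` is hypothetical. A real implementation would replace step 3 with message passing.

```python
def parallel_bucket_sort(numbers, a, p):
    """Simulate p processors sorting integers in 0..a-1 (run sequentially here)."""
    n = len(numbers)
    # Step 1: partition the sequence into p regions, one per processor.
    regions = [numbers[i * n // p:(i + 1) * n // p] for i in range(p)]

    # Step 2: each processor separates its region into p small buckets.
    small = [[[] for _ in range(p)] for _ in range(p)]
    for proc, region in enumerate(regions):
        for x in region:
            small[proc][x * p // a].append(x)

    # Step 3: small bucket i of every processor is sent to processor i.
    final = [[] for _ in range(p)]
    for proc in range(p):
        for i in range(p):
            final[i].extend(small[proc][i])

    # Step 4: each processor sorts its final bucket; concatenate in bucket order.
    return [x for bucket in final for x in sorted(bucket)]


data = [29, 3, 17, 8, 25, 11, 0, 31, 14, 6]
result = parallel_bucket_sort(data, a=32, p=4)
```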


Another Parallel Version of Bucket Sort

Introduces a new message-passing operation: all-to-all broadcast.


“all-to-all” Broadcast Routine

Sends data from each process to every other process.


The “all-to-all” routine actually transfers the rows of an array to columns: it transposes a matrix.
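To see why this is a transpose, here is a small sketch that models the exchange with nested lists. The function `all_to_all` and the `send`/`recv` layout are assumptions made for illustration; in MPI the same communication pattern is provided by `MPI_Alltoall`.

```python
def all_to_all(send):
    """send[i][j] is the item process i sends to process j.

    Returns recv, where recv[j][i] is the item process j received from process i.
    """
    p = len(send)
    return [[send[i][j] for i in range(p)] for j in range(p)]


send = [["A0", "A1", "A2"],
        ["B0", "B1", "B2"],
        ["C0", "C1", "C2"]]
recv = all_to_all(send)
# recv[0] is ["A0", "B0", "C0"]: process 0 now holds column 0 of the send matrix.
```

In the bucket sort, row i is the list of small buckets held by processor i, and after the exchange processor j holds every small bucket numbered j.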


Parallel Bucket and Sample Sort

• The critical aspect of the above algorithm is assigning ranges to processors. This is done by suitable splitter selection.

• The splitter selection method divides the n elements into p blocks of size n/p each, and sorts each block by using quicksort.

• From each sorted block it chooses p – 1 evenly spaced elements.

• The p(p – 1) elements selected from all the blocks represent the sample used to determine the buckets.

• This scheme guarantees that the number of elements ending up in each bucket is roughly uniform (less than 2n/p).
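The splitter selection described in the bullets above might look like the following sketch. It assumes distinct keys, picks the final p - 1 splitters as evenly spaced elements of the sorted sample, and uses `sorted` in place of quicksort. The helper names `evenly_spaced` and `sample_sort` and the 24-element input are made up for illustration.

```python
import bisect


def evenly_spaced(sorted_items, k):
    """Pick k evenly spaced elements that split sorted_items into k + 1 parts."""
    m = len(sorted_items)
    return [sorted_items[(i + 1) * m // (k + 1)] for i in range(k)]


def sample_sort(data, p):
    n = len(data)
    # Divide into p blocks of about n/p elements and sort each block.
    blocks = [sorted(data[i * n // p:(i + 1) * n // p]) for i in range(p)]
    # p - 1 evenly spaced elements per block give a sample of p(p - 1) elements.
    sample = sorted(s for block in blocks for s in evenly_spaced(block, p - 1))
    # p - 1 evenly spaced elements of the sorted sample become the splitters.
    splitters = evenly_spaced(sample, p - 1)
    # Route every element to the bucket whose range contains it.
    buckets = [[] for _ in range(p)]
    for block in blocks:
        for x in block:
            buckets[bisect.bisect_right(splitters, x)].append(x)
    return [sorted(b) for b in buckets], splitters


data = [22, 7, 13, 18, 2, 17, 1, 14,
        20, 6, 10, 24, 15, 9, 21, 3,
        16, 19, 23, 4, 11, 12, 5, 8]
buckets, splitters = sample_sort(data, p=3)
# splitters is [9, 17]; each of the three buckets ends up with 8 elements.
```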


Parallel Bucket and Sample Sort

An example of the execution of sample sort on an array with 24 elements on three processes.


Parallel Bucket and Sample Sort

• The splitter selection scheme can itself be parallelized.

• Each processor generates the p – 1 local splitters in parallel.

• All processors share their splitters using a single all-to-all broadcast operation.

• Each processor sorts the p(p – 1) elements it receives and selects p – 1 uniformly spaced splitters from them.
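A sequential simulation of the parallelized splitter selection, as a hedged sketch: each list in `sorted_blocks` stands for one processor's already sorted block, the all-to-all broadcast is modeled by giving every processor a copy of all local splitters, and the function names are hypothetical.

```python
def local_splitters(sorted_items, p):
    """p - 1 evenly spaced elements of a sorted list."""
    m = len(sorted_items)
    return [sorted_items[(i + 1) * m // p] for i in range(p - 1)]


def parallel_splitter_selection(sorted_blocks):
    p = len(sorted_blocks)
    # Each processor generates its p - 1 local splitters independently.
    local = [local_splitters(block, p) for block in sorted_blocks]
    # All-to-all broadcast: every processor ends up with all p(p - 1) splitters.
    gathered = [[s for splitters in local for s in splitters] for _ in range(p)]
    # Every processor sorts its copy and selects the same p - 1 global splitters.
    return [local_splitters(sorted(copy), p) for copy in gathered]


sorted_blocks = [[1, 2, 7, 13, 14, 17, 18, 22],
                 [3, 6, 9, 10, 15, 20, 21, 24],
                 [4, 5, 8, 11, 12, 16, 19, 23]]
per_processor = parallel_splitter_selection(sorted_blocks)
```

Because every processor sorts the same p(p - 1) values, all of them arrive at identical splitters without any further communication.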


Parallel Complexity Analysis


Numerical Integration Using Rectangles

Each region is calculated using an approximation given by rectangles.

[Figure: aligning the rectangles]
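A minimal sketch of rectangle-based integration. Since the figure is not available, the alignment used here is an assumption: each rectangle's height is taken at the midpoint of its strip. The function name `integrate_rectangles` is illustrative.

```python
def integrate_rectangles(f, a, b, n):
    """Approximate the integral of f over [a, b] with n rectangles.

    Assumption: each rectangle takes its height from the midpoint of its strip.
    """
    width = (b - a) / n
    return width * sum(f(a + (i + 0.5) * width) for i in range(n))


# Integral of x^2 over [0, 3]; the exact value is 9.
approx = integrate_rectangles(lambda x: x * x, 0.0, 3.0, 1000)
```

The strips are independent, so with p processes each one can sum its own n/p rectangles and the partial sums can be added at the end.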


Numerical Integration Using Trapezoidal Method

May not be better!
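A sketch of the composite trapezoidal method for comparison; `integrate_trapezoids` is an illustrative name. For smooth functions the leading error term of the trapezoidal method is roughly twice that of midpoint-aligned rectangles, which is one reason it may not be better.

```python
def integrate_trapezoids(f, a, b, n):
    """Approximate the integral of f over [a, b] with n trapezoids."""
    width = (b - a) / n
    # Interior points are shared by two trapezoids, so they count with weight 1;
    # the two end points count with weight 1/2.
    interior = sum(f(a + i * width) for i in range(1, n))
    return width * (0.5 * (f(a) + f(b)) + interior)


# Integral of x^2 over [0, 1] with 4 trapezoids; the exact value is 1/3.
approx = integrate_trapezoids(lambda x: x * x, 0.0, 1.0, 4)
```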


Adaptive Quadrature

The solution adapts to the shape of the curve. Use three areas, A, B, and C. The computation is terminated when the largest of A and B is sufficiently close to the sum of the remaining two areas.


Adaptive Quadrature with False Termination.

Some care might be needed in choosing when to terminate.

Some curves might cause us to terminate early, because two large regions are the same (i.e., C = 0).
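The sketch below illustrates both adaptive refinement and a false termination. The figure defining areas A, B, and C is not available here, so this code does not use that test; it substitutes a common alternative that compares one trapezoid over the interval with two trapezoids over its halves. The function `adaptive` and its `min_depth` guard are hypothetical names introduced for illustration.

```python
import math


def adaptive(f, a, b, tol=1e-6, min_depth=0, depth=0):
    """Adaptive trapezoidal quadrature; refines only where estimates disagree."""
    mid = (a + b) / 2
    whole = (b - a) * (f(a) + f(b)) / 2
    halves = (mid - a) * (f(a) + f(mid)) / 2 + (b - mid) * (f(mid) + f(b)) / 2
    # Substitute termination test: coarse and refined estimates agree.
    if depth >= min_depth and abs(whole - halves) <= tol:
        return halves
    return (adaptive(f, a, mid, tol / 2, min_depth, depth + 1)
            + adaptive(f, mid, b, tol / 2, min_depth, depth + 1))


def f(x):
    return math.sin(x) ** 2


# f is zero at 0, pi and 2*pi, so both estimates are about 0 and the
# test passes immediately, although the true integral is pi.
naive = adaptive(f, 0.0, 2 * math.pi)
# Forcing a few levels of subdivision first avoids this false termination.
guarded = adaptive(f, 0.0, 2 * math.pi, min_depth=3)
```

Requiring a minimum recursion depth is only one possible safeguard; it reduces the chance of a false termination but cannot rule one out for every curve.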