Topology-aware Job allocation on HPC system Xu Yang ID: A20280429 Email: [email protected]

Topology-aware Job allocation on HPC system

Jan 24, 2022

Page 1: Topology-aware Job allocation on HPC system

Topology-aware Job allocation on HPC system

Xu Yang

ID: A20280429, Email: [email protected]

Page 2: Topology-aware Job allocation on HPC system

Outline

I. Motivation

II. Solutions

(1) Dimensional Ordering

(2) Space Filling Curve Ordering

III. Evaluation Metrics

IV. Experimental Results

Page 3: Topology-aware Job allocation on HPC system

What is job allocation on HPC?

Why we care about job allocation

• Jobs submitted to an HPC system require different numbers of processors.

• An HPC system consists of hundreds or thousands of processors, organized in the following hierarchy:

Processors -> Nodes -> Midplanes (or blade cards) -> Racks (chassis) -> Cabinets

• HPC network resources are limited, especially bandwidth and connections (routing paths).

• Communication in HPC is expensive, more expensive than computation.

Page 4: Topology-aware Job allocation on HPC system

IBM system     Ratio     Cray system   Ratio

Blue Gene/L    0.375     XT3           8.77

Blue Gene/P    0.375     XT4           1.36

Blue Gene/Q    0.117     XT5           0.23

Table 1: Byte-to-flop ratios

For each flop on the node, the interconnect network is able to communicate fewer and fewer bytes with each system generation.

Topology-aware job scheduling and allocation will therefore be of great importance for HPC systems.

Currently, only 6% of the Top500 machines (primarily the IBM Blue Gene series) provide contiguous node allocation for their jobs.

Page 5: Topology-aware Job allocation on HPC system

Contiguous VS Non-Contiguous job allocation

Contiguous

• Pros: low communication cost and network contention

• Cons: low system utilization, long wait time, fragmentation

Non-Contiguous

• Pros: high system utilization, short wait time, no fragmentation

• Cons: high communication cost and network contention
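The trade-off above can be illustrated with a minimal sketch on a one-dimensional list of nodes. The function names and the boolean free-list representation are assumptions made for illustration, not part of the slides.

```python
def allocate_contiguous(free, n):
    """Return the first run of n adjacent free node indices, or None.

    `free` is a list of booleans along some node ordering (hypothetical
    representation chosen for this sketch).
    """
    run = 0
    for i, is_free in enumerate(free):
        run = run + 1 if is_free else 0
        if run == n:
            return list(range(i - n + 1, i + 1))
    return None  # enough nodes may be free, but not next to each other


def allocate_noncontiguous(free, n):
    """Return the first n free node indices regardless of adjacency, or None."""
    picked = [i for i, is_free in enumerate(free) if is_free][:n]
    return picked if len(picked) == n else None


if __name__ == "__main__":
    free = [True, False, True, True, False, True]
    print(allocate_contiguous(free, 3))     # fragmentation: no run of 3
    print(allocate_noncontiguous(free, 3))  # succeeds, but nodes are scattered
```

With four free nodes split by busy ones, the contiguous policy makes a 3-node job wait, while the non-contiguous policy starts it on scattered nodes at the price of longer communication paths.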

Page 6: Topology-aware Job allocation on HPC system

Processor Ordering: Sequence of allocation

3D torus topology with dimensions w, l, d. Each node has an index ind and coordinates (x, y, z).

1. Dimensional Ordering

ind = z*w*l + y*w + x

2. Space Filling Curve (Hilbert Curve) Ordering

ind = H(x, y, z) = (h(x), h(y), h(z))
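A minimal sketch of the dimensional ordering formula and its inverse, using the slide's symbols w and l. The function names are hypothetical.

```python
def dimensional_index(x, y, z, w, l):
    """Index of node (x, y, z): x varies fastest, then y, then z."""
    return z * w * l + y * w + x


def dimensional_coords(ind, w, l):
    """Inverse of dimensional_index: recover (x, y, z) from the index."""
    z, rem = divmod(ind, w * l)
    y, x = divmod(rem, w)
    return x, y, z


if __name__ == "__main__":
    w, l = 4, 5
    ind = dimensional_index(1, 2, 3, w, l)
    print(ind, dimensional_coords(ind, w, l))
```

Consecutive indices are not always neighbors under this ordering: with w = 4, index 3 is (3, 0, 0) and index 4 is (0, 1, 0), which differ in both x and y.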

Page 7: Topology-aware Job allocation on HPC system
Page 8: Topology-aware Job allocation on HPC system


Page 9: Topology-aware Job allocation on HPC system

I. Dimensional Ordering

colored by job and illustrates the planar and fragmented nature of the default selection algorithm.

The new node selection algorithm was designed to select nodes in a cubic geometry by using a node ordering mask, a static, total ordering of all compute nodes, constructed by taking the shortest path through the machine from node to node. The mask was then used to order free nodes on each scheduling cycle, assigning the first N nodes from this list to a job requiring N nodes. The reader is encouraged to view an animation[7] illustrating the construction of the node ordering mask by comparing the physical and wired views of the machine as nodes are added to the mask.

Ordering the list of free nodes according to this mask is computationally no more expensive than sorting them numerically, so there is no additional overhead in using this new algorithm.

Figure 5. Xt3dmon wired view showing planar nature of default node selection algorithm leading to non-contiguous node assignment within a job. Jobs are color coded. Service nodes are yellow.

To illustrate the node selection differences between the default and new algorithms on a set of real jobs, a time lapse animation[8] has been produced that shows a six hour window starting from an empty state on the machine. This animation contrasts the differences between the two algorithms on the same set of jobs and shows how larger jobs generally get contiguous nodes in a cubic geometry using the new algorithm while jobs using the old default node id ordering algorithm have a more planar and non-contiguous geometry. Figures 5 and 6 also help to illustrate these differences.

4.0 System Changes to Benefit Specific Jobs

The changes detailed in section 3 were made to help improve interconnect performance for all jobs. In this section system changes to accommodate applications that understand the machine topology and that can assign tasks to take advantage of node proximity will be reviewed. For these topology-aware codes each must be given a specific geometry or shape. In addition the codes must know the coordinates of the nodes that have been assigned so that they may assign tasks appropriately.

Figure 6. Xt3dmon wired view showing cubic nature of new node selection algorithm leading to contiguous node assignment within a job. Jobs are color coded. Service nodes are yellow.

Figure 7. Xt3dmon wired view showing an 8x8x8 node job allocation in red.

4.1 OpenAtom

OpenAtom is a quantum chemistry code that is highly communications bound and its performance is highly influenced by placement on a torus topology machine[9]. The goal of the researchers working with this code on BigBen is to minimize the communication volume of

Dimensional Ordering job allocation algorithm leading to non-contiguous node assignment within a job. Jobs are color coded.
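The excerpt on this slide describes ordering the free nodes by a static mask on each scheduling cycle and assigning the first N of them to a job requiring N nodes. A minimal sketch of that step follows. The names `build_rank` and `allocate` and the list-based mask are assumptions for illustration, and the same routine would work with any total ordering, dimensional or Hilbert.

```python
def build_rank(mask):
    """Map each node id to its position in the ordering mask."""
    return {node: pos for pos, node in enumerate(mask)}


def allocate(free_nodes, rank, n):
    """Return the n free nodes that come first in the mask, or None.

    Sorting by mask position costs the same as sorting node ids
    numerically, which is the point made in the excerpt.
    """
    if n > len(free_nodes):
        return None
    return sorted(free_nodes, key=rank.__getitem__)[:n]


if __name__ == "__main__":
    mask = [0, 1, 3, 2]  # hypothetical ordering of four node ids
    rank = build_rank(mask)
    print(allocate({2, 3, 0}, rank, 2))
```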

Page 10: Topology-aware Job allocation on HPC system

II. Hilbert Curve Ordering

Hilbert Curve on 2D Mesh
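As a minimal sketch, the widely used iterative mapping from 2D mesh coordinates to a Hilbert index can be written as below. It assumes an n x n mesh with n a power of two. The function names are hypothetical, and this is not necessarily the H used in the slides.

```python
def hilbert_index_2d(n, x, y):
    """Position of cell (x, y) along a Hilbert curve on an n x n mesh.

    Assumes n is a power of two and 0 <= x, y < n.
    """
    d = 0
    s = n // 2
    while s > 0:
        rx = 1 if x & s else 0
        ry = 1 if y & s else 0
        d += s * s * ((3 * rx) ^ ry)
        # Rotate/reflect so the sub-quadrant is traversed in canonical orientation.
        if ry == 0:
            if rx == 1:
                x, y = n - 1 - x, n - 1 - y
            x, y = y, x
        s //= 2
    return d


def hilbert_order_2d(n):
    """All cells of the n x n mesh listed in Hilbert-curve order."""
    cells = [(x, y) for x in range(n) for y in range(n)]
    return sorted(cells, key=lambda c: hilbert_index_2d(n, c[0], c[1]))


if __name__ == "__main__":
    for cell in hilbert_order_2d(4):
        print(cell)
```

Unlike dimensional ordering, every pair of consecutive cells in this order is exactly one hop apart on the mesh, which is why a job that takes a consecutive range of indices tends to get a compact block of nodes.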

Page 11: Topology-aware Job allocation on HPC system

II. Hilbert Curve Ordering

Hilbert Curve on 3D Mesh—2 x 2 x 2

Page 12: Topology-aware Job allocation on HPC system

II. Hilbert Curve Ordering

Hilbert Curve on 3D Mesh—4 x 4 x 4

Page 13: Topology-aware Job allocation on HPC system

II. Hilbert Curve Ordering

Hilbert Curve on 3D Mesh—8 x 8 x 8

Page 14: Topology-aware Job allocation on HPC system

Evaluation Metrics

Parameter    Geo-Metric

α1           Average Pairwise Distance (m1)

α2           Diameter (m2)

α3           Max Dimension (m3)

α4           Distance between Logic Neighbors (m4)

• αi is obtained from running benchmarks on Blue Gene/Q

• Penalty function p = ∑ αi · mi
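A minimal sketch of how the four metrics and the penalty function might be computed for one job on a 3D torus. The slides do not define the metrics precisely, so the definitions here are assumptions: distance is the hop count with wraparound, max dimension is the largest bounding-box extent (ignoring wraparound), and logical neighbors are consecutive ranks in the allocation list. The α values in the usage line are placeholders, not the benchmarked Blue Gene/Q weights.

```python
from itertools import combinations


def torus_distance(a, b, dims):
    """Hop count between two nodes on a torus with the given dimensions."""
    return sum(min(abs(p - q), d - abs(p - q)) for p, q, d in zip(a, b, dims))


def geo_metrics(nodes, dims):
    """Return (m1, m2, m3, m4) for a job allocated on `nodes` (at least 2 nodes)."""
    pair_d = [torus_distance(a, b, dims) for a, b in combinations(nodes, 2)]
    m1 = sum(pair_d) / len(pair_d)                       # average pairwise distance
    m2 = max(pair_d)                                     # diameter
    m3 = max(max(c) - min(c) + 1 for c in zip(*nodes))   # max bounding-box extent
    hops = [torus_distance(a, b, dims) for a, b in zip(nodes, nodes[1:])]
    m4 = sum(hops) / len(hops)                           # distance between logical neighbors
    return m1, m2, m3, m4


def penalty(metrics, alphas):
    """p = sum(alpha_i * m_i)."""
    return sum(a * m for a, m in zip(alphas, metrics))


if __name__ == "__main__":
    job = [(0, 0, 0), (1, 0, 0), (1, 1, 0), (0, 1, 0)]
    m = geo_metrics(job, dims=(4, 4, 4))
    print(m, penalty(m, alphas=(1.0, 1.0, 1.0, 1.0)))  # placeholder weights
```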

Page 15: Topology-aware Job allocation on HPC system

Communication Pattern

• Broadcast

Communication Pattern    Dominant Metric

All-to-All               Average Pairwise Distance

One-to-All               Diameter

• P2P

Communication Pattern    Dominant Metric

Nearest Neighbor         Distance Between Neighbors

Page 16: Topology-aware Job allocation on HPC system

Traces and Evaluation

I. SDSC Blue

• System: IBM SP at SDSC; 144 nodes; 1152 processors

• Duration: Apr 2000 to May 2000

• Jobs: 2,440

Page 17: Topology-aware Job allocation on HPC system
Page 18: Topology-aware Job allocation on HPC system

SDSC-BLUE Trace

[Figure: Average pairwise distance difference, HSFC vs DO. Horizontal axis: job id.]

• HSFC: Hilbert Space Filling Curve Ordering

• DO: Dimensional Ordering

Page 19: Topology-aware Job allocation on HPC system

[Figure: Max dimension difference, HSFC vs DO. Horizontal axis: job id.]

Page 20: Topology-aware Job allocation on HPC system

[Figure: Diameter difference, HSFC vs DO. Horizontal axis: job id.]

Page 21: Topology-aware Job allocation on HPC system

[Figure: Job runtime improvement (%), HSFC vs DO. Horizontal axis: job id.]

Page 22: Topology-aware Job allocation on HPC system

II. LLNL Thunder

• System: Linux Cluster (Thunder) at LLNL; 1024 nodes; 4096 processors

• Duration: Feb 2007 to Mar 2007

• Jobs: 2,662

Page 23: Topology-aware Job allocation on HPC system
Page 24: Topology-aware Job allocation on HPC system

LLNL Thunder

[Figure: Average pairwise distance difference, HSFC vs DO. Horizontal axis: job id.]

• HSFC: Hilbert Space Filling Curve Ordering

• DO: Dimensional Ordering

Page 25: Topology-aware Job allocation on HPC system

[Figure: Max dimension difference, HSFC vs DO. Horizontal axis: job id.]

Page 26: Topology-aware Job allocation on HPC system

[Figure: Diameter difference, HSFC vs DO. Horizontal axis: job id.]

Page 27: Topology-aware Job allocation on HPC system

[Figure: Job runtime improvement (%), HSFC vs DO. Horizontal axis: job id.]

Page 28: Topology-aware Job allocation on HPC system

Thank You

Q&A