Parallel and High-Performance Computing
5. Dynamic Load Balancing
Hans-Joachim Bungartz

Jan 19, 2022



5.1. Basics

Notions

• central topic in distributed environments (nets, clusters, loosely coupled parallel computers): distribution of the computational load among computers or processors

• difficulty: load situation is hard to predict or changes permanently

– example adaptive mesh refinement with partial differential equations: new grid points are created and thus change the current load situation

– example I/O interaction: time needed is hard to estimate

– example searching: can be successful earlier or later

• unequal load reduces parallel efficiency

• therefore: load distribution, load balancing, or scheduling

• one distinguishes

– global scheduling (where do which processes run?) and local scheduling (when does a processor deal with which processes?)

– static load balancing (a priori) and dynamic load balancing (during runtime)

• in the following: dynamic global scheduling

• important: no significant overhead of the measures taken (otherwise: bureaucracy wins)


Important Aspects for the Design of a Strategy

• What is the objective of the strategy?

– optimization of the system load (computing centre’s or system-oriented point of view) or optimization of the runtime of applications (users’ application-oriented point of view; different for exclusive (dedicated) or shared use)?

– only placement of new processes, or also migration of running processes?

• On which level of integration is load distribution realized?

– tasks to be done: record the load, select a strategy, apply the strategy, take the necessary measures

– Who does the job – the application program (a parallel database, for example), the runtime system (the runtime system of PVM, e.g.), or the operating system?

• What is the structure of the parallel application?

– Are there any restrictions concerning the mapping of processes to processors (frequently true in numerical simulations – there are location-based relations such as geometric neighbourhood)?

• Which units shall be placed or distributed?

– processes (coarse-grain) or threads (fine-grain), parts of programs, objects, or data (simulations)?


Classification of Strategies (1)

• with respect to the system model:

– origin of the underlying idea (physics, graph theory, economics)

– original target topology (nets, bus topologies, ... )

– underlying data topology (grids, trees, sets, ...)

• with respect to distribution mechanisms:

– handing over of load only between neighbours, or also distribution over large distances?

– just placement of new processes or real migration?

• with respect to the information flow:

– To whom does a processor communicate its load situation?

– From where does a processor get load-related information?


Classification of Strategies (2)

• with respect to coordination:

– central or decentralized decision on the actions to be started?

– How are decisions taken (autonomous, cooperative, or competitive)?

– Who are the participants of arrangements (neighbours, all)?

• with respect to the underlying algorithms:

– static or dynamic process of decision?

– Who takes the initiative (the idle node, the overloaded node, some master node, the clock)?

– fixed, adaptively adjustable, or even smart strategy?

– Do cost arguments play an important part?

– Are there any safety mechanisms against excesses (load distribution dominating the runtime and ruining efficiency)?


Load Models

• For recording or estimating the load, we need reliable load models.

• Load models are based upon load indices (quantitative measures for the load of providers of computing time (processors)):

– simple and composite load indices (one or more characteristic quantities)

– can refer to different functional units (CPU, bus, memory)

– snapshot quantities (describe the situation at one point in time) or integrated or averaged quantities

– weightings may be fixed a priori or dynamically adjustable

– frequent use of stochastic quantities to take into account external influences

• properties of a good index:

– precisely reflects the target quantity at present

– allows for accurate predictions concerning the future

– smoothing behaviour (in order to compensate for peaks)

– based upon some simple formula, easy to compute

• example: the UNIX load average (xload): provides the average number of processes in the CPU queue
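The averaging and smoothing properties above can be sketched with a tiny exponentially weighted load index; this is a hypothetical illustration in the spirit of the UNIX load average – the function name and smoothing factor are not from the slides.

```python
# A minimal smoothed load index: an exponential moving average over
# snapshots of the CPU queue length. A small alpha damps load peaks
# (smoothing behaviour); alpha = 1 reduces to the latest snapshot.

def smoothed_load(samples, alpha=0.2):
    """Exponentially averaged load index over queue-length snapshots."""
    index = samples[0]
    for q in samples[1:]:
        index = alpha * q + (1 - alpha) * index
    return index

# A single spike barely moves the smoothed index:
print(smoothed_load([1, 1, 10, 1, 1]))
```

Note how the spike of 10 raises the index only to about 2.15 – exactly the peak-compensating behaviour a good index should have.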


Principles of Migration

• Which units migrate?

– parts of programs, processes, threads, or data?

• How big are the migrating units?

• Are migrations executed in a delayed way (only at certain points in time, for example) or immediately after the need has been detected?

• Are all units handed over together, or is there some lazy copying (passing of further units on demand only)?

• Where can units migrate to?

– to neighbouring nodes only, within a certain range, or to arbitrary nodes?

• in heterogeneous networks: Can migration only take place between nodes of the same type, or are there no restrictions?

– important in case of a limited functionality of certain nodes


5.2. Selected Examples of Load Distribution Strategies

Diffusion Model

• analogy to diffusion processes in physics (salt in water, colour in water, ice-cube in a drink)

• balancing of some (initially possibly heterogeneous) concentration

• grid-oriented, balancing only between a node i and its neighbours N(i)

• each pair of neighbours i, j records its local load difference and hands over a certain percentage of this difference

$$ l_i^{(t+1)} := l_i^{(t)} + \sum_{j \in N(i)} \alpha_{ij} \left( l_j^{(t)} - l_i^{(t)} \right), \quad 1 \le i \le n, \quad 0 < \alpha_{ij} < 1 $$

• the balancing can be

– Jacobi-like: all differences are computed at the beginning of balancing step t, and all local migrations are realized according to these differences

– Gauß-Seidel-like: after each migration (also within the t-th balancing step), the local differences are computed again

• iterative method!

• For orthogonal d-dimensional grid structures, it can be shown that the choice $\alpha_{ij} = \frac{1}{2d}$ for all i, j is optimal concerning the speed of load balancing.


Example for the Diffusion Model

• two-dimensional 4×4 grid (thus 16 processors)

• Jacobi-like diffusion

• hand over 25% of the load difference (round down, if necessary)

• consider the first two steps of the iteration

• average load is 10

• maximum deviation from the average is 22 at the beginning, then 7, then 6
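Since the slide's grid figure is not reproduced here, the following sketch shows one Jacobi-like diffusion step on a small made-up example; the initial loads and the graph are hypothetical, only the rule (hand over 25% of each difference, round down) is from the slide.

```python
# One synchronous (Jacobi-like) diffusion step on an arbitrary processor
# graph: every node exchanges floor(alpha * |difference|) load units with
# each neighbour, from the more loaded to the less loaded node.

def diffusion_step(load, neighbours, alpha=0.25):
    """load: integer loads per node; neighbours: adjacency list."""
    new = list(load)
    for i, adj in enumerate(neighbours):
        for j in adj:
            diff = load[j] - load[i]
            if diff > 0:                      # i receives from j
                new[i] += int(alpha * diff)   # round down
            elif diff < 0:                    # i hands over to j
                new[i] -= int(alpha * (-diff))
    return new

# Tiny example: two coupled processors.
print(diffusion_step([16, 0], [[1], [0]]))   # -> [12, 4]
```

Because both endpoints of an edge round down the same difference, the total load is conserved; repeating the step drives all loads toward the average, illustrating the iterative character noted above.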


Bidding

• analogy: mechanisms of price fixing in markets

• supply and demand regulate the load:

– The processor without enough load looks for additional work (sends the information that it has free capacities).

– The overloaded processor looks for support (sends the information that it wants to get rid of some work).

– The arriving answers are compared.

– If possible, a balancing is done; otherwise the processors communicate again (extended range of recipients, other load or capacity packets, and so on).

• The analysis of the bidding model is quite complex.


Balanced Allocation

• objective: bidding variant which is easier to analyze

• principle:

– If there is some local overload, select (at random) r nodes.

– Hand over load to that node among the r with the smallest load.

• Both quality and costs increase with increasing r.
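The principle above can be sketched in a few lines; the function name and the notion of "one unit of work" are illustrative, not from the slides.

```python
# Balanced-allocation sketch: for each unit of work, sample r nodes at
# random and place the unit on the least loaded of them. Increasing r
# improves the balance but raises the sampling cost.
import random

def place_unit(load, r, rng=random):
    """Place one unit on the least loaded of r randomly sampled nodes."""
    candidates = rng.sample(range(len(load)), r)
    target = min(candidates, key=lambda i: load[i])
    load[target] += 1
    return target

# Throw 1000 units on 1000 nodes; larger r flattens the load profile:
random.seed(0)
for r in (1, 2, 4):
    load = [0] * 1000
    for _ in range(1000):
        place_unit(load, r)
    print(f"r = {r}: maximum load = {max(load)}")
```

Already the step from r = 1 (pure random placement) to r = 2 reduces the maximum load drastically – the classical "power of two choices" effect.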


Broker System

• origin of the idea: brokers at the stock exchange

• designed and especially well-suited for hierarchical topologies (trees)

• principle:

– each processor has a broker (realized as a cooperating agent) with local (subtree) knowledge

– via an application server, tasks arrive at the local broker and are – depending on the available budget – processed locally in the subtree or handed over to the father node (recursion possible)

– on some level (at the latest in the root), some price-based decision and allocation are done

– a price has to be paid for using resources and for the broking itself (it is cheaper to stay in the subtree than to go to a remote broker)

• very flexible scheme for hierarchical or heterogeneous net topologies


Random Matching

• origin of the idea: graph theory

• principle:

– construct (at random) a matching in the topology graph of the net

– topology graph: nodes are processors, edges are direct connections between nodes

– matching: subset of the edges such that each node occurs at most once

– perfect load balancing along all edges of the matching

• iterative method, several steps are necessary

• matching must be found in parallel

– start with an empty set of edges in each node

– local selection (at random) of one incident edge in each node

– coordination with neighbouring nodes, solution of conflicts
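One such balancing step can be sketched as follows; note that the matching is constructed sequentially here for clarity, whereas the slides require a parallel construction with local conflict resolution.

```python
# Random-matching step: draw the edges in random order, keep an edge only
# if neither endpoint is matched yet (so each node occurs at most once),
# then balance the load perfectly along every matched edge.
import random

def matching_step(load, edges, rng=random):
    """One load balancing step along a random matching."""
    order = list(edges)
    rng.shuffle(order)
    matched = set()
    for i, j in order:
        if i in matched or j in matched:
            continue                          # would violate the matching
        matched.update((i, j))
        total = load[i] + load[j]
        load[i], load[j] = total // 2, total - total // 2
    return load

# Two disjoint edges are always both matched:
print(matching_step([10, 0, 6, 2], [(0, 1), (2, 3)]))   # -> [5, 5, 4, 4]
```

On general graphs the accepted edges vary from step to step, which is exactly why several iterations are necessary.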


Precalculation of the Load

• all that has been said before is based upon local information and local actions

• often expensive (since, from a global point of view, balancing steps that are not really helpful may occur)

• sometimes better:

– global determination of the load, at the beginning or at certain points in time, and global determination of a suitable load distribution

– migrations with less communication

• developed and used especially for hierarchical topologies (load recording and load balancing from the son to the father and vice versa)


5.3. Load Balancing with Space Filling Curves

Space Filling Curves

• an unconventional load balancing strategy, a bit more in detail

• origin of the idea: analysis and topology (“topological monsters”)

• nice example of a construct from pure mathematics that gained practical relevance only decades later

• definition of a space filling curve (SFC), for reasons of simplicity only in 2D:

– curve: image of a continuous mapping of the unit interval [0, 1] onto the unit square [0, 1]²

– space filling: the curve covers the whole unit square (the mapping is surjective) and, hence, covers an area greater than zero(!)

$$ f : [0,1] =: I \to Q := [0,1]^2, \quad f \text{ surjective and continuous} $$

• prominent representatives:

– Hilbert’s curve: 1891, the most famous space filling curve

– Peano’s curve: 1890, the oldest space filling curve

– Lebesgue’s curve: quadtree principle, the most important SFC for computer science

Page 16: 5) Dynamic Load Balancing Basics Selected Examples of Load ...

Basics

Selected Examples of . . .

Load Balancing with . . .

Page 16 of 30

Parallel and High-PerformanceComputing

5. Dynamic Load BalancingHans-Joachim Bungartz

Hilbert’s SFC

• the construction follows the geometric conception: if I can be mapped onto Q in the space filling sense, then each of the four congruent subintervals of I can be mapped to one of the four quadrants of Q in the space filling sense, too

• recursive application of this partitioning and allocation process preserving

– neighbourhood relations: neighbouring subintervals in I are mapped onto neighbouring subsquares of Q

– subset relations (inclusion): from I1 ⊆ I2 follows f(I1) ⊆ f(I2)

• limit case: Hilbert’s curve

– from the correspondence of nestings of intervals in I and nestings of squares in Q, we get pairs of points in I and of corresponding image points in Q

– of course, the iterative steps in this generation process are of practical relevance, not the limit case (the SFC) itself:

* start with a generator or Leitmotiv (defines the order in which the subsquares are “visited”)

* apply the generator in each subsquare (with appropriate similarity transformations)

* connect the open ends


Generation Processes with Hilbert’s Generator

• classical version of Hilbert:

• variant of Moore:

• modulo symmetry, these are the only two possibilities!


Some Remarks on the Injectivity

• all iterations, i.e. the longer and longer curves or their generating mappings from I, respectively, are injective, of course

• Hilbert’s curve itself, however, is not injective: there are image points with more than one corresponding original point in I (look at the defining correlated nesting processes in I and Q)

• this is necessary, since:

– Cantor 1878: between two arbitrary, but finite-dimensional smooth manifolds, there exists a bijective mapping (injective and surjective)

– Netto 1879: if the dimensionalities of two such manifolds are different, the corresponding bijection can never be continuous (and, hence, defines no SFC)


Peano’s SFC

• ancestor of all SFCs

• subdivision of I and Q into nine congruent subdomains

• definition of a leitmotiv which, again, defines the order of visit

• now, there are 273 different (modulo symmetry) possibilities to recursively apply the generator preserving neighbourhood and inclusion:

serpentine type (left and centre) and meander type (right)


Lebesgue’s SFC

• many applications in computer science

• compared with the SFCs studied so far, there are several differences:

– both Hilbert’s and Peano’s SFC are nowhere differentiable, whereas Lebesgue’s SFC is differentiable almost everywhere

– both Hilbert’s and Peano’s SFC are self-similar (if we apply the mapping to an arbitrary subinterval of I, the result is an SFC of the same type), Lebesgue’s SFC is not self-similar

• continuity and differentiability can be shown

• missing self-similarity: easy to prove and understand, see exercises!


Definition of Lebesgue’s SFC – the Cantor Set

• Cantor set: remove from I the central third, and go on removing the inner third from the remaining subintervals

• binary and ternary numbers:

$$ 0_3.x_1x_2x_3\ldots = \sum_{i=1}^{\infty} x_i \cdot 3^{-i}, \; x_i \in \{0,1,2\}, \qquad 0_2.x_1x_2x_3\ldots = \sum_{i=1}^{\infty} x_i \cdot 2^{-i}, \; x_i \in \{0,1\} $$

• the Cantor set is of Lebesgue measure zero and can be formally represented with the help of ternary numbers:

$$ C := \left\{\, 0_3.(2t_1)(2t_2)(2t_3)\ldots \; ; \; t_i \in \{0,1\} \,\right\} $$

• definition of the mapping on the Cantor set:

$$ f_l\bigl(0_3.(2t_1)(2t_2)(2t_3)\ldots\bigr) := \begin{pmatrix} 0_2.t_1t_3t_5\ldots \\ 0_2.t_2t_4t_6\ldots \end{pmatrix} $$

• definition of the mapping between the points of the Cantor set: linear interpolation


Generator of Lebesgue’s SFC

• Lebesgue’s SFC has a generator, too:

• this is just the lexicographic or Morton ordering, well-known from quadtrees and octrees
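The Morton ordering can be sketched as plain bit interleaving; the function name and the convention of which coordinate contributes the lower bit of each quaternary digit are assumptions for this illustration.

```python
# Morton (lexicographic) ordering behind Lebesgue's SFC: interleave the
# bits of the two coordinates into one quadtree key. Sorting cells by
# this key traverses them along the k-th iteration of Lebesgue's SFC.

def morton_key(x, y, k):
    """Interleave k bits of x and y (y contributing the lower bit of
    each pair), yielding the depth-k quadtree leaf address."""
    key = 0
    for bit in reversed(range(k)):
        key = (key << 2) | (((x >> bit) & 1) << 1) | ((y >> bit) & 1)
    return key

# The four children of the root are visited in the order
# (0,0), (0,1), (1,0), (1,1):
print([morton_key(x, y, 1) for x, y in [(0, 0), (0, 1), (1, 0), (1, 1)]])
# -> [0, 1, 2, 3]
```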


SFC – Applications in Computer Science

• sequentialization of multi- (i.e. especially high-) dimensional data

• example: search indices in data bases

– high-dimensional data (entry = dimension), but nevertheless essentially one-dimensional search indices (B-trees: 1D primary index) and serial concatenation

– the drawback of this 1D proceeding is obvious: find all male Germans with shoe size 58 (about 50% of the German population still has to be dealt with)

– complexity in case of a really high-dimensional index: $\prod_{i=1}^{d} p_i$ with hit ratio $p_i$ in dimension i

– ideal situation: locality-preserving (i.e. essentially continuous) sequentialization (multi-dimensionality is inherent, but a 1D index can be used)

• locality:

– data are strung sequentially like pearls

– neighbouring points in the unit interval have neighbouring images in the unit square

– the other way round (more important): exceptions due to the missing injectivity (there may be several separated original regions of some subdomain in Q; however, originals are restricted to some clusters, in general)

• widespread in the field of databases (cf. UB-trees) and of data mining


SFC – Applications in Numerical Simulation

• multi-particle or N -body problems:

– N bodies correlate via forces (gravitation, e.g.)

– examples: astrophysics, molecular dynamics

– N typically very large (10⁷, 10⁸ and more)

– models lead to a system of N ordinary differential equations with a potential as their right-hand side (summarizing the influence of the N − 1 other bodies)

– global couplings, but the influence decreases with increasing distance (this effect allows for simplifications)

– bodies may be spread over space in an irregular way, their positions may change

• adaptive grids for partial differential equations:

– N grid points are spread over the domain of discretization, typically with no regular structure

– generally, only loose couplings and stationary positions

– adaptivity: new points are created during the computations, others may be removed

• in both cases: dynamic load balancing is important and nontrivial


SFC for Load Distribution

• idea:

1. assign points in space to points on some iteration of an SFC

2. use the linear order of the respective original points of the SFC on I

3. simple partitioning (assign points to processors) based on this sequential order

• two techniques for the first step:

– change continuous coordinates (x, y) in Q into binary or quaternary codes of length k:

$$ \begin{pmatrix} x \\ y \end{pmatrix} \mapsto \begin{pmatrix} 0_2.x_1x_2x_3\ldots x_k \\ 0_2.y_1y_2y_3\ldots y_k \end{pmatrix} \mapsto 0_4.w_1w_2w_3\ldots w_k $$

provides quadtree leaf addresses of depth k or an ordering on the k-th iteration of Lebesgue’s SFC

– again, start from the binary representation of length k and determine recursively the position on the k-th iteration of Hilbert’s SFC
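The second technique can be sketched with the classic bitwise formulation for Hilbert indices; note that the concrete orientation of the generator is a convention of this sketch, not fixed by the slides.

```python
# Map a grid cell (x, y) on a 2^k x 2^k grid to its position along the
# k-th iteration of Hilbert's SFC, working down the binary representation
# quadrant by quadrant and rotating the sub-generator as needed.

def hilbert_index(k, x, y):
    """Position of cell (x, y) on the k-th Hilbert iteration."""
    d = 0
    s = 2 ** (k - 1)
    while s > 0:
        rx = 1 if x & s else 0
        ry = 1 if y & s else 0
        d += s * s * ((3 * rx) ^ ry)
        # rotate the quadrant so the sub-generator is oriented correctly
        if ry == 0:
            if rx == 1:
                x = s - 1 - x
                y = s - 1 - y
            x, y = y, x
        s //= 2
    return d

# The four quadrants of the first iteration in visiting order:
print([hilbert_index(1, x, y) for x, y in [(0, 0), (0, 1), (1, 1), (1, 0)]])
# -> [0, 1, 2, 3]
```

Unlike the Morton key, consecutive Hilbert positions always belong to neighbouring grid cells, which is why Hilbert's SFC has the better locality properties.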


SFC for Load Distribution (cont’d)

• in both cases: keys already provide positions on I

• now to be done: (parallel) sorting of the keys (the main computational task of the algorithm)

– may be costly at the beginning

– however, later: already almost sorted inputs (only small motions, only a few new grid points created per iterative step of the PDE solver)

• finally: update the partitioning (weighted or unweighted), step of migration
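The final (weighted) partitioning step can be sketched as a greedy cut of the sorted key sequence; the concrete cut rule below is an assumption for this illustration, not prescribed by the slides.

```python
# Partition SFC-sorted (key, weight) pairs into p contiguous parts of
# roughly equal total weight. Greedy rule: close the current part once
# the cumulative weight reaches the next multiple of total_weight / p.

def partition(items, p):
    """Split (key, weight) pairs into p contiguous parts along the SFC."""
    items = sorted(items)                    # sort by SFC key
    total = sum(w for _, w in items)
    ideal = total / p
    parts, current, acc = [], [], 0.0
    for key, w in items:
        current.append(key)
        acc += w
        if acc >= ideal * (len(parts) + 1) and len(parts) < p - 1:
            parts.append(current)
            current = []
    parts.append(current)
    return parts

# Four unit-weight points split between two processors:
print(partition([(3, 1), (0, 1), (2, 1), (1, 1)], 2))   # -> [[0, 1], [2, 3]]
```

Because the parts are contiguous along the curve, each processor receives a compact region of the domain, and updating the partition after a balancing step only moves keys across the cut points.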


Quality Considerations

• locality:

– continuity guarantees that originals that are close together will be mapped to close image points (for self-similar SFCs, we even have some Lipschitz continuity)

– more important would be a “continuity of the inverse mapping”; but of course, this is not possible due to the missing injectivity

– numerous theoretical considerations exist

– at least: originals are clustered again (a few clusters only)

– Hilbert’s SFC has the best properties

• load distribution:

– excellent parallelization properties, almost perfect balancing

– communication costs are comparably small

– good efficiency of load distribution already for small problem sizes


Costs

• communication:

– more complicated (and, hence, longer) subdomain boundaries due to the partitioning than we get with successive coordinate bisection in kd-trees (halve the load by a cut in y-direction, then halve the load in both parts by a cut in x-direction, and so on)

– communication takes place along subdomain boundaries

– overall, a slightly higher communication than with bisection

• load distribution:

– small costs even for large numbers p of processors (just one sorting per step, in contrast to the log p sorting operations with coordinate bisection)


Relations to Fractals

• the notion of self-similarity shows close relations to fractals, whose definitions are also based upon recursively applied similarity transformations

• two examples: Koch’s snow-flake (left) and Sierpinski’s triangle (right)

• both have a non-integer fractal dimension:

$$ d_K \approx 1.2619, \qquad d_S \approx 1.585 $$

• SFCs, in contrast to that, fill areas or volumes and, hence, have an integer fractal dimension!


The Fractal Dimension

• start from the generator and the similarity transformations applied to it

• parameters:

– n: number of smaller versions of the generator in the next step of iteration

– r: 0 < r < 1, scaling factor by which the generator is reduced in each step

• definition:

$$ d := \frac{\log(n)}{\log(r^{-1})} $$

• for “real” curves and surfaces: the same as the conventional (topological) dimension

• examples:

– Koch’s snow-flake: n = 4, r = 1/3

– Sierpinski’s triangle: n = 3, r = 1/2

– Hilbert’s curve: n = 4, r = 1/2