Page 1: CS528 Intro Param Ishan and Task Scheduling (Part I)

CS528

Intro Param Ishan and

Task Scheduling (Part I)

A Sahu

Dept of CSE, IIT Guwahati


Page 2: CS528 Intro Param Ishan and Task Scheduling (Part I)

Outline

• Scheduling Concepts

• Independent Tasks, Dependent Tasks

Page 3: CS528 Intro Param Ishan and Task Scheduling (Part I)


Page 4: CS528 Intro Param Ishan and Task Scheduling (Part I)

PARAM ISHAN

Page 5: CS528 Intro Param Ishan and Task Scheduling (Part I)

System Overview

• 250 Teraflops peak computing facility
• Total of 162 compute nodes
• 2 master nodes
• 4 login nodes
• Mellanox FDR (56 Gbps) 324-port chassis switch as primary high-speed interconnect
• 300 TB storage with 15 GB/s write throughput based on the Lustre parallel file system

Page 6: CS528 Intro Param Ishan and Task Scheduling (Part I)

Schematic Diagram

Page 7: CS528 Intro Param Ishan and Task Scheduling (Part I)

Compute Nodes

Compute Nodes with GPU
• 16 nodes, 384 CPU cores
• 2 x Intel Xeon E5-2680 v3, 12-core, 2.5 GHz processors per node
• 64 GB of physical memory per node
• GPU accelerator: 2 x NVIDIA Tesla K40 per node
• Compute power of 60 Tflops

Compute Nodes with Xeon Phi
• 16 nodes, 384 CPU cores
• 2 x Intel Xeon E5-2680 v3, 12-core, 2.5 GHz processors per node
• 64 GB of physical memory per node
• MIC accelerator: 2 x Intel Xeon Phi 7120 per node
• Compute power of 47.36 Tflops

Compute Nodes without any Accelerator
• 126 nodes, 3024 cores
• 2 x Intel Xeon E5-2680 v3, 12-core, 2.5 GHz processors per node
• 64 GB of physical memory per node
• Compute power of 121 Tflops

High Memory Compute Nodes without any Accelerator
• 4 nodes, 96 cores
• 2 x Intel Xeon E5-2680 v3, 12-core, 2.5 GHz processors per node
• 512 GB of physical memory per node
• Compute power of 3.8 Tflops

Page 8: CS528 Intro Param Ishan and Task Scheduling (Part I)

Software Stack

HPC Programming Tools
• Application Libraries: Ferret / GrADS / ParaView
• Development Tools: Intel Cluster Studio 2016; GNU (GCC 4.4.7 & 5.2)
• Driver/System Libraries: Intel MPSS 3.6.1; CUDA 7.5; Mellanox OFED 2.4-1.0.4
• Resource Management / Job Scheduling: SLURM 15.08.6

Middleware Applications and Management
• File System: NFS; Local FS (ext3, ext4, XFS); Lustre 2.5
• Provisioning: Bright Cluster Manager 7.2
• Cluster Monitoring: Bright Cluster Manager 7.2
• Remote Power Mgmt: RMM4
• Remote Console Mgmt: RMM4
• Operating System: CentOS 6.6

Page 9: CS528 Intro Param Ishan and Task Scheduling (Part I)

HPC Scheduling: Large Scale

• When
  – the number of nodes is 162 and the number of cores in the system is 162 × 24 = 3888, and
  – the number of jobs and users is around 1000,
  manual scheduling and Gantt-chart depiction is not possible.
• SLURM (Simple Linux Utility for Resource Management) uses an SQL database to store the Gantt chart and scheduling information.
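
As an illustration of how work reaches SLURM on such a system, here is a minimal batch-job sketch. It is written in Python only so that the #SBATCH directives remain ordinary comments; the resource requests (nodes, tasks, time limit, output file) are placeholder values, not Param Ishan's actual defaults. It would be submitted with `sbatch job.py` and monitored with `squeue`.

```python
#!/usr/bin/env python3
#SBATCH --job-name=cs528-demo     # name shown by squeue
#SBATCH --nodes=1                 # placeholder resource request
#SBATCH --ntasks=24               # e.g. one task per core of a 24-core node
#SBATCH --time=00:10:00           # wall-clock limit
#SBATCH --output=demo-%j.out      # %j expands to the SLURM job id

# SLURM parses the #SBATCH lines above at submission time and later runs this
# script on the first allocated node; the body just reports where it landed.
import os
import socket

print("job id :", os.environ.get("SLURM_JOB_ID"))
print("node   :", socket.gethostname())
print("ntasks :", os.environ.get("SLURM_NTASKS"))
```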

Page 10: CS528 Intro Param Ishan and Task Scheduling (Part I)

File Systems

• Home
  – 100 TB Lustre-based storage
  – 30 GB default quota
• Scratch
  – 10 GB/sec write throughput
  – Users are recommended to use this file system during execution of their jobs
  – They must transfer their data back to the home file system afterwards
• Archive
  – Policy-based movement of home file-system data to the archive file system

Page 11: CS528 Intro Param Ishan and Task Scheduling (Part I)

Access to Cluster

• ssh to param-ishan.iitg.ernet.in
• Users are given one of the 4 login nodes in round-robin fashion
• For GPU jobs, ssh to the GPU login node; for Intel Xeon Phi/MIC jobs, ssh to the MIC login node

Login nodes: cpu-login1, cpu-login2, gpu-login, mic-login

Page 12: CS528 Intro Param Ishan and Task Scheduling (Part I)


Google “Scheduling Algorithm Brucker pdf” to get a PDF copy of the book.

Page 13: CS528 Intro Param Ishan and Task Scheduling (Part I)

• Find time slots in which activities (or jobs) should be processed under given constraints.

• Constraints
  – Resource constraints
  – Precedence constraints between activities

• A quite general scheduling problem is the Resource Constrained Project Scheduling Problem (RCPSP).


Page 14: CS528 Intro Param Ishan and Task Scheduling (Part I)

• We have
  – activities j = 1, ..., n with processing times pj,
  – resources k = 1, ..., r, where a constant amount of Rk units of resource k is available at any time,
  – during processing, activity j occupies rjk units of resource k for k = 1, ..., r,
  – precedence constraints i → j between some activities i, j, meaning that activity j cannot start before i is finished.


Page 15: CS528 Intro Param Ishan and Task Scheduling (Part I)

• Objective: Determine starting times Sj for all activities j in such a way that
  – at each time t the total demand for resource k is not greater than the availability Rk, for k = 1, ..., r, and
  – the given precedence constraints are fulfilled, i.e. Si + pi ≤ Sj if i → j.


Page 16: CS528 Intro Param Ishan and Task Scheduling (Part I)

• Some objective function f(C1, ..., Cn) is minimized, where Cj = Sj + pj is the completion time of activity j.

• The fact that activity j starts at time Sj and finishes at time Sj + pj implies that the activities are not preempted.

• We may relax this condition by allowing preemption (activity splitting).
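
Collecting pages 13–16, the RCPSP can be stated compactly as follows (here A(t) denotes the set of activities in progress at time t, i.e. those j with Sj ≤ t < Sj + pj):

```latex
\begin{align*}
\text{minimize}\quad   & f(C_1,\dots,C_n), \qquad C_j = S_j + p_j,\\
\text{subject to}\quad & \sum_{j \in A(t)} r_{jk} \le R_k
      && \text{for all times } t \text{ and resources } k = 1,\dots,r,\\
                       & S_i + p_i \le S_j && \text{whenever } i \to j,\\
                       & S_j \ge 0 && \text{for } j = 1,\dots,n.
\end{align*}
```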


Page 17: CS528 Intro Param Ishan and Task Scheduling (Part I)

• Consider a project with n = 4 activities and r = 2 resources with capacities R1 = 5 and R2 = 7,
• a precedence relation 2 → 3, and the following data:

  i     1   2   3   4
  pi    4   3   5   8
  ri1   2   1   2   2
  ri2   3   5   3   4

[Figure: Gantt chart of a corresponding schedule with minimal makespan, showing the resource profiles for R1 = 5 and R2 = 7 over time with activities 1–4.]
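
A small sketch that checks a schedule for this instance: it verifies the precedence constraint 2 → 3 and the capacities R1 = 5 and R2 = 7 at every unit time step, and reports the makespan. The start times below give makespan 12 and are inferred rather than read off the original figure; note that activities 2 and 4 can never overlap (5 + 4 > 7), so no schedule can finish before p2 + p4 = 11.

```python
# Feasibility check for the 4-activity RCPSP instance above:
# p = (4, 3, 5, 8), demands r_j1 / r_j2 as in the table, R1 = 5, R2 = 7,
# and the precedence constraint 2 -> 3.

p  = {1: 4, 2: 3, 3: 5, 4: 8}
r1 = {1: 2, 2: 1, 3: 2, 4: 2}
r2 = {1: 3, 2: 5, 3: 3, 4: 4}
R1, R2 = 5, 7
prec = [(2, 3)]                      # activity 3 cannot start before 2 finishes

S = {2: 0, 3: 3, 4: 3, 1: 8}         # candidate start times (inferred, makespan 12)
C = {j: S[j] + p[j] for j in p}      # completion times C_j = S_j + p_j

assert all(C[i] <= S[j] for i, j in prec), "precedence violated"
for t in range(max(C.values())):     # resource capacities at every unit time step
    active = [j for j in p if S[j] <= t < C[j]]
    assert sum(r1[j] for j in active) <= R1, f"R1 exceeded at t={t}"
    assert sum(r2[j] for j in active) <= R2, f"R2 exceeded at t={t}"
print("feasible, makespan =", max(C.values()))   # -> feasible, makespan = 12
```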


Page 18: CS528 Intro Param Ishan and Task Scheduling (Part I)

• Production scheduling

• Robotic cell scheduling

• Computer Processor scheduling

• Timetabling

• Personnel scheduling

• Railway scheduling

• Air traffic control, etc.


Page 19: CS528 Intro Param Ishan and Task Scheduling (Part I)

• Most machine scheduling problems are special cases of the RCPSP.

– Single machine problems,

• Online Problem: FCFS, SJF, SRF, RR…

– Parallel machine problems, and

– Shop scheduling problems, etc.


Page 20: CS528 Intro Param Ishan and Task Scheduling (Part I)

• We have n jobs j = 1, ..., n to be processed on a single machine. Additionally, precedence constraints between the jobs may be given.

• This problem can be modeled by an RCPSP with r = 1, R1 = 1, and rj1 = 1 for all jobs j.


Page 21: CS528 Intro Param Ishan and Task Scheduling (Part I)

• P: We have jobs j as before and m identical machines M1, ..., Mm.

• The processing time for j is the same on each machine.

• One has to assign the jobs to the machines and to schedule them on the assigned machines.

• This problem corresponds to an RCPSP with r = 1, R1 = m, and rj1 = 1 for all jobs j.


Page 22: CS528 Intro Param Ishan and Task Scheduling (Part I)

[Figure: example of eight jobs (1–8) scheduled on three identical machines M1, M2, M3; Gantt chart over time 0–9.]
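
In the spirit of the example above, here is a minimal sketch of Graham's list scheduling for P || Cmax: take the jobs in list order and always assign the next one to the currently least-loaded machine. The eight processing times below are made up for illustration and are not the ones behind the figure.

```python
# Greedy list scheduling of independent jobs on m identical machines (P || Cmax):
# repeatedly give the next job in the list to the machine with the smallest load.

import heapq

def list_schedule(processing_times, m):
    """Return (makespan, assignment) where assignment maps machine -> job indices."""
    loads = [(0, k) for k in range(m)]          # (current load, machine index)
    heapq.heapify(loads)
    assignment = {k: [] for k in range(m)}
    for j, p in enumerate(processing_times):
        load, k = heapq.heappop(loads)          # least-loaded machine so far
        assignment[k].append(j)
        heapq.heappush(loads, (load + p, k))
    return max(load for load, _ in loads), assignment

# Eight jobs on three machines (illustrative processing times):
print(list_schedule([3, 2, 4, 5, 1, 3, 2, 4], m=3))
```

List scheduling is a simple greedy heuristic; Graham showed its makespan is never worse than 2 − 1/m times the optimum.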

Page 23: CS528 Intro Param Ishan and Task Scheduling (Part I)

• Q: The machines are called uniform if pjk = pj / sk, where sk is the speed of machine Mk.

• R: For unrelated machines the processing time pjk depends on the machine Mk on which j is processed.

• MPM: In a problem with multi-purpose machines, a set of machines µj is associated with each job j, indicating that j can be processed on one machine in µj only.


Page 24: CS528 Intro Param Ishan and Task Scheduling (Part I)

Parallel Machines

P: Identical machines

  Ti    P1   P2   P3   P4
  T1    10   10   10   10
  T2    12   12   12   12
  T3    16   16   16   16
  T4    20   20   20   20

Q: Uniform machines, with speed differences (S1 = 1, S2 = 2/3, S3 = 1/2, S4 = 2/5)

  Ti    P1   P2   P3   P4
  T1    10   15   20   25
  T2    12   18   24   30
  T3    16   24   32   40
  T4    20   30   40   50

R: Unrelated (heterogeneous) machines

  Ti    P1   P2   P3   P4
  T1    10    8   12    2
  T2    12   28   25   13
  T3    16    4   32   14
  T4    20   38   42   22
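
The Q table above is simply pjk = pj / sk applied to the base times on P1 and the listed speeds; a quick sketch that reproduces it:

```python
# Reproduce the uniform-machine (Q) table above from p_jk = p_j / s_k,
# using the base times p_j (the P1 column) and the machine speeds from the slide.

from fractions import Fraction

p = {"T1": 10, "T2": 12, "T3": 16, "T4": 20}           # times on P1 (speed 1)
s = {"P1": Fraction(1), "P2": Fraction(2, 3),
     "P3": Fraction(1, 2), "P4": Fraction(2, 5)}       # machine speeds s_k

for task, pj in p.items():
    row = [int(pj / sk) for sk in s.values()]          # p_jk = p_j / s_k
    print(task, row)                                   # e.g. T1 [10, 15, 20, 25]
```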

Page 25: CS528 Intro Param Ishan and Task Scheduling (Part I)

Classes of scheduling problems can be specified in terms of the three-field classification α | β | γ, where

• α specifies the machine environment,

• β specifies the job characteristics, and

• γ describes the objective function(s).


Page 26: CS528 Intro Param Ishan and Task Scheduling (Part I)

If the number of machines is fixed to m, we write Pm, Qm, Rm, MPMm, Jm, Fm, Om.

  Symbol   Meaning
  1        Single machine
  P        Parallel identical machines
  Q        Uniform machines
  R        Unrelated machines
  MPM      Multi-purpose machines
  J        Job shop
  F        Flow shop


Page 27: CS528 Intro Param Ishan and Task Scheduling (Part I)

  Symbol                          Meaning
  pmtn                            preemption
  rj                              release times
  dj                              deadlines
  pj = 1, pj = p, or pj ∈ {1,2}   restricted processing times
  prec                            arbitrary precedence constraints
  intree (outtree)                intree (or outtree) precedence
  chains                          chain precedence
  series-parallel                 a series-parallel precedence graph


Page 28: CS528 Intro Param Ishan and Task Scheduling (Part I)

Two types of objective functions are most common:

• bottleneck objective functions max { fj(Cj) | j = 1, ..., n }, and

• sum objective functions Σ fj(Cj) = f1(C1) + f2(C2) + ... + fn(Cn),

where Cj is the completion time of task j.


Page 29: CS528 Intro Param Ishan and Task Scheduling (Part I)

• Cmax and Lmax symbolize the bottleneck objectives:
  – Cmax: objective function with fj(Cj) = Cj (makespan)
  – Lmax: objective function with fj(Cj) = Cj − dj (maximum lateness)

• Common sum objective functions are:
  – Σ Cj (mean flow-time)
  – Σ ωj Cj (weighted flow-time)


Page 30: CS528 Intro Param Ishan and Task Scheduling (Part I)

• Σ Uj (number of late jobs) and Σ ωj Uj (weighted number of late jobs), where Uj = 1 if Cj > dj and Uj = 0 otherwise.

• Σ Tj (sum of tardiness) and Σ ωj Tj (weighted sum of tardiness), where the tardiness of job j is given by Tj = max { 0, Cj − dj }.
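
In symbols, with dj the due date of job j, the objectives from the last three pages are:

```latex
\begin{align*}
C_{\max} &= \max_{j} C_j \quad \text{(makespan)},\\
L_{\max} &= \max_{j} (C_j - d_j) \quad \text{(maximum lateness)},\\
U_j &= \begin{cases} 1 & \text{if } C_j > d_j,\\ 0 & \text{otherwise,} \end{cases}
       \qquad \text{so } \textstyle\sum_j U_j \text{ counts the late jobs},\\
T_j &= \max\{0,\, C_j - d_j\} \quad \text{(tardiness)},
\end{align*}
```

and the weighted variants simply attach ωj to each term of the corresponding sum.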


Page 31: CS528 Intro Param Ishan and Task Scheduling (Part I)

• 1 | prec; pj = 1 | Σ ωj Cj

• P2 | | Cmax

• P | pj = 1; rj | Σ ωj Uj

• R2 | chains; pmtn | Cmax

• R | n = 3 | Cmax

• P | pij = 1; outtree; rj | Σ Cj

• Q | pj = 1 | Σ Tj


Page 32: CS528 Intro Param Ishan and Task Scheduling (Part I)

• A problem is called polynomially solvable if it can be solved by a polynomial algorithm.

Example

1 | | Σ ωj Cj can be solved by scheduling the jobs in order of non-increasing ωj / pj values.

Complexity: O(n log n)
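
A minimal sketch of this rule (weighted shortest processing time, also known as Smith's rule); the job data below is made up for illustration:

```python
# Smith's rule for 1 | | sum w_j C_j: sequence jobs by non-increasing w_j / p_j.
# The job list (name, p_j, w_j) is illustrative, not from the slides.

def wspt(jobs):
    """Return the job order and the resulting weighted sum of completion times."""
    order = sorted(jobs, key=lambda j: j[2] / j[1], reverse=True)  # w_j/p_j, largest first
    t, total = 0, 0
    for name, p, w in order:
        t += p            # completion time C_j of this job
        total += w * t    # accumulate w_j * C_j
    return [name for name, _, _ in order], total

jobs = [("J1", 4, 2), ("J2", 3, 5), ("J3", 5, 3), ("J4", 8, 4)]
print(wspt(jobs))         # sorting dominates the cost: O(n log n)
```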


Page 33: CS528 Intro Param Ishan and Task Scheduling (Part I)

Example

1 | | Σ Cj can be solved by scheduling the jobs in order of non-increasing 1/pj values, i.e. shortest job first (SJF).

Ci = Qi + Pi : waiting time + processing time

(SJF is optimal)

Complexity: O(n log n)
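
A quick numerical check with toy data: SJF (the ωj = 1 special case of the previous rule) matches the best order found by brute force over all permutations.

```python
# Compare the sum of completion times under SJF against the best of all orders
# (toy processing times; brute force is only for this tiny illustration).
from itertools import permutations

p = [4, 3, 5, 8]   # processing times of jobs 0..3

def total_completion(order):
    t = total = 0
    for j in order:
        t += p[j]          # C_j = waiting time + processing time
        total += t
    return total

sjf  = sorted(range(len(p)), key=lambda j: p[j])                   # shortest first
best = min(permutations(range(len(p))), key=total_completion)
print(total_completion(sjf), "==", total_completion(best))         # SJF is optimal
```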
