Resource Manager for Grid with global job queue and with planning based on local schedules V.N.Kovalenko, E.I.Kovalenko, D.A.Koryagin, E.Z.Ljubimskii,

Dec 21, 2015

Transcript
Page 1

Resource Manager for Grid with global job queue and with planning based on local schedules

V.N.Kovalenko, E.I.Kovalenko, D.A.Koryagin, E.Z.Ljubimskii, A.V.Orlov, E.V.Huhlaev

{kvn,kei,koryagin,ljubimsk,ao,huh}@keldysh.ru

Keldysh Institute of Applied Mathematics

Russian Academy of Sciences

Page 2

[Diagram with two panels: "Job submitting in Globus system" and "Job submitting by means of Broker".]

Page 3

Resource Brokers

- GRID Resource Broker (GRB) – HPC lab, University of Lecce, Italy and CACR, California Institute of Technology. http://sara.unile.It/grb/
- EZ-Grid - Department of Computer Science, University of Houston. http://www.cs.uh.edu/~ezgrid/
- MetaDispatcher – Keldysh Institute of Applied Mathematics, Moscow

Page 4

[Diagram with two panels: "Job submitting in Globus system" and "Job submitting by means of Broker".]

Page 5

Architecture of MetaDispatcher

[Diagram: architecture of MetaDispatcher. Labels recoverable from the figure: Client; Metadispatcher; Target; Client utilities (submit, status, cancel, clean, get-output); Proxy; Gatekeeper; Jobmanager-meta; Command interpreter (submit, status, cancel, get-output, clean); JOBS SPOOL; Deleg. Proxy; Loc. copies: executable, stdin; Buffered stdout, stderr; Scheduler (reacts to events); Job monitor; Request (RSL); status; interception; Start (jobid); Gatekeeper; GRAM Job manager; Deleg.-2 Proxy; Cancel, tail, Cleanup; MDS (GRIS, GIIS); Statics; Dynamics; "Proving Of Job".]

Page 6

Problem of scheduling

The scheduling problem is solved over two sets: 1) the set of jobs and 2) the set of computing elements.

Scheduling results:

- The dispatch time for each job
- The place where the job should be directed and executed

Page 7

Two management levels, local and global, each with its own objects: job, queue, and management system (the Local Resource Monitor (LRM) at the local level and the MetaDispatcher at the global level).

[Diagram: the global level, with the MetaDispatcher and the global queue of jobs; the local level, with the LRM, the local queue of jobs, and a config. file.]

Page 8

Question 1: In What Order Should the Global Jobs Be Served?

The order in which the scheduler serves the job queue should differ from FIFO.

The user should have management facilities for placing his job at any position in the global queue.

To achieve that:

- A limited budget is allocated to each user.
- Within the budget limits, the user prices his jobs.
- The function GP evaluates the global priority of the job:

GP = GP(price, required resources, run time)

[Diagram: a new job is inserted into the global queue of jobs.]
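The ordering rule on this slide can be illustrated with a priority queue. This is only a sketch: the slides do not give a formula for GP, so the one used here (price per requested CPU-hour) is an assumption, as are the names `Job`, `global_priority`, and `GlobalQueue`, and the reduction of "required resources" to a CPU count.

```python
import heapq
import itertools
from dataclasses import dataclass


@dataclass(frozen=True)
class Job:
    name: str
    user: str
    price: float     # amount the user assigns to the job from the budget
    cpus: int        # required resources, simplified to a CPU count
    run_time: float  # requested run time, in hours


def global_priority(job):
    """Hypothetical GP: price per requested CPU-hour (not from the slides)."""
    return job.price / (job.cpus * job.run_time)


class GlobalQueue:
    """Serves jobs by descending GP instead of FIFO."""

    def __init__(self, budgets):
        self._budgets = dict(budgets)      # remaining budget per user
        self._heap = []
        self._counter = itertools.count()  # tie-breaker: arrival order

    def submit(self, job):
        remaining = self._budgets.get(job.user, 0.0)
        if job.price > remaining:
            raise ValueError(f"{job.user} cannot afford price {job.price}")
        self._budgets[job.user] = remaining - job.price
        # heapq is a min-heap, so the priority is negated.
        entry = (-global_priority(job), next(self._counter), job)
        heapq.heappush(self._heap, entry)

    def pop(self):
        """Remove and return the job with the highest global priority."""
        return heapq.heappop(self._heap)[2]


queue = GlobalQueue({"alice": 100.0, "bob": 100.0})
queue.submit(Job("a1", "alice", price=10.0, cpus=2, run_time=5.0))   # GP = 1.0
queue.submit(Job("b1", "bob", price=40.0, cpus=4, run_time=2.0))     # GP = 5.0
queue.submit(Job("a2", "alice", price=30.0, cpus=1, run_time=10.0))  # GP = 3.0
order = [queue.pop().name for _ in range(3)]
print(order)
```

With these numbers the jobs leave the queue in the order b1, a2, a1, although they arrived as a1, b1, a2: a user moves a job forward by paying more for it, within the limits of the budget.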

Page 9

Question 2: When to Forward a Job to a Target Computing Element?

If the destination of a job is determined at the moment it enters the global queue, and the job is immediately routed to a local queue, it may be delayed there by the arrival of local jobs. At the same time, resources of other computing elements may become free and stay idle.

Conclusion: it is more reasonable to keep global jobs in the global queue as long as possible, ideally up to the moment of start.

[Diagram: a new job arriving at queues of jobs.]
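A small numeric illustration of this argument, under invented numbers: two computing elements, `ce_a` and `ce_b`, are expected to become free at hours 2 and 5 when the global job is submitted, but a local job then arrives at `ce_a` and pushes its real availability to hour 8. The function names and all values are assumptions made for the example.

```python
def early_binding(estimated_free, actual_free):
    """Route at submission time, using only the estimates known then."""
    chosen = min(estimated_free, key=estimated_free.get)
    # The job is already in the local queue, so it waits for that element.
    return chosen, actual_free[chosen]


def late_binding(actual_free):
    """Hold the job in the global queue; take the element that frees up first."""
    chosen = min(actual_free, key=actual_free.get)
    return chosen, actual_free[chosen]


# Expected release times (hours) at the moment the global job is submitted.
estimated_free = {"ce_a": 2.0, "ce_b": 5.0}
# What actually happens: a local job arrives at ce_a and delays it to hour 8.
actual_free = {"ce_a": 8.0, "ce_b": 5.0}

early = early_binding(estimated_free, actual_free)
late = late_binding(actual_free)
print("early binding:", early)
print("late binding: ", late)
```

In this example the early-routed job starts on `ce_a` at hour 8, while the job held in the global queue starts on `ce_b` at hour 5.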

Page 10

Question 3: To Which Computing Elements Should a Job Be Passed?

The scheduling model of a computing installation: a set of resources.

Resource description:

- Static attributes: OS type, CPU time, memory volume
- Dynamic attributes: free/busy, resource amount
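The resource description can be pictured as a record with static and dynamic fields, against which a job request is matched. The sketch below is an assumption-laden simplification: the field names, the `Request` type, and the matching rule are illustrative, only two of the static attributes are modeled, and the dynamic state is reduced to a free/busy flag.

```python
from dataclasses import dataclass


@dataclass
class Resource:
    # Static attributes: fixed properties of the resource.
    name: str
    os_type: str
    memory_mb: int
    # Dynamic attribute: changes as jobs start and finish.
    busy: bool = False


@dataclass
class Request:
    os_type: str
    memory_mb: int
    count: int  # number of resources the job needs


def matching_free(resources, request):
    """Return the free resources whose static attributes satisfy the request."""
    return [
        r for r in resources
        if not r.busy
        and r.os_type == request.os_type
        and r.memory_mb >= request.memory_mb
    ]


resources = [
    Resource("n1", "linux", 2048),
    Resource("n2", "linux", 512),
    Resource("n3", "linux", 4096, busy=True),
    Resource("n4", "solaris", 4096),
]
request = Request(os_type="linux", memory_mb=1024, count=1)
candidates = matching_free(resources, request)
satisfiable = len(candidates) >= request.count
print([r.name for r in candidates], satisfiable)
```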

Page 11

Resource Release Time

Busy resources have an additional attribute, the release time, estimated from the request of the running job. Knowing the release time, the scheduler can plan the future usage of a busy resource.

However, the scheduler must have a guarantee that the planned global job will really start and will not stay waiting in a local queue.

[Diagram: resource versus time chart showing running jobs.]
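A minimal sketch of planning with release times, assuming each resource is a single node and the release time is simply the start time plus the run time stated in the job's request. The names `Node`, `start_job`, and `earliest_start` are hypothetical.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Node:
    name: str
    release_time: Optional[float] = None  # None means the node is free now


def start_job(node, now, requested_run_time):
    """Mark a node busy; its release time is estimated from the job's request."""
    node.release_time = now + requested_run_time


def earliest_start(nodes, now, needed):
    """Earliest time at which `needed` nodes are available simultaneously."""
    if needed > len(nodes):
        raise ValueError("request exceeds the size of the installation")
    available_at = sorted(
        now if n.release_time is None else max(now, n.release_time)
        for n in nodes
    )
    # The job can start once the `needed`-th node (in release order) is free.
    return available_at[needed - 1]


nodes = [Node("n1"), Node("n2"), Node("n3")]
start_job(nodes[0], now=0.0, requested_run_time=4.0)
start_job(nodes[1], now=0.0, requested_run_time=9.0)
starts = [earliest_start(nodes, now=1.0, needed=k) for k in (1, 2, 3)]
print(starts)
```

The estimate is only as good as the requested run times, and it says nothing about local jobs that may claim a node when it is released, which is why the slide asks for a guarantee.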

Page 12

Question 4: How Should the Interaction of the Global Scheduler and Local Resource Monitor Be Organized?

If two jobs, local and global, ask for free resources, which one should be preferred?

Autonomy of a computing element: each computing element of the Grid belongs to a certain owner, who can restrict access for external jobs completely or partly.

If global and local jobs demand the same resources, their priorities are compared. For this purpose, each computing element i defines the function LPi() that calculates the local priority of a global job. This function depends on the job's price, consumable resources, and run time:

LPi = LPi(price, consumable resources, run time)

Page 13

Question 4: How Should the Interaction of the Global Scheduler and Local Resource Monitor Be Organized?

The global scheduler should distribute its jobs so that global jobs do not delay the start of any more "expensive" local jobs.

[Diagram: resource versus time chart with running jobs. A job from the global queue (jobG) has local priority PG = LP(jobG); a job from the local queue (jobL) has priority PL. The case shown is PG < PL.]
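The rule can be sketched for a single resource as follows. The form of the local priority function (a site-specific weight times price per CPU-hour) is an assumption, since the slides leave LPi to each computing element; `LocalJob`, `GlobalJob`, and `may_start` are hypothetical names.

```python
from dataclasses import dataclass


@dataclass
class LocalJob:
    name: str
    priority: float       # P_L, assigned by the local resource monitor
    planned_start: float  # when the local schedule starts it on this resource


@dataclass
class GlobalJob:
    name: str
    price: float
    cpus: int
    run_time: float


def local_priority(job, weight=1.0):
    """Hypothetical LP_i: site weight times price per requested CPU-hour."""
    return weight * job.price / (job.cpus * job.run_time)


def may_start(global_job, now, local_jobs, weight=1.0):
    """Allow the global job only if it delays no more expensive local job."""
    p_g = local_priority(global_job, weight)
    finish = now + global_job.run_time
    for local in local_jobs:
        would_be_delayed = local.planned_start < finish
        if would_be_delayed and local.priority > p_g:
            return False
    return True


local_jobs = [LocalJob("l1", priority=3.0, planned_start=5.0)]
cheap_long = GlobalJob("g1", price=8.0, cpus=1, run_time=8.0)     # P_G = 1.0
cheap_short = GlobalJob("g2", price=4.0, cpus=1, run_time=4.0)    # P_G = 1.0
expensive = GlobalJob("g3", price=40.0, cpus=1, run_time=8.0)     # P_G = 5.0
decisions = [may_start(g, 0.0, local_jobs) for g in (cheap_long, cheap_short, expensive)]
print(decisions)
```

Here `g1` is refused because it would still be running when the more expensive local job `l1` is due to start; `g2` finishes in time, and `g3` outbids `l1`.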

Page 14

Schedule

The local schedule is the plan of resource occupation by local jobs for some period of time in the future.

Local schedule: for each local job,

{priority, assigned resources, occupation and release time}

[Diagram: resource versus time chart showing running jobs followed, in the future, by planned jobs labeled priority1 to priority4.]
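One way to picture a local schedule is as a list of records with the fields named on the slide, from which the free windows of a resource can be derived. The record layout follows the slide; the type name `ScheduleEntry`, the function `free_windows`, and the sample values are assumptions.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ScheduleEntry:
    job: str
    priority: float
    resources: frozenset  # names of the assigned resources
    occupation_time: float
    release_time: float


def free_windows(schedule, resource, horizon_start, horizon_end):
    """Intervals within the horizon during which `resource` is not occupied."""
    busy = sorted(
        (e.occupation_time, e.release_time)
        for e in schedule
        if resource in e.resources
    )
    windows = []
    cursor = horizon_start
    for start, end in busy:
        if start > cursor:
            windows.append((cursor, min(start, horizon_end)))
        cursor = max(cursor, end)
        if cursor >= horizon_end:
            break
    if cursor < horizon_end:
        windows.append((cursor, horizon_end))
    return windows


schedule = [
    ScheduleEntry("A", 1.0, frozenset({"n1"}), 0.0, 4.0),
    ScheduleEntry("B", 2.0, frozenset({"n1", "n2"}), 6.0, 9.0),
    ScheduleEntry("C", 3.0, frozenset({"n2"}), 0.0, 2.0),
]
n1_free = free_windows(schedule, "n1", 0.0, 12.0)
n2_free = free_windows(schedule, "n2", 0.0, 12.0)
print(n1_free, n2_free)
```

The free windows are the candidate slots into which a global scheduler could place global jobs without disturbing the planned local ones.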

Page 15

The local schedule is drawn up by special agents of the global scheduler. These agents, working on each computing installation, build the schedule in exact conformity with the scheduling strategy and configuration parameters of the local monitor.

The current state of all local schedules is delivered to the information base of the global scheduler, which thus has information about the usage plan of all resources of the virtual organization.

On the basis of this aggregate schedule, the scheduler can lay out the allocation of global jobs to resources.
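The data flow described here (agents report local schedules, the information base aggregates them, the scheduler places global jobs) might look roughly like the sketch below. It assumes that each computing element is a single resource and that a local schedule is a list of busy intervals; the class `InformationBase` and its methods are hypothetical and do not reflect the actual MetaDispatcher interfaces.

```python
class InformationBase:
    """Holds the latest local schedule reported by the agent of each element."""

    def __init__(self):
        self._schedules = {}  # computing element -> sorted (start, end) busy intervals

    def update(self, element, busy_intervals):
        """Called when an agent delivers the current state of its local schedule."""
        self._schedules[element] = sorted(busy_intervals)

    def earliest_slot(self, element, now, duration):
        """Earliest time at which `element` is free for `duration` in a row."""
        cursor = now
        for start, end in self._schedules[element]:
            if start - cursor >= duration:
                break  # the job fits in the gap before this busy interval
            cursor = max(cursor, end)
        return cursor

    def best_element(self, now, duration):
        """Pick the element on which a job of this length can start soonest."""
        return min(
            (self.earliest_slot(e, now, duration), e) for e in self._schedules
        )


base = InformationBase()
base.update("ce1", [(0.0, 5.0), (7.0, 10.0)])
base.update("ce2", [(0.0, 3.0), (4.0, 6.0)])
short_job = base.best_element(now=1.0, duration=2.0)
long_job = base.best_element(now=1.0, duration=3.0)
print(short_job, long_job)
```

A two-hour job fits into the gap on `ce1` at hour 5, while a three-hour job has to go to `ce2` at hour 6, because the gaps on `ce1` are too short until hour 10.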

Page 16

Program architecture of scheduling

[Diagram: the Scheduler with its Data Base and the Global queue of jobs; on each computing element, an Agent working with the LRM and its Queue.]

Page 17

The global scheduler, implementing a certain scheduling strategy, makes up the global schedule.

The information base resides next to the scheduler and stores the aggregate schedule. For data management, a distributed system such as Spitfire of the DataGrid project, with a relational database as its core, is being considered.

A local agent of the scheduler works on each computing element. Interacting with the local resource monitor, the agent builds the local schedule of this computing element and transfers updates to the global scheduler. The proposed implementation is based on the Maui scheduler.

Page 18

Future directions:

- Backfill algorithm implementation at the global level to avoid blocking of jobs.
- Advance resource reservation for distributed multiprocessor jobs.
- Economic model of a virtual organization as applied to scheduling.