6d.1 Schedulers and Resource Brokers ITCS 4010 Grid Computing, 2005, UNC-Charlotte, B. Wilkinson.

Schedulers and Resource Brokers

Mar 20, 2016


finnea

Page 1: Schedulers and Resource Brokers

6d.1

Schedulers and Resource Brokers

ITCS 4010 Grid Computing, 2005, UNC-Charlotte, B. Wilkinson.

Page 2: Schedulers and Resource Brokers

6d.2

Scheduler

• Job manager submits jobs to scheduler.

• Scheduler assigns work to resources to achieve specified time requirements.

Page 3: Schedulers and Resource Brokers

6d.3

Scheduling

From "Introduction to Grid Computing with Globus," IBM Redbooks

Page 4: Schedulers and Resource Brokers

6d.4

Executing GT 4 jobs

Globus has two modes:

• Interactive/interactive-streaming

• Batch

Page 5: Schedulers and Resource Brokers

6d.5

GT 4 “Fork” Scheduler

• GT 4 comes with a “fork” scheduler which attempts to execute the job immediately

• Provided for starting and controlling a job on a local host if the job does not require any special software or have other special requirements.

• Other schedulers have to be added separately, using an “adapter.”

Page 6: Schedulers and Resource Brokers

6d.6

Batch scheduling

• Batch: a term from the old computing days, when one submitted a pack of punched cards as the program to a computer and came back after the program had been run, maybe overnight.

Page 7: Schedulers and Resource Brokers

6d.7

[Figure: Relationship between GT4 GRAM and a Local Scheduler (I. Foster). Labels: client; GT4 Java container; GRAM services; GRAM adapter; local scheduler; local job control; compute element; user job; job functions.]

Page 8: Schedulers and Resource Brokers

6d.8

Scheduler adapters included in GT 4

• PBS (Portable Batch System)

• Condor

• LSF (Load Sharing Facility)

Third party adapter provided for:

• SGE (Sun Grid Engine)

Page 9: Schedulers and Resource Brokers

6d.9

“Meta-schedulers”

• Loosely defined as a higher-level scheduler that can schedule jobs between sites.

Page 10: Schedulers and Resource Brokers

6d.10

(Local) Scheduler Issues

• Distribute job

• Based on load and characteristics of machines, available disk storage, network characteristics, …

• Both globally and locally.

• Runtime scheduling!

• Arrange data in right place (staging)

– Data replication and movement as needed

– Data error checking
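As a rough illustration of distributing a job by load and machine characteristics, the sketch below filters machines by free disk space and then picks the least loaded one. The field names (`load`, `free_disk_gb`) and the machine list are hypothetical; a real scheduler weighs many more factors.

```python
# Hypothetical machine descriptions; field names are assumptions for this sketch.
MACHINES = [
    {"name": "n1", "load": 0.9, "free_disk_gb": 50},
    {"name": "n2", "load": 0.2, "free_disk_gb": 5},
    {"name": "n3", "load": 0.4, "free_disk_gb": 40},
]

def pick_machine(machines, needed_disk_gb):
    """Return the name of the least loaded machine with enough disk, or None."""
    eligible = [m for m in machines if m["free_disk_gb"] >= needed_disk_gb]
    if not eligible:
        return None
    return min(eligible, key=lambda m: m["load"])["name"]

print(pick_machine(MACHINES, 10))
```

Here n2 has the lowest load but too little disk for a 10 GB job, so the choice falls to n3.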

Page 11: Schedulers and Resource Brokers

6d.11

Scheduler Issues (continued)

• Performance

– Error checking

– Checkpointing

– Monitoring job, progress monitoring

– QOS (Quality of service)

– Cost (an area considered by Nimrod-G)

• Security

– Need to authenticate and authorize remote user for job submission

• Fault Tolerance

Page 12: Schedulers and Resource Brokers

6d.12

Batch Scheduling policies

• First-in, First-out

• Favor certain types of jobs

• Shortest job first

• Smallest (or largest) memory first

• Short (or long) running job first

• Fair sharing or priority to certain users

• Dynamic policies

– Depending upon time of day and load

– Custom, preemptive, process migration
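Several of these policies differ only in the key used to order the queue. The following is a minimal sketch under that assumption; the `Job` fields and the sample queue are hypothetical and not taken from any particular scheduler.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Job:
    name: str
    arrival: int      # submission order (lower means earlier)
    est_runtime: int  # estimated runtime in minutes (hypothetical field)
    memory_mb: int    # requested memory (hypothetical field)

def fifo(jobs):
    """First-in, first-out: order by submission time."""
    return sorted(jobs, key=lambda j: j.arrival)

def shortest_job_first(jobs):
    """Shortest estimated runtime first; ties broken by arrival."""
    return sorted(jobs, key=lambda j: (j.est_runtime, j.arrival))

def smallest_memory_first(jobs):
    """Smallest memory request first; ties broken by arrival."""
    return sorted(jobs, key=lambda j: (j.memory_mb, j.arrival))

QUEUE = [
    Job("render", 0, 120, 4096),
    Job("compile", 1, 5, 512),
    Job("simulate", 2, 60, 1024),
]

for policy in (fifo, shortest_job_first, smallest_memory_first):
    print(policy.__name__, [j.name for j in policy(QUEUE)])
```

A dynamic policy could be sketched the same way by choosing among these functions based on time of day or load.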

Page 13: Schedulers and Resource Brokers

6d.13

Advance Reservation

• Requesting actions at times in future.

• “A service level agreement in which the conditions of the agreement start at some agreed-upon time in the future” [2]

[2] “The Grid 2, Blueprint for a New Computing Infrastructure,” I. Foster and C. Kesselman editors, Morgan Kaufmann, 2004.
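One simple way to picture advance reservation is a calendar of future time windows on a resource, where a new request is granted only if it does not overlap an existing one. The sketch below assumes half-open intervals of integer hours; the function names are illustrative only.

```python
def overlaps(a, b):
    """True if half-open intervals a=(start, end) and b=(start, end) intersect."""
    return a[0] < b[1] and b[0] < a[1]

def reserve(calendar, start, end):
    """Add (start, end) to the calendar if it conflicts with nothing; report success."""
    if end <= start:
        raise ValueError("reservation must end after it starts")
    if any(overlaps((start, end), r) for r in calendar):
        return False
    calendar.append((start, end))
    return True

calendar = []
print(reserve(calendar, 10, 12))  # granted: the calendar is empty
print(reserve(calendar, 11, 13))  # refused: overlaps 10-12
print(reserve(calendar, 12, 14))  # granted: starts exactly when 10-12 ends
```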

Page 14: Schedulers and Resource Brokers

6d.14

Resource Broker

• “A scheduler that optimizes the performance of a particular resource. Performance may be measured by such criteria as fairness (to ensure that all requests for the resources are satisfied) or utilization (to measure the amount of the resource used).” [2]

Page 15: Schedulers and Resource Brokers

6d.15

Scheduler/Resource Broker Examples

Schedulers/Resource Brokers available that work with Globus:

• Condor/Condor-G

• Sun Grid Engine

– To be covered by James Ruff and to be used in Assignment 4 this year.

Page 16: Schedulers and Resource Brokers

6d.16

Condor

• First developed at the University of Wisconsin-Madison in the mid-1980s to convert a collection of distributed workstations and clusters into a high-throughput computing facility.

• Key concept - using wasted computer power of idle workstations.

Page 17: Schedulers and Resource Brokers

6d.17

Condor

• Converts collections of distributed workstations and dedicated clusters into a distributed high-throughput computing facility.

Page 18: Schedulers and Resource Brokers

6d.18

Features

• Include:

– Resource finder

– Batch queue manager

– Scheduler

– Checkpoint/restart

– Process migration

Page 19: Schedulers and Resource Brokers

6d.19

Intended to complete job even if:

• Machines crash

• Disk space exhausted

• Software not installed

• Machines are needed by others

• Machines are managed by others

• Machines are far away

Page 20: Schedulers and Resource Brokers

6d.20

Uses

• Consider following scenario:

– I have a simulation that takes two hours to run on my high-end computer

– I need to run it 1000 times with slightly different parameters each time.

– If I do this on one computer, it will take at least 2000 hours (or about 3 months)

From: “Condor: What it is and why you should worry about it,” by B. Beckles, University of Cambridge, Seminar, June 23, 2004

Page 21: Schedulers and Resource Brokers

6d.21

– Suppose my department has 100 PCs like mine that are mostly sitting idle overnight (say 8 hours a day).

– If I could use them when their legitimate users are not using them, so that I do not inconvenience them, I could get about 800 CPU hours/day.

– This is an ideal situation for Condor.

• I could do my simulations in 2.5 days.

From: “Condor: What it is and why you should worry about it,” by B. Beckles, University of Cambridge, Seminar, June 23, 2004
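The arithmetic behind this scenario can be checked directly. The sketch assumes every departmental PC is as fast as the original machine, that there is no scheduling overhead, and that the 2-hour runs pack neatly into the 8-hour idle windows.

```python
RUNS = 1000               # simulations to run
HOURS_PER_RUN = 2         # hours per simulation on one PC
IDLE_PCS = 100            # departmental PCs available
IDLE_HOURS_PER_DAY = 8    # idle hours per PC per day

total_cpu_hours = RUNS * HOURS_PER_RUN                   # 2000 CPU hours
single_machine_days = total_cpu_hours / 24               # about 83 days, close to 3 months
pool_cpu_hours_per_day = IDLE_PCS * IDLE_HOURS_PER_DAY   # 800 CPU hours/day
pool_days = total_cpu_hours / pool_cpu_hours_per_day     # 2.5 days

print(total_cpu_hours, round(single_machine_days, 1), pool_cpu_hours_per_day, pool_days)
```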

Page 22: Schedulers and Resource Brokers

6d.22

• The Condor high-throughput computing system

• Condor-G agent for grid computing

Page 23: Schedulers and Resource Brokers

6d.23

HTCS

• Distributed batch computing system

– Provide large amounts of fault-tolerant computing power.

• Opportunistic computing.

– Effectively utilizing all resources on the network

• Scavenger (but polite!)

Page 24: Schedulers and Resource Brokers

6d.24

Tools

• ClassAds: Flexible framework for matching resource requests and providers.

• Job checkpoint and migration

• Remote system calls.

– Redirect I/O back to local machine
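The matching idea behind ClassAds can be sketched as two-sided: a job and a machine are paired only when each side's requirements accept the other's advertisement, and the job's rank orders the acceptable machines. Real ClassAds use their own expression language; the Python dictionaries and lambdas below are stand-ins, and all attribute names and values are hypothetical.

```python
def matches(job_ad, machine_ad):
    """A match needs both sides' requirements to accept the other ad."""
    return job_ad["requirements"](machine_ad) and machine_ad["requirements"](job_ad)

def matchmake(job_ad, machine_ads):
    """Return names of acceptable machines, highest job-side rank first."""
    candidates = [m for m in machine_ads if matches(job_ad, m)]
    candidates.sort(key=job_ad["rank"], reverse=True)
    return [m["name"] for m in candidates]

JOB = {
    "owner": "alice",
    "memory_mb": 512,
    "requirements": lambda m: m["os"] == "LINUX" and m["memory_mb"] >= 512,
    "rank": lambda m: m["memory_mb"],  # prefer machines with more memory
}

MACHINES = [
    {"name": "ws1", "os": "LINUX", "memory_mb": 1024,
     "requirements": lambda j: True},
    {"name": "ws2", "os": "WINDOWS", "memory_mb": 2048,
     "requirements": lambda j: True},
    {"name": "ws3", "os": "LINUX", "memory_mb": 2048,
     "requirements": lambda j: j["owner"] != "alice"},  # owner policy excludes alice
    {"name": "ws4", "os": "LINUX", "memory_mb": 4096,
     "requirements": lambda j: j["memory_mb"] <= 1024},
]

print(matchmake(JOB, MACHINES))
```

In this example ws2 is rejected by the job (wrong operating system) and ws3 rejects the job (owner policy), which leaves ws4 and ws1 in rank order.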

Page 25: Schedulers and Resource Brokers

6d.25

Condor-G

• Globus contributes protocols for secure inter-domain communications and standardized access to remote batch systems.

• Condor provides everything else

Page 26: Schedulers and Resource Brokers

6d.26

Page 27: Schedulers and Resource Brokers

6d.27

Condor Core Components

Page 28: Schedulers and Resource Brokers

6d.28

• User submits job to agent

– Keeps job and finds resources willing to run it.

• Agents and resources advertise themselves to a matchmaker.

– E-harmony.com: 29 dimensions of compatibility.

• Agent contacts resource

Page 29: Schedulers and Resource Brokers

6d.29

• Agent creates shadow: provides all details necessary to run job.

• Resource creates sandbox: a safe execution environment for the job and the resource.

• All independent and individually responsible for enforcing their owners' policies.

• This led to Condor Pools

Page 30: Schedulers and Resource Brokers

6d.30

Page 31: Schedulers and Resource Brokers

6d.31

Direct Flocking (Multiple pools)

Page 32: Schedulers and Resource Brokers

6d.32

Globus

• To develop worldwide Grid, needed uniform interface for batch execution.

• Grid Resource Allocation and Management protocol (GRAM).

– Provides abstraction for remote process queuing and execution (with security and GridFTP).

• Globus provides a server that speaks GRAM and converts its commands into a form understood by local schedulers

Page 33: Schedulers and Resource Brokers

6d.33

• GRAM does not:

– Remember what jobs have been submitted, where they are, what they are doing.

– Analyze job failure and resubmit

– Provide queuing, prioritization, logging, accounting.

– Decouple resource allocation and job execution.

Page 34: Schedulers and Resource Brokers

6d.34

– Agent must direct a particular job, executable image and all, to a particular queue.

• Gosh, what if there is a backlog and no reasonably available resources?

Page 35: Schedulers and Resource Brokers

6d.35

• Condor adapted its standard agent to speak GRAM and uses its own middleware.

• Gliding

Page 36: Schedulers and Resource Brokers

6d.36

Page 37: Schedulers and Resource Brokers

6d.37

Directed Acyclic Graph Manager (DAGMan)

Meta-scheduler

Allows one to specify dependencies between Condor Jobs.

Page 38: Schedulers and Resource Brokers

6d.38

Example

“Do not run Job B until Job A has completed successfully”

Especially important to jobs working together (as in Grid computing).

Page 39: Schedulers and Resource Brokers

6d.39

Directed Acyclic Graph (DAG)

• A data structure used to represent dependencies.

• Directed graph.

• No cycles.

• Each job is a node in the DAG.

• Each node can have any number of parents and children as long as there are no loops (Acyclic graph).

Page 40: Schedulers and Resource Brokers

6d.40

DAG

[Figure: diamond-shaped DAG with Job A at the top, Jobs B and C in the middle, and Job D at the bottom]

Do job A.

Do jobs B and C after job A finished

Do job D after both jobs B and C finished.
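The ordering described here can be derived mechanically from the parent relationships. This sketch uses Python's standard `graphlib` module (Python 3.9 or later) and is only an illustration of the data structure, not of how DAGMan itself is implemented.

```python
from graphlib import TopologicalSorter, CycleError

# Map each job to the set of jobs it must wait for (its parents).
DIAMOND = {"A": set(), "B": {"A"}, "C": {"A"}, "D": {"B", "C"}}

def waves(parents):
    """Group jobs into waves; every job in a wave has all its parents finished."""
    sorter = TopologicalSorter(parents)
    sorter.prepare()  # raises CycleError if the graph contains a loop
    result = []
    while sorter.is_active():
        ready = sorted(sorter.get_ready())
        result.append(ready)
        sorter.done(*ready)
    return result

def is_acyclic(parents):
    """True if the dependency graph has no cycles."""
    try:
        TopologicalSorter(parents).prepare()
        return True
    except CycleError:
        return False

print(waves(DIAMOND))
print(is_acyclic({"A": {"B"}, "B": {"A"}}))
```

For the diamond, the waves are A, then B and C together, then D.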

Page 41: Schedulers and Resource Brokers

6d.41

Defining a DAG

• Defined by a .dag file, listing each of the nodes and their dependencies.

• Each “job” statement has an abstract job name (say A) and a file (say a.condor)

• PARENT-CHILD statement describes relationship between two or more jobs

• Other statements available.

Page 42: Schedulers and Resource Brokers

6d.42

Example

# diamond.dag
Job A a.sub
Job B b.sub
Job C c.sub
Job D d.sub
Parent A Child B C
Parent B C Child D

[Figure: the diamond DAG, with Job A at the top, Jobs B and C in the middle, and Job D at the bottom]
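To make the file format concrete, the sketch below reads the diamond example into a table of submit files and a table of parents. It handles only the Job and Parent/Child statements shown on the slide and ignores everything else; it is a toy reader, not DAGMan's own parser.

```python
DIAMOND_DAG = """\
# diamond.dag
Job A a.sub
Job B b.sub
Job C c.sub
Job D d.sub
Parent A Child B C
Parent B C Child D
"""

def parse_dag(text):
    """Return (submit_files, parents) from Job and Parent/Child lines only."""
    submit_files = {}
    parents = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        words = line.split()
        keyword = words[0].lower()
        if keyword == "job":
            name, submit_file = words[1], words[2]
            submit_files[name] = submit_file
            parents.setdefault(name, set())
        elif keyword == "parent":
            lowered = [w.lower() for w in words]
            split_at = lowered.index("child")
            for child in words[split_at + 1:]:
                parents.setdefault(child, set()).update(words[1:split_at])
        # any other statement is ignored in this sketch
    return submit_files, parents

submit_files, parents = parse_dag(DIAMOND_DAG)
print(submit_files)
print(parents)
```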

Page 43: Schedulers and Resource Brokers

6d.43

Running a DAG

• DAGMan acts as a scheduler managing the submission of jobs to Condor based upon DAG dependencies.

• DAGMan holds and submits jobs to Condor queue at appropriate times.

Page 44: Schedulers and Resource Brokers

6d.44

Job Failures

• DAGMan continues until it cannot make progress and then creates a rescue file holding current state of DAG.

• When failed job ready to re-run, rescue file used to restore prior state of DAG.
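The behaviour described above can be imitated with a small loop: keep running any job whose parents have all succeeded, stop when nothing more can run, and remember which jobs finished so a later run can resume from there. This is a simplified model; the function name, the in-memory `done` set standing in for the rescue file, and the failure callback are all assumptions made for illustration.

```python
DIAMOND = {"A": set(), "B": {"A"}, "C": {"A"}, "D": {"B", "C"}}

def run_dag(parents, run_job, done=None):
    """Run every job whose parents have all succeeded; stop when stuck.

    Returns (done, failed). The `done` set plays the role of the rescue state.
    """
    done = set(done or ())
    failed = set()
    progress = True
    while progress:
        progress = False
        for job in sorted(parents):
            if job in done or job in failed:
                continue
            if parents[job] <= done:  # all parents finished successfully
                if run_job(job):
                    done.add(job)
                else:
                    failed.add(job)
                progress = True
    return done, failed

# First attempt: job C fails, so D can never start.
done, failed = run_dag(DIAMOND, lambda job: job != "C")
print(sorted(done), sorted(failed))

# Second attempt resumes from the saved state; only C and D are run.
done, failed = run_dag(DIAMOND, lambda job: True, done=done)
print(sorted(done), sorted(failed))
```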

Page 45: Schedulers and Resource Brokers

6d.45

Summary of Key Condor Features

• High-throughput computing using an opportunistic environment.

• Provides a mechanism for running jobs on remote machines.

• Matchmaking

• Checkpointing

• DAG scheduling