
Job Submission and Resource Brokering WP 1. Contents: The components What (should) works now and configuration How to submit jobs … the UI and JDL Planned.

Apr 01, 2015


Zoie Host

Job Submission and Resource Brokering WP 1


Contents:

• The components

• What (should) work now and configuration

• How to submit jobs … the UI and JDL

• Planned future functionality

Documentation available from: http://server11.infn.it/workload-grid/documents.htm

A particularly gripping read is the “Administrator and User Guide” released last Friday.


The User Interface (UI):

All user interactions are through the UI

Installed on the submitting machine

Communicates with both the Resource Broker (RB) and the Logging Broker (LB)

On job submission the UI assigns a unique job identifier (dg_jobId) to the job and sends the executable, Job Description File and Input Sandbox to the RB. It also sends notification of the submission to the LB.


The User Interface (UI):

The UI can also be used to query the status of a job… which it does by interrogating the LB

Configuration:

The UI configuration is contained in UI_ConfigEnv.cfg, which holds the following information:

• address and port of accessible RBs

• address and port of accessible LBs

• default location of the local storage areas for the Input/Output sandbox files

• default values for the JDL mandatory attributes

• default number of retries on fatal errors when connecting to the LB


The User Interface (UI):

Users concurrently using the same submitting machine use the same configuration files.

For users (or groups of users) having particular needs it is possible to “customise” the UI configuration through the -config option supported by each UI command.
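One way to picture shared defaults with a per-group customisation is the sketch below. It is illustrative only: the key names, the placeholder addresses, the group name and the idea that a group's settings are merged on top of the shared defaults are all assumptions, not the real format of UI_ConfigEnv.cfg or the documented behaviour of -config.

```python
# Hypothetical model of UI configuration lookup.
# All key names, hosts, ports and group names below are invented for illustration.
SHARED_DEFAULTS = {
    "rb_address": "rb.example.org:7771",   # placeholder RB address and port
    "lb_address": "lb.example.org:7846",   # placeholder LB address and port
    "sandbox_dir": "/tmp",                 # local storage area for sandbox files
    "lb_retry_count": 3,                   # retries on fatal LB connection errors
}

# Settings a particular group of users might want to change (assumed).
GROUP_OVERRIDES = {
    "cms": {"sandbox_dir": "/data/cms/sandbox", "lb_retry_count": 5},
}


def effective_config(group=None):
    """Return the shared defaults, with a group's overrides applied on top."""
    config = dict(SHARED_DEFAULTS)          # copy, so the defaults stay untouched
    if group is not None:
        config.update(GROUP_OVERRIDES.get(group, {}))
    return config


if __name__ == "__main__":
    print(effective_config())        # what every user of the machine gets
    print(effective_config("cms"))   # what a command run with "-config cms" might see
```

Users who pass no group see exactly the shared values, which matches the statement that concurrent users of one submitting machine share the same configuration files.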


The Resource Broker (RB):

Situated at a central location (not local to your machine).

One RB is expected per VO; currently there is only one, at CERN.

Jobs are queued locally (stored in a PostgreSQL database).

Interrogates the replica catalogue and the information services and attempts to match the job to an available resource. Matching is based on the Condor ClassAd library.

If a suitable match is made, the RB can submit the job to the Job Submission Service (JSS). All events and status information are sent to the LB.
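The matching step can be pictured as "filter by requirements, then pick the best rank". The sketch below is a simplification in plain Python, not the Condor ClassAd library: the attribute names are borrowed from the JDL example later in this talk, while the resource names and values are invented.

```python
# Simplified matchmaking sketch: NOT the ClassAd library, just the idea.
# Resource names and attribute values are invented for illustration.
RESOURCES = [
    {"name": "ce1", "LRMSType": "PBS", "OpSys": "Linux RH 6.2",
     "FreeCPUs": 4, "TotalCPUs": 10, "AverageSI00": 400},
    {"name": "ce2", "LRMSType": "Condor", "OpSys": "Linux RH 6.2",
     "FreeCPUs": 8, "TotalCPUs": 50, "AverageSI00": 400},
    {"name": "ce3", "LRMSType": "PBS", "OpSys": "Linux RH 6.1",
     "FreeCPUs": 2, "TotalCPUs": 20, "AverageSI00": 300},
    {"name": "ce4", "LRMSType": "PBS", "OpSys": "Linux RH 6.1",
     "FreeCPUs": 1, "TotalCPUs": 99, "AverageSI00": 900},
]


def requirements(res):
    """A job's requirements: PBS, one of two Linux releases, more than one free CPU."""
    return (res.get("LRMSType") == "PBS"
            and res.get("OpSys") in ("Linux RH 6.1", "Linux RH 6.2")
            and res.get("FreeCPUs", 0) > 1)


def rank(res):
    """A job's preference: more CPUs and faster CPUs are better."""
    return res.get("TotalCPUs", 0) * res.get("AverageSI00", 0)


def match(resources, requirements, rank):
    """Return the highest-ranked resource that satisfies the requirements, or None."""
    candidates = [r for r in resources if requirements(r)]
    return max(candidates, key=rank) if candidates else None


if __name__ == "__main__":
    best = match(RESOURCES, requirements, rank)
    print(best["name"] if best else "no suitable resource")
```

Here ce2 and ce4 fail the requirements, and ce3 is preferred over ce1 because its rank is higher.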


The Resource Broker (RB):

Configuration:

Most people will never need to configure their own RB. However, for completeness, the configuration file is <install path>/etc/rb.conf. It contains entries for the replica catalogue, the MDS, etc.

For more detailed information see the “Administrator and User Guide”.

Input/Output Sandboxes etc are stored on the machine hosting the RB and so a reasonable amount of disk space is required.


The Job Submission Service (JSS):

If the RB has successfully matched a job to a resource, the job is passed to the JSS (which is usually on the same machine).

The JSS queues the job internally in a PostgreSQL database.

Job submission is performed using Condor-G.

The JSS also monitors jobs until their completion, notifying the LB of any significant events.


The Job Submission Service (JSS):

Configuration:

Again, most people will not need to configure a JSS server. The configuration file is <install path>/etc/jss.conf.


The Logging Broker (LB):

All events throughout the job submission, execution and output retrieval processes are logged by the LB in a MySQL database. All information is time stamped.

It is through the logged information that users are able to discover the state of their jobs.
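The idea that a job's state is recovered from time-stamped events can be sketched as follows. This is an in-memory toy, not the LB's MySQL schema or API: the class, its methods, the job identifier and the state names other than RUNNING are assumptions made for illustration.

```python
from datetime import datetime, timezone


class EventLog:
    """Toy stand-in for the LB: stores time-stamped events per job (hypothetical)."""

    def __init__(self):
        self._events = []  # each entry: (timestamp, job_id, source, state)

    def log(self, job_id, source, state, when=None):
        """Record an event; the timestamp defaults to the current UTC time."""
        when = when or datetime.now(timezone.utc)
        self._events.append((when, job_id, source, state))

    def current_state(self, job_id):
        """The state carried by the most recent event for this job, or None."""
        events = [e for e in self._events if e[1] == job_id]
        if not events:
            return None
        return max(events, key=lambda e: e[0])[3]


if __name__ == "__main__":
    lb = EventLog()
    t = lambda minute: datetime(2001, 6, 5, 10, minute, tzinfo=timezone.utc)
    lb.log("job-1", "UI", "SUBMITTED", t(20))   # state names are illustrative
    lb.log("job-1", "RB", "SCHEDULED", t(22))
    lb.log("job-1", "JSS", "RUNNING", t(24))
    print(lb.current_state("job-1"))
```

Because every event carries its own timestamp, the order in which the different components report does not matter when the state is looked up.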


The Logging Broker (LB):

Configuration:

An LB local logger must be installed on all machines that push information into the LB system (the RB and JSS machines and the gatekeeper machines of each CE). The exception is the job submission machine, which can have a local logger, but this is not mandatory.

The LB server needs only be installed on a server machine.


The Logging Broker (LB):

Configuration:

The local logger requires no configuration and the server is configured when the database is created using <install path>/etc/server.sql.

No further configuration is required.


Submitting a job:

ClassAds are:

Declarative – rather than procedural; that is, they describe notions of compatibility rather than specifying a procedure to determine compatibility

Simple – both syntactically and semantically … easy to use

Portable – Nothing is used that requires features specific to a given architecture


Submitting a job:

ClassAds have dynamic typing and so only values have types (not expressions)

As well as the usual types (numeric, string, Boolean), values can also have types such as time intervals and timestamps, and esoteric values such as undefined and error.

ClassAds can be nested.

ClassAds have the usual set of operators (see the JDL how-to).


Submitting a job:

An example:

Executable = "WP1testF";
StdOutput = "sim.out";
StdError = "sim.err";
InputSandbox = {"/home/datamat/sim.exe", "/home/datamat/DATA/*"};
OutputSandbox = {"sim.out","sim.err","testD.out"};
Rank = other.TotalCPUs * other.AverageSI00;
Requirements = other.LRMSType == "PBS" \
    && (other.OpSys == "Linux RH 6.1" || other.OpSys == "Linux RH 6.2") && \
    self.Rank > 10 && other.FreeCPUs > 1;
RetryCount = 2;
Arguments = "file1";
InputData = "LF:test10099-1001";
ReplicaCatalog = "ldap://sunlab2g.cnaf.infn.it:2010/rc=WP2 INFN Test Replica Catalog,dc=sunlab2g, dc=cnaf, dc=infn, dc=it";
DataAccessProtocol = "gridftp";
OutputSE = "grid001.cnaf.infn.it";


Submitting a job:

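Truth tables for the Boolean operators (F = false, T = true, U = undefined, E = error):

AND | F T U E      OR  | F T U E      NOT |
----+--------      ----+--------      ----+--
 F  | F F F E       F  | F T U E       F  | T
 T  | F T U E       T  | T T T E       T  | F
 U  | F U U E       U  | U T U E       U  | U
 E  | E E E E       E  | E E E E       E  | E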
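The four-valued logic in the table can be written out as three small functions. The sketch below reproduces the table as shown on this slide (F = false, T = true, U = undefined, E = error); it is not taken from the ClassAd library, and the function names are invented.

```python
# Four-valued logic as tabulated on the slide: False, True, Undefined, Error.
F, T, U, E = "F", "T", "U", "E"
VALUES = [F, T, U, E]


def and_(a, b):
    """AND: error dominates, then false, then undefined."""
    if E in (a, b):
        return E
    if F in (a, b):
        return F
    if U in (a, b):
        return U
    return T


def or_(a, b):
    """OR: error dominates, then true, then undefined."""
    if E in (a, b):
        return E
    if T in (a, b):
        return T
    if U in (a, b):
        return U
    return F


def not_(a):
    """NOT: swaps true and false, leaves undefined and error unchanged."""
    return {F: T, T: F, U: U, E: E}[a]


if __name__ == "__main__":
    # Print the AND and OR tables row by row, in the order F, T, U, E.
    for a in VALUES:
        print(a, "AND:", " ".join(and_(a, b) for b in VALUES),
              "  OR:", " ".join(or_(a, b) for b in VALUES),
              "  NOT:", not_(a))
```

The practical consequence for a Requirements expression is that a missing attribute makes a comparison undefined rather than false, so "undefined OR true" can still come out true.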


Submitting a job:

– dg-job-submit

Allows the user to submit a job for execution on remote resources in a grid.

SYNOPSIS
dg-job-submit [-help]
dg-job-submit [-version]
dg-job-submit [-template]
dg-job-submit <job_description_file> [-input input_file | -resource res_id] [-notify e_mail_address] [-config group_name] [-output out_file] [-noint] [-debug]
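When submission is scripted, the synopsis translates into an argument list fairly directly. The helper below is a hypothetical convenience wrapper, not part of the UI: it only assembles options that appear in the synopsis, and the file name and e-mail address in the example are placeholders.

```python
def build_submit_command(jdl_file, input_file=None, resource=None, notify=None,
                         config=None, output=None, noint=False, debug=False):
    """Assemble a dg-job-submit argument list from the options in the synopsis."""
    # The synopsis lists -input and -resource as alternatives.
    if input_file is not None and resource is not None:
        raise ValueError("-input and -resource are alternatives; give at most one")

    argv = ["dg-job-submit", jdl_file]
    # Options that take a value, in the order used by the synopsis.
    for flag, value in (("-input", input_file), ("-resource", resource),
                        ("-notify", notify), ("-config", config),
                        ("-output", output)):
        if value is not None:
            argv += [flag, value]
    # Simple switches.
    if noint:
        argv.append("-noint")
    if debug:
        argv.append("-debug")
    return argv


if __name__ == "__main__":
    # "job.jdl" and the address are placeholders for illustration.
    print(" ".join(build_submit_command("job.jdl", notify="user@example.org",
                                        noint=True)))
```

The resulting list could be handed to subprocess.run on a machine where the UI is installed; that step is left out here because it needs a working testbed.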


##############################################
#
# -------- Job description file ----------
#
##############################################
Executable = "$(CMS)/fpacini/exe/sum.exe";
InputData = "LF:testbed0-00019";
ReplicaCatalog = "ldap://firefox.esrin.esa.it:2155/ReplicaCatalog1";
DataAccessProtocol = "gridftp";
RetryCount = 10;
Rank = other.MaxCpuTime;
Requirements = other.LRMSType == "Condor" && \
    other.Architecture == "INTEL" && other.OpSys == "LINUX" && \
    other.FreeCpus >= 4;


– dg-get-job-output

Requests the job output files (specified by the OutputSandbox attribute of the job-ad) from the RB and stores them on the local disk of the submitting machine.

SYNOPSIS
dg-get-job-output [-help]
dg-get-job-output [-version]
dg-get-job-output < dg_jobId1 …. dg_jobIdn | -input input_file > [-dir directory_path] [-config group_name] [-noint] [-debug]

Examples

Let us consider the following command:

$> dg-get-job-output https://grid004.it:2234/124.75.74.12/12354732109721?www.rb.com:4577 -dir /home/data

It retrieves the files listed in the OutputSandbox attribute from the RB and stores them locally in /home/data/12354732109721.
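In the example, the local directory is the -dir value followed by the last path component of the job identifier. The sketch below reproduces that relationship. It assumes that every dg_jobId has the URL-like shape shown in the example, which the slides do not state; the function name is invented.

```python
import posixpath
from urllib.parse import urlsplit


def local_output_dir(dg_job_id, base_dir):
    """Guess the directory where the output lands, following the slide's example.

    Assumes the dg_jobId looks like a URL whose last path component is the
    unique part of the identifier (an assumption drawn from one example).
    """
    path = urlsplit(dg_job_id).path                    # e.g. /124.75.74.12/1235...
    unique = posixpath.basename(path.rstrip("/"))      # last path component
    return posixpath.join(base_dir, unique)


if __name__ == "__main__":
    job_id = "https://grid004.it:2234/124.75.74.12/12354732109721?www.rb.com:4577"
    print(local_output_dir(job_id, "/home/data"))
```

The part after the question mark is treated as a URL query string by urlsplit and so does not end up in the directory name.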


– dg-list-job-match

Returns the list of resources fulfilling job requirements.

SYNOPSIS
dg-list-job-match [-help]
dg-list-job-match [-version]
dg-list-job-match <job_description_file> [-verbose] [-config group_name] [-output output_file] [-noint] [-debug]

– dg-job-cancel

Cancels one or more submitted jobs.

SYNOPSIS
dg-job-cancel [-help]
dg-job-cancel [-version]
dg-job-cancel < dg_jobId1 …. dg_jobIdn | -input input_file | -all > [-notify e_mail_address] [-config group_name] [-output output_file] [-noint] [-debug]


– dg-job-status

Displays bookkeeping information about submitted jobs.

SYNOPSIS
dg-job-status [-help]
dg-job-status [-version]
dg-job-status < dg_jobId1 …. dg_jobIdn | -input input_file | -all > [-full] [-config group_name] [-output output_file] [-noint] [-debug]


Examples

$> dg-job-status dg_jobId2

displays the following lines:

********************************************************************
BOOKKEEPING INFORMATION
Printing status for the job: dg_jobId2
---
dg_JobId = firefox.esrin.esa.it__20010514_163007_21833_RB1_LB3
Job Owner = /C=IT/O=ESA/OU=ESRIN/CN=Fabrizio Pacini/[email protected]
Status = RUNNING
Location = firefox.esa.it:2119/jobmanager-condor
Job Destination = http://ramses.esrin.esa.it/rams/dataset1
Status Enter Time = 10:24:32 05-06-2001 GMT
Last Update Time = 10:25:45 05-06-2001 GMT
CpuTime = 1
********************************************************************
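Output of this kind is easy to turn into a dictionary for scripting. The sketch below assumes each field is printed on its own line in the form "name = value"; the line layout is inferred from the example rather than documented, and the parser itself is a hypothetical helper.

```python
# Sample text modelled on the example output; the one-field-per-line layout is assumed.
SAMPLE = """\
********************************************************************
BOOKKEEPING INFORMATION
Printing status for the job: dg_jobId2
---
dg_JobId = firefox.esrin.esa.it__20010514_163007_21833_RB1_LB3
Status = RUNNING
Location = firefox.esa.it:2119/jobmanager-condor
Status Enter Time = 10:24:32 05-06-2001 GMT
CpuTime = 1
********************************************************************
"""


def parse_status(text):
    """Collect 'name = value' lines into a dict; other lines are ignored."""
    fields = {}
    for line in text.splitlines():
        # Split on the first " = " only, so values may themselves contain "=".
        key, sep, value = line.partition(" = ")
        if sep:
            fields[key.strip()] = value.strip()
    return fields


if __name__ == "__main__":
    status = parse_status(SAMPLE)
    print(status["Status"], "at", status["Location"])
```

Splitting on " = " with surrounding spaces keeps certificate subjects such as /C=IT/O=ESA intact as a single value.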


– dg-get-logging-info

Displays logging information about submitted jobs.

SYNOPSIS
dg-get-logging-info [-help]
dg-get-logging-info [-version]
dg-get-logging-info < dg_jobId1 …. dg_jobIdn | -input input_file | -all > [-from T1] [-to T2] [-level logLevel] [-config group_name] [-output output_file] [-noint] [-debug]


Job Submission: There is a GUI

 


Things to come over the next year

Release | Dependencies | Job | Partner
1.4 | WP4 | Support for interactive jobs | UI/RB/JSS groups
2 | WP4 | Support for job partitioning | INFN PD/PPARC
1.3 | WP4 | Ability to submit MPI jobs | UI/RB/JSS groups
1.4 | WP4 | Specification of job dependencies | INFN CNAF/PPARC
1.4 | WP7 WP2 | Triggering of file transfers | INFN TO + Catania
1.4 | WP7 | Integration of network into scheduling policy | INFN TO + Catania + CNAF?
1.3 | | Develop APIs for application | DATAMAT
1.4 | | Development of GUI | DATAMAT
1.4 | Globus CAS + WP4 | Deployment of accounting infrastructure over testbeds (HLR with command line interface) | INFN TO
2 | | Full integration of cost estimation/accounting into scheduling policies | INFN TO + CT
1.? | | Review command requirement from D8.1A: "hold", "move queue". Document reviewed by February. Implications to RB architecture to be understood. | DATAMAT
1.? | WP8 WP9 WP10 | Review of job info from D8.1A. Document to be reviewed by January. Implications may need coordination/blessing of WP4, and need to be finalised and matched alongside their schedule. | CESNET

Release | Dependencies | Job | Partner
1.2 | | Support for Proxy renewal | CESNET, JSS part INFN PD, possible UI change
??? | WP3 | Availability of L&B info through "standard" WP3 mechanism. Interfacing with WP3 R-GMA will be tested by May. Feedback will be provided. | CESNET
1.4 | WP2 WP4 WP5 WP7 | Advanced reservation API. Usefulness dependent on Testbed QoS configuration. | INFN CNAF
2 | | Integration of advanced reservation (co-allocation) into RB | INFN CNAF