EGEE Middleware: The Resource Broker
EGEE project members
Dec 18, 2015
Enabling Grids for E-sciencE (INFSO-RI-508833)
www.eu-egee.org
Contents
• Short review of concepts
• Requirements of the applications communities
• Overview of the main grid services
• A closer look
Current production middleware
[Diagram: production middleware components and the information exchanged between them]
Components: “User interface”, Resource Broker, Logging & Book-keeping, Information Service, LCG File Catalogue (LFC), Computing Element, Storage Element
Flows: Authorization & Authentication; Job Submit Event; Job Query; Job Status; Input “sandbox”; Input “sandbox” + Broker Info; Output “sandbox”; DataSets info; Publish; SE & CE info
Building on basic tools and Information Service
Example JDL file
    Executable = "gridTest";
    StdError = "stderr.log";
    StdOutput = "stdout.log";
    InputSandbox = {"/home/joda/test/gridTest"};
    OutputSandbox = {"stderr.log", "stdout.log"};
    …
• Submit job to grid via the “resource broker”:
    edg-job-submit my.jdl
User Interface node
• The user’s interface to the Grid
• Command-line interface to
  – Proxy server
  – Job operations
      To submit a job
      Monitor its status
      Retrieve output
  – Data operations
      Upload file to SE
      Create replica
      Discover replicas
  – Other grid services
• Also C++ and Java APIs
• To run a job, the user creates a JDL (Job Description Language) file
Building on basic tools and Information Service
Example JDL file
    Executable = "gridTest";
    StdError = "stderr.log";
    StdOutput = "stdout.log";
    InputSandbox = {"/home/joda/test/gridTest"};
    OutputSandbox = {"stderr.log", "stdout.log"};
    InputData = "lfn:/grid/VOname/mydir/testbed0.00019";
    DataAccessProtocol = "gridftp";
    Requirements = other.Architecture=="INTEL" &&
                   other.OpSys=="LINUX" && other.FreeCpus >=4;
    Rank = "other.GlueHostBenchmarkSF00";
• Submit job to grid via the “resource broker (RB)”:
    edg-job-submit my.jdl
  Returns a “job-id” used to monitor the job and retrieve its output
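A JDL file like the one above can also be produced from a script. The following is a minimal sketch in Python, not part of the EGEE middleware: the helper names `Expr`, `jdl_value` and `to_jdl` are invented for illustration, and only the attribute names come from the example.

```python
class Expr(str):
    """Marks a string that must be written unquoted (a JDL expression)."""


def jdl_value(value):
    """Render a Python value as a JDL literal."""
    if isinstance(value, Expr):
        return str(value)                      # expressions stay unquoted
    if isinstance(value, (list, tuple)):
        return "{" + ", ".join(jdl_value(v) for v in value) + "}"
    if isinstance(value, str):
        return '"' + value + '"'               # plain strings are quoted
    return str(value)


def to_jdl(attributes):
    """Turn a dict of attributes into JDL text, one 'Name = value;' per line."""
    lines = [f"{name} = {jdl_value(value)};" for name, value in attributes.items()]
    return "\n".join(lines) + "\n"


job = {
    "Executable": "gridTest",
    "StdError": "stderr.log",
    "StdOutput": "stdout.log",
    "InputSandbox": ["/home/joda/test/gridTest"],
    "OutputSandbox": ["stderr.log", "stdout.log"],
    "Requirements": Expr('other.Architecture=="INTEL" && other.OpSys=="LINUX" && other.FreeCpus >=4'),
}

if __name__ == "__main__":
    print(to_jdl(job))
```

The resulting text could be saved as my.jdl and submitted with edg-job-submit as shown on the slide.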
    InputData = "lfn:/grid/VOname/mydir/testbed0-00019";
• lfn: logical file name
• RB uses Catalog to find replica locations
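The catalogue lookup can be pictured as a mapping from a logical file name to its replica locations. This is a toy sketch only: the `CATALOGUE` dictionary, the SE host names and the function `storage_elements_for` are invented for illustration and are not the LFC API.

```python
from urllib.parse import urlparse

# Toy stand-in for a file catalogue: logical file name -> replica locations.
# The replica URLs and host names are made up for this example.
CATALOGUE = {
    "lfn:/grid/VOname/mydir/testbed0-00019": [
        "srm://se1.example.org/data/testbed0-00019",
        "srm://se2.example.org/data/testbed0-00019",
    ],
}


def storage_elements_for(lfn, catalogue=CATALOGUE):
    """Return the sorted host names of the SEs holding a replica of `lfn`."""
    replicas = catalogue.get(lfn, [])
    return sorted({urlparse(replica).hostname for replica in replicas})


if __name__ == "__main__":
    print(storage_elements_for("lfn:/grid/VOname/mydir/testbed0-00019"))
```

A broker could use such a list to prefer Computing Elements near the data; how the real RB weighs this is not covered here.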
    InputData = "lfn:testbed0-00019";
    Requirements = other.Architecture=="INTEL" &&
                   other.OpSys=="LINUX" && other.FreeCpus >=4;
    Rank = "other.GlueHostBenchmarkSF00";
• Uses BDII Information System
Job submission
[Diagram: UI, Network Server, Job Controller - CondorG, Workload Manager, LFC, Information Service, Computing Element, Storage Element, RB node; labels: CE characteristics & status, SE characteristics & status]

UI: allows users to access the functionalities of the WMS (via command line, GUI, C++ and Java APIs)
WMS: Workload Management System

Job Description Language (JDL) to specify job characteristics and requirements
    edg-job-submit myjob.jdl
myjob.jdl:
    JobType = "Normal";
    Executable = "$(CMS)/exe/sum.exe";
    InputSandbox = {"/home/user/WP1testC", "/home/file*", "/home/user/DATA/*"};
    OutputSandbox = {"sim.err", "test.out", "sim.log"};
    Requirements = other.GlueHostOperatingSystemName == "linux" &&
                   other.GlueHostOperatingSystemRelease == "Red Hat 7.3" &&
                   other.GlueCEPolicyMaxCPUTime > 10000;
    Rank = other.GlueCEStateFreeCPUs;
Job Status: submitted

NS: network daemon responsible for accepting incoming requests
[Diagram adds: RB storage; Job; Input Sandbox files]
Job Status: submitted, waiting

WM: responsible for taking the appropriate actions to satisfy the request
Job Status: submitted, waiting

Match-Maker/Broker: Where must this job be executed?
Job Status: submitted, waiting

Matchmaker: responsible for finding the “best” CE to which to submit a job
Job Status: submitted, waiting
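As an illustration of the matchmaking idea, the sketch below filters a list of Computing Elements with a requirements test and orders the survivors by rank, highest first. It is a simplified model, not the WMS implementation: the CE records, their values and the function names are assumptions; only the Glue attribute names are taken from the JDL examples.

```python
def match(ces, requirements, rank):
    """Return the CEs that satisfy `requirements`, best `rank` first."""
    candidates = [ce for ce in ces if requirements(ce)]
    return sorted(candidates, key=rank, reverse=True)


def requirements(ce):
    """Python equivalent of a JDL Requirements expression (simplified)."""
    return (ce["GlueHostOperatingSystemName"] == "linux"
            and ce["GlueCEPolicyMaxCPUTime"] > 10000)


def rank(ce):
    """Python equivalent of 'Rank = other.GlueCEStateFreeCPUs;'."""
    return ce["GlueCEStateFreeCPUs"]


# Invented CE descriptions, standing in for what an information service publishes.
CES = [
    {"id": "ce-a", "GlueHostOperatingSystemName": "linux",
     "GlueCEPolicyMaxCPUTime": 20000, "GlueCEStateFreeCPUs": 4},
    {"id": "ce-b", "GlueHostOperatingSystemName": "linux",
     "GlueCEPolicyMaxCPUTime": 5000, "GlueCEStateFreeCPUs": 50},
    {"id": "ce-c", "GlueHostOperatingSystemName": "linux",
     "GlueCEPolicyMaxCPUTime": 40000, "GlueCEStateFreeCPUs": 12},
]

if __name__ == "__main__":
    for ce in match(CES, requirements, rank):
        print(ce["id"], rank(ce))
```

Here ce-b is rejected by the requirements despite having the most free CPUs, and ce-c is chosen ahead of ce-a because it ranks higher.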

Match-Maker/Broker:
  Where are the needed data (on which SEs)?
  What is the status of the Grid?
Job Status: submitted, waiting

Match-Maker/Broker: CE choice
Job Status: submitted, waiting

JA (Job Adapter): responsible for the final “touches” to the job before performing submission (e.g. creation of wrapper script, etc.)
Job Status: submitted, waiting

JC: responsible for the actual job management operations (done via CondorG)
Job Status: submitted, waiting, ready

[Diagram: Job; Input Sandbox files]
Job Status: submitted, waiting, ready, scheduled

“Grid enabled” data transfers/accesses
[Diagram: Job; Input Sandbox]
Job Status: submitted, waiting, ready, scheduled, running

[Diagram: Output Sandbox files]
Job Status: submitted, waiting, ready, scheduled, running, done

    edg-job-get-output <dg-job-id>
[Diagram: Output Sandbox]
Job Status: submitted, waiting, ready, scheduled, running, done

[Diagram: Output Sandbox files]
Job Status: submitted, waiting, ready, scheduled, running, done, cleared

Job monitoring
    edg-job-status <dg-job-id>
    edg-job-get-logging-info <dg-job-id>
LM (Log Monitor): parses the CondorG log file (where CondorG logs info about jobs) and notifies LB
LB (Logging & Bookkeeping): receives and stores job events; processes corresponding job status
[Diagram: UI, Log Monitor, Logging & Bookkeeping, Network Server, Job Controller - CondorG, Workload Manager, Computing Element, RB node; labels: Log of job events, Job status]
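The bookkeeping role described above (store events, derive the current status) can be sketched as follows. This is a simplified model under stated assumptions, not the LB service: the class `Bookkeeping`, its methods and the event format (a timestamp plus a state name) are invented for illustration.

```python
from collections import defaultdict


class Bookkeeping:
    """Toy event store: keeps job events and derives the current status."""

    def __init__(self):
        self._events = defaultdict(list)   # job id -> [(timestamp, state), ...]

    def log_event(self, job_id, timestamp, state):
        """Record one event, e.g. as a log monitor would after parsing a log."""
        self._events[job_id].append((timestamp, state))

    def status(self, job_id):
        """Current status = state carried by the most recent event, or None."""
        events = self._events.get(job_id)
        if not events:
            return None
        return max(events, key=lambda event: event[0])[1]

    def logging_info(self, job_id):
        """Full event history in time order."""
        return sorted(self._events.get(job_id, []))


if __name__ == "__main__":
    lb = Bookkeeping()
    lb.log_event("job-1", 10, "SUBMITTED")
    lb.log_event("job-1", 30, "READY")
    lb.log_event("job-1", 20, "WAIT")      # events may arrive out of order
    print(lb.status("job-1"), lb.logging_info("job-1"))
```

In this model `status` plays the part of edg-job-status and `logging_info` that of edg-job-get-logging-info.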
Possible job states
Flag        Meaning
SUBMITTED   submission logged in the LB
WAIT        job match making for resources
READY       job being sent to executing CE
SCHEDULED   job scheduled in the CE queue manager
RUNNING     job executing on a WN of the selected CE queue
DONE        job terminated without grid errors
CLEARED     job output retrieved
ABORT       job aborted by middleware, check reason
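The states in the table can be modelled as a small state machine. The sketch below is illustrative only: the normal path follows the order shown in the job submission walkthrough, while the rule that ABORT can be reached from any state before DONE is an assumption made for this example, not something the slides state.

```python
from enum import Enum


class JobState(Enum):
    SUBMITTED = "submission logged in the LB"
    WAIT = "job match making for resources"
    READY = "job being sent to executing CE"
    SCHEDULED = "job scheduled in the CE queue manager"
    RUNNING = "job executing on a WN of the selected CE queue"
    DONE = "job terminated without grid errors"
    CLEARED = "job output retrieved"
    ABORT = "job aborted by middleware, check reason"


# Order in which a successful job passes through the states.
NORMAL_PATH = [
    JobState.SUBMITTED, JobState.WAIT, JobState.READY, JobState.SCHEDULED,
    JobState.RUNNING, JobState.DONE, JobState.CLEARED,
]


def is_final(state):
    """True when no further state change is expected."""
    return state in (JobState.CLEARED, JobState.ABORT)


def can_follow(current, new):
    """Check whether `new` may directly follow `current` in this model."""
    if is_final(current):
        return False
    if new is JobState.ABORT:
        # Assumption: an abort is possible any time before the job is DONE.
        return current is not JobState.DONE
    return NORMAL_PATH.index(new) == NORMAL_PATH.index(current) + 1


if __name__ == "__main__":
    for state in JobState:
        print(f"{state.name:<10} {state.value}")
```

A monitoring script could poll the job status until `is_final` holds, treating DONE as the signal that the output can be retrieved.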