The Grid Job Monitoring Service
Luděk Matyska et al.
CESNET, z.s.p.o.
Prague, Czech Republic
Mar 27, 2015
Motivation
Job tracking
– Too complex environment
– Responsibility delegation
– Independent decisions by components
– Security issues (only delegated contact)
Parallel and multipart jobs
– Too many sub-tasks
– View aggregation
Job Movement
The Logging and Bookkeeping Service
Collects events associated with job life, e.g.:
– Job submitted
– Resource found
– Job started on a CE (Computing Element)
– Job finished its computation
Stores them in bookkeeping and logging databases
Provides the job state to end users
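The collect, store, and report cycle above can be sketched in Python. This is a minimal in-memory model, not the LB implementation: the event kinds, the state names in `STATE_AFTER`, and the class `LoggingAndBookkeeping` are hypothetical, and ordering events by the producer's timestamp assumes roughly synchronized clocks. What it illustrates is that events reach the service from several components, possibly out of order, so the reported state is derived from the time-ordered events rather than from arrival order.

```python
from dataclasses import dataclass
from typing import Dict, List

# Hypothetical event kinds modeled on the slide's examples, mapped to the
# (also hypothetical) job state a user would see after each one.
STATE_AFTER = {
    "Submitted": "SUBMITTED",
    "ResourceFound": "WAITING",
    "Started": "RUNNING",
    "Finished": "DONE",
}


@dataclass(frozen=True)
class Event:
    job_id: str
    kind: str         # one of the STATE_AFTER keys
    timestamp: float  # set by the producing component, not by the service


class LoggingAndBookkeeping:
    def __init__(self) -> None:
        self.bookkeeping: Dict[str, List[Event]] = {}  # per-job view for state queries
        self.logging: List[Event] = []                  # raw archive of every event

    def collect(self, event: Event) -> None:
        # Store every incoming event in both databases; arrival order is not trusted.
        self.bookkeeping.setdefault(event.job_id, []).append(event)
        self.logging.append(event)

    def state(self, job_id: str) -> str:
        # Sort by the producer's timestamp and report the state implied by the latest event.
        events = sorted(self.bookkeeping.get(job_id, []), key=lambda e: e.timestamp)
        return STATE_AFTER[events[-1].kind] if events else "UNKNOWN"


if __name__ == "__main__":
    lb = LoggingAndBookkeeping()
    lb.collect(Event("job-1", "Started", 30.0))  # arrives before the earlier events
    lb.collect(Event("job-1", "Submitted", 10.0))
    lb.collect(Event("job-1", "ResourceFound", 20.0))
    print(lb.state("job-1"))  # RUNNING
```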
Job Life Cycle
LB service architecture
Two APIs
– logging API
– server API
Local logger service
The database servers
Architecture: Schema
Architecture: Comments
Message format:
– ULM-based (Universal Logger Message format, as used by NetLogger)
– Semantic rules prescribed
Local logger service:
– locallogger daemon
– interlogger daemon
– local persistence (local disk file)
Data transfer to database servers:
– Bookkeeping server: persistent during the job lifetime
– Logging server: “eternally” persistent
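The local logger service works as store-and-forward: an event is written to the local disk file before anything is sent, so a network or server outage does not lose it. Below is a minimal sketch of that split between the two daemons, assuming a one-JSON-object-per-line spool file (the real message format is ULM-based) and a caller-supplied `send` function that stands in for delivery to the bookkeeping and logging servers. `LocalLogger` and `InterLogger` are hypothetical names, and this single-threaded code ignores the locking that real daemons sharing one file would need.

```python
import json
import os
import tempfile
from typing import Callable, List


class LocalLogger:
    """Accepts events from the logging API and persists them locally first."""

    def __init__(self, spool_path: str) -> None:
        self.spool_path = spool_path

    def accept(self, event: dict) -> None:
        # Append and force the line to disk before returning, so the event
        # survives a daemon crash or an unreachable server.
        with open(self.spool_path, "a", encoding="utf-8") as f:
            f.write(json.dumps(event) + "\n")
            f.flush()
            os.fsync(f.fileno())


class InterLogger:
    """Forwards spooled events and removes them only after successful delivery."""

    def __init__(self, spool_path: str, send: Callable[[dict], bool]) -> None:
        self.spool_path = spool_path
        self.send = send  # returns True when the server acknowledged the event

    def flush(self) -> int:
        if not os.path.exists(self.spool_path):
            return 0
        with open(self.spool_path, encoding="utf-8") as f:
            pending = [json.loads(line) for line in f if line.strip()]
        undelivered: List[dict] = []
        delivered = 0
        for event in pending:
            # After the first failure nothing later is sent, preserving order.
            if not undelivered and self.send(event):
                delivered += 1
            else:
                undelivered.append(event)
        # Rewrite the spool with only the events that still need delivery.
        with open(self.spool_path, "w", encoding="utf-8") as f:
            f.writelines(json.dumps(e) + "\n" for e in undelivered)
        return delivered


if __name__ == "__main__":
    spool = os.path.join(tempfile.mkdtemp(), "events.log")
    LocalLogger(spool).accept({"job": "job-1", "event": "Submitted"})
    received: List[dict] = []
    delivered = InterLogger(spool, lambda e: received.append(e) or True).flush()
    print(delivered, received)
```

Stopping at the first failed delivery keeps events in the order they were logged, and an event leaves the spool only once `send` has reported success.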
Logging API
Simple
Just one function: dg_log_event()
Always stores date/time, event producer, jobID
Authenticated
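A sketch of what a logging call of this kind could produce, assuming ULM-style `KEY=value` fields separated by spaces. The function `log_event` is a hypothetical stand-in and does not reproduce the signature of `dg_log_event()`; `DATE`, `HOST` and `PROG` are field names from the ULM convention, while `JOBID` and `EVENT` are assumptions. Authentication and delivery to the local logger are left out; the sketch only shows that date/time, producer, and jobID are filled in on every call and cannot be overridden by the caller.

```python
import socket
from datetime import datetime, timezone
from typing import Dict, Optional


def _quote(value: str) -> str:
    # Quote values that contain whitespace or quotes (or are empty) so the
    # space-separated line can still be split back into fields.
    if value == "" or '"' in value or any(c.isspace() for c in value):
        return '"' + value.replace("\\", "\\\\").replace('"', '\\"') + '"'
    return value


def log_event(job_id: str, producer: str, event: str,
              extra: Optional[Dict[str, str]] = None,
              now: Optional[datetime] = None) -> str:
    """Format one ULM-style event line (hypothetical helper, not dg_log_event)."""
    now = now or datetime.now(timezone.utc)
    fields = {
        "DATE": now.strftime("%Y%m%d%H%M%S.%f"),  # always stamped
        "HOST": socket.gethostname(),
        "PROG": producer,                         # the event producer
        "JOBID": job_id,                          # always present
        "EVENT": event,
    }
    for key, value in (extra or {}).items():
        name = key.upper()
        if name in fields:
            raise ValueError(f"{name} is set by log_event itself")
        fields[name] = value
    return " ".join(f"{k}={_quote(v)}" for k, v in fields.items())


if __name__ == "__main__":
    print(log_event("https://lb.example.org:9000/aBc123", "UI", "Submitted",
                    {"note": "first try"}))
```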
Server API
State computed on demand
Three core functions:
– List of a user’s jobs
– Job status for a given job
– List of events related to a given job
Authenticated
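Because the state is computed on demand, the server only has to keep raw events and can derive a job's status at query time. A minimal sketch of the three core queries under that assumption; `BookkeepingServer`, its method names, and the shape of the returned status are hypothetical, and the `user` argument stands for an identity that authentication has already established.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass(frozen=True)
class Event:
    job_id: str
    kind: str
    timestamp: float


class BookkeepingServer:
    """Stores raw events only; no job state is kept between queries."""

    def __init__(self) -> None:
        self._owner: Dict[str, str] = {}           # job_id -> user who logged first
        self._events: Dict[str, List[Event]] = {}  # job_id -> events as received

    def store(self, user: str, event: Event) -> None:
        self._owner.setdefault(event.job_id, user)
        self._events.setdefault(event.job_id, []).append(event)

    def user_jobs(self, user: str) -> List[str]:
        """Query 1: the list of a user's jobs."""
        return sorted(job for job, owner in self._owner.items() if owner == user)

    def job_events(self, job_id: str) -> List[Event]:
        """Query 3: the events related to a job, in time order."""
        return sorted(self._events.get(job_id, []), key=lambda e: e.timestamp)

    def job_status(self, job_id: str) -> Optional[Dict[str, object]]:
        """Query 2: the status of a job, recomputed from its events on every call."""
        events = self.job_events(job_id)
        if not events:
            return None
        latest = events[-1]
        return {"last_event": latest.kind, "updated": latest.timestamp,
                "event_count": len(events)}


if __name__ == "__main__":
    srv = BookkeepingServer()
    srv.store("alice", Event("job-1", "Submitted", 1.0))
    srv.store("alice", Event("job-1", "Started", 2.0))
    print(srv.user_jobs("alice"), srv.job_status("job-1"))
```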
Job Identification
GRID-wide (global) identifier
Used to identify the appropriate bookkeeping server
– Currently “wired in”
– In the future, probably via the Information service
URL-like syntax:
https://hostname:port/unique_string?...
unique_string: distinguishes individual jobs
Bookkeeping server “speaks” the HTTPS protocol
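Since the identifier names its own bookkeeping server, a client can find where to send queries just by parsing the jobID. A minimal sketch using Python's `urllib.parse.urlsplit`; `JobId`, `parse_job_id` and `new_job_id` are hypothetical helpers, generating `unique_string` with `secrets.token_urlsafe` is an assumption, and the `?...` part of the syntax is ignored because its contents are not specified here. The hostname and port in the example are made up.

```python
import secrets
from dataclasses import dataclass
from urllib.parse import urlsplit


@dataclass(frozen=True)
class JobId:
    host: str
    port: int
    unique: str

    @property
    def bookkeeping_server(self) -> str:
        # The server to contact is "wired in" to the identifier itself.
        return f"https://{self.host}:{self.port}"


def new_job_id(host: str, port: int) -> str:
    """Hypothetical generator: a random URL-safe string keeps job IDs distinct."""
    return f"https://{host}:{port}/{secrets.token_urlsafe(16)}"


def parse_job_id(job_id: str) -> JobId:
    parts = urlsplit(job_id)
    # Reading .port raises ValueError by itself if the port is not a valid number.
    if parts.scheme != "https" or not parts.hostname or parts.port is None:
        raise ValueError(f"expected https://hostname:port/unique_string, got {job_id!r}")
    unique = parts.path.lstrip("/")
    if not unique:
        raise ValueError(f"missing unique_string in {job_id!r}")
    return JobId(parts.hostname, parts.port, unique)


if __name__ == "__main__":
    job = parse_job_id(new_job_id("lb.example.org", 9000))
    print(job.bookkeeping_server)  # https://lb.example.org:9000
```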
Security Considerations
Authentication
– Both for logging and database queries
– Certificate-based (user and/or host/service)
– User associated with jobID on first authenticated event
Secure channels
Storage (database) access
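The rule that a job belongs to whoever sent its first authenticated event can be sketched as a small registry. This assumes certificate verification has already happened (for example during the TLS handshake of the secure channel) and that the verified certificate subject arrives as a string; `JobOwnership` and the subject strings in the test are hypothetical. The owner-only query check is likewise an assumed policy for the sketch, not a documented LB authorization rule. Later events, such as those logged by services with host certificates, do not change the owner.

```python
from typing import Dict, Optional


class JobOwnership:
    """Binds each jobID to the certificate subject of its first authenticated event."""

    def __init__(self) -> None:
        self._owner: Dict[str, str] = {}

    def on_event(self, job_id: str, subject: Optional[str]) -> None:
        # subject is None when the sender could not be authenticated.
        if subject is None:
            raise PermissionError(f"unauthenticated event for {job_id} rejected")
        # setdefault keeps the first subject; later senders never replace it.
        self._owner.setdefault(job_id, subject)

    def owner(self, job_id: str) -> Optional[str]:
        return self._owner.get(job_id)

    def may_query(self, job_id: str, subject: Optional[str]) -> bool:
        # Assumed policy for this sketch: only the authenticated owner may query.
        return subject is not None and self._owner.get(job_id) == subject
```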
R-GMA Integration
Work in progress
The goals:
– To lower database load
– To provide a notification service
– To allow better integration with other information services
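One way a notification service can lower database load is by replacing repeated status polling with pushes that happen only when a job's state changes. The sketch below shows that idea as a generic publish/subscribe object; it is not the R-GMA producer/consumer interface, and `StateNotifier` with its methods is hypothetical.

```python
from collections import defaultdict
from typing import Callable, DefaultDict, Dict, List

Listener = Callable[[str, str], None]  # called as listener(job_id, new_state)


class StateNotifier:
    def __init__(self) -> None:
        self._listeners: DefaultDict[str, List[Listener]] = defaultdict(list)
        self._last_state: Dict[str, str] = {}

    def subscribe(self, job_id: str, listener: Listener) -> None:
        self._listeners[job_id].append(listener)

    def publish(self, job_id: str, state: str) -> int:
        """Push a state change to subscribers; return how many were notified."""
        if self._last_state.get(job_id) == state:
            return 0  # unchanged state: no notification is sent
        self._last_state[job_id] = state
        listeners = self._listeners.get(job_id, [])
        for listener in listeners:
            listener(job_id, state)
        return len(listeners)


if __name__ == "__main__":
    notifier = StateNotifier()
    notifier.subscribe("job-1", lambda job, state: print(job, "->", state))
    notifier.publish("job-1", "RUNNING")
    notifier.publish("job-1", "RUNNING")  # suppressed
    notifier.publish("job-1", "DONE")
```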
R-GMA: First Extension
LB Service Extensions
User-defined attributes
– To store additional information associated with a job
– To retrieve job collections
Synchronous API
Job checkpointing (at the application level)
– Information stored in the bookkeeping server
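User-defined attributes can be thought of as free-form name/value tags on a job, and a job collection as the set of jobs whose tags match a query. A minimal sketch under that assumption; `JobAnnotations`, its methods, and the attribute names in the test are hypothetical. The checkpoint methods assume an application-level checkpoint is an opaque string and that `save_checkpoint` returns only after the value is stored, which is one possible reading of why a synchronous API is listed next to checkpointing.

```python
from typing import Dict, List, Optional


class JobAnnotations:
    def __init__(self) -> None:
        self._attrs: Dict[str, Dict[str, str]] = {}   # job_id -> {name: value}
        self._checkpoints: Dict[str, List[str]] = {}  # job_id -> saved states, oldest first

    def set_attribute(self, job_id: str, name: str, value: str) -> None:
        self._attrs.setdefault(job_id, {})[name] = value

    def get_attribute(self, job_id: str, name: str) -> Optional[str]:
        return self._attrs.get(job_id, {}).get(name)

    def collection(self, criteria: Dict[str, str]) -> List[str]:
        """Jobs whose attributes match every name/value pair in criteria."""
        return sorted(
            job for job, attrs in self._attrs.items()
            if all(attrs.get(name) == value for name, value in criteria.items())
        )

    def save_checkpoint(self, job_id: str, state: str) -> int:
        """Synchronous: the checkpoint is stored before its sequence number is returned."""
        saved = self._checkpoints.setdefault(job_id, [])
        saved.append(state)
        return len(saved)

    def last_checkpoint(self, job_id: str) -> Optional[str]:
        saved = self._checkpoints.get(job_id)
        return saved[-1] if saved else None
```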
Job Partitioning
Group ID
– Job collections
– Hierarchical
Aggregate queries
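A hierarchical group ID can be modeled as a path, so that a collection contains its sub-collections and an aggregate query summarizes the whole subtree. A minimal sketch assuming `/`-separated group paths; `JobGroups`, the path syntax, and the state names are hypothetical, and the aggregate shown is a simple count of jobs per state.

```python
from collections import Counter
from typing import Dict


class JobGroups:
    def __init__(self) -> None:
        self._group: Dict[str, str] = {}  # job_id -> group path, e.g. "exp/run7"
        self._state: Dict[str, str] = {}  # job_id -> current state

    def add(self, job_id: str, group: str, state: str = "SUBMITTED") -> None:
        self._group[job_id] = group.strip("/")
        self._state[job_id] = state

    def set_state(self, job_id: str, state: str) -> None:
        self._state[job_id] = state

    def aggregate(self, group: str) -> Counter:
        """Count jobs per state in `group` and all of its subgroups."""
        group = group.strip("/")
        prefix = group + "/"
        # Match the group itself or true descendants ("exp" must not match "expX").
        return Counter(
            self._state[job]
            for job, path in self._group.items()
            if path == group or path.startswith(prefix)
        )
```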
Conclusion
LB service provides:
– Job tracking
– Persistent event storage
– Job state provision
Future work:
– (R-)GMA integration
– Authorization
– Collective operations
Thank you for your interest