The Grid Job Monitoring Service Luděk Matyska et al. CESNET, z.s.p.o. Prague Czech Republic.

Mar 27, 2015

Transcript
Page 1

The Grid Job Monitoring Service

Luděk Matyska et al.
CESNET, z.s.p.o.
Prague
Czech Republic

Page 2

Motivation

Job tracking
– Overly complex environment
– Responsibility delegation
– Independent decisions made by components
– Security issues (only delegated contact)

Parallel and multipart jobs
– Too many sub-tasks
– View aggregation

Page 3

Job Movement

Page 4

The Logging and Bookkeeping Service

Collects events associated with the life of a job, e.g.
– Job submitted
– Resource found
– Job started on a CE (Computing Element)
– Job finished its computation

Stores them in bookkeeping and logging databases

Provides the job state to end users

Page 5

Job Life Cycle

Page 6

LB service architecture

Two APIs
– logging API
– server API

Local logger service

The database servers

Page 7

Architecture: Schema

Page 8

Architecture: Comments

Message format
– ULM based (NetLogger)
– Semantic rules prescribed

Local logger service
– locallogger daemon
– interlogger daemon
– local persistency (local disk file)

Data transfer to database servers
– Bookkeeping server: persistent during the job lifetime
– Logging server: “eternally” persistent
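
The division of labour inside the local logger service can be illustrated with a small store-and-forward sketch. It is not the actual daemons: the class names, the one-event-per-line spool file, and the `send` callback are assumptions made for this example only.

```python
import os


class LocalLogger:
    """Hypothetical stand-in for the locallogger daemon: spools events to a local file."""

    def __init__(self, spool_path):
        self.spool_path = spool_path

    def accept(self, event_line):
        # Persist the event to disk before returning, so it survives a crash.
        with open(self.spool_path, "a", encoding="utf-8") as spool:
            spool.write(event_line.rstrip("\n") + "\n")
            spool.flush()
            os.fsync(spool.fileno())


class InterLogger:
    """Hypothetical stand-in for the interlogger daemon: forwards spooled events."""

    def __init__(self, spool_path, send):
        self.spool_path = spool_path
        self.send = send      # assumed callback: send(line) -> True on delivery
        self.delivered = 0    # kept in memory here; a real daemon would persist it

    def forward_pending(self):
        """Try to deliver every spooled event not yet delivered; return how many went out."""
        if not os.path.exists(self.spool_path):
            return 0
        with open(self.spool_path, encoding="utf-8") as spool:
            lines = spool.read().splitlines()
        sent_now = 0
        for line in lines[self.delivered:]:
            if not self.send(line):
                break         # server unreachable: keep the rest for a later retry
            self.delivered += 1
            sent_now += 1
        return sent_now
```

Because the event producer only talks to the local spool, it is not blocked when a database server is unreachable; delivery simply resumes on a later call to `forward_pending()`.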

Page 9

Logging API

Simple
Just one function: dg_log_event()
Always stores date/time, event producer, jobID
Authenticated
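
To make the "always stores date/time, event producer, jobID" rule concrete, the sketch below builds a single-line, key=value event record in the general style of ULM. It does not reproduce the signature of dg_log_event() or the prescribed semantic rules; the function name `format_event` and the field names `DATE`, `PROG`, `JOBID`, and `EVENT` are assumptions chosen for illustration.

```python
from datetime import datetime, timezone


def _ulm_value(value):
    # Quote values that contain spaces so the line stays splittable on whitespace.
    text = str(value)
    return f'"{text}"' if " " in text else text


def format_event(job_id, producer, event_type, when=None, **extra):
    """Return one key=value line; time, producer and job ID are always present."""
    when = when or datetime.now(timezone.utc)
    fields = {
        "DATE": when.strftime("%Y%m%d%H%M%S.%f"),
        "PROG": producer,
        "JOBID": job_id,
        "EVENT": event_type,
    }
    fields.update(extra)  # optional, event-specific fields
    return " ".join(f"{key}={_ulm_value(value)}" for key, value in fields.items())
```

A caller would pass only what is specific to the event, for example `format_event(job_id, "workload-manager", "JobSubmitted")`; the mandatory fields are filled in by the function itself.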

Page 10

Server API

State computed on-demand
Three core functions:
– List of user’s jobs
– Job status for a given job
– List of events related to a given job
Authenticated
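
The three core functions, and the idea that job state is derived from stored events only when asked for, can be sketched with an in-memory store. This is an illustration rather than the server's real interface: the class and method names, the event kinds, and the state names in `STATE_AFTER` are all assumptions.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Event:
    job_id: str
    owner: str
    kind: str   # assumed kinds: "submitted", "resource_found", "started", "finished"
    seq: int    # ordering key; a real service would rely on timestamps


# Hypothetical mapping from the latest event kind to a job state.
STATE_AFTER = {
    "submitted": "SUBMITTED",
    "resource_found": "SCHEDULED",
    "started": "RUNNING",
    "finished": "DONE",
}


class BookkeepingStore:
    def __init__(self):
        self._events = []

    def store(self, event):
        self._events.append(event)

    def list_jobs(self, owner):
        """List of a user's jobs."""
        return sorted({e.job_id for e in self._events if e.owner == owner})

    def job_events(self, job_id):
        """List of events related to a given job, in order."""
        return sorted((e for e in self._events if e.job_id == job_id),
                      key=lambda e: e.seq)

    def job_status(self, job_id):
        """Job status, computed on demand from the stored events."""
        events = self.job_events(job_id)
        if not events:
            return "UNKNOWN"
        return STATE_AFTER.get(events[-1].kind, "UNKNOWN")
```

Nothing but events is stored, so events that arrive out of order still produce a consistent state the next time `job_status()` is called.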

Page 11

Job Identification

GRID-wide (global) identifier
Used to identify the appropriate bookkeeping server
– Currently “wired in”
– In the future, probably via the Information service
URL-like syntax: https://hostname:port/unique_string?...
unique_string: distinguishes individual jobs
Bookkeeping server “speaks” the https protocol
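
Since the job identifier follows URL syntax, the bookkeeping server address and the per-job string can be pulled out with a standard URL parser. The helper below is a minimal sketch; the function name `parse_job_id` and the host name in the usage note are made up for the example, and whatever follows the `?` is left uninterpreted.

```python
from urllib.parse import urlsplit


def parse_job_id(job_id):
    """Split a job ID into (bookkeeping host, port, unique_string)."""
    parts = urlsplit(job_id)
    if parts.scheme != "https" or not parts.hostname or parts.port is None:
        raise ValueError(f"not a valid job ID: {job_id!r}")
    unique_string = parts.path.lstrip("/")
    if not unique_string:
        raise ValueError(f"job ID has no unique string: {job_id!r}")
    return parts.hostname, parts.port, unique_string
```

For a hypothetical identifier such as `https://lb.example.org:9000/abc123`, the client learns which server to contact (`lb.example.org`, port 9000) directly from the ID, which is what the “wired in” lookup amounts to.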

Page 12

Security Considerations

Authentication
– Both for logging and database queries
– Certificate based (user and/or host/service)
– User associated with jobID on first authenticated event
Secure channels
Storage (database) access
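
The rule that a user becomes associated with a jobID on the first authenticated event can be sketched as a first-writer-wins registry. The class below is hypothetical: it treats `subject` as the already verified user identity carried by the event, and the query policy in `may_query` (only the owner may ask) is an assumption, not something the slides state.

```python
class JobOwnership:
    """Hypothetical registry binding each job ID to the first authenticated user."""

    def __init__(self):
        self._owner = {}

    def record_event(self, job_id, subject):
        # The first authenticated event fixes the owner; later events do not change it.
        return self._owner.setdefault(job_id, subject)

    def may_query(self, job_id, subject):
        # Assumed policy: only the recorded owner may query the job.
        return self._owner.get(job_id) == subject
```

Certificate verification itself is out of scope here; the sketch starts at the point where the secure channel has already produced a trusted subject name.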

Page 13

R-GMA Integration

Work in progress
The goals:
– To lower database load
– To provide notification service
– To allow better integration with other information services

Page 14

R-GMA: First Extension

Page 15

LB Service Extensions

User defined attributes
– To store additional information associated with a job
– To retrieve job collections
Synchronous API
Job checkpointing (at the application level)
– Information stored in Bookkeeping server

Page 16

Job Partitioning

Group ID
– Job collections
– Hierarchical

Aggregate queries
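
One way to picture hierarchical group IDs and aggregate queries is to treat a group ID as a slash-separated path and count job states over a whole subtree. The path-like encoding, the function name `aggregate_states`, and the state names are assumptions for this sketch; the slides do not say how group IDs are represented.

```python
from collections import Counter


def aggregate_states(job_groups, job_states, group):
    """Count job states over `group` and every group nested beneath it.

    job_groups maps job ID -> group ID (assumed path-like, e.g. "analysis/run1").
    job_states maps job ID -> current state name.
    """
    root = group.rstrip("/")
    prefix = root + "/"
    counts = Counter()
    for job_id, job_group in job_groups.items():
        if job_group == root or job_group.startswith(prefix):
            counts[job_states[job_id]] += 1
    return dict(counts)
```

Matching on `root + "/"` rather than on the bare prefix keeps a sibling group such as `analysis2` out of a query for `analysis`.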

Page 17

Conclusion

LB service provides
– Job tracking
– Persistent event storage
– Job state provision

Future work
– (R-)GMA integration
– Authorization
– Collective operations

Page 18

Thank you for your interest