NOVA: Networked Object-based EnVironment for Analysis
P. Nevski, A. Vaniachine, T. Wenaus
NOVA is a project to develop distributed, object-oriented physics analysis components that can be adapted to different experiments. A job configuration manager uses a scripting interface to provide web-based editing, submission and cataloguing of analysis jobs, both user-level and experiment-wide, centrally managed in a database. A client/server system distributed over compute nodes provides job submission and monitoring across facilities, which may span several sites. A file catalog records the production relationships of data files generated by an experiment. NOVA provides database tools for storing geometry and parameter objects. A NOVA web-based browser navigates a relational database storing hierarchically structured dataObjects. Clients may access database information from their code or through a CORBA-specified interface. NOVA components have been tested and deployed in the STAR and ATLAS environments.
• Unprecedented data volume and software complexity in new large High Energy and Nuclear Physics experiments at RHIC (BNL) and LHC (CERN)
• New approaches to analysis and data handling software
• Distributed computing environment (DCE) is vital and increasingly powerful
• Experience in developing DCE solutions for STAR
• Build on experience to develop DCE tools for use in similarly challenging environments
February 7, 2000 CHEP in Padova
Goals
• Develop software tools for
  – coordination and control of widely distributed analysis development and physics analysis activity
  – distributed management and analysis of very large datasets
  – enhanced robustness, reusability and maintainability of analysis software
• For application in many global computing environments (ATLAS, STAR, …)
  – generic tools not tied to specific implementation choices
  – selected, templatable implementations provided so that NOVA components can be used in a baseline framework
Requirements
• Support wide-area, data-intensive analysis
• Define the middleware services required to permit analysis applications to run effectively over wide area networks
• Provide a rich set of features that applications can select and use to obtain the level of service they need to operate
• Define the features and APIs necessary to allow the application and middleware to communicate
• Integrate the middleware APIs with the applications
Design Approach
• Small, modular components; application-neutral interfaces
  – Can be used as a coherent framework or in isolation to extend existing analysis systems
• Focused on support for C++ based analysis
  – C++ is used by all RHIC, LHC and other large experiments
• Emphasis on user participation in iterative development; real-world prototyping and testing (STAR, ATLAS)
• Extensive use of existing tools and technologies
  – Must be readily available, true or de facto standards, well supported, and widely used or showing good growth
Component-based Architecture
[Figure: NOVA Architecture. Domains: Regional Center, Remote Clients, Middleware Components and Data Management, with Analysis Server and Remote Analysis. Components shown: Offline Control Framework, CVS Code Repository, Analysis Daemon, dynamically loaded apps, MySQL Analysis Catalogue, Monitoring Module, HyperNews Bug system, State Server, Mobile Analysis Client, Web browser, Visualisation, GCA Query, nanoDST, Data Repository, Grand Challenge Architecture (GCA), MySQL Data Catalogue, Catalog Interface, Client Data Binder Module, Server Data Binder Module, Parameters Repository, MySQL Client State DB, Web Server Database Navigator. Legend: application specific (sample implementation provided); NOVA component; third party tool customized for and integrated into NOVA; existing third party tool employed by NOVA. Status: planned, prototyped or implemented.]
Tools and Technologies
• Third party tools and technologies used in NOVA:
  – MySQL: relational database for catalogs, state information and simple objects (C-structs)
  – Perl: Unix scripting and web development tool
  – Apache: customizable (Perl & PHP) web server for communication and monitoring
  – CORBA: low-volume interprocess data exchange
  – ROOT: visualization and analysis tools
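As an illustration of storing simple objects (C-structs) in MySQL, the sketch below generates a `CREATE TABLE` statement from a struct description given as `(c_type, name, array_length)` triples. The C-to-MySQL type mapping, the backtick quoting and the choice to flatten numeric arrays into one column per element are assumptions made for this example, not NOVA's documented schema; `create_table_sql` is a hypothetical helper, and Python is used only for illustration.

```python
# Assumed mapping from C member types to MySQL column types (illustrative only).
MYSQL_TYPES = {
    "char": "CHAR(1)",
    "short": "SMALLINT",
    "int": "INT",
    "unsigned int": "INT UNSIGNED",
    "float": "FLOAT",
    "double": "DOUBLE",
}


def create_table_sql(struct_name: str, members: list[tuple[str, str, int]]) -> str:
    """Build a CREATE TABLE statement for a C-struct given as (c_type, name, array_length)."""
    columns = []
    for ctype, name, length in members:
        if ctype == "char" and length > 1:
            # A fixed-length C string becomes a fixed-length CHAR column.
            columns.append(f"  `{name}` CHAR({length})")
        elif length == 1:
            columns.append(f"  `{name}` {MYSQL_TYPES[ctype]}")
        else:
            # Numeric arrays are flattened into one column per element: name_0, name_1, ...
            columns.extend(f"  `{name}_{i}` {MYSQL_TYPES[ctype]}" for i in range(length))
    return f"CREATE TABLE `{struct_name}` (\n" + ",\n".join(columns) + "\n);"


if __name__ == "__main__":
    print(create_table_sql("Event", [("int", "run", 1), ("float", "energy", 1), ("char", "detector", 8)]))
```

With this layout each table row holds one struct instance, so an array of structures maps naturally onto a set of rows.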
Components
NOVA components fall into four domains:
  – Regional Center
    • Central management and execution of analysis
  – Remote Client
    • Mobile Analysis
  – Middleware Components
    • Data exchange and navigation tools
    • Client/Server object request brokerage
  – Data Management
    • Data repository, catalogue, and interface
    • Data model for simple objects (C-structs)
Dynamic Binding
• Problem:
  – A user has a new idea that was not foreseen at the beginning and modifies the structure of one object in his application. The application stores the new objects in the database.
  – Remote applications unaware of the new functionality may request objects in the old format.
• Solution:
  – Application: provides the metadata request (name, time, selectors...) and the application dataObject dictionary
  – Database server: provides the dataObject and its dictionary
  – Object Request Broker module: converts the dataObject according to the application dictionary
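A minimal sketch of the conversion step performed by the Object Request Broker module, assuming a dictionary is an ordered list of `(type, member_name)` pairs and that the broker matches members by name. The default values for missing members, the `TypeError` on a type mismatch and the name `broker_convert` are illustrative assumptions, not NOVA's actual behavior.

```python
from typing import Any

# A dataObject dictionary: ordered (type, member_name) pairs (assumed representation).
Dictionary = list[tuple[str, str]]

# Value given to a member the application expects but the stored object lacks (assumed policy).
DEFAULTS: dict[str, Any] = {"int": 0, "float": 0.0, "double": 0.0, "char*": ""}


def broker_convert(db_object: dict[str, Any], db_dict: Dictionary,
                   app_dict: Dictionary) -> dict[str, Any]:
    """Reshape a stored dataObject into the layout described by the application dictionary."""
    db_types = {name: ctype for ctype, name in db_dict}
    converted: dict[str, Any] = {}
    for app_type, name in app_dict:
        if name not in db_types:
            # Member unknown to the stored layout: fill in a default.
            converted[name] = DEFAULTS[app_type]
        elif db_types[name] != app_type:
            # Simple type check between the two dictionaries.
            raise TypeError(f"{name}: stored as {db_types[name]}, application expects {app_type}")
        else:
            converted[name] = db_object[name]
    # Members present only in the database dictionary are dropped.
    return converted


if __name__ == "__main__":
    new_layout = [("int", "run"), ("float", "energy"), ("char*", "detector")]
    old_layout = [("int", "run"), ("float", "energy")]
    stored = {"run": 7, "energy": 1.5, "detector": "TPC"}
    print(broker_convert(stored, new_layout, old_layout))  # {'run': 7, 'energy': 1.5}
```

Matching by name lets an old executable read objects written in a newer layout (extra members are dropped) and a newer executable read old objects (missing members receive defaults).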
Dynamic Object Broker
[Figure: Dynamic Object Broker. Layers: Remote Application Clients, Middleware Services (Object Request Broker), Central Database Server (Parameters Repository). Objects exchanged: ApplicationDataObject, DatabaseDataObject, DatabaseDictionary, ApplicationDictionary.]
Forward Compatibility
• Benefits:
  – Separation of database and analysis applications
  – Robust interface (via built-in type checking)
  – Dictionary built from C header files or IDL files
  – Database access is independent of the application code version: a user can read new dataObjects with an old executable
  – The Object Request Broker at the Regional Center serves dynamic HTML dataObjects in a format tailored to the application ID (Netscape or MS Internet Explorer)
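The slide notes that the dictionary can be built from C header files. A minimal sketch of that step, assuming simple headers with one member declaration per line and no pointers, nested structs, typedefs or multiple declarators, can get by with regular expressions; `dictionary_from_header` is a hypothetical name, and a real tool would need a proper C or IDL parser.

```python
import re

# "struct Name { ... };" blocks; re.S lets the body span several lines.
STRUCT = re.compile(r"struct\s+(\w+)\s*\{(.*?)\}\s*;", re.S)
# Simple members such as "int run;", "unsigned int n;" or "char name[32];".
MEMBER = re.compile(r"^\s*([A-Za-z_][\w ]*?)\s+([A-Za-z_]\w*)\s*(?:\[(\d+)\])?\s*;")


def dictionary_from_header(text: str) -> dict[str, list[tuple[str, str, int]]]:
    """Return {struct_name: [(c_type, member_name, array_length), ...]} for each struct."""
    dictionaries = {}
    for struct_name, body in STRUCT.findall(text):
        members = []
        for line in body.splitlines():
            match = MEMBER.match(line)
            if match:  # lines that are not simple declarations are skipped
                ctype, name, length = match.groups()
                members.append((ctype, name, int(length) if length else 1))
        dictionaries[struct_name] = members
    return dictionaries


if __name__ == "__main__":
    header = """
    struct Event {
      int run;
      unsigned int nTracks;
      char detector[8];
    };
    """
    print(dictionary_from_header(header))
    # {'Event': [('int', 'run', 1), ('unsigned int', 'nTracks', 1), ('char', 'detector', 8)]}
```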
Static Object Broker
[Figure: components: Remote Application Client, NOVA Browser, Middleware Services, DatabaseAPI Module, Apache WebServer, Regional Center Database Server, Parameters Repository. Exchanged: DatabaseAPI Call, ApplicationID, DatabaseDataObject, ApplicationDataObject.]
Layered Interface
Data Model
[Figure: data model elements: structure, relation, parameter; array of structures; array of parameters.]
Cataloguing Analysis Workflow
[Figure: job configuration manager, job monitoring system and fileCatalog.]
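To illustrate what recording the production relationships of data files can mean, the sketch below keeps, for each file, the files it was produced from and the job that produced it, and walks those links to recover a file's full ancestry. The class, its methods and the file names below are hypothetical; a production catalog would normally be kept in a database rather than in memory.

```python
from collections import defaultdict


class FileCatalog:
    """Hypothetical in-memory catalog of which files each file was produced from."""

    def __init__(self) -> None:
        self.parents = defaultdict(set)  # file -> set of direct input files
        self.producer = {}               # file -> name of the job that wrote it

    def register(self, output: str, inputs: list[str], job: str) -> None:
        """Record that `job` read `inputs` and wrote `output`."""
        self.producer[output] = job
        self.parents[output].update(inputs)

    def ancestry(self, name: str) -> set[str]:
        """Return every file `name` was derived from, directly or indirectly."""
        seen: set[str] = set()
        stack = [name]
        while stack:
            # .get() avoids inserting empty entries into the defaultdict.
            for parent in self.parents.get(stack.pop(), ()):
                if parent not in seen:
                    seen.add(parent)
                    stack.append(parent)
        return seen
```

For example, after registering `run7.dst` as produced from `run7.daq` and `run7.nanoDST` as produced from `run7.dst`, `ancestry("run7.nanoDST")` returns the set `{"run7.dst", "run7.daq"}`.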
Grand Challenge Interface
[Figure: GCA Interface and STAR Components. Labels: GC System, QueryMonitor, QueryEstimator, CacheManager, IndexBuilder, IndexFeeder, FileCatalog, gcaClient, StIOMaker, tagDB, fileCatalog, database.]
Limiting Dependencies
Experiment-specific
• IndexFeeder server
  – IndexFeeder reads the “tag database” so that the GCA “index builder” can create an index
• FileCatalog server
  – FileCatalog queries the experiment's “file catalog” database to translate a fileID into an HPSS and disk path
Experiment-specific & GCA-dependent
• gcaClient interface
  – The experiment sends queries and gets back filenames through gcaClient library calls
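The aim of this split is that GCA-side code depends on the experiment only through narrow interfaces. A minimal sketch of that boundary for the FileCatalog server uses a `typing.Protocol` as the interface: the GCA side calls only `translate(file_id)`, and the experiment supplies the implementation. All names and paths, the dictionary-backed catalog and the rule of preferring a staged disk copy over the HPSS path are assumptions for illustration.

```python
from dataclasses import dataclass
from typing import Optional, Protocol


@dataclass
class FileLocation:
    hpss_path: str                   # location in HPSS mass storage
    disk_path: Optional[str] = None  # staged disk copy, if one exists


class FileCatalog(Protocol):
    """The only thing the GCA side needs from the experiment (hypothetical interface)."""

    def translate(self, file_id: int) -> FileLocation: ...


class DictFileCatalog:
    """Experiment-specific implementation; a dict stands in for the experiment's database."""

    def __init__(self, table: dict[int, FileLocation]) -> None:
        self.table = table

    def translate(self, file_id: int) -> FileLocation:
        return self.table[file_id]


def filenames_for(file_ids: list[int], catalog: FileCatalog) -> list[str]:
    """GCA-side code: uses only the FileCatalog protocol, preferring a staged disk copy."""
    return [loc.disk_path or loc.hpss_path for loc in map(catalog.translate, file_ids)]
```

Swapping experiments then means supplying a different `translate` implementation; `filenames_for` is unchanged.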
Summary
What is NOVA?
• Framework components for distributed computing
What are NOVA components?
• Configuration manager for analysis jobs
• Distributed job submission and monitoring system
• Analysis workflow catalog
• Database for versioned dataObjects
• Brokered extraction of dataObjects
• Web-based database navigation tool