Tony Doyle - University of Glasgow
UK
Grid Data Management
Introduction: Physics Analysis, Data Hierarchy, GRID Services, Virtual Data Scenario
GRID Data Management: Service Graph
Development Tools: Unified Modelling Language, Compiler Efficiency, Database Access Benchmark, System Monitoring Prototype
Example analysis scenario:
Physicist issues a query from Athena for a Monte Carlo dataset
Issues: How expressive is this query? What is the nature of the query (declarative)? Creating new queries and language
Algorithms are already available in local shared libraries
An Athena service consults an ATLAS Virtual Data Catalog
Consider possibilities:
TAG file exists on local machine (e.g. Glasgow): analyze it
ESD file exists in a remote store (e.g. Edinburgh): access relevant event files, then analyze them
RAW file no longer exists (e.g. RAL): regenerate, re-reconstruct, re-analyze!
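The three possibilities above can be read as a fallback chain over the data tiers. The following is a minimal sketch of that reading, not the Athena or Virtual Data Catalog API: the `Replica` record and `plan_analysis` function are hypothetical names, and the branch for a RAW file that still exists is an assumption, since the slide does not cover that case.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Replica:
    """One hypothetical catalog record for a data tier of the requested dataset."""
    tier: str      # "TAG", "ESD" or "RAW"
    site: str      # e.g. "Glasgow", "Edinburgh", "RAL"
    exists: bool   # False if the file has been deleted


def plan_analysis(replicas, local_site):
    """Return the ordered list of steps for the cheapest available option."""
    # Keep only tiers whose file still exists (last record per tier wins).
    available = {r.tier: r for r in replicas if r.exists}

    tag = available.get("TAG")
    if tag is not None and tag.site == local_site:
        return ["analyze TAG"]

    esd = available.get("ESD")
    if esd is not None:
        return [f"access ESD event files at {esd.site}", "analyze ESD"]

    if "RAW" in available:
        # Assumption, not on the slide: existing RAW only needs reconstruction.
        return ["re-reconstruct", "re-analyze"]

    # RAW no longer exists: the full chain has to be rerun.
    return ["regenerate", "re-reconstruct", "re-analyze"]


if __name__ == "__main__":
    catalog = [
        Replica("TAG", "Glasgow", True),
        Replica("ESD", "Edinburgh", True),
        Replica("RAW", "RAL", False),
    ]
    print(plan_analysis(catalog, local_site="Glasgow"))
```

Dropping the TAG record from `catalog` makes the plan fall back to the remote ESD; dropping the ESD record as well forces the full regeneration chain.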
GRID Data Management
Goal: develop middleware infrastructure to manage petabyte-scale data
[Diagram: service architecture]
Components: Replica Manager, Data Mover, Data Accessor, Storage Manager, Data Locator, Meta Data Manager, Query Optimisation & Access Pattern Management
Storage systems: Castor, HPSS, Local Filesystem
Secure Region
Levels: High Level Services, Medium Level Services, Core Services
Service levels reasonably well defined
Identify key areas within software structure
Identifying Key Areas
5 areas for development
Data Accessor - hides specific storage system requirements. Mass Storage Management group.
Replication - improves access by wide-area caching. Globus toolkit offers sockets and a communication library, Nexus.
Meta Data Management - data catalogues, monitoring information (e.g. access pattern), grid configuration information, policies. MySQL over Lightweight Directory Access Protocol (LDAP) being investigated.
Security - ensuring consistent levels of security for data and meta data.
Query optimisation - "cost" minimisation based on response time and throughput. Monitoring Services group.
Identifiable UK contributions: RAL
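To illustrate what "hides specific storage system requirements" could mean for a Data Accessor, here is a minimal sketch in which callers read by URL and never touch a storage system directly. All class names and the `scheme://name` convention are hypothetical; the in-memory class only stands in for a mass storage system such as Castor or HPSS and does not use their real interfaces.

```python
import os
from abc import ABC, abstractmethod


class StorageBackend(ABC):
    """Hypothetical common interface that every storage system must offer."""

    @abstractmethod
    def read(self, name: str) -> bytes:
        ...


class LocalFilesystemBackend(StorageBackend):
    def __init__(self, root: str):
        self.root = root

    def read(self, name: str) -> bytes:
        with open(os.path.join(self.root, name), "rb") as handle:
            return handle.read()


class InMemoryBackend(StorageBackend):
    """Stand-in for a mass storage system; holds data in a dict."""

    def __init__(self, blobs: dict):
        self.blobs = dict(blobs)

    def read(self, name: str) -> bytes:
        return self.blobs[name]


class DataAccessor:
    """Routes 'scheme://name' requests to whichever backend is registered."""

    def __init__(self):
        self._backends = {}

    def register(self, scheme: str, backend: StorageBackend) -> None:
        self._backends[scheme] = backend

    def read(self, url: str) -> bytes:
        scheme, sep, name = url.partition("://")
        if not sep or scheme not in self._backends:
            raise ValueError(f"no backend registered for {url!r}")
        return self._backends[scheme].read(name)


if __name__ == "__main__":
    accessor = DataAccessor()
    accessor.register("mss", InMemoryBackend({"run1.esd": b"events"}))
    print(accessor.read("mss://run1.esd"))
```

Adding a new storage system then means registering one more backend; analysis code that calls `DataAccessor.read` stays unchanged.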
Identifying Key Areas
4 tasks defined in current UK WP2
Service Discovery - locate grid services (Wolfgang Hoschek, Gavin McCance +...)
SQL Database Service - store, query and
Query Optimisation - "cost" model (Kurt Stockinger +…)
Data Mining - semi-automatic discovery of event patterns, associations and anomalies: Grid metadata and HEP applications
UK + CERN = UK++
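A "cost" model based on response time and throughput can be sketched as follows. This is an assumed form (fixed response time plus transfer time), not the WP2 model itself; `ReplicaSite`, `estimated_cost` and `cheapest` are hypothetical names and the numbers are made up for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ReplicaSite:
    name: str
    response_time_s: float      # delay before the first byte arrives
    throughput_mb_per_s: float  # sustained transfer rate


def estimated_cost(site: ReplicaSite, size_mb: float) -> float:
    """Assumed cost in seconds: response time plus size divided by throughput."""
    return site.response_time_s + size_mb / site.throughput_mb_per_s


def cheapest(sites, size_mb: float) -> ReplicaSite:
    """Pick the replica site with the lowest estimated cost for this request."""
    return min(sites, key=lambda site: estimated_cost(site, size_mb))


if __name__ == "__main__":
    sites = [
        ReplicaSite("near-but-slow", response_time_s=0.1, throughput_mb_per_s=1.0),
        ReplicaSite("far-but-fast", response_time_s=2.0, throughput_mb_per_s=10.0),
    ]
    for size in (1.0, 100.0):
        print(size, cheapest(sites, size).name)
```

Under this assumption small requests favour the site with the short response time, while large transfers favour the site with the higher throughput, which is why both quantities enter the cost.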
Service Graph
sds.cern.ch
sds.anl.gov
sds.infn.it
sds.ral.uk
sds.padova-infn.it
sds.trieste-infn.it
sds.bologna-infn.it
Optimisation? - combine all info on nodes from e.g. ScotGRID locally and advertise via Globus
All nodes “Grid Aware”
Allowed? Hierarchical Model
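One way to picture the hierarchical model is a tree in which each discovery node advertises the combined information of the nodes beneath it. The sketch below is illustrative only: the parent links are an assumption inferred from the host names (the slide does not state them), and the advertised service names are invented.

```python
# Assumed parent links, inferred from the host names; not stated on the slide.
PARENT = {
    "sds.padova-infn.it": "sds.infn.it",
    "sds.trieste-infn.it": "sds.infn.it",
    "sds.bologna-infn.it": "sds.infn.it",
    "sds.infn.it": "sds.cern.ch",
    "sds.ral.uk": "sds.cern.ch",
    "sds.anl.gov": "sds.cern.ch",
}


def subtree(node, parent=PARENT):
    """Return the node together with every node below it in the hierarchy."""
    nodes = {node}
    changed = True
    while changed:
        changed = False
        for child, par in parent.items():
            if par in nodes and child not in nodes:
                nodes.add(child)
                changed = True
    return nodes


def advertised_by(node, local_services, parent=PARENT):
    """Combine the locally known services of a node and all its descendants."""
    combined = set()
    for member in subtree(node, parent):
        combined.update(local_services.get(member, ()))
    return sorted(combined)


if __name__ == "__main__":
    # Hypothetical service names, for illustration only.
    local = {
        "sds.padova-infn.it": ["replica-catalog"],
        "sds.bologna-infn.it": ["sql-database"],
        "sds.ral.uk": ["mass-storage"],
    }
    print(advertised_by("sds.infn.it", local))
    print(advertised_by("sds.cern.ch", local))
```

A query sent to a regional node then sees only that region's services, while the root sees everything; making every node "Grid Aware" would instead flatten this tree.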
Unified Modelling Language
•Standard method to define the architecture = UML
•Standard tool = TogetherSoft?
Free for academic use. Runs under Linux.
“I tried to generate an import/export module for MySQL under linux by copying the db2 .config file and replacing the various column types by the ones that are available in MySQL. This works apart from the fact that the primary key generation fails and a schema is generated (which MySQL doesn't support). The Access97 type of primary key generation is fine for MySQL. I have seen that Access uses a specialized DB import/export class. How can I generate one for MySQL?”
DB Driver for MySQL under linux?
Determine correct tools by testing.
Compiler Efficiency
Numerically intensive simulations:
Minimal input and output data
ATLAS Monte Carlo (gg → H → bb): 228 sec per 3.5 Mb event on 800 MHz Linux
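The quoted figures imply a very low output rate per CPU, which is the sense in which the job is compute-bound. A small worked calculation, using only the two numbers above and assuming one event at a time per CPU with no other overhead (function names are hypothetical):

```python
SECONDS_PER_EVENT = 228.0   # from the slide
EVENT_SIZE_MB = 3.5         # from the slide (quoted there as "Mb")
SECONDS_PER_DAY = 24 * 60 * 60


def events_per_day(cpus: int = 1) -> float:
    """Events simulated per day, assuming perfect scaling across CPUs."""
    return cpus * SECONDS_PER_DAY / SECONDS_PER_EVENT


def output_per_day(cpus: int = 1) -> float:
    """Output volume per day, in the same unit as EVENT_SIZE_MB."""
    return events_per_day(cpus) * EVENT_SIZE_MB


def output_rate_per_second(cpus: int = 1) -> float:
    """Average output rate per second, in the same unit as EVENT_SIZE_MB."""
    return cpus * EVENT_SIZE_MB / SECONDS_PER_EVENT


if __name__ == "__main__":
    print(f"{events_per_day():.1f} events/day per CPU")
    print(f"{output_per_day():.1f} per day per CPU")
    print(f"{output_rate_per_second():.4f} per second per CPU")
```

That is roughly 379 events and about 1,326 Mb of output per CPU-day, an average of about 0.015 Mb per second, so CPU time rather than I/O dominates and compiler efficiency matters.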