When DBs met the GRID

The Grid Data Source Engine

Dr. Giuliano Taffoni (Ph.D.)

Post on 12-Jan-2016

The DBMS Problem

Scientific communities use databases to store data (astronomical archives, bioinformatics data, ...)

Metadata problem

No native access up to GT3.9 (OGSA-DAI)

GT4 (WS)

What is a DBMS?

A database is simply a bunch of information (data) stored on a computer.

A DB management system (DBMS) is the service that lets you interact with that information

Relational databases consist of tables of data with clearly defined columns.

RDBMS

Data is presented as a collection of relations

Each relation is depicted as a table

Columns are attributes

Rows ("tuples") represent entities

Every table has a set of attributes that, taken together as a "key" (technically, a "superkey"), uniquely identify each entity
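As a minimal sketch of the relational terms above, the following uses Python's built-in sqlite3 module. The table and column names (`star`, `star_id`, `ra_deg`, ...) are invented for illustration and do not come from the slides.

```python
import sqlite3

# In-memory database; the table and column names are illustrative only.
conn = sqlite3.connect(":memory:")
conn.execute(
    """
    CREATE TABLE star (
        star_id   INTEGER PRIMARY KEY,  -- key: uniquely identifies each row
        ra_deg    REAL NOT NULL,        -- attribute (column)
        dec_deg   REAL NOT NULL,        -- attribute (column)
        magnitude REAL                  -- attribute (column)
    )
    """
)

# Each inserted row is one tuple, i.e. one entity of the relation.
conn.executemany(
    "INSERT INTO star VALUES (?, ?, ?, ?)",
    [(1, 10.5, -3.2, 12.1), (2, 10.6, -3.1, 14.7)],
)

# The key rejects a second entity with the same identifier.
try:
    conn.execute("INSERT INTO star VALUES (1, 99.0, 0.0, 9.9)")
    duplicate_rejected = False
except sqlite3.IntegrityError:
    duplicate_rejected = True

rows = conn.execute(
    "SELECT star_id, magnitude FROM star ORDER BY star_id"
).fetchall()
print(rows, duplicate_rejected)
```

Here the primary key plays the role of the "key" on the slide: it is one particular superkey chosen to identify rows.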

Software computing machines

DBMS <=> software computing machine

Memory model

Filesystem (table space)

data processor

Language: SQL

SQL

Language to access and manipulate a DB system

Accepts logical and mathematical operators

You can ask a DB to perform simple and complex operations (statistics)

Language:

SELECT, INSERT

WHERE, AND, etc.
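The statements listed above can be tried in a small sketch, again with Python's built-in sqlite3 module. The `obs` table and its values are made up for the example; the point is only to show INSERT, SELECT with WHERE and AND, and a simple statistic computed by the database itself.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE obs (id INTEGER PRIMARY KEY, band TEXT, flux REAL)")

# INSERT: add rows to the table.
conn.executemany(
    "INSERT INTO obs (band, flux) VALUES (?, ?)",
    [("V", 1.0), ("V", 3.0), ("B", 2.0), ("B", 10.0)],
)

# SELECT with WHERE and AND: logical and comparison operators filter rows.
selected = conn.execute(
    "SELECT id, flux FROM obs WHERE band = 'V' AND flux > 2.0"
).fetchall()

# Arithmetic and aggregate functions: the DBMS does the statistics itself.
stats = conn.execute(
    "SELECT band, COUNT(*), AVG(flux), MAX(flux) - MIN(flux) "
    "FROM obs GROUP BY band ORDER BY band"
).fetchall()

print(selected)
print(stats)
```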

Meta-computing on the GRID

GRID is able to execute binary code!

A different type of computing also exists: virtual machines (e.g. the Java VM)

Grid Abstract Computing Machine

Focus on the semantics of the Grid

Formal Methods (ASM)

Grid assumes a virtual pool of resources: (CPU cycles + Memory)

No theoretical limit: Grid can operate with a wide range of resources

“A Formal Framework for Defining Grid Systems”, Zsolt N. Nemeth & Vaidy Sunderam, 2nd IEEE/ACM International Symposium on Cluster Computing and the Grid (CCGRID'02)

Grid capabilities

Extending “job submission process”:

dynamic Java class loading

functional resolution

inference and reasoning

query evaluation

Extending the Grid capabilities

Theoretical approach

Provide a semantic extension of Grid ASM to verify if it is possible to extend Grid capabilities

Provide a suitable architectural definition for the Grid meta-computing functions

Extending the Grid capabilities

Applicative view

Provide an integration layer within the job manager

Provide a proper extension of the IS to monitor the new resources

Security (GSI): no need to extend it, just use it!

A first attempt: DSE

Grid DSE project

Architectural analysis of Grid middleware

Architectural analysis of DSE

Conceptual mapping between DSEs & Grid Resource Framework layer: represent DSE through the Grid resource abstraction.

Grid DSE goal

New computing capabilities for a Grid node: inference, query, reasoning...

New fabric element for the Grid: Query Element (analogous to the CE)

New IS

QE integration

GDSE JM details

[Architecture diagram: Grid Data Source Engine. Labelled components: GRAM Protocol, GASS, GlobusXIO, Job Manager, ODBC Manager, ODBC Driver, Catalog driver, Local Resource Manager, DSE Instance Manager, DSE Instance, Worker Node, DSE Meta Machine, Internal DB, User DB]

GDSE & GIS

[Architecture diagram: Grid Data Source Engine and the Grid information system. Labelled components: GRAM Protocol, GASS, GlobusXIO, Job Manager, ODBC Manager, ODBC Driver, Catalog driver, Local Resource Manager, DSE Instance Manager, DSE Instance, Worker Node, DSE Meta Machine, Internal DB, User DB, Grid info system, MDS, LDAP, GANGLIA, snmpd]

QE vs CE

Added values

QE is analogous to CE

QE is inside the Grid

monitored by the IS

easy to include in the WMS

VOMS

GDSE in practice

Globus GT 2.4.3 (VDT) + globus patch

ODBC & JDBC

Perl DBD

A DBMS: any one that supports ODBC

GDSE security

GSI+VOMS: VO + Groups + Roles;

Mapping the VOMS certificate to a site user account;

Mapping the local user account to a DB user

User certificate => DB accounting
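The two mapping steps above can be pictured as a chain of lookups. The sketch below is only an illustration: the VO, role, and account names are hypothetical, and a real site would derive the mappings from VOMS attributes and local configuration, not from hard-coded dictionaries.

```python
# Hypothetical mapping tables; real sites would derive these from
# VOMS attributes and local configuration, not hard-coded dictionaries.
VOMS_TO_SITE_ACCOUNT = {
    ("inaf", "dbadmin"): "site_dbadmin",
    ("inaf", "dbuser"): "site_dbuser",
}
SITE_ACCOUNT_TO_DB_USER = {
    "site_dbadmin": "db_admin",
    "site_dbuser": "db_reader",
}


def resolve_db_user(vo: str, role: str) -> str:
    """Follow the chain (VO, role) -> site account -> DB user."""
    try:
        site_account = VOMS_TO_SITE_ACCOUNT[(vo, role)]
        return SITE_ACCOUNT_TO_DB_USER[site_account]
    except KeyError:
        raise PermissionError(
            f"no DB user mapped for VO={vo!r}, role={role!r}"
        ) from None


print(resolve_db_user("inaf", "dbuser"))
```

An unmapped VO or role raises `PermissionError`, so a certificate with no mapping never reaches the database.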

DB accounting

GRANT user:

DB access

DB managing

Table/role access

User accounting
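As a rough sketch of what the GRANT step could look like for a PostgreSQL back end, the helpers below only build GRANT statements as strings. The helper names are invented for this example, and the database, table, and user names in the usage lines are placeholders; nothing here is taken from the GDSE implementation.

```python
import re

_IDENT = re.compile(r"[A-Za-z_][A-Za-z0-9_]*")
_TABLE_PRIVILEGES = {"SELECT", "INSERT", "UPDATE", "DELETE"}


def _ident(name: str) -> str:
    """Accept only plain identifiers so names cannot inject extra SQL."""
    if not _IDENT.fullmatch(name):
        raise ValueError(f"unsafe identifier: {name!r}")
    return name


def grant_connect(database: str, user: str) -> str:
    """DB access: allow a user to connect to a database."""
    return f"GRANT CONNECT ON DATABASE {_ident(database)} TO {_ident(user)};"


def grant_table(table: str, user: str, privileges: list[str]) -> str:
    """Table access: allow a user the listed privileges on one table."""
    privs = [p.upper() for p in privileges]
    unknown = set(privs) - _TABLE_PRIVILEGES
    if unknown:
        raise ValueError(f"unsupported privileges: {sorted(unknown)}")
    return f"GRANT {', '.join(privs)} ON TABLE {_ident(table)} TO {_ident(user)};"


print(grant_connect("testdb", "dbuser"))
print(grant_table("medinfo", "dbuser", ["select", "insert"]))
```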

GDSE demo usage

GDSE accounting

GDSE DB administration

GDSE in the GRID:

BDII usage

Load balancing

Computing and GDSE

GDSE accounting

Site users: inafdbadmin, inafdbuser, inafdbmanager

DBMS (PostgreSQL) users: postgres, user, dbmanager

Simple Biomedical DB

Private: Code, Family_name, Name, Address, Telephone

medinfo: Code, Occupation, disease, Hospital

Biomed

GDSE usage

Uses globus

globus-job-run resource SQL-STATEMENT CONNECTOR DBNAME

Ex. globus-job-run grid006 "select * from uno;" ODBC TEST
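When such a call is generated from a script, the SQL statement has to reach the command as a single argument. The sketch below only assembles the command line in the argument order shown above and does not run it; `build_gdse_command` is a hypothetical helper name.

```python
import shlex


def build_gdse_command(resource: str, sql: str, connector: str, dbname: str) -> list[str]:
    """Return the argument list in the order shown on the slide:
    globus-job-run resource SQL-STATEMENT CONNECTOR DBNAME."""
    return ["globus-job-run", resource, sql, connector, dbname]


argv = build_gdse_command("grid006", "select * from uno;", "ODBC", "TEST")

# shlex.join quotes the SQL statement so a shell passes it as one argument.
command_line = shlex.join(argv)
print(command_line)
```

Passing `argv` as a list to a process launcher avoids shell quoting issues altogether; the joined string is mainly useful for logging or for pasting into a terminal.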

SQL

Create table as dbadmin

Insert into table

Select

Drop, view & transaction

Adding new user

Job submission

GDSE and BDII

Meta computing

Make the DB do some calculations for you

DB has scientific/statistical functions

Sometimes simply extracting some data is useless and time-consuming

Astronomical example

GSC22 Star catalogue

Huge table: position, magnitude, etc.

Luminosity function on a sky area
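To illustrate letting the database do the calculation, the sketch below counts stars per magnitude bin inside a sky box with a single query, so only the histogram leaves the database. It uses an in-memory SQLite table with made-up rows; the table layout (`ra_deg`, `dec_deg`, `mag`) is an assumption and not the real GSC schema, and a plain magnitude histogram stands in for the luminosity function.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE catalogue (ra_deg REAL, dec_deg REAL, mag REAL)")
conn.executemany(
    "INSERT INTO catalogue VALUES (?, ?, ?)",
    [
        (10.1, 5.0, 12.2),
        (10.2, 5.1, 12.8),
        (10.3, 5.2, 13.5),
        (10.4, 5.3, 14.1),
        (50.0, 40.0, 12.5),  # outside the selected sky area
    ],
)

BIN_WIDTH = 1.0  # magnitude bin width (illustrative value)

# The DB selects the sky area, bins the magnitudes and counts the stars.
histogram = conn.execute(
    """
    SELECT CAST(mag / :w AS INTEGER) * :w AS mag_bin, COUNT(*) AS n_stars
    FROM catalogue
    WHERE ra_deg BETWEEN :ra_min AND :ra_max
      AND dec_deg BETWEEN :dec_min AND :dec_max
    GROUP BY mag_bin
    ORDER BY mag_bin
    """,
    {"w": BIN_WIDTH, "ra_min": 10.0, "ra_max": 11.0, "dec_min": 4.0, "dec_max": 6.0},
).fetchall()

print(histogram)
```

The integer cast truncates toward zero, which is adequate for the positive magnitudes used here.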

Luminosity function

[Figures: a diagram with the elements User, LF, LF and plot, followed by two slides titled "Luminosity function"]

Simple example of Workflow that uses GDSE and WNs

Parallel access to DB

[Diagram elements: select, duroc, four GDSE instances, stop]
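One way to picture parallel access is to split the sky into slices, let each worker query its own slice, and merge the partial results. The sketch below does this with local threads and a temporary SQLite file; the threads merely stand in for the separate GDSE instances, and the table and values are made up.

```python
import os
import sqlite3
import tempfile
from collections import Counter
from concurrent.futures import ThreadPoolExecutor


def count_by_bin(db_path, ra_min, ra_max):
    """One worker: integer-magnitude histogram for ra_min <= ra_deg < ra_max."""
    conn = sqlite3.connect(db_path)  # each worker opens its own connection
    try:
        rows = conn.execute(
            "SELECT CAST(mag AS INTEGER), COUNT(*) FROM catalogue "
            "WHERE ra_deg >= ? AND ra_deg < ? GROUP BY 1",
            (ra_min, ra_max),
        ).fetchall()
    finally:
        conn.close()
    return Counter(dict(rows))


with tempfile.TemporaryDirectory() as tmp:
    db_path = os.path.join(tmp, "catalogue.db")
    conn = sqlite3.connect(db_path)
    conn.execute("CREATE TABLE catalogue (ra_deg REAL, mag REAL)")
    conn.executemany(
        "INSERT INTO catalogue VALUES (?, ?)",
        [(5.0, 12.3), (95.0, 12.9), (185.0, 13.4), (275.0, 12.1), (300.0, 14.0)],
    )
    conn.commit()
    conn.close()

    # Four right-ascension slices, one per worker.
    edges = [0.0, 90.0, 180.0, 270.0, 360.0]
    with ThreadPoolExecutor(max_workers=4) as pool:
        partials = pool.map(count_by_bin, [db_path] * 4, edges[:-1], edges[1:])
        total = sum(partials, Counter())  # merge the partial histograms

print(sorted(total.items()))
```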

GDSE and Metadata

Astronomical example

FITS files:

Metadata + data

SIMPLE = T / file does conform to FITS standard
BITPIX = 16 / number of bits per data pixel
NAXIS = 2 / number of data axes
NAXIS1 = 4 / length of data axis 1
NAXIS2 = 3 / length of data axis 2

123 456

986 345 869

1321 45 84

515

Header == metadata

data

TTYPE1 = 'signal ' / label for field 1
TFORM1 = '1024E ' / data format of field: 4-byte REAL
TUNIT1 = 'unknown ' / physical unit of field
EXTNAME = 'xtension' / name of this binary table extension
PIXTYPE = 'HEALPIX ' / HEALPIX pixelisation
ORDERING= 'RING ' / Pixel ordering scheme, either RING or NESTED
NSIDE = 128 / Resolution parameter for HEALPIX
FIRSTPIX= 0 / First pixel # (0 based)
LASTPIX = 196607 / Last pixel # (0 based)
INDXSCHM= 'IMPLICIT' / Indexing: IMPLICIT or EXPLICIT
COMMENT ------------------------
COMMENT POINTING CHARACTERISTICS
COMMENT ------------------------
PT_MODEZ= 'gaussian' / the z-axis pointing mode
PT_SIGZ = 5.0000000000E-01 / [arcmin] mean z-axis pointing error
PT_DZMAX= 2.0000000000E+00 / [arcmin] maximum z-axis pointing error
PT_MODEP= 'ideal ' / the initial scan phase mode
PT_MODER= 'gaussian' / the rotation rate mode
PT_SIGR = 2.0943951024E-03 / [rad s**(-1)] mean rotation rate error
PT_DRMAX= 2.0943951024E-03 / [rad s**(-1)] maximum rotation rate error
COMMENT ------------------------
COMMENT SCAN STRATEGY PARAMETERS
COMMENT ------------------------
SS_MODE = 'cycloidal' / overall type of scan strategy
SS_PMODE= 'followsun' / azimuthal scanning mode
SS_T0_I = 1.1991888000E+09 / [s since 1970.0] scan reference time: int
SS_T0_F = 0.0000000000E+00 / [s since 1970.0] scan reference time: frac
SS_TH_Z0= 9.0000000000E+01 / [deg] fiducial scanning colatitide
SS_DTH_Z= 7.0000000000E+00 / [deg] fiducial scanning colatitide variation
SS_PROT = 6.0000000000E+01 / [s] rotation period of the satellite
SS_PPT = 3.6000000000E+03 / [s] pointing period of scan strategy
SS_PREPT= 3.6000000000E+03 / [s] repointing period of scan strategy
SS_PMOT = 1.5778800000E+07 / [s] period of the main scan motion
SS_PHASE= 1.4323944878E+01 / [deg] initial phase of main scan motion
TH_SE_ST= 'warning ' / status of Sun/Earth aspect angle violations
TH_S_MAX= 1.0000000000E+01 / [deg] maximum Solar aspect angle
TH_E_MAX= 1.5000000000E+01 / [deg] maximum Earth aspect angle
SS_N_PT = 8784 / number of pointing periods
COMMENT --------------------
COMMENT Cosmological parameters
COMMENT --------------------
OMEGAB = 0.05 / Omega in baryons
OMEGAC = 0.95 / Omega in dark matter
OMEGAV = 0.00 / Omega in cosmological constant
OMEGAN = 0.00 / Omega in neutrinos
HUBBLE = 50.00 / Hubble constant in km/s/Mpc
NNUNR = 0.00 / number of massive neutrinos
NNUR = 3.04 / number of massless neutrinos
TCMB = 2.7260 / CMB temperature in Kelvin
HELFRACT= 0.24 / Helium fraction
OPTDLSS = 0.00 / reionisation optical depth
IONFRACT= 0.20 / ionisation fraction
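Since the header is the metadata, a first step towards loading it into a DBMS is turning the keyword lines into key/value pairs. The following is a simplified sketch that parses the textual `KEY = value / comment` form shown above; it is not a full FITS reader (real cards are fixed-width 80-character records), and `parse_card` is a name invented for this example.

```python
def parse_card(card):
    """Parse one 'KEY = value / comment' line; return (key, value) or None."""
    if "=" not in card or card.startswith("COMMENT"):
        return None
    key, rest = card.split("=", 1)
    # Everything after the first ' / ' is the human-readable comment.
    value_text = rest.split(" / ", 1)[0].strip()
    if value_text.startswith("'"):
        value = value_text.strip("'").strip()  # quoted string value
    elif value_text in ("T", "F"):
        value = value_text == "T"              # FITS logical value
    else:
        try:
            value = int(value_text)
        except ValueError:
            value = float(value_text)
    return key.strip(), value


header_text = """\
SIMPLE = T / file does conform to FITS standard
BITPIX = 16 / number of bits per data pixel
NAXIS = 2 / number of data axes
COMMENT ------------------------
PIXTYPE = 'HEALPIX ' / HEALPIX pixelisation
NSIDE = 128 / Resolution parameter for HEALPIX
PT_SIGZ = 5.0000000000E-01 / [arcmin] mean z-axis pointing error
"""

# Keep only the lines that parsed into (key, value) pairs.
metadata = dict(filter(None, map(parse_card, header_text.splitlines())))
print(metadata)
```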

Metadata DBMS

GUID -> to all metadata

Query DB: globus-job-run

Locate files: GUID

Get files: lfc

Make computation
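The first two steps of this workflow (query the metadata DB, obtain the GUIDs of the matching files) can be sketched as follows. The schema, the GUID strings, and `find_guids` are hypothetical; fetching the files from the catalogue and running the computation are left out.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE metadata ("
    "guid TEXT, keyword TEXT, value TEXT, PRIMARY KEY (guid, keyword))"
)
# Placeholder GUIDs and metadata values, one row per header keyword.
conn.executemany(
    "INSERT INTO metadata VALUES (?, ?, ?)",
    [
        ("guid-0001", "NSIDE", "128"),
        ("guid-0001", "ORDERING", "RING"),
        ("guid-0002", "NSIDE", "256"),
        ("guid-0002", "ORDERING", "RING"),
        ("guid-0003", "NSIDE", "128"),
        ("guid-0003", "ORDERING", "NESTED"),
    ],
)


def find_guids(conn, **criteria):
    """Return the GUIDs whose metadata match every keyword=value criterion."""
    guids = None
    for keyword, value in criteria.items():
        rows = conn.execute(
            "SELECT guid FROM metadata WHERE keyword = ? AND value = ?",
            (keyword, str(value)),
        )
        matched = {row[0] for row in rows}
        guids = matched if guids is None else guids & matched
    return sorted(guids or [])


# Steps 1 and 2: query the metadata and obtain the GUIDs to locate.
guids = find_guids(conn, NSIDE=128, ORDERING="RING")
print(guids)
```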
