When DBs met the GRID The Grid Data Source Engine Dr. Giuliano Taffoni (Ph.D.)

Jan 12, 2016

Page 1

When DBs met the GRID

The Grid Data Source Engine

Dr. Giuliano Taffoni (Ph.D.)

Page 2

The DBMS Problem

Scientific communities use DBs to store data (astronomical archives, bioinformatics data...)

Metadata problem

No native access up to GT3.9 (OGSA-DAI)

GT4 (WS)

Page 3

What is a DBMS?

A database is simply a bunch of information (data) stored on a computer.

A DB management system (DBMS) is the service that allows users to interact with that information

Relational databases consist of tables of data with clearly defined columns.

Page 4

RDBMS

Data is presented as a collection of relations

Each relation is depicted as a table

Columns are attributes

Rows ("tuples") represent entities

Every table has a set of attributes that, taken together as a "key" (technically, a "superkey"), uniquely identifies each entity
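As a minimal sketch of these ideas, the snippet below uses Python's standard `sqlite3` module. The `star` table and its columns are hypothetical and do not come from the slides. It shows a relation as a table, rows as tuples, and a key that rejects a second entity with the same identifier.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# A relation is a table; its columns are the attributes.
# star_id is declared as the key (hypothetical schema, for illustration only).
conn.execute(
    "CREATE TABLE star (star_id INTEGER PRIMARY KEY, ra_deg REAL, dec_deg REAL, mag REAL)"
)
conn.execute("INSERT INTO star VALUES (1, 10.5, -3.2, 14.1)")
conn.execute("INSERT INTO star VALUES (2, 11.0, -3.0, 15.7)")

# Each row comes back as a tuple representing one entity.
rows = conn.execute("SELECT * FROM star ORDER BY star_id").fetchall()

# The key uniquely identifies each entity, so a second row with star_id = 1 is rejected.
try:
    conn.execute("INSERT INTO star VALUES (1, 99.0, 99.0, 99.0)")
    duplicate_rejected = False
except sqlite3.IntegrityError:
    duplicate_rejected = True
```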

Page 5

Software computing machines

DBMS <=> software computing machine

Memory model

Filesystem (table space)

data processor

Language SQL

Page 6

SQL

Language to access and manipulate DB systems

Accepts logical and math operators

You can ask a DB to perform simple and complex operations (stats)

Language:

SELECT, INSERT

WHERE, AND, etc.
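The keywords above can be tried out with Python's standard `sqlite3` module. This is only an illustrative sketch: the `obs` table, its columns and its values are invented for the example. It combines INSERT, SELECT, a WHERE clause with a logical (AND) and a math (/) operator, and a simple statistic computed by the DB itself.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE obs (name TEXT, flux REAL, exptime REAL)")

# INSERT: load a few made-up rows.
conn.executemany(
    "INSERT INTO obs VALUES (?, ?, ?)",
    [("a", 10.0, 2.0), ("b", 30.0, 5.0), ("c", 8.0, 4.0)],
)

# SELECT ... WHERE with a math operator (/) and a logical operator (AND).
bright = conn.execute(
    "SELECT name FROM obs "
    "WHERE flux / exptime > 4.0 AND exptime < 10.0 "
    "ORDER BY name"
).fetchall()

# Simple statistics evaluated inside the DB.
count, mean_flux = conn.execute("SELECT COUNT(*), AVG(flux) FROM obs").fetchone()
```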

Page 7

Meta-computing on the GRID

GRID is able to execute binary code!

There is a different type of computing: virtual machines (e.g. the Java VM)

Page 8

Grid Abstract Computing Machine

Focus on the semantics of the Grid

Formal Methods (ASM)

Grid assumes a virtual pool of resources: (CPU cycles + Memory)

No theoretical limit: Grid can operate with a wide range of resources

Zsolt N. Nemeth & Vaidy Sunderam, “A Formal Framework for Defining Grid Systems”, 2nd IEEE/ACM (CCGRID'02)

Page 9

Grid capabilities

Extending “job submission process”:

dynamic Java class loading

functional resolution

inference and reasoning

query evaluation

Page 10

Extending the Grid capabilities

Theoretical approach

Provide a semantic extension of Grid ASM to verify if it is possible to extend Grid capabilities

Provide a suitable architectural definition for the Grid meta-computing functions

Page 11

Extending the Grid capabilities

Applicative view

Provide an integration Layer within the jobmanager

Provide a proper extension of the IS to monitor new resources

Security (GSI): no need to extend it, just use it!

Page 12

A first attempt: DSE

Page 13

Grid DSE project

Architectural analysis of Grid middleware

Architectural analysis of DSE

Conceptual mapping between DSEs & Grid Resource Framework layer: represent DSE through the Grid resource abstraction.

Page 14

Grid DSE goal

New computational capabilities for a Grid node: inference, query, reasoning...

New fabric element for the Grid: Query Element (analogous to the CE)

New IS

Page 15

QE integration

Page 16

GDSE JM details

[Architecture diagram; labelled components: Grid Data Source Engine, Local Resource Manager, ODBC Driver, GRAM Protocol, GASS, GlobusXIO, Catalog drv., Job Manager, ODBC Manager, DSE Instance Man, DSE Instance, Worker Node, DSE Meta Machine, Internal DB, User DB]

Page 17

GDSE & GIS

[Architecture diagram; labelled components: Grid Data Source Engine, Local Resource Manager, ODBC Driver, GRAM Protocol, GASS, GlobusXIO, Catalog drv., Grid info system, MDS, LDAP, Job Manager, ODBC Manager, DSE Instance Man, DSE Instance, Worker Node, DSE Meta Machine, Internal DB, User DB, GANGLIA, snmpd (twice)]

Page 18

QE vs CE

Page 19

Added values

QE is analogous to CE

QE is inside the Grid

monitored by the IS

easy to include in the WMS

VOMS

Page 20

GDSE in practice

Globus GT 2.4.3 (VDT) + globus patch

ODBC & JDBC

Perl DBD

a DBMS: any one supporting ODBC

Page 21

GDSE security

GSI+VOMS: VO + Groups + Roles;

Mapping the VOMS certificate to a site user account;

Mapping the local user account to a DB user

User certificate => DB accounting
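The two-step mapping above can be pictured as two lookup tables chained together. The sketch below is purely illustrative: the VOMS attribute strings are hypothetical, and although the account names appear on the "GDSE accounting" slide, the pairing between site accounts and DB users is an assumption, not something the slides state.

```python
# Hypothetical mapping tables; every attribute string and pairing is illustrative.
VOMS_TO_SITE_ACCOUNT = {
    "/inaf/Role=dbadmin": "inafdbadmin",
    "/inaf/Role=dbmanager": "inafdbmanager",
    "/inaf": "inafdbuser",
}

SITE_ACCOUNT_TO_DB_USER = {
    "inafdbadmin": "postgres",
    "inafdbmanager": "dbmanager",
    "inafdbuser": "user",
}


def db_user_for(voms_attribute: str) -> str:
    """Resolve a VOMS attribute to a DB user via the local site account."""
    site_account = VOMS_TO_SITE_ACCOUNT[voms_attribute]   # step 1: VOMS -> site account
    return SITE_ACCOUNT_TO_DB_USER[site_account]          # step 2: site account -> DB user
```

With this arrangement the DB never sees the certificate itself; what a given grid user may do is decided by the privileges granted to the DB user they end up mapped to.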

Page 22

DB accounting

GRANT user:

DB access

DB managing

Table/role access

User accounting

Page 23

GDSE demo usage

GDSE accounting

GDSE DB administration

GDSE in the GRID:

BDII usage

Load balancing

Computing and GDSE

Page 24

GDSE accounting

Site user

inafdbadmin

inafdbuser

inafdbmanager

DBMS (PostgreSQL)

postgres

user

dbmanager

Page 25

Simple Biomedical DB

Private

Code

Family_name

Name

Address

Telephone

medinfo

Code

Occupation

disease

Hospital

Biomed
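A rough sketch of this schema, using Python's standard `sqlite3` module in place of the PostgreSQL server used in the demo. The column types, the sample rows, and the idea that `Code` links the two tables are assumptions made for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Two tables as listed on the slide; column types are assumed.
conn.executescript(
    """
    CREATE TABLE private (
        code        INTEGER PRIMARY KEY,
        family_name TEXT,
        name        TEXT,
        address     TEXT,
        telephone   TEXT
    );
    CREATE TABLE medinfo (
        code       INTEGER REFERENCES private(code),
        occupation TEXT,
        disease    TEXT,
        hospital   TEXT
    );
    """
)

# Invented sample rows.
conn.execute("INSERT INTO private VALUES (1, 'Rossi', 'Anna', 'Via Roma 1', '000-0001')")
conn.execute("INSERT INTO private VALUES (2, 'Bianchi', 'Luca', 'Via Milano 2', '000-0002')")
conn.execute("INSERT INTO medinfo VALUES (1, 'teacher', 'asthma', 'H1')")
conn.execute("INSERT INTO medinfo VALUES (2, 'engineer', 'asthma', 'H2')")

# Joining on code links personal and medical records.
joined = conn.execute(
    "SELECT p.family_name, m.disease, m.hospital "
    "FROM private AS p JOIN medinfo AS m ON m.code = p.code "
    "ORDER BY p.code"
).fetchall()

# A query on medinfo alone yields statistics without touching personal data.
per_disease = conn.execute(
    "SELECT disease, COUNT(*) FROM medinfo GROUP BY disease"
).fetchall()
```

Keeping personal and medical attributes in separate tables is what makes table-level access control useful here: a role could be allowed to read `medinfo` without being able to read `private`.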

Page 26

GDSE usage

Uses globus

globus-job-run resource SQL-STATEMENT CONNECTOR DBNAME

Ex. globus-job-run grid006 "select * from uno;" ODBC TEST

SQL
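The invocation pattern above can be assembled programmatically. The helper below is a hypothetical convenience, not part of GDSE or Globus; it only builds the argument list in the order shown on the slide and does not execute anything.

```python
import shlex


def gdse_command(resource, sql_statement, connector, dbname):
    """Build the argument list: globus-job-run resource SQL-STATEMENT CONNECTOR DBNAME."""
    return ["globus-job-run", resource, sql_statement, connector, dbname]


# Same values as the example on the slide.
argv = gdse_command("grid006", "select * from uno;", "ODBC", "TEST")

# Shell-quoted form, convenient for logging or copy and paste.
shell_line = shlex.join(argv)
```

Passing the statement as a single list element avoids shell-quoting problems with characters such as `*` and `;`. On a host with the patched Globus installation described in these slides, `argv` could presumably be handed to `subprocess.run`.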

Page 27

Create table as dbadmin

Page 28

Insert into table

Page 29

Select

Page 30

Drop, view & transaction

Page 31

Adding new user

Page 32

Job submission

Page 33

GDSE and BDII

Page 34

Meta computing

Make the DB do some calculations for you

DB has scientific/statistical functions

Sometimes simply extracting some data is useless and time consuming

Page 35

Astronomical example

GSC22 Star catalogue

Huge table: position, magnitude, etc.

Luminosity function on a sky area
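To illustrate letting the DB do the calculation, the sketch below counts stars per magnitude bin inside a rectangular sky area with a single SQL query. It is not the query used in the talk: the `catalogue` table, its columns, the sample stars, and the choice of one-magnitude bins as a stand-in for the luminosity function are all assumptions, and `sqlite3` replaces the real DBMS.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE catalogue (ra_deg REAL, dec_deg REAL, mag REAL)")

# Invented stars: (right ascension, declination, magnitude).
stars = [
    (10.1, 5.0, 12.3),
    (10.4, 5.2, 12.9),
    (10.7, 5.1, 14.5),
    (10.9, 5.9, 14.1),
    (10.2, 5.5, 14.8),
    (50.0, 5.0, 12.0),  # lies outside the selected sky area
]
conn.executemany("INSERT INTO catalogue VALUES (?, ?, ?)", stars)

# The DB selects the sky area, bins by magnitude and counts;
# only the small histogram travels back to the user.
luminosity_function = conn.execute(
    """
    SELECT CAST(mag AS INTEGER) AS mag_bin, COUNT(*) AS n_stars
    FROM catalogue
    WHERE ra_deg BETWEEN ? AND ? AND dec_deg BETWEEN ? AND ?
    GROUP BY mag_bin
    ORDER BY mag_bin
    """,
    (10.0, 11.0, 5.0, 6.0),
).fetchall()
```

The point of the design is the data volume: for a huge catalogue, returning a handful of (bin, count) pairs is far cheaper than extracting every row and binning on the client.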

Page 36

Luminosity function

[Diagram; labels: User, LF, LF, plot]

Page 37

Luminosity function

Page 38

Luminosity Function

Simple example of Workflow that uses GDSE and WNs

Page 39

Parallel access to DB

select

duroc

GDSE GDSE GDSE GDSE

stop

Page 40

GDSE and Metadata

Astronomical example

FITS files:

Metadata + data

SIMPLE = T / file does conform to FITS standard
BITPIX = 16 / number of bits per data pixel
NAXIS = 2 / number of data axes
NAXIS1 = 4 / length of data axis 1
NAXIS2 = 3 / length of data axis 2

123 456

986 345 869

1321 45 84

515

Header == metadata

data
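To show how a header like this becomes metadata that could be stored in a DB, here is a small parser for the simplified "KEYWORD = value / comment" form shown on the slide. It is a sketch only: real FITS cards are fixed-width 80-character records, string values that contain "/" would need extra care, and in practice a FITS library such as astropy would be the safer choice.

```python
def parse_card(card):
    """Split one 'KEYWORD = value / comment' line into (keyword, value, comment)."""
    keyword, _, rest = card.partition("=")
    value_text, _, comment = rest.partition("/")
    value_text = value_text.strip()

    if value_text == "T":          # FITS logical true
        value = True
    elif value_text == "F":        # FITS logical false
        value = False
    else:
        try:
            value = int(value_text)
        except ValueError:
            try:
                value = float(value_text)
            except ValueError:
                # Quoted string: drop the quotes and the padding blanks.
                value = value_text.strip("'").strip()
    return keyword.strip(), value, comment.strip()


# Header cards copied from the slide.
header_text = """SIMPLE = T / file does conform to FITS standard
BITPIX = 16 / number of bits per data pixel
NAXIS = 2 / number of data axes
NAXIS1 = 4 / length of data axis 1
NAXIS2 = 3 / length of data axis 2"""

metadata = {}
for line in header_text.splitlines():
    key, value, _comment = parse_card(line)
    metadata[key] = value
```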

Page 41

TTYPE1 = 'signal ' / label for field 1
TFORM1 = '1024E ' / data format of field: 4-byte REAL
TUNIT1 = 'unknown ' / physical unit of field
EXTNAME = 'xtension' / name of this binary table extension
PIXTYPE = 'HEALPIX ' / HEALPIX pixelisation
ORDERING= 'RING ' / Pixel ordering scheme, either RING or NESTED
NSIDE = 128 / Resolution parameter for HEALPIX
FIRSTPIX= 0 / First pixel # (0 based)
LASTPIX = 196607 / Last pixel # (0 based)
INDXSCHM= 'IMPLICIT' / Indexing: IMPLICIT or EXPLICIT
COMMENT ------------------------
COMMENT POINTING CHARACTERISTICS
COMMENT ------------------------
PT_MODEZ= 'gaussian' / the z-axis pointing mode
PT_SIGZ = 5.0000000000E-01 / [arcmin] mean z-axis pointing error
PT_DZMAX= 2.0000000000E+00 / [arcmin] maximum z-axis pointing error
PT_MODEP= 'ideal ' / the initial scan phase mode
PT_MODER= 'gaussian' / the rotation rate mode
PT_SIGR = 2.0943951024E-03 / [rad s**(-1)] mean rotation rate error
PT_DRMAX= 2.0943951024E-03 / [rad s**(-1)] maximum rotation rate error
COMMENT ------------------------
COMMENT SCAN STRATEGY PARAMETERS
COMMENT ------------------------
SS_MODE = 'cycloidal' / overall type of scan strategy
SS_PMODE= 'followsun' / azimuthal scanning mode
SS_T0_I = 1.1991888000E+09 / [s since 1970.0] scan reference time: int
SS_T0_F = 0.0000000000E+00 / [s since 1970.0] scan reference time: frac
SS_TH_Z0= 9.0000000000E+01 / [deg] fiducial scanning colatitide
SS_DTH_Z= 7.0000000000E+00 / [deg] fiducial scanning colatitide variation
SS_PROT = 6.0000000000E+01 / [s] rotation period of the satellite
SS_PPT = 3.6000000000E+03 / [s] pointing period of scan strategy
SS_PREPT= 3.6000000000E+03 / [s] repointing period of scan strategy
SS_PMOT = 1.5778800000E+07 / [s] period of the main scan motion
SS_PHASE= 1.4323944878E+01 / [deg] initial phase of main scan motion
TH_SE_ST= 'warning ' / status of Sun/Earth aspect angle violations
TH_S_MAX= 1.0000000000E+01 / [deg] maximum Solar aspect angle
TH_E_MAX= 1.5000000000E+01 / [deg] maximum Earth aspect angle
SS_N_PT = 8784 / number of pointing periods

COMMENT --------------------
COMMENT Cosmological parameters
COMMENT --------------------
OMEGAB = 0.05 / Omega in baryons
OMEGAC = 0.95 / Omega in dark matter
OMEGAV = 0.00 / Omega in cosmological constant
OMEGAN = 0.00 / Omega in neutrinos
HUBBLE = 50.00 / Hubble constant in km/s/Mpc
NNUNR = 0.00 / number of massive neutrinos
NNUR = 3.04 / number of massless neutrinos
TCMB = 2.7260 / CMB temperature in Kelvin
HELFRACT= 0.24 / Helium fraction
OPTDLSS = 0.00 / reionisation optical depth
IONFRACT= 0.20 / ionisation fraction

Page 42

Metadata DBMS

GUID -> to all metadata

Query DB: globus-job-run

Locate files: GUID

Get files: lfc

Make computation
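The first two steps of this workflow (query the metadata DB, obtain the GUIDs of the matching files) might look roughly like the sketch below. The `fits_metadata` table layout, the GUID strings and the `find_guids` helper are hypothetical, and `sqlite3` again stands in for a DB reached through `globus-job-run`. Fetching the files through the file catalogue and running the computation are left out.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# One row per (file, header keyword): the GUID ties every piece of metadata to its file.
conn.execute("CREATE TABLE fits_metadata (guid TEXT, keyword TEXT, value TEXT)")
conn.executemany(
    "INSERT INTO fits_metadata VALUES (?, ?, ?)",
    [
        ("guid-0001", "PIXTYPE", "HEALPIX"),
        ("guid-0001", "NSIDE", "128"),
        ("guid-0002", "PIXTYPE", "HEALPIX"),
        ("guid-0002", "NSIDE", "512"),
    ],
)


def find_guids(keyword, value):
    """Return the GUIDs of all files whose header has keyword = value."""
    rows = conn.execute(
        "SELECT guid FROM fits_metadata WHERE keyword = ? AND value = ? ORDER BY guid",
        (keyword, value),
    ).fetchall()
    return [guid for (guid,) in rows]


# Locate the files to fetch before any data is transferred.
matching = find_guids("NSIDE", "128")
```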