When DBs met the GRID
The Grid Data Source Engine
Dr. Giuliano Taffoni (Ph.D.)
Jan 12, 2016
The DBMS Problem
Scientific communities use DBs to store data (astronomical archives, bioinformatics data...)
Metadata problem
No native access up to GT3.9 (OGSA-DAI)
GT4 (WS)
What is a DBMS?
A database is simply a collection of information (data) stored on a computer.
A DB management system is the service that allows users to interact with that information
Relational databases consist of tables of data with clearly defined columns.
RDBMS
Data is presented as a collection of relations
Each relation is depicted as a table
Columns are attributes
Rows ("tuples") represent entities
Every table has a set of attributes that, taken together as a "key" (technically, a "superkey"), uniquely identify each entity
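A minimal sketch of the key idea, using Python's built-in sqlite3 module as a stand-in for any RDBMS. The table and column names are illustrative assumptions, not taken from the slides.

```python
import sqlite3

# In-memory database; "star" and its columns are hypothetical names.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE star (id INTEGER PRIMARY KEY, ra_deg REAL, dec_deg REAL, mag REAL)"
)
conn.execute("INSERT INTO star VALUES (1, 10.5, -3.2, 14.1)")

# A second row with the same key value would break uniqueness, so it is rejected.
try:
    conn.execute("INSERT INTO star VALUES (1, 11.0, -3.0, 15.0)")
    duplicate_rejected = False
except sqlite3.IntegrityError:
    duplicate_rejected = True

row_count = conn.execute("SELECT COUNT(*) FROM star").fetchone()[0]
print(duplicate_rejected, row_count)
```

Here the key is a single attribute (id); a key made of several attributes would be declared with a table-level PRIMARY KEY (a, b) clause instead.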
Software computing machines
DBMS <=> software computing machine
Memory model
Filesystem (table space)
data processor
Language SQL
SQL: language to access and manipulate a DB system
Accepts logical and math operators
You can ask a DB to perform simple and complex operations (statistics)
Language:
SELECT, INSERT
WHERE, AND, etc.
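The statements listed above can be tried out with a short sketch. It assumes Python's built-in sqlite3 module as the DBMS, and the obs table and its values are invented for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical table of observations: a flux and its error.
conn.execute("CREATE TABLE obs (id INTEGER PRIMARY KEY, flux REAL, err REAL)")
conn.executemany(
    "INSERT INTO obs (flux, err) VALUES (?, ?)",
    [(10.0, 1.0), (20.0, 2.0), (30.0, 10.0)],
)

# WHERE/AND with a math expression: keep rows whose flux/err ratio exceeds 5.
good = conn.execute(
    "SELECT id FROM obs WHERE flux / err > 5 AND flux < 25 ORDER BY id"
).fetchall()

# A statistic computed by the database itself rather than by the client.
mean_flux = conn.execute("SELECT AVG(flux) FROM obs").fetchone()[0]
print(good, mean_flux)
```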
Meta-computing on the GRID
GRID is able to execute binary code!
There is a different type of computing: virtual machines (e.g. the Java VM)
Grid Abstract Computing Machine
Focus on the semantics of the Grid
Formal Methods (ASM)
Grid assumes a virtual pool of resources: (CPU cycles + Memory)
No theoretical limit: Grid can operate with a wide range of resources
“A Formal Framework for Defining Grid Systems” Zsolt N. Nemeth & Vaidy Sunderam
2nd IEEE/ACM (CCGRID'02)
Grid capabilities
Extending “job submission process”:
dynamic Java class loading
functional resolution
inference and reasoning
query evaluation
Extending the Grid capabilities
Theoretical approach
Provide a semantic extension of Grid ASM to verify if it is possible to extend Grid capabilities
Provide a suitable architectural definition for the Grid meta-computing functions
Extending the Grid capabilities
Applicative view
Provide an integration Layer within the jobmanager
Provide a proper extension of the IS to monitor new resources
Security (GSI): no need to extend it, just use it!
A first attempt: DSE
Grid DSE project
Architectural analysis of Grid middleware
Architectural analysis of DSE
Conceptual mapping between DSEs & Grid Resource Framework layer: represent DSE through the Grid resource abstraction.
Grid DSE goal
New computational capabilities for a Grid node: inference, query, reasoning...
New fabric element for the Grid: Query Element (analogous to the CE)
New IS
QE integration
GDSE JM details
Grid Data Source Engine
Local Resource Manager
ODBC Driver
GRAM Protocol
GASS
GlobusXIO
ODBC Driver
Catalog drv.
Job Manager
ODBC Manager
DSE Instance Man
DSE Instance
Worker Node
DSE Meta Machine
Internal DB
User DB
GDSE & GIS
Grid Data Source Engine
Local Resource Manager
ODBC Driver
GRAM Protocol
GASS
GlobusXIO
ODBC Driver
Catalog drv.
Grid info system
MDS
LDAP
Job Manager
ODBC Manager
DSE Instance Man
DSE Instance
Worker Node
DSE Meta Machine
Internal DB
User DB
GANGLIA
snmpd
snmpd
QE vs CE
Added values
QE is analogous to CE
QE is inside the Grid
monitored by the IS
easy to include in the WMS
VOMS
GDSE in practice
Globus GT 2.4.3 (VDT) + globus patch
ODBC & JDBC
Perl DBD
a DBMS: any one supporting ODBC
GDSE security
GSI+VOMS: VO + Groups + Roles;
Mapping VOMS cert to site user account;
Mapping local user account to DB user
User certificate => DB accounting
DB accounting
GRANT user:
DB access
DB managing
Table/role access
User accounting
GDSE demo usage
GDSE accounting
GDSE DB administration
GDSE in the GRID:
BDII usage
Load balancing
Computing and GDSE
GDSE accounting
Site user        DBMS user (PostgreSQL)
inafdbadmin      postgres
inafdbuser       user
inafdbmanager    dbmanager
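The accounting chain (VOMS attribute, then site account, then DB user) can be sketched as two lookups. The account names come from the slide, but their pairing is inferred from the order in which they appear, and the VOMS attribute strings are purely hypothetical.

```python
# Hypothetical VOMS attributes; only the account names come from the slide.
VOMS_TO_SITE_USER = {
    "/inaf/Role=dbadmin": "inafdbadmin",
    "/inaf/Role=dbmanager": "inafdbmanager",
    "/inaf": "inafdbuser",
}

# Site account -> DBMS user (pairing assumed from the order on the slide).
SITE_TO_DB_USER = {
    "inafdbadmin": "postgres",
    "inafdbuser": "user",
    "inafdbmanager": "dbmanager",
}


def resolve_db_user(voms_attribute):
    """Follow the chain: VOMS attribute -> site account -> DB user."""
    site_user = VOMS_TO_SITE_USER[voms_attribute]
    return SITE_TO_DB_USER[site_user]


print(resolve_db_user("/inaf/Role=dbadmin"))
```

An unknown attribute raises KeyError here, which corresponds to refusing access rather than falling back to a default account.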
Simple Biomedical DB
Private
Code
Family_name
Name
Address
Telephone
medinfo
Code
Occupation
disease
Hospital
Biomed
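One possible reading of this schema, sketched with Python's built-in sqlite3 module. The column types, the use of Code as the link between the two tables, and the sample row are all assumptions made for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE private (
        code        INTEGER PRIMARY KEY,
        family_name TEXT,
        name        TEXT,
        address     TEXT,
        telephone   TEXT
    );
    CREATE TABLE medinfo (
        code       INTEGER REFERENCES private(code),
        occupation TEXT,
        disease    TEXT,
        hospital   TEXT
    );
    """
)
# Invented sample data, only to make the queries return something.
conn.execute("INSERT INTO private VALUES (1, 'Rossi', 'Mario', 'Via Roma 1', '000')")
conn.execute("INSERT INTO medinfo VALUES (1, 'teacher', 'flu', 'General')")

# Medical data can be read without touching the personal columns.
rows = conn.execute("SELECT occupation, disease, hospital FROM medinfo").fetchall()

# Joining on code links a record back to a person, so it needs wider privileges.
joined = conn.execute(
    "SELECT p.family_name, m.disease FROM private p JOIN medinfo m ON p.code = m.code"
).fetchall()
print(rows, joined)
```

Splitting the data this way lets table-level access rights separate users who may see medical records from users who may see identities.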
GDSE usage
Uses globus
globus-job-run RESOURCE SQL-STATEMENT CONNECTOR DBNAME
Example: globus-job-run grid006 "select * from uno;" ODBC TEST
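A small sketch of how a client script might assemble this command line. The helper name gdse_command is hypothetical. The argument order follows the usage line above, and the values are those of the example.

```python
import shlex


def gdse_command(resource, sql, connector, dbname):
    """Return the argv list for a GDSE query submitted through globus-job-run."""
    # The SQL statement is passed as one argument, so no manual quoting is needed.
    return ["globus-job-run", resource, sql, connector, dbname]


argv = gdse_command("grid006", "select * from uno;", "ODBC", "TEST")

# shlex.join shows the equivalent, safely quoted shell command.
print(shlex.join(argv))
```

On a host where the Globus client tools are installed, the list could be handed to subprocess.run(argv). The sketch stops short of that so it runs anywhere.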
SQL
Create table as dbadmin
Insert into table
Select
Drop, view & transaction
Adding new user
Job submission
GDSE and BDII
Meta computing
Make the DB do some calculations for you
DB has scientific/statistical functions
Sometimes simply extracting the data is useless and time-consuming
Astronomical example
GSC22 Star catalogue
Huge table: position, magnitude, etc.
Luminosity function on a sky area
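The idea of letting the DB do the calculation can be sketched as a single aggregate query that returns binned star counts for a sky area, instead of extracting every row. This is a simplified stand-in for a luminosity function: the table layout, bin width and sample rows are assumptions, and sqlite3 replaces the real catalogue DBMS.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical, tiny version of a catalogue table: position and magnitude.
conn.execute("CREATE TABLE catalogue (ra_deg REAL, dec_deg REAL, mag REAL)")
conn.executemany(
    "INSERT INTO catalogue VALUES (?, ?, ?)",
    [
        (10.1, 5.0, 12.3),
        (10.4, 5.2, 12.8),
        (10.7, 5.1, 13.1),
        (10.9, 5.3, 14.6),
        (50.0, -20.0, 12.5),  # outside the selected area
    ],
)

# Count stars per 1-magnitude bin inside a rectangular sky area.
query = """
SELECT CAST(mag AS INTEGER) AS mag_bin, COUNT(*) AS n_stars
FROM catalogue
WHERE ra_deg BETWEEN ? AND ? AND dec_deg BETWEEN ? AND ?
GROUP BY mag_bin
ORDER BY mag_bin
"""
histogram = conn.execute(query, (10.0, 11.0, 4.0, 6.0)).fetchall()
print(histogram)
```

Only one row per magnitude bin crosses the network, however large the catalogue table is.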
Luminosity function
[Diagram: User, LF, plot, luminosity function]
Luminosity Function
Simple example of a workflow that uses GDSE and WNs
Parallel access to DB
[Workflow diagram: select, DUROC, four parallel GDSE instances, stop]
GDSE and Metadata
Astronomical example
FITS files:
Metadata + data
SIMPLE = T / file does conform to FITS standard
BITPIX = 16 / number of bits per data pixel
NAXIS = 2 / number of data axes
NAXIS1 = 4 / length of data axis 1
NAXIS2 = 3 / length of data axis 2
123 456
986 345 869
1321 45 84
515
Header == metadata
data
TTYPE1 = 'signal ' / label for field 1
TFORM1 = '1024E ' / data format of field: 4-byte REAL
TUNIT1 = 'unknown ' / physical unit of field
EXTNAME = 'xtension' / name of this binary table extension
PIXTYPE = 'HEALPIX ' / HEALPIX pixelisation
ORDERING= 'RING ' / Pixel ordering scheme, either RING or NESTED
NSIDE = 128 / Resolution parameter for HEALPIX
FIRSTPIX= 0 / First pixel # (0 based)
LASTPIX = 196607 / Last pixel # (0 based)
INDXSCHM= 'IMPLICIT' / Indexing: IMPLICIT or EXPLICIT
COMMENT ------------------------
COMMENT POINTING CHARACTERISTICS
COMMENT ------------------------
PT_MODEZ= 'gaussian' / the z-axis pointing mode
PT_SIGZ = 5.0000000000E-01 / [arcmin] mean z-axis pointing error
PT_DZMAX= 2.0000000000E+00 / [arcmin] maximum z-axis pointing error
PT_MODEP= 'ideal ' / the initial scan phase mode
PT_MODER= 'gaussian' / the rotation rate mode
PT_SIGR = 2.0943951024E-03 / [rad s**(-1)] mean rotation rate error
PT_DRMAX= 2.0943951024E-03 / [rad s**(-1)] maximum rotation rate error
COMMENT ------------------------
COMMENT SCAN STRATEGY PARAMETERS
COMMENT ------------------------
SS_MODE = 'cycloidal' / overall type of scan strategy
SS_PMODE= 'followsun' / azimuthal scanning mode
SS_T0_I = 1.1991888000E+09 / [s since 1970.0] scan reference time: int
SS_T0_F = 0.0000000000E+00 / [s since 1970.0] scan reference time: frac
SS_TH_Z0= 9.0000000000E+01 / [deg] fiducial scanning colatitide
SS_DTH_Z= 7.0000000000E+00 / [deg] fiducial scanning colatitide variation
SS_PROT = 6.0000000000E+01 / [s] rotation period of the satellite
SS_PPT = 3.6000000000E+03 / [s] pointing period of scan strategy
SS_PREPT= 3.6000000000E+03 / [s] repointing period of scan strategy
SS_PMOT = 1.5778800000E+07 / [s] period of the main scan motion
SS_PHASE= 1.4323944878E+01 / [deg] initial phase of main scan motion
TH_SE_ST= 'warning ' / status of Sun/Earth aspect angle violations
TH_S_MAX= 1.0000000000E+01 / [deg] maximum Solar aspect angle
TH_E_MAX= 1.5000000000E+01 / [deg] maximum Earth aspect angle
SS_N_PT = 8784 / number of pointing periods
COMMENT --------------------
COMMENT Cosmological parameters
COMMENT --------------------
OMEGAB = 0.05 / Omega in baryons
OMEGAC = 0.95 / Omega in dark matter
OMEGAV = 0.00 / Omega in cosmological constant
OMEGAN = 0.00 / Omega in neutrinos
HUBBLE = 50.00 / Hubble constant in km/s/Mpc
NNUNR = 0.00 / number of massive neutrinos
NNUR = 3.04 / number of massless neutrinos
TCMB = 2.7260 / CMB temperature in Kelvin
HELFRACT= 0.24 / Helium fraction
OPTDLSS = 0.00 / reionisation optical depth
IONFRACT= 0.20 / ionisation fraction
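Header cards of this "KEYWORD = value / comment" form are easy to turn into key/value metadata. The parser below is a deliberately naive sketch: the function name is hypothetical, and it ignores COMMENT cards and string values that contain a slash. A real application would more likely rely on a FITS library such as astropy.io.fits.

```python
def parse_card(card):
    """Split a 'KEYWORD = value / comment' card into (keyword, value, comment)."""
    keyword, _, rest = card.partition("=")
    # Everything after the first slash is treated as the comment.
    value, _, comment = rest.partition("/")
    # Drop the quotes and padding that surround string values.
    return keyword.strip(), value.strip().strip("'").strip(), comment.strip()


cards = [
    "SIMPLE = T / file does conform to FITS standard",
    "BITPIX = 16 / number of bits per data pixel",
    "PIXTYPE = 'HEALPIX ' / HEALPIX pixelisation",
    "NSIDE = 128 / Resolution parameter for HEALPIX",
]
metadata = {key: value for key, value, _ in map(parse_card, cards)}
print(metadata)
```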
Metadata DBMS
GUID -> links to all metadata
Query DB: globus-job-run
Locate files: GUID
Get files: LFC
Run the computation
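The first step of this workflow, finding files through their metadata, can be sketched as follows. The metadata table layout and the register helper are assumptions, and sqlite3 stands in for the metadata DBMS. Locating and fetching the files through the file catalogue is not modelled.

```python
import sqlite3
import uuid

conn = sqlite3.connect(":memory:")
# One row per (file GUID, header keyword, value); a hypothetical layout.
conn.execute("CREATE TABLE metadata (guid TEXT, keyword TEXT, value TEXT)")


def register(header):
    """Store one file's header keywords under a fresh GUID and return it."""
    guid = str(uuid.uuid4())
    conn.executemany(
        "INSERT INTO metadata VALUES (?, ?, ?)",
        [(guid, keyword, value) for keyword, value in header.items()],
    )
    return guid


guid_a = register({"PIXTYPE": "HEALPIX", "NSIDE": "128"})
guid_b = register({"PIXTYPE": "HEALPIX", "NSIDE": "256"})

# Query the metadata DB to find the GUIDs of the files that match.
matches = [
    row[0]
    for row in conn.execute(
        "SELECT guid FROM metadata WHERE keyword = 'NSIDE' AND value = '128'"
    )
]
print(matches == [guid_a])
```

The returned GUIDs are what the later steps would use to locate and retrieve the actual files before running the computation.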