Laboratoire LIP6 The Gedeon Project: Data, Metadata and Databases Yves DENNEULIN LIG laboratory, Grenoble ACI MD

Jan 04, 2016


Marilyn Tucker
Transcript
Page 1:

Laboratoire LIP6

The Gedeon Project: Data, Metadata and Databases

Yves DENNEULIN, LIG laboratory, Grenoble

ACI MD

Page 2:

Context and goals

● Heterogeneous metadata management on grids
   Clusters of clusters
● High-level queries using metadata
● Easy and flexible deployment and configuration
● Minimal overhead
● Various interfaces
● Initial target application domains
   Biocomputing (lots of metadata, little data)
   Microscopic imaging (lots of data, little metadata)

[Figure: GEDEON middleware on a BioInfo cluster and on the grid. Labels: query, result, proprietary sequences, swissprot, TrEMBL.]

Page 3:

The Gedeon middleware

Metadata management on lightweight grids
   ● Records of (attribute, value) pairs stored in files
Flexible requests
   ● Can be combined through scripting
Various interfaces
   ● Command line (tools)
   ● Libraries
   ● Virtual FS (legacy applications support)
Deployment "à la carte"
   ● Composition of various data sources
Performance
   ● Dedicated I/O library
   ● Semantic caching

Page 4:

Outline

1. General architecture

   a. Gedeon internal structure

   b. Composition of various data sources

2. Practical use

3. « dual » cache

Conclusion

Page 5:

Example of a deployment

[Figure: deployment diagram. Labels: Client; Query Interface (API, FS, GUI, ...); Local proxy; Interconnect middleware; Interconnect; Servers "close" to the client; Storage sites; cache.]

Page 6:

Gedeon components

● Gedeon Kernel
   fuple
      ● I/O Library
      ● Evaluate the queries
   lowerG
      ● Operators to compose bases
      ● Remote access
● Interface
   API lowerG
   Virtual FS
● Cache

[Figure: layer diagram. Labels: application, vSGF, lowerG, fuple, network, cache, Local proxy.]

Page 7:

What is inside the sources?

● Records of attribute/value pairs

Record
   Id         457
   classifA   Bacteria
   classifB   Clostridia
   taille     26
   ref
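One way to picture the record above is as a plain mapping from attribute names to values. The sketch below is only an illustration: the slides do not describe the storage format, so keeping every value as a string and the helper names `attributes` and `value` are assumptions.

```python
# Hypothetical in-memory view of one Gedeon record (attribute -> value).
# The real storage format and value typing are not given in the slides.
record = {
    "Id": "457",
    "classifA": "Bacteria",
    "classifB": "Clostridia",
    "taille": "26",  # "taille" is French for "size"
}


def attributes(rec):
    """Return the attribute names carried by one record, sorted."""
    return sorted(rec)


def value(rec, attr, default=None):
    """Look up one attribute; a record is free to omit any attribute."""
    return rec.get(attr, default)
```

Because records are just sets of pairs, two records of the same source do not need to share the same attributes.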

Page 8:

Example of composition of sources

Metadata can be local or copies

[Figure: a client queries sites S1, S2 and S3 combined through the operators +, J and RR.]

Page 9:

Union (+)

   First source:    record A1, record A2, record A3, record A4, ...
   Second source:   record B1, record B2, record B3, record B4, ...
   Union:           record A1, record A2, record A3, record A4, record B1, record B2, record B3, record B4, ...

Unify storage space
Parallel evaluation
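The two benefits named on this slide can be sketched in a few lines of Python. This is a minimal illustration, not Gedeon's implementation: the function names `union` and `parallel_select`, the use of threads, and the in-memory lists standing in for sources are all assumptions.

```python
from concurrent.futures import ThreadPoolExecutor
from itertools import chain


def union(*sources):
    """Present several record sources as one stream (the '+' operator)."""
    return chain.from_iterable(sources)


def parallel_select(predicate, *sources):
    """Filter each source in its own worker, then concatenate in source order."""
    def scan(source):
        return [rec for rec in source if predicate(rec)]

    with ThreadPoolExecutor(max_workers=max(1, len(sources))) as pool:
        parts = list(pool.map(scan, sources))  # map() keeps the source order
    return list(chain.from_iterable(parts))


# Hypothetical stand-ins for two sources.
source_a = [{"Id": "A1"}, {"Id": "A2"}, {"Id": "A3"}, {"Id": "A4"}]
source_b = [{"Id": "B1"}, {"Id": "B2"}, {"Id": "B3"}, {"Id": "B4"}]
```

The client sees one storage space through `union`, while a selection can be pushed to every source at once and the partial results merged afterwards.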

Page 10:

Round Robin (RR): Fault Tolerance

[Figure: a client reaches Source 1 and Source 2 through an RR operator.]

Page 11:

Round Robin (RR): Load Balancing

[Figure: two clients reach Source 1 and Source 2 through an RR operator.]
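Both uses of the round-robin operator (fault tolerance and load balancing) can be illustrated with one small class. This is a sketch under assumptions: the slides do not say how RR picks a source or detects a failure, so the rotation policy, the use of `ConnectionError`, and the names `RoundRobin` and `make_source` are hypothetical.

```python
from itertools import count


class RoundRobin:
    """Hypothetical RR operator over sources that hold the same records."""

    def __init__(self, sources):
        self.sources = list(sources)
        if not self.sources:
            raise ValueError("at least one source is required")
        self._turn = count()

    def query(self, predicate):
        start = next(self._turn) % len(self.sources)
        # Load balancing: each call starts on the next source in turn.
        # Fault tolerance: if that source fails, try the following one.
        for offset in range(len(self.sources)):
            source = self.sources[(start + offset) % len(self.sources)]
            try:
                return source(predicate)
            except ConnectionError:
                continue
        raise ConnectionError("no source could answer the query")


def make_source(name, records, alive=True):
    """Build a fake source: a callable that filters its records or fails."""
    def source(predicate):
        if not alive:
            raise ConnectionError(name)
        return name, [rec for rec in records if predicate(rec)]
    return source
```

With two live sources, successive queries alternate between them; with one source down, every query is answered by the other.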

Page 12:

Join operator (J)

First source:
   Id 457:  A1 = v1, A2 = v2, A3 = v3
   Id 458:  A1 = v4, A2 = v5, A3 = v6
   ...

Second source:
   Id 457:  An = vAn1
   Id 458:  An = vAn2
   ...

Result of J:
   Id 457:  A1 = v1, A2 = v2, A3 = v3, An = vAn1
   Id 458:  A1 = v4, A2 = v5, A3 = v6, An = vAn2
   ...

Enrich a source with another
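The enrichment shown above can be written as a key-based merge. The sketch below assumes details the slide leaves open: records of the first source with no match are kept unchanged, and on a name clash the first source's value wins. The function name `join` is illustrative.

```python
def join(left, right, key="Id"):
    """Enrich each record of `left` with the attributes `right` holds for the same key."""
    extra = {}
    for rec in right:
        extra.setdefault(rec[key], {}).update(rec)

    result = []
    for rec in left:
        merged = dict(rec)
        for attr, val in extra.get(rec[key], {}).items():
            merged.setdefault(attr, val)  # assumption: left-hand values win on a clash
        result.append(merged)
    return result


left = [
    {"Id": "457", "A1": "v1", "A2": "v2", "A3": "v3"},
    {"Id": "458", "A1": "v4", "A2": "v5", "A3": "v6"},
]
right = [
    {"Id": "457", "An": "vAn1"},
    {"Id": "458", "An": "vAn2"},
]
```

Calling `join(left, right)` on the slide's data adds the `An` attribute to records 457 and 458 and leaves their other attributes untouched.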

Page 13:

Outline

1. General architecture

   a. Gedeon internal structure

   b. Composition of various data sources

2. Practical use

3. « dual » cache

Conclusion

Page 14:

Tools 1/2

● Libraries
● CLI
● Operations
   sort
   projection
   select
   index
   ...

Page 15:

Tools 2/2

● Examples
   sort
      sort(attr='taille')
      $> cat mesmeta.g | fsort 'taille' > trie_taille.g
   index
      create_idx(attr='Id')
      .Id.idx
      search_idx('Id', 'P0123')
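To make the two operations concrete, here is a small Python sketch of a sort on one attribute and of an index built on another. It is not the Gedeon library: the function names echo the slide, but the signatures, the in-memory index (the slide shows `.Id.idx` files instead), and every record other than `P0123` are assumptions made for the example.

```python
def fsort(records, attr):
    """Sort records on one attribute; values that look numeric compare as numbers."""
    def sort_key(rec):
        raw = rec.get(attr)
        if raw is None:
            return (2, 0.0, "")      # records lacking the attribute go last
        try:
            return (0, float(raw), "")
        except ValueError:
            return (1, 0.0, raw)     # non-numeric values sort as text
    return sorted(records, key=sort_key)


def create_idx(records, attr):
    """Map each value of `attr` to the positions of the records carrying it."""
    index = {}
    for pos, rec in enumerate(records):
        if attr in rec:
            index.setdefault(rec[attr], []).append(pos)
    return index


def search_idx(records, index, value):
    """Return the records whose indexed attribute equals `value`."""
    return [records[pos] for pos in index.get(value, [])]


# Made-up records for illustration only.
records = [
    {"Id": "P0123", "taille": "47"},
    {"Id": "P0456", "taille": "26"},
    {"Id": "P0789"},
]
```

Building the index once lets later lookups skip the full scan that a plain select would need.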

Page 16:

Language for the requests

● Simple ($, type control with the operators)

● Regular expressions

● Second order

Page 17:

Select expression

Input:
   Id 457:  classifA = Bacteria, classifB = Clostridia, taille = 26
   Id 459:  classifB = Bacteria, taille = 47
   Id 460:  classifA = Fermicutes

Select $Id>459

Result:
   Id 460:  classifA = Fermicutes
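The selection above can be reproduced with a tiny evaluator. This is a sketch, not Gedeon's parser: it handles only the form `$attr OP literal`, and it assumes that a numeric literal triggers a numeric comparison, which is one possible reading of "type control with the operators".

```python
import operator
import re

OPS = {">=": operator.ge, "<=": operator.le, "==": operator.eq,
       "!=": operator.ne, ">": operator.gt, "<": operator.lt}
SIMPLE = re.compile(r"^\$(\w+)\s*(>=|<=|==|!=|>|<)\s*(.+)$")

RECORDS = [
    {"Id": "457", "classifA": "Bacteria", "classifB": "Clostridia", "taille": "26"},
    {"Id": "459", "classifB": "Bacteria", "taille": "47"},
    {"Id": "460", "classifA": "Fermicutes"},
]


def coerce(text):
    """Turn numeric-looking text into a float, leave other text alone."""
    try:
        return float(text)
    except ValueError:
        return text


def select(records, expr):
    """Keep the records satisfying an expression such as '$Id>459'."""
    match = SIMPLE.match(expr)
    if match is None:
        raise ValueError(f"unsupported expression: {expr!r}")
    attr, op, literal = match.groups()
    wanted = coerce(literal)
    kept = []
    for rec in records:
        if attr not in rec:
            continue  # records without the attribute never match
        have = coerce(rec[attr])
        if type(have) is type(wanted) and OPS[op](have, wanted):
            kept.append(rec)
    return kept
```

On the slide's three records, `select(RECORDS, "$Id>459")` keeps only record 460.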

Page 18:

Select using regexp

Input:
   Id 457:  classifA = Bacteria, classifB = Clostridia, taille = 26
   Id 459:  classifB = Bacteria, taille = 47
   Id 460:  classifA = Fermicutes

Select $classifB==/.*a$/

Result:
   Id 457:  classifA = Bacteria, classifB = Clostridia, taille = 26
   Id 459:  classifB = Bacteria, taille = 47
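In Python terms, the regexp selection amounts to matching a pattern against the value of one attribute. The sketch below uses Python's `re` syntax, which may differ in details from the regular expressions Gedeon accepts; the function name `select_regexp` is illustrative.

```python
import re

RECORDS = [
    {"Id": "457", "classifA": "Bacteria", "classifB": "Clostridia", "taille": "26"},
    {"Id": "459", "classifB": "Bacteria", "taille": "47"},
    {"Id": "460", "classifA": "Fermicutes"},
]


def select_regexp(records, attr, pattern):
    """Keep the records whose value for `attr` matches the regular expression."""
    rx = re.compile(pattern)
    return [rec for rec in records if attr in rec and rx.search(rec[attr])]
```

`select_regexp(RECORDS, "classifB", r".*a$")` keeps records 457 and 459: both have a `classifB` value ending in "a", while record 460 has no `classifB` attribute at all.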

Page 19:

Select using 2nd order logic

Input:
   Id 457:  classifA = Bacteria, classifB = Clostridia, taille = 26
   Id 459:  classifB = Bacteria, taille = 47
   Id 460:  classifA = Fermicutes

Select $/classif[AB]/==Bacteria && $taille>=36

Result:
   Id 459:  classifB = Bacteria, taille = 47
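What makes this query "second order" is that the pattern applies to attribute names rather than to values. A minimal Python sketch of that idea follows; requiring the pattern to match the whole attribute name, and the helper names, are assumptions.

```python
import re

RECORDS = [
    {"Id": "457", "classifA": "Bacteria", "classifB": "Clostridia", "taille": "26"},
    {"Id": "459", "classifB": "Bacteria", "taille": "47"},
    {"Id": "460", "classifA": "Fermicutes"},
]


def any_attr_equals(rec, name_pattern, value):
    """True if some attribute whose name matches the pattern holds `value`."""
    rx = re.compile(name_pattern)
    return any(rx.fullmatch(name) and val == value for name, val in rec.items())


def select_second_order(records, name_pattern, value, min_taille):
    """Evaluate '$/name_pattern/==value && $taille>=min_taille'."""
    return [
        rec for rec in records
        if any_attr_equals(rec, name_pattern, value)
        and "taille" in rec
        and float(rec["taille"]) >= min_taille
    ]
```

Record 457 satisfies the first condition through `classifA` but fails the size test, so only record 459 remains.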

Page 20:

Virtual FS interface

● Just a specific file-oriented interface
● Data and metadata can be anywhere in the grid
● Definition of logical directories
   Ex: cd '$classifB==|.*a$|'
   « and » between directories
   1 filename = value of a metadata attribute: logical view

/fs_virt/$classifB==|.*a$|> ls
457 459
/fs_virt/$classifB==|.*a$|> cat * > /tmp/mater
/fs_virt/$classifB==|.*a$|>
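The logical-directory idea can be mimicked without a real file system. In the sketch below, each path component is a selection and nested directories are combined with "and", as the slide states; the parsing rules, the function name `ls`, and the choice of `Id` as the file name are assumptions. A pattern containing `/` would break this simple path split.

```python
import re

RECORDS = [
    {"Id": "457", "classifA": "Bacteria", "classifB": "Clostridia", "taille": "26"},
    {"Id": "459", "classifB": "Bacteria", "taille": "47"},
    {"Id": "460", "classifA": "Fermicutes"},
]

# One path component of the form $attr==|regexp|
PART = re.compile(r"^\$(\w+)==\|(.*)\|$")


def ls(path, records, name_attr="Id"):
    """List a logical directory: every path component narrows the selection."""
    selected = records
    for part in path.strip("/").split("/"):
        match = PART.match(part)
        if match is None:
            raise ValueError(f"unsupported directory name: {part!r}")
        attr, pattern = match.groups()
        rx = re.compile(pattern)
        selected = [r for r in selected if attr in r and rx.search(r[attr])]
    # Each matching record appears as a file named after one attribute value.
    return [r[name_attr] for r in selected]
```

Listing `$classifB==|.*a$|` gives the two names shown in the session above, and descending into a second directory narrows the listing further.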

Page 21:

Outline

1. General architecture

   a. Gedeon internal structure

   b. Composition of various data sources

2. Practical use

3. « dual » cache

Conclusion

Page 22:

Dual cache (1)

● 2 cooperative caches
   cache of requests (R, {id,...}) -> save computing power
   cache of data (id, {attr,...}) -> save bandwidth
● Semantic cache
   Can evaluate a query using the data in the cache
   Can generate a remainder query to fetch what the cached data lacks
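The cooperation between the two caches can be sketched as follows. This is a simplified stand-in, not the Gedeon cache: the class name, the `evaluate` and `fetch` callbacks, and the exact-match request lookup are assumptions, and the "remainder" here is only the list of ids missing from the data cache, which is narrower than a true semantic remainder query.

```python
class DualCache:
    """Hypothetical pairing of a request cache and a data cache."""

    def __init__(self, evaluate, fetch):
        self.evaluate = evaluate  # request -> list of ids (costs computing power)
        self.fetch = fetch        # ids -> {id: attributes} (costs bandwidth)
        self.requests = {}        # R -> [id, ...]
        self.data = {}            # id -> {attr: value, ...}

    def lookup(self, request):
        # Request cache: evaluate a request only the first time it is seen.
        if request not in self.requests:
            self.requests[request] = list(self.evaluate(request))
        ids = self.requests[request]
        # Data cache: transfer only the objects that are not held locally.
        remainder = [i for i in ids if i not in self.data]
        if remainder:
            self.data.update(self.fetch(remainder))
        return [self.data[i] for i in ids]
```

A repeated request costs neither evaluation nor transfer, and a new request whose result overlaps earlier ones transfers only the missing objects.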

Page 23:

Example

● Refinement of a request
   1) '$OC==/Eukaryota/'
      -> (R, Lid={id1,id2, ...})
   2) '$OC==/Eukaryota/ && $year>=1998'
      Select(*Lid, '$year>=1998')
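The refinement step can be illustrated in Python: once the first request is cached with its id list, the second request only has to test the extra condition on those ids. The function name `refine`, the dictionaries, and the `year` values below are made up for the example.

```python
def refine(request_cache, records_by_id, base_request, extra_predicate):
    """Answer 'base && extra' by filtering only the ids cached for 'base'."""
    lid = request_cache[base_request]
    return [i for i in lid if extra_predicate(records_by_id[i])]


# Hypothetical cache content after step 1.
request_cache = {"$OC==/Eukaryota/": ["id1", "id2", "id3"]}
records_by_id = {
    "id1": {"OC": "Eukaryota", "year": "1995"},
    "id2": {"OC": "Eukaryota", "year": "1998"},
    "id3": {"OC": "Eukaryota", "year": "2003"},
}


def year_at_least_1998(rec):
    """The extra condition of step 2, '$year>=1998'."""
    return int(rec["year"]) >= 1998
```

Step 2 then touches three cached records instead of scanning every source again.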

Page 24:

Dual cache (2)

● Distributed semantic cache
   Typically used inside communities
      ● Lots of common requests
   No location constraints
      ● Members of the community can be geographically scattered
● Distributed data cache
   Minimize time and data transfer
   Cooperation between sites that are close from a topological point of view

Page 25:

Dual cache (3)

[Figure: servers in Grenoble and in Rennes with a dual cache. Labels: Query cache, Semantic locality, Community Eukaryota, Community Archaea; Object cache, Geographic locality.]

Page 26:

Dual cache (4)

● Work in progress on the notion of distance
   Find geographical proximity
   Find common interests between communities
      ● Create hybrid communities based on their requests
● Could be used to change the cache parameters
   Manual and/or automatic

Page 27:

Conclusion

● A data integration middleware
   Handling of metadata
● Distributed and modular
   Deployment can be done according to architectural/organisational constraints
● Definition of a dual cache infrastructure
   Reflect both organisational use
● Prototype in use
   Packaging and documentation needed

Page 28:

Questions?