Integration of Biological Data (LifeDB) Presented By Md. Shazzad Hosain (shazzad@wayne.edu) Supervised By Dr. Hasan Jamil (jamil@cs.wayne.edu) Wayne State University, Detroit, USA

Dec 21, 2015

Transcript
Page 1: Integration of Biological Data (LifeDB) Presented By Md. Shazzad Hosain (shazzad@wayne.edu) Supervised By Dr. Hasan Jamil (jamil@cs.wayne.edu) Wayne State.

Integration of Biological Data (LifeDB)

Presented By Md. Shazzad Hosain (shazzad@wayne.edu)

Supervised By Dr. Hasan Jamil (jamil@cs.wayne.edu)

Wayne State University, Detroit, USA


04/18/23 2

Outline

Data Integration

WebFusion (our previous work)

LifeDB (our goal)

Research Scopes


Data Integration Example

Detroit to Bologna air ticket

Alitalia (Italian airline)

Air France

Northwest Airlines

Lufthansa, etc.


Integration Example cont.

CheapAir.com / Expedia.com

Alitalia Lufthansa Air France Delta

myAirFare.com

CheapAir.com Expedia.com ……


Integration Approaches

Warehouse Integration

Mediator-based Integration

Navigational Integration


Warehouse Integration

Materialize data from all sources to local warehouse

Emphasizes data translation rather than query translation

Advantages: low network bottleneck, efficient

Disadvantages: data may not be the most up to date; system maintenance
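
A minimal sketch of the warehouse idea, using the air ticket example. The two source layouts, the `fare` table, and all prices are invented for illustration; only the pattern (translate each source's data once, then query locally) comes from the slide.

```python
import sqlite3

# Invented sample records; each source uses its own field names and units.
ALITALIA = [{"from": "DTW", "to": "BLQ", "price_usd": 950.0}]
LUFTHANSA = [{"origin": "DTW", "dest": "BLQ", "fare_cents": 91000}]

def from_alitalia(rec):
    """Data translation: map an Alitalia record to the warehouse schema."""
    return ("Alitalia", rec["from"], rec["to"], rec["price_usd"])

def from_lufthansa(rec):
    """Data translation: map a Lufthansa record, converting cents to dollars."""
    return ("Lufthansa", rec["origin"], rec["dest"], rec["fare_cents"] / 100)

def build_warehouse():
    """Materialize every source into one local table, ahead of query time."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE fare (airline TEXT, origin TEXT, dest TEXT, price_usd REAL)"
    )
    rows = [from_alitalia(r) for r in ALITALIA]
    rows += [from_lufthansa(r) for r in LUFTHANSA]
    conn.executemany("INSERT INTO fare VALUES (?, ?, ?, ?)", rows)
    conn.commit()
    return conn

def cheapest(conn, origin, dest):
    """Queries run against the local copy; no source is contacted here."""
    cur = conn.execute(
        "SELECT airline, price_usd FROM fare "
        "WHERE origin = ? AND dest = ? ORDER BY price_usd LIMIT 1",
        (origin, dest),
    )
    return cur.fetchone()

warehouse = build_warehouse()
best = cheapest(warehouse, "DTW", "BLQ")
```

The staleness problem shows up directly: if Lufthansa changes its fare, `best` stays the same until `build_warehouse` is run again.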


Mediator-based Integration

Concentrates on query translation

GAV (global-as-view) approach and LAV (local-as-view) approach


GAV Approach

Query reformulation is easy, but adding or removing sources is difficult

Preferred when sources are known and stable

S1 S2 S3 S4

Mediator Schema
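
A rough sketch of GAV, assuming a single mediated relation `Fare` and two source wrappers. All names and fares here are hypothetical. The mediated relation is written as a view over the sources, so answering a query only means unfolding that view.

```python
# Hypothetical source wrappers returning (origin, dest, price) tuples.
def alitalia_fares():
    return [("DTW", "BLQ", 950.0)]

def air_france_fares():
    return [("DTW", "BLQ", 990.0), ("DTW", "CDG", 870.0)]

def fare_view():
    """GAV mapping: the mediated relation Fare is defined over the sources."""
    sources = (("Alitalia", alitalia_fares), ("Air France", air_france_fares))
    for airline, fetch in sources:
        for origin, dest, price in fetch():
            yield {"airline": airline, "origin": origin,
                   "dest": dest, "price": price}

MEDIATED_SCHEMA = {"Fare": fare_view}

def query(relation, **conditions):
    """Reformulation is plain view unfolding followed by a filter."""
    view = MEDIATED_SCHEMA[relation]
    return [row for row in view()
            if all(row[k] == v for k, v in conditions.items())]

to_bologna = query("Fare", origin="DTW", dest="BLQ")
```

Adding Lufthansa would mean editing `fare_view` itself, which is the maintenance cost noted above.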


LAV Approach

Query reformulation is difficult, but adding or removing sources is easy

Appropriate for large-scale ad hoc integration

ARIADNE, Discovery Link, TAMBIS, KIND, etc.

Mediator Schema

S1 S2 S3 S4
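
A simplified sketch of LAV, again with a hypothetical mediated relation `Fare`. Each source is described as a view over the mediated schema, here reduced to equality constraints, and the mediator has to work out which sources can contribute to a query. Real LAV reformulation (answering queries using views) is considerably harder than this filter; the names and fares are invented.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List

@dataclass
class SourceDescription:
    name: str
    # The source holds exactly the Fare tuples satisfying these equalities.
    constraints: Dict[str, object]
    fetch: Callable[[], List[dict]]

SOURCES: List[SourceDescription] = []

def register(desc):
    """Adding a source only appends its description; nothing else changes."""
    SOURCES.append(desc)

def relevant_sources(conditions):
    """Keep the sources whose description does not contradict the query."""
    return [s for s in SOURCES
            if all(s.constraints.get(k, v) == v for k, v in conditions.items())]

def query(**conditions):
    rows = []
    for src in relevant_sources(conditions):
        rows += [r for r in src.fetch()
                 if all(r[k] == v for k, v in conditions.items())]
    return rows

register(SourceDescription(
    "alitalia_site", {"airline": "Alitalia"},
    lambda: [{"airline": "Alitalia", "origin": "DTW", "dest": "BLQ", "price": 950.0}],
))
register(SourceDescription(
    "lufthansa_site", {"airline": "Lufthansa"},
    lambda: [{"airline": "Lufthansa", "origin": "DTW", "dest": "BLQ", "price": 910.0}],
))

picked = [s.name for s in relevant_sources({"airline": "Lufthansa"})]
rows = query(origin="DTW", dest="BLQ")
```

The work moves from the mapping to the mediator: `register` is trivial, while `relevant_sources` stands in for the hard reformulation step.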


Navigational Integration

Some sources provide information that would be hard or impossible to access without point-and-click navigation


WebFusion

Dr. Liangyou Chen


LinkDB

DBGET

KEGG Pathways

Can these be done electronically for a biologist?


Go to: http://www.ncbi.nlm.nih.gov/LocusLink/


Click <Register Web Process> menu


1. Input: 103730

2. Press <Pickup Input> button


1. Press <Next> button

2. Press [Go] button


1. Mark the table

2. Press <Pickup Table> Button


Press the <Create> button


1. Uncheck all boxes except 2~6

2. Press the <Update & Redraw> button


1. Name it LocusLink

2. Name them Link, LocusID, Org, Symbol, and Description, respectively

3. Select appropriate transformations

4. Press <Update & Redraw> button


Press <Confirm & Create Table>


The LocusLink web process is created


LinkDB

DBGET

KEGG Pathways


1. Select ‘LocusLink’ table

2. Type in ‘LocusLinkQuery’ as a query name

3. Check these fields to display

4. Double click here


1. Select ‘local_gene_ids’ table

2. Select ‘LID’ field

3. Click here (any place)


Click <This Query> button


Press <Execute> button


In-progress results are shown here
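
What the walkthrough builds can be sketched roughly as follows. This is not WebFusion code. It assumes that the LID field of local_gene_ids is fed to the LocusLink web process as its input, and that three of the five named columns are checked for display; the slides do not confirm either detail. The web process is a stand-in that returns placeholder values instead of fetching a page.

```python
FIELDS = ("Link", "LocusID", "Org", "Symbol", "Description")

def locuslink_web_process(locus_id):
    """Stand-in for the registered web process.

    A real one would submit the ID to the LocusLink page and extract the
    marked result table; placeholder values are returned here instead.
    """
    row = {field: f"placeholder-{field.lower()}" for field in FIELDS}
    row["LocusID"] = locus_id
    return [row]

# Local table with one ID; 103730 is the input used in the walkthrough.
local_gene_ids = [{"LID": 103730}]

def locuslink_query(table, display=("LocusID", "Org", "Symbol")):
    """Run the web process once per local ID and keep the displayed fields."""
    results = []
    for local_row in table:
        for hit in locuslink_web_process(local_row["LID"]):
            results.append({field: hit[field] for field in display})
    return results

rows = locuslink_query(local_gene_ids)
```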


LifeDB


LinkDB

DBGET

KEGG Pathways


Resource Discovery

Automatic Schema/Ontology Matching

Query Optimization

WorkFlows

LifeDB

BioFlow (A declarative WorkFlow Language)


Glimpse of BioFlow

GeneBankURL FlyBaseURL

DNA sequence repositories

EMBL format

GeneBank format

Combine these sequences

Reading Frame Predictor (input_seq : FASTA format, species)

Score and predicted DNA region

University of Minnesota


BioFlow

workflow open_reading_frame ;
use ontology BioSystems ;
declare found logical, count int ;
define data sequences_1 at GeneBankURL as (seq_1 DNA) ;
define tool orf at URL parameter (seq DNA, target organism)
    results (score int, predicted_region DNA) ;
combine sequences_1, sequences_2 into sequences (seqs) ;
select seqs, orf (seqs, "drosophila") from sequences ;

The goal is to develop a formal BioFlow language syntax with compositionality, the closure property, and type safety
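
For readers unfamiliar with the workflow shape, the BioFlow program above can be mimicked in ordinary Python. This is only an analogy, not BioFlow semantics. The `orf` function is a toy stand-in for the remote tool (longest ATG-to-stop region on the forward strand, scored by its length), the two sequences are invented, and `sequences_2` is assumed to come from a second repository.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}

def orf(seq, organism):
    """Toy stand-in for the remote tool: returns (score, predicted_region).

    `organism` mirrors the tool's signature but is ignored in this sketch.
    """
    best = ""
    for start in range(len(seq) - 2):
        if seq[start:start + 3] != "ATG":
            continue
        # Walk codon by codon from the start codon to the first stop codon.
        for end in range(start + 3, len(seq) - 2, 3):
            if seq[end:end + 3] in STOP_CODONS:
                region = seq[start:end + 3]
                if len(region) > len(best):
                    best = region
                break
    return len(best), best

# define data ... (invented sequences standing in for the two repositories)
sequences_1 = ["CCATGAAATAGGG"]
sequences_2 = ["TTATGCCCTGA"]

# combine sequences_1, sequences_2 into sequences (seqs)
sequences = sequences_1 + sequences_2

# select seqs, orf (seqs, "drosophila") from sequences
result = [(seq, *orf(seq, "drosophila")) for seq in sequences]
```

The last line shows the compositional style the language aims for: a tool call is used inside a select just like any other expression, and its result is again a table of rows.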


Research Scope

Resource Discovery

Automatic Schema/Ontology Matching

Query Optimization

WorkFlows

7-8 PhD positions

3-5 years of funding


Thanks to all