From Laboratory to e-Laboratory

Post on 12-Sep-2014

DESCRIPTION

Presentation for Lab-J of the Human Genetics Department at the Leiden University Medical Centre.

Transcript

From Laboratory to e-Laboratory?

Introduction for ‘Lab-J’ of the LUMC Human Genetics Department

Marco Roos

Acknowledging the colleagues from BioSemantics, myGrid, OMII-UK, AID, and the LUMC Bioinformatics Expertise Centre

2

Introducing

Me

3

Liaison biology/bioinformatics – informatics

Biologist and bioinformatician, e-(bio)science researcher

Coordinator BioSemantics group Leiden

Human Genetics Department Leiden University Medical Centre and Informatics Institute University of Amsterdam

Project or Area Liaison (PAL) OMII-UK

Member BioAssist programme committee NBIC

4

also about

You

5

First about

Me

6

My C.V. before e-Science (before 2003)

• Molecular & Cellular biology (MSc)
  – microscopy and image analysis of chromosome structure
  – ‘minor’ computer science

• Image analysis methods to measure DNA content in bull sperm cells (civil service)

• Chromatin structure & function (PhD molecular cytology)
  – F.I.S.H., microscopy, image analysis, statistics
  – 3-D chromosome structure during cell cycle (no luck)
  – DNA movement in Escherichia coli (success)

• Human Transcriptome Map (post-doc)
  – Gene expression to human genome sequence
  – Analysis of regions of increased gene expression

Motivation

Structure and function of DNA in the nucleus

Escherichia coli

Muntiacus muntjak

8

Why bioinformatics?

Lab-J suggests…

07/04/2023 BioAID 9

Bioinformatics

A typical bioinformatician

10

Bioinformatics

A biologist behind a computer who (just) learned Perl

11

/*
 * Determines ridges in the HTM expression table.
 */

#include <stdio.h>
#include <stdlib.h>
#include <libpq-fe.h>
#include "ridge.h"   /* assumed to define TRUE and FALSE */

int validquery(PGresult *result, char *querystring);
int printresults(PGresult *tuples);

/* Selects the rows of one chromosome, sorted on genstart; the result is returned through htmtable. */
int selecthtm(PGconn *conn, char *htmtablename, char *chromname, PGresult **htmtable)
{
    char querystring[256];

    /* Names are pasted into the query unescaped, so they must come from a trusted source. */
    snprintf(querystring, sizeof querystring,
             "SELECT * FROM %s WHERE chrom = '%s' ORDER BY genstart",
             htmtablename, chromname);
    *htmtable = PQexec(conn, querystring);

    return validquery(*htmtable, querystring);
}

/* Determines if mincount genes in a row, starting at row, are (part of) a ridge. */
/* pre: htmtable is valid and sorted on genstart (ascending) */
int is_ridge(PGresult *htmtable, int row, double exprthreshold, int mincount)
{
    if (mincount <= 0) return TRUE;
    if (row >= PQntuples(htmtable)) return FALSE;

    /* PQgetvalue returns text, so convert it before comparing with the threshold. */
    if (atof(PQgetvalue(htmtable, row, PQfnumber(htmtable, "movmed39expr"))) < exprthreshold) {
        return FALSE;
    }
    return is_ridge(htmtable, row + 1, exprthreshold, mincount - 1);
}

int main(void)
{
    PGconn *conn;          /* holds database connection */
    char querystring[256]; /* query string */
    PGresult *result;

    conn = PQconnectdb("dbname=htm port=6400 user=mroos password=geheim");
    if (PQstatus(conn) == CONNECTION_BAD) {
        fprintf(stderr, "connection to database failed.\n");
        fprintf(stderr, "%s", PQerrorMessage(conn));
        PQfinish(conn);
        exit(1);
    }
    printf("Connection ok\n");

    snprintf(querystring, sizeof querystring, "SELECT * FROM chromosomes");
    printf("%s\n", querystring);

    result = PQexec(conn, querystring);
    if (!validquery(result, querystring)) {
        PQclear(result);
        PQfinish(conn);
        return EXIT_FAILURE;
    }
    printresults(result);

    PQclear(result);
    PQfinish(conn);
    return EXIT_SUCCESS;
}

/* Prints the first column of at most ten rows. */
int printresults(PGresult *tuples)
{
    int i;

    for (i = 0; i < PQntuples(tuples) && i < 10; i++) {
        printf("%d, %s\n", i, PQgetvalue(tuples, i, 0));
    }
    return TRUE;
}

/* Reports a failed query on stderr; returns TRUE only if the query returned tuples. */
int validquery(PGresult *result, char *querystring)
{
    if (PQresultStatus(result) != PGRES_TUPLES_OK) {
        fprintf(stderr, "Query %s failed.\n", querystring);
        return FALSE;
    }
    return TRUE;
}
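
The ridge test in the listing above can also be read without the database plumbing. The following is a minimal Python sketch, assuming the moving-median expression values of one chromosome are already in a list sorted by gene start. The helper `find_ridges` and the example values are hypothetical and do not come from the slides.

```python
from typing import List, Tuple


def is_ridge(expr: List[float], row: int, threshold: float, mincount: int) -> bool:
    """True if `mincount` consecutive genes starting at `row` all reach `threshold`."""
    if mincount <= 0:
        return True
    window = expr[row:row + mincount]
    # A window cut short by the end of the chromosome cannot be a ridge.
    return len(window) == mincount and all(v >= threshold for v in window)


def find_ridges(expr: List[float], threshold: float, mincount: int) -> List[Tuple[int, int]]:
    """Return (start, end) index pairs, end exclusive, of maximal runs of at
    least `mincount` genes whose expression is at or above `threshold`."""
    ridges = []
    start = None
    for i, value in enumerate(expr):
        if value >= threshold:
            if start is None:
                start = i
        else:
            if start is not None and i - start >= mincount:
                ridges.append((start, i))
            start = None
    # Close a run that extends to the last gene.
    if start is not None and len(expr) - start >= mincount:
        ridges.append((start, len(expr)))
    return ridges


# Hypothetical expression values, for illustration only.
EXAMPLE = [1.0, 5.0, 6.0, 7.0, 2.0, 8.0, 9.0]
```

With these example values, `find_ridges(EXAMPLE, 5.0, 3)` reports a single run covering genes 1 to 3, because the second run of high values is only two genes long.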

13

Why e-science? What is wrong with bioinformatics?

Human geneticists think…

14

Why should a biologist be interested in e-science?

BioAssistants guessed…

• Involves computation
• Interpretation of results
• Biology isn’t that interesting
• Reduce reinvention of the wheel
• Current lack of standards
• Sharing results
• Reshaping biology
• Synergy between different sciences
• Emerging data-driven science

15

Why e-Science?

A needy biologist

Single tiny brain

Lots of data to deal with

Lots of methods and algorithms to try and combine

No computational superpowers

Lots of knowledge to deal with

16

1070 databases, Nucleic Acids Research, Jan 2008 (96 in Jan 2001)

Proteomics, Genomics, Transcriptomics, Protein sequence prediction, Phenotypic studies, Phylogeny, Sequence analysis, Protein Structure prediction, Protein-protein interaction, Metabolomics, Model organism collections, Systems Biology, Epidemiology, etcetera …

All with a splendid interface… all different, of course

17

Traditional data integration in bioinformatics

Local Database

Local Database

18

The ‘spaghetti’ approach

19

Some of my observations

• Reinvention
  – How many reannotation pipelines do you need?
  – Little reuse of components

• Reproducibility
  – Black boxes
  – Emphasis not on clarity
  – Can we understand bioinformatics as wet lab protocols?

• Focus on technicalities, not biological analysis
  – Should bioinformaticians write ‘job submission’ scripts?

• Data graveyards
  – Do we need >1000 databases?
  – Can we understand our own data?

21

SOME EXAMPLES FROM FIELD OF E-SCIENCE

22

Enhancement 1: Workflows (Taverna workflow)

23

Enhancement 2: exploiting brains

24

Exploiting Brains By Web Services

source: http://biocatalogue.org (launched at ISMB2009)

>1000 annotated services, >3000 known to Taverna

Includes BioMart, R, text mining, KEGG, NCBI PubMed, Ensembl, etc.

Web Services run remotely

25

Exploiting more brains by sharing workflows

source: http://myExperiment.org

Social community web site for scientists

2300 registered users in two years

750 workflows

Bioinformatics and e-science

Single purpose, single person, black box application

Customized experiments with reusable components

My component

Your component

My component

Your component

My component

27

What do we know of our data?

Sufficient?

• Query discoveries?
• Query across experiment?
• Fit biological modelling?
• Good basis for new experiments?
• Flexible enough?

Model-based data integration

Biological concepts (‘myModel’)

Data

Marshall et al., International Workshop on Knowledge Systems in Bioinformatics 2006

Post et al., Bioinformatics 2007

Biologist readable model

Computer readable model

roos: Principle of the method extensively shown at previous SPX meeting

Model-based data integration

Example: UCSC genome browser

partOf
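
Model-based data integration, as outlined on these slides, ties data records to biological concepts through named relations such as partOf. The following is a minimal Python sketch, assuming the model is a plain in-memory list of (subject, predicate, object) triples. The concept names, the `instanceOf` relation and the record identifier are hypothetical illustrations, not the model used by the authors.

```python
from typing import List, Optional, Tuple

Triple = Tuple[str, str, str]

# Hypothetical model: biological concepts linked by partOf, plus one data
# record tied to a concept. None of these names come from the slides.
MODEL: List[Triple] = [
    ("Exon", "partOf", "Gene"),
    ("Gene", "partOf", "Chromosome"),
    ("record42", "instanceOf", "Exon"),
]


def match(triples: List[Triple], s: Optional[str] = None,
          p: Optional[str] = None, o: Optional[str] = None) -> List[Triple]:
    """Return the triples that agree with every position that is not None."""
    return [
        t for t in triples
        if (s is None or t[0] == s)
        and (p is None or t[1] == p)
        and (o is None or t[2] == o)
    ]


def wholes_of(triples: List[Triple], part: str) -> List[str]:
    """Follow partOf transitively and list every concept that contains `part`."""
    found: List[str] = []
    frontier = [part]
    while frontier:
        current = frontier.pop()
        for _, _, whole in match(triples, s=current, p="partOf"):
            if whole not in found:  # also guards against cycles
                found.append(whole)
                frontier.append(whole)
    return found


def concepts_for_record(triples: List[Triple], record: str) -> List[str]:
    """List the concept a record instantiates, followed by its containing concepts."""
    concepts: List[str] = []
    for _, _, concept in match(triples, s=record, p="instanceOf"):
        concepts.append(concept)
        concepts.extend(wholes_of(triples, concept))
    return concepts
```

Because the query runs against the concepts rather than a table layout, a question such as "which chromosome-level concept does this record belong to" needs no knowledge of how the record is stored.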

30

Semantic Web (Linked Open Data)

31

Empower me with a ‘virtual brain’

My ws

Your ws

My ws

Your ws

My ws

* From P.J. Verschure, Journal of Cellular Biochemistry 2006, vol. 99(1), pg 23-34


32

Query

Retrieve documents from Medline

Extract proteins (Homo sapiens)

Calculate ranking scores

Create biological cross references

Convert to table (html)

Add documents (IDs) to semantic model

Add proteins to semantic model

Add scores to semantic model

Add cross references to semantic model

Add query to semantic model

Workflow and Semantic Web
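
The steps listed on this slide can be sketched as a small pipeline in which every step also records its result in a semantic model. The Python below is only an illustration under stated assumptions: the corpus stands in for Medline, the protein lexicon stands in for a text-mining service, the score is a plain document count, and the cross-reference step is left out. All names, texts and predicates are hypothetical.

```python
from collections import Counter
from typing import Dict, List, Tuple

Triple = Tuple[str, str, str]

# Hypothetical stand-in for Medline: document ID -> text.
CORPUS: Dict[str, str] = {
    "doc1": "ProtA and ProtB bind chromatin",
    "doc2": "ProtB marks chromatin domains",
    "doc3": "Microscopy of sperm cells",
}
# Hypothetical lexicon standing in for a protein-recognition service.
KNOWN_PROTEINS = {"ProtA", "ProtB"}


def retrieve_documents(query: str) -> List[str]:
    """Return the IDs of documents whose text contains the query."""
    return sorted(d for d, text in CORPUS.items() if query.lower() in text.lower())


def extract_proteins(doc_id: str) -> List[str]:
    """Return the known proteins mentioned in one document."""
    return sorted(set(CORPUS[doc_id].split()) & KNOWN_PROTEINS)


def to_html_table(ranking: List[Tuple[str, int]]) -> str:
    """Convert the ranking to a bare HTML table."""
    rows = "".join(f"<tr><td>{p}</td><td>{s}</td></tr>" for p, s in ranking)
    return f"<table>{rows}</table>"


def run_workflow(query: str) -> Tuple[List[Tuple[str, int]], List[Triple]]:
    """Run the steps in order and add each result to the semantic model."""
    model: List[Triple] = [("run", "hasQuery", query)]
    scores: Counter = Counter()
    for doc_id in retrieve_documents(query):
        model.append(("run", "retrieved", doc_id))
        for protein in extract_proteins(doc_id):
            model.append((doc_id, "mentions", protein))
            scores[protein] += 1  # score = number of documents mentioning it
    ranking = sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))
    for protein, score in ranking:
        model.append((protein, "hasScore", str(score)))
    return ranking, model
```

Calling `run_workflow("chromatin")` returns both the ranked proteins and the list of triples, so the HTML table and the semantic model are produced by the same run.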

33

Concept Web from a user’s point of view

34

e-Laboratories and e-Laboratory factories

35

e-Galaxy for NBIC

• Galaxy as front end

• Workflows & Web Services

• Grid enabled Taverna

• MOLGENIS

• Semantic/Concept Web

• myExperiment/BioCatalogue

• Scientific Research Objects

Vacancy! (software engineer)

37

e-Galaxy mock-up

Underlying workflow

Your Scientific Research Object

MOLGENIS, Convert, Import/Export, Research Objects, Store, Configure, Run

Related research and documents

(placeholder text in mock-up)

Suggestions by semantic components

38

e-Science requirement: Reuse

E-Laboratory component

39

http://www.epigenius.org/ (mock-up)

40

Research and development aims

• Automated support for hypothesis formation
  – E.g. on epigenetic mechanisms
  – Apply Workflow, Semantic Web, Concept Web
  – Concept-based meta-analysis
  – Automated triple creation from computational analysis

41

Research and development ambitions

• Co-develop e-Laboratories
  – e-Galaxy
  – epiGenius
  – BioBanking

• Help BEC with support environment

• Concept Web services
  – Web services
  – E-Laboratory components
  – Transparent creation of triples
  – Personal semantic repositories

Liaison

OMII-UK
Manchester, Southampton, Edinburgh (ca. 30 engineers)
Taverna, myExperiment, e-Labs

W3C Health Care & Life Sciences Interest Group
Semantic Web experts
Linked Open Data

AID
University of Amsterdam
e-Science experts
Grid tools

BioSemantics Rotterdam
Text mining
Concept profile meta-analysis

NBIC
BioAssist core software development
Grid tools, Concept Web, e-Labs

Concept Web
Content, tools and infrastructure

You?

Bioinformatics Expertise Centre LUMC
Statistical and computer science expertise
Generic support

43

‘e’ for enhance, not enforce

Please help me to help you

Register for: http://snipurl.com/biosemanticsusers (http://www.myexperiment.org/groups/211)

Allows me to
• Give you preferential treatment
• Not spam everybody
• Keep you informed
• Ask your opinion (user driven development!)

44

Visit the BioSemantics web site

http://www.biosemantics.org/

45

Word of warning

Computer scientists are scientists too!
• Need to publish
• Score by papers, not by software
• Addressed by OMII-UK and BioAssist

Compare
“How can I use it in the clinic?”
“How can I use it in the lab?”

46

Dissemination

• Come by for help or information
• Internal ‘mini-courses’?
• Send me suggestions!

• FYI: Course ‘Managing Life Science Information’ for PhD students, 2010

47

Key points

• Liaising: between technology contacts and you, the colleagues of Human Genetics.

• No obligations: try any new developments that we are involved in with our help, but don't feel obliged.

• Help us help you: express your wishes and problems, try things and give feedback, and be patient sometimes.

Please join the biosemantics users group on myExperiment.org to help us communicate.

48

Thank you for your attention

An enhanced biologist

Lots of accessible data

Web Services, Workflows, and their creators available

Other people’s computational superpowers

Knowledge bases to query

Community brain power

Homo biologicus enhancis
