The Changing Nature of Biomedical Research: Semantic e-Science

May 21, 2015

robertstevens65

Keynote talk, at the KR4HC workshop at Artificial Intelligence in medicine Europe, Verona, 2009
Page 1: The Changing Nature of Biomedical Research: Semantic e-Science

The Changing Nature of Biomedical Research: Semantic e-Science

Robert Stevens

BioHealth Informatics Group

University of Manchester

Page 2: The Changing Nature of Biomedical Research: Semantic e-Science

Introduction

• (Modern bio-molecular) Science
• E-Science
• Semantics and science
• Semantic e-Science

Page 3: The Changing Nature of Biomedical Research: Semantic e-Science

Ernest Rutherford

“All science is either physics or stamp collecting”

Image: http://en.wikipedia.org/wiki/File:Ernest_Rutherford2.jpg

Page 4: The Changing Nature of Biomedical Research: Semantic e-Science

Mathematical Sciences

Page 5: The Changing Nature of Biomedical Research: Semantic e-Science

Laws in Biology

Charles Darwin

Image: http://en.wikipedia.org/wiki/File:Charles_Darwin_01.jpg

On The Origin of Species - 1859

Page 6: The Changing Nature of Biomedical Research: Semantic e-Science

Central Dogma

Image: http://cellbio.utmb.edu/CELLBIO/DNA-RNA.jpg

Page 7: The Changing Nature of Biomedical Research: Semantic e-Science

Classic and Modern Biology

Genotype Phenotype

Modern biology

Classic biology

Page 8: The Changing Nature of Biomedical Research: Semantic e-Science

Speed of sequencing

• First human genome

– 10+ years to produce
– Cost $500 million
– Huge international effort

• Now done in 10 weeks

– (for $399)
– http://tinyurl.com/genomecost
– http://www.23andme.com

Page 9: The Changing Nature of Biomedical Research: Semantic e-Science

1000+ databases

• according to Nucleic Acids Research

Page 10: The Changing Nature of Biomedical Research: Semantic e-Science

PubMed: 2 papers per minute

• ~700,000 individual papers
• Grows at 2 papers per minute

(see http://blogs.bbsrc.ac.uk for details)

Page 11: The Changing Nature of Biomedical Research: Semantic e-Science

Biology now has lots of facts

Page 12: The Changing Nature of Biomedical Research: Semantic e-Science

Lots of catalogues

Genome

Proteome

Transcriptome

Interactome

Metabolome

PHENOME

Page 13: The Changing Nature of Biomedical Research: Semantic e-Science

Creating Woods, not Trees

Genes

Proteins

Pathways

Interactions

Literature

Complex Machines

Virtual Organism

… from biological facts, we build a system that is a model of a real organism

Page 14: The Changing Nature of Biomedical Research: Semantic e-Science

Networks of Chemicals

Image: http://genome-www.stanford.edu/rap_sir/images/Web_FigF_RAP1_glycolysis.gif

Page 15: The Changing Nature of Biomedical Research: Semantic e-Science

Systems within Systems

Image: http://www.ehponline.org/members/2007/10373/fig1.jpg

Page 16: The Changing Nature of Biomedical Research: Semantic e-Science

UniProt: a protein database?

Page 17: The Changing Nature of Biomedical Research: Semantic e-Science

Navigating the Web of Knowledge in Bioinformatics

Page 18: The Changing Nature of Biomedical Research: Semantic e-Science

Bioinformatics Experiments are Data pipelines

Resources/Services

Investigate the evolutionary relationships between proteins

Protein sequences

Multiple sequence alignment

Query

[Peter Li]

My data

My tool

Page 19: The Changing Nature of Biomedical Research: Semantic e-Science

Linking together data resources
Hypo Science – the routine for the many
Hyper Science – big projects, big science

Page 20: The Changing Nature of Biomedical Research: Semantic e-Science

The In Silico Experiment

• We can mine these data for possible hypotheses

• “what are the genes that are involved in some disease phenotype?”

• Correlate genes in QTL with differentially regulated genes in microarray via pathways; query the literature base with these genes, pathways and phenotype; …

• The resulting facts form a hypothesis: a co-ordinated set of SNPs increases cholesterol biosynthesis in macrophages while delaying apoptosis of these cells; increased super-oxide production aids tolerance to trypanosomiasis in cattle
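The correlation step described above can be sketched roughly as follows. This is an illustrative toy, not the method used in the study: the function name `candidate_pathways`, the gene names, and the gene-to-pathway table are all hypothetical placeholders.

```python
from collections import defaultdict


def candidate_pathways(qtl_genes, expressed_genes, gene_to_pathways):
    """Group genes found in BOTH the QTL region and the microarray hit list by pathway."""
    shared = set(qtl_genes) & set(expressed_genes)
    by_pathway = defaultdict(set)
    for gene in shared:
        for pathway in gene_to_pathways.get(gene, ()):
            by_pathway[pathway].add(gene)
    # Sort the genes so the output is stable and easy to read.
    return {pathway: sorted(genes) for pathway, genes in by_pathway.items()}


# Hypothetical toy data: none of these gene names come from the talk.
qtl_genes = ["geneA", "geneB", "geneC"]
expressed_genes = ["geneB", "geneC", "geneD"]
gene_to_pathways = {
    "geneB": ["cholesterol biosynthesis"],
    "geneC": ["cholesterol biosynthesis", "apoptosis"],
    "geneD": ["apoptosis"],
}

result = candidate_pathways(qtl_genes, expressed_genes, gene_to_pathways)
print(result)
```

The pathways that survive this filter are the ones worth taking to the literature query in the next step.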

Page 21: The Changing Nature of Biomedical Research: Semantic e-Science

How Bioinformatics Was Done

Integrating data sets

• Slave labour
• Collections of Scripts
• Warehouses
• Applications

– Galaxy
– Gaggle
– Integr8
– Ensembl
– …..

• Workflows!

12181 acatttctac caacagtgga tgaggttgtt ggtctatgtt ctcaccaaat ttggtgttgt
12241 cagtctttta aattttaacc tttagagaag agtcatacag tcaatagcct tttttagctt
12301 gaccatccta

Page 22: The Changing Nature of Biomedical Research: Semantic e-Science

Workflows: E. Science laboris

• Data preparation and analysis pipelines
• Data preparation pipelines
• Data integration pipelines
• Data analysis pipelines
• Data annotation pipelines
• Warehouse population refreshing
• Data and text mining
• Knowledge extraction
• Parameter sweeps over simulations/computations
• Model building and verification
• Knowledge management and model population
• Hypothesis generation and modelling

Page 23: The Changing Nature of Biomedical Research: Semantic e-Science

• A workflow is a specification.
• A workflow management system (WFMS) is the machinery for coordinating the execution of (scientific) services and linking together (scientific) resources.
• It handles cross-cutting concerns such as error handling, service invocation, data movement, data streaming, data provenance tracking, process auditing, execution monitoring, security access, and so on.
• Agile software development
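As a rough illustration of "a workflow is a specification" that some engine then enacts, here is a minimal sketch. It assumes a workflow is just an ordered list of named steps; `enact` and the step names are hypothetical and do not describe any particular workflow system. It shows two of the cross-cutting concerns from the list: error handling and provenance tracking.

```python
def enact(workflow, data):
    """Run each (name, function) step in order and keep a provenance record per step.

    Returns (final_output, provenance). If a step fails, enactment stops and
    the output is None; the failure is recorded in the provenance log.
    """
    provenance = []
    for name, step in workflow:
        record = {"step": name, "input": data}
        try:
            data = step(data)
        except Exception as exc:  # a real engine would be more selective
            record.update(status="failed", error=str(exc))
            provenance.append(record)
            return None, provenance
        record.update(status="ok", output=data)
        provenance.append(record)
    return data, provenance


# The specification: what to run and in what order, nothing about how.
workflow = [
    ("uppercase", str.upper),
    ("count_bases", lambda seq: {base: seq.count(base) for base in "ACGT"}),
]

result, log = enact(workflow, "acatttctac")
print(result)
for entry in log:
    print(entry["step"], entry["status"])
```

Keeping the specification separate from the engine is what lets the same workflow be re-run, audited, or shared without touching the code that executes it.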

Workflows: E. Science laboris

Enactment Engine

My data

My tool

Page 24: The Changing Nature of Biomedical Research: Semantic e-Science

Workflow Execution Engine

Workflow execution engine
Local desktop and remote server
Implicit iteration over large data collections
Nested workflows
Automated data flow
Event history log and data provenance tracking
Within-workflow programming
Extensibility points for plug-ins

Graphical workbench
For Professionals
Plug-in architecture

Incorporate new service without coding. Services as they are.
Access to local and remote resources and analysis tools

Re-Design

Rewritten

Page 25: The Changing Nature of Biomedical Research: Semantic e-Science

• Comparing resistant vs. susceptible strains – Microarrays

• Mapping quantitative traits – Classical genetics QTL

• Integrated Microarray data, genomic sequences, pathway data, literature mining.

Trypanosomiasis Study

Paul Fisher et al., Nucleic Acids Research, 2007, 35(16)

Page 26: The Changing Nature of Biomedical Research: Semantic e-Science

Genotype to Pathway

Created by Paul Fisher

Page 27: The Changing Nature of Biomedical Research: Semantic e-Science

Pathway to Phenotype

Created by Paul Fisher

Page 28: The Changing Nature of Biomedical Research: Semantic e-Science

• Eliminated user bias and premature filtering

• The scale and complexity of data and literature.

• Systematic data analysis

• Data analysis provenance

• Manageable amount of output data for biologists to interpret and verify

• Data driven science

“Looking where others hadn’t”

“make sense of this data” -> “does this make sense?”

http://www.youtube.com/watch?v=Y6_Kz5L010g

Page 29: The Changing Nature of Biomedical Research: Semantic e-Science

Transferring Characteristics

Uncharacterised protein

Tra1 La2 La3

High similarity: transfer characteristics
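The idea of transferring characteristics from a well-characterised protein to a highly similar uncharacterised one can be sketched as below. This is a deliberately crude toy: `identity` compares positions directly instead of running a real alignment tool, and the sequences, annotations, and the 0.8 threshold are hypothetical assumptions.

```python
def identity(a, b):
    """Fraction of matching positions: a crude stand-in for an alignment score."""
    if not a or not b:
        return 0.0
    matches = sum(x == y for x, y in zip(a, b))
    return matches / max(len(a), len(b))


def transfer_annotation(query, annotated, threshold=0.8):
    """Return (annotation, score) of the best hit if it reaches the threshold,
    otherwise (None, best_score)."""
    best_label, best_score = None, 0.0
    for sequence, label in annotated.items():
        score = identity(query, sequence)
        if score > best_score:
            best_label, best_score = label, score
    if best_score >= threshold:
        return best_label, best_score
    return None, best_score


# Hypothetical characterised proteins and their annotations.
annotated = {
    "MKTAYIAKQR": "kinase (hypothetical)",
    "GGGGGGGGGG": "structural protein (hypothetical)",
}

print(transfer_annotation("MKTAYIAKQK", annotated))  # close match: annotation is transferred
print(transfer_annotation("AAAAAAAAAA", annotated))  # weak match: nothing is transferred
```

The transferred annotation is only as reliable as the similarity measure and threshold, which is one reason such inferred facts need careful management.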

Page 30: The Changing Nature of Biomedical Research: Semantic e-Science

… A Fact Based Discipline

• Rather than laws captured in mathematics…
• We have lots of facts: the discipline’s knowledge
• Rather than “calculating” what a protein does, we investigate and write it down
• Equivalent to writing down the trajectories of all thrown objects and not doing ballistics!
• To do biology one needs “the knowledge”

Page 31: The Changing Nature of Biomedical Research: Semantic e-Science

Heterogeneity

• 28 ways to format the representations of a biological sequence

• Though one way to represent the bases or amino acids…

• Different words same concept
• Different concepts same words
• Different and implicit data schema
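To make the format point concrete, here is a small sketch that reads the same sequence from two of the many layouts: FASTA, and the numbered-block layout used in GenBank-style flat files. The function names are hypothetical, and the parsers assume a single nucleotide record with no ambiguity handling.

```python
def from_fasta(text):
    """Join the sequence lines of a single-record FASTA string, skipping the '>' header."""
    lines = [line.strip() for line in text.strip().splitlines()]
    return "".join(line for line in lines if line and not line.startswith(">")).lower()


def from_numbered_blocks(text):
    """Drop position numbers and spacing from GenBank-style sequence lines."""
    return "".join(ch for ch in text if ch.isalpha()).lower()


fasta_record = ">fragment\nACATTTCTAC\nCAACAGTGGA\n"
numbered_record = "12181 acatttctac caacagtgga"

# Two surface formats, one underlying string of bases.
print(from_fasta(fasta_record))
print(from_numbered_blocks(numbered_record))
```

The bases themselves agree once the formatting is stripped; the harder heterogeneity lies in the words and schemas around them.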

Page 32: The Changing Nature of Biomedical Research: Semantic e-Science

An Identity Crisis

• Database entries have identifiers unique within their database

• The type of entity described in an entry doesn’t have an identifier

• Different entries about the same type talk about it differently

• How do we know when an entry in one DB talks about the same thing as another entry in another DB?

• That’s the skill of a bioinformatician
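One common way to capture that skill in data is a cross-reference table that maps each database's local accession to a shared identifier. The sketch below assumes such a table already exists; the database names, accessions, and `entity:` identifiers are all hypothetical.

```python
def same_entity(entry_a, entry_b, xrefs):
    """True if both (database, accession) pairs map to the same shared identifier."""
    id_a = xrefs.get(entry_a)
    id_b = xrefs.get(entry_b)
    return id_a is not None and id_a == id_b


# Hypothetical cross-reference table: (database, accession) -> shared identifier.
xrefs = {
    ("dbA", "A001"): "entity:1",
    ("dbB", "X9"): "entity:1",
    ("dbB", "X7"): "entity:2",
}

print(same_entity(("dbA", "A001"), ("dbB", "X9"), xrefs))
print(same_entity(("dbA", "A001"), ("dbB", "X7"), xrefs))
```

Building and maintaining the table is the hard part; once it exists, the comparison itself is trivial.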

Page 33: The Changing Nature of Biomedical Research: Semantic e-Science

Categories and Category Labels

GO:0000368

U2-type nuclear mRNA 5' splice site recognition

spliceosomal E complex formation

spliceosomal E complex biosynthesis

spliceosomal CC complex formation

U2-type nuclear mRNA 5'-splice site recognition
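The distinction between a category (the identifier) and its labels can be sketched as a lookup from any label to the one identifier. The labels are those listed above; the normalisation rule (lower-casing and treating hyphens as spaces) and the function names are assumptions made for this illustration.

```python
import re

# One category, several labels, as listed above.
TERMS = {
    "GO:0000368": [
        "U2-type nuclear mRNA 5' splice site recognition",
        "spliceosomal E complex formation",
        "spliceosomal E complex biosynthesis",
        "spliceosomal CC complex formation",
        "U2-type nuclear mRNA 5'-splice site recognition",
    ],
}


def normalise(label):
    """Lower-case a label and collapse runs of spaces and hyphens to one space."""
    return re.sub(r"[\s\-]+", " ", label.strip().lower())


def build_index(terms):
    """Map every normalised label to its category identifier."""
    return {
        normalise(label): term_id
        for term_id, labels in terms.items()
        for label in labels
    }


INDEX = build_index(TERMS)


def lookup(label):
    """Return the identifier for a label, or None if the label is unknown."""
    return INDEX.get(normalise(label))


print(lookup("Spliceosomal E complex formation"))
print(lookup("U2-type nuclear mRNA 5'-splice site recognition"))
```

Whichever label a database happens to use, annotation and comparison can then be done on the identifier.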

Page 34: The Changing Nature of Biomedical Research: Semantic e-Science

The Role of Knowledge

• A lot of facts
• Perhaps organised into a system
• No equivalent of “laws of mechanics” – we can’t do this biology with mathematics
• Or at least not without knowing what the numbers mean...
• This is why we’ve been using ontologies!

Page 35: The Changing Nature of Biomedical Research: Semantic e-Science

Uses of Ontology in Bioinformatics

Page 36: The Changing Nature of Biomedical Research: Semantic e-Science

Post-Genomic Biology

• Fly, mouse, yeast, worm all have their own terminologies

• I want to compare genomes
• How?
• The genomic sequence is easily dealt with computationally and comparisons are easy
• This is not true of the annotations or knowledge of those sequences
• Need a common understanding

Page 37: The Changing Nature of Biomedical Research: Semantic e-Science

Annotation of Data

• Big effort to create controlled vocabularies using ontologies

• A huge annotation effort – describe the entities in DB with terms from ontologies

• The Gene Ontology (http://www.geneontology.org)
• The Open Biomedical Ontologies Consortium

Page 38: The Changing Nature of Biomedical Research: Semantic e-Science
Page 39: The Changing Nature of Biomedical Research: Semantic e-Science

GO in Analysis

• Microarray analysis was one of the original visions for GO
• Modulated genes cluster around the functional attributes of their proteins
• GO is also used in, for example, semantic similarity, text analysis, etc.
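One widely used way to ask whether a set of modulated genes clusters around a GO term is a hypergeometric over-representation test. The slides do not name a specific statistic, so the choice of test, the function name, and the numbers below are assumptions for illustration.

```python
from math import comb


def enrichment_p_value(population, annotated, sample, hits):
    """Probability of seeing at least `hits` annotated genes in a random sample.

    population: genes on the array
    annotated:  genes on the array carrying the GO term
    sample:     modulated genes
    hits:       modulated genes carrying the GO term
    """
    upper = min(annotated, sample)
    total = comb(population, sample)
    tail = sum(
        comb(annotated, k) * comb(population - annotated, sample - k)
        for k in range(hits, upper + 1)
    )
    return tail / total


# Hypothetical numbers: 10 genes on the array, 3 carry the term,
# 3 genes are modulated, and all 3 of those carry the term.
p = enrichment_p_value(population=10, annotated=3, sample=3, hits=3)
print(p)
```

A small p-value suggests the term is over-represented among the modulated genes; in practice many terms are tested at once, so a multiple-testing correction would also be needed.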

Page 40: The Changing Nature of Biomedical Research: Semantic e-Science

Biocatalogue content screenshot

Page 41: The Changing Nature of Biomedical Research: Semantic e-Science

Shield users and applications from service interoperability and incompatibility plumbing.

Turn your app into a service

Service providers Not only web services

How a bioinformatician assumes stuff should work

Page 42: The Changing Nature of Biomedical Research: Semantic e-Science

Pettifer, University of Manchester

inside

A collection of interactive tools for analysing protein sequence and structure

http://utopia.cs.manchester.ac.uk/

Page 43: The Changing Nature of Biomedical Research: Semantic e-Science

Semantic Descriptions of All

• Not just bio-entities in data
• The laboratory experiments by which they were generated
• The protocols for their analysis
• The services for their analysis

Page 44: The Changing Nature of Biomedical Research: Semantic e-Science

Semantic Integration

• Same identifiers means integration and interoperation
• Most workflows are hobbled by syntactic and semantic heterogeneity
• Syntactic integration (Bio2RDF)
• Semantic integration via ontologies and naming schemes
• Enables better e-Science through semantic science
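The first point, that shared identifiers make integration almost mechanical, can be sketched with subject-predicate-object triples. This is a plain-Python toy that assumes both sources already use the same identifiers; it does not use any RDF library, and the `protein:` identifiers and predicates are hypothetical.

```python
from collections import defaultdict


def integrate(*sources):
    """Merge triples from several sources, grouping statements by shared subject identifier."""
    merged = defaultdict(set)
    for triples in sources:
        for subject, predicate, obj in triples:
            merged[subject].add((predicate, obj))
    return {subject: sorted(statements) for subject, statements in merged.items()}


# Two hypothetical sources describing overlapping entities.
source_a = [("protein:P1", "has_function", "GO:0000368")]
source_b = [
    ("protein:P1", "located_in", "nucleus"),
    ("protein:P2", "located_in", "cytoplasm"),
]

merged = integrate(source_a, source_b)
print(merged["protein:P1"])
```

Without shared identifiers, the same merge would first need a mapping step between the two sources' naming schemes.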

Page 45: The Changing Nature of Biomedical Research: Semantic e-Science

Fact Management

• When “stamp collecting” we’re collecting facts
• Biology is a fact management activity
• Knowing what these facts mean is very important
• Science is performed on data and the semantics of data enable us to do science
• Semantic e-Science

Page 46: The Changing Nature of Biomedical Research: Semantic e-Science

Summary

• The nature of modern biology gives it interesting knowledge (fact) management issues

• It is a knowledge based discipline
• Not unique, but often extreme
• Ontologies seen as one component in management (but not a panacea)
• E-Science gives infrastructure for management; semantics enable analysis
• Actually, very light use of semantics

Page 47: The Changing Nature of Biomedical Research: Semantic e-Science
Page 48: The Changing Nature of Biomedical Research: Semantic e-Science

More Acknowledgements

• Phil Lord
• Simon Jupp
• Carole Goble