The STRING database Michael Kuhn EMBL Heidelberg

Dec 18, 2015

The STRING database

Michael Kuhn, EMBL Heidelberg

protein interactions

example

Tryptophan synthase beta chain, E. coli K12

many sources

genomic context

curated knowledge

experimental evidence

literature

373 genomes

(only completely sequenced genomes)

1.5 million genes

(not proteins)

Genome Reviews

RefSeq

Ensembl

model organism databases

data integration

genomic context methods

gene fusion

gene neighborhood

phylogenetic profiles

[Figure labels: Cell, Cellulosomes, Cellulose]

automatic inference of interactions

correct interactions

wrong associations

gene fusion

score: sequence similarity

gene neighborhood

score: sum of intergenic distances

phylogenetic profiles

SVD

singular value decomposition (removes redundancy)

score: Euclidean distance
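
A minimal sketch of the distance score for phylogenetic profiles, assuming each gene is described by a presence/absence vector with one entry per genome. The gene names and profiles are hypothetical, and the SVD step named on the previous slide is left out here, so the distance is taken on the raw profiles.

```python
import math

# Hypothetical presence/absence profiles: one entry per genome
# (1 = the gene has an ortholog in that genome, 0 = it does not).
profiles = {
    "geneA": [1, 1, 0, 1, 0, 0],
    "geneB": [1, 1, 0, 1, 0, 1],
    "geneC": [0, 0, 1, 0, 1, 1],
}

def euclidean_distance(p, q):
    """Euclidean distance between two equally long profiles."""
    if len(p) != len(q):
        raise ValueError("profiles must cover the same genomes")
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))

def closest_pairs(profiles):
    """Return all gene pairs as (distance, gene1, gene2), nearest first."""
    genes = sorted(profiles)
    pairs = [
        (euclidean_distance(profiles[g], profiles[h]), g, h)
        for i, g in enumerate(genes)
        for h in genes[i + 1:]
    ]
    return sorted(pairs)

# A small distance means similar profiles, i.e. a candidate association.
ranked = closest_pairs(profiles)
```

Here geneA and geneB differ in a single genome, so they end up as the nearest pair.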

all scores are “raw scores”

not comparable

sequence similarity

sum of intergenic distances

Euclidean distance

benchmarking

calibrate against “gold standard” (KEGG)

raw scores

Page 46: The STRING database Michael Kuhn EMBL Heidelberg.

probabilistic scores

e.g. “70% chance for an association”
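
One way to picture the step from raw scores to probabilistic scores is binning: group pairs by raw score and, for each bin, take the fraction that the gold standard confirms. The sketch below assumes this simple binning approach; the function name, bin edges, and example pairs are all hypothetical and not taken from the slides.

```python
def calibrate(scored_pairs, gold_standard, bin_edges):
    """Map raw-score bins to the fraction of pairs confirmed by the gold standard.

    scored_pairs:  list of (raw_score, pair) tuples
    gold_standard: set of pairs known to be associated (e.g. same pathway)
    bin_edges:     ascending raw-score thresholds that delimit the bins
    """
    totals = [0] * (len(bin_edges) + 1)
    hits = [0] * (len(bin_edges) + 1)
    for raw, pair in scored_pairs:
        # Bin index = number of edges the raw score reaches or exceeds.
        b = sum(raw >= edge for edge in bin_edges)
        totals[b] += 1
        hits[b] += pair in gold_standard
    # Empty bins have no estimate, so they are reported as None.
    return [h / t if t else None for h, t in zip(hits, totals)]

# Hypothetical raw scores and a hypothetical gold standard.
scored_pairs = [
    (5, ("a", "b")), (7, ("a", "c")),
    (12, ("b", "c")), (15, ("b", "d")),
    (25, ("c", "d")), (30, ("d", "e")),
]
gold_standard = {("b", "c"), ("c", "d"), ("d", "e")}

probabilities = calibrate(scored_pairs, gold_standard, bin_edges=[10, 20])
```

With this toy data the three bins come out as 0.0, 0.5 and 1.0, so a raw score between 10 and 20 would read as a 50% chance of an association.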

curated knowledge

KEGG

Kyoto Encyclopedia of Genes and Genomes

Reactome

GO

Gene Ontology

primary experimental data

many sources

many parsers

BIND

Biomolecular Interaction Network Database

GRID

General Repository for Interaction Datasets

HPRD

Human Protein Reference Database

co-expression

microarray data

GEO

Gene Expression Omnibus

correlation coefficient
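
The slide does not say which correlation coefficient is used. As an illustration, the sketch below assumes the Pearson coefficient, computed over two expression profiles measured across the same set of microarray experiments; the function name and inputs are hypothetical.

```python
import math

def pearson(x, y):
    """Pearson correlation coefficient of two expression profiles."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("profiles need the same length and at least two values")
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    covariance = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    spread_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    spread_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    if spread_x == 0 or spread_y == 0:
        raise ValueError("correlation is undefined for a constant profile")
    return covariance / (spread_x * spread_y)
```

Values close to 1 indicate genes whose expression rises and falls together across experiments; like the other raw scores, such a coefficient would still need calibration before it can be compared with other evidence.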

literature mining

different gene identifiers

synonyms list

Medline

SGD

Saccharomyces Genome Database

The Interactive Fly

OMIM

Online Mendelian Inheritance in Man

simple scheme

co-mentioning
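
Co-mentioning can be sketched as counting how often two genes appear in the same abstract after mapping every name to a canonical identifier through the synonyms list. The synonyms table, the abstracts, and the function names below are hypothetical, and the tokenizer is deliberately naive.

```python
import re
from collections import Counter
from itertools import combinations

# Hypothetical synonyms list: lower-cased alias -> canonical gene identifier.
synonyms = {"cyc1": "CYC1", "cyc7": "CYC7", "hap1": "HAP1", "hap1p": "HAP1"}

def genes_in(text, synonyms):
    """Return the set of canonical gene identifiers mentioned in a text."""
    tokens = re.findall(r"[A-Za-z0-9]+", text.lower())
    return {synonyms[t] for t in tokens if t in synonyms}

def co_mentions(abstracts, synonyms):
    """Count, for every gene pair, the abstracts that mention both genes."""
    counts = Counter()
    for text in abstracts:
        for pair in combinations(sorted(genes_in(text, synonyms)), 2):
            counts[pair] += 1
    return counts

# Hypothetical abstracts.
abstracts = [
    "The expression of CYC1 and CYC7 is controlled by HAP1.",
    "Hap1p binds upstream of CYC1.",
    "CYC7 is a cytochrome gene.",
]
counts = co_mentions(abstracts, synonyms)
```

In this toy corpus the pair (CYC1, HAP1) is co-mentioned twice, once through the alias. The raw counts say nothing about the kind of relationship, which is where the more advanced methods come in.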

more advanced

NLP

Natural Language Processing

Gene and protein names
Cue words for entity recognition
Verbs for relation extraction

The expression of the cytochrome genes CYC1 and CYC7 is controlled by HAP1

calibrate against gold standard

combine all evidence

Bayesian scoring scheme

e.g.: two scores of 0.7

combined probability: ?

e.g.: two scores of 0.7

combined probability: 0.91

1 - (1 - 0.7)² = 0.91
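
The arithmetic on this slide generalises to any number of evidence scores: the combined probability is one minus the chance that every source is wrong. The sketch below assumes the sources are independent, as the two-score example implies; the function name is hypothetical, and any further corrections a full scoring scheme might apply are not covered.

```python
def combine(scores):
    """Combine independent probabilistic scores as 1 - prod(1 - s_i)."""
    all_wrong = 1.0
    for s in scores:
        if not 0.0 <= s <= 1.0:
            raise ValueError("scores must be probabilities between 0 and 1")
        # Probability that this source and all earlier ones are wrong.
        all_wrong *= 1.0 - s
    return 1.0 - all_wrong

# The example from the slide: two scores of 0.7.
combined = combine([0.7, 0.7])
```

Adding a source can only raise the combined score, which matches the idea that independent lines of evidence reinforce each other.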

evidence transfer

evidence spread over many species

transfer by orthology

(or “fuzzy orthology”)

von Mering et al., Nucleic Acids Research, 2005

von Mering et al., Nucleic Acids Research, 2005

two modes

COG mode

von Mering et al., Nucleic Acids Research, 2005

higher coverage, lower specificity

includes all available evidence

some orthologous groups are too large to be meaningful

proteins mode

von Mering et al., Nucleic Acids Research, 2005

maximum specificity, lower coverage

information will be relevant for selected species

Demo

outlook

take home message

STRING integrates information and predicts interactions

You can always go to the sources

Proteins mode: specific species

COG mode: more coverage, especially for prokaryotic genes

Acknowledgements

The STRING team

Lars Jensen, Peer Bork

Christian von Mering & group in Zurich

Berend Snel, Martijn Huynen

Thank you for your attention

Exercises: tinyurl.com/36twzq

(or via course wiki)

Alternative server: xi.embl.de

Bork et al., Current Opinion in Structural Biology, 2004