Cédric Notredame (25/03/22) Finding What you Need in Biological Databases Cédric Notredame

Dec 28, 2015

Page 1:

Cédric Notredame (19/04/23)

Finding What you Need in Biological Databases

Cédric Notredame

Page 2:

Where is my Needle ?

Databases:

Page 4:

Our Scope

Give you means to answer simple questions

Databases are UNFRIENDLY INFORMATION DESKS

Give you an idea of what is possible

WHAT can you ask ?

HOW can you ask it ?

Page 5:

Outline

- An Overall view

- Asking a biological question to a database

- Turning a question into a query

- Bibliographic Databases: Medline, OMIM

- Gene Databases: GenBank, LocusLink, ENSEMBL

- Protein Databases: SwissProt, InterPro, Prodom

- SRS

Page 6:

Database:

What is a Database ?

Page 7:

DataBase Entries

1 entry = 1 Sequence (SEQ), e.g. AGCTGTCGAGGGATAGGACATATACATAAATTAATATAAT

1 entry = 1 File = Sequence + Doc (SEQ + DOC) = Flat File

Database = Collection of Flat Files (SEQ + DOC, SEQ + DOC, ...)

Page 8:

DataBase Entries: Flat Files

Accession number: 1

First Name: Amos

Last Name: Bairoch

Course: DEA=oct-nov-dec 2002

http://www.expasy.org/people/amos.html

//

Accession number: 2

First Name: Laurent

Last Name: Falquet

Course: EMBnet=sept 2000, sept 2001;DEA=oct-nov-dec 2000;

//

Accession number: 3

First Name: Marie-Claude

Last Name: Blatter Garin

Course: EMBnet=sept 2000, sept 2001;DEA=oct-nov-dec 2000;

http://www.expasy.org/people/Marie-Claude.Blatter-Garin.html

//
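A minimal sketch of how such a flat file could be read, assuming every entry ends with a `//` line and every other line is either a `Field: value` pair or a bare URL. The function name `parse_flat_file` and the `url` key are illustrative choices, not part of any real database tool. Field names are lower-cased so that `Last Name` and `Last name` are treated alike.

```python
def parse_flat_file(text):
    """Split a flat file into entries; each entry is a dict of field -> value."""
    entries = []
    current = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue
        if line == "//":                      # end-of-entry marker
            if current:
                entries.append(current)
            current = {}
        elif line.startswith("http"):         # bare URL line (key name is our choice)
            current["url"] = line
        else:
            key, _, value = line.partition(":")
            current[key.strip().lower()] = value.strip()
    if current:                               # tolerate a missing final "//"
        entries.append(current)
    return entries


SAMPLE = """\
Accession number: 1
First Name: Amos
Last Name: Bairoch
Course: DEA=oct-nov-dec 2002
http://www.expasy.org/people/amos.html
//
Accession number: 2
First Name: Laurent
Last name: Falquet
Course: EMBnet=sept 2000, sept 2001;DEA=oct-nov-dec 2000;
//
"""

entries = parse_flat_file(SAMPLE)
print([e["last name"] for e in entries])
```

Each entry becomes one dictionary, which is the "1 entry = 1 file" idea in miniature: all the information about a record travels together.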

Page 9:

DataBase: Relational Databases

Relational database (« table file »):

Teacher    Accession number   Education
Amos       1                  Biochemistry
Laurent    2                  Biochemistry
M-Claude   3                  Biochemistry

Course   Date                   Involved teachers
DEA      Oct-nov-dec 2000       1,3
EMBnet   Sept 2000, Sept 2001   2,3
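The two tables above can be sketched with Python's built-in `sqlite3` module. This is only an illustration under stated assumptions: the table and column names are invented, and the "Involved teachers" column is stored as a separate link table (`teaches`), which is one common way to model it rather than the layout of any real biological database.

```python
import sqlite3

conn = sqlite3.connect(":memory:")  # throw-away in-memory database
conn.executescript("""
CREATE TABLE teacher (accession INTEGER PRIMARY KEY, name TEXT, education TEXT);
CREATE TABLE course  (name TEXT PRIMARY KEY, dates TEXT);
CREATE TABLE teaches (course TEXT, accession INTEGER);  -- hypothetical link table
""")
conn.executemany(
    "INSERT INTO teacher VALUES (?, ?, ?)",
    [(1, "Amos", "Biochemistry"),
     (2, "Laurent", "Biochemistry"),
     (3, "M-Claude", "Biochemistry")],
)
conn.executemany(
    "INSERT INTO course VALUES (?, ?)",
    [("DEA", "Oct-nov-dec 2000"), ("EMBnet", "Sept 2000, Sept 2001")],
)
conn.executemany(
    "INSERT INTO teaches VALUES (?, ?)",
    [("DEA", 1), ("DEA", 3), ("EMBnet", 2), ("EMBnet", 3)],
)


def teachers_of(course_name):
    """Names of the teachers involved in one course, via a join on accession."""
    rows = conn.execute(
        "SELECT t.name FROM teacher t "
        "JOIN teaches x ON x.accession = t.accession "
        "WHERE x.course = ? ORDER BY t.accession",
        (course_name,),
    )
    return [r[0] for r in rows]


print(teachers_of("EMBnet"))  # ['Laurent', 'M-Claude']
```

The accession number plays the role of the key that links one table to the other, so each fact is stored once instead of being repeated in every entry.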

Page 10:

To Summarize: What’s a database ?

Collection of Data that is:

•Structured

•Searchable (index) -> table of contents

•Updated periodically (release) -> new edition

•Cross-referenced (hyperlinks) -> links with other db

Collection of tools (software) necessary for:

Searching - Updating - Releasing

Data storage management: flat files, relational databases…

Page 11:

Database:

What’s on the Menu?

Page 12:

A large amount of information

More than 1000 different databases

Generally accessible through the web

EBI: http://www.ebi.ac.uk/

NCBI: http://www.ncbi.nlm.nih.gov

Google: http://www.google.com

Variable size: <100 Kb to >10 Gb

DNA: > 10 Gb

Protein: 1 Gb

3D structure: 5 Gb

Other: smaller

Update frequency: daily to annually

Page 13:

A Non Exhaustive List

AATDB, AceDb, ACUTS, ADB, AFDB, AGIS, AMSdb, ARR, AsDb, BBDB, BCGD, Beanref, BioImage, BioMagResBank, BIOMDB, BLOCKS, BovGBASE, BOVMAP, BSORF, BTKbase, CANSITE, CarbBank, CARBHYD, CATH, CAZY, CCDC, CD40Lbase, CGAP, ChickGBASE, Colibri, COPE, CottonDB, CSNDB, CUTG, CyanoBase, dbCFC, dbEST, dbSTS, DDBJ, DGP, DictyDb, Dicty_cDB, DIP, DOGS, DOMO, DPD, DPInteract, ECDC, ECGC, ECO2DBASE, EcoCyc, EcoGene, EMBL, EMD db, ENZYME, EPD, EpoDB, ESTHER, FlyBase, FlyView, GCRDB, GDB, GENATLAS, GenBank, GeneCards, Genline, GenLink, GENOTK, GenProtEC, GIFTS, GPCRDB, GRAP, GRBase, gRNAsdb, GRR, GSDB, HAEMB, HAMSTERS, HEART-2DPAGE, HEXAdb, HGMD, HIDB, HIDC, HIVdb, HotMolecBase, HOVERGEN, HPDB, HSC-2DPAGE, ICN, ICTVDB, IL2RGbase, IMGT, Kabat, KDNA, KEGG, Klotho, LGIC, MAD, MaizeDb, MDB, Medline, Mendel, MEROPS, MGDB, MGI, MHCPEP, Micado, MitoDat, MITOMAP, MJDB, MmtDB, Mol-R-Us, MPDB, MRR, MutBase, MycDB, NDB, NRSub, O-GlycBase, OMIA, OMIM, OPD, ORDB, OWL, PAHdb, PatBase, PDB, PDD, Pfam, PhosphoBase, PigBASE, PIR, PKR, PMD, PPDB, PRESAGE, PRINTS, ProDom, Prolysis, PROSITE, PROTOMAP, RatMAP, RDP, REBASE, RGP, SBASE, SCOP, SeqAnalRef, SGD, SGP, SheepMap, Soybase, SPAD, SRNA db, SRPDB, STACK, StyGene, Sub2D, SubtiList, SWISS-2DPAGE, SWISS-3DIMAGE, SWISS-MODEL Repository, SWISS-PROT, TelDB, TGN, tmRDB, TOPS, TRANSFAC, TRR, UniGene, URNADB, V BASE, VDRR, VectorDB, WDCM, WIT, WormPep, YEPD, YPD, YPM, etc. !!!!

There Exists A Specialized Database on Almost anything you can think of

Page 14:

A database of databases

Page 15:

What’s on the Menu: The Art of Eating Well

Always Use Fresh Data: The Latest Update of your DataBase

Make Sure The DataBase is Maintained: Many Databases are poorly maintained

Treat DataBases like Publications: Some Journals are Better than Others

Page 16:

Bio-Google:

How Can I Search a Database ?

Page 17:

Searching Databases

There are 2 ways to search databases

Text based queries: Medline, Entrez

Search for « Smith AND dUTPase »

Similarity Searches: BLAST

AGCTGTCGAGGGATAGGACATATACATAAATTAATATAAT

Page 18:

Searching Databases

Each database is a little kingdom…

Has its own query system

Has its own information structure

The main databases are well documented and this documentation is available online

Most databases can be searched using SRS or Entrez

Page 19:

Databases: Asking the right Question

Databases ARE NOT meant for browsing

When you search a Database you must have an idea of what your Needle-in-a-hay-stack looks like

Page 20:

Databases: Asking the right Question

Browsing a database is like using your phone book in place of a dating agency…

Page 21:

Databases: Asking the right Question

Finding Data: Database Search

Finding Questions: Data Mining

Page 22:

The Kind Of Questions We Can Ask:

SEQUENCE Based

InterPro Any Known Domain in my Protein ???

SwissProt Any Protein like mine ???

These ARE Predictions

Page 23:

The Kind Of Questions We Can Ask:

TEXT Based

Medline Who Worked on my Protein ???

SwissProt Function of My Protein ???

PDB Structure of My Protein ???

These are NOT Predictions

Page 24:

Just like When You Google up

Specific Queries give Precise Answers

Page 25:

Medline:

Who worked on my Protein ?

Page 26:

Medline (PubMed)

Page 27:

What is in Medline ?

MEDLINE covers the fields of medicine, nursing, dentistry, veterinary medicine, the health care system, and the preclinical sciences

Covers more than 4,000 biomedical journals and more than 10 million citations, from 1966 to the present

Contains links to biological db and to some journals

Many papers not dealing with humans are not in Medline

Before 1970, only the first 10 authors are kept!

Page 28:

Using Medline: Asking a question

During the last Lab Meeting, I heard the word dUTPase.

What can it be ? What has been published on this ?

Page 29:

Using Medline: Asking a question

Page 30:

Using Medline: Asking a question

Page 31:

Using Medline: Asking a question

Page 32:

Using Medline: Asking a question

By Default, Medline Assumes you mean:

Abergel AND dUTPase
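A small sketch of that default behaviour, assuming plain space-separated terms (no quoted phrases). `implicit_and` is a hypothetical helper written for this example. The URL follows the NCBI E-utilities `esearch.fcgi` endpoint with its `db` and `term` parameters; the code only builds the address and sends nothing over the network.

```python
from urllib.parse import urlencode

ESEARCH = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi"


def implicit_and(query):
    """Join space-separated terms with AND, as Medline does by default.

    Assumption: no quoted phrases; a real parser would keep those together.
    """
    return " AND ".join(query.split())


def esearch_url(term, db="pubmed"):
    """Build (but do not send) an E-utilities search URL."""
    return ESEARCH + "?" + urlencode({"db": db, "term": term})


term = implicit_and("Abergel dUTPase")
print(term)               # Abergel AND dUTPase
print(esearch_url(term))
```

Writing the AND out explicitly makes it easier to see where an OR or a NOT would change the meaning of the search.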

Page 33:

Using Medline: Asking a question

I have found the reference I wanted.

Now I want to save it so that I can use it later, for instance to import it into EndNote, my reference manager

Save Your Data in the Proper DataBase format

Page 34:

Using Medline: Storing your results

Page 35:

Using Medline: Storing your results

Page 36:

Retrieving EXACTLY the Information that you need

[AB] [AD]

Restricted fields

Page 37:

Using Medline: Storing your results

AB

AD
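AB and AD are the abstract and affiliation (address) tags of the MEDLINE text format, in which each field starts with a tag of up to four characters followed by `- `, and continuation lines are indented. A minimal sketch of reading a saved record, assuming that layout; the record below is entirely invented for the example and `parse_medline` is a hypothetical helper, not a PubMed tool.

```python
def parse_medline(record):
    """Return {tag: [values]} for one MEDLINE-format record."""
    fields = {}
    tag = None
    for line in record.splitlines():
        if line[:4].strip() and line[4:6] == "- ":
            # New field: 4-character tag, then "- ", then the value.
            tag = line[:4].strip()
            fields.setdefault(tag, []).append(line[6:].strip())
        elif tag and line.startswith("      "):
            # Continuation line: append to the last value of the current tag.
            fields[tag][-1] += " " + line.strip()
    return fields


# Invented record, for illustration only.
RECORD = """\
PMID- 0000000
TI  - A made-up title about dUTPase.
AU  - Doe J
AB  - This abstract is invented for the example and
      continues on a second line.
AD  - Some Laboratory, Some City.
"""

rec = parse_medline(RECORD)
print(rec["AB"][0])
print(rec["AD"][0])
```

Values are kept in lists because some tags, such as AU, can appear several times in one record.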

Page 38:

Using Medline: Looking for a Review

I Want to Find the LATEST REVIEW on the dUTPase.

Use The Limit Option of Medline

Page 39:

Using Medline: Looking For a Review

1-Limits

Language

Title OR Abstract

Article type

Page 40:

Using Medline: A Few Tips

•Quoted queries (e.g. «down syndrome» ) behave as a single word, and are great to improve the relevance of your search

•Adding initials to names (e.g. “Abergel C” ) (if you can) also reduces your output

•Write down the PubMed Identifier (the number in the PMID field) of that interesting paper you just found. It could be very useful in subsequent searches for related items such as associated gene and protein sequences

Page 41:

Using Medline: A Few Tips

•If a search fails, check for spelling mistakes, wrong field restrictions or leftover Limits settings. These may be the problem.

•Use abstracts to enlarge your vocabulary and look for synonyms: some papers on dUTPase might use dUTP pyrophosphatase instead!

•Try the “related papers” button (on the extreme right of the PubMed output) from time to time, to enlarge a search that is not giving you enough references

Page 42:

Using Medline: A Few Tips

•Storing your PDFs

•Memory is cheap, access is sometimes strange…

•Storing your favourite PDF is a good idea

•Which name on your disk?

•THE MEDLINE ID NUMBER !!!

•With a reference manager like EndNote

Page 44:

GenBank:

What is the Sequence of my Gene ?

Page 45:

GenBank: an Overview

Page 46:

GenBank: an Overview

Page 47:

GenBank: an Overview

EMBL

DDBJ

GenBank

EMBL, GenBank and DDBJ are the same database. They are synchronized every day.

Page 48:

GenBank: an Overview

GenBank contains EVERY piece of DNA that has been sequenced and made publicly available.

It contains GOOD and BAD data

There is a Historical Aspect in the GenBank data:

-Complex Genes are spread across many entries

Page 49:

GenBank Entries Are Complex because Genes are complex

Prokaryotic Example

[Diagram labels: Gene, Promoter, RBS, ORF (ATG ... STOP), mRNA, Protein]

Page 50:

GenBank Entries Are Complex because Genes are complex

[Diagram labels: Gene, Promoter, six exons, mRNA (form1), mRNA (form2), Protein (form1), Protein (form2)]

Page 51:

What is the Sequence of the E. coli dUTPase ?

Using GenBank: Asking a question

Page 52:

Using GenBank: Asking a question

The Naive Way

This search reports EVERY GenBank entry that contains these two words.

Most Bacterial Genome Entries (annotated by similarity) Contain these two words

Escherichia coli dUTPase

Page 53:

Using GenBank: Asking a question

The Right Way

Escherichia coli[organism] dUTPase[definition]
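Why the fielded query is more precise can be shown on a toy collection. The three entries below are invented, as are the `notes` field and both function names; the point is only that a word search over the whole entry also matches records that merely mention the words, while a field-restricted search does not.

```python
# Hypothetical entries, invented for this example.
ENTRIES = [
    {"id": "toy1", "organism": "Escherichia coli",
     "definition": "Escherichia coli dut gene for dUTPase"},
    {"id": "toy2", "organism": "Bacillus subtilis",
     "definition": "Bacillus subtilis complete genome",
     "notes": "dUTPase, similar to Escherichia coli Dut"},
    {"id": "toy3", "organism": "Escherichia coli",
     "definition": "Escherichia coli lacZ gene"},
]


def naive_search(entries, words):
    """Match entries whose whole text contains every word (the naive way)."""
    hits = []
    for e in entries:
        text = " ".join(v for k, v in e.items() if k != "id").lower()
        if all(w.lower() in text for w in words):
            hits.append(e["id"])
    return hits


def fielded_search(entries, **fields):
    """Match entries where each named field contains the given value."""
    return [
        e["id"] for e in entries
        if all(value.lower() in e.get(field, "").lower()
               for field, value in fields.items())
    ]


print(naive_search(ENTRIES, ["Escherichia", "coli", "dUTPase"]))  # ['toy1', 'toy2']
print(fielded_search(ENTRIES, organism="Escherichia coli",
                     definition="dUTPase"))                       # ['toy1']
```

The Bacillus genome entry is picked up by the naive search only because its annotation mentions E. coli, which is the situation described for genomes annotated by similarity.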

Page 54:

Using GenBank: And There Is Plenty More where It comes from…

GenBank Is Redundant:

If a Gene is published more than once, each publication gets its own entry

This can mean MANY ENTRIES if you have SNPs or ESTs

Page 55:

Header: Contains all the practical Information

Page 56:

Features: Contains Experimental Information and Predictions
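The header and features parts can be separated with a few lines of code. This sketch assumes the usual GenBank flat-file layout (header lines, then a `FEATURES` table, then the sequence after `ORIGIN`, then `//`). The record itself is invented and much shorter than a real one, and `split_genbank` is a hypothetical helper.

```python
# Invented record, for illustration only.
RECORD = """\
LOCUS       TOY00001     60 bp    DNA     linear   BCT 01-JAN-2000
DEFINITION  Invented example record.
ACCESSION   TOY00001
FEATURES             Location/Qualifiers
     source          1..60
     CDS             1..60
                     /product="hypothetical protein"
ORIGIN
        1 atggctagct agctagctag ctagctagct agctagctag ctagctagct agctagctaa
//
"""


def split_genbank(record):
    """Return (header lines, feature keys, sequence) for one record."""
    header, features, origin = [], [], []
    section = header
    for line in record.splitlines():
        if line.startswith("//"):            # end of record
            break
        if line.startswith("FEATURES"):
            section = features
            continue
        if line.startswith("ORIGIN"):
            section = origin
            continue
        section.append(line)
    # A feature key starts in column 6; qualifier lines are indented further.
    keys = [l.split()[0] for l in features
            if l[:5] == "     " and l[5:6].strip()]
    # Sequence lines mix position numbers, spaces and bases; keep letters only.
    sequence = "".join(ch for l in origin for ch in l if ch.isalpha())
    return header, keys, sequence


header, keys, sequence = split_genbank(RECORD)
print(keys, len(sequence))
```

For real work an established parser is the safer choice; the sketch is only meant to show where each kind of information sits in an entry.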

Page 57:

Extra Gene: This is common in GenBank entries

Page 58:

What is the Sequence of the Human dUTPase ?

Using GenBank: Asking a question

What is the Sequence of the E. coli dUTPase ?

Page 59:

Using GenBank: Finding the Human dUTPase

1-Request Limits

2-Check box here to exclude ESTs

Page 60:

Using GenBank: Finding the Human dUTPase

The Gene does NOT appear in a single entry

Page 61:

Using GenBank: Finding the Human dUTPase

Page 62:

Using GenBank: Reconstructing your gene

Page 63:

Some Good News…

-This Information is complicated because it is RAW Information

-It is necessary to keep UNINTERPRETED Experimental Information available

-There are SIMPLER alternatives to using this RAW Information:

-Gene Centric Databases

-Protein Databases

Page 64:

RefSeq/LocusLink:

What Is There To Know about This Gene?

Page 65:

Using LocusLink

Page 66:

What Can I find about the DUT Gene ?

Using LocusLink: Asking a question

Page 67:

Enter Gene name

Select LocusLink

Page 68:

Using LocusLink: Asking a question about a Gene

Page 69:

Using LocusLink: Asking a question about a Gene

Page 70:

OMIM:

Is There A Disease Associated with This Gene?

Page 71:

OMIM: Finding Out About The Phenotype of a Gene

Page 72:

OMIM: Finding Out About The Phenotype of a Gene

OMIM™: Online Mendelian Inheritance in Man

A catalog of human genes and genetic disorders

Contains a summary of literature, pictures, and reference information. It also contains numerous links to articles and sequence information.

Page 73:

OMIM: Finding Out About The Phenotype of a Gene

Page 74:

NCBI-GENOME:

What is the Context of my Gene In Its Genome?

Page 75:

NCBI-GENOME

Page 76:

NCBI-GENOME: The Virus Section

Page 77:

NCBI-GENOME: The Virus Section

Page 78:

NCBI-GENOME: The Bacteria Section

Page 79:

NCBI-GENOME: The Bacteria Section

Page 80:

ENSEMBL:

Where is my Gene in the Human Genome (who are its neighbors) ?

Page 81:

Using ENSEMBL

Page 82:

My Gene:

A Summary

Page 83:

Gathering Everything you need on a gene

GenBank: What is the Sequence ?

LocusLink: What about this Gene?

ENSEMBL: What is the Context?

MEDLINE: Are There Papers?

OMIM: Are There Illnesses?

Page 84:

SwissProt:

What Do We Know About My Protein ?

Page 85:

The Protein Databases

GenBank: A Big Bag of DNA

PREDICTION + EXPERIMENT

Generic Non Redundant Protein Databases: NR, trEMBL

Specialized Protein Databases: SwissProt, PIR

Page 86:

What Is SwissProt ?

Page 87:

What Is SwissProt ?

Fully-annotated (manually), non-redundant, cross-referenced, documented protein sequence database.

~100’000 sequences from more than 6’800 different species; 70’000 references (publications); 550’000 cross-references (databases); ~200 Mb of annotations.

Collaboration between the SIB (CH) and EMBL/EBI (UK)

Page 88:

Using SwissProt: Asking a question

We hear the word EPO quite often these days, but what exactly is known about it ?

Page 89:

Using SwissProt: Asking a question

A Simple SwissProt Text Query

EPO HUMAN

Page 90:

Using SwissProt: Reading an Entry

Page 91:

Using SwissProt: Reading an Entry

Page 92:

Using SwissProt: Reading an Entry

Page 93:

Using SwissProt: Reading an Entry

Page 94:

Using SwissProt: Reading an Entry

Structure Information

Page 95:

Using SwissProt: Reading an Entry

Page 96:

The Protein Databases

GenBank: A Big Bag of DNA

PREDICTION + EXPERIMENT

Specialized Protein Databases: SwissProt, PIR

UniProt

Generic Non Redundant Protein Databases: NR, trEMBL

Page 100:

SwissProt

How Good is Good ?

Page 103:

PDB:

What is the Structure of my Protein ?

Page 104:

PDB: The Protein Data Bank

Page 105:

PDB: The Protein Data Bank

Managed by Research Collaboratory for Structural Bioinformatics (RCSB) (USA).

Contains macromolecular structure data on proteins, nucleic acids, protein-nucleic acid complexes, and viruses.

Currently holds ~16’000 structures for about 4’000 different molecules, but far fewer protein families (highly redundant)!

Page 106:

Using PDB: Asking a question

Does tolB have a known Structure? And If the answer is Yes, How can I look at it ?

Page 107:

Using PDB: Asking a question

Query: TolB

Page 108:

Using PDB: Viewing a Structure

View Structure

Page 109:

Using PDB: Viewing a Structure

Page 110:

Using PDB: Viewing a Structure

Page 111:

Using PDB: Viewing a Structure

Page 112:

Using PDB: Downloading Data

Coordinates
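Once downloaded, the coordinates are plain text. In the classic PDB file format each atom is one fixed-column `ATOM` line, so the values can be read by slicing. A minimal sketch, assuming that column layout; the line below is a made-up atom, not taken from any real structure, and `parse_atom` is a hypothetical helper.

```python
# Invented ATOM line, laid out in the fixed columns of the PDB format.
LINE = "ATOM      1  N   MET A   1      27.340  24.430   2.614  1.00  9.67"


def parse_atom(line):
    """Read the main fields of one ATOM line by column position."""
    return {
        "name": line[12:16].strip(),      # columns 13-16: atom name
        "res_name": line[17:20].strip(),  # columns 18-20: residue name
        "chain": line[21],                # column 22: chain identifier
        "res_seq": int(line[22:26]),      # columns 23-26: residue number
        "x": float(line[30:38]),          # columns 31-38
        "y": float(line[38:46]),          # columns 39-46
        "z": float(line[46:54]),          # columns 47-54
    }


atom = parse_atom(LINE)
print(atom["res_name"], atom["chain"], atom["x"], atom["y"], atom["z"])
```

Slicing by position rather than splitting on spaces matters here, because neighbouring fields in a PDB file can touch each other with no space in between.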

Page 113:

InterPro:

Are There Domains In my Protein ?

Page 114:

InterPro: The Idea of Domains

Page 115:

InterPro: The Idea of Domains

Page 116:

InterPro: A Federation of Databases

Page 117:

Using InterPro: Asking a question

Which Domains does the oncogene FosB contain?

Page 118:

Using InterPro: Asking a question

Page 119:

Using InterPro: Asking a question

Page 120:

Using CDsearch: Asking a question

Page 121:

Using CDsearch: Asking a question

Page 122:

Using Domains: Some Statistics

• 10 most common protein domains for H. sapiens

- Immunoglobulin and major histocompatibility complex domain

- Zinc finger, C2H2 type

- Eukaryotic protein kinase

- Rhodopsin-like GPCR superfamily

- Pleckstrin homology (PH) domain

- RING finger

- Src homology 3 (SH3) domain

- RNA-binding region RNP-1 (RNA recognition motif)

- EF-hand family

- Homeobox domain

Page 123:

My Protein:

A Summary

Page 124:

Gathering Everything you need on a Protein

trEMBL: What is the Sequence ?

MEDLINE: Are There Papers?

PDB: Which Structure?

INTERPRO: Which Domains?

SwissProt: What about the Function?

Page 125:

SRS:

Can I search Many Databases Simultaneously ?

Page 126:

Using SRS

Page 127:

Using SRS

Page 128:

A Few Databases in Bulk

Page 133:

A Few Addresses

Page 134:

A few Databases