The UniProt knowledgebase (www.uniprot.org): a hub of integrated protein data
Marie-Claude.Blatter@isb-sib.ch, Swiss-Prot group, Geneva, SIB Swiss Institute of Bioinformatics
http://education.expasy.org/cours/Turin/

Jan 24, 2016

Page 1: Marie-Claude.Blatter@isb-sib.ch Swiss-Prot group, Geneva SIB Swiss Institute of Bioinformatics

Marie-Claude.Blatter@isb-sib.ch, Swiss-Prot group, Geneva, SIB Swiss Institute of Bioinformatics

The UniProt knowledgebase

www.uniprot.org

a hub of integrated protein data

http://education.expasy.org/cours/Turin/

Page 2: Marie-Claude.Blatter@isb-sib.ch Swiss-Prot group, Geneva SIB Swiss Institute of Bioinformatics

Protein sequences

• > 180 billion ‘different’ proteins on earth (∑ N species x M genes)

• > 23.0 million ‘known and public’ protein sequences in 2012

• More than 99 % of the protein sequences are derived from the translation of nucleotide sequences (mRNA or DNA)

• About 1 % come from direct protein sequencing (Edman, MS/MS…)
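
Since almost all known protein sequences come from translating a coding sequence (CDS), a minimal sketch of that translation step may help. It assumes the standard genetic code and a CDS that starts in frame; the function name `translate_cds` is illustrative and is not part of any UniProt tool.

```python
from itertools import product

# Standard genetic code, listed in the conventional T, C, A, G codon order.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def translate_cds(cds: str) -> str:
    """Translate an in-frame CDS (DNA or mRNA) up to the first stop codon."""
    cds = cds.upper().replace("U", "T")  # accept mRNA input as well
    protein = []
    for i in range(0, len(cds) - 2, 3):
        amino_acid = CODON_TABLE.get(cds[i:i + 3], "X")  # X = unknown codon
        if amino_acid == "*":  # stop codon ends the protein
            break
        protein.append(amino_acid)
    return "".join(protein)


print(translate_cds("ATGGCCTAA"))  # MA
```

Real pipelines also have to handle alternative genetic codes, partial CDS and frameshifts, which is one reason translated sequences can need curation.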

Page 3: Marie-Claude.Blatter@isb-sib.ch Swiss-Prot group, Geneva SIB Swiss Institute of Bioinformatics

Science cover, February 2011

Page 4: Marie-Claude.Blatter@isb-sib.ch Swiss-Prot group, Geneva SIB Swiss Institute of Bioinformatics

protein sequence → functional information

data → knowledge

Page 5: Marie-Claude.Blatter@isb-sib.ch Swiss-Prot group, Geneva SIB Swiss Institute of Bioinformatics

UniProt consortium

EBI: European Bioinformatics Institute (UK)
SIB: Swiss Institute of Bioinformatics (CH)
PIR: Protein Information Resource (US)

Page 6: Marie-Claude.Blatter@isb-sib.ch Swiss-Prot group, Geneva SIB Swiss Institute of Bioinformatics

www.uniprot.org

Page 7: Marie-Claude.Blatter@isb-sib.ch Swiss-Prot group, Geneva SIB Swiss Institute of Bioinformatics

UniProt databases

Page 8: Marie-Claude.Blatter@isb-sib.ch Swiss-Prot group, Geneva SIB Swiss Institute of Bioinformatics

UniProt databases

UniParc: protein sequence archive (EMBL-ENA equivalent at the protein level)

Each entry contains a protein sequence, taxonomic information, and cross-links to the other databases where the sequence is found (active or not)

No annotation

All the public patented sequences are stored in UniParc (EPO, USPTO, JPO)

You can: query, Blast, download

~31 million entries

Page 9: Marie-Claude.Blatter@isb-sib.ch Swiss-Prot group, Geneva SIB Swiss Institute of Bioinformatics

UniProt databases

UniRef

Three sets of protein sequence clusters, at 100 %, 90 % and 50 % identity;

useful to speed up sequence similarity searches (BLAST)

You can: query, Blast, download

UniRef100: 17 million entries; UniRef90: 11 million entries; UniRef50: 5 million entries

Page 10: Marie-Claude.Blatter@isb-sib.ch Swiss-Prot group, Geneva SIB Swiss Institute of Bioinformatics

UniProt databases

UniMES:

protein sequences derived from metagenomic projects (mostly Global Ocean Sampling (GOS))

You can: download

12 million entries, included in UniParc

Page 11: Marie-Claude.Blatter@isb-sib.ch Swiss-Prot group, Geneva SIB Swiss Institute of Bioinformatics

UniProt databases

The centerpiece

Page 12: Marie-Claude.Blatter@isb-sib.ch Swiss-Prot group, Geneva SIB Swiss Institute of Bioinformatics

UniProtKB

an encyclopedia on proteins

composed of 2 sections:

UniProtKB/TrEMBL: unreviewed, automatically annotated

UniProtKB/Swiss-Prot: reviewed, manually annotated

released every 4 weeks

Page 13: Marie-Claude.Blatter@isb-sib.ch Swiss-Prot group, Geneva SIB Swiss Institute of Bioinformatics

UniProtKB

Origin of protein sequences

UniProtKB protein sequences are mainly derived from

- INSDC (translated submitted coding sequences - CDS)
- Ensembl (gene prediction) and RefSeq sequences
- Sequences of PDB structures
- Direct submission or sequences scanned from literature (includes direct protein sequencing)

Notes:

- UniProt does not do any gene prediction.
- Most non-germline immunoglobulins and T-cell receptors, most patent sequences, highly over-represented data (e.g. viral antigens) and pseudogene sequences are excluded from UniProtKB, but stored in UniParc.
- Data from the PIR database have been integrated in UniProtKB since 2003.

15 %

85 %

Page 14: Marie-Claude.Blatter@isb-sib.ch Swiss-Prot group, Geneva SIB Swiss Institute of Bioinformatics

EMBL → TrEMBL → Swiss-Prot

EMBL to TrEMBL: automated extraction of protein sequence (translated CDS), gene name and references. Automated annotation.

TrEMBL to Swiss-Prot: manual annotation of the sequence and associated biological information.

Page 15: Marie-Claude.Blatter@isb-sib.ch Swiss-Prot group, Geneva SIB Swiss Institute of Bioinformatics

UniProtKB/TrEMBL

unreviewed, automatic annotation

released every 4 weeks

Page 16: Marie-Claude.Blatter@isb-sib.ch Swiss-Prot group, Geneva SIB Swiss Institute of Bioinformatics

One protein sequence, one species

Automated annotation: keywords and Gene Ontology

Automated annotation: function, subcellular location, catalytic activity, sequence similarities…

Automated annotation: transmembrane domains, signal peptide…

Cross-references to over 125 databases

References

Protein and gene names, taxonomic information

UniProtKB/TrEMBL (www.uniprot.org)

Page 17: Marie-Claude.Blatter@isb-sib.ch Swiss-Prot group, Geneva SIB Swiss Institute of Bioinformatics

UniProtKB/TrEMBL

Automatic annotation

Protein sequence

- The quality of the protein sequences depends on the information provided by the submitter of the original nucleotide entry (CDS) or by the gene prediction pipeline (e.g. Ensembl).
- 100% identical sequences (same length, same organism) are merged automatically.

Biological information

Sources of annotation
- Provided by the submitter (EMBL, PDB, TAIR…)
- From automated annotation: automatically generated annotation rules (e.g. SAAS) and/or manually generated annotation rules (e.g. UniRule)

Page 18: Marie-Claude.Blatter@isb-sib.ch Swiss-Prot group, Geneva SIB Swiss Institute of Bioinformatics
Page 19: Marie-Claude.Blatter@isb-sib.ch Swiss-Prot group, Geneva SIB Swiss Institute of Bioinformatics
Page 20: Marie-Claude.Blatter@isb-sib.ch Swiss-Prot group, Geneva SIB Swiss Institute of Bioinformatics

Example of fully automatic annotation: SAAS

• Rules are derived from the UniProtKB/Swiss-Prot manual annotation.

• Fully automated rule generation based on C4.5 decision tree algorithm.

• One annotation, one rule.

• High stringency: a rule must reach an estimated precision of 99% or greater (tested on UniProtKB/Swiss-Prot) to generate annotation.

• Rules are produced, updated and validated at each release.
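
The stringency filter described above can be illustrated with a toy sketch. It covers only the precision check against reviewed entries, not the C4.5 decision-tree induction that generates the rules. The `ReviewedEntry` type, the property tokens and the function names are hypothetical and do not reflect the actual SAAS implementation.

```python
from typing import FrozenSet, List, NamedTuple


class ReviewedEntry(NamedTuple):
    """Toy stand-in for a manually annotated (Swiss-Prot) entry."""
    properties: FrozenSet[str]   # hypothetical tokens, e.g. signatures, taxonomy
    annotations: FrozenSet[str]  # annotations assigned by curators


def estimated_precision(conditions: FrozenSet[str], annotation: str,
                        reviewed: List[ReviewedEntry]) -> float:
    """Fraction of reviewed entries matching the rule that carry the annotation."""
    matched = [e for e in reviewed if conditions <= e.properties]
    if not matched:
        return 0.0  # a rule that matches nothing cannot be evaluated
    correct = sum(1 for e in matched if annotation in e.annotations)
    return correct / len(matched)


def keep_rule(conditions: FrozenSet[str], annotation: str,
              reviewed: List[ReviewedEntry], threshold: float = 0.99) -> bool:
    """Keep a candidate rule only if it reaches the precision threshold."""
    return estimated_precision(conditions, annotation, reviewed) >= threshold
```

With a 99% threshold, a candidate rule that matches 100 reviewed entries is discarded as soon as two of them lack the predicted annotation.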

UniProtKB/TrEMBL

Page 21: Marie-Claude.Blatter@isb-sib.ch Swiss-Prot group, Geneva SIB Swiss Institute of Bioinformatics

UniProtKB/Swiss-Prot

reviewed, manually annotated

released every 4 weeks

Manual biocuration is essential to knowledge maintenance

Page 22: Marie-Claude.Blatter@isb-sib.ch Swiss-Prot group, Geneva SIB Swiss Institute of Bioinformatics

MSKEKFERTKPHVNVGTIGHVDHGKTTLTAAITTVLAKTYGGAARAFDQIDNAPEEKARGITINTSHVEYDTPTRHYAHVDCPGHADYVKNMITGAAQMDGAILVVAATDGPMPQTREHILLGRQVGVPYIIVFLNKCDMVDDEELLELVEMEVRELLSQYDFPGDDTPIVRGSALKALE GDAEWEAKILELAGFLDSYIPEPERAIDKPFLLPIEDVFSISGRGTVVTGRVERGIIKVGEEVEIVGIKETQKSTCTGVEMFRKLLDEGRAGENVGVLLRGIKREEIERGQVLAKPGTIKPHTKFESEVYILSKDEGGRHTPFFKGYRPQFYFRTTDVTGTIELPEGVEMVMPGDNIKMV VTLIHPIAMDDGLRFAIREGGRTVGAGVVAKVLG

One protein sequence, one gene, one species

Manual annotation: keywords and Gene Ontology

Manual annotation: function, subcellular location, catalytic activity, disease, tissue specificity, pathway…

Manual annotation: post-translational modifications, variants, transmembrane domains, signal peptide…

Cross-references to over 125 databases

References

Protein and gene names, taxonomic information

Alternative products: protein sequences produced by alternative splicing, alternative promoter usage, alternative initiation…

UniProtKB/Swiss-Prot (www.uniprot.org)

Page 23: Marie-Claude.Blatter@isb-sib.ch Swiss-Prot group, Geneva SIB Swiss Institute of Bioinformatics

UniProtKB/Swiss-Prot

Manual annotation

1. Protein sequence (merge available CDS, annotate sequence discrepancies, report sequencing mistakes…)

2. Biological information (sequence analysis, extract literature information, ortholog data propagation, …)

Page 24: Marie-Claude.Blatter@isb-sib.ch Swiss-Prot group, Geneva SIB Swiss Institute of Bioinformatics

UniProtKB/Swiss-Prot

1- Protein sequence curation

Page 25: Marie-Claude.Blatter@isb-sib.ch Swiss-Prot group, Geneva SIB Swiss Institute of Bioinformatics

The displayed protein sequence: …canonical, representative, consensus…

+alternative sequences (described within the entry)

1 entry <-> 1 gene (1 species)

UniProtKB/Swiss-Prot

a gene-centric view of the protein space

Page 26: Marie-Claude.Blatter@isb-sib.ch Swiss-Prot group, Geneva SIB Swiss Institute of Bioinformatics

What is the current status?

• At least 20% of Swiss-Prot entries required a minimal amount of curation effort so as to obtain the “correct” sequence.

• Typical problems
– unsolved conflicts
– uncorrected initiation sites
– frameshifts
– wrong gene prediction
– other ‘problems’

Page 27: Marie-Claude.Blatter@isb-sib.ch Swiss-Prot group, Geneva SIB Swiss Institute of Bioinformatics

UCSC genome browser

examples of CDS annotation submitted to INSDC…

Page 28: Marie-Claude.Blatter@isb-sib.ch Swiss-Prot group, Geneva SIB Swiss Institute of Bioinformatics

UniProtKB/Swiss-Prot

2- Biological data curation

Page 29: Marie-Claude.Blatter@isb-sib.ch Swiss-Prot group, Geneva SIB Swiss Institute of Bioinformatics

UniProtKB/Swiss-Prot gathers data from multiple sources:

- publications (literature/PubMed)
- prediction programs (Prosite, TMHMM, …)
- contacts with experts
- other databases
- nomenclature committees

An evidence attribution system makes it easy to trace the source of each annotation

Extraction of literature information and protein sequence analysis

maximum usage of controlled vocabulary

Page 30: Marie-Claude.Blatter@isb-sib.ch Swiss-Prot group, Geneva SIB Swiss Institute of Bioinformatics

Protein and gene names

Synonyms useful for literature searching

Page 31: Marie-Claude.Blatter@isb-sib.ch Swiss-Prot group, Geneva SIB Swiss Institute of Bioinformatics

…enable researchers to obtain a summary of what is known about a protein…

General annotation

(Comments)

www.uniprot.org

An evidence attribution system makes it easy to trace the source of each annotation (Reference number, By similarity, Probable, Potential)

Page 32: Marie-Claude.Blatter@isb-sib.ch Swiss-Prot group, Geneva SIB Swiss Institute of Bioinformatics

Human protein manual annotation: some statistics (June 2012)

Page 33: Marie-Claude.Blatter@isb-sib.ch Swiss-Prot group, Geneva SIB Swiss Institute of Bioinformatics

Sequence annotation

(Features)

…enable researchers to obtain a summary of what is known about a protein…

www.uniprot.org

An evidence attribution system makes it easy to trace the source of each annotation (Reference number, By similarity, Probable, Potential)

Page 34: Marie-Claude.Blatter@isb-sib.ch Swiss-Prot group, Geneva SIB Swiss Institute of Bioinformatics

Find all the proteins localized in the cytoplasm (experimentally proven) which are phosphorylated on a serine (experimentally proven)

Page 35: Marie-Claude.Blatter@isb-sib.ch Swiss-Prot group, Geneva SIB Swiss Institute of Bioinformatics

Ontologies

www.uniprot.org

Page 36: Marie-Claude.Blatter@isb-sib.ch Swiss-Prot group, Geneva SIB Swiss Institute of Bioinformatics

• The ‘Protein existence’ tag indicates the evidence for the existence of a given protein;

• Different qualifiers:
1. Evidence at protein level (~18%) (MS, western blot (tissue specificity), immuno (subcellular location),…)
2. Evidence at transcript level (~19%)
3. Inferred from homology (~58%)
4. Predicted (~5%)
5. Uncertain (mainly in TrEMBL)

‘Protein existence’ tag

http://www.uniprot.org/docs/pe_criteria

Page 37: Marie-Claude.Blatter@isb-sib.ch Swiss-Prot group, Geneva SIB Swiss Institute of Bioinformatics

Not sequence validation !

Page 38: Marie-Claude.Blatter@isb-sib.ch Swiss-Prot group, Geneva SIB Swiss Institute of Bioinformatics

UniProtKB

Additional information can be found in the cross-references

(to more than 140 databases)

Page 39: Marie-Claude.Blatter@isb-sib.ch Swiss-Prot group, Geneva SIB Swiss Institute of Bioinformatics

UniProtKB/Swiss-Prot: 129 explicit links and 14 implicit links!

2D gel: 2DBase-Ecoli, ANU-2DPAGE, Aarhus/Ghent-2DPAGE (no server), COMPLUYEAST-2DPAGE, Cornea-2DPAGE, DOSAC-COBS-2DPAGE, ECO2DBASE (no server), OGP, PHCI-2DPAGE, PMMA-2DPAGE, Rat-heart-2DPAGE, REPRODUCTION-2DPAGE, Siena-2DPAGE, SWISS-2DPAGE, UCD-2DPAGE, World-2DPAGE

Family and domain: Gene3D, HAMAP, InterPro, PANTHER, Pfam, PIRSF, PRINTS, ProDom, PROSITE, SMART, SUPFAM, TIGRFAMs

Organism-specific: AGD, ArachnoServer, CGD, ConoServer, CTD, CYGD, dictyBase, EchoBASE, EcoGene, euHCVdb, EuPathDB, FlyBase, GeneCards, GeneDB_Spombe, GeneFarm, GenoList, Gramene, H-InvDB, HGNC, HPA, LegioList, Leproma, MaizeGDB, MGI, MIM, neXtProt, Orphanet, PharmGKB, PseudoCAP, RGD, SGD, TAIR, TubercuList, WormBase, Xenbase, ZFIN

Protein family/group: Allergome, CAZy, MEROPS, PeroxiBase, PptaseDB, REBASE, TCDB

Genome annotation: Ensembl, EnsemblBacteria, EnsemblFungi, EnsemblMetazoa, EnsemblPlants, EnsemblProtists, GeneID, GenomeReviews, KEGG, NMPDR, TIGR, UCSC, VectorBase

Enzyme and pathway: BioCyc, BRENDA, Pathway_Interaction_DB, Reactome

Other: BindingDB, DrugBank, NextBio, PMAP-CutDB

Sequence: EMBL, IPI, PIR, RefSeq, UniGene

3D structure: DisProt, HSSP, PDB, PDBsum, ProteinModelPortal, SMR

PTM: GlycoSuiteDB, PhosphoSite, PhosSite

Proteomic: PeptideAtlas, PRIDE, ProMEX

PPI: DIP, IntAct, MINT, STRING

Phylogenomic dbs: eggNOG, GeneTree, HOGENOM, HOVERGEN, InParanoid, OMA, OrthoDB, PhylomeDB, ProtClustDB

Polymorphism: dbSNP

Gene expression: ArrayExpress, Bgee, CleanEx, Genevestigator, GermOnline

Ontologies: GO

Page 40: Marie-Claude.Blatter@isb-sib.ch Swiss-Prot group, Geneva SIB Swiss Institute of Bioinformatics

The UniProt web site: www.uniprot.org

• Powerful search engine, Google-like and easy to use, but also supports very directed field searches

• Scoring mechanism presenting relevant matches first

• Entry views, search result views and downloads are customizable

• The URL of a result page reflects the query; all pages and queries are bookmarkable, supporting programmatic access

• Search, Blast, Align, Retrieve, ID mapping

Page 41: Marie-Claude.Blatter@isb-sib.ch Swiss-Prot group, Geneva SIB Swiss Institute of Bioinformatics

Search

A very powerful text search tool, with autocompletion and refinement options, for finding UniProt entries and documentation by biological information

Page 42: Marie-Claude.Blatter@isb-sib.ch Swiss-Prot group, Geneva SIB Swiss Institute of Bioinformatics

The search interface guides users with helpful suggestions and hints

Page 43: Marie-Claude.Blatter@isb-sib.ch Swiss-Prot group, Geneva SIB Swiss Institute of Bioinformatics
Page 44: Marie-Claude.Blatter@isb-sib.ch Swiss-Prot group, Geneva SIB Swiss Institute of Bioinformatics

Result pages: highly customizable

Page 45: Marie-Claude.Blatter@isb-sib.ch Swiss-Prot group, Geneva SIB Swiss Institute of Bioinformatics

Result pages: downloadable

Page 46: Marie-Claude.Blatter@isb-sib.ch Swiss-Prot group, Geneva SIB Swiss Institute of Bioinformatics
Page 47: Marie-Claude.Blatter@isb-sib.ch Swiss-Prot group, Geneva SIB Swiss Institute of Bioinformatics

The URL can be bookmarked and manually modified.
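
Because the URL reflects the query, a result page can also be built by a script. The sketch below only constructs the URL and does not contact the server. It assumes the current REST endpoint `https://rest.uniprot.org/uniprotkb/search` with its `query`, `format` and `size` parameters, which postdates these slides; the helper name `build_search_url` is illustrative, and query field names may differ between website versions.

```python
from urllib.parse import urlencode

# Assumption: current UniProt REST search endpoint (not the URL shown on the slides).
BASE_URL = "https://rest.uniprot.org/uniprotkb/search"


def build_search_url(query, fmt="fasta", size=None):
    """Return a bookmarkable search URL for the given query string."""
    params = {"query": query, "format": fmt}
    if size is not None:
        params["size"] = size  # optional page size
    return f"{BASE_URL}?{urlencode(params)}"


# Query text taken from the 'complete proteome' example in this presentation.
print(build_search_url('organism:93062 AND keyword:"complete proteome"'))
```

Encoding the query with `urlencode` rather than by hand avoids mistakes with spaces, colons and quotes when a URL is modified manually.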

Page 48: Marie-Claude.Blatter@isb-sib.ch Swiss-Prot group, Geneva SIB Swiss Institute of Bioinformatics

Query: sequence:(database:epo OR database:JPO or database:USPTO)

Page 49: Marie-Claude.Blatter@isb-sib.ch Swiss-Prot group, Geneva SIB Swiss Institute of Bioinformatics

Blast

A tool, with the standard options, to search sequences in different UniProt databases and data sets

Page 50: Marie-Claude.Blatter@isb-sib.ch Swiss-Prot group, Geneva SIB Swiss Institute of Bioinformatics

Blast: customize the result display

Page 51: Marie-Claude.Blatter@isb-sib.ch Swiss-Prot group, Geneva SIB Swiss Institute of Bioinformatics

Blast: local alignment

sequence annotation highlighting option

Page 52: Marie-Claude.Blatter@isb-sib.ch Swiss-Prot group, Geneva SIB Swiss Institute of Bioinformatics

Align

A ClustalW multiple alignment tool with sequence annotation highlighting option

Page 53: Marie-Claude.Blatter@isb-sib.ch Swiss-Prot group, Geneva SIB Swiss Institute of Bioinformatics

Align

sequence annotation highlighting option

Page 54: Marie-Claude.Blatter@isb-sib.ch Swiss-Prot group, Geneva SIB Swiss Institute of Bioinformatics

Retrieve

A UniProt-specific tool for retrieving a list of entries from identifiers in several standard formats.

You can then query your ‘personal database’ with the UniProt search tool.

Page 55: Marie-Claude.Blatter@isb-sib.ch Swiss-Prot group, Geneva SIB Swiss Institute of Bioinformatics

Query your own dataset

Page 56: Marie-Claude.Blatter@isb-sib.ch Swiss-Prot group, Geneva SIB Swiss Institute of Bioinformatics

ID Mapping

Maps the identifiers of a given protein between different databases

Page 57: Marie-Claude.Blatter@isb-sib.ch Swiss-Prot group, Geneva SIB Swiss Institute of Bioinformatics

These identifiers are all pointing to a TP53 (p53) protein sequence !

P04637, NP_000537, NP_001119584.1, NP_001119585.1,

NP_001119584.1, NP_001119584.1, NP_001119584.1,

NP_001119584.1, ENSG00000141510, CCDS11118,

UPI000002ED67, IPI00025087, etc.
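
Before mapping, it helps to know which database an identifier comes from. The following sketch guesses the source database from the identifier's shape. The UniProtKB accession pattern follows the published accession format; the other patterns are simplified assumptions based on the examples above, and `classify_identifier` is an illustrative name.

```python
import re

# Patterns are tried in order; the first full match wins.
ID_PATTERNS = {
    "UniProtKB accession": re.compile(
        r"[OPQ][0-9][A-Z0-9]{3}[0-9]|[A-NR-Z][0-9](?:[A-Z][A-Z0-9]{2}[0-9]){1,2}"
    ),
    "RefSeq protein": re.compile(r"[NXY]P_\d+(?:\.\d+)?"),   # optional version
    "Ensembl human gene": re.compile(r"ENSG\d{11}"),
    "CCDS": re.compile(r"CCDS\d+(?:\.\d+)?"),
    "UniParc": re.compile(r"UPI[0-9A-F]{10}"),
    "IPI": re.compile(r"IPI\d{8}(?:\.\d+)?"),
}


def classify_identifier(identifier):
    """Return the name of the first pattern matching the whole identifier."""
    for name, pattern in ID_PATTERNS.items():
        if pattern.fullmatch(identifier):
            return name
    return "unknown"


for identifier in ["P04637", "NP_000537", "ENSG00000141510", "UPI000002ED67"]:
    print(identifier, "->", classify_identifier(identifier))
```

A shape check like this cannot tell whether an identifier is still active; the ID mapping tool remains the reference for that.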

Page 58: Marie-Claude.Blatter@isb-sib.ch Swiss-Prot group, Geneva SIB Swiss Institute of Bioinformatics
Page 59: Marie-Claude.Blatter@isb-sib.ch Swiss-Prot group, Geneva SIB Swiss Institute of Bioinformatics

Download

Page 60: Marie-Claude.Blatter@isb-sib.ch Swiss-Prot group, Geneva SIB Swiss Institute of Bioinformatics

Download UniProt: http://www.uniprot.org/downloads

Page 61: Marie-Claude.Blatter@isb-sib.ch Swiss-Prot group, Geneva SIB Swiss Institute of Bioinformatics

Do not hesitate to contact us !

[email protected]

Page 62: Marie-Claude.Blatter@isb-sib.ch Swiss-Prot group, Geneva SIB Swiss Institute of Bioinformatics

The UniProt Consortium

SIB: Ioannis Xenarios, Lydie Bougueleret, Andrea Auchincloss, Kristian Axelsen, Delphine Baratin, Marie-Claude Blatter, Brigitte Boeckmann, Jerven Bolleman, Laurent Bollondi, Emmanuel Boutet, Lionel Breuza, Alan Bridge, Edouard de Castro, Lorenzo Cerutti, Elisabeth Coudert, Béatrice Cuche, Mikael Doche, Dolnide Dornevil, Severine Duvaud, Anne Estreicher, Livia Famiglietti, Marc Feuermann, Sebastien Gehant, Elisabeth Gasteiger, Alain Gateau, Vivienne Gerritsen, Arnaud Gos, Nadine Gruaz-Gumowski, Ursula Hinz, Chantal Hulo, Nicolas Hulo, Janet James, Florence Jungo, Guillaume Keller, Vicente Lara, Philippe Lemercier, Damien Lieberherr, Xavier Martin, Patrick Masson, Anne Morgat, Salvo Paesano, Ivo Pedruzzi, Sandrine Pilbout, Sylvain Poux, Monica Pozzato, Manuela Pruess, Nicole Redaschi, Catherine Rivoire, Bernd Roechert, Michel Schneider, Christian Sigrist, Karin Sonesson, Sylvie Staehli, Eleanor Stanley, André Stutz, Shyamala Sundaram, Michael Tognolli, Laure Verbregue, Anne-Lise Veuthey

EBI: Rolf Apweiler, Maria Jesus Martin, Claire O'Donovan, Michele Magrane, Yasmin Alam-Faruque, Ricardo Antunes, Benoit Bely, Mark Bingley, David Binns, Lawrence Bower, Wei Mun Chan, Emily Dimmer, Francesco Fazzini, Alexander Fedotov, John Garavelli, Leyla Garcia Castro, Rachael Huntley, Julius Jacobsen, Michael Kleen, Duncan Legge, Wudong Liu, Jie Luo, Sandra Orchard, Samuel Patient, Klemens Pichler, Diego Poggioli, Nikolas Pontikos, Steven Rosanoff, Tony Sawford, Harminder Sehra, Edward Turner, Matt Corbett, Mike Donnelly and Pieter van Rensburg

PIR: Cathy H. Wu, Cecilia N. Arighi, Leslie Arminski, Winona C. Barker, Chuming Chen, Yongxing Chen, Pratibha Dubey, Hongzhan Huang, Kati Laiho, Raja Mazumder, Peter McGarvey, Darren A. Natale, Thanemozhi G. Natarajan, Jules Nchoutmboube, Natalia V. Roberts, Baris E. Suzek, Uzoamaka Ugochukwu, C. R. Vinayaka, Qinghua Wang, Yuqi Wang, Lai-Su Yeh and Jian Zhang

www.uniprot.org

Page 63: Marie-Claude.Blatter@isb-sib.ch Swiss-Prot group, Geneva SIB Swiss Institute of Bioinformatics

UniProt is mainly supported by the National Institutes of Health (NIH) grant 1 U41 HG006104-01. Additional support for the EBI's involvement in UniProt comes from the NIH grant 2P41 HG02273-07. Swiss-Prot activities at the SIB are supported by the Swiss Federal Government through the Federal Office of Education and Science and the European Commission contracts SLING (226073), Gen2Phen (200754) and MICROME (222886). PIR activities are also supported by the NIH grants 5R01GM080646-04, 3R01GM080646-04S2, 1G08LM010720-01, and 3P20RR016472-09S2, and NSF grant DBI-0850319.

Page 64: Marie-Claude.Blatter@isb-sib.ch Swiss-Prot group, Geneva SIB Swiss Institute of Bioinformatics

www.isb-sib.ch

Page 65: Marie-Claude.Blatter@isb-sib.ch Swiss-Prot group, Geneva SIB Swiss Institute of Bioinformatics

Thank you for your attention

http://education.expasy.org/cours/Turin/

Page 66: Marie-Claude.Blatter@isb-sib.ch Swiss-Prot group, Geneva SIB Swiss Institute of Bioinformatics

Some practical issues

www.uniprot.org

• Look for HBB – customize display - multiple alignment

• Look for plant protein sequence similar to human HBB (Blast).

• ID mapping: Several proteins have been identified in a proteomic experiment. Which GO terms do they share? (GI numbers of the identified proteins: 16130093, 20664033, 1789812, 27574045, 229597766).

Page 67: Marie-Claude.Blatter@isb-sib.ch Swiss-Prot group, Geneva SIB Swiss Institute of Bioinformatics

Summary

Page 68: Marie-Claude.Blatter@isb-sib.ch Swiss-Prot group, Geneva SIB Swiss Institute of Bioinformatics
Page 69: Marie-Claude.Blatter@isb-sib.ch Swiss-Prot group, Geneva SIB Swiss Institute of Bioinformatics

A few words on the UniProt ‘complete proteome’ sequence sets…

Page 70: Marie-Claude.Blatter@isb-sib.ch Swiss-Prot group, Geneva SIB Swiss Institute of Bioinformatics
Page 71: Marie-Claude.Blatter@isb-sib.ch Swiss-Prot group, Geneva SIB Swiss Institute of Bioinformatics

2’747 complete proteomes

Genome completely sequenced

Proteins mapped to the genome

Entries tagged with the KW ‘Complete proteome’

UniProtKB/Swiss-Prot isoform sequences are available in FASTA format only

Fully manually reviewed (e.g. S. cerevisiae)
Partially manually reviewed (e.g. Homo sapiens)
Unreviewed (e.g. Acinetobacter baumannii (strain 1656-2))

UniProtKB - complete proteomes

Page 72: Marie-Claude.Blatter@isb-sib.ch Swiss-Prot group, Geneva SIB Swiss Institute of Bioinformatics

Can be downloaded:

From our complete proteome page www.uniprot.org/taxonomy/complete-proteomes

From the ‘ftp download’ page

By querying UniProtKB + download Query: organism:93062 AND keyword:"complete proteome"

UniProtKB - complete proteomes

Additional information: www.uniprot.org/faq/15

Page 73: Marie-Claude.Blatter@isb-sib.ch Swiss-Prot group, Geneva SIB Swiss Institute of Bioinformatics

Query UniProtKB + download

Page 74: Marie-Claude.Blatter@isb-sib.ch Swiss-Prot group, Geneva SIB Swiss Institute of Bioinformatics
Page 75: Marie-Claude.Blatter@isb-sib.ch Swiss-Prot group, Geneva SIB Swiss Institute of Bioinformatics

Human proteome ~ 20’200 genes

Query for ‘homo sapiens’ (August 2011)
• UniProtKB: 110’056 entries + alt sequences (~ 15’435) = 125’491
• UniProtKB/Swiss-Prot: 20’244 entries + alt sequences (~ 15’435) = 35’679
• UniProtKB/TrEMBL: 89’834 entries
• RefSeq: 32’898 sequences
• Ensembl: 90’720 sequences

Query for ‘homo sapiens’ + Complete proteome (KW-181)
• UniProtKB: 56’392 + alt sequences (15’435) = 71’827
• UniProtKB/Swiss-Prot: 20’238 + alt sequences (15’435) = 35’673
• UniProtKB/TrEMBL: 36’154

92% of human entries are linked with at least one RefSeq entry…