Protein function and classification, www.ebi.ac.uk/interpro, Hsin-Yu Chang, www.ebi.ac.uk
Page 1: Protein function and classification  Hsin-Yu Chang .

Protein function and classification

www.ebi.ac.uk/interpro

Hsin-Yu Chang

www.ebi.ac.uk

Page 2

Greider and Blackburn discovered telomerase in 1984 and were awarded the Nobel Prize in Physiology or Medicine in 2009. Which model organism did they use for this study?

1. Tetrahymena

2. Saccharomyces cerevisiae

3. Mouse

4. Human

Page 3

A single Tetrahymena cell has 40,000 telomeres, whereas a human cell has only 92 (46 chromosomes, each with two ends).

1985: Discovery of telomerase (Greider and Blackburn)

1989: Telomere hypothesis of cell senescence (Szostak)

1995: Clone hTR

1995/1997: Clone hTERT

1997: Telomerase knockout mouse

1998: Ectopic expression of telomerase in normal fibroblasts and epithelial cells bypasses the Hayflick limit

1999/2000…: Telomerase/telomere dysfunctions and cancer

Gilson and Ségal-Bendirdjian, Biochimie, 2010.

Page 4

Therefore, protein classification can help scientists gain information about protein functions.

Page 5

In the lab, what do we usually do to analyse protein sequences and find out their functions?

Page 6

What I used to do:

• Protein BLAST

• Publications: textbooks or papers

• UniProt

• PDB

• Specialised protein databases such as SGD, the Human Protein Atlas, etc.

Page 7

BLAST it?

Advantages:

• Relatively fast

• User-friendly

• Very good at recognising similarity between closely related sequences

Drawbacks:

• Sometimes struggles with multi-domain proteins

• Less useful for weakly similar sequences (e.g., divergent homologues)

Page 8

Using BLAST to find clues to protein function: when it goes well

Page 9

Pairwise alignment of two proteins: CD4 from two closely related species

Page 10

Using BLAST to find clues to protein function: when it does not give you much information

Page 11

Using BLAST to find clues to protein function: when it does not give you much information

Page 12

Because BLAST performs local pairwise alignment, it:

• Cannot encode the information found in a multiple sequence alignment that shows you conserved sites.
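To make "local pairwise alignment" concrete, here is a minimal Smith-Waterman sketch in Python. BLAST itself does not fill this full table: it uses a faster word-seeding heuristic that approximates it, and scores residues with a substitution matrix such as BLOSUM62. The flat match/mismatch/gap scores and the example sequences below are assumptions made for illustration. Note that the result describes one pair of sequences only; nothing in it records which residues are conserved across a whole family.

```python
def smith_waterman(a, b, match=2, mismatch=-1, gap=-2):
    """Best local alignment of a and b: (score, aligned_a, aligned_b).

    Assumption: flat match/mismatch scores and a linear gap penalty stand in
    for the substitution matrix and affine gap costs a real tool would use.
    """
    def pair_score(x, y):
        return match if x == y else mismatch

    # H[i][j] = best score of a local alignment ending at a[i-1] and b[j-1]
    H = [[0] * (len(b) + 1) for _ in range(len(a) + 1)]
    best, best_i, best_j = 0, 0, 0
    for i in range(1, len(a) + 1):
        for j in range(1, len(b) + 1):
            H[i][j] = max(0,  # "local": a poor prefix is dropped, not carried
                          H[i - 1][j - 1] + pair_score(a[i - 1], b[j - 1]),
                          H[i - 1][j] + gap,
                          H[i][j - 1] + gap)
            if H[i][j] > best:
                best, best_i, best_j = H[i][j], i, j

    # Trace back from the best cell until the score falls to zero
    i, j, out_a, out_b = best_i, best_j, [], []
    while i > 0 and j > 0 and H[i][j] > 0:
        if H[i][j] == H[i - 1][j - 1] + pair_score(a[i - 1], b[j - 1]):
            out_a.append(a[i - 1])
            out_b.append(b[j - 1])
            i, j = i - 1, j - 1
        elif H[i][j] == H[i - 1][j] + gap:
            out_a.append(a[i - 1])
            out_b.append("-")
            i -= 1
        else:
            out_a.append("-")
            out_b.append(b[j - 1])
            j -= 1
    return best, "".join(reversed(out_a)), "".join(reversed(out_b))


print(smith_waterman("AAAGKSTLL", "PPGKSTQQ"))  # (8, 'GKST', 'GKST')
```

Only the shared GKST stretch is reported; the unrelated flanking residues are simply left out, which is what makes the alignment local.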

Page 13

60S acidic ribosomal protein P0: multiple sequence alignment

Pairwise alignment can miss conserved residues

Page 14

An alternative approach: protein signature search

• Model the pattern of conserved amino acids at specific positions within a multiple sequence alignment

• Use these models to infer relationships with the characterised sequences (from which the alignment was constructed)

• This is the approach taken by protein signature databases
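As a minimal illustration of the first step (finding which positions of a multiple sequence alignment are conserved), the Python sketch below reports, for each column, the most common residue and the fraction of sequences carrying it. The toy alignment is invented and is not the 60S acidic ribosomal protein P0 alignment shown earlier; real signature databases build much richer models than a single fraction per column.

```python
from collections import Counter

def column_conservation(alignment):
    """For each column: (most common residue, fraction of sequences with it).

    Gaps ("-") are never counted as the conserved residue, but they do count
    towards the total, so a gappy column scores lower.
    """
    result = []
    for column in zip(*alignment):
        residues = [r for r in column if r != "-"]
        if not residues:
            result.append(("-", 0.0))
            continue
        residue, count = Counter(residues).most_common(1)[0]
        result.append((residue, count / len(column)))
    return result

def conserved_positions(alignment, min_fraction=1.0):
    """1-based column numbers where >= min_fraction of sequences agree."""
    return [i + 1 for i, (_, fraction) in enumerate(column_conservation(alignment))
            if fraction >= min_fraction]

# Hypothetical toy alignment of three short sequences
toy = ["MKV-LG",
       "MRVALG",
       "MKVSLA"]
print(conserved_positions(toy))  # [1, 3, 5]
```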

Page 15

Three different protein signature approaches

• Patterns: single motif methods

• Fingerprints: multiple motif methods

• Profiles & HMMs (hidden Markov models): full alignment methods

Page 16

Patterns

[Figure: sequence alignment → motif → pattern signature (PS00000), written as a regular expression]

Pattern signature: [AC]-x-V-x(4)-{ED}

Sequences matching the pattern:

ALVKLISG

AIVHESAT

CHVRDLSC

CPVESTIS

Patterns are usually directed against functional sequence features such as active sites, binding sites, etc.

Page 17

Patterns

Advantages:

• Can anchor the match to either end of a sequence (here the N-terminus, marked by <):

<M-R-[DE]-x(2,4)-[ALT]-{AM}

• Strict - a pattern with very little variability and forbidden residues can produce highly accurate matches

Drawbacks:

• Simple but less flexible
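The pattern syntax on these slides maps closely onto regular expressions. The Python sketch below translates the common PROSITE-style elements (x, [..], {..}, (n) and (n,m) repeats, and the < and > terminal anchors) into a Python regex. It is a simplified illustration rather than PROSITE's own scanning software, and it deliberately ignores rarer syntax such as a > inside square brackets. The four 8-residue test sequences are the ones listed on the pattern slide.

```python
import re

def prosite_to_regex(pattern):
    """Translate a PROSITE-style pattern into a Python regular expression.

    Supported (simplified): x = any residue, [..] = allowed residues,
    {..} = forbidden residues, (n) or (n,m) = repeat count,
    leading < / trailing > = anchor at the N- / C-terminus.
    """
    # Tolerate spaces, en dashes and a trailing full stop, as seen on slides
    pattern = pattern.strip().rstrip(".").replace(" ", "").replace("\u2013", "-")
    parts = []
    for element in pattern.split("-"):
        prefix = suffix = ""
        if element.startswith("<"):
            prefix, element = "^", element[1:]
        if element.endswith(">"):
            suffix, element = "$", element[:-1]
        # Separate the residue part from an optional repeat count like (4)
        match = re.fullmatch(r"(.+?)(?:\((\d+(?:,\d+)?)\))?", element)
        core, repeat = match.group(1), match.group(2)
        if core == "x":
            regex = "."
        elif core.startswith("{"):
            regex = "[^" + core[1:-1] + "]"
        else:
            regex = core  # a single residue or a [..] class is already valid regex
        if repeat:
            regex += "{" + repeat + "}"
        parts.append(prefix + regex + suffix)
    return "".join(parts)


signature = prosite_to_regex("[AC]-x-V-x(4)-{ED}")
print(signature)  # [AC].V.{4}[^ED]
for seq in ["ALVKLISG", "AIVHESAT", "CHVRDLSC", "CPVESTIS"]:
    print(seq, bool(re.fullmatch(signature, seq)))  # each prints True
print(prosite_to_regex("<M-R-[DE]-x(2,4)-[ALT]-{AM}"))  # ^MR[DE].{2,4}[ALT][^AM]
```

The forbidden-residue set {ED} becomes a negated class, which is why a sequence ending in E or D at the last position would be rejected.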

Page 18

Fingerprints: a multiple motif approach

[Figure: sequence alignment → define motifs (Motif 1, Motif 2, Motif 3) → motif sequences → weight matrices → fingerprint signature (PR00000)]

Page 19

The significance of motif context

[Figure: motifs 1, 2 and 3, showing their order and the intervals between them]

• Identify small conserved regions in proteins

• Several motifs characterise a family

• Offer improved diagnostic reliability over single motifs by virtue of the biological context provided by motif neighbours

Page 20

Fingerprints

• Good at modelling the often small differences between closely related proteins

• Distinguish individual subfamilies within protein families, allowing functional characterisation of sequences at a high level of specificity
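The idea that motif order and spacing add diagnostic power can be sketched in a few lines of Python. This is a simplified, hypothetical illustration, not how PRINTS or its scanning software actually scores fingerprints: each motif's "weight matrix" is just per-column residue counts from its aligned motif sequences, a window's score is the mean fraction of training sequences sharing each residue, and the `threshold` and `max_gap` values are arbitrary assumptions. The motif sequences are invented.

```python
from collections import Counter

def weight_matrix(motif_seqs):
    """Residue counts per column of one aligned motif (a simple weight matrix)."""
    return [Counter(column) for column in zip(*motif_seqs)]

def window_score(matrix, n_seqs, window):
    """Mean fraction of training sequences sharing each residue of `window` (0 to 1)."""
    return sum(col[res] for col, res in zip(matrix, window)) / (len(matrix) * n_seqs)

def fingerprint_hits(motifs, seq, threshold=0.5, max_gap=10):
    """Start position of each motif if all match in order, else None.

    motifs: list of motifs, each a list of equal-length aligned sequences.
    The first motif may occur anywhere; each later motif must start no more
    than `max_gap` residues after the end of the previous one.
    """
    positions = []
    lo, hi = 0, len(seq)
    for motif_seqs in motifs:
        matrix = weight_matrix(motif_seqs)
        width = len(matrix)
        candidates = [(window_score(matrix, len(motif_seqs), seq[i:i + width]), i)
                      for i in range(lo, min(hi, len(seq) - width) + 1)]
        if not candidates:
            return None  # no room left for this motif: order or interval broken
        score, pos = max(candidates, key=lambda c: c[0])  # earliest best window
        if score < threshold:
            return None
        positions.append(pos)
        lo = pos + width   # next motif must come after this one (order)...
        hi = lo + max_gap  # ...and start within max_gap residues (interval)
    return positions


motif1 = ["GKST", "GKSS", "GKTT"]  # hypothetical aligned motif sequences
motif2 = ["DEAD", "DEAH", "DECD"]
print(fingerprint_hits([motif1, motif2], "MAAGKSTLLPPDEADRR"))  # [3, 11]
print(fingerprint_hits([motif1, motif2], "DEADRRMAAGKSTLL"))    # None
```

In the second call both motifs are present, but in the wrong order, so the fingerprint as a whole does not match.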

Page 21

Profiles & HMMs

Use the entire alignment of a domain or protein family.

[Figure: sequence alignment → define coverage (entire domain or whole protein) → build model → profile or HMM signature]

Page 22

Profiles

Start with a multiple sequence alignment.

Amino acids at each position in the alignment are scored according to the frequency with which they occur.

Scores are weighted according to evolutionary distance using a BLOSUM matrix.

• Good at identifying homologues
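The first two steps above (start from an alignment, score each amino acid by how often it occurs at each position) can be sketched as a position-specific log-odds matrix. This Python sketch is a simplification: instead of the BLOSUM-based weighting described on the slide, it assumes a fixed pseudocount and a uniform background frequency of 1/20, and it only scores ungapped windows. The alignment is invented for illustration.

```python
import math
from collections import Counter

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"

def build_profile(alignment, pseudocount=1.0):
    """Position-specific log-odds scores (bits) for each column of an alignment.

    Assumptions: gaps are ignored, every amino acid gets `pseudocount` extra
    counts, and the background frequency is uniform (1/20). A real profile
    would use substitution-matrix (e.g. BLOSUM) information here instead.
    """
    background = 1 / len(AMINO_ACIDS)
    profile = []
    for column in zip(*alignment):
        counts = Counter(res for res in column if res != "-")
        total = sum(counts.values()) + pseudocount * len(AMINO_ACIDS)
        profile.append({aa: math.log2((counts[aa] + pseudocount) / total / background)
                        for aa in AMINO_ACIDS})
    return profile

def best_profile_hit(profile, seq):
    """(score, start) of the highest-scoring ungapped window of `seq`."""
    width = len(profile)
    scored = [(sum(col.get(aa, 0.0) for col, aa in zip(profile, seq[i:i + width])), i)
              for i in range(len(seq) - width + 1)]
    return max(scored, key=lambda s: s[0])


profile = build_profile(["GKST", "GKSS", "GKTT"])  # hypothetical alignment
score, start = best_profile_hit(profile, "MAAGKSTLL")
print(start)  # 3
```

Positive scores mean a residue is seen at that position more often than background; the pseudocount keeps unseen residues from scoring minus infinity.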

Page 23

HMMs

Start with a multiple sequence alignment.

The amino acid frequencies at each position in the alignment, and the transition probabilities between positions, are encoded.

Insertions and deletions are also modelled.

• Very good at identifying evolutionarily distant homologues

• Can model very divergent regions of an alignment
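How an HMM combines emission and transition probabilities into a single score for a sequence can be shown with the forward algorithm. A real profile HMM (for example one built by HMMER) has match, insert and delete states for every alignment column; this Python sketch deliberately strips that down to a generic, hypothetical two-state model with made-up probabilities, so it shows the scoring principle only.

```python
import math

def logsumexp(values):
    """log(sum(exp(v))) computed without underflow."""
    m = max(values)
    return m + math.log(sum(math.exp(v - m) for v in values))

def forward_log_prob(seq, states, start, trans, emit):
    """log P(seq | HMM), summed over every hidden-state path (forward algorithm).

    All probabilities must be non-zero here, because math.log(0) raises an error.
    """
    # alpha[s]: log probability of emitting the symbols so far and ending in s
    alpha = {s: math.log(start[s]) + math.log(emit[s][seq[0]]) for s in states}
    for symbol in seq[1:]:
        alpha = {s: math.log(emit[s][symbol])
                    + logsumexp([alpha[p] + math.log(trans[p][s]) for p in states])
                 for s in states}
    return logsumexp(list(alpha.values()))


# Hypothetical two-state model: "C" mostly emits G (conserved-like),
# "V" emits a broader mix (variable-region-like).
states = ("C", "V")
start = {"C": 0.5, "V": 0.5}
trans = {"C": {"C": 0.9, "V": 0.1}, "V": {"C": 0.2, "V": 0.8}}
emit = {"C": {"G": 0.7, "A": 0.2, "L": 0.1},
        "V": {"G": 0.2, "A": 0.4, "L": 0.4}}

print(math.exp(forward_log_prob("GA", states, start, trans, emit)))  # approximately 0.113
```

Working in log space avoids numerical underflow when many small probabilities are multiplied along a long sequence.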


Page 25

www.ebi.ac.uk/interpro

Page 26

InterPro

The aim of InterPro

Page 27

What is InterPro?

• InterPro is an integrated sequence analysis resource

• It combines predictive models (known as signatures) from different databases to provide functional analysis of protein sequences by classifying them into families and predicting domains and important sites

Page 28

Facts about InterPro

• First release in 1999

• 11 partner databases

• Forms part of the automated system that adds annotation to UniProtKB/TrEMBL

• Provides matches to over 80% of UniProtKB

• Source of >60 million Gene Ontology (GO) mappings to >17 million distinct UniProtKB sequences

• 50,000 unique visitors to the website per month

• >2 million sequences searched online per month, plus offline searches with the downloadable version of the software

Page 29

[Figure: member database methods (hidden Markov models, fingerprints, profiles, patterns, HAMAP) alongside what they provide: structural domains, functional annotation of families/domains, and protein features (sites)]

Page 30

InterPro signature integration process

• Signatures are provided by member databases

• They are scanned against the UniProt database to see which sequences they match

• Curators manually inspect the matches before integrating the signatures into InterPro

Signatures representing the same entity are integrated together.

Relationships between entries are traced, where possible.

Curators add literature-referenced abstracts, cross-references to other databases, and GO terms.
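The decision to integrate signatures is made by curators. Purely as a hypothetical illustration of the kind of evidence involved, the Python sketch below groups signatures whose sets of matched proteins overlap strongly (Jaccard index at or above a chosen threshold). The signature names, protein identifiers and the 0.8 cut-off are invented, and this is not InterPro's actual pipeline.

```python
from itertools import combinations

def jaccard(a, b):
    """Shared matches divided by all matches of either signature (0 to 1)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0

def candidate_groups(matches, min_overlap=0.8):
    """Group signatures whose matched-protein sets overlap by >= min_overlap.

    matches: {signature_name: set of protein accessions it matches}.
    Union-find makes grouping transitive: if A~B and B~C, all three group.
    """
    parent = {sig: sig for sig in matches}

    def find(sig):
        while parent[sig] != sig:
            parent[sig] = parent[parent[sig]]  # path halving keeps trees shallow
            sig = parent[sig]
        return sig

    for a, b in combinations(matches, 2):
        if jaccard(matches[a], matches[b]) >= min_overlap:
            parent[find(a)] = find(b)

    groups = {}
    for sig in matches:
        groups.setdefault(find(sig), []).append(sig)
    return sorted(sorted(group) for group in groups.values())


# Hypothetical signatures and protein accessions
matches = {
    "sigA": {"P1", "P2", "P3", "P4", "P5"},
    "sigB": {"P1", "P2", "P3", "P4"},
    "sigC": {"P9", "P10"},
}
print(candidate_groups(matches))  # [['sigA', 'sigB'], ['sigC']]
```

A grouping like this could only ever flag candidates; whether two signatures really describe the same entity is the curator's call.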

Page 31

http://www.ebi.ac.uk/interpro/

Page 32

Search using protein sequences

Page 33

Family

Page 34

Type

Page 35

InterPro entry types

• Family: proteins that share a common evolutionary origin, as reflected in their related functions, sequences or structure

• Domain: distinct functional, structural or sequence units that may exist in a variety of biological contexts

• Repeats: short sequences typically repeated within a protein

• Sites: PTM, active site, binding site, conserved site

Page 36

[Screenshot: an InterPro entry page, with labelled sections for type, name, identifier, contributing signatures, description, GO terms and references]

Page 41

[Screenshot: an InterPro entry page, with labelled sections for type, name, identifier, contributing signatures, description, references and relationships]

Page 42

InterPro family and domain relationships

Page 43

Family relationships in InterPro:

Interleukin-15/Interleukin-21 family
  Interleukin-15
    Interleukin-15, avian
    Interleukin-15, fish
    Interleukin-15, mammal

Page 44

Relationships

Page 45

InterPro relationships: domains

Protein kinase-like domain
  Protein kinase catalytic domain
    Serine/threonine kinase catalytic domain
    Tyrosine kinase catalytic domain
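The parent/child relationships on the two slides above form simple trees. This Python sketch stores each entry's parent in a dictionary, then walks up to list an entry's ancestors and down to print a subtree. It assumes the nesting drawn on the slides and assumes each entry has at most one parent; the helper names are hypothetical.

```python
def lineage(entry, parent_of):
    """The entry followed by its ancestors, from most to least specific."""
    chain = [entry]
    while chain[-1] in parent_of:
        chain.append(parent_of[chain[-1]])
    return chain

def render(root, parent_of, depth=0):
    """Indented text tree of `root` and all of its descendants."""
    lines = ["  " * depth + root]
    children = sorted(child for child, parent in parent_of.items() if parent == root)
    for child in children:
        lines.extend(render(child, parent_of, depth + 1))
    return lines


# child -> parent, following the nesting shown on the relationship slides
parent_of = {
    "Interleukin-15": "Interleukin-15/Interleukin-21 family",
    "Interleukin-15, avian": "Interleukin-15",
    "Interleukin-15, fish": "Interleukin-15",
    "Interleukin-15, mammal": "Interleukin-15",
    "Protein kinase catalytic domain": "Protein kinase-like domain",
    "Serine/threonine kinase catalytic domain": "Protein kinase catalytic domain",
    "Tyrosine kinase catalytic domain": "Protein kinase catalytic domain",
}

print(lineage("Interleukin-15, fish", parent_of))
print("\n".join(render("Protein kinase-like domain", parent_of)))
```

Walking up the chain is what lets a match to a specific entry (say, a fish IL-15) also tell you which broader family it belongs to.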

Page 46

A brief diversion into the Gene Ontology...

Page 48

Gene Ontology

• Allows cross-species and/or cross-database comparisons

• Unifies the representation of gene and gene product attributes across species

Page 49

The Gene Ontology

• A way to capture biological knowledge in a written and computable form

• A set of concepts and their relationships to each other, arranged as a hierarchy running from less specific to more specific concepts

www.ebi.ac.uk/QuickGO
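Because GO terms are linked from specific to general, an annotation to a specific term implies annotation to every ancestor term (GO's so-called true path rule). The Python sketch below follows parent links to collect those implied terms. The hierarchy fragment is simplified and illustrative, not copied from GO: real GO has more intermediate terms, and a term can have several parents (the code accepts lists of parents for that reason).

```python
def ancestors(term, parents):
    """Every term reachable by following parent links upwards from `term`."""
    seen = set()
    stack = list(parents.get(term, []))
    while stack:
        current = stack.pop()
        if current not in seen:
            seen.add(current)
            stack.extend(parents.get(current, []))
    return seen

def propagate(annotations, parents):
    """A protein annotated to a specific term is implicitly annotated to its ancestors."""
    result = set(annotations)
    for term in annotations:
        result |= ancestors(term, parents)
    return result


# Simplified, illustrative fragment (term -> list of parent terms)
parents = {
    "insulin receptor activity": ["protein tyrosine kinase activity"],
    "protein tyrosine kinase activity": ["protein kinase activity"],
    "protein kinase activity": ["kinase activity"],
    "kinase activity": ["catalytic activity"],
    "catalytic activity": ["molecular_function"],
}
print(sorted(propagate({"insulin receptor activity"}, parents)))
```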

Page 50

The Concepts in GO

1. Molecular Function, e.g. protein kinase activity, insulin receptor activity

2. Biological Process, e.g. cell cycle, microtubule cytoskeleton organisation

3. Cellular Component

Page 51

GO:0006955 immune response

GO:0016020 membrane

Page 52

Summary

InterPro is a sequence analysis resource that classifies sequences into protein families and predicts important domains and sites.

It uses protein signatures based on different methodologies from different member databases.

Its member databases all have their particular niche or focus... but InterPro offers a combination of all their areas of expertise!

Page 53

Why use InterPro?

• Large amounts of manually curated data

  • 35,634 signatures integrated into 25,214 entries

  • Cites 38,877 PubMed publications

• Large coverage of protein sequence space

• Regularly updated

  • ~8-week release schedule

  • New signatures added

  • Scanned against the latest version of UniProtKB

Page 54

Caution

• InterPro is a predictive protein signature database: results are predictions, and should be treated as such

• InterPro entries are based on signatures supplied to us by our member databases

...this means no signature, no entry!

And one more thing…

We need your feedback! Missing/additional references, reporting problems, requests: please use the EBI support page.

Page 55

The InterPro Team:

Amaia Sangrador

Craig McAnulla

Matthew Fraser

Maxim Scheremetjew

Siew-Yit Yong

Alex Mitchell

Sebastien Pesseat

Sarah Hunter

Gift Nuka

Hsin-Yu Chang

Louise Daugherty

Page 56

Database | Basis | Institution | Built from | Focus | URL
Pfam | HMM | Sanger Institute | Sequence alignment | Family & domain based on conserved sequence | http://pfam.sanger.ac.uk/
Gene3D | HMM | UCL | Structure alignment | Structural domain | http://gene3d.biochem.ucl.ac.uk/Gene3D/
Superfamily | HMM | Uni. of Bristol | Structure alignment | Evolutionary domain relationships | http://supfam.cs.bris.ac.uk/SUPERFAMILY/
SMART | HMM | EMBL Heidelberg | Sequence alignment | Functional domain annotation | http://smart.embl-heidelberg.de/
TIGRFAM | HMM | J. Craig Venter Inst. | Sequence alignment | Microbial functional family classification | http://www.jcvi.org/cms/research/projects/tigrfams/overview/
Panther | HMM | Uni. S. California | Sequence alignment | Family functional classification | http://www.pantherdb.org/
PIRSF | HMM | PIR, Georgetown, Washington D.C. | Sequence alignment | Functional classification | http://pir.georgetown.edu/pirwww/dbinfo/pirsf.shtml
PRINTS | Fingerprints | Uni. of Manchester | Sequence alignment | Family functional classification | http://www.bioinf.manchester.ac.uk/dbbrowser/PRINTS/index.php
PROSITE | Patterns & Profiles | SIB | Sequence alignment | Functional annotation | http://expasy.org/prosite/
HAMAP | Profiles | SIB | Sequence alignment | Microbial protein family classification | http://expasy.org/sprot/hamap/
ProDom | Sequence clustering | PRABI: Rhône-Alpes Bioinformatics Center | Sequence alignment | Conserved domain prediction | http://prodom.prabi.fr/prodom/current/html/home.php

Page 57

Thank you!

www.ebi.ac.uk

Twitter: @emblebi

Facebook: EMBLEBI

YouTube: EMBLMedia