Swiss Institute of Bioinformatics
Institut Suisse de Bioinformatique
LF-2004.08
Protein functions prediction - EMBnet node Switzerland

Introduction
Signal peptides
Transmembrane regions and topology
PTM (post-translational modifications)
Low complexity and biased regions
Repeats
Coils
Secondary structure
Antigenic peptides
Domain/Motifs
Tools
The EMBOSS package
Sliding window
THISISATESTSEQVENCETHATDISPLAYSTHESLIDINGWINDQW
Score 1, Score 2, …, Score n
Width or Size=11, Step=5
Results are usually displayed as a graph.
[Figure: example graph of the window scores]
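The sliding-window idea can be sketched in a few lines of Python. This is only an illustration: the slide does not say which score is computed per window, so the Kyte-Doolittle hydropathy scale is assumed here, and the function name `sliding_window_scores` is made up for the example. The width (11) and step (5) are the values from the slide, and the test sequence is the one shown above with the stray space removed.

```python
from typing import List, Tuple

# Kyte-Doolittle hydropathy values. This scale is an assumption for the
# example; the slide does not name the scoring scheme.
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def sliding_window_scores(seq: str, width: int = 11, step: int = 5) -> List[Tuple[int, float]]:
    """Return (0-based window start, mean score) for every full window."""
    scores = []
    # Only full windows are scored; a trailing partial window is skipped.
    for start in range(0, len(seq) - width + 1, step):
        window = seq[start:start + width]
        mean = sum(KYTE_DOOLITTLE[aa] for aa in window) / width
        scores.append((start, mean))
    return scores


if __name__ == "__main__":
    seq = "THISISATESTSEQVENCETHATDISPLAYSTHESLIDINGWINDQW"
    for start, score in sliding_window_scores(seq):
        # Print 1-based start position, the window itself, and its score.
        print(f"{start + 1:>3} {seq[start:start + 11]} {score:6.2f}")
```

Plotting the score against the window position gives the kind of graph the slide refers to. A smaller step gives a smoother curve at the cost of more windows.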
Patterns / regular expression
Pattern: <A-x-[ST](2)-x(0,1)-{V}
Regexp: ^A.[ST]{2}.?[^V]
Text: The sequence must start with an alanine, followed by any amino acid, followed by two residues that are each a serine or a threonine, followed by any amino acid or nothing, followed by any amino acid except a valine.
Only the syntax differs…
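To make the correspondence between the two notations concrete, here is a minimal Python sketch that rewrites a PROSITE-style pattern as a regular expression. The function name `prosite_to_regex` is hypothetical, and the converter handles only the elements used in the example above (`<`, `>`, `x`, `[..]`, `{..}`, and repeat counts). It emits `.{0,1}` where the slide writes `.?`; the two are equivalent.

```python
import re


def prosite_to_regex(pattern: str) -> str:
    """Translate a simple PROSITE-style pattern into a Python regex string."""
    pattern = pattern.rstrip(".")
    prefix = suffix = ""
    if pattern.startswith("<"):      # '<' anchors the pattern at the N-terminus
        prefix, pattern = "^", pattern[1:]
    if pattern.endswith(">"):        # '>' anchors the pattern at the C-terminus
        suffix, pattern = "$", pattern[:-1]

    parts = []
    for element in pattern.split("-"):
        # Split each element into a residue part and an optional (n) or (n,m).
        m = re.fullmatch(r"(.+?)(?:\((\d+)(?:,(\d+))?\))?", element)
        if m is None:
            raise ValueError(f"cannot parse element: {element!r}")
        residue, low, high = m.groups()
        if residue == "x":                       # any amino acid
            residue = "."
        elif residue.startswith("{"):            # {V} means anything but V
            residue = "[^" + residue[1:-1] + "]"
        if low is not None:                      # repeat count
            residue += "{" + low + ("," + high if high else "") + "}"
        parts.append(residue)
    return prefix + "".join(parts) + suffix


if __name__ == "__main__":
    regex = prosite_to_regex("<A-x-[ST](2)-x(0,1)-{V}")
    print(regex)
    print(bool(re.search(regex, "AGSTKL")))   # matches: A, G, S, T, K, L
    print(bool(re.search(regex, "AGSTV")))    # no match: last residue is V
```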
Weight matrices (PSSM)
HMM / profiles
Neural Networks
[Figures: general principle; example]
Some details

USA (Uniform Sequence Address) format:
'asis'::Sequence[start:end:reverse]
Format::'@'ListFile[start:end:reverse]
Format::'list':ListFile[start:end:reverse]
Format::Database:Entry[start:end:reverse]
Format::Database-SearchField:Word[start:end:reverse]
Format::File:Entry[start:end:reverse]
Format::File:SearchField:Word[start:end:reverse]
Format::Program Program-parameters '|'[start:end:reverse]
Example: fasta::Swissprot:UBP5_HUMAN[200:300]

Databases: any database can be added; use showdb to display the available databases.
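As a rough illustration of how the pieces of a USA fit together, the following Python sketch splits the simplest forms listed above (an optional `Format::` prefix, a database or file name, an optional entry, and an optional range) into their parts. It is not EMBOSS code: the names `USA` and `parse_usa` are invented for the example, the list-file, search-field, 'asis' and program forms are not handled, and the reverse flag is assumed to be written as `r`.

```python
import re
from typing import NamedTuple, Optional


class USA(NamedTuple):
    fmt: Optional[str]      # sequence format, e.g. "fasta"
    source: str             # database or file name
    entry: Optional[str]    # entry name, if given
    start: Optional[int]    # first position of the sub-sequence
    end: Optional[int]      # last position of the sub-sequence
    reverse: bool           # True if the range carries the reverse flag


# Covers only [Format::]Source[:Entry][[start:end[:r]]]; an assumption-laden
# simplification of the full syntax.
_USA_RE = re.compile(
    r"^(?:(?P<fmt>\w+)::)?"
    r"(?P<source>[^:\[\]]+)"
    r"(?::(?P<entry>[^:\[\]]+))?"
    r"(?:\[(?P<start>-?\d+):(?P<end>-?\d+)(?::(?P<rev>r))?\])?$"
)


def parse_usa(usa: str) -> USA:
    """Split a simple USA string into its components."""
    m = _USA_RE.match(usa)
    if m is None:
        raise ValueError(f"unsupported USA: {usa!r}")
    start = int(m["start"]) if m["start"] is not None else None
    end = int(m["end"]) if m["end"] is not None else None
    return USA(m["fmt"], m["source"], m["entry"], start, end, m["rev"] is not None)


if __name__ == "__main__":
    print(parse_usa("fasta::Swissprot:UBP5_HUMAN[200:300]"))
```

For the slide's example this separates the format (`fasta`), the database (`Swissprot`), the entry (`UBP5_HUMAN`) and the requested range (200 to 300).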
Databases

showdb
Displays information on the currently available databases

# Name          Type  ID  Qry  All  Comment
# ====          ====  ==  ===  ===  =======
ipr_fetch       P     OK  OK   OK   InterPro current by fetch
ipi_fetch       P     OK  OK   OK   IPI current by fetch
refseq_fetch    P     OK  OK   OK   refseq current by fetch
repbase_fetch   P     OK  OK   OK   repbase current by fetch
swiss_fetch     P     OK  OK   OK   SwissProt current by fetch
swissprot       P     OK  OK   OK   SWISSPROT sequences
trembl          P     OK  OK   OK   TREMBL sequences
trembl_fetch    P     OK  OK   OK   trembl current by fetch
tremblnew       P     OK  OK   OK   TREMBL New sequences
ug_fetch        P     OK  OK   OK   Unigene by fetch
embl            N     OK  OK   OK   EMBL release
emhum           N     OK  OK   OK   EMBL release, Human section by emboss index
emrod           N     OK  OK   OK   EMBL release, Rodent section by emboss index
emvrt           N     OK  OK   OK   EMBL release, Vertebrate (nonhuman, nonrodent)

seqret (seqretall, seqretset, seqretsplit)
entret (for complete untouched entry, e.g., for unigene, interpro, swissprot…)
Possible to define your own « .embossrc » file
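The showdb listing is a plain whitespace-separated table, so it is easy to post-process. The sketch below assumes exactly the six-column layout shown above (Name, Type, ID, Qry, All, Comment); `parse_showdb` is a hypothetical helper, not part of EMBOSS, and other showdb versions or options may print different columns.

```python
from typing import Dict, List


def parse_showdb(text: str) -> List[Dict[str, str]]:
    """Parse showdb-style output into a list of records."""
    records = []
    for line in text.splitlines():
        line = line.strip()
        # Skip blank lines and the two '#' header lines.
        if not line or line.startswith("#"):
            continue
        parts = line.split(None, 5)   # the comment may contain spaces
        if len(parts) < 5:
            continue                  # not a table row
        records.append({
            "name": parts[0],
            "type": parts[1],         # P = protein, N = nucleotide
            "id": parts[2],
            "qry": parts[3],
            "all": parts[4],
            "comment": parts[5] if len(parts) > 5 else "",
        })
    return records


if __name__ == "__main__":
    sample = """\
# Name Type ID Qry All Comment
# ==== ==== == === === =======
swissprot P OK OK OK SWISSPROT sequences
embl N OK OK OK EMBL release
"""
    proteins = [r["name"] for r in parse_showdb(sample) if r["type"] == "P"]
    print(proteins)
```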
Some tools for DNA
redata    Search REBASE for enzyme name, references, suppliers etc
remap     Display a sequence with restriction cut sites, translation etc
restover  Finds restriction enzymes that produce a specific overhang
restrict  Finds restriction enzyme cleavage sites
showseq   Display a sequence with features, translation etc
silent    Silent mutation restriction enzyme scan
cirdna    Draws circular maps of DNA constructs
lindna    Draws linear maps of DNA constructs
revseq    Reverse and complement a sequence
…
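To show what "reverse and complement" means in practice, here is a minimal Python sketch of the operation. It only illustrates the idea and is not how revseq is implemented; the function name `reverse_complement` is made up, and only the four unambiguous bases are handled (any other character is left unchanged).

```python
# Translation table for the four unambiguous bases, upper and lower case.
_COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def reverse_complement(dna: str) -> str:
    """Complement each base, then reverse the string."""
    return dna.translate(_COMPLEMENT)[::-1]


if __name__ == "__main__":
    print(reverse_complement("ATGC"))
    # A palindromic restriction site reads the same on both strands.
    print(reverse_complement("GAATTC") == "GAATTC")
```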
Example: remap
ECLAC E.coli lactose operon with lacI, lacZ, lacY and lacA genes.
Launch Jemboss
First time only…
Each time…
Jemboss windows
Jemboss windows other systems
Summary
Anonymous web access through Pise
Registered access through Jemboss
Registered access through command-line (requires UNIX skills)
Please report problems!
Exercises
DEA Exercises: web-based sequence analysis
The goal of this exercise is to use web-based tools for protein sequence analysis.
a) Take this TrEMBL sequence (Q9X252) and try a BLAST against swissprot with the complete protein or with the first 70 residues. Explain the difference. Use TMPred, SignalP, and COILS to help you.
b) Pass this sequence through PFSCAN and search all databases. Compare with this command on ludwig-sun1/2: hits -b "prf pat pfam" tr:Q9X252
c) Use the different profile, motif, and pattern databases to get more information about the domain(s) you found.
d) How do you evaluate the PRINTS tropomyosin annotation in this TrEMBL entry (Q9WZH0)?
List of useful links:
basic BLAST or advanced BLAST or PSI-BLAST
TMPred: prediction tool for transmembrane regions (or TMHMM)
COILS: prediction tool for coiled-coil regions
SignalP: prediction tool for signal-peptide cleavage sites
Profile, domain, motifs databases and search sites:
PFSCAN