Bioinformatics, 29th Feb, 2012, Ayesha Masrur Khan

Feb 22, 2016


Transcript
Page 1: Bioinformatics

Bioinformatics

29th Feb, 2012

Ayesha Masrur Khan

Page 2: Bioinformatics

Lec-5 2

Protein Family and Domains

Once a protein sequence is obtained, there are many questions that can be asked, such as:

- What is the protein’s overall identity?
- What putative functions does it have?
- What biological motifs are present?

Different computational tools are needed to determine possible functional domains based on primary sequence data.

Page 3: Bioinformatics

Protein Family and Domains (contd.)

Therefore, family and domain databases are used to address questions such as ‘What domains are contained within this sequence?’ or ‘What family does this protein belong to?’

But first: what are families and domains?

Page 4: Bioinformatics

Protein Family and Domains (contd.)

Family---> A family of proteins was originally defined by Dayhoff et al. (1978) as a group of sequences that have similar functions and show more than 50% identity when aligned. Families are often also characterized by the presence of one or more domains with high sequence similarity.

Domains---> Traditionally known as structurally independent folding units, domains are conserved functional units that may contain one or more motifs.
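The 50% identity criterion above can be made concrete with a small calculation. The following is a minimal sketch, assuming the two sequences have already been aligned (gaps written as `-`) and that gap columns are simply ignored; the function names and the gap convention are illustrative choices, not part of the original definition.

```python
def percent_identity(aligned_a: str, aligned_b: str) -> float:
    """Percent identity over aligned columns in which neither sequence has a gap."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    compared = 0
    identical = 0
    for a, b in zip(aligned_a.upper(), aligned_b.upper()):
        if a == "-" or b == "-":
            continue  # assumption: gap columns are excluded from the count
        compared += 1
        if a == b:
            identical += 1
    return 100.0 * identical / compared if compared else 0.0


def same_family_by_identity(aligned_a: str, aligned_b: str,
                            threshold: float = 50.0) -> bool:
    """Apply the 'more than 50% identity' rule to one pair of aligned sequences."""
    return percent_identity(aligned_a, aligned_b) > threshold


if __name__ == "__main__":
    # Made-up sequences, for illustration only.
    print(round(percent_identity("MKT-AYI", "MKTLAYV"), 1))
    print(same_family_by_identity("MKT-AYI", "MKTLAYV"))
```

Other conventions (for example, dividing by the length of the shorter sequence) give different percentages, so a threshold is only meaningful once the convention is fixed.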

Page 5: Bioinformatics

Protein Family and Domains (contd.)

Motifs---> These include both short stretches of fixed residue length that act as sites for post-translational modifications and longer sequences that form secondary structures for protein-DNA, protein-ion or protein-lipid interactions.
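A short, fixed-length modification site of this kind can be located with a simple pattern scan. The sketch below assumes the commonly cited N-glycosylation consensus, written N-{P}-[ST]-{P} in PROSITE notation (asparagine, any residue except proline, serine or threonine, any residue except proline). The pattern is not taken from the slides, and the function name and test sequence are hypothetical.

```python
import re

# N-{P}-[ST]-{P} expressed as a regular expression. The lookahead lets
# overlapping candidate sites be reported as separate hits.
N_GLYCOSYLATION = re.compile(r"(?=(N[^P][ST][^P]))")


def find_n_glycosylation_sites(sequence: str) -> list[tuple[int, str]]:
    """Return (1-based start position, matched residues) for each candidate site."""
    return [(m.start() + 1, m.group(1))
            for m in N_GLYCOSYLATION.finditer(sequence.upper())]


if __name__ == "__main__":
    # Made-up sequence: hits at N3 and N12; N8 is rejected (followed by P).
    print(find_n_glycosylation_sites("MKNGSAPNPTQNCTL"))
```

A pattern match only marks a candidate site; whether the residue is actually modified depends on the protein's context.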

Page 6: Bioinformatics

Domain Example: Pyruvate kinase

Quaternary structure: 4 subunits

3 domains

Page 7: Bioinformatics

Zinc finger motif: A sequence motif

Three zinc fingers bound spirally in the major groove of a DNA molecule.

The coordination of a zinc atom by characteristically spaced cysteine and histidine residues in a single zinc finger motif.

Sequence motif: A particular amino-acid sequence that is characteristic of a specific biochemical function
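The "characteristically spaced" cysteines and histidines can be written as a sequence pattern and searched for automatically. The sketch below converts a PROSITE-style pattern into a regular expression and applies it to a sequence. The C2H2 pattern string is quoted from memory of the PROSITE zinc finger signature and should be checked against the current PROSITE entry before use. The converter handles only a subset of the PROSITE syntax, and the example sequence is an illustrative C2H2-like stretch.

```python
import re

# Assumed pattern (recalled from PROSITE, not from the slides).
C2H2_PATTERN = "C-x(2,4)-C-x(3)-[LIVMFYWC]-x(8)-H-x(3,5)-H"

_ELEMENT = re.compile(
    r"(x|[A-Z]|\[[A-Z]+\]|\{[A-Z]+\})(?:\((\d+)(?:,(\d+))?\))?"
)


def prosite_to_regex(pattern: str) -> str:
    """Translate a subset of PROSITE pattern syntax into a Python regex string."""
    parts = []
    for element in pattern.rstrip(".").split("-"):
        m = _ELEMENT.fullmatch(element)
        if m is None:
            raise ValueError(f"unsupported pattern element: {element!r}")
        residue, low, high = m.groups()
        if residue == "x":
            token = "."                          # any residue
        elif residue.startswith("{"):
            token = "[^" + residue[1:-1] + "]"   # any residue except these
        else:
            token = residue                      # literal or [allowed set]
        if low is not None:
            token += "{" + low + ("," + high if high else "") + "}"
        parts.append(token)
    return "".join(parts)


def find_motif(pattern: str, sequence: str) -> list[tuple[int, str]]:
    """Return (1-based start, matched text) for non-overlapping matches."""
    regex = re.compile(prosite_to_regex(pattern))
    return [(m.start() + 1, m.group(0)) for m in regex.finditer(sequence.upper())]


if __name__ == "__main__":
    print(prosite_to_regex(C2H2_PATTERN))
    print(find_motif(C2H2_PATTERN, "RPYACPVESCDRRFSRSDELTRHIRIHTGQKP"))
```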

Page 8: Bioinformatics

Other examples: structural motifs

Another type is the functional motif, which is a sequence or structural motif that is always associated with a particular biochemical function.

Page 9: Bioinformatics

Protein families

Proteins within a family are related to one another by sequence similarity, domain composition, or structure.

These include proteins found across species (orthologs) or within the same species (paralogs).

Family descriptors are derived from MSAs (multiple sequence alignments) that enable us to define traits that encompass all member sequences. Family descriptors have been based on sequence identity (>50% identical), common domains (e.g. catalytic binding domains, calcium-binding motifs etc.), structure, or a combination of these characteristics.
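One very simple kind of descriptor that can be derived from an MSA is a consensus string recording the residue shared by most members at each column. The sketch below assumes the alignment is given as equal-length strings with `-` for gaps; the 50% cutoff, the `x` placeholder and the function names are illustrative assumptions. Databases such as Pfam and PROSITE use richer descriptors (profiles and patterns).

```python
from collections import Counter


def column_conservation(msa: list[str]) -> list[tuple[str, float]]:
    """For each column: (most common residue, fraction of sequences carrying it)."""
    if len({len(seq) for seq in msa}) != 1:
        raise ValueError("all aligned sequences must have the same length")
    result = []
    for column in zip(*msa):
        residues = [r for r in column if r != "-"]
        if not residues:
            result.append(("-", 0.0))  # column made of gaps only
            continue
        residue, count = Counter(residues).most_common(1)[0]
        result.append((residue, count / len(column)))
    return result


def consensus(msa: list[str], min_fraction: float = 0.5) -> str:
    """Consensus string; columns without a clear majority are written as 'x'."""
    return "".join(residue if fraction > min_fraction else "x"
                   for residue, fraction in column_conservation(msa))


if __name__ == "__main__":
    # Made-up alignment, for illustration only.
    alignment = ["MKTAYC", "MKSAYC", "MRAA-C", "MKGGYC"]
    print(consensus(alignment))
```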

Page 10: Bioinformatics

Protein Domains

Domains represent discrete stretches within the protein, unlike protein families, which are commonly defined over the length of the sequence.

These units are conserved at the level of sequence and structure.

They can be described by:

- combinations of short regions of highly conserved amino acids within a domain
- all amino acids
- structural features

Domain descriptors are developed in the same way as family descriptors.

Page 11: Bioinformatics

Family-Domain Databases

Because of the reuse of motifs and domains, similarities can be found within sequences that are otherwise unrelated evolutionarily.

Therefore, methods are needed to distinguish between similarities due to random variation and those of common origin or function.
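One common way to judge whether an observed similarity exceeds what random variation would give is a shuffle test: score the real pair, then score one sequence against many shuffled copies of the other and compare. The sketch below uses a deliberately crude score (identical residues at matching positions, no gaps) as a stand-in for a real alignment score. The function names, the number of shuffles and the fixed random seed are all illustrative assumptions.

```python
import random
import statistics


def ungapped_matches(a, b) -> int:
    """Number of positions with identical residues (no gaps, no offset)."""
    return sum(x == y for x, y in zip(a, b))


def shuffle_z_score(a: str, b: str, n_shuffles: int = 200, seed: int = 0) -> float:
    """Z-score of the observed score against scores for shuffled versions of b."""
    rng = random.Random(seed)      # fixed seed so the result is reproducible
    observed = ungapped_matches(a, b)
    residues = list(b)
    null_scores = []
    for _ in range(n_shuffles):
        rng.shuffle(residues)      # keeps composition, destroys residue order
        null_scores.append(ungapped_matches(a, residues))
    mean = statistics.mean(null_scores)
    spread = statistics.pstdev(null_scores)
    if spread == 0:
        return 0.0                 # shuffling changed nothing; no evidence either way
    return (observed - mean) / spread


if __name__ == "__main__":
    seq = "MKTAYIAKQRQISFVKSHFSRQLEERLGLIEVQ"  # made-up sequence
    print(shuffle_z_score(seq, seq))
```

Shuffling preserves amino acid composition, so a high z-score indicates that the order of residues, and not just their frequencies, is shared.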

Family-domain databases provide the following benefits:

1. Increased sensitivity, i.e. true matches are detected through MSA
2. Increased specificity, i.e. only related proteins are detected
3. Classification of protein sequences into appropriate families
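Sensitivity and specificity in the list above have standard definitions that can be computed once the true family membership is known. The sketch below assumes a search returns a set of hit identifiers and that a curated set of true members is available; the identifiers and function name are hypothetical.

```python
def evaluate_hits(hits: set[str], true_members: set[str],
                  all_sequences: set[str]) -> dict[str, float]:
    """Sensitivity and specificity of a database search against known membership."""
    true_pos = len(hits & true_members)                    # members that were found
    false_pos = len(hits - true_members)                   # non-members reported as hits
    false_neg = len(true_members - hits)                   # members that were missed
    true_neg = len(all_sequences - hits - true_members)    # non-members correctly ignored

    positives = true_pos + false_neg
    negatives = true_neg + false_pos
    return {
        # fraction of true family members that the search detects
        "sensitivity": true_pos / positives if positives else 0.0,
        # fraction of unrelated sequences that the search correctly rejects
        "specificity": true_neg / negatives if negatives else 0.0,
    }


if __name__ == "__main__":
    database = {f"s{i}" for i in range(1, 11)}   # ten made-up sequence identifiers
    family = {"s1", "s2", "s3", "s4"}
    reported = {"s1", "s2", "s3", "s5"}
    print(evaluate_hits(reported, family, database))
```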

Page 12: Bioinformatics

Family-Domain Databases

Some database references

Name | Web address | Description
PROSITE | http://www.expasy.ch/prosite | Groups of proteins of similar biochemical function, on the basis of amino acid patterns
Pfam | http://www.sanger.ac.uk/Pfam | Profiles derived from alignments of protein families, each one composed of similar sequences
SMART | http://smart.embl-heidelberg.de/ | Genetically mobile domains
InterPro | http://www.ebi.ac.uk/interpro | Integrated resource of protein domains and functional sites: a combination of Pfam, PRINTS, PROSITE, and current SwissProt/TrEMBL sequences

Page 13: Bioinformatics

Searching sequence databases

Search methods engage in a series of sequence alignments to determine degrees of similarity between sequences and then return a list of matched sequences to the user.

Alignment Algorithms

Manually, we examine two or more sequences for similar residue patterns, match up identical residues, decide qualitatively whether they are aligned well, and determine statistically how identical or similar the sequences are.

The automation of this process requires a computer-based method to line sequences up against one another and a scoring method for evaluating the success of the alignment in terms of similarity or identity.
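The two ingredients named above, a method to line sequences up and a scoring scheme, can be sketched with a basic global alignment by dynamic programming (the Needleman-Wunsch approach). This is one possible algorithm, not necessarily the one the lecture goes on to cover. The match, mismatch and gap values are arbitrary illustrative choices; practical tools use substitution matrices and more elaborate gap penalties.

```python
def needleman_wunsch(a: str, b: str, match: int = 1, mismatch: int = -1,
                     gap: int = -2) -> tuple[int, str, str]:
    """Global alignment of a and b; returns (score, aligned_a, aligned_b)."""
    n, m = len(a), len(b)
    # score[i][j] = best score for aligning a[:i] with b[:j]
    score = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        score[i][0] = i * gap
    for j in range(1, m + 1):
        score[0][j] = j * gap
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            pair = match if a[i - 1] == b[j - 1] else mismatch
            score[i][j] = max(score[i - 1][j - 1] + pair,   # align the two residues
                              score[i - 1][j] + gap,        # gap in b
                              score[i][j - 1] + gap)        # gap in a

    # Trace back from the bottom-right corner to recover one optimal alignment.
    top, bottom = [], []
    i, j = n, m
    while i > 0 or j > 0:
        pair = match if i > 0 and j > 0 and a[i - 1] == b[j - 1] else mismatch
        if i > 0 and j > 0 and score[i][j] == score[i - 1][j - 1] + pair:
            top.append(a[i - 1]); bottom.append(b[j - 1])
            i -= 1; j -= 1
        elif i > 0 and score[i][j] == score[i - 1][j] + gap:
            top.append(a[i - 1]); bottom.append("-")
            i -= 1
        else:
            top.append("-"); bottom.append(b[j - 1])
            j -= 1
    return score[n][m], "".join(reversed(top)), "".join(reversed(bottom))


if __name__ == "__main__":
    total, row_a, row_b = needleman_wunsch("GATTACA", "GATACA")
    print(total)
    print(row_a)
    print(row_b)
```

When several alignments share the best score, the traceback returns just one of them, so the score is the more stable quantity to compare across sequence pairs.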