
Introduction to Bioinformatics

English Courses for Graduate Students

http://1.51.212.243/bioinfo.html

Dr. rer. nat. Jing Gong

Cancer Research Center

School of Medicine, Shandong University

2011.10.12


Chapter 3

Alignment


Similarity Searches on Sequence Databases

In the game Mahjong Titans, you look for the tile that carries the same symbol as a given tile. All you can do is compare that symbol with every other one by eye.


Similarity Searches on Sequence Databases

For a protein or DNA sequence, a similarity search means finding similar sequences in a collection of sequences. The biological databases hold far too many sequences to compare every pair by eye.


…… > 100,000

BLAST


The Importance of Similarity

Similar sequences often derive from a common ancestral sequence, and they probably share a similar structure and biological function. What you know about one DNA or protein sequence can therefore often be transferred, with caution, to the sequences that are similar to it.


Similar sequences

Similar structures Similar functions


Similar structure? Similar function? Brothers?


Identity and Similarity

Residue: a letter in a sequence; an amino acid in a protein or a base in a nucleotide sequence.

Identity: If two sequences (protein or DNA) have the same length, the identity between them is the percentage of positions at which the residues are identical.

Similarity: If two sequences (protein or DNA) have the same length, the similarity between them is the percentage of positions at which the residues are similar. Which residues count as similar to each other is defined by a substitution matrix, such as BLOSUM.



seq 1 : CLHK
seq 2 : CIHL

Identity = 2/4 = 50%

Similarity = 3/4 = 75%
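The two percentages above can be reproduced with a few lines of Python. This is a minimal sketch, not a standard tool: the dictionary holds only the four BLOSUM62 entries this example needs, and the rule "a pair is similar when its score is positive" is an assumption made for illustration.

```python
# Excerpt of BLOSUM62: only the residue pairs that occur in this example.
BLOSUM62_EXCERPT = {
    ("C", "C"): 9,
    ("L", "I"): 2,
    ("H", "H"): 8,
    ("K", "L"): -2,
}


def score(a, b):
    """Look up a residue pair in either order."""
    if (a, b) in BLOSUM62_EXCERPT:
        return BLOSUM62_EXCERPT[(a, b)]
    return BLOSUM62_EXCERPT[(b, a)]


def identity_and_similarity(seq1, seq2):
    """Return (identity %, similarity %) for two sequences of equal length."""
    if len(seq1) != len(seq2):
        raise ValueError("sequences must have the same length")
    pairs = list(zip(seq1, seq2))
    identical = sum(a == b for a, b in pairs)
    # Assumption: a pair counts as "similar" when its matrix score is positive.
    similar = sum(score(a, b) > 0 for a, b in pairs)
    return 100 * identical / len(pairs), 100 * similar / len(pairs)


identity, similarity = identity_and_similarity("CLHK", "CIHL")
print(f"Identity = {identity:.0f}%, Similarity = {similarity:.0f}%")
```

Identical residues also score positively, so they are counted among the similar ones; that is why the similarity is never lower than the identity.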


What happens when two sequences have different lengths?


Identity? Similarity?

seq 1 : CLHKA
seq 2 : CIHL


Identity and Similarity

Homologous: As a rule of thumb, two protein sequences that are at least 25% identical, or two DNA sequences that are at least 70% identical, can be regarded as homologous. However:

Observed similarity never proves anything by itself. Some protein sequences are less than 15% identical but have the same 3D structure, while others are 25% identical but have different structures.

Homology or non-homology can never be taken for granted. The 25% cutoff is mostly a common-sense indicator. In most cases, deciding whether two sequences are truly homologous requires other evidence as well.

Homology is a binary relationship: yes or no. Similarity is a quantifiable property: 0%-100%.


The Most Popular Search Tool: BLAST

BLAST (Basic Local Alignment Search Tool): a sequence comparison algorithm, optimized for speed, that searches sequence databases for optimal local alignments to a query.

Different kinds of BLAST:

BLASTn: Search a nucleotide database using a nucleotide query.

BLASTp: Search a protein database using a protein query.

BLASTx: Search a protein database using a translated nucleotide query.

tBLASTn: Search a translated nucleotide database using a protein query.

tBLASTx: Search a translated nucleotide database using a translated nucleotide query.

Translated nucleotide: A nucleotide sequence translated into six protein sequences, one for each of the six reading frames (three on each strand).
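What "translated nucleotide" means can be sketched in Python. The function names below are hypothetical, the table is the standard genetic code, and the sketch assumes an upper-case sequence made only of A, C, G and T; it is an illustration, not the translation routine that BLAST itself uses.

```python
from itertools import product

# Standard genetic code, with codons enumerated in the order T, C, A, G.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def reverse_complement(dna):
    """Return the opposite strand, read 5' to 3'."""
    return dna.translate(str.maketrans("ACGT", "TGCA"))[::-1]


def translate(dna):
    """Translate complete codons from the first base; '*' marks a stop codon."""
    codons = (dna[i:i + 3] for i in range(0, len(dna) - 2, 3))
    return "".join(CODON_TABLE[codon] for codon in codons)


def six_frame_translation(dna):
    """Translate three frames on each strand, six frames in total."""
    frames = {}
    for strand, seq in (("+", dna), ("-", reverse_complement(dna))):
        for offset in range(3):
            frames[f"{strand}{offset + 1}"] = translate(seq[offset:])
    return frames


for name, protein in six_frame_translation("ATGGCCTAA").items():
    print(name, protein)
```

Each of the six strings is one candidate protein; a translated search compares all of them against the other sequences.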


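Reading frame: a way of breaking a DNA sequence into three-letter codons, which can be translated into amino acids. Each strand can be read in 3 frames, so a double-stranded DNA sequence has 3 + 3 = 6 reading frames.

Start codon: ATG, which codes for Met (M).

Stop codons: TAA, TAG, TGA.

ORF (Open Reading Frame): a stretch of DNA that begins with a start codon and contains no stop codon in that reading frame until the stop codon that ends it.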
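The ORF definition can be turned into a small scanner. This is a sketch under stated assumptions: the function name and the `min_codons` parameter are hypothetical, only the given strand is scanned, and an ORF is reported only when a stop codon closes it.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}


def find_orfs(dna, min_codons=1):
    """Scan the three frames of one strand.

    Returns (frame, start, end) tuples; start is the 0-based index of the
    ATG, end is exclusive and includes the stop codon.
    """
    orfs = []
    for frame in range(3):
        start = None
        for i in range(frame, len(dna) - 2, 3):
            codon = dna[i:i + 3]
            if start is None and codon == "ATG":
                start = i  # first start codon opens the ORF
            elif start is not None and codon in STOP_CODONS:
                if (i - start) // 3 >= min_codons:
                    orfs.append((frame + 1, start, i + 3))
                start = None  # look for the next ORF in this frame
    return orfs


dna = "CCATGAAATGGTAGCC"
for frame, start, end in find_orfs(dna):
    print(frame, dna[start:end])
```

To cover all six reading frames, the same scan would be repeated on the reverse complement of the sequence.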


The Most Popular Search Tool: BLAST

The NCBI BLAST server: http://www.ncbi.nlm.nih.gov/

The NCBI BLAST server, BLASTp: http://blast.ncbi.nlm.nih.gov

[Screenshots: the BLASTp query page; file blast.fasta; course page http://1.51.212.243/bioinfo.html]

query only a part of your sequence

give a name to your job

blast.fasta

BLAST another sequence at the same time


select in which database you want to search


type which species you want to search, e.g. human

select algorithm


Part 1 : a brief summary


This figure illustrates the sequence length and classification of the input protein.

Part 2 : graphic summary

an overview of similar sequences

……


Part 3 : descriptions

……go to the corresponding database entry

go to the alignment between your query sequence and the matching sequence


Part 4 : Alignment


Upgraded BLAST: PSI-BLAST

Sometimes BLAST is not enough. Suppose you want to catch all the members of a very large protein family, starting from the one sequence that you have. A BLAST run catches only the most closely related sequences; the distant members are not found. In other words, you find your direct friends, but you miss the friends of your friends.

PSI (Position-Specific Iterated)-BLAST first looks for sequences that are closely related to yours and then gradually extends the circle of friends to include sequences that are distantly related.

- How does PSI-BLAST extend the circle of friends?

- With a Position-Specific Weight Matrix and iterations.


Position-Specific Weight Matrix

Seq1: A B C D
Seq2: B B C D
Seq3: A C C D
Seq4: A B D D

       1      2      3      4
A     75%     0      0      0
B     25%    75%     0      0
C      0     25%    75%     0
D      0      0     25%   100%

A Position-Specific Weight Matrix describes the distribution of letters at each position (column) for a family of sequences. The distributions can be expressed as probabilities or as other statistical values.
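The matrix for the four sequences can be computed column by column. The sketch below stores plain frequencies and assumes that all sequences have the same length; the function name is hypothetical, and real tools usually store log-odds scores rather than raw frequencies.

```python
from collections import Counter


def position_specific_matrix(sequences):
    """Return one {letter: frequency} dictionary per position (column)."""
    length = len(sequences[0])
    if any(len(seq) != length for seq in sequences):
        raise ValueError("all sequences must have the same length")
    matrix = []
    for column in zip(*sequences):
        counts = Counter(column)
        matrix.append({letter: n / len(sequences) for letter, n in counts.items()})
    return matrix


family = ["ABCD", "BBCD", "ACCD", "ABDD"]
for position, frequencies in enumerate(position_specific_matrix(family), start=1):
    print(position, frequencies)
```

Letters that never occur at a position are simply absent from that position's dictionary, which corresponds to the 0 entries in the table.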


Upgraded BLAST: PSI-BLAST

The first round of search (first iteration) of PSI-BLAST is just like BLAST. For the query sequence ABCD, it finds the closely related sequences BBCD, ACCD and ABDD, which each differ from the query by one letter, but it misses BCCD, which differs by two.

Then a Position-Specific Weight Matrix is built from ABCD, BBCD, ACCD and ABDD. This matrix is used in the second round of search (second iteration). Since BCCD matches the matrix, it is now found. A second matrix is then built from ABCD, BBCD, ACCD, ABDD and BCCD, and further new sequences are found. …… Iterations ……

PSI-BLAST can detect distant evolutionary relationships. It is especially useful when the proteins returned by the first round of search are all hypothetical, unknown or predicted proteins.
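The iteration on the toy sequences can be imitated in Python. This is a deliberately simplified sketch and not the real PSI-BLAST algorithm: round 1 is reduced to "at most one mismatch", the matrix is reduced to the set of letters seen at each position, and the database (including the unrelated entry XYZW) is made up for illustration.

```python
def mismatches(a, b):
    """Number of positions at which two equal-length sequences differ."""
    return sum(x != y for x, y in zip(a, b))


def build_profile(sequences):
    """Letters observed at each position (the non-zero matrix entries)."""
    return [set(column) for column in zip(*sequences)]


def matches_profile(seq, profile):
    """True if every letter of seq was observed at its position."""
    return all(letter in allowed for letter, allowed in zip(seq, profile))


def toy_psi_blast(query, database, rounds=2):
    """Return the list of found sequences after each round."""
    # Round 1: stand-in for a plain BLAST search.
    found = [query] + [s for s in database if s != query and mismatches(query, s) <= 1]
    history = [list(found)]
    for _ in range(rounds - 1):
        profile = build_profile(found)
        new_hits = [s for s in database if s not in found and matches_profile(s, profile)]
        if not new_hits:
            break  # converged: the circle of friends stopped growing
        found.extend(new_hits)
        history.append(list(found))
    return history


database = ["BBCD", "ACCD", "ABDD", "BCCD", "XYZW"]
for round_number, hits in enumerate(toy_psi_blast("ABCD", database), start=1):
    print(round_number, hits)
```

BCCD appears only in the second round, once the profile built from the first-round hits allows B at position 1 and C at position 2.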

[Figure: the circle of friends around the query ABCD, growing with each iteration: BBCD, ACCD, ABDD, then BCCD, ……]


Upgraded BLAST: PSI-BLAST

The NCBI BLAST server, PSI-BLAST: http://blast.ncbi.nlm.nih.gov


Upgraded BLAST: PHI-BLAST

PHI (Pattern-Hit Initiated)-BLAST: in every round of BLAST (iteration), you must supply a sequence pattern that filters the results. Only the BLAST hits that match the pattern are kept as results.

Sequence pattern:

[LIVMF]-G-E-x-[GAS]-[LIVM]-x(3,7)

Yes: VGEAAMPRI
No: VGEAAYPRI

PHI-BLAST can find very exact “friends”.
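The Yes/No example can be checked by converting the pattern into a regular expression. This is a sketch that assumes PROSITE-style syntax and covers only the elements used here (single letters, `x`, `[...]`, `{...}` and repeat counts); the function name is hypothetical, and this is not how PHI-BLAST itself is implemented.

```python
import re

# One pattern element: a core, optionally followed by (n) or (n,m).
ELEMENT = re.compile(r"(x|[A-Z]|\[[A-Z]+\]|\{[A-Z]+\})(?:\((\d+)(?:,(\d+))?\))?")


def pattern_to_regex(pattern):
    """Convert a PROSITE-style pattern into a Python regular expression."""
    parts = []
    for element in pattern.split("-"):
        match = ELEMENT.fullmatch(element)
        if match is None:
            raise ValueError(f"unsupported pattern element: {element!r}")
        core, low, high = match.groups()
        if core == "x":
            core = "."                       # any residue
        elif core.startswith("{"):
            core = "[^" + core[1:-1] + "]"   # any residue except these
        if low is not None:
            core += "{" + low + ("," + high if high else "") + "}"
        parts.append(core)
    return "".join(parts)


regex = pattern_to_regex("[LIVMF]-G-E-x-[GAS]-[LIVM]-x(3,7)")
print(regex)
for seq in ("VGEAAMPRI", "VGEAAYPRI"):
    print(seq, "Yes" if re.search(regex, seq) else "No")
```

The second sequence fails because Y is not in the set [LIVM] allowed at the sixth position.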


Upgraded BLAST: PHI-BLAST

The NCBI BLAST server, PHI-BLAST: http://blast.ncbi.nlm.nih.gov


The Most Popular Search Tool: BLAST

BLAST

PSI-BLAST

PHI-BLAST

Query


Similarity Searches for Free over the Internet

BLAST Servers around the World

Location   Server   URL
USA        NCBI     http://www.ncbi.nlm.nih.gov/BLAST
Europe     EBI      http://www.ebi.ac.uk/Tools/sss
Europe     ExPASy   http://web.expasy.org/blast
Japan      DDBJ     http://blast.ddbj.nig.ac.jp

WU-BLAST: WU stands for Washington University. It is more sensitive and better at inserting gaps than NCBI-BLAST.

Smith and Waterman (SSEARCH): slower, but more accurate than BLAST.

FASTA: a bit slower than BLAST, but more accurate when making DNA comparisons.

BLAT: use this for locating cDNA rapidly in a genome, or for finding close (mammalian vs. mammalian) proteins in a genome.


Comparing Two Sequences

Comparing two sequences can help you to:

Convince yourself that two sequences are in fact homologous;

Find out that your sequences share a domain;

Identify the exact location of common features, such as disulfide bridges or catalytic active sites.

Domain: a structural and functional unit in a protein.


single-domain protein multiple-domain protein


Comparing Two Sequences

Methods: dot plot, global/local alignment

The dot plot is the simplest means of comparing two sequences. In fact, it is the only kind of comparison you can do with pencil and paper, without a computer. Advantages: no biological hypothesis is required, and the results can be analyzed by eye.

Seq1: THEFASTCAT

Seq2: THEFATCAT

Seq1 runs across the top, Seq2 down the side:

   T H E F A S T C A T
T  x           x     x
H    x
E      x
F        x
A          x       x
T  x           x     x
C                x
A          x       x
T  x           x     x

length(seq1) = 10
length(seq2) = 9
10 x 9 = 90 comparisons

The diagonals indicate the segments of similarity between the two sequences:

1. THEFA
2. TCAT
3. AT
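The pencil-and-paper procedure translates directly into code. The following is a minimal sketch with hypothetical function names; it marks only exact letter matches and applies no window or threshold, unlike the dot plot programs used in practice.

```python
def dot_plot(seq1, seq2):
    """Rows follow seq2, columns follow seq1; True marks identical letters."""
    return [[a == b for a in seq1] for b in seq2]


def render(seq1, seq2):
    """Draw the plot as text, with 'x' for every match."""
    lines = ["  " + " ".join(seq1)]
    for letter, row in zip(seq2, dot_plot(seq1, seq2)):
        cells = " ".join("x" if hit else " " for hit in row)
        lines.append((letter + " " + cells).rstrip())
    return "\n".join(lines)


def diagonals(seq1, seq2, min_length=2):
    """Return the matching segments that form diagonals of at least min_length."""
    matrix = dot_plot(seq1, seq2)
    segments = []
    for j in range(len(seq2)):
        for i in range(len(seq1)):
            continues = i > 0 and j > 0 and matrix[j - 1][i - 1]
            if matrix[j][i] and not continues:  # a diagonal starts here
                k = 0
                while i + k < len(seq1) and j + k < len(seq2) and matrix[j + k][i + k]:
                    k += 1
                if k >= min_length:
                    segments.append(seq1[i:i + k])
    return segments


print(render("THEFASTCAT", "THEFATCAT"))
print(sorted(diagonals("THEFASTCAT", "THEFATCAT"), key=len, reverse=True))
```

The nested loops perform exactly the 10 x 9 = 90 letter comparisons counted above.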


Comparing Two Sequences

You can also make a dot plot of one sequence against itself to discover repeated subsequences hidden in it.

Seq1: THEFASTHE

   T H E F A S T H E
T  x           x
H    x           x
E      x           x
F        x
A          x
S            x
T  x           x
H    x           x
E      x           x


Comparing Two Sequences

Dot plot servers

Name     URL
Dotlet   http://myhits.isb-sib.ch/cgi-bin/dotlet
Dottup   http://emboss.sourceforge.net
Dotter   http://sonnhammer.sbc.su.se/Dotter.html
Dnadot   http://arbl.cvmbs.colostate.edu/molkit/dnadot


Comparing Two Sequences

Dotlet server: http://myhits.isb-sib.ch/cgi-bin/dotlet


dotlet.fasta

seq1

The Sequence Input Dialog


window size zoom

The dots window will display the diagonal plot.

Histogram window defines the grayscale

alignment window


Comparing Two Sequences

Use a dot plot to detect tandem repeats in a sequence.

Tandem repeat: two or more repeated units directly adjacent to each other.

Example: CCCABCABCABCDDD

Tandem repeats are often exploited in evolution to create new proteins or to make existing ones function more efficiently.

A Short Tandem Repeat (STR) in DNA is a repeated pattern that helps determine an individual's inherited traits. A short tandem repeat polymorphism (STRP) occurs when homologous STR loci differ in the number of repeats between individuals. By identifying repeats of a specific sequence at specific locations in the genome, it is possible to create a genetic profile of an individual. There are currently over 10,000 published STR sequences in the human genome. STR analysis has become the prevalent method for determining genetic profiles in forensic cases.


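Dot plot of CCCABCABCABCDDD against itself (simplified: only the main diagonal and the diagonals of the ABC repeats are drawn):

   C C C A B C A B C A B C D D D
C  x
C    x
C      x
A        x     x     x
B          x     x     x
C            x     x     x
A        x     x     x
B          x     x     x
C            x     x     x
A        x     x     x
B          x     x     x
C            x     x     x
D                          x
D                            x
D                              x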

[Screenshots: Dotlet dot plots of the example file tandem.fasta]

1. The number of repeats is equal to the number of diagonals on one side of the plot, including the main diagonal.

2. The distance between two adjacent diagonals represents the length of the repeat.

3. The shortest diagonal gives you a single repeat unit.
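The second rule can be checked numerically by measuring, for each offset from the main diagonal, the longest run of matches of a sequence against itself. This is a rough sketch with hypothetical function names; it counts every exact match, so the runs come out slightly longer than in a simplified drawing, because the C in front of the first ABC also matches.

```python
def longest_run_at_offset(seq, offset):
    """Longest run of matches between seq and seq shifted by offset."""
    best = run = 0
    for i in range(len(seq) - offset):
        if seq[i] == seq[i + offset]:
            run += 1
            best = max(best, run)
        else:
            run = 0
    return best


def diagonal_lengths(seq, min_length=3):
    """Map each offset to the length of its longest diagonal, if long enough."""
    lengths = {}
    for offset in range(1, len(seq)):
        run = longest_run_at_offset(seq, offset)
        if run >= min_length:
            lengths[offset] = run
    return lengths


print(diagonal_lengths("CCCABCABCABCDDD"))
```

The diagonals sit at offsets 3 and 6, so adjacent diagonals are 3 positions apart, which is the length of the repeat unit ABC. Together with the main diagonal that makes three diagonals on one side of the plot, matching the three copies of ABC.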


Alignment

An alignment is an arrangement of two protein or DNA sequences that identifies regions of similarity, which may be a consequence of functional, structural, or evolutionary relationships between the sequences. Gaps are inserted between the residues so that identical or similar characters are aligned in the same columns.

Global alignment is most useful when the two sequences are similar and of roughly equal size.

Local alignment is more useful for dissimilar sequences that are suspected to contain segments of similarity.


Alignment

A substitution matrix such as BLOSUM62 gives a score for every pair of amino acids, defining “what is similar” and “how similar”.


Alignment

Uses of global alignment:

Checking minor differences between two sequences. This may happen with data that you’ve manipulated and possibly altered. The global alignment is the best way to localize potential problems.

Analyzing polymorphisms (for example, SNPs) between closely related sequences.

Comparing two sequences that partly overlap. In that case, you want to make a global pairwise comparison that doesn’t penalize misalignments at the extremities of the sequences.

Uses of local alignment:

Comparing two distantly related sequences that share only a few noncontiguous domains.

Analyzing repeated elements within a single sequence.


Global Alignment


How is a global alignment generated?

Input:

Seq1: PYMNVI

Seq2: PYELF

substitution matrix (BLOSUM62)

gap penalty (-1 by default): The substitution matrix gives the score of any residue aligned with another residue; the gap penalty gives the score of any residue aligned with a gap.

Output:

PYMNVI        PYMNVI
PY-ELF   or   PYE-LF   or … ?
**  :.        **  :.

Step 1

Seq1: PYMNVI
Seq2: PYELF

          P    Y    M    N    V    I

P
Y
E
L
F

Step 2

Seq1: PYMNVI
Seq2: PYELF

          P    Y    M    N    V    I
     0   -1   -2   -3   -4   -5   -6
P   -1
Y   -2
E   -3
L   -4
F   -5

Step 3

Seq1: PYMNVI
Seq2: PYELF

          P    Y    M    N    V    I
     0   -1   -2   -3   -4   -5   -6
P   -1    7
Y   -2
E   -3
L   -4
F   -5

$$
S(i, j) = \max \begin{cases} S(i-1, j-1) + m(s1_i, s2_j) \\ S(i, j-1) + \text{gap} \\ S(i-1, j) + \text{gap} \end{cases}
$$

Seq1: PYMNVI
Seq2: PYELF

Seq 1 runs across the top, Seq 2 down the side:

          P
     0   -1          S(0, 0)   S(0, 1)
P   -1    7          S(1, 0)   S(1, 1)

$$
S(1, 1) = \max \begin{cases} S(0, 0) + m(s1_1, s2_1) = 0 + 7 = 7 \\ S(1, 0) + \text{gap} = -1 + (-1) = -2 \\ S(0, 1) + \text{gap} = -1 + (-1) = -2 \end{cases} = 7
$$

Step 4

Seq1: PYMNVI
Seq2: PYELF

          P    Y    M    N    V    I
     0   -1   -2   -3   -4   -5   -6
P   -1    7
Y   -2
E   -3
L   -4
F   -5

$$
S(0, 1) = \max \begin{cases} \varnothing \\ S(0, 0) + \text{gap} = 0 + (-1) = -1 \\ \varnothing \end{cases} = -1
$$

Step 5

Seq1: PYMNVI
Seq2: PYELF

          P    Y    M    N    V    I
     0   -1   -2   -3   -4   -5   -6
P   -1    7    6    5    4    3    2
Y   -2
E   -3
L   -4
F   -5

Step 6

Seq1: PYMNVI
Seq2: PYELF

          P    Y    M    N    V    I
     0   -1   -2   -3   -4   -5   -6
P   -1    7    6    5    4    3    2
Y   -2    6   14   13   12   11   10
E   -3    5   13   12   13   12   11
L   -4    4   12   15   14   14   14
F   -5    3   11   14   13   13   14

$$
S(3, 3) = \max \begin{cases} S(2, 2) + m(s1_3, s2_3) = 14 + (-2) = 12 \\ S(3, 2) + \text{gap} = 13 + (-1) = 12 \\ S(2, 3) + \text{gap} = 13 + (-1) = 12 \end{cases} = 12
$$

Global Alignment

Step 7

Seq1: PYMNVI
Seq2: PYELF

          P    Y    M    N    V    I
     0   -1   -2   -3   -4   -5   -6
P   -1    7    6    5    4    3    2
Y   -2    6   14   13   12   11   10
E   -3    5   13   12   13   12   11
L   -4    4   12   15   14   14   14
F   -5    3   11   14   13   13   14

Global Alignment

Step 8

          P    Y    M    N    V    I
     0   -1   -2   -3   -4   -5   -6
P   -1    7    6    5    4    3    2
Y   -2    6   14   13   12   11   10
E   -3    5   13   12   13   12   11
L   -4    4   12   15   14   14   14
F   -5    3   11   14   13   13   14

There is at least one path from the bottom-right to the top-left!

seq1 PYMNVI
seq2 PY-ELF
     **  :.
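Steps 1 to 8 can be reproduced with a short Python sketch of the matrix fill and the traceback. This is a minimal illustration, not the implementation of any particular tool: the function name `needleman_wunsch` and the dictionary `BLOSUM62_SUBSET` are hypothetical names, and the dictionary holds only the BLOSUM62 entries needed for these two sequences. Rows follow Seq2 and columns follow Seq1, as in the matrix drawn on the slides. When several paths exist, this sketch arbitrarily prefers the diagonal move.

```python
# BLOSUM62 scores for the residue pairs of this example only:
# outer key = residue of Seq2 (row), inner key = residue of Seq1 (column).
BLOSUM62_SUBSET = {
    "P": {"P": 7, "Y": -3, "M": -2, "N": -2, "V": -2, "I": -3},
    "Y": {"P": -3, "Y": 7, "M": -1, "N": -2, "V": -1, "I": -1},
    "E": {"P": -1, "Y": -2, "M": -2, "N": 0, "V": -2, "I": -3},
    "L": {"P": -3, "Y": -1, "M": 2, "N": -3, "V": 1, "I": 2},
    "F": {"P": -4, "Y": 3, "M": 0, "N": -3, "V": -1, "I": 0},
}


def needleman_wunsch(seq1, seq2, score, gap=-1):
    """Global alignment: return (matrix, aligned seq1, aligned seq2)."""
    rows, cols = len(seq2) + 1, len(seq1) + 1
    S = [[0] * cols for _ in range(rows)]
    # Borders: each step away from the corner costs one gap.
    for c in range(1, cols):
        S[0][c] = S[0][c - 1] + gap
    for r in range(1, rows):
        S[r][0] = S[r - 1][0] + gap
    # Inner cells: maximum of the diagonal, left and upper neighbours.
    for r in range(1, rows):
        for c in range(1, cols):
            S[r][c] = max(
                S[r - 1][c - 1] + score[seq2[r - 1]][seq1[c - 1]],
                S[r][c - 1] + gap,
                S[r - 1][c] + gap,
            )
    # Traceback from the bottom-right to the top-left corner.
    out1, out2 = [], []
    r, c = rows - 1, cols - 1
    while r > 0 or c > 0:
        if r > 0 and c > 0 and S[r][c] == S[r - 1][c - 1] + score[seq2[r - 1]][seq1[c - 1]]:
            out1.append(seq1[c - 1])
            out2.append(seq2[r - 1])
            r, c = r - 1, c - 1
        elif c > 0 and S[r][c] == S[r][c - 1] + gap:
            out1.append(seq1[c - 1])  # residue of Seq1 against a gap
            out2.append("-")
            c -= 1
        else:
            out1.append("-")          # residue of Seq2 against a gap
            out2.append(seq2[r - 1])
            r -= 1
    return S, "".join(reversed(out1)), "".join(reversed(out2))


matrix, aligned1, aligned2 = needleman_wunsch("PYMNVI", "PYELF", BLOSUM62_SUBSET)
for row in matrix:
    print(" ".join(f"{value:3d}" for value in row))
print(aligned1)
print(aligned2)
```

The printed matrix should match the one on the Step 6 slide, and the traceback should give the alignment shown in Step 8.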

Identity and Similarity

Residue: a letter; an amino acid in a protein; a base in a DNA sequence.

Identity: If two sequences (protein or DNA) have the same length, the identity between them is defined as the percentage of identical residues relative to their length.

Similarity: If two sequences (protein or DNA) have the same length, the similarity between them is defined as the percentage of similar residues relative to their length. Which residues count as similar to each other is defined by a substitution matrix, such as BLOSUM.

What happens when two sequences have different lengths?

seq 1 : CVHKA
seq 2 : CIHL

Identity? Similarity?

With the help of global alignment, we can define them for sequences of different lengths.

Redefinition of Identity and Similarity

Identity: The identity between two sequences is defined as the percentage of identical residues in their global alignment.

Similarity: The similarity between two sequences is defined as the percentage of similar residues in their global alignment.

PYMNVI
PY-ELF
**  :.

Identity = 2 / 6 = 33.3%

Similarity = 4 / 6 = 66.7%
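The two percentages can be checked with a small Python sketch. The function name `identity_and_similarity` is hypothetical. The set of similar pairs is an assumption read off the marks in the alignment above (V/L and I/F are marked as similar, N/E is not); a real program would derive it from its substitution matrix. Identical residues are counted as similar, which is what gives 4 / 6 on the slide.

```python
def identity_and_similarity(aligned1, aligned2, similar_pairs):
    """Return (identity, similarity) as fractions of the alignment length."""
    if len(aligned1) != len(aligned2):
        raise ValueError("aligned sequences must have the same length")
    length = len(aligned1)
    identical = 0
    similar = 0
    for a, b in zip(aligned1, aligned2):
        if a == "-" or b == "-":
            continue  # a gap column is neither identical nor similar
        if a == b:
            identical += 1
            similar += 1
        elif frozenset((a, b)) in similar_pairs:
            similar += 1
    return identical / length, similar / length


# Pairs marked as similar in the example alignment (assumption, see above).
SIMILAR = {frozenset("VL"), frozenset("IF")}

identity, similarity = identity_and_similarity("PYMNVI", "PY-ELF", SIMILAR)
print(f"Identity = {identity:.1%}, Similarity = {similarity:.1%}")
```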

Local Alignment

How is a local alignment generated?

Input:

Seq1: PYMNVI

Seq2: MN

substitution matrix (BLOSUM62)

gap penalty (-1 by default)

The score of an arbitrary residue vs. another arbitrary residue is given in the substitution matrix; a gap penalty gives the score of an arbitrary residue vs. a gap.

Output:

PYMNVI
--MN--
  **

or

MN
MN
**

Local Alignment

Step 1

Seq1: PYMNVI
Seq2: MN

[Empty score matrix: columns P, Y, M, N, V, I (Seq1); rows M, N (Seq2)]

Local Alignment

Step 2

Seq1: PYMNVI
Seq2: MN

          P    Y    M    N    V    I
     0    0    0    0    0    0    0
M    0
N    0

Local Alignment

Step 3

Seq1: PYMNVI
Seq2: MN

          P    Y    M    N    V    I
     0    0    0    0    0    0    0
M    0    0
N    0

$$S(i, j) = \max \begin{cases} 0 \\ S(i-1, j-1) + m(s_{1i}, s_{2j}) \\ S(i, j-1) + \text{gap} \\ S(i-1, j) + \text{gap} \end{cases}$$

Local Alignment

Seq1: PYMNVI
Seq2: MN

Top-left corner of the score matrix (columns: Seq 1, rows: Seq 2):

          P
     0    0
M    0    0

Cell layout:

S(0, 0) S(0, 1)
S(1, 0) S(1, 1)

$$S(i, j) = \max \begin{cases} 0 \\ S(i-1, j-1) + m(s_{1i}, s_{2j}) \\ S(i, j-1) + \text{gap} \\ S(i-1, j) + \text{gap} \end{cases}$$

$$S(1, 1) = \max \begin{cases} 0 \\ S(0, 0) + m(s_{11}, s_{21}) = 0 + (-2) = -2 \\ S(1, 0) + \text{gap} = 0 + (-1) = -1 \\ S(0, 1) + \text{gap} = 0 + (-1) = -1 \end{cases} = 0$$

Local Alignment

Step 4

Seq1: PYMNVI
Seq2: MN

          P    Y    M    N    V    I
     0    0    0    0    0    0    0
M    0    0    0    5
N    0

Local Alignment

Step 5

Seq1: PYMNVI
Seq2: MN

          P    Y    M    N    V    I
     0    0    0    0    0    0    0
M    0    0    0    5    4    3    2
N    0    0    0    4   11   10    9

Local Alignment

Step 6

Seq1: PYMNVI
Seq2: MN

          P    Y    M    N    V    I
     0    0    0    0    0    0    0
M    0    0    0    5    4    3    2
N    0    0    0    4   11   10    9

Find the maximum in the two borders (the last row and the last column of the matrix).

Local Alignment

Step 7

Seq1: PYMNVI
Seq2: MN

          P    Y    M    N    V    I
     0    0    0    0    0    0    0
M    0    0    0    5    4    3    2
N    0    0    0    4   11   10    9

Trace back until a 0 is reached.

Local Alignment

Step 8

Seq1: PYMNVI
Seq2: MN

          P    Y    M    N    V    I
     0    0    0    0    0    0    0
M    0    0    0    5    4    3    2
N    0    0    0    4   11   10    9

Result:

MN
MN
**
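The local alignment steps can be sketched in Python in the same way as the global case. The names `smith_waterman` and `BLOSUM62_SUBSET` are hypothetical, and the dictionary holds only the BLOSUM62 entries needed here. One deliberate difference from the slides: this sketch searches the whole matrix for the maximum, as the standard Smith-Waterman algorithm does, rather than only the last row and last column; for this example both give the same cell (score 11).

```python
# BLOSUM62 scores for the residue pairs of this example only:
# outer key = residue of Seq2 (row), inner key = residue of Seq1 (column).
BLOSUM62_SUBSET = {
    "M": {"P": -2, "Y": -1, "M": 5, "N": -2, "V": 1, "I": 1},
    "N": {"P": -2, "Y": -2, "M": -2, "N": 6, "V": -3, "I": -3},
}


def smith_waterman(seq1, seq2, score, gap=-1):
    """Local alignment: return (matrix, best score, aligned seq1, aligned seq2)."""
    rows, cols = len(seq2) + 1, len(seq1) + 1
    S = [[0] * cols for _ in range(rows)]  # borders stay 0
    best, best_pos = 0, (0, 0)
    for r in range(1, rows):
        for c in range(1, cols):
            S[r][c] = max(
                0,  # a local alignment may start anywhere
                S[r - 1][c - 1] + score[seq2[r - 1]][seq1[c - 1]],
                S[r][c - 1] + gap,
                S[r - 1][c] + gap,
            )
            if S[r][c] > best:
                best, best_pos = S[r][c], (r, c)
    # Trace back from the best cell until a 0 is reached.
    out1, out2 = [], []
    r, c = best_pos
    while r > 0 and c > 0 and S[r][c] > 0:
        if S[r][c] == S[r - 1][c - 1] + score[seq2[r - 1]][seq1[c - 1]]:
            out1.append(seq1[c - 1])
            out2.append(seq2[r - 1])
            r, c = r - 1, c - 1
        elif S[r][c] == S[r][c - 1] + gap:
            out1.append(seq1[c - 1])
            out2.append("-")
            c -= 1
        else:
            out1.append("-")
            out2.append(seq2[r - 1])
            r -= 1
    return S, best, "".join(reversed(out1)), "".join(reversed(out2))


matrix, best, local1, local2 = smith_waterman("PYMNVI", "MN", BLOSUM62_SUBSET)
for row in matrix:
    print(" ".join(f"{value:3d}" for value in row))
print(best, local1, local2)
```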

Making Global Alignment Over the Internet

BLAST is an abbreviation of Basic Local Alignment Search Tool.

In a BLAST search, how is the most similar sequence found? Is the query sequence aligned to each sequence of the entire database?

No. A BLAST search among 100,000 sequences needs less than 2 minutes, while calculating 100,000 alignments needs more than 10,000 minutes.

BLAST uses a heuristic algorithm.

What you need to know is just how to use BLAST online.

Making Global Alignment Over the Internet

EMBL Alignment Tool: http://www.ebi.ac.uk

EMBL Alignment Tool: http://www.ebi.ac.uk/Tools/psa/emboss_needle/

global.fasta

Small Gap Open + large Gap Extend = dispersed gaps in the alignment

Large Gap Open + small Gap Extend = concentrated gaps in the alignment

Adjust Gap Open and Gap Extend according to your expectation.
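The effect of the two gap parameters can be illustrated with a little arithmetic in Python. This is a sketch under a stated assumption: a gap of length n is charged `gap_open + (n - 1) * gap_extend`, which is one common convention but may differ in detail from what a given tool uses. The function name `gap_cost` and the parameter values are hypothetical and chosen only to show the trend.

```python
def gap_cost(gap_lengths, gap_open, gap_extend):
    """Total penalty for a set of gaps.

    Assumed convention: a gap of length n costs gap_open + (n - 1) * gap_extend.
    """
    return sum(gap_open + (n - 1) * gap_extend for n in gap_lengths)


dispersed = [1, 1, 1, 1]   # four separate gaps of one residue each
concentrated = [4]         # one gap of four residues

for gap_open, gap_extend in [(1.0, 5.0), (10.0, 0.5)]:
    d = gap_cost(dispersed, gap_open, gap_extend)
    c = gap_cost(concentrated, gap_open, gap_extend)
    cheaper = "dispersed" if d < c else "concentrated"
    print(f"open={gap_open}, extend={gap_extend}: "
          f"dispersed={d}, concentrated={c} -> {cheaper} gaps are cheaper")
```

With a small opening and large extension penalty the four separate gaps cost less, so the aligner tends to scatter gaps; with a large opening and small extension penalty the single long gap costs less, so gaps tend to be grouped.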

Making Local Alignment Over the Internet

EMBL Alignment Tool: http://www.ebi.ac.uk/Tools/psa/emboss_water/

local.fasta

>Seq1
SEQUENCEMHHHHHHSSGVDLGTENLYFQSMKTTQEQLKRNVRFHAFISYSEHDSLWVKNELIPNLEKEDGSILICLYESYFDPGKSISENIVSFIEKSYKSIFVLSPNFVQNEWCHYEFYFAHHNLFHENSDHIILILLEPIPFYCIPTRYHKLKALLEKKAYLEWPKDRRKCGLFWANLRAAIN
>Seq2
GTENLYFQSMKTTQEQLKRNVRFHAFISYSEHDSLWVKNELIPNLEKEDGSILICLYESYFDPGKEWCHYEFYFAHHNLFHENSDHIILILLEPIPFYCIPTRAAAAAAAAAAA
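A FASTA file such as the one above consists of header lines starting with ">" followed by one or more lines of sequence. The following Python sketch reads such text into a dictionary; `parse_fasta` is a hypothetical helper written for this illustration, and the two short records in `example` are placeholders, not the sequences shown above.

```python
def parse_fasta(text):
    """Parse FASTA-formatted text into a dict mapping name -> sequence."""
    records = {}
    name = None
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue                      # skip blank lines
        if line.startswith(">"):
            name = line[1:].strip()       # header line: text after ">"
            records[name] = []
        elif name is not None:
            records[name].append(line)    # sequence may span several lines
    return {n: "".join(parts) for n, parts in records.items()}


example = """>Seq1
PYMN
VI
>Seq2
MN
"""

sequences = parse_fasta(example)
for seq_name, seq in sequences.items():
    print(seq_name, len(seq), seq)
```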


Difference between Global and Local Alignments

Global alignment

Length: 186
Identity: 103/186 (55.4%)
Similarity: 103/186 (55.4%)

Local alignment

Length: 130
Identity: 103/130 (79.2%)
Similarity: 103/130 (79.2%)

Free Pairwise Alignment over the Internet

Online Pairwise Alignment Programs

Name | Alignment Type | URL
Lalign | Global/Local | http://www.ch.embnet.org/software/LALIGN_form.html
EMBL | Global/Local | http://www.ebi.ac.uk/Tools/psa
LAGAN | Global | http://lagan.stanford.edu/lagan_web/index.shtml
MCALIGN | alignment of non-coding DNA sequences | http://homepages.ed.ac.uk/eang33/mcalign/mcinstructions.html
PIR | Global | http://pir.georgetown.edu/pirwww/search/pairwise.shtml
AlignMe | Alignment of Membrane Proteins | http://www.bioinfo.mpg.de/AlignMe/AlignMe.html

Multiple Sequence Alignment

A multiple sequence alignment (MSA) is a global sequence alignment of three or more sequences.

Multiple Sequence Alignment

4 main criteria for building a multiple sequence alignment:

• Structural similarity - Amino acids that play the same role in each structure are expected in the same column. This is very difficult; only structure-superposition programs can satisfy this criterion.

• Evolutionary similarity - Amino acids derived from the same residue in the common ancestor of all the sequences are put in the same column. In practice, no automatic program uses exactly this criterion, but they all try to respect it.

• Functional similarity - Amino acids with the same function are in the same column. Again, no automatic program uses exactly this criterion, but if the information is available, you can edit your alignment manually.

• Sequence similarity - Amino acids in the same column are those that yield an alignment with maximum similarity. Most programs use this criterion because it is the easiest one to apply.

Multiple Sequence Alignment

Main applications of MSA:

1. Extrapolation: to determine whether an uncharacterized sequence is really a member of a protein family.

2. Phylogenetic analysis: the phylogenetic tree of the aligned sequences can be reconstructed.

3. Pattern identification: highly conserved positions associated with a certain function can be used to generate a sequence pattern or a sequence logo.

4. Domain identification: to turn an MSA into a profile (position-specific weight matrix) that describes a protein domain.

5. DNA regulatory elements: to turn a DNA MSA of a binding site into a profile and scan other DNA sequences for potential binding sites.

6. Structure prediction: to predict protein/RNA secondary structures by similarity.

7. nsSNP analysis: an MSA can help you predict whether a non-synonymous single-nucleotide polymorphism is likely to be harmful.

8. PCR analysis: a good multiple alignment can help you identify the less degenerate portions of a protein family, in order to fish out new members by PCR (polymerase chain reaction).

Choosing the Right Sequences

MSA is not meant for an arbitrary group of sequences. Instead, the sequences should be members of the same protein family, sharing a common ancestor.

Choosing the Right Sequences

Naming sequences in the right way:

Never use white spaces in your sequence names. Use the underscore (_) to replace spaces.
e.g. My Seq 1 becomes My_Seq_1

Do not use special symbols (such as Chinese characters, @, #, &, ^, etc.).
Bad examples: 我的序列壹 [email protected]

Never use names longer than 15 characters.
Bad example: This_is_my_favorite_sequence_about_mouse

Never give the same name to two different sequences in your set.

If you don’t obey these naming rules, some MSA programs may automatically change the names of your sequences, without the courtesy of telling you.
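The naming rules can be checked automatically before submitting sequences to an MSA program. The sketch below is only an illustration: `check_names` is a hypothetical helper, and treating letters, digits and the underscore as the only safe characters is an assumption made here, not a rule stated by any particular program.

```python
import re


def check_names(names, max_len=15):
    """Return a dict mapping each problematic name to a list of rule violations."""
    problems = {}
    seen = set()
    for name in names:
        issues = []
        if re.search(r"\s", name):
            issues.append("contains white space")
        # Assumption: anything other than A-Z, a-z, 0-9 and "_" is "special".
        if re.search(r"[^A-Za-z0-9_\s]", name):
            issues.append("contains special symbols")
        if len(name) > max_len:
            issues.append(f"longer than {max_len} characters")
        if name in seen:
            issues.append("duplicate name")
        seen.add(name)
        if issues:
            problems[name] = issues
    return problems


names = [
    "My_Seq_1",
    "My Seq 1",
    "我的序列壹",
    "This_is_my_favorite_sequence_about_mouse",
    "My_Seq_1",
]
for bad_name, issues in check_names(names).items():
    print(f"{bad_name}: {', '.join(issues)}")
```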

Choosing the Right Sequences

Choosing the right number of sequences:

Start with a relatively small number of sequences (10-15).

Increase the size of the set after you get something interesting happening with this small set.

In any case, it’s hard to see any reason for generating an MSA with > 50 sequences.

If you start with hundreds of sequences, you immediately run into trouble:

Computing big alignments is difficult.

Building big alignments is difficult.

Displaying big alignments is difficult.

Using big alignments is difficult.

Making accurate big alignments is difficult.

The most commonly used MSA packages.

Before you start making multiple sequence alignments, you must know that none of the methods available today is perfect. They all use approximations.

[Figure: the alignment matrix grows with the number of sequences. 2 sequences = 2D, 3 sequences = 3D, n sequences = nD.]
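Why approximations are needed can be seen by counting the cells of the n-dimensional matrix. The Python sketch below assumes the matrix has one axis per sequence with (length + 1) positions on each axis, as in the two-sequence matrices earlier in this chapter; `dp_cells` is a hypothetical helper name and the sequence length of 100 is an arbitrary illustrative value.

```python
from math import prod


def dp_cells(lengths):
    """Number of cells in the alignment matrix: one axis of (length + 1) per sequence."""
    return prod(n + 1 for n in lengths)


print(dp_cells([6, 5]))  # PYMNVI vs. PYELF: a 7 x 6 matrix
for n_sequences in (2, 3, 5, 10):
    cells = dp_cells([100] * n_sequences)
    print(f"{n_sequences} sequences of 100 residues: {cells:.3e} cells")
```

The number of cells grows exponentially with the number of sequences, so filling the full matrix quickly becomes impractical.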

ClustalW - the most commonly used MSA package.

Tcoffee - one of the latest MSA packages that you can use.

MUSCLE - one of the fastest alignment methods around.


ClustalW is the latest of the Clustal software series. Clustal was the first multiple sequence alignment program. These days, with more than 35,000 citations, ClustalW is one of the most widely cited scientific publications in the history of biology.

ClustalW uses a progressive algorithm. This means that it adds sequences one by one, instead of aligning all the sequences at the same time.

A List of ClustalW Servers

Name | Location | URL
EMBnet | Europe | http://www.ch.embnet.org/software/ClustalW.html
EBI | Europe | http://www.ebi.ac.uk/Tools/msa/clustalw2
BCM | USA | http://searchlauncher.bcm.tmc.edu/multi-align/Options/clustalw.html
PIR | USA | http://pir.georgetown.edu/pirwww/search/multialn.shtml
GenomeNet | Japan | http://www.genome.jp/tools/clustalw
Strasbourg | Europe | http://bips.u-strasbg.fr/fr/Documentation/ClustalW
DDBJ | Japan | http://clustalw.ddbj.nig.ac.jp/top-j.html

EMBL ClustalW: http://www.ebi.ac.uk/Tools/msa/clustalw2

msa.fasta (Human TLR1-10’s TIR domains)

The sequences in the alignment are sorted by pairwise identity.

Red: Hydrophobic

Blue: Acidic

Magenta: Basic

Green: Hydroxyl + Amine + Basic

Gray: Others


(*) A star indicates an entirely conserved column.

(:) A double-dot indicates columns where all the residues have roughly the same size and the same hydropathy.

(.) A single-dot indicates columns where the size or the hydropathy has been preserved in the course of evolution.
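The simplest of these symbols, the star, can be computed directly from the aligned sequences. The Python sketch below marks only entirely conserved columns; it deliberately does not attempt the (:) and (.) symbols, which need residue property groups that are not listed here. `conserved_columns` is a hypothetical helper name, and treating a column that contains a gap as not conserved is an assumption.

```python
def conserved_columns(aligned):
    """Return a marker line with '*' under every entirely conserved column."""
    if len({len(seq) for seq in aligned}) != 1:
        raise ValueError("all aligned sequences must have the same length")
    marks = []
    for column in zip(*aligned):
        residues = set(column)
        # Conserved: exactly one distinct residue and no gap in the column.
        conserved = len(residues) == 1 and "-" not in residues
        marks.append("*" if conserved else " ")
    return "".join(marks)


alignment = ["PYMNVI", "PY-ELF"]
for seq in alignment:
    print(seq)
print(conserved_columns(alignment))
```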


Tcoffee is a recent method developed for conducting multiple sequence alignments. It uses a principle that’s a bit similar to ClustalW, but it yields more accurate alignments at the cost of a slightly longer running time. Tcoffee builds a progressive alignment like ClustalW, but it compares segments across the entire sequence set.

Home page: http://www.tcoffee.org

http://tcoffee.crg.cat

T-Coffee Mirror sites

Name | URL
CNRS | http://www.igs.cnrs-mrs.fr/Tcoffee/tcoffee_cgi/index.cgi
SIB | http://tcoffee.vital-it.ch
Max-Planck | http://toolkit.tuebingen.mpg.de/t_coffee
EBI | http://www.ebi.ac.uk/Tools/msa/tcoffee
CBSU | http://cbsuapps.tc.cornell.edu/t_coffee.aspx
EMBnet | http://www.es.embnet.org/Services/MolBio/t-coffee

Available Tools on www.tcoffee.org

Usage | Description
TCOFFEE | Produce a multiple sequence alignment with Tcoffee.
MCOFFEE | Run any requested multiple sequence alignment package and combine all the output into one final alignment.
CORE | Evaluate the reliability of an existing multiple alignment.
EXPRESSO | Incorporate all the available structural information in your alignment. Will produce the best sequence alignments if the structures are available.

Aside from its accuracy, the distinguishing features of Tcoffee are its ability to align sequences and structures (EXPRESSO), the possibility of evaluating the accuracy of an alignment (CORE), and the possibility of combining many alternative multiple sequence alignments into one (Mcoffee).

T-Coffee: http://tcoffee.crg.cat

msa.fasta (Human TLR1-10’s TIR domains)

score_html file

clustalw_aln file

fasta_aln file

phylip file

When you choose to store your data in a specific format, you must ask yourself four questions:

Do most programs support this format?

Will my collaborators be able to use it?

Can I store all the information I need with this format?

Is it easy to manipulate?

If the program you’re using doesn’t produce alignments in the format you need, it is possible to use a third-party conversion tool to get to the format you want.

fmtseq : http://www.bioinformatics.org/JaMBW/1/2 or

http://evol.mcmaster.ca/Pise/5.a/fmtseq.html
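For simple cases, a conversion can also be scripted. The Python sketch below writes an alignment in a strict sequential PHYLIP-style layout, assuming that layout is a header line with the number of sequences and the number of columns, followed by one line per sequence whose name occupies exactly 10 characters. `to_phylip` is a hypothetical helper written for this illustration; check the requirements of the program that will read the file before relying on it.

```python
def to_phylip(alignment):
    """Format a dict of name -> aligned sequence as sequential PHYLIP-style text."""
    lengths = {len(seq) for seq in alignment.values()}
    if len(lengths) != 1:
        raise ValueError("all aligned sequences must have the same length")
    n_columns = lengths.pop()
    lines = [f" {len(alignment)} {n_columns}"]
    for name, seq in alignment.items():
        # Names are cut or padded to 10 characters; long names that share
        # their first 10 characters would collide after truncation.
        lines.append(f"{name[:10]:<10}{seq}")
    return "\n".join(lines) + "\n"


aligned = {"seq1": "PYMNVI", "seq2": "PY-ELF"}
phylip_text = to_phylip(aligned)
print(phylip_text, end="")
```

The 10-character limit is one practical reason to keep sequence names short and unique.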

EXPRESSO is the latest development of Tcoffee, replacing what was known as 3D-Coffee. When you run Expresso, the program uses BLAST to search the PDB for structures whose sequences are similar to your sequences. It then uses these structures to guide the alignment. Alignments based on structures are expected to be much more accurate than simple sequence alignments.

EXPRESSO T-Coffee


PDB ID


MUSCLE is a newcomer in the MSA area, but it is a remarkably efficient package for making fast, high-quality multiple sequence alignments. MUSCLE is ideal if you want to align several hundred sequences.

Home page : http://www.drive5.com/muscle


MUSCLE http://www.ebi.ac.uk/Tools/msa/muscle

Searching conserved patterns

One sentence summarizes what you really want from your multiple alignment:

You want to identify important positions!


BB-Loop

The BB-loop is important for TIR domain dimerization and for interaction with downstream adaptors or inhibitors.

Human TLR 1-TIR, Human TLR 2-TIR, Human TLR 10-TIR
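One simple way to look for important positions is to ask, column by column, how often the most common residue occurs. The sketch below is only an illustration under stated assumptions: the alignment rows are made-up toy sequences (not the TLR TIR domains), the function names are hypothetical, and "conserved" simply means that the most common non-gap residue reaches a chosen fraction of the sequences. Real tools also account for similar residues, which this sketch ignores.

```python
from collections import Counter


def column_conservation(aligned_seqs):
    """For each column, return (most common symbol, its fraction of sequences)."""
    if not aligned_seqs or len({len(s) for s in aligned_seqs}) != 1:
        raise ValueError("expected a non-empty alignment with equal-length rows")
    result = []
    for column in zip(*aligned_seqs):
        symbol, count = Counter(column).most_common(1)[0]
        result.append((symbol, count / len(column)))
    return result


def conserved_positions(aligned_seqs, threshold=1.0):
    """Return 1-based columns whose top symbol is not a gap and meets the threshold."""
    return [
        i
        for i, (symbol, fraction) in enumerate(column_conservation(aligned_seqs), start=1)
        if symbol != "-" and fraction >= threshold
    ]


# Toy alignment (hypothetical sequences).
toy_alignment = [
    "MKP-GDLT",
    "MRPAGDLS",
    "MKPAGELT",
]
print(conserved_positions(toy_alignment))                 # fully conserved columns
print(conserved_positions(toy_alignment, threshold=0.6))  # majority-conserved columns
```

With the default threshold only strictly identical columns are reported; lowering the threshold also reports columns where most, but not all, sequences agree.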

Getting Your Multiple Alignment in the Right Format

score_html file

clustalw_aln file

fasta_aln file

phylip file


Editing and Publishing Alignments

For editing and publishing a multiple sequence alignment, bioinformaticians have developed editors that are specific to multiple sequence alignments. They make it easy for you to see exactly what’s going on.

Most of these editors require that you install something on your computer. However, if you want to stick to your browser, you can use Jalview.

Jalview is a Java applet that you need only load into your Web browser for instant action.

Home page: http://www.jalview.org

Do not load confidential sequences! The Web interface is NOT secure.

EMBL ClustalW http://www.ebi.ac.uk/Tools/msa/clustalw2

Jalview http://www.jalview.org/download.html



Close ALL the windows that appear within the Jalview Window, as they only contain sample data.

results.clustalw

http://www.jalview.org/help.html

Colour -> Clustalx



When you edit an alignment, you usually want to modify several sequences collectively. To do this, you need to define them as a group, as follows:

Keep the Ctrl key pressed while you click names of sequences 1, 2, 3 and 4 to select them.


1. Keep the Ctrl key pressed.
2. Put your mouse pointer right where you want to insert or remove the gap.
3. Drag to the left or to the right to shift your sequences.

You can edit one sequence at a time by pressing the Shift key instead of Ctrl.


Perform a pairwise alignment for a pair of selected sequences.


Calculate a tree for all selected sequences.


Predict the secondary structure for a selected sequence.


JNet Secondary Structure Prediction result

Save your alignment as text or as a picture.


Showtime has finally come: You have the multiple alignment you want, and you’re determined to show the world!


Multiple Alignment Beautifying Tools

Name      URL                                               Description
JalView   http://www.jalview.org                            A multiple alignment editor written in Java
ESPript   http://espript.ibcp.fr/ESPript/ESPript            A very powerful shading-and-coloring tool
Boxshade  http://www.ch.embnet.org/software/BOX_form.html   Shading in black and white
MView     http://bio-mview.sourceforge.net                  Adding optional HTML markup to control coloring and web page layout


exercise.fasta

Can you make an MSA for these 5 protein sequences?
Which two sequences are the most similar?
How similar are they (i.e., what is their sequence identity)?
What kind of proteins are they?
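Once the MSA is made, the second and third questions can be checked with a few lines of code. The sketch below is one possible approach, not the only definition of identity: it assumes percent identity is counted over the columns where both sequences have a residue (gap columns are skipped), and it uses hypothetical toy sequences because exercise.fasta is not reproduced here. The function names are made up for illustration.

```python
from itertools import combinations


def percent_identity(a, b):
    """Percent identity over columns where both aligned sequences have a residue."""
    if len(a) != len(b):
        raise ValueError("aligned sequences must have the same length")
    pairs = [(x, y) for x, y in zip(a, b) if x != "-" and y != "-"]
    if not pairs:
        return 0.0
    matches = sum(x == y for x, y in pairs)
    return 100.0 * matches / len(pairs)


def most_similar_pair(alignment):
    """Return (name1, name2, identity) for the pair with the highest identity."""
    best = None
    for (n1, s1), (n2, s2) in combinations(alignment.items(), 2):
        pid = percent_identity(s1, s2)
        if best is None or pid > best[2]:
            best = (n1, n2, pid)
    return best


# Hypothetical aligned sequences standing in for rows of the exercise MSA.
toy = {
    "seqA": "MKPAGDLT",
    "seqB": "MRPAGDLS",
    "seqC": "MKP-GELT",
}
name1, name2, identity = most_similar_pair(toy)
print(f"{name1} and {name2}: {identity:.1f}% identity")
```

Other tools may divide by the alignment length or by the length of the shorter sequence instead, so identity values from different programs are not always directly comparable.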


Notice:

Next time (2011/10/19) we will move to Building 8, 2nd floor, west, multimedia classroom (多媒体教室).