Promoter Sequence Alignment
Daniel Rico, PhD. [email protected]

Overview diagram (two sequences vs. multiple sequences): alignment with Blastz (local; zPicture, dcode.org), LAGAN (global; mVISTA) and TBA/Multiz (local; Mulan, dcode.org); conserved TFBS with rVISTA at dcode.org.
• Local aligners
  – Work by “stacking” pairwise alignments
  – High specificity
  – BlastZ, LastZ, TBA + MultiZ
• Global aligners
  – Need to pre-define collinear segments
  – Better sensitivity
  – AVID/MAVID, LAGAN/MLAGAN, Pecan
• Mixed aligners
  – Combine both approaches
  – Shuffle-LAGAN, MAUVE
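The difference between local and global alignment can be illustrated with the two textbook dynamic-programming recurrences, Smith-Waterman (local) and Needleman-Wunsch (global). This is only a minimal sketch: the function name, the scoring values and the example sequences are assumptions made for illustration, and the tools listed above rely on seeding and anchoring heuristics rather than on this exhaustive search.

```python
def alignment_score(a, b, local, match=1, mismatch=-1, gap=-2):
    """Best alignment score by dynamic programming with a linear gap penalty.

    local=False: Needleman-Wunsch, both sequences are aligned end to end.
    local=True:  Smith-Waterman, only the best-scoring pair of substrings counts.
    Scoring values are illustrative, not those of any tool named in the slides.
    """
    # First row: a global alignment pays for leading gaps, a local one does not.
    prev = [0 if local else j * gap for j in range(len(b) + 1)]
    best = 0
    for i in range(1, len(a) + 1):
        cur = [0 if local else i * gap]
        for j in range(1, len(b) + 1):
            diag = prev[j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            score = max(diag, prev[j] + gap, cur[j - 1] + gap)
            if local:
                score = max(score, 0)   # a local alignment may restart anywhere
                best = max(best, score)
            cur.append(score)
        prev = cur
    return best if local else prev[-1]


# Hypothetical sequences: a shared 8 bp block with unrelated flanks.
seq_a = "TTTTACGTACGTCCCC"
seq_b = "GGGGACGTACGTAAAA"
local_score = alignment_score(seq_a, seq_b, local=True)
global_score = alignment_score(seq_a, seq_b, local=False)
print("local:", local_score, "global:", global_score)
```

The local score rewards the conserved block alone, while the global score is pulled down by the unrelated flanking sequence that must also be aligned.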
Reference Sequence Idea
– A sequence is fixed as the reference to which all other sequences are compared
S1: A T G C T C
S2: A G A G C
S3: T T C T G
S4: A T T G C A T G C

S1: A T - G C - T - C
S2: A - - G A - G - C
S3: - T - T C - T - G
S4: A T T G C A T G C

S1: A T G C T C
S2: A - G A G C

S1: A T G C T C
S2: A - G A G C
S3: - T T C T G
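The reference-sequence idea can be sketched as a projection of each pairwise alignment onto the coordinates of the reference. The sketch below assumes that S4 plays the role of the reference, since it is the only ungapped row in the four-sequence alignment above. The function name, the choice to drop columns where the reference has a gap, and the extra sequence S5 are all hypothetical and only serve the illustration.

```python
def stack_on_reference(reference, pairwise):
    """Stack pairwise alignments into rows that share the reference coordinates.

    pairwise maps a sequence name to (aligned_reference, aligned_other), two
    gapped strings of equal length. Columns where the reference has a gap
    (insertions in the other sequence) are dropped, which is one simple
    convention assumed here.
    """
    rows = {}
    for name, (ref_aln, other_aln) in pairwise.items():
        if len(ref_aln) != len(other_aln):
            raise ValueError(f"{name}: aligned strings differ in length")
        if ref_aln.replace("-", "") != reference:
            raise ValueError(f"{name}: alignment does not cover the reference")
        rows[name] = "".join(o for r, o in zip(ref_aln, other_aln) if r != "-")
    return rows


reference = "ATTGCATGC"  # S4 from the slide
pairwise = {
    "S1": ("ATTGCATGC", "AT-GC-T-C"),
    "S2": ("ATTGCATGC", "A--GA-G-C"),
    "S3": ("ATTGCATGC", "-T-TC-T-G"),
    # Hypothetical S5 with one extra base relative to the reference.
    "S5": ("ATTG-CATGC", "ATTGGCATGC"),
}
stacked = stack_on_reference(reference, pairwise)
for name, row in stacked.items():
    print(f"{name}: {row}")
print(f"S4: {reference}")
```

Every row ends up with exactly as many columns as the reference, so the rows can be drawn one above the other without ever aligning the non-reference sequences to each other.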
BlastZ: Improved pairwise alignment of Genomic Sequences
Nucleotide local alignment program developed by Webb Miller's group (http://www.bx.psu.edu/miller_lab/)
BlastZ computes local alignments for sequences of any length based on the assumption that the input sequences are related and share blocks of high conservation that are separated by regions that lack homology and vary in length in the two sequences.
Penalizes gaps using a large gap-opening penalty and small gap-extension penalty, to reduce the over-penalization of longer gaps
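A gap penalty with a large opening cost and a small extension cost is usually called an affine gap penalty. The sketch below shows why it favours one long gap over many short ones. The function name and the numeric values are assumptions chosen for illustration and should not be read as the program's actual settings.

```python
def affine_gap_penalty(length, gap_open=400, gap_extend=30):
    """Cost of a single gap of `length` bases under an affine scheme.

    The opening cost is paid once per gap and the extension cost once per
    gapped base. Both values are illustrative assumptions.
    """
    if length < 0:
        raise ValueError("gap length must be non-negative")
    if length == 0:
        return 0
    return gap_open + gap_extend * length


one_long_gap = affine_gap_penalty(10)        # a single 10 bp gap
ten_short_gaps = 10 * affine_gap_penalty(1)  # ten separate 1 bp gaps
print(one_long_gap, ten_short_gaps)
```

With these values the single 10 bp gap costs 700 while ten 1 bp gaps cost 4300, so an alignment that concentrates missing bases in one indel is strongly preferred.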
zPicture is a web server for aligning two sequences with BlastZ:
“Sliding window” to measure sequence conservation (default window size 100 bp)
Graphical presentation of sequence conservation as a “peaks-and-valleys” curve
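The sliding-window measure can be sketched as the percentage of identical columns inside each window of a pairwise alignment. This is a minimal sketch under stated assumptions: the function name is hypothetical, windows are counted over alignment columns rather than over base-sequence coordinates, and the exact way the web server computes its curve is not described in the slides.

```python
def sliding_identity(aligned_ref, aligned_other, window=100):
    """Percent identity for every full window along a pairwise alignment.

    Inputs are two gapped strings of equal length. A column counts as
    identical when both rows carry the same base; gap columns never match.
    Returns one value per window start (empty if the alignment is shorter
    than the window).
    """
    if len(aligned_ref) != len(aligned_other):
        raise ValueError("aligned strings must have the same length")
    if window <= 0:
        raise ValueError("window must be positive")
    matches = [int(r == o and r != "-") for r, o in zip(aligned_ref, aligned_other)]
    profile = []
    running = sum(matches[:window])
    for start in range(len(matches) - window + 1):
        profile.append(100.0 * running / window)
        if start + window < len(matches):
            # Slide by one column: add the entering column, drop the leaving one.
            running += matches[start + window] - matches[start]
    return profile


# Toy alignment with a small window so the values are easy to follow.
toy_profile = sliding_identity("ACGTACGT", "ACGTTTTA", window=4)
print(toy_profile)
```

Plotting such a profile against the position of each window gives the kind of curve described above, with peaks over conserved blocks and valleys between them.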
[Figure: zPicture conservation plot. x-axis: base sequence coordinates; y-axis: % identity; annotation: >70% identity.]
http://dcode.org/
(A) Standard stacked-pairwise visualization (smooth graph) of Mulan alignments of the NOS-2 gene promoter. The human sequence (from -10 kb to +1 kb) was selected as the reference species. Repeats were masked in all species with RepeatMasker (Mulan settings); green regions in the base sequence indicate the human repeats. The graphical representations of the other sequences are displayed according to their similarity to the base sequence: the closer they are to human, the higher the conservation (top sequences are less conserved). The parameters selected for detection of evolutionarily conserved regions (ECRs) were a minimum length of 90 bp and a minimum similarity of 65% (50% bottom cut-off). Red indicates regions that are upstream of the transcription start site; pink regions are downstream of it. Two conserved motifs in rodent NOS-2 promoters indicate the presence of distal and fragmented sequences that are very similar to the unique enhancer region conferring NF-κB regulation in human NOS-2. (B) A schematic representation of the hypothetical translocation of these sequences in human and rodents; double-headed arrows indicate the positional translocation.
Rico et al. BMC Genomics 2007 8:271 doi:10.1186/1471-2164-8-271
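The ECR parameters quoted in the caption (minimum length and minimum similarity) can be read as a filter over a per-position identity profile. The sketch below is one possible interpretation and is not taken from the Mulan implementation: the function name, the half-open interval convention and the toy profile are all assumptions.

```python
def find_ecrs(identity_profile, min_identity=65.0, min_length=90):
    """Report runs where identity stays at or above a threshold.

    identity_profile holds one percent-identity value per position. Returns
    half-open (start, end) intervals that span at least min_length positions.
    The defaults mirror the caption; the run-based definition is an assumption.
    """
    ecrs = []
    start = None
    for pos, value in enumerate(identity_profile):
        if value >= min_identity:
            if start is None:
                start = pos          # a candidate region opens here
        elif start is not None:
            if pos - start >= min_length:
                ecrs.append((start, pos))
            start = None
    # Close a region that runs to the end of the profile.
    if start is not None and len(identity_profile) - start >= min_length:
        ecrs.append((start, len(identity_profile)))
    return ecrs


# Toy profile: one long conserved run and one run that is too short.
toy = [50.0] * 10 + [80.0] * 100 + [60.0] * 5 + [70.0] * 20
toy_ecrs = find_ecrs(toy)
print(toy_ecrs)
```

In this toy profile only the 100-position run passes both thresholds; the 20-position run is similar enough but too short to be reported.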