Repeats in the Genome Lecture 11/2

Repeats in the Genome Lecture 11/2. Repeats in the genome Interspersed repeats Tandem repeats –Microsatellites –Minisatellites –Satellites.

Mar 28, 2015


Dulce Bafford

Repeats in the Genome

Lecture 11/2


Repeats in the genome

• Interspersed repeats

• Tandem repeats
– Microsatellites
– Minisatellites
– Satellites


http://mcb1.ims.abdn.ac.uk/djs/web/lectures/repeats1.html#anchor10305


Large repeats: Transposons

• “Transposable elements” (TEs)
– Sequences that get moved or copied into different loci in the genome

• P elements in Drosophila: genes piggybacked on transposons and inserted into the genome, in the lab
– “transgenic fruit flies”


Transposons

http://nitro.biosci.arizona.edu/courses/EEB600A-2003/lectures/lecture26/lecture26.html



Retrotransposons: 2 examples

• SINEs: Short Interspersed Nuclear Elements
– 100-500 bp; up to 1M copies
– Non-autonomous
– Example: “Alu” repeats
– 13% of human genome

• LINEs: Long Interspersed Nuclear Elements
– Up to 7 kbp long; 4,000-100,000 copies
– Autonomous
– Examples: LINE1, LINE2, LINE3
– 21% of human genome


Functions of interspersed repeats

• May cause disruptions, disease
– Colorectal cancer

• Role in evolution of new genes

• Function of SINEs and LINEs not fully known
– Selfish DNA?

• Parasitic elements akin to viruses


RepeatMasker

• Program to detect and mask interspersed repeats in a sequence

• Also finds low complexity sequences and masks them

• Can work with a library of known repeats
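To make "masking" concrete, here is a minimal sketch of what a masking step does once repeat locations are known. It is not RepeatMasker's code or interface: the function name `mask_intervals` and the half-open interval format are assumptions made for illustration, and the interval list stands in for whatever a repeat finder reports.

```python
def mask_intervals(seq, intervals, soft=False):
    """Mask half-open [start, end) intervals of a DNA sequence.

    Hard masking replaces the bases with 'N'; soft masking lowercases them.
    `intervals` is a hypothetical stand-in for the output of a repeat finder.
    """
    chars = list(seq.upper())
    for start, end in intervals:
        # Clamp to the sequence so out-of-range intervals are harmless.
        for i in range(max(start, 0), min(end, len(chars))):
            chars[i] = chars[i].lower() if soft else "N"
    return "".join(chars)


print(mask_intervals("ACGTACGTAC", [(2, 5)]))             # hard mask
print(mask_intervals("ACGTACGTAC", [(2, 5)], soft=True))  # soft mask
```

Hard masking hides the repeat from downstream tools entirely, while soft masking keeps the bases readable and lets each tool decide how to treat them.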


Tandem Repeats

• Satellites
– In centromeres and telomeres
– Repeating pattern 1 bp to 1000s of bp long

• Mini- and micro-satellites
– Simple, small sequence repeats


Microsatellite

541 gagccactag tgcttcattc tctcgctcct actagaatga acccaagatt gcccaggccc
601 aggtgtgtgt gtgtgtgtgt gtgtgtgtgt gtgtgtgtgt gtatagcaga gatggtttcc
661 taaagtaggc agtcagtcaa cagtaagaac ttggtgccgg aggtttgggg tcctggccct
721 gccactggtt ggagagctga tccgcaagct gcaagacctc tctatgcttt ggttctctaa
781 ccgatcaaat aagcataagg tcttccaacc actagcattt ctgtcataaa atgagcactg
841 tcctatttcc aagctgtggg gtcttgagga gatcatttca ctggccggac cccatttcac

A microsatellite in a dog (Canis familiaris) gene

http://www.bioinfo.rpi.edu/~bystrc/courses/biol4540/lecture24/lec24.pdf

• 1-5bp repeating pattern
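A short repeating unit of this kind can be picked out with a regular expression using a backreference. The sketch below only finds exact repeats, so it will miss copies that have mutated; the function name and the default thresholds (`max_unit=5`, `min_copies=4`) are illustrative assumptions, not values from the lecture.

```python
import re


def find_microsatellites(seq, max_unit=5, min_copies=4):
    """Return (start, unit, copies) for exact tandem repeats of short units.

    The unit is matched lazily, so the shortest repeating unit is preferred
    (for example 'gt' rather than 'gtgt').
    """
    pattern = re.compile(r"([acgt]{1,%d}?)\1{%d,}" % (max_unit, min_copies - 1))
    hits = []
    for m in pattern.finditer(seq.lower()):
        unit = m.group(1)
        hits.append((m.start(), unit, len(m.group(0)) // len(unit)))
    return hits


print(find_microsatellites("accgtgtgtgtgtgttta"))
```

Whitespace and position numbers from a sequence listing would need to be stripped before a sequence is passed in.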


Microsatellites

• Copy numbers variable across individuals

• Associated with human diseases
– Fragile X syndrome, Huntington’s disease, Myotonic dystrophy

• Can be used for genetic fingerprinting & paternity tests, due to high variability


Minisatellites

  1 tgattggtct ctctgccacc gggagatttc cttatttgga ggtgatggag gatttcagga
 61 tttgggggat tttaggatta taggattacg ggattttagg gttctaggat tttaggatta
121 tggtatttta ggatttactt gattttggga ttttaggatt gagggatttt agggtttcag
181 gatttcggga tttcaggatt ttaagttttc ttgattttat gattttaaga ttttaggatt
241 tacttgattt tgggatttta ggattacggg attttagggt ttcaggattt cgggatttca
301 ggattttaag ttttcttgat tttatgattt taagatttta ggatttactt gattttggga
361 ttttaggatt acgggatttt agggtgctca ctatttatag aactttcatg gtttaacata
421 ctgaatataa atgctctgct gctctcgctg atgtcattgt tctcataata cgttcctttg

Consensus AGGATTTT

• 6-20 bp repeating pattern


Minisatellites

• Highly polymorphic across individuals
– Used for DNA fingerprinting

• Regulation of gene expression


Recognizing repeat sequences

“Dot plots”

Self-similarity
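A self dot plot can be sketched in a few lines: mark every pair of positions whose k-mers are identical. In such a plot, a tandem repeat of period p shows up as a diagonal running parallel to the main diagonal at offset p. The function names and the choice `k=3` below are assumptions for illustration only.

```python
from collections import defaultdict


def self_dot_plot(seq, k=3):
    """Return sorted (i, j) pairs, i < j, where the k-mers at i and j are identical."""
    positions = defaultdict(list)
    for i in range(len(seq) - k + 1):
        positions[seq[i:i + k]].append(i)
    dots = []
    for occ in positions.values():
        for a in range(len(occ)):
            for b in range(a + 1, len(occ)):
                dots.append((occ[a], occ[b]))
    return sorted(dots)


def print_dot_plot(seq, k=3):
    """Print a text dot plot of seq against itself (main diagonal and upper triangle)."""
    dots = set(self_dot_plot(seq, k))
    n = len(seq) - k + 1
    for i in range(n):
        print("".join("*" if (i == j or (i, j) in dots) else "." for j in range(n)))


print(self_dot_plot("acgtacgt"))
print_dot_plot("acgtacgtacgt")
```

Both dots reported for "acgtacgt" lie at offset 4, the period of the repeat.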


Tandem repeat detection

• Have to account for approximate tandem repeats
– Repeating unit may not be exactly the same (mutations)
– May not be exactly in tandem (indels)


TRF (Benson)

• Assume > 80% sequence identity on average

• Assume < 10% rate of indels

• Basic idea

T A T A C G T C G A G A C T T A T C C A C G G A G A T A T T T A
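One way to picture the k-tuple matching idea is to record, for each k-tuple, how far back its previous occurrence lies. In a tandem repeat of period d, the distance d is heavily over-represented. This is a simplified sketch of that idea and not Benson's TRF implementation; the function name and `k=3` are assumptions.

```python
from collections import Counter


def ktuple_distances(seq, k=3):
    """Count distances between each k-tuple and its most recent earlier occurrence."""
    last_seen = {}
    distances = Counter()
    for i in range(len(seq) - k + 1):
        word = seq[i:i + k]
        if word in last_seen:
            distances[i - last_seen[word]] += 1
        last_seen[word] = i  # remember the latest position of this k-tuple
    return distances


print(ktuple_distances("acgtt" * 4))
```

For four exact copies of "acgtt", every recorded distance equals the period 5. Mutations would remove some of these matches, and indels would shift some of them to neighbouring distances.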


Statistical criteria

• The candidate tandem repeat converted into a Bernoulli (head/tail) sequence

• Assess significance of this sequence, assuming a probabilistic model

CCACAACC-CGTCAGGCAAGT

CTGCACCATCGTCTGGGAAGT

HTTHHTHTTHHHHTHHTHHHH
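The conversion shown above can be written directly: each aligned column becomes H when the two copies agree and T when they differ or when one of them has a gap. The function name is an assumption made for this sketch.

```python
def to_bernoulli(top, bottom):
    """Map an alignment of two adjacent repeat copies to H (match) / T (mismatch or gap)."""
    if len(top) != len(bottom):
        raise ValueError("aligned rows must have equal length")
    return "".join(
        "H" if a == b and a != "-" else "T" for a, b in zip(top, bottom)
    )


flips = to_bernoulli("CCACAACC-CGTCAGGCAAGT", "CTGCACCATCGTCTGGGAAGT")
print(flips)             # HTTHHTHTTHHHHTHHTHHHH
print(flips.count("H"))  # 14 matching columns out of 21
```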

Page 21: Repeats in the Genome Lecture 11/2. Repeats in the genome Interspersed repeats Tandem repeats –Microsatellites –Minisatellites –Satellites.

Statistical criteria

• Sequence of length 100, with p_H = 0.75
• ≥ 95% of the time, total number of heads is ≥ 68
• ≥ 95% of the time, total number of heads in runs of length 5 or more is ≥ 26
• We are counting only head-runs of length k or more
• This tells us what would be a significant number of heads
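The first threshold can be checked with an exact binomial tail, assuming the 100 positions are independent coin flips with head probability 0.75. The sketch below also includes a small helper that counts heads lying in runs of at least k; the function names are assumptions, and the run-based threshold of 26 is not re-derived here, since that needs a simulation or a recursion over run lengths.

```python
from math import comb


def prob_at_least(n, p, m):
    """P(X >= m) for X ~ Binomial(n, p)."""
    return sum(comb(n, x) * p**x * (1 - p) ** (n - x) for x in range(m, n + 1))


def largest_threshold(n, p, level=0.95):
    """Largest m such that P(X >= m) >= level."""
    m = n
    while m > 0 and prob_at_least(n, p, m) < level:
        m -= 1
    return m


def heads_in_long_runs(flips, k):
    """Count heads that lie in runs of at least k consecutive 'H'."""
    return sum(len(run) for run in flips.split("T") if len(run) >= k)


print(largest_threshold(100, 0.75))              # 68
print(heads_in_long_runs("HHHHHTHHTHHHHHH", 5))  # 11
```

Under this model, P(X ≥ 68) is roughly 0.955 while P(X ≥ 69) falls to roughly 0.93, which is why 68 is the cut-off at the 95% level.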

Page 22: Repeats in the Genome Lecture 11/2. Repeats in the genome Interspersed repeats Tandem repeats –Microsatellites –Minisatellites –Satellites.

Statistical criteria

• Due to indels, a repeating pattern of size d may induce exact-matching k-tuples separated by d, d±1, d±2, etc.

• Consider all such pairs, up to d±dmax

• dmax calculated using an assumption about p_I (the indel frequency) and a random-walk model
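To give a feel for how a random-walk model yields a bound like dmax, here is a deliberately simplified version; it is an assumption made for illustration and not necessarily the formula TRF uses. Suppose each of the d positions in a copy independently carries an indel with probability p_I, with insertions and deletions equally likely, so each indel shifts the match distance by +1 or -1. The total shift then has mean 0 and variance p_I·d, and a normal approximation puts about 95% of shifts within 1.96·sqrt(p_I·d).

```python
import math


def approx_dmax(d, p_indel=0.1, z=1.96):
    """Rough bound on the drift in k-tuple distance for a repeat of period d.

    Assumes a symmetric +/-1 random walk with step probability p_indel per
    position; z = 1.96 is the two-sided 95% normal quantile. This is a
    simplified model, not a formula taken from the lecture or from TRF.
    """
    return math.floor(z * math.sqrt(p_indel * d))


for d in (10, 100, 500):
    print(d, approx_dmax(d))
```

The bound grows with the square root of the period, so long periods need a wider range of distances checked than short ones.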

Page 23: Repeats in the Genome Lecture 11/2. Repeats in the genome Interspersed repeats Tandem repeats –Microsatellites –Minisatellites –Satellites.

Statistical criteria

• Other criteria to
– distinguish tandem repeats from non-tandem direct repeats
  • matching k-tuples biased on one side
– pick tuple sizes

Page 24: Repeats in the Genome Lecture 11/2. Repeats in the genome Interspersed repeats Tandem repeats –Microsatellites –Minisatellites –Satellites.

Mreps (another program)

• Different algorithm to detect repeats
• Maximal run of k-mismatch tandem repeats, with period p:
– A maximal string such that any substring of length 2p is a tandem repeat with at most k mismatches
– All such maximal runs can be computed in time O(nk log(k)), where n is the length of the sequence
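The definition can be tested naively on a candidate string: slide a window of length 2p along it and compare the two halves of each window. The sketch below checks only the "every window" condition, not maximality, and runs in time proportional to n·p rather than the O(nk log(k)) bound quoted above. It is not the mreps algorithm, and the function names are assumptions.

```python
def hamming(a, b):
    """Number of positions at which two equal-length strings differ."""
    return sum(x != y for x, y in zip(a, b))


def is_k_mismatch_run(s, p, k):
    """True if every window of length 2p in s has halves differing in at most k positions."""
    if len(s) < 2 * p:
        return False
    return all(
        hamming(s[i:i + p], s[i + p:i + 2 * p]) <= k
        for i in range(len(s) - 2 * p + 1)
    )


print(is_k_mismatch_run("acgacgacg", 3, 0))  # exact repeat
print(is_k_mismatch_run("acgacgatg", 3, 0))  # one substitution breaks the exact run
print(is_k_mismatch_run("acgacgatg", 3, 1))  # but it is tolerated with k = 1
```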

Page 25: Repeats in the Genome Lecture 11/2. Repeats in the genome Interspersed repeats Tandem repeats –Microsatellites –Minisatellites –Satellites.

Mreps: Statistical criteria

• Two reasons for insignificance
– Short length
  • Reject runs of length < p+9
– Too many mismatches
  • Create “random” DNA sequences, and infer quality filter based on this

Page 26: Repeats in the Genome Lecture 11/2. Repeats in the genome Interspersed repeats Tandem repeats –Microsatellites –Minisatellites –Satellites.

Gene Duplications

• If a region containing a gene is duplicated, a new copy of gene is created: paralogs

• Eases up the “selective pressure” on one of the copies
– Free exploration of sequence space

• Cases of entire genomes being duplicated
– Yeast, wheat

Page 27: Repeats in the Genome Lecture 11/2. Repeats in the genome Interspersed repeats Tandem repeats –Microsatellites –Minisatellites –Satellites.

Pseudogenes

• Upon gene duplication, one of the two copies may gather a deleterious mutation
– Example: premature “stop codon”

• Once the gene “dies” in this fashion, no more selective pressure on it. Such a “dead” copy of a gene is a “pseudogene”

Page 28: Repeats in the Genome Lecture 11/2. Repeats in the genome Interspersed repeats Tandem repeats –Microsatellites –Minisatellites –Satellites.

Pseudogenes

• Any sequence that appears to code for a gene product, but does not do so

• Origins of pseudogenes
– Gene duplication
– Change of environment, gene no longer needed
– Portion of mRNA transcript reverse-transcribed and inserted into genome

• Create problems for genome study
– Mis-annotated as genes

Page 29: Repeats in the Genome Lecture 11/2. Repeats in the genome Interspersed repeats Tandem repeats –Microsatellites –Minisatellites –Satellites.

Pseudogenes

• Pseudogenes mutate at “neutral” rate, free of any selective pressures

• Can be used for evolutionary analysis

• Example:
– In Drosophila, insertions:deletions in the ratio of 1:8, based on study of pseudogenes

Page 30: Repeats in the Genome Lecture 11/2. Repeats in the genome Interspersed repeats Tandem repeats –Microsatellites –Minisatellites –Satellites.

Tandem Repeats and Binding Sites

• Regulatory modules have 20-40% coverage by tandem repeats
– Based on a study on Drosophila
– Very significant statistically, if assuming low-order Markov background

• Relation between tandem repeats and binding sites ?

Page 31: Repeats in the Genome Lecture 11/2. Repeats in the genome Interspersed repeats Tandem repeats –Microsatellites –Minisatellites –Satellites.

Tandem Repeats and Binding Sites

• Possibility: Tandem repeats help in creating duplicates of binding sites

• Multiple copies of binding site
– Helps exploring new binding sites
– Helps fine-tune binding affinity

• Faster evolution ?

Page 32: Repeats in the Genome Lecture 11/2. Repeats in the genome Interspersed repeats Tandem repeats –Microsatellites –Minisatellites –Satellites.

Implications for regulatory sequence analysis

• Regulatory sequence modeled as a mixture of motif and non-motif “background”

• Background typically a Markov chain of fixed order
– Given the last k bases, S[i..i+k-1], the next base S[i+k] is drawn from a fixed probability distribution
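A fixed-order Markov background of this kind can be estimated by counting which base follows each k-base context. The sketch below is one plain way to do it, assuming an uppercase ACGT-only sequence; the function names, the pseudocount of 1 and the uniform fallback for unseen contexts are assumptions, not details from the lecture.

```python
import math
from collections import defaultdict


def train_markov(seq, k=2, alphabet="ACGT", pseudocount=1.0):
    """Estimate P(next base | previous k bases) with pseudocount smoothing."""
    counts = defaultdict(lambda: {b: pseudocount for b in alphabet})
    for i in range(len(seq) - k):
        counts[seq[i:i + k]][seq[i + k]] += 1
    model = {}
    for context, row in counts.items():
        total = sum(row.values())
        model[context] = {b: c / total for b, c in row.items()}
    return model


def log_likelihood(seq, model, k=2, alphabet="ACGT"):
    """Log-probability of seq[k:] given its first k bases; unseen contexts are uniform."""
    uniform = 1.0 / len(alphabet)
    ll = 0.0
    for i in range(len(seq) - k):
        row = model.get(seq[i:i + k])
        ll += math.log(row[seq[i + k]] if row else uniform)
    return ll


model = train_markov("ACACACAC", k=1)
print(model["A"]["C"])  # 0.625: C follows A four times, plus pseudocounts
print(log_likelihood("ACAC", model, k=1))
```

Because each base depends only on the k bases before it, a model like this has no way to say "copy the unit seen one period ago", so a tandem repeat longer than k looks far less likely under it than it really is.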


Tandem Repeats in Model

• Tandem repeats violate Markov assumption: previous k bases S[i..i+k-1] may provide a probability distribution on next base, OR we may have a tandem repeat of previous j <= k bases

• Similarly, a binding site or a part of a binding site may also be tandem repeated


Tandem Repeats in Model

• Need to modify the probabilistic model to include tandem repeats

• Research topic