Computational Molecular Biology, MPI for Molecular Genetics: DNA sequence analysis and gene prediction (gene prediction methods, gene indices, mapping cDNA on genomic DNA, applications)

Dec 18, 2015

Computational Molecular Biology
MPI for Molecular Genetics

Page 1: DNA sequence analysis: Gene prediction

Gene prediction methods

Gene indices

Mapping cDNA on genomic DNA

Applications

Page 2: DNA sequence analysis: Gene prediction

[Figure: structure of a gene, with labels promoter, 5' UTR, exon 1, exon 2, ..., exon n-1, exon n, 3' UTR, and protein-coding sequence]

Page 3: Gene prediction: Strategies for detecting ORFs / exons

Distribution of stop codons

Codon usage

Hexamer frequencies

Prediction of the coding frame

Splice site recognition (eukaryotes only)
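The first strategy, the distribution of stop codons, can be illustrated with a small sketch. It scans the three forward reading frames and reports stretches that contain no stop codon. The function name `open_reading_frames` and the `min_codons` threshold are illustrative assumptions, not part of the slides, and the reverse strand is ignored for brevity.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}


def open_reading_frames(seq, min_codons=3):
    """Return (frame, start, end) for stop-free stretches on the forward strand.

    Coordinates are 0-based and end-exclusive; frame is 0, 1 or 2.
    min_codons is an arbitrary illustrative threshold.
    """
    seq = seq.upper()
    orfs = []
    for frame in range(3):
        run_start = frame
        for pos in range(frame, len(seq) - 2, 3):
            if seq[pos:pos + 3] in STOP_CODONS:
                # A stop codon closes the current stop-free stretch.
                if (pos - run_start) // 3 >= min_codons:
                    orfs.append((frame, run_start, pos))
                run_start = pos + 3
        # Stretch that runs to the last complete codon of the sequence.
        last = frame + ((len(seq) - frame) // 3) * 3
        if (last - run_start) // 3 >= min_codons:
            orfs.append((frame, run_start, last))
    return orfs


example = "ATGAAACCCGGGTAAATG"
print(open_reading_frames(example))
```

Long stop-free stretches are only candidates: in random sequence a stop codon is expected roughly every 21 codons, so length alone is weak evidence and is combined with the other criteria on this slide.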

Page 4: Gene prediction: Codon usage (single exon)

[Figure: codon usage in Frame 1, Frame 2 and Frame 3; regions labeled coding and non-coding]

Page 5: Gene prediction: Codon usage (single exon)

[Figure: codon usage in Frame 1, Frame 2 and Frame 3; regions labeled coding and non-coding; the correct start and the coding sequence are marked]
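One way to make the codon usage comparison between frames concrete is a log-odds score per frame. The sketch below is a minimal illustration under stated assumptions: the codon table is estimated from a toy training sequence, the background model is uniform (1/64 per codon), and the function names `codon_frequencies` and `frame_scores` are hypothetical. It is not the scoring scheme used to produce the plots on the slides.

```python
import math
from collections import Counter

BASES = "ACGT"
ALL_CODONS = [a + b + c for a in BASES for b in BASES for c in BASES]


def codon_frequencies(coding_seqs, pseudocount=1.0):
    """Estimate codon frequencies from in-frame coding sequences."""
    counts = Counter()
    for seq in coding_seqs:
        for i in range(0, len(seq) - 2, 3):
            counts[seq[i:i + 3]] += 1
    total = sum(counts[c] for c in ALL_CODONS) + pseudocount * len(ALL_CODONS)
    return {c: (counts[c] + pseudocount) / total for c in ALL_CODONS}


def frame_scores(seq, freqs):
    """Mean log-odds per codon (coding model vs. uniform) for frames 0, 1, 2."""
    scores = []
    for frame in range(3):
        codons = [seq[i:i + 3] for i in range(frame, len(seq) - 2, 3)]
        codons = [c for c in codons if c in freqs]  # skip codons with N etc.
        if not codons:
            scores.append(0.0)
            continue
        scores.append(sum(math.log(freqs[c] * 64) for c in codons) / len(codons))
    return scores


# Toy training data (an assumption for illustration only).
freqs = codon_frequencies(["GCTGCTGCTGAAGAAGCT" * 5])
scores = frame_scores("GCTGAAGCTGCTGAA", freqs)
print("best frame:", scores.index(max(scores)))
```

In practice the score would be computed in a sliding window along the sequence, so that the frame with the highest score can change from region to region.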

Page 6: Gene prediction: Codon usage (multiple exons)

[Figure: codon usage in Frame 1, Frame 2 and Frame 3; labels: coding, non-coding, splice sites]

Exons: 208..295, 1029..1349, 1500..1688, 2686..2934, 3326..3444, 3573..3680, 4135..4309, 4708..4846, 4993..5096, 7301..7389, 7860..8013, 8124..8405, 8553..8713, 9089..9225, 13841..14244
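Reading the exon list above as fifteen start..end pairs, the exon lengths and the introns between them follow by simple arithmetic. The sketch assumes 1-based, inclusive coordinates, which is an interpretation of the slide text rather than something it states.

```python
# Exon coordinates as read from the slide (assumed 1-based, inclusive).
exons = [
    (208, 295), (1029, 1349), (1500, 1688), (2686, 2934), (3326, 3444),
    (3573, 3680), (4135, 4309), (4708, 4846), (4993, 5096), (7301, 7389),
    (7860, 8013), (8124, 8405), (8553, 8713), (9089, 9225), (13841, 14244),
]


def exon_lengths(exons):
    """Length of each exon in bp for inclusive coordinates."""
    return [end - start + 1 for start, end in exons]


def introns(exons):
    """Intervals between consecutive exons, also 1-based and inclusive."""
    return [(prev_end + 1, next_start - 1)
            for (_, prev_end), (next_start, _) in zip(exons, exons[1:])]


print(len(exons), "exons,", sum(exon_lengths(exons)), "bp of exon sequence")
print("first intron:", introns(exons)[0])
```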

Page 7: Gene prediction: Codon usage (multiple exons)

[Figure: codon usage in Frame 1, Frame 2 and Frame 3; labels: coding, non-coding, splice sites; exon list repeated from page 6]

Page 8: Gene prediction: Additional criteria

Detection of start codons

Detection of potential promoter elements

Detection of repetitive sequences (mostly untranslated)

Homology to known genes of related organisms

Page 9: Gene prediction: Software

GENSCAN (C. Burge & S. Karlin)

Grail (neural network; Uberbacher et al.)

MZEF (M. Zhang, 1997)

FGeneH, Hexon (V. Solovyev et al., 1994)

Genie, etc.

All programs use dynamic programming to find the optimal solution.
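To give a feel for what "dynamic programming for the optimal solution" means here, the following sketch picks the highest-scoring set of non-overlapping candidate exons. It is weighted interval scheduling, a textbook dynamic program chosen for illustration; it is not the model of any program listed above, which additionally handle reading-frame consistency, splice signals and more. The candidate scores and the name `best_exon_chain` are hypothetical.

```python
from bisect import bisect_right


def best_exon_chain(candidates):
    """Pick non-overlapping (start, end, score) candidates with maximal total score.

    Coordinates are integers with inclusive ends. Returns (total_score, chain).
    """
    cands = sorted(candidates, key=lambda c: c[1])
    ends = [c[1] for c in cands]
    # best[i] is the optimum over the first i candidates (sorted by end).
    best = [(0.0, [])]
    for i, (start, end, score) in enumerate(cands):
        # j = number of earlier candidates that end before this one starts.
        j = bisect_right(ends, start - 1, 0, i)
        take = (best[j][0] + score, best[j][1] + [(start, end, score)])
        skip = best[i]
        best.append(take if take[0] > skip[0] else skip)
    return best[-1]


candidates = [(1, 100, 5.0), (50, 150, 6.0), (120, 200, 4.0), (160, 260, 3.0)]
total, chain = best_exon_chain(candidates)
print(total, chain)
```

The key property is that each prefix of the sorted candidates is solved once and reused, so the optimum is found without enumerating all combinations.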

Page 10: DNA sequences in public databases

H: ~4 million ESTs + 130 000 RNAs

Mouse: ~2.7 million ESTs + 30 000 RNAs

Page 11: Expressed sequence tags (EST)

[Figure: mRNA with poly(A) tail (AAAAAA...), oligo-dT primer (TTTTTT...), and the resulting cDNA]

cDNA synthesis is usually primed with oligo-dT or with random primers

Reverse transcriptase stops 'randomly'

Several cDNAs may be generated for the same mRNA

Page 12: Expressed sequence tags (EST)

[Figure: cDNA clone in a vector. Labels: average 1500 bp; <700 bp; vector (known sequence); clone = mRNA fragment; deciphered sequence (EST); 3' primer]

Page 13: Expressed sequence tags (EST)

Isolation of mRNAs from tissue(s)

Generation of cDNAs reflecting parts of the RNAs

Cloning of cDNAs into a vector (often random orientation)

End sequencing of the clones

Page 14: Generation of ESTs: Basecalling problems

[Figure: basecalling problems close to the 3' end of an EST and close to the 5' end of an EST; label: missing bases]

Page 15: Coverage of an mRNA by ESTs

[Figure: putative mRNA (5' UTR, exon 1, exon 2, 3' UTR, AAAAAA...) and the expressed sequence tags (ESTs) covering it]

Page 16: Characteristics of ESTs

Highly redundant

Low sequence quality

(Cheap)

Reflect expressed genes

May be tissue/stage specific

Page 17: Gene indices

UniGene (NCBI)

TIGR Gene Indices

STACK (SANBI)

GeneNest (DKFZ, MPI)

Clustering of EST and mRNA sequences of an organism to reduce redundancy in sequence data.

Goal: Each cluster represents one gene or mRNA

Page 18: Gene indices: GeneNest workflow

EMBL database, UniGene database

Quality clipping (for each input database)

BLAST/QUASAR search, clustering

Assembly, consensus sequences

Visualization

Page 19: Gene indices: Quality clipping

In order to cluster based on gene-specific sequence data, the following steps have to be performed:

Removal of vector sequence

Masking of repetitive sequences (e.g. Alu)

Removal of terminal sequences of low quality
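The removal of low-quality terminal sequence named on this slide can be sketched as follows. The example assumes per-base quality values on a Phred-like scale and a cutoff of 20; both are assumptions for illustration, since the slides do not say how quality is measured, and `clip_low_quality_ends` is a hypothetical name.

```python
def clip_low_quality_ends(seq, quals, min_qual=20):
    """Trim both ends of a read up to the first and last base with quality >= min_qual.

    Returns (clipped_sequence, start, end) with 0-based, end-exclusive coordinates.
    An empty string is returned if no base passes the threshold.
    """
    good = [i for i, q in enumerate(quals) if q >= min_qual]
    if not good:
        return "", 0, 0
    start, end = good[0], good[-1] + 1
    return seq[start:end], start, end


read = "ACGTACGTAC"
quals = [5, 8, 25, 30, 30, 12, 28, 30, 9, 4]
print(clip_low_quality_ends(read, quals))
```

Only the ends are trimmed; an isolated low-quality base inside the read is kept. Production tools typically use a windowed average instead of single-base thresholds.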

Page 20: Gene indices: Clustering

Sequences are usually clustered if the matching part between two sequences fulfills several (empirical) criteria:

Minimal % identity (e.g. >95%)

Minimal length of match (e.g. >40 bp)

No internal matches (TIGR Gene Indices)

Same origin of tissue (only STACK)
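The first two criteria on this slide can be turned into a small single-linkage clustering sketch: two sequences are joined whenever a pairwise match exceeds both thresholds, and clusters are the connected components. The match tuples, the function name `cluster` and the use of union-find are assumptions for illustration; the slides do not say how any of the listed gene indices implement this step, and the internal-match and tissue criteria are left out.

```python
def cluster(n_sequences, matches, min_identity=95.0, min_length=40):
    """Single-linkage clustering of sequences 0..n_sequences-1.

    matches: iterable of (seq_a, seq_b, percent_identity, match_length_bp).
    A match links two sequences if identity > min_identity and length > min_length.
    """
    parent = list(range(n_sequences))

    def find(x):
        # Find the cluster representative, shortening the path on the way.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for a, b, identity, length in matches:
        if identity > min_identity and length > min_length:
            parent[find(a)] = find(b)

    groups = {}
    for i in range(n_sequences):
        groups.setdefault(find(i), []).append(i)
    return sorted(groups.values())


matches = [(0, 1, 98.0, 120), (1, 2, 96.5, 60), (2, 3, 99.0, 30), (3, 4, 90.0, 200)]
print(cluster(5, matches))
```

With single linkage, sequences 0 and 2 end up together even without a direct match between them, which is how ESTs from opposite ends of a transcript can land in one cluster.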

Page 21: Gene indices: Assembly

Sequences in a cluster are assembled to group those sequences which are globally similar, resulting in:

Contigs, reflecting parts of different transcripts

One consensus sequence per contig

A relative order of the sequences (alignment)

Page 22: Gene indices: Consensus sequences

Reduced error rate

Consensus often longer than any single contributing sequence

Efficient database search

Detection of exon/intron boundaries and alternative splice variants
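Why a consensus has a lower error rate and can be longer than any contributing sequence is easy to see in a toy majority vote. The sketch assumes the reads have already been placed in contig coordinates without insertions or deletions, which real assemblers do not require; the `(offset, sequence)` representation and the name `consensus` are hypothetical.

```python
from collections import Counter


def consensus(placed_reads):
    """Majority-vote consensus from (offset, sequence) pairs in contig coordinates.

    Assumes ungapped placement and that every contig position is covered.
    """
    columns = {}
    for offset, seq in placed_reads:
        for i, base in enumerate(seq):
            columns.setdefault(offset + i, Counter())[base] += 1
    return "".join(columns[pos].most_common(1)[0][0] for pos in sorted(columns))


# The second read carries one miscalled base (T instead of A at contig position 4).
reads = [(0, "ACGTAC"), (2, "GTTCGG"), (3, "TACGGA")]
print(consensus(reads))
```

The miscalled base is outvoted by the two other reads covering that column, and the 9-base consensus extends beyond each 6-base read.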

Page 23: Gene indices: Alignment

[Figure: alignment of the sequences in a contig, with the consensus]

Page 24: Gene indices: Alignment software

Phrap (Phil Green)

CAP3 (X. Huang)

TIGR assembler

GAP4 (R. Staden)

Page 25: GeneNest visualization (http://genenest.molgen.mpg.de)

Page 26: GeneNest visualization (http://genenest.molgen.mpg.de)

Page 27: TIGR Gene Indices (http://www.tigr.org/)

Alignment scheme

Page 28: UniGene (http://www.ncbi.nih.nlm.gov/UniGene)

Page 29: UniGene (http://www.ncbi.nih.nlm.gov/UniGene)

Page 30: Mapping of consensus sequences on genomic DNA

[Figure: consensus sequence (mRNA) mapped onto the genomic sequence; labels: exons, missing intron]

Page 31: Mapping cDNA on genomic DNA

Page 32: Gene indices: Applications

Detection of exon/intron boundaries

Detection of alternative splicing

Detection of Single Nucleotide Polymorphisms

Genome annotation

Analysis of gene expression

Genome-genome comparison

Page 33: Alternative Splicing

[Figure: one hnRNA spliced into two different mRNAs]

hnRNA: 5' UTR, exon 1, exon 2, exon 3

mRNA 1: 5' UTR, exon 1, exon 3

mRNA 2: 5' UTR, exon 1, exon 2
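The example on this slide, where exon 1 is shared and exons 2 and 3 each appear in only one mRNA, can be expressed as a comparison of exon sets. The sketch below is a minimal illustration; the dictionary layout and the name `classify_exons` are assumptions, and real data would use genomic coordinates rather than exon labels.

```python
def classify_exons(transcripts):
    """Split exons into constitutive (in every transcript) and alternative.

    transcripts: dict mapping transcript name to a list of exon labels.
    Returns (constitutive_set, alternative_dict), where alternative_dict maps
    each alternative exon to the sorted list of transcripts that contain it.
    """
    exon_sets = {name: set(exon_list) for name, exon_list in transcripts.items()}
    all_exons = set().union(*exon_sets.values())
    constitutive = set.intersection(*exon_sets.values())
    alternative = {
        exon: sorted(name for name, s in exon_sets.items() if exon in s)
        for exon in sorted(all_exons - constitutive)
    }
    return constitutive, alternative


transcripts = {
    "mRNA 1": ["exon 1", "exon 3"],
    "mRNA 2": ["exon 1", "exon 2"],
}
print(classify_exons(transcripts))
```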

Page 34: Alignment of EST consensus sequences and genomic target

[Figure: EST consensus sequences and the genomic sequence]

Page 35: Detection of the appropriate genomic target sequence

Local similarity of EST consensus and genomic DNA: >96% identity

[Figure: local matches on the genomic sequence]

Page 36: Cutting out genomic target sequence

[Figure: target region cut out of the genomic sequence]
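The two steps shown on these slides, keeping local matches above 96% identity and then cutting out the genomic region that contains them, can be sketched as below. The hit tuples, the 1000 bp flank and the name `target_region` are assumptions for illustration; the slides give only the identity threshold.

```python
def target_region(hits, genome_length, min_identity=96.0, flank=1000):
    """Return (start, end) of the genomic window spanning all accepted hits.

    hits: iterable of (genomic_start, genomic_end, percent_identity),
    0-based and end-exclusive. Hits at or below min_identity are ignored.
    Returns None if no hit passes the threshold.
    """
    good = [(start, end) for start, end, identity in hits if identity > min_identity]
    if not good:
        return None
    # Extend by a flank on both sides, clipped to the sequence boundaries.
    start = max(0, min(s for s, _ in good) - flank)
    end = min(genome_length, max(e for _, e in good) + flank)
    return start, end


hits = [(5000, 5200, 98.0), (7000, 7150, 97.5), (20000, 20100, 88.0)]
print(target_region(hits, genome_length=50000))
```

Restricting the subsequent spliced alignment to this window keeps it fast, since the full alignment then runs against a few kilobases instead of the whole genomic sequence.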

Page 37: Alternative Splicing (mapping on genomic DNA)

[Figure: consensus sequence (mRNA) and a splice variant mapped onto the genomic sequence; label: exons]

Page 38: SpliceNest (http://SpliceNest.molgen.mpg.de)

[Figure: SpliceNest view; labels: genomic sequence, putative exons, aligned GeneNest consensus, alternative exon]

Page 39: Alternative Splicing (additional exon)

[Figure: splice variants of adenylosuccinate lyase; labels: skipped exon, gene prediction errors?, unspliced?]

Page 40: Alternative Splicing

Splice variants of APECED gene

[Figure: labels: number of sequences, genomic sequence, alternative variants]

Page 41: Alternative splicing

Page 42: Alternative Splicing (alternative donor site)

Page 43: Alternative Splicing

Page 44: Alternative Splicing (alternative exons)

Page 45: SpliceNest (hypothetical gene Hs16936)

Page 46: Single Nucleotide Polymorphisms (SNP)

SNPs are single base differences within one species

Several million SNPs detected in Human

SNPs may be related to diseases

Page 47: Single Nucleotide Polymorphisms (SNP)

SNP or basecalling error?
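A common heuristic for separating the two cases is to require that the minor base in an alignment column is seen in more than one read, since an isolated mismatch is more likely a basecalling error. The sketch below applies that rule; the threshold of two supporting reads, the column representation and the name `snp_candidates` are assumptions for illustration and not taken from the slides.

```python
from collections import Counter


def snp_candidates(columns, min_minor_count=2):
    """Report alignment columns where a second base is supported by several reads.

    columns: list of strings, one per alignment column, each holding the bases
    of all reads covering that column. Returns (position, major_base, minor_base).
    """
    candidates = []
    for pos, column in enumerate(columns):
        counts = Counter(b for b in column.upper() if b in "ACGT").most_common()
        if len(counts) >= 2 and counts[1][1] >= min_minor_count:
            candidates.append((pos, counts[0][0], counts[1][0]))
    return candidates


columns = ["AAAAAA", "CCCCCT", "GGGGAA", "TTTT"]
print(snp_candidates(columns))
```

Here the single T in the second column is treated as a likely error, while the third column, with two reads supporting A, is reported as a candidate. Base quality values, where available, would sharpen the decision further.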

Page 48: Genome Annotation / Ensembl (http://www.ensembl.org)

Page 49: Analysis of gene expression: Tissue specificity

Counting the frequency of ESTs derived from a specific tissue within one sequence cluster

Searching for clusters/contigs which are tissue specific (e.g. tumor)

Searching for alternative splice variants which are potentially tissue specific
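Counting ESTs per tissue within one cluster is a simple tally. The sketch below uses made-up EST identifiers and tissue annotations purely for illustration; `tissue_profile` is a hypothetical name.

```python
from collections import Counter


def tissue_profile(cluster_ests):
    """Fraction of a cluster's ESTs that come from each tissue.

    cluster_ests: list of (est_id, tissue) pairs for one sequence cluster.
    """
    counts = Counter(tissue for _, tissue in cluster_ests)
    total = sum(counts.values())
    return {tissue: n / total for tissue, n in counts.items()}


# Hypothetical cluster content.
cluster_ests = [
    ("est1", "kidney"),
    ("est2", "kidney"),
    ("est3", "liver tumor"),
    ("est4", "kidney"),
]
print(tissue_profile(cluster_ests))
```

Raw fractions like these depend on how many ESTs were sequenced from each tissue overall, so a comparison across clusters would normally normalize by library size first.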

Page 50: Analysis of gene expression: PDZ-domain containing protein PDZK1 (Hs.15456)

[Figure: labeled tissues: liver tumor, kidney]

Page 51: Analysis of gene expression: small muscular protein, SMPX (Hs.88492)

[Figure: labeled tissues: heart, muscle]

Page 52: Analysis of gene expression: hypothetical protein (Hs.32343)

[Figure: labeled tissues: thyroid tumor, heart, ovary]

Page 53: Analysis of gene expression: Non-redundant gene set

Selection of 'optimal' clones

Generation of gene-specific PCR products

Page 54: Analysis of gene expression: 'Optimal' clones

clone availability

type of clone library

length of the clone

relative position to the consensus sequence

homology to other genes

existence of repetitive elements

Page 55: Analysis of gene expression: Gene-specific PCR products

[Figure: putative gene consensus sequence with exon A, exon B and exon C; one region marked as repetitive sequence, one as similar to another gene; two potential gene-specific fragments marked]

Page 56: Analysis of gene expression: Optimal gene-specific PCR product

minimal similarity to other genes

minimal content of repetitive sequences

not spanning several exons

roughly constant length of PCR products across different genes
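These criteria can be combined into a simple interval search: for each exon, look for a window of fixed length that lies entirely inside the exon and avoids regions flagged as repetitive or similar to other genes. The sketch below is one possible reading of the criteria under stated assumptions; the coordinates, the 150 bp product length and the name `specific_fragments` are hypothetical.

```python
def specific_fragments(exons, excluded, length):
    """Pick one window of `length` bases per exon that avoids excluded regions.

    exons, excluded: lists of (start, end) intervals, 0-based and end-exclusive,
    in consensus coordinates. Exons without a suitable window are skipped.
    """
    def free_segments(start, end):
        # Parts of [start, end) not covered by any excluded interval.
        segments, pos = [], start
        for ex_start, ex_end in sorted(excluded):
            if ex_end <= pos or ex_start >= end:
                continue
            if ex_start > pos:
                segments.append((pos, ex_start))
            pos = max(pos, ex_end)
        if pos < end:
            segments.append((pos, end))
        return segments

    fragments = []
    for exon_start, exon_end in exons:
        for seg_start, seg_end in free_segments(exon_start, exon_end):
            if seg_end - seg_start >= length:
                fragments.append((seg_start, seg_start + length))
                break
    return fragments


exons = [(0, 300), (500, 650), (900, 1400)]
excluded = [(100, 180), (950, 1000)]  # e.g. a repeat and a region similar to another gene
print(specific_fragments(exons, excluded, length=150))
```

Using the same `length` for every gene addresses the last criterion, and restricting each window to one exon means the same product can be amplified from genomic DNA as well as from cDNA.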