Seven clusters and four types of symmetry in microbial genomes Andrei Zinovyev Bioinformatics service Math@Bio group of Tatyana Popova R&D Centre in Biberach, Germany Alexander Gorban Centre for Mathematical Modelling


Dec 14, 2015


Seven clusters and four types of symmetry in

microbial genomes

Andrei Zinovyev

Bioinformatics service

Math@Bio group of M.Gromov

Tatyana Popova

R&D Centre in Biberach, Germany

Alexander Gorban

Centre for Mathematical Modelling


Symbol of GofG’05


Genomic sequence as a text in unknown language

tagggrcgcacgtggtgagctgatgctaggg

Frequency dictionaries:

t a g g g r c g c a c g t g g t g a g c t g a t g c t a g g g ($N = 4 = 4^1$)

ta gg gr cg ca cg tg gt ga gc tg at gc ta gg ($N = 16 = 4^2$)

tag ggr cgc acg tgg tga gct gat gct agg ($N = 64 = 4^3$)

tagg grcg cacg tggt gagc tgat gcta gggr ($N = 256 = 4^4$)
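The frequency dictionaries above can be computed in a few lines of Python. This is a minimal sketch: the function name `frequency_dictionary` is our own, words are read in non-overlapping windows starting from the first letter, and words containing letters other than a, c, g, t (such as the ambiguity code r in the example) are simply skipped.

```python
from __future__ import annotations

from collections import Counter
from itertools import product

ALPHABET = "acgt"


def frequency_dictionary(seq: str, n: int) -> dict[str, float]:
    """Frequencies of all 4**n words of length n, read in non-overlapping windows."""
    seq = seq.lower()
    words = [seq[i:i + n] for i in range(0, len(seq) - n + 1, n)]
    # Drop words with letters outside a, c, g, t (e.g. the ambiguity code 'r').
    counts = Counter(w for w in words if all(ch in ALPHABET for ch in w))
    total = sum(counts.values())
    # Every possible word gets an entry, so the dictionary has exactly 4**n keys.
    return {
        "".join(p): (counts["".join(p)] / total if total else 0.0)
        for p in product(ALPHABET, repeat=n)
    }


seq = "tagggrcgcacgtggtgagctgatgctaggg"
for n in range(1, 5):
    print(n, len(frequency_dictionary(seq, n)))  # 1 4, 2 16, 3 64, 4 256
```

Listing every possible word, including those with zero count, is what makes each dictionary a point in a space of fixed dimension $N = 4^n$.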

gggrcgccacgttggtgagctgatgctagggrcgacgtgg

tagggrcgcacgtggtgagctgatgctagggrcgacgtgg

agggrcgcacgtggtgagctgatgctagggrcgacgtggc

..cgtggtgagctgatgctagggrcgcacgtggtgagctgatgctagggrcgacgtggtgagctgatgctagggrcgc…


From text to geometry

cgtggtgagctgatgctagggrcgcacgtggtgagctgatgctagggrcgacgtggtgagctgatgctagggrcgc

$10^7$

cgtggtgagctgatgctagggrcgcacggtgagctgatgctagggrcgcacacttgagctgatgctagggrcgcacaattcgtgagctgatgctagggrcgcacggtg……gagctgatgctagggrcgcacaagtga

Fragment length ~200-400

10000-20000 fragments

$\mathbb{R}^N$
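Each fragment becomes one point in $\mathbb{R}^N$. The sketch below assumes triplets ($N = 64$), a window of 300 letters (an arbitrary value inside the 200-400 range above) and consecutive, non-overlapping fragments; the step between fragments in the original work may differ. `triplet_vector` and `genome_to_points` are hypothetical names.

```python
from __future__ import annotations

from collections import Counter
from itertools import product

ALPHABET = "acgt"
# The 64 triplets in a fixed order, so every fragment uses the same coordinate axes.
TRIPLETS = ["".join(p) for p in product(ALPHABET, repeat=3)]


def triplet_vector(fragment: str) -> list[float]:
    """Frequencies of non-overlapping triplets in one fragment: a point in R^64."""
    words = [fragment[i:i + 3] for i in range(0, len(fragment) - 2, 3)]
    counts = Counter(w for w in words if all(ch in ALPHABET for ch in w))
    total = sum(counts.values()) or 1  # guard against fragments with no valid triplet
    return [counts[t] / total for t in TRIPLETS]


def genome_to_points(genome: str, window: int = 300) -> list[list[float]]:
    """Cut the genome into consecutive non-overlapping fragments of `window` letters
    (a trailing remainder shorter than `window` is dropped) and map each to R^64."""
    genome = genome.lower()
    return [
        triplet_vector(genome[start:start + window])
        for start in range(0, len(genome) - window + 1, window)
    ]
```

The result is a table with one row per fragment and 64 columns, which is the input for the visualization step that follows.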


Method of visualization: principal component analysis

$\mathbb{R}^N \rightarrow \mathbb{R}^2$

PCA plot
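A minimal PCA sketch using NumPy's SVD: center the points, take the two leading right singular vectors as the principal directions, and project onto them. The function name `pca_2d` is hypothetical; any standard PCA implementation gives the same plot up to sign flips of the axes.

```python
import numpy as np


def pca_2d(points: np.ndarray) -> np.ndarray:
    """Project points (one row per fragment) onto their first two principal components."""
    centered = points - points.mean(axis=0)
    # Rows of vt are the principal directions, sorted by decreasing singular value.
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    return centered @ vt[:2].T


rng = np.random.default_rng(0)
demo = rng.normal(size=(1000, 64))  # random stand-in for real fragment vectors
print(pca_2d(demo).shape)  # (1000, 2)
```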


Caulobacter crescentus

singles N=4

doublets N=16

triplets N=64

quadruplets N=256

!!!

The information in the genomic sequence is encoded by non-overlapping triplets (Nature, 1961).


First explanation

cgtggtgagctgatgctagggrcgcacgtggtgagctgatgctagggrcgacgtggtgagctgatgctagggrcgc


Basic 7-cluster structure

gtgagctgatgctagggrcgcacgtggtgagc

gct gat gct agg grc gca cgt

ctg atg cta ggg rcg cac gtg

tga tgc tag ggr cgc acg tgg

gtgaatcggtgggtgagtgtgctgctatgagc

atc ggt ggg tga gtg tgc tgc

tcg gtg ggt gag tgt gct gct

cgg tgg gtg agt gtg ctg ctg
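The six coding clusters can be read as the three possible shifts of a fragment relative to the codon grid, on each of the two strands; adding non-coding fragments gives seven classes (this follows the authors' papers as we understand them). The sketch below labels a fragment accordingly. The annotation format `(start, end, strand)` and the function `fragment_class` are hypothetical, and a fragment that only partly overlaps a gene is treated as non-coding.

```python
from __future__ import annotations

from typing import List, Tuple

# Hypothetical gene annotation: (start, end, strand), 0-based and half-open,
# covering the coding region from the start codon to the stop codon.
Gene = Tuple[int, int, str]


def fragment_class(frag_start: int, frag_end: int, genes: List[Gene]) -> str:
    """Label a fragment by strand and shift relative to the codon grid,
    or 'noncoding' if it does not lie entirely inside one gene."""
    for start, end, strand in genes:
        if start <= frag_start and frag_end <= end:
            if strand == "+":
                # How far the fragment start lies past the last codon boundary.
                return f"+{(frag_start - start) % 3}"
            # Reverse strand: codons are read from the gene end towards its start.
            return f"-{(end - frag_end) % 3}"
    return "noncoding"


genes = [(0, 900, "+"), (1200, 2100, "-")]
print([fragment_class(s, s + 300, genes) for s in (0, 1, 2, 900)])
# ['+0', '+1', '+2', 'noncoding']
```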


Non-coding parts

gtgagctgatgctagggr cgcacgaat

Point mutations: insertions, deletions

a


The flower-like 7-cluster structure is flat

[Figure: plot with y-axis from 0 to 0.16 and x-axis from 1 to 29]
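Whether the cloud of points is flat can be checked from how the total variance is shared among principal components: a flat cloud puts almost all of it in the first two. A sketch assuming NumPy; `variance_spectrum` and `is_flat` are hypothetical helpers, and the 0.8 threshold is an arbitrary illustration, not a value from the slides.

```python
import numpy as np


def variance_spectrum(points: np.ndarray) -> np.ndarray:
    """Share of the total variance along each principal component, largest first."""
    cov = np.cov(points, rowvar=False)
    eig = np.linalg.eigvalsh(cov)[::-1]  # eigvalsh returns eigenvalues in ascending order
    eig = np.clip(eig, 0.0, None)  # drop tiny negative values caused by round-off
    return eig / eig.sum()


def is_flat(points: np.ndarray, threshold: float = 0.8) -> bool:
    """True when the first two components carry at least `threshold` of the variance."""
    return bool(variance_spectrum(points)[:2].sum() >= threshold)
```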


Seven classes vs. seven clusters

Stanford

TIGR

Georgia Institute of Technology


Computational gene prediction

Accuracy >90%
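Gene prediction from this picture amounts to clustering the fragment points and identifying which cluster collects the non-coding fragments. The authors use a self-organizing approach ("Self-Organizing Approach for Automated Gene Identification"); the sketch below swaps in plain k-means with k = 7 and a farthest-point initialization. It only illustrates the idea and does not reproduce the quoted accuracy. `kmeans` is a hypothetical helper.

```python
import numpy as np


def kmeans(points: np.ndarray, k: int = 7, n_iter: int = 100, seed: int = 0):
    """Lloyd's k-means; returns (labels, centroids)."""
    rng = np.random.default_rng(seed)
    # Farthest-point initialization: start from a random point, then repeatedly
    # add the point farthest from every centroid chosen so far.
    idx = [int(rng.integers(len(points)))]
    for _ in range(1, k):
        d2 = ((points[:, None, :] - points[idx][None, :, :]) ** 2).sum(axis=2)
        idx.append(int(d2.min(axis=1).argmax()))
    centroids = points[idx].astype(float)
    for _ in range(n_iter):
        # Assign each point to its nearest centroid (squared Euclidean distance).
        d2 = ((points[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
        labels = d2.argmin(axis=1)
        # Move each centroid to the mean of its points (keep it if the cluster is empty).
        new = np.array([
            points[labels == j].mean(axis=0) if np.any(labels == j) else centroids[j]
            for j in range(k)
        ])
        if np.allclose(new, centroids):
            break
        centroids = new
    return labels, centroids
```

Farthest-point seeding is used here only because it is short and deterministic for a given seed; k-means++ or the authors' self-organizing maps are common alternatives.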


Mean-field approximation for triplet frequencies

$F_{IJK} \approx P^1_I P^2_J P^3_K$

$F_{IJK}$: frequency of triplet IJK ($I, J, K \in \{A, C, G, T\}$):

$F_{AAA}, F_{AAT}, F_{AAC}, \dots, F_{GGC}, F_{GGG}$: 64 numbers

position-specific letter frequency + correlations

$P^j_I$: 12 numbers
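The mean-field formula turns the 12 position-specific letter frequencies into 64 triplet frequencies; whatever it misses is the "correlations" term. A sketch in plain Python, assuming an A/C/G/T sequence read in its correct frame; the function names are our own.

```python
from __future__ import annotations

from collections import Counter
from itertools import product

ALPHABET = "ACGT"
TRIPLETS = ["".join(t) for t in product(ALPHABET, repeat=3)]


def triplet_frequencies(seq: str) -> dict[str, float]:
    """F_IJK: frequencies of the 64 triplets, read in non-overlapping windows."""
    seq = seq.upper()
    words = [seq[i:i + 3] for i in range(0, len(seq) - 2, 3)]
    counts = Counter(w for w in words if set(w) <= set(ALPHABET))
    total = sum(counts.values()) or 1
    return {t: counts[t] / total for t in TRIPLETS}


def position_letter_frequencies(F: dict[str, float]) -> list[dict[str, float]]:
    """P^j_I for j = 1, 2, 3 (list index 0, 1, 2): marginals of F_IJK, 12 numbers."""
    P = [dict.fromkeys(ALPHABET, 0.0) for _ in range(3)]
    for triplet, f in F.items():
        for j, letter in enumerate(triplet):
            P[j][letter] += f
    return P


def mean_field(P: list[dict[str, float]]) -> dict[str, float]:
    """Mean-field approximation F_IJK ~ P^1_I * P^2_J * P^3_K."""
    return {t: P[0][t[0]] * P[1][t[1]] * P[2][t[2]] for t in TRIPLETS}


F = triplet_frequencies("ATGGCC" * 10)
MF = mean_field(position_letter_frequencies(F))
correlations = {t: F[t] - MF[t] for t in TRIPLETS}
print(F["ATG"], MF["ATG"], correlations["ATG"])  # 0.5 0.125 0.375
```

In this toy sequence only two codons occur, so the correlations are large; the slides' point is that for real genomes the 12 numbers already capture much of the structure.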


Why hexagonal symmetry?

0-+

-+0

+0-

+-0

-0+

0+-

GC-content = $P_C + P_G$
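The six patterns on this slide are exactly what one gets from `0-+` by cyclic shifts (a frameshift moves every letter to the next codon position) and by reversing the order of the three positions. Treating reversal as the effect of reading the complementary strand is our assumption; it holds for a GC-content signature because complementing swaps G with C and A with T, which leaves GC-content unchanged. `signature_orbit` is a hypothetical helper.

```python
from __future__ import annotations


def signature_orbit(signature: str) -> list[str]:
    """Distinct patterns reachable from a 3-position signature by cyclic shifts
    and by reversing the order of the three positions."""
    variants: list[str] = []
    for base in (signature, signature[::-1]):
        for shift in range(3):
            shifted = base[shift:] + base[:shift]
            if shifted not in variants:
                variants.append(shifted)
    return variants


print(signature_orbit("0-+"))  # ['0-+', '-+0', '+0-', '+-0', '-0+', '0+-']
```

For a palindromic signature such as `+0+`, reversal produces nothing new and only three distinct patterns remain.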


Genome codon usage and mean-field approximation

ggtgaATG gat gct agg … gtc gca cgc TAAtgagct

correct frameshift

64 frequencies $F_{IJK}$

ggtgaATG gat gct agg … gtc gca cgc TAAtgagct

12 frequencies $P^1_I$, $P^2_J$, $P^3_K$
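Counting codon usage in the correct frame can be sketched as follows, assuming the input starts upstream of the gene and that its first ATG is the true start codon (in real genomes the start position comes from an annotation). `codon_usage` is a hypothetical helper; normalizing its counts gives $F_{IJK}$, and their marginals give the 12 numbers $P^1_I, P^2_J, P^3_K$.

```python
from collections import Counter

STOP_CODONS = {"TAA", "TAG", "TGA"}


def codon_usage(seq: str) -> Counter:
    """Count codons in the frame of the first ATG, up to (not including)
    the first in-frame stop codon."""
    seq = seq.upper()
    start = seq.find("ATG")
    if start < 0:
        return Counter()
    usage = Counter()
    for i in range(start, len(seq) - 2, 3):
        codon = seq[i:i + 3]
        if codon in STOP_CODONS:
            break
        usage[codon] += 1
    return usage


print(dict(codon_usage("ggtgaATGgatgctaggTAAtgagct")))
# {'ATG': 1, 'GAT': 1, 'GCT': 1, 'AGG': 1}
```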


$P^j_I$ are linear functions of GC-content

eubacteria

archaea


THE MYSTERY OF TWO STRAIGHT LINES???

$\mathbb{R}^{12} \rightarrow \mathbb{R}^{64}$

$F_{IJK} = P^1_I P^2_J P^3_K + \text{correlations}$


Codon usage signature

0-+


19 possible eubacterial signatures


Example: Palindromic signatures


Four symmetry types of the basic 7-cluster structure

eubacteria

flower-like

degenerated

perpendicular triangles

parallel triangles


B. halodurans (GC = 44%)

S. coelicolor (GC = 72%)

F. nucleatum (GC = 27%)

E. coli (GC = 51%)


Web-site

http://www.ihes.fr/~zinovyev/7clusters

cluster structures in genomic sequences


Human genome (chr19)

non-repetitive sequences

repetitive sequences

singles doublets triplets


Letter frequencies (3 dimensions)

[Figure: values for letters a, c, g, t on components 1, 2, 3; y-axis from -0.8 to 0.8]

GC-content (50%): a, t vs. c, g

Purine-Pyrimidine (33%): a, g vs. c, t

Amino-Keto (17%): a, c vs. g, t
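Because the four letter frequencies sum to one, three numbers determine them. The sketch below uses the three splits named on the slide as coordinates: GC-content ($p_c + p_g$), purine content ($p_a + p_g$) and amino content ($p_a + p_c$). These are fixed axes, not fitted ones, and the percentages on the slide are not reproduced here; `letter_axes` and `letters_from_axes` are hypothetical helpers.

```python
from __future__ import annotations


def letter_axes(p: dict[str, float]) -> tuple[float, float, float]:
    """(GC-content, purine content, amino content) from letter frequencies."""
    return p["c"] + p["g"], p["a"] + p["g"], p["a"] + p["c"]


def letters_from_axes(gc: float, purine: float, amino: float) -> dict[str, float]:
    """Invert letter_axes using p_a + p_c + p_g + p_t = 1."""
    a = (purine + amino - gc) / 2  # (a + g) + (a + c) - (c + g) = 2a
    g = purine - a
    c = amino - a
    return {"a": a, "c": c, "g": g, "t": 1.0 - a - c - g}


freqs = {"a": 0.1, "c": 0.2, "g": 0.3, "t": 0.4}
print(letter_axes(freqs))
print(letters_from_axes(*letter_axes(freqs)))
```

The round trip shows that the three splits lose no information about the letter composition.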


Good non-linear 2D representation (elastic principal manifolds)

[Figure: map labeled A, T, G, C; scale from 0% to 100%]


Measuring densities

[Figure: two panels, each labeled A, T, G, C]


Contrasting density distribution (two ideas)

• Noise is Gaussian

• Noise is smooth


Contrasted density

[Figure: two panels, each labeled A, T, G, C]


Excluding repeats

[Figure: two panels, each labeled A, T, G, C]


Excluding repeats

[Figure: two panels, each labeled A, T, G, C]


Papers (type Zinovyev in Google)

Gorban A, Zinovyev A. PCA deciphers genome. 2005. arXiv preprint.

Gorban A, Popova T, Zinovyev A. Codon usage trajectories and 7-cluster structure of 143 complete bacterial genomic sequences. 2005. Physica A 353, 365-387.

Gorban A, Popova T, Zinovyev A. Four basic symmetry types in the universal 7-cluster structure of microbial genomic sequences. 2005. In Silico Biology 5, 0025.

Gorban A, Zinovyev A, Popova T. Seven clusters in genomic triplet distributions. 2003. In Silico Biology 3, 0039.

Zinovyev A, Gorban A, Popova T. Self-Organizing Approach for Automated Gene Identification. 2003. Open Systems and Information Dynamics 10 (4).


People

Dr. Tanya Popova, Institute of Computational Modeling, Russia

Professor Alexander Gorban, University of Leicester, UK