Synonymous mutations - from bacterial evolution to somatic changes in human cancer
Fran Supek
1) Lehner group, CRG/EMBL Systems Biology Unit, Barcelona
2) Division of Electronics, RBI, Zagreb, Croatia
XXI Jornades de Biologia Molecular
Barcelona, 11.6.2014
synonymous mutations = changes in the gene sequence that don't alter the protein sequence
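The definition above can be checked mechanically. The following is a minimal sketch that assumes the standard genetic code (NCBI translation table 1) and single-base substitutions within one codon; the function name `is_synonymous` is made up for this illustration.

```python
from itertools import product

# Standard genetic code (assumption: NCBI translation table 1).
# Codons are enumerated in TCAG order, third base varying fastest.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def is_synonymous(codon, position, new_base):
    """True if replacing codon[position] with new_base keeps the amino acid."""
    mutated = codon[:position] + new_base + codon[position + 1:]
    if mutated == codon:
        raise ValueError("new_base must differ from the original base")
    return CODON_TABLE[codon] == CODON_TABLE[mutated]


# GGT and GGC both encode glycine; ATG (Met) to ATA (Ile) changes the protein.
print(is_synonymous("GGT", 2, "C"))
print(is_synonymous("ATG", 2, "A"))
```

Most synonymous changes fall on the third codon position, but not all: CTG to TTG (both leucine) is a synonymous change at the first position.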
Synonymous mutations
• (some) synonymous mutations are subject to evolutionary pressures
• clearly shown for many bacteria and yeasts
• likely also in higher Eukarya (but weaker signal)
• how does selection for/against synonymous changes relate to gene function in (a) evolution of bacteria and (b) in carcinogenesis?
(a) evolutionary trace across ~1000 bacterial genomes: adaptation to diverse environments
(b) somatic mutations in ~4000 human cancers: malignant transformation
( plush microbes in photos are from http://www.giantmicrobes.com/ )
• In what way can the evolution of synonymous codon preferences be used to systematically infer gene function in bacteria?
• There are other, simpler (known) ways to determine gene function from genome sequences:
• commonly/systematically applied: transfer of annotation via sequence similarity (BLAST, COG, Pfam...)
• >30% of genes end up with no known function annotated. They may not have known homologs, or their homologs may have no experimentally determined function.
• known but less common: genomic context methods, such as phyletic profiling
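As a point of reference for the genomic context methods mentioned above, phyletic profiling compares the presence/absence patterns of gene families across genomes; families with similar patterns are candidates for a shared function. The sketch below is only an illustration: the profiles are invented, and Jaccard similarity is one of several possible similarity measures, not necessarily the one used in the work cited in this talk.

```python
def jaccard(profile_a, profile_b):
    """Jaccard similarity of two presence/absence profiles (sequences of 0/1)."""
    if len(profile_a) != len(profile_b):
        raise ValueError("profiles must cover the same genomes")
    both = sum(1 for a, b in zip(profile_a, profile_b) if a and b)
    either = sum(1 for a, b in zip(profile_a, profile_b) if a or b)
    return both / either if either else 0.0


def most_similar(query, profiles):
    """Name of the gene family whose profile is closest to the query's."""
    others = (name for name in profiles if name != query)
    return max(others, key=lambda name: jaccard(profiles[query], profiles[name]))


# Hypothetical profiles: one entry per genome, 1 = family present.
profiles = {
    "geneA": [1, 1, 0, 0, 1, 0, 1, 1],
    "geneB": [1, 1, 0, 0, 1, 0, 1, 0],
    "geneC": [0, 0, 1, 1, 0, 1, 0, 0],
}
print(most_similar("geneA", profiles))
```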
evolutionary trace across ~1000 bacterial genomes
adaptation to diverse environments
“We predict Gene Ontology annotations ... for about 1.3 million poorly annotated genes in 998 prokaryotes at a stringent threshold of 90% Precision...”
“...about 19000 of those are highly specific functions.”
published in: Skunca et al. PLoS Comp Biology 2013, doi:10.1371/journal.pcbi.1002852
A randomization test to control for confounding phenotypes and phylogeny
[Figure: an example association that passes the randomization test, and one that fails because the association is not unique]
Phenotypes are associated with each other, and also with phylogeny:
• 514 aerotolerant vs. 214 aerointolerant: 295 COGs are significantly enriched with HE genes; 23 COGs after controlling for confounders
• obligate vs. facultative aerobes: 11 COGs
• thermophiles: 16 COGs
• halophiles: 6 COGs
+ 20 other phenotypes tested
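The general idea of a randomization test can be sketched as a label permutation test. This is a simplified stand-in, not the test used in the talk: plain label shuffling, as below, does not by itself control for phylogeny or for correlated phenotypes. The per-genome "HE" flag for one COG and all data are hypothetical.

```python
import random


def permutation_p_value(is_he, has_phenotype, n_permutations=2000, seed=0):
    """One-sided p-value for 'the HE fraction is higher in genomes with the phenotype'.

    is_he[i]         -- 1 if the COG's gene is flagged HE in genome i (hypothetical flag)
    has_phenotype[i] -- 1 if genome i has the phenotype, else 0
    """
    rng = random.Random(seed)

    def difference(labels):
        with_ph = [h for h, lab in zip(is_he, labels) if lab]
        without = [h for h, lab in zip(is_he, labels) if not lab]
        return sum(with_ph) / len(with_ph) - sum(without) / len(without)

    observed = difference(has_phenotype)
    labels = list(has_phenotype)
    hits = 0
    for _ in range(n_permutations):
        rng.shuffle(labels)  # break the genome-phenotype link, keep group sizes
        if difference(labels) >= observed:
            hits += 1
    # Add-one correction so the estimate is never exactly zero.
    return (hits + 1) / (n_permutations + 1)


phenotype = [1] * 8 + [0] * 8
associated = [1] * 8 + [0] * 8   # HE exactly in the genomes with the phenotype
unrelated = [1, 0] * 8           # HE in half of each group
print(permutation_p_value(associated, phenotype))
print(permutation_p_value(unrelated, phenotype))
```

Controlling for a confounder would need a restricted shuffle, for example permuting labels only within groups of related genomes; that refinement is left out here.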
Gene families linked to aerotolerance
all experiments: Anita Kriško lab (Mediterranean Institute for Life Science, Split, Croatia); published as Kriško et al, Genome Biology 2014. doi:10.1186/gb-2014-15-3-r44
[Figure, panels A-C: survival of E. coli knockout strains, normalized to w.t., under oxidative stress (2.5 mM H2O2), heat shock and osmotic shock; panel A also shows the NAC / no NAC survival ratio after 5 mM NAC pretreatment. Strains shown: w.t., yjjB, flgH, cysG, mnmA, nlpE, proX, clpS, oppA, tig, ssuD, nudF, pnp, typA, mngR, lsrR, yebS, rhlE, yajL, pykF, dtd, eutD, gloB, yfcA, marR, yccX, pncB, ttdB, moaA, dsbB.]
* known antioxidant proteins in E. coli (or homologs in other organisms)
* known to be regulated in response to air or oxidative stress
ROS are typically not increased (except cysD, yaaU, rseC, and the positive control sodA)
Predicted functional interactions from STRING v9
Gene families whose codon biases are associated with aerobicity/aerotolerance:
[Figure: heat map (scale 0 to 0.8) of putative mechanisms of oxidative stress resistance. Rows: w.t., fre, sufD, rseC, sodA, clpA, recA, napF, lon, ybeQ, yaaU, cysD, ybhJ, gpmM, icd, lpd, yidH. Columns: carbonylation increase, DHR-123 increase, CellROX increase, total Fe increase, dipyridyl rescue, NADPH level increase, NADPH rescue. Gene groups: NAD(P)H-related, iron-related, unknown.]
Other phenotypes: thermophilicity, halophilicity
Knockout of candidate genes affects heat shock resistance and osmotic shock resistance.
Validation using synthetic genes with introduced suboptimal codons
[Figure, panels A-D: % survival after heat shock and osmotic shock for w.t., ΔclpS, ΔclpS + clpS_w.t., ΔclpS + clpS_15, ΔclpS + clpS_20, ΔclpS + clpS_25, and for w.t., ΔyjjB, ΔyjjB + yjjB_w.t., ΔyjjB + yjjB_21, ΔyjjB + yjjB_28, ΔyjjB + yjjB_35. Also shown: relative frequency of codon distance (MILC) to ribosomal protein genes, for ribosomal protein genes and for all other E. coli genes, with the w.t. and recoded clpS and yjjB variants marked.]
[Figure, panel A: Cancer Gene Census. Oncogenes: translocation (217), missense (40), copy number (12). Tumor suppressors: all mechanisms (84).]
[Figure, panel B: „classical” cancer genes vs. genes newly discovered from cancer genomes. Recurrently mutated genes (self-reported in literature): known cancer genes in Census, and others (counts shown: 336, 39, 38). Overlap of missense-activated oncogenes with recurrently mutated oncogenes from literature (counts shown: 19, 18, 21). Matched sets of noncancer genes: 1517 genes (for oncogenes), 693 genes (for tumor suppressors); complete set of 13219 noncancer genes.]
[Figure, panel C: correlation of gene features to PC1 (30.4 % variance) and PC2 (24.3 %). Features: regional mutation rates (carcinoma, non-carcinoma, liver and breast at 1 Mb; pooled and liver at 200 kb), H3K9me3 (1 Mb), RepliSeq (1 Mb), GC3, and mRNA levels (hypothalamus, liver, skeletal & heart muscle, 6 tissues).]
[Figure, panel D: cumulative distributions for the 39 recurrently mutated oncogenes and the 38 recurrently mutated tumor suppressors, with K-S statistics (D+, D-) and P values, over: # mutations per 200 kb (110 cancers, pooled tissues); heterochromatin (H3K9me3 levels in 1 Mb windows); replication timing (RepliSeq signal in 1 Mb windows, early to late); mRNA levels (avg. of 6 tissues, log2 RPKM).]
Detecting positive selection on synonymous mutations in cancer
• create „matched sets” of genes closely following the oncogenes in:
• regional mutation rates, in 1 Mb and 200 kb windows
• expression levels in different tissues
• heterochromatin, replication timing
• G+C content
How to find a good set of genes?
A genetic algorithm: an optimization technique that can (relatively) easily handle many criteria at once. Quite efficient. Many parameters.
Operators:
...crossover
...random mutation
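A genetic algorithm of the kind outlined above might look as follows. This is a toy sketch under several assumptions that are not from the talk: a single numeric feature per gene instead of many criteria, truncation selection, and made-up population size and generation count. The names `ks_statistic` and `evolve_matched_set` are invented for the illustration.

```python
import random
from bisect import bisect_right


def ks_statistic(xs, ys):
    """Two-sample K-S statistic: largest gap between the two empirical CDFs."""
    xs, ys = sorted(xs), sorted(ys)
    return max(
        abs(bisect_right(xs, v) / len(xs) - bisect_right(ys, v) / len(ys))
        for v in xs + ys
    )


def evolve_matched_set(target, pool, set_size, generations=60, pop_size=30, seed=0):
    """Pick set_size indices into pool whose values best match target (lowest K-S)."""
    rng = random.Random(seed)
    indices = range(len(pool))

    def fitness(subset):
        return ks_statistic(target, [pool[i] for i in subset])

    def crossover(a, b):
        # Child draws its genes from the union of both parents.
        return frozenset(rng.sample(sorted(a | b), set_size))

    def mutate(subset):
        # Random mutation: swap one member for a gene outside the set.
        members = set(subset)
        members.remove(rng.choice(sorted(members)))
        members.add(rng.choice([i for i in indices if i not in members]))
        return frozenset(members)

    population = [frozenset(rng.sample(indices, set_size)) for _ in range(pop_size)]
    for _ in range(generations):
        population.sort(key=fitness)
        parents = population[: pop_size // 2]  # the better half survives unchanged
        children = [
            mutate(crossover(*rng.sample(parents, 2)))
            for _ in range(pop_size - len(parents))
        ]
        population = parents + children
    return min(population, key=fitness)


# Hypothetical feature values: "cancer genes" sit in the middle of the pool's range.
target = [4.0 + 0.1 * i for i in range(20)]
pool = [0.1 * i for i in range(100)]
matched = evolve_matched_set(target, pool, set_size=20)
print(ks_statistic(target, [pool[i] for i in matched]))
```

Because the better half of each generation is carried over unchanged, the best K-S value found never gets worse from one generation to the next. Handling several criteria at once could be done by summing the K-S statistics of all features inside `fitness`.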
Distributions of regional mutation rates (1 Mb and 200 kb), heterochromatin, etc. in the optimized sets of non-cancer genes closely match those of the cancer genes, both for oncogenes and for tumor suppressors. The genetic algorithm tries to minimize the K-S statistic.
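The figure panels report one-sided K-S statistics labelled D+ and D-. A minimal sketch of how such values can be computed is given below; the sign convention (D+ meaning that the first sample's empirical CDF lies above the second's) is an assumption, since the slides do not state it, and the data are invented.

```python
from bisect import bisect_right


def one_sided_ks(sample, reference):
    """Return (d_plus, d_minus) between the empirical CDFs of two samples.

    d_plus  -- largest amount by which the CDF of `sample` exceeds that of `reference`
    d_minus -- largest amount by which it falls below
    """
    s, r = sorted(sample), sorted(reference)
    gaps = [bisect_right(s, v) / len(s) - bisect_right(r, v) / len(r) for v in s + r]
    return max(0.0, max(gaps)), max(0.0, -min(gaps))


# Hypothetical feature values: the sample is shifted towards lower values,
# so its CDF rises earlier and only d_plus is large.
print(one_sided_ks([1, 2, 3], [4, 5, 6]))
```

The usual two-sided statistic is the larger of the two values, so a well-matched gene set should have both close to zero.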
Expected: the oncogenes and the tumor suppressors are highly enriched with missense mutations (~1.5 - 2.5x).
However, the oncogenes are also enriched with synonymous mutations over their matched sets, ~1.2x.
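An enrichment figure such as the ~1.2x above is a ratio of mutation densities. The arithmetic is sketched below with invented counts and sequence lengths; the actual analysis may normalize differently (for example by the number of sites where a synonymous change is possible).

```python
def enrichment(n_mut_genes, length_genes, n_mut_matched, length_matched):
    """Ratio of mutations per nucleotide in a gene set vs. its matched set."""
    density_genes = n_mut_genes / length_genes
    density_matched = n_mut_matched / length_matched
    return density_genes / density_matched


# Hypothetical numbers: 120 synonymous mutations in 100 kb of oncogene coding
# sequence vs. 1000 synonymous mutations in 1 Mb of matched-set coding sequence.
ratio = enrichment(120, 100_000, 1000, 1_000_000)
print(f"{ratio:.2f}x")
```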
Introns of oncogenes (from whole-genome sequencing) are not enriched with SNVs, compared to matched sets.
The matched sets method agrees with Invex, and with simply using neighboring genes as a baseline.
Tissue-specific oncogenes are more enriched with synonymous mutations in the corresponding tissue.
This effect is not due to mutation showers/clustered mutations, as the same cancer samples don't tend to contain both a synonymous and a missense mutation in the same gene.
Synonymous enrichment in oncogenes is detectable across cancer types.
Some oncogenes are more highly enriched with synonymous mutations than others, e.g. PDGFRA, EGFR, GATA1, ELN, NTRK1, JAK3, ALK and others (n=16).
The synonymous SNV enrichment in these genes is not paralleled by intronic SNV enrichment.
The synonymous mutations tend to cluster together to a similar extent as the missense mutations in the affected oncogenes. They also (less prominently) cluster with missense mutations.
[Figure, panels A-I: properties of synonymous mutations in the 16 oncogenes vs. the matched set. Panels: % of synonymous mutations leading to optimal codon gain, optimal codon loss or no change; mRNA folding free energy around mutated sites (kcal/mol) in 50 nt and 100 nt windows, w.t. vs. mutated mRNA; net # of gained miRNA seed sites per synonymous mutation (whole cDNA, and sites with phyloP > 1.0); distance bins ≤30 nt, 31-70 nt, >70 nt; log2 RPKM of exon by exon # in transcript ENST00000334286 (EDNRB gene, colorectal cancer), 30 random samples w/o point mutations vs. 6 samples with synonymous exonic mutations; % of synonymous mutations within 30 nt of a splice site leading to enhancer gain, enhancer loss, silencer gain or silencer loss (Ke et al. 2012 hexamers, RESCUE-ESE, FAS-hex2); normalized difference (Glass' delta) between properties of mutated positions in oncogenes vs. matched set (t-test, FDR < 10%); actual synonymous mutations vs. randomized mutation positions in α-helix parts and in coil.]
What could the synonymous mutations do?
• Use of „optimal codons”
• miRNA binding sites
• Secondary structures in mRNA
No general effect was detected in any of these cases (although they may still be important in specific examples).
Exonic Splicing Enhancer (ESE) and Exonic Splicing Silencer (ESS)
[Figure from Cartegni, Chew & Krainer. Nat Rev Genet. 2002 3(4), 285-98.]
Synonymous SNVs tend to be closer to splice sites in oncogenes.
They also tend to cause gains of known exonic splicing enhancer motifs, and losses of exonic splicing silencer motifs.
They more often affect exons with weaker (noncanonical) splice sites.
The exonic splicing enhancers created may resemble SF2/ASF motifs.
The ESS sites that are lost upon mutation sometimes resemble hnRNP A2/B1, H2 and A1 motifs.
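Gains and losses of splicing motifs caused by a point mutation can be found by comparing the hexamers that overlap the mutated position before and after the change. The sketch below assumes a plain set of motif hexamers; the single motif used here (GAAGAA) is a placeholder for illustration, not an excerpt from the RESCUE-ESE, FAS-hex2 or Ke et al. lists, and the sequences are invented.

```python
def kmers_covering(seq, pos, k=6):
    """All k-mers of seq that overlap the 0-based position pos."""
    first = max(0, pos - k + 1)
    last = min(pos, len(seq) - k)
    return {seq[i:i + k] for i in range(first, last + 1)}


def motif_changes(seq, pos, new_base, motifs, k=6):
    """Return (gained, lost): motifs created or destroyed by the substitution."""
    mutant = seq[:pos] + new_base + seq[pos + 1:]
    before = kmers_covering(seq, pos, k) & motifs
    after = kmers_covering(mutant, pos, k) & motifs
    return after - before, before - after


enhancer_motifs = {"GAAGAA"}  # placeholder motif set
gained, lost = motif_changes("CCGAAGACCC", 7, "A", enhancer_motifs)
print(gained, lost)
```

Running the same function with a set of silencer hexamers would give the silencer gains and losses; counting these events over all synonymous mutations in a gene set, and comparing with the matched set, is the kind of tally summarized in the statements above.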
Roughly ½ of the putatively causal synonymous mutations alter splicing, as evidenced by examining RNA-seq data from cancer.
We don't (yet) know what the other ½ is doing. One possibility is an effect on protein folding.
In yeast: Pechmann & Frydman, Nature Struct Mol Biol 2013
[Figure, panels F-H: actual synonymous mutations vs. randomized mutation positions in α-helix parts (1st a.a., middle, last a.a.; middle, next to coil only, next to β-sheet) and in coil; schematic of an α-helix and adjacent turn with positions N'', N', Ncap, Ccap, C', C''; normalized difference (Glass' delta) between mutated sites in oncogenes vs. matched set (FDR < 10%) for: relative preference value at C-cap, normalized frequency of turn in all-α class, α-helix indices for α-proteins, relative preference value at N', relative preference value at N'', normalized frequency of α-helix in all-α class.]
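Glass' delta, the effect size named in the figure, is the difference between two group means divided by the standard deviation of the control group only. The sketch below treats the matched set as the control group, which is an assumption, and uses invented values.

```python
from statistics import mean, stdev


def glass_delta(group, control):
    """Glass' delta: (mean(group) - mean(control)) / sample SD of the control."""
    return (mean(group) - mean(control)) / stdev(control)


# Hypothetical per-site property values (e.g. an α-helix propensity index).
oncogene_sites = [2, 4, 6]
matched_sites = [1, 2, 3]
print(glass_delta(oncogene_sites, matched_sites))
```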
...also in cancer: we observe an enrichment of synonymous mutations at N-termini of alpha-helices, esp. if close to beta-sheets. Suggestive of effects on folding.