Chapter 5: Evolution of PRNP and SPRN
Jun 21, 2020

dariahiddleston
5.1 Introduction

What are the evolutionary trajectories of the SPRN and PRNP genes?

I analysed the evolution of the PRNP and SPRN genes by means of comparisons between

mammals and fish. This analysis was based on the free genomic sequence information

available in public databases. Apart from human, I used genomic sequence from mouse,

rat, Fugu rubripes, Tetraodon nigroviridis and zebrafish for the vertebrate-wide cross-

species comparisons (Chapter 3.1.4).

I compared genomic sequences harbouring mammalian PRNP and fish homologues,

and SPRN genes from mammals and fish, respectively, together with their adjacent

genes. This analysis used both homology and non-homology criteria to assess gene

orthology (Eisen and Wu 2002; Gilligan et al. 2002) from fish to mammals. Apart from

assessing gene similarity (homology criteria), I tested whether local rearrangements

have occurred in the genomic regions (non-homology criteria).

Further, I analysed the mammalian PRNP, PRND, PRNT, and SPRN gene features in

detail. Using the human, mouse, and Fugu SPRN sequences, I performed phylogenetic

footprinting to define conserved regions that are potential regulatory elements.

In public databases, I found novel fish genes related to PrP: stPrP-2 from Tetraodon,

and stPrP-3 from zebrafish. I cloned the SPRN ORF sequence coding for Sho from

Tetraodon. For Fugu, Tetraodon, carp, and zebrafish I also found in silico a duplicated

SPRN gene paralogue (SPRNB) encoding a related Shadoo2 protein (Sho2).

Dr. Lars Jermiin (University of Sydney, Australia) conducted phylogenetic analysis of

the vertebrate PrP- and Sho-protein families. Dr. Jill Gready and Prof. Jenny Graves

constructed a model to rationalize evolution of both vertebrate PRNP- and SPRN-gene

families.

This analysis shows different evolutionary pathways for the SPRN and PRNP genes.

5.2 Discovery of SPRNB in Zebrafish, Fugu, Tetraodon, and Carp

Using the zebrafish sprn ORF sequence as search query and the BLASTN program

available from the Ensembl (v14.2.1) interactive web service, I discovered the zebrafish

genomic sequence (ctg10456) containing the Sho2 coding sequence. I found the

genomic sequence containing the Fugu gene SPRNB (scaffold_96) using the zebrafish

sprnb coding sequence as search query and the BLASTN program available from the

Ensembl (v12.2.1) web genome browser. I identified the Tetraodon genomic contig

(FS_CONTIG_41464_1) containing SPRNB coding sequence using the Fugu SPRNB

ORF sequence and the BLASTN search tool from the Genoscope database. The EST

(CA964511) containing the ORF for the carp SPRNB was detected by BLASTN search

of the NCBI est_others database, using the zebrafish sprnb coding sequence as search

query.

5.3 Discovery of New Fish stPrP- and PrP-like-Coding Genes

5.3.1 Discovery of Tetraodon PrP-like and stPrP-2

I identified genomic contig FS_CONTIG_4238_2 harbouring the Tetraodon PrP-like-

and stPrP-2-coding genes by using the sequence of the Tetraodon PrP-like ORF (Suzuki

et al. 2002) as search query and the BLASTN program provided in the Genoscope web

service.

5.3.2 Discovery of Zebrafish stPrP-1 and stPrP-3

To identify the zebrafish stPrP-1-coding gene I used nucleotide sequence from the Fugu

stPrP-1 ORF as search query and the BLASTN web service from the Ensembl zebrafish

interactive genome database (v22.3b.1). I detected the coding sequence for the gene on

contig ctg30140. Genomic sequence containing the zebrafish stPrP-3 gene (assembly_234,

NA3274.1, v14.2.1) was identified by using the Fugu stPrP-2 coding sequence as search

query and the BLASTN web service from the Ensembl zebrafish genome database, as

above.

5.4 Detection of Genomic Contexts

5.4.1 Detection of PRNP Genomic Context in Human, Mouse and Rat

The PRNP genes in human (chr20p13: 4614996 - 4630236 bp), mouse (chr2F3:

132911892 - 132940089 bp), and rat (chr3Q36: 112889678 - 112890442 bp) were

found by keyword search of the Ensembl human (v12.31.1), mouse (v12.3.1), and rat

(v11.2.1) genome databases, respectively. The local genomic environment is also

evident from the interactive web genome browser, as is the annotation of the genomic

sequence.

5.4.2 Detection of Genomic Contexts of Fugu stPrP-1 and stPrP-2

I identified stPrP-1- and stPrP-2-coding gene sequences and their local genomic context

in the Fugu interactive Ensembl genome browser (v12.2.1) by using the sequences of

their ORFs as search query (AY141106, AY188583, respectively; NCBI) and the

server’s BLASTN (Altschul et al. 1990) search tool. The local genomic environment

and annotation of the genomic sequence were evident from this web service. The genes

encoding stPrP-1 and stPrP-2 are located on genomic scaffold_96 and scaffold_155,

respectively.

5.4.3 Detection of Genomic Context of Zebrafish stPrP-1 and stPrP-3

The local genomic environments were evident from the interactive zebrafish Ensembl

web genome browser.

5.4.4 Detection and Assembly of Genomic Context of Tetraodon PrP-like and stPrP-2

By using the terminal 200 bp of FS_CONTIG_4238_2 harbouring the PrP-like- and

stPrP-2-coding genes as search query and the local BLASTN tool, I identified in the

Genoscope database the overlapping genomic clone FS_CONTIG_4238_1. In addition, I

found two more overlapping clones (FS_CONTIG_24895_1 and

FS_CONTIG_31286_1) by using the same strategy. Sequences of these contigs were

merged into a virtual contig (Tetraodon virtual contig 1) of length 22249 bp. I verified

this assembly using the PiPMaker program (Chapter 3.1.4) and alignment to the

orthologous Fugu genomic sequence (Chapter 5.5.3). The sequence was also annotated

by using the NIX interactive web tool (Chapter 3.1.2).
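
The contig-walking strategy described above (use the terminal 200 bp of one contig as the query, find the overlapping contig, merge) can be sketched as follows. This is only an illustration under stated assumptions: exact string matching stands in for the BLASTN search, both contigs are taken to be on the same strand, and the function name merge_by_terminal_overlap and the toy sequences are hypothetical, not the thesis data.

```python
def merge_by_terminal_overlap(left: str, right: str, probe_len: int = 200) -> str:
    """Merge two contigs using the terminal probe_len bases of `left` as the query.

    Exact matching is an assumption made for illustration; a real BLASTN hit
    may contain mismatches or lie on the opposite strand.
    """
    if len(left) < probe_len:
        raise ValueError("left contig is shorter than the probe")
    probe = left[-probe_len:]
    hit = right.find(probe)  # first exact occurrence of the probe in `right`
    if hit == -1:
        raise ValueError("terminal probe not found in the candidate contig")
    # Bases of `right` that precede the hit must agree with `left`.
    start_in_left = len(left) - probe_len - hit
    if start_in_left < 0 or left[start_in_left:len(left) - probe_len] != right[:hit]:
        raise ValueError("sequences disagree upstream of the probe")
    # Keep `left` whole and append the part of `right` beyond the probe.
    return left + right[hit + probe_len:]


# Toy example with a 5 bp probe instead of 200 bp.
merged = merge_by_terminal_overlap("ACGTACGTTTGACCA", "TTGACCAGGGTT", probe_len=5)
print(merged, len(merged))
```

Repeating the step with the end of each newly merged sequence extends the virtual contig one clone at a time. The merged sequence would still need independent verification, as was done in the text by alignment to the orthologous Fugu sequence.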

5.4.5 Detection of Genomic Context of SPRN and SPRNB

Local genomic contexts of the SPRN gene (human, mouse, rat, Fugu, zebrafish) and

SPRNB gene (Fugu, zebrafish) were immediately evident in the Ensembl interactive

genome browsers.

5.4.6 Detection and Assembly of Genomic Context of Tetraodon SPRN

I used the terminal 200 bp sequence of the genomic contig FS_CONTIG_4144_1

harbouring the SPRN gene as search query and the BLASTN program on the

Genoscope web service to identify the overlapping clone FS_CONTIG_31029_1. Using the same

strategy, the next overlapping clone FS_CONTIG_37429_1 was identified. I assembled

these sequences into a virtual contig of 19029 bp (Tetraodon virtual contig 2; I used 10

kb sequence containing the SPRN gene from this virtual contig in further analyses). I

verified this assembly by using the PiPMaker program (Chapter 3.1.4) to align it to the

orthologous Fugu genomic sequence (Chapter 5.5.3), and I annotated it by using the

NIX interactive web tool (Chapter 3.1.2).

5.5 Comparative Genomic Analysis

Sequence comparison is a multistep process (Frazer et al. 2003). In order to align the

genomic sequences and identify conserved regions, I first found genes of interest in

public genomic sequences from mammals and fish (Chapters 2.6 and 4.2), and defined

the local genomic contexts.

5.5.1 Genomic Sequences Containing PRNP in Mammals and Homologous Genes in Fish

In Figure 5.1 I present the order of genes and their relative orientation in the local

genomic regions containing the mammalian PRNP gene and fish genes encoding stPrP-

1, stPrP-2, stPrP-3, and PrP-like proteins. In mammals, the genes adjacent to PRNP and

PRND are RASSF2 and SLC23A1. The PRNT gene is present in humans but not rodents

(Figure 5.1A-B; see also below). In pufferfish, the genes encoding stPrP-2 and PrP-like

are also adjacent to the Rassf2- and Slc23a1-coding genes (Figure 5.1C).

In contrast, the stPrP-1-coding genes from Fugu and zebrafish are located in different

genomic environments (Figure 5.1D-E). In the Fugu genome, stPrP-1 is flanked distally

by the TA-PPC2 (T-cell activation protein phosphatase 2C) and EPI-64 (epi64 protein) genes.

Proximally, it is arranged head-to-head with SPRNB, a paralogue of SPRN.

SPRNB, in turn, is flanked by KCNIP3 (calsenilin). In the zebrafish genome, I found

stPrP-1 also adjacent to sprnb in a head-to-head orientation. However, these

two genes are flanked proximally by rassf2 (encoding Rassf2) and distally by slk

(encoding STE20-like kinase), which differ from the flanking genes in Fugu, indicating a different

genomic environment. The proximity in fish of the two genes, stPrP-1 and SPRNB, is

suggestive of an evolutionary relationship between the genes. Further, the presence of

the Rassf2-coding gene adjacent to sprnb is of interest, as the RASSF2 homologue is

also adjacent to the mammalian PRND/PRNT.

Figure 5.1: Overview of the genomic contexts of the PRNP gene in mammals, and stPrP-1-, stPrP-2-, stPrP-3- and PrP-like-coding genes in fish. Figure is approximately to scale as shown by rulers. PRND, doppel gene; PRNT, PRNT gene; RASSF2, Ras association domain family 2 gene; SLC23A1, Solute carrier 23, member 2 gene; KCNIP3, calsenilin gene; SPRNB, Shadoo2 gene; stPrP-1, stPrP-1 gene; TA-PPC2, T-cell activation protein phosphatase 2C gene; EPI-64, Epi64 protein gene; rassf2, Ras association domain family 2 gene; slk, STE20-like kinase gene. For ruler under B., gene sizes and intergenic distances refer to mouse; for ruler under C. and D., gene sizes and intergenic distances refer to Fugu. Genomic coordinates roughly correspond to those used for the cross-species Vista analysis (Chapter 3.1.4).

I found another stPrP-like gene (stPrP-3) in zebrafish in the genomic contig NA3274.1

(Ensembl). Its proximal flanking gene is unknown, while its distal flanking genes code

for green cone photoreceptor (gcp) and annexin a6 (anxa6) (not shown).

5.5.2 Genomic Sequences Containing SPRN in Mammals and Fish

The genomic environment of the SPRN gene (Chapter 4.6), updated with the Tetraodon

data, is summarized in Figure 5.2. Genes adjacent to SPRN both in mammals and two

pufferfish are those encoding the GTP-binding protein (GTP) and amine oxidase (AO);

in zebrafish, the most proximal adjacent gene is the long-chain fatty-acyl elongase-

coding gene (fae) rather than the AO.

5.5.3 Annotation of Tetraodon Genomic Sequences Containing stPrP-2 and SPRN

To verify the assembly of Tetraodon virtual contig 1, containing the stPrP-2-coding gene

and its neighbour genes (Chapter 5.4.4), and to assess its validity for the comparative

genomic analysis, I aligned it with its orthologous Fugu genomic sequence using the

PiPMaker program (Chapter 3.1.4), which is able to compare both complete and

incomplete sequences. The PiP plot of the Fugu and Tetraodon genomic sequences is

given in Figure 5.3A, and its dot plot in Figure 5.3B. Among the four genes in this

genomic fragment, the exon-intron structure is known from comparison of the genomic

and cDNA sequences only for the PrP-like-coding gene. The single-exon ORF of the

stPrP-2-coding gene and GenScan predictions for Rassf2- and Slc23a1-coding gene

exons are also shown. The PiP plot indicates high conservation in predicted exons for

all genes (above 70% identity).

Second, using the PiPMaker program (Chapter 3.1.4) I aligned 10 kb of the Tetraodon

virtual contig 2, containing SPRN and adjacent genes, with its orthologous Fugu

sequence. The PiP plot is shown in Figure 5.3C, and the dot plot is in Figure 5.3D. The

GenScan exon predictions for the single-exon SPRN ORF and for genes encoding amine

oxidase and GTP-binding protein are shown. The PiP plot shows high conservation in exons

of the three genes (above 70% identity).

Figure 5.2: Overview of the local genomic contexts of the SPRN gene in mammals and fish. AO, Amine oxidase gene; GTP, GTP-binding protein gene; FAE, long-chain fatty-acyl elongase gene. Figure is approximately to scale as shown by rulers. For ruler under D., gene sizes and intergenic distances refer to Fugu. Genomic coordinates roughly correspond to those used for the cross-species Vista analysis (Chapter 3.1.4).

Figure 5.3: PipMaker percent identity (Pip) and dot plots of the genomic fragments containing the stPrP-2- and PrP-like-coding, and SPRN genes in Fugu and Tetraodon. (A) Pip plot of stPrP-2- and PrP-like-coding genes and in (B) corresponding dot plot. (C) Pip plot of SPRN in Fugu and Tetraodon and in (D) Corresponding dot plot. Fugu sequence is shown along x-axis in A-D. Percentage of identity (50-100%) is shown on the y-axis in A and C, and Tetraodon sequence in B and D. Location of exons and directionality of genes is shown as black (coding) and grey (UTR) boxes, and horizontal arrows, respectively. Exons are numbered. Short dark grey and white boxes denote CpG islands with ratio 0.75 and 0.6-0.75, respectively. RASSF2, Ras association domain family 2 gene; SLC23A1, Solute carrier 23, member 2 gene; AO, Amine oxidase gene; GTP, GTP-binding-protein gene.

5.5.4 Annotation of PRNP and SPRN

Annotation of genomic DNA sequence comprises analysis of gene order and relative

transcriptional orientation, gene structure, gene density, distribution of repeat elements,

and distribution of GC islands. This information for the human and mouse genomes

could be compiled from the interactive Ensembl genome browsers. However, the rat,

Fugu and zebrafish genome annotations (Ensembl) are much less comprehensive

because the number of transcript libraries from these species is limited. Further, the

depth of fish transposable element analysis is less than that of primates and rodents

(Aparicio et al. 2002). In the following section, I annotate the human and mouse PRNP,

PRND, PRNT and SPRN genes, and their local genomic contexts. Analyses of the rat

genes and genomic regions were included where possible.

5.5.4.1 Gene Structure, Gene Features, Gene Density and CpG Islands

I found that gene density and GC content are higher in the SPRN genomic environment

than in the PRNP environment (the genomic environments described in this chapter

correspond to those presented in Figures 5.1 and 5.2 but include also proximal

intergenic sequences of PRNP and SPRN). There are three genes in 51425 bp of the

human SPRN gene context, which is 50.66% GC rich, compared with five genes in

380074 bp of the human PRNP local genomic environment, which is 45.02 % GC rich.

The same holds for rodents: three genes in the mouse SPRN genomic

environment of 36116 bp (GC level: 48.36 %) and in the rat SPRN genomic environment of

34399 bp (GC level: 47.69 %) compare with four genes in the mouse PRNP local

genomic context of 241524 bp (GC level: 45.70 %) and in the rat PRNP local genomic

context of 218689 bp (GC level: 46.22 %).
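
To make the density comparison concrete, the gene counts and region lengths quoted above can be normalised to genes per 100 kb. The short sketch below uses only the figures given in the text; the normalisation unit and the helper name genes_per_100kb are my own choices for illustration.

```python
# (gene count, region length in bp) as quoted in the text above.
regions = {
    "human SPRN": (3, 51425),
    "human PRNP": (5, 380074),
    "mouse SPRN": (3, 36116),
    "mouse PRNP": (4, 241524),
    "rat SPRN": (3, 34399),
    "rat PRNP": (4, 218689),
}


def genes_per_100kb(n_genes: int, length_bp: int) -> float:
    """Gene density normalised to a 100 kb window."""
    return n_genes / length_bp * 100_000


densities = {name: genes_per_100kb(n, bp) for name, (n, bp) in regions.items()}
for name, density in densities.items():
    print(f"{name}: {density:.2f} genes per 100 kb")
```

In each of the three species the SPRN region comes out several-fold denser than the PRNP region, which is the pattern stated in the text.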

Gene structure, gene size, GC content and features of exons and introns of the human

and mouse PRNP, PRND, PRNT, and SPRN genes are summarized in Table 5.1.

Table 5.1: Features of the PRNP, PRND, and SPRN genes in human and mouse, and of the PRNT gene in human

Species  Gene / feature   Length (bp)   GC (%)
Human    PRNP a           15241         41.8
           Exon 1         159           77.99
           Intron         12697         40.89
           Exon 2*        2380          44.16
         PRND b           6549          46.37
           Exon 1         60            65
           Intron 1       2528          47.31
           Exon 2         611           60.72
           Intron 2       1510          43.44
           Exon 3*        1840          42.12
         PRNT c           9387          43.26
           Exon 1         69            59.42
           Intron         7514          42.09
           Exon 2*        1804          47.51
         SPRN d           3913          66.06
           Exon 1         101           78.22
           Intron         779           70.86
           Exon 2*        3033          64.42
Mouse    Prnp e           28198         44.85
           Exon 1         47            65.95
           Intron 1       2191          44.45
           Exon 2         98            45.92
           Intron 2       23854         33.50
           Exon 3*        2008          49.25
         Prnd f           5269          47.33
           Exon 1         54            57.41
           Intron         2093          46.49
           Exon 2         3122          47.73
         Sprn g           2203          57.69
           Exon 1         148           61.74
           Intron         876           56.62
           Exon 2*        1178          57.98

* denotes coding exon. Transcripts used in analysis: a, OTTHUMT00020000691 (Ensembl human v12.31.1); b, OTTHUMT00020000599 (Ensembl human v12.31.1); c, OTTHUMT00020000595 (Ensembl human v12.31.1); d, BC040198 (NCBI); e, ENSMUST00000040877 (Ensembl mouse v12.3.1); f, AF192384 (NCBI); g, C630041J07 (FANTOM).

In all four genes, PRNP, PRND, PRNT, and SPRN, the ORF is contained within a single

coding exon. The lengths of the genes correlate inversely with GC content, which is

higher in the exons. The GC content of the human and mouse SPRN is higher than that

of PRNP, PRND, and PRNT.

CpG islands are genomic fragments of exceptionally high GC content, typically a few

hundred bp long. CpG islands are associated with the 5’ ends of housekeeping

genes and, being nonmethylated, they are not subject to mutational entropy. I

determined the distribution of CpG islands in the mammalian genomic sequences by

using the cpgplot program (Chapter 3.1.2). Results for the human and mouse PRNP,

PRND, PRNT and SPRN genes are shown in Table 5.2.
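
The two quantities that underlie CpG island detection, GC percentage and the observed/expected CpG ratio, can be illustrated with a simplified sliding-window sketch. This is not the cpgplot implementation: the window size (100 bp) and the thresholds (GC above 50%, ratio above 0.6) are commonly used values assumed here for illustration, and the function names are hypothetical.

```python
def gc_percent(seq: str) -> float:
    """Percentage of G and C bases in seq."""
    seq = seq.upper()
    return 100.0 * (seq.count("G") + seq.count("C")) / len(seq) if seq else 0.0


def cpg_obs_exp(seq: str) -> float:
    """Observed/expected CpG ratio: (#CpG * length) / (#C * #G)."""
    seq = seq.upper()
    c, g = seq.count("C"), seq.count("G")
    if c == 0 or g == 0:
        return 0.0
    return seq.count("CG") * len(seq) / (c * g)


def cpg_like_windows(seq, window=100, step=1, min_gc=50.0, min_ratio=0.6):
    """Return start offsets of windows that pass both (assumed) thresholds."""
    hits = []
    for start in range(0, len(seq) - window + 1, step):
        w = seq[start:start + window]
        if gc_percent(w) > min_gc and cpg_obs_exp(w) > min_ratio:
            hits.append(start)
    return hits
```

Runs of adjacent passing windows would then be merged into candidate islands, and a minimum island length would normally be applied as well; both steps are omitted here for brevity.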

5.5.4.2 Distribution of Transposable Elements in PRNP and SPRN

The depth of the repeat analysis using the RepeatMasker program (Chapter 3.1.2) extends to

transposable elements inserted 150-200 and 100-120 million years ago in human and mouse,

respectively (Chapter 2.6). Transposon contents of the PRNP, PRND, PRNT and SPRN

genes are shown in Table 5.3. Whereas SPRN contains no transposable elements,

PRNP, PRND, and PRNT have accumulated repeats.

Analysis of the distribution of interspersed repeat elements in the local genomic

environments of the human, mouse, and rat PRNP and SPRN genes indicated no major

differences (not shown).

5.5.5 Cross-Species Comparisons

After I analysed genes and genomic sequences, I aligned the genomic sequences from

mammals and fish. Global alignments are particularly useful to detect conserved regions

in long contiguous sequences (Frazer et al. 2003). I used the VISTA global

alignment tool (Chapter 3.1.4) for this purpose.
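
The conservation cut-off quoted in the VISTA figure legends (50% identity over 50 bp) amounts to scanning a pairwise alignment with a fixed window. The sketch below shows that idea only; it is not the VISTA implementation. It works on alignment columns rather than reference-sequence coordinates, and the function name conserved_windows is hypothetical.

```python
def conserved_windows(aln_ref: str, aln_other: str, window: int = 50,
                      min_identity: float = 50.0):
    """Return (start, percent_identity) for alignment windows meeting the cut-off.

    Both inputs are rows of one pairwise alignment: equal length, '-' for gaps.
    """
    if len(aln_ref) != len(aln_other):
        raise ValueError("aligned sequences must have equal length")
    # 1 where the column holds an identical, non-gap pair; 0 otherwise.
    match = [int(a == b and a != "-")
             for a, b in zip(aln_ref.upper(), aln_other.upper())]
    results = []
    running = sum(match[:window])  # matches in the first window
    for start in range(0, len(match) - window + 1):
        if start > 0:
            # Slide by one column: add the entering column, drop the leaving one.
            running += match[start + window - 1] - match[start - 1]
        identity = 100.0 * running / window
        if identity >= min_identity:
            results.append((start, identity))
    return results
```

Counting a gap column as a mismatch is a design choice made here; other conventions (for example, excluding gap columns from the window) would shift the identity values slightly.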

Table 5.2: CpG island distribution in human and mouse PRNP, PRND, PRNT, and SPRN genes

Species  Gene   Length (bp)   Begin a   End a
Human    PRNP   1144          -207      937
         PRND   -             -         -
         PRNT   -             -         -
         SPRN   284           -12       272
Mouse    Prnp   317           -191      126
         Prnd   -             -         -
         Sprn   195           -1        194

a Begin and end denote genomic sequence positions relative to transcription start.

Table 5.3: Summary of transposable element content in human and mouse PRNP, PRND, PRNT, and SPRN genes

Species  Gene   Length (bp)   SINE %   LINE %   LTR %   DNA %   Total %
Human    PRNP   15241         4.6      40.7     0       0.9     46.2
         PRND   6549          18.5     5.6      0       0       24.1
         PRNT   9387          21.4     4.1      3.8     4.8     34.1
         SPRN   3912          0        0        0       0       0
Mouse    Prnp   28198         6        3.6      25.3    0.4     35.3
         Prnd   5269          5.1      0        0       3.9     9
         Sprn   2203          0        0        0       0       0

5.5.5.1 Comparisons of PRNP Genomic Region

A VISTA plot of the PRNP genomic region is shown in Figure 5.4. There is

conservation in both coding and noncoding sequences of the PRNP, PRND, RASSF2,

and SLC23A1 genes between human and rodents. Conservation is evident in the exons,

and there are a few highly conserved regions in the introns as well.

The VISTA plot indicates that this conservation does not extend to the fish stPrP-2- and

PrP-like-coding genes, which do not align with human. In contrast, the adjacent

RASSF2 and SLC23A1 align in the coding exons. There is also some evidence that

rearrangements in the local genomic sequences occurred since divergence of mammals

and fish: whereas the PRND gene (and PRNT gene in human) exists in mammals but

not in fish, the PrP-like-coding gene is present in fish only.

I observed that conservation with rodents in the human PRNT gene region differs from

that of other genes shown in the plot. There is almost no conservation between rat and

human (none at all in exons), and conservation between human and mouse appears

poor. To test this observation I aligned human and mouse genomic sequence regions

between the PRND and RASSF2 genes using the PipMaker program (Chapter 3.1.4).

The results presented in Figures 5.5A and 5.5B agree with the VISTA results in showing

lack of conservation of PRNT gene exons between human and mouse.

5.5.5.2 Comparisons of SPRN Genomic Region

A VISTA plot of the mammalian and fish genomic regions containing the SPRN and

adjacent genes shows conservation in all three genes (Figure 5.6). The coding exon

sequence of SPRN aligns in all five pairwise alignments.

Figure 5.4: VISTA plot showing peaks of similarity in pairwise sequence alignments between: 1, human vs. mouse; 2, human vs. rat; 3, human vs. Fugu; and 4, human vs. Tetraodon. PRNP, prion protein gene; PRND, doppel gene; PRNT, PRNT gene; RASSF2, ras association domain family 2 gene; SLC23A1, solute carrier 23, member 2 gene. Peaks are shown relative to their position in the reference (human) sequence (horizontal axis) and their percent identities (30-100%) are indicated on the vertical axis. For the reference sequence, the direction of gene transcription is indicated by a horizontal arrow, blue rectangles denote coding exons, and light blue rectangles indicate 5’ and 3’ untranslated regions. CNS, conserved non-coding sequence. Conservation of CNS (pink), UTR (light blue), and coding (blue) sequences fitting the experimental cut-off (50% over 50 bp) is indicated.

Figure 5.5: Analysis of human-mouse conservation in the genomic region of the human PRNT gene. (A) PipMaker plot of the human and mouse genomic sequence between the PRND and RASSF2 genes. Human sequence is along horizontal axis; percentage of identity (50-100%) is on the vertical axis. The location of exons and directionality of genes are shown as black (coding) and grey (UTR) boxes, and horizontal arrows, respectively. Other icons show repeats (LINE1, grey pointed boxes; LINE2, black pointed boxes; LTR, dark grey pointed boxes; SINEs other than MIR, light grey triangles; MIR, black triangles; other repeats, dark grey triangles). Short dark grey and white boxes denote CpG islands with ratio 0.75 and 0.6-0.75, respectively. (B) Corresponding dot plot.

Figure 5.6: VISTA plot showing peaks of similarity in pairwise sequence alignments between: 1, human vs. mouse; 2, human vs. rat; 3, human vs. Fugu; 4, human vs. Tetraodon; and 5, human vs. zebrafish. SPRN, SPRN gene; GTP, GTP-binding protein gene; AO, amine oxidase gene. Peaks are shown relative to their position in the reference (human) sequence (horizontal axis) and their percent identities (30-100%) are indicated on the vertical axis. For the reference sequence, the direction of gene transcription is indicated by a horizontal arrow, blue rectangles denote coding exons, and light blue rectangles indicate 5’ and 3’ untranslated regions. CNS, conserved non-coding sequence. Conservation of CNS (pink), UTR (light blue), and coding (blue) sequences fitting the experimental cut-off (50% over 50 bp) is indicated.

The coding exons of the GTP-binding protein-coding gene are in general conserved between mammals and fish. The large gap in the alignment (~24.5-39 kb in the human sequence) is due to the insertion of LINE elements in human only (two complete elements in antisense orientation and two truncated human LINE/L1 elements). The distal end of the GTP-binding protein-coding gene overlaps with the distal end of the SPRN gene in human. There are four polyadenylation signals in the human GTP-binding protein-coding gene, resulting in alternative transcription of the non-coding part of its 3' terminal exon. All four signals differ in one position from the canonical consensus polyadenylation signals AATAAA and ATTAAA. The sequence of the first (41227 - 41232 bp; for AK095872, BC00409, BC000920 transcripts), second (41262 - 41267 bp; for BC026725 transcript), and third (41321 - 41326 bp; for BC035721 transcript) signals is GTTAAA. The most distal, fourth signal of the GTP-binding protein-coding gene, which overlaps with the 3' end of the SPRN gene, is AATCAA (42068 - 42072 bp; for cDNAs AK074976, NM_138384). The single polyadenylation signal for SPRN is the canonical consensus AATAAA (41449 - 41454 bp; for BC040198 transcript).
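
The statement that each non-canonical signal differs from a canonical polyadenylation signal in one position can be checked with a simple Hamming-distance comparison. The sketch below uses only the hexamers quoted in the text; the helper names are hypothetical.

```python
CANONICAL_SIGNALS = ("AATAAA", "ATTAAA")


def hamming(a: str, b: str) -> int:
    """Number of positions at which two equal-length sequences differ."""
    if len(a) != len(b):
        raise ValueError("sequences must have equal length")
    return sum(x != y for x, y in zip(a, b))


def mismatches_to_canonical(hexamer: str) -> int:
    """Smallest Hamming distance to either canonical signal."""
    return min(hamming(hexamer.upper(), ref) for ref in CANONICAL_SIGNALS)


for signal in ("GTTAAA", "AATCAA", "AATAAA"):
    print(signal, mismatches_to_canonical(signal))
# GTTAAA 1  (one mismatch to ATTAAA)
# AATCAA 1  (one mismatch to AATAAA)
# AATAAA 0  (canonical)
```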

The exons of the third gene encoding amine oxidase are conserved between mammals

and pufferfish, but not zebrafish. In the zebrafish sequence, the third gene is for the

long-chain fatty-acyl elongase (Figure 5.2).

5.5.6 Phylogenetic Footprinting of SPRN

In the next step, I searched for vertebrate-wide conserved sequences in SPRN that are

potential regulatory elements. I used the program Footprinter (Chapter 3.1.5) that

reports sets of conserved motifs, taking into account a phylogenetic tree relating the

input species.

I identified 16 conserved motifs upstream of the SPRN ORF, in the intron, exon 1, and

upstream promoter (Table 5.4). In human and mouse, five motifs were detected in the

upstream promoter, one in exon 1, and ten in the intron (Figure 5.7). Some motifs

are duplicated.
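
Once a motif has been reported, its occurrences (including duplicates) can be located in the sequence upstream of the ORF with a pattern search. The sketch below follows the notation of Table 5.4, where capital letters are conserved positions and lower-case letters are variable positions. It is not the Footprinter algorithm; the coordinate convention (the base immediately upstream of the ORF is position -1), the function names, and the toy sequence are assumptions for illustration.

```python
import re


def motif_to_regex(motif: str) -> str:
    """Upper-case letters must match exactly; lower-case letters match any base.

    Only plain letters are handled; notation such as 'c/g' is not supported.
    """
    return "".join(ch if ch.isupper() else "[ACGT]" for ch in motif)


def find_motif(upstream: str, motif: str):
    """Return motif start positions relative to the ORF start.

    `upstream` is the sequence ending immediately before the ORF, so its
    last base is position -1 (a convention assumed here).
    """
    # A lookahead is used so that overlapping occurrences are also reported.
    pattern = re.compile("(?=(" + motif_to_regex(motif) + "))")
    n = len(upstream)
    return [m.start() - n for m in pattern.finditer(upstream.upper())]


# Toy upstream sequence containing two variants of motif 8 (GAgAGCCA).
print(find_motif("TTGAGAGCCATTGACAGCCAAA", "GAgAGCCA"))  # [-20, -10]
```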

I next checked whether any known transcription factor-binding sites were among these

detected motifs using the MatInspector program (Chapter 3.1.5) for human and mouse.

In the human and mouse sequences, 155 and 159 likely transcription factor-binding

141

Page 22: Chapter 5: Evolution of PRNP and SPRN · merged into a virtual contig (Tetraodon virtual contig 1) of length 22249 bp. I verified this assembly using the PiPMaker program (Chapter

Chapter 5 Evolution of PRNP and SPRN

Table 5.4: Conserved motifs in human, mouse, and Fugu SPRN gene identified by phylogenetic footprinting

Motif (length in bp) and localization in human / mouse SPRN a, b, c

Coordinates in human / mouse / Fugu SPRN relative to ORF

Score g

1. TGGAGA (6) P / P -1290 -1301 -1143 0 2. GCTGAG (6) E1 / P -890 -1049 -886 0 3. GCTGAA (6) I / I -658 -574 -776 0 4. TCCAGA (6) d I / I TCCAGA (6)

-466 -434 -303 - -543 -

0

5. GAACCC (6) I / I GAACCC (6)

-242 -217 -253 - - -218

0

6. ATCTCC (6) I / I -58 -276 -63 0 7. CTTTCC (6) I / I -36 -292 -100 0 8. GAgAGCCA (8) P / P AGAGCCA (7)

-1411 -1479 -1453 - - -1224

1

9. TGaAACAA (8) P / P -1346 -1262 -1500 1 10. GGAGGCcT (8) P / P GGAGGC (6)

-933 -1011 -927 -1027 - -

1

11. GGAGGcTG (8) I / I AGGCTG (6) GATGCtG (7)

-782 -826 -780 - -517 -508 -732 -863 -836

1

12. CCAgCCAG (8) I / I -668 -786 -759 1 13. CAGGCCTaA (9) I / I AGGCCTGA (8) CCAGGCCT (8) GTCCTAA (7)

-202 -388 -209 -304 - - -161 - - - -76 -

1

14. GcGTGCAgAG (10) e I / I TGCAGA (6) TGCACA (6)

-361 -423 -291 - - -319 -261 -54 -

2

15. TGgGGCTaGT (10) P / P TGTGGCT (7)

-1237 -1230 -1351 - -1102 -

2

16. CCCCttCAGGCCT (13) f I / I CCCc/gAGGCTT (10) CCCCCAGGCC (10) CCCAGGC (7)

-166 -394 -215 -344 - -560 -206 - - -488, -520 - -

2

Localization corresponds to position in human and mouse SPRN intron (I), exon 1 (E1), and upstream promoter (P). a Capital letters, conserved position in motif; lower case letters, variable position in motif. b Motif duplications were identified manually after inspection of results. c TRANSFAC matrix sequences identifying potential transcription factor-binding sites were included in MatInspector program analysis: motif had to be detected both in human and mouse sequences. d, Motif denotes a part of the V$NBRE.01 matrix sequence for Nurr1; e, Motif denotes a part of the V$ATF6.02 matrix sequence for Activating transcription factor 6. f, Motif denotes a part of the V$MAZR.01 matrix sequence for MYC-associated zinc finger protein related transcription factor. g, Parsimony score corresponds to main motif.

Figure 5.7: Potential regulatory motifs in human SPRN and mouse Sprn genes identified by phylogenetic footprinting. Motifs 4, 14 and 16, labelled by *, denote potential Nurr1, ATF6 (Activating Transcription Factor 6) and MAZR (MYC-Associated Zinc-finger-protein-Related) transcription factor-binding sites, respectively.

sites were found (69 or 82 in the intron, 11 or 14 in exon 1, and 75 or 62 in the upstream

promoter; not shown).

However, only three of the motifs detected by phylogenetic footprinting correspond to

predicted transcription-factor binding sites. All three motifs were detected in the human

and mouse SPRN intron and are present in the same relative order.

Motif 4 forms part of the predicted binding site for the nuclear receptor transcription factor Nurr1. Motif 14 forms part of the predicted Activating Transcription Factor 6 (ATF6)-binding site. Part of the third conserved motif, motif 16, matches the predicted binding site for the MYC-Associated Zinc-finger-protein-Related transcription factor (MAZR).

5.5.7 PrP and Sho Protein Families: from Fish to Mammals

The known fish PrP homologues are PrP-like, stPrP-1, stPrP-2 and stPrP-3, and the fish Sho homologue is Sho2. I expanded the dataset of fish proteins that belong to the PrP and Sho families.

First, I found an ORF encoding the 395-amino-acid Tetraodon stPrP-2 (BN000527; EMBL) in the Tetraodon virtual contig 1. Second, I translated the zebrafish stPrP-3 ORF (BN000526; EMBL) from the Ensembl genomic contig NA3274.1 into a protein of 561 amino acids.

Using the Tetraodon virtual contig 2 sequence to design PCR primers, I cloned and

sequenced the Tetraodon SPRN ORF (AJ717305; EMBL; Chapter 3.2.1) and deduced a

155-residue protein, thus adding a third fish Sho sequence to those for zebrafish

(AJ490525, EMBL) and Fugu (BN000521, EMBL).

I discovered a new Sho-related protein, its Shadoo2 (Sho2) paralogue, also from

sequences deposited in public databases. I deduced the Sho2 protein sequences of 150,

150 and 135 amino acids, from the genomic information for Fugu SPRNB ORF

(BN000522, EMBL), Tetraodon SPRNB ORF (BN000525, EMBL) and zebrafish

SPRNB ORF (BN000523, EMBL). The carp Sho2 ORF (BN000524, EMBL) was

conceptually translated from the EST CA964511 (NCBI) to give a protein of 145 amino

acids. Together with the human, mouse and rat Shos (BN000518, BN000519 and

BN000520; EMBL), this new protein family now has 10 members.

The general structural model for these expanded sets of Sho- and PrP-related proteins defines four protein regions (Chapters 2.1 and 4.4): the basic region 1, the repeat or low-complexity sequence region 2, the hydrophobic region 3, and the C-terminal region 4 (Figure 5.8).

5.5.8 Phylogenetic Analysis

Dr. Lars Jermiin (University of Sydney, Australia) conducted phylogenetic analysis of

the two protein families using the ProtML program of the MOLPHY package (Adachi and

Hasegawa, 1996). The trees are shown in Figures 5.9 and 5.10.

The analysis of the PrP-related sequence set identified a single most likely tree, shown in Figure 5.9, and 281 near optimal trees, none of which differed significantly from the most likely tree. The most likely tree groups the human, chicken, turtle and frog PrP sequences together to the exclusion of all fish sequences. The total tree length is 5.65, implying that every site in the alignment (not shown) has changed on average 5.65 times. With this much divergence the tree must be interpreted with some caution, a need also reflected in several low local bootstrap probability (LBP) and relative likelihood score (RLS) values. However, the finding of only 282 "good" trees (i.e., the most likely tree and the near optimal trees) out of 34,459,425 possible trees permits some confidence in the result.

Figure 5.8: Overall structures of PrPs, PrP-related proteins from fish (stPrP-2, stPrP-1, stPrP-3, PrP-like) and Sho proteins. Numbers indicate the first residue of each section, and last one of each protein. S, signal sequence; B, basic region; R/PGH, PGH-rich repeats; H, hydrophobic region; N, N-glycosylation site; S-S, disulphide bond; GPI, glycophosphatidylinositol anchor; B,R, basic repeats; GY, GY-rich region; B,R/RG, RG-rich repeats. A, F, numbers refer to human; E, numbers refer to Fugu; G, H, numbers refer to zebrafish.

Figure 5.9: The most likely tree based on phylogenetic analysis of the PrP protein family. ChPrP, chicken PrP; TuPrP, turtle PrP; XePrP, Xenopus laevis PrP; HuPrP, human PrP; FustPrP-2, Fugu stPrP-2; TestPrP-2, Tetraodon stPrP-2; FustPrP-1, Fugu stPrP-1; ZestPrP-3, zebrafish stPrP-3; ZePrP-like, zebrafish PrP-like; TePrP-like, Tetraodon PrP-like; FuPrP-like, Fugu PrP-like. Local bootstrap probabilities (LBP) are listed above the edges and relative likelihood scores (RLS) are listed below the edges. The error bar at bottom corresponds to 1.0 substitution per site.

Figure 5.10: The most likely tree (and the consensus tree) based on phylogenetic analysis of the Sho protein family. CaSho2, carp Sho2; ZeSho2, zebrafish Sho2; FuSho2, Fugu Sho2; TeSho2, Tetraodon Sho2; MoSho, mouse Sho; RaSho, rat Sho; HuSho, human Sho; ZeSho, zebrafish Sho; FuSho, Fugu Sho; TeSho, Tetraodon Sho. Local bootstrap probabilities (LBP) are listed above the edges and relative likelihood scores (RLS) are listed below the edges. The error bar at bottom corresponds to 1.0 substitution per site.

The analysis of the Shadoo protein family identified a single most likely tree (Figure 5.10) and 48 near optimal trees, none of which differed significantly from the most likely tree. The single most likely tree indicates that Shos and Sho2s lie on two separate branches. The total tree length is 4.77, implying that every site in the alignment (not shown) has changed on average 4.77 times. Again, this implies that the tree must be interpreted with caution, a need also reflected in some low LBP and RLS values. However, only 49 "good" trees (i.e., the most likely tree and the near optimal trees) were found out of 2,027,025 possible trees, which sets the result in a better light.
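The numbers of possible trees quoted for the two analyses match the standard count of unrooted, strictly bifurcating topologies, (2n - 5)!! for n sequences, with n = 11 for the PrP set (the sequences listed in the legend of Figure 5.9) and n = 10 for the Sho set. The following Python sketch reproduces both counts and the fraction of tree space occupied by the "good" trees; the function name is illustrative only and is not part of MOLPHY.

```python
def count_unrooted_trees(n_taxa: int) -> int:
    """Number of unrooted, strictly bifurcating tree topologies: (2n - 5)!!"""
    if n_taxa < 3:
        raise ValueError("need at least three taxa")
    count = 1
    # Multiply the odd numbers 3, 5, ..., 2n - 5.
    for odd in range(3, 2 * n_taxa - 4, 2):
        count *= odd
    return count


# Taxon counts and numbers of "good" trees are taken from the text above.
analyses = {"PrP family": (11, 282), "Sho family": (10, 49)}

for name, (n_taxa, n_good) in analyses.items():
    total = count_unrooted_trees(n_taxa)
    print(f"{name}: {total:,} possible trees, "
          f"fraction retained = {n_good / total:.2e}")
```

In both analyses the retained trees are a very small fraction of all possible topologies, which is the basis for the cautious confidence expressed above.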

The discoveries of the novel genes and proteins permitted a number of analyses, ranging

from genome comparisons to phylogenetic studies. I combine all these analyses to

define evolutionary trajectories of the PRNP and SPRN genes.

5.6 Discussion

I firstly discuss features of the genomic sequences and the outcome of cross-species

analyses. Secondly, I comment on the new protein set characteristics, and on the

phylogenetic analysis. Finally, I infer evolution of the PRNP and SPRN genes.

5.6.1 Genomic Sequences Containing PRNP and SPRN in Mammals and Related

Genes in Fish

My analysis indicates different evolution of the local genomic regions containing PRNP

and SPRN genes.

The mammalian and fish genome regions containing PRNP and its homologues differ.

Neither PRND nor PRNT genes were detected in fish nor were PrP-like-coding genes

found in mammals. The stPrP-2 shares its position and relative orientation with respect

to the adjacent genes with mammalian PRNP, suggesting an evolutionary relationship.

Yet, the phylogenetic analysis (Chapter 5.5.8) indicated that the stPrP-2 shares an

ancestral gene with the fish gene encoding stPrP-1, and that this gene duplication

occurred after the evolutionary separation of fish and mammals. The mammalian and

fish sequences have also diverged beyond recognition in the comparative genomic

analysis (Chapter 5.5.5). I concluded therefore that the mammalian PRNP and fish

stPrP-2 are not orthologous.

On the other hand, the local gene order and relative orientations in the SPRN local

genomic contexts are conserved between fish and mammals. The genes also aligned in

the comparative genomic analysis (Chapter 5.5.5) and clustered together in the

phylogenetic tree (Chapter 5.5.8). These observations indicate orthology between the

mammalian and fish SPRN.

5.6.2 Annotation of Tetraodon Genomic Sequences

As the Tetraodon sequence reads (Genoscope) were assembled into the two virtual

contigs, it was important to test and verify these assemblies.

The dot plots (Figure 5.3B and D) indicate that both Tetraodon virtual contig 1 and

Tetraodon virtual contig 2 were assembled in an order consistent with the orthologous

Fugu genomic sequences and are valid for comparative analysis.

In the Pip-plot showing alignment of the Tetraodon virtual contig 2 and Fugu sequence

(Figure 5.3), conservation of the sequences proximal (~4.5 kb) and distal (~6 kb) to the

GTP-binding-protein-coding gene may denote exons not recognised by GenScan

prediction (Chapter 2.6.6). Indeed, eight out of ten human GTP-binding-protein-coding

gene exons aligned between human and fish in my cross-species comparisons (Figure

5.6) but only three exons were predicted by GenScan.

Thus, it is possible to assemble Tetraodon genomic sequence reads correctly into larger

contigs and use them in cross-species analysis.

5.6.3 Annotation of PRNP and SPRN Genes

There are both similarities and differences between the mammalian PRNP and SPRN

genes.

The gene structure is similar for the PRNP and SPRN genes. Both genes have one or

two non-coding exons, one or two introns and a 3’-terminal exon harbouring the

complete ORF. However, the GC content of mammalian SPRN is much higher than that of PRNP. Mammalian GC content is known to vary genome-wide at different scales (Chapter 2.6). Because mammalian genomic DNA tends to evolve towards more AT-rich sequence, the higher GC content of SPRN may indicate stronger evolutionary pressure acting on the gene.

The promoters of both PRNP and SPRN are associated with CpG islands. By contrast, I

found no CpG islands in the PRND or PRNT gene promoters, as already shown by

Comincini et al. (2001) and Makrinou et al. (2002). This gene feature is therefore more

similar between the PRNP and SPRN than between the PRNP and PRND.
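The two sequence properties compared here, GC content and CpG-island association, can be computed directly from a sequence. The Python sketch below is illustrative only: it assumes the widely used thresholds of at least 200 bp, GC content above 50% and an observed/expected CpG ratio above 0.6, which may differ from the criteria applied by the programs used in this thesis, and the function names are hypothetical.

```python
def gc_fraction(seq: str) -> float:
    """Fraction of G and C bases in the sequence."""
    seq = seq.upper()
    if not seq:
        return 0.0
    return (seq.count("G") + seq.count("C")) / len(seq)


def cpg_obs_exp(seq: str) -> float:
    """Observed/expected CpG ratio: N(CG) * L / (N(C) * N(G))."""
    seq = seq.upper()
    n_c, n_g = seq.count("C"), seq.count("G")
    if n_c == 0 or n_g == 0:
        return 0.0
    return seq.count("CG") * len(seq) / (n_c * n_g)


def looks_like_cpg_island(seq: str, min_len: int = 200,
                          min_gc: float = 0.5, min_oe: float = 0.6) -> bool:
    """Apply the assumed thresholds to one candidate window."""
    return (len(seq) >= min_len
            and gc_fraction(seq) > min_gc
            and cpg_obs_exp(seq) > min_oe)


if __name__ == "__main__":
    # Toy windows, not real PRNP or SPRN sequence.
    for label, window in (("CG-rich", "CG" * 100), ("AT-rich", "AT" * 100)):
        print(label, round(gc_fraction(window), 2),
              round(cpg_obs_exp(window), 2), looks_like_cpg_island(window))
```

In practice such a test would be applied in sliding windows across the promoter region rather than to a single fragment.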

I showed striking differences in the transposable-element content between PRNP and

SPRN. The PRNP gene has been expanding independently in all lineages since the

mammalian radiation by insertions of numerous transposable elements (Lee et al., 1998

and Chapter 6.4.3). In contrast, the short and GC-rich SPRN is devoid of transposable elements, again suggesting stronger selective pressure acting on the gene.

The frequency of fixation of transposable elements is known to vary genome-wide

(Chapter 2.6). There is also a strong correlation between divergence in non-coding

DNA and the amount of repetitive DNA (Chiaromonte et al., 2001): “flexible” genomic

regions accumulate many changes while “rigid” regions accumulate fewer. Rigidity of

sequence may reflect strong selection on a large number of gene regulatory elements

(Lander et al. 2001), or, alternatively, may be determined by the local genomic mutation

rate (Chiaromonte et al. 2001). SPRN is the only gene lacking repeats in its local

genomic environment, indicating that strong selection acting on the gene has prevented

integration of transposable elements. Conversely, PRNP's flexibility and “promiscuity”

for accepting repeat insertions suggests a more relaxed evolutionary history of the gene.

Thus, the features of the SPRN gene could suggest that it evolves more conservatively

than PRNP. The other analyses indeed confirm this assumption.

5.6.4 Cross-Species Comparisons

Orthologous genes can usually be aligned and recognized by comparative genomic

analysis. Between evolutionarily more distant species, such as mammals and fish, it is

the coding regions that are primarily recognizable (Frazer et al. 2003). However, where

rapid divergence of nucleotide sequence, indels, and gene loss or acquisition have

occurred, coding sequences cannot readily be aligned (Kellis et al. 2003). The analysis

of Thomas et al. (2003) showed that almost one third of human coding sequences did

not align with corresponding fish sequences.

The coding exons of the human PRNP and fish stPrP-2 and PrP-like did not align

(Figure 5.4), indicating divergence of their sequences beyond detectable conservation.

Further, there is also evidence of rearrangements in the local genomic environments

since divergence of mammals and fish. Thus, neither homology criteria nor non-

homology criteria for gene orthology are fulfilled between human (mammal) PRNP and

fish homologues.

On the other hand, the SPRN gene aligned between mammals and fish (Figure 5.6)

satisfying homology criteria for gene orthology. Conserved contiguity between

mammals and fish indicates that no rearrangement occurred in this genomic fragment

after the evolutionary divergence of fish and mammals 450 million years ago (non-

homology criteria for gene orthology). Thus, the SPRN gene is likely to be orthologous

between mammals and fish. This is also supported by phylogenetic analysis of SPRN.
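The non-homology criterion used here, conserved contiguity, amounts to asking whether two genomic fragments carry the same genes in the same order and relative orientation, allowing for the fragment being read from either strand. A minimal Python sketch of that check follows; the gene labels are placeholders rather than the actual annotations of the SPRN region, and the function names are hypothetical.

```python
def flip(block):
    """View a gene block from the opposite strand: reversed order, swapped strands."""
    return [(gene, "-" if strand == "+" else "+") for gene, strand in reversed(block)]


def conserved_contiguity(block_a, block_b):
    """True if both blocks hold the same genes in the same order and
    relative orientation, reading block_b from either strand."""
    return block_a == block_b or block_a == flip(block_b)


# Placeholder gene orders; "+" and "-" give the coding strand of each gene.
mammal = [("geneA", "+"), ("geneB", "-"), ("geneC", "+")]
fish_same = [("geneC", "-"), ("geneB", "+"), ("geneA", "-")]       # same block, other strand
fish_rearranged = [("geneB", "-"), ("geneA", "+"), ("geneC", "+")]  # local rearrangement

print(conserved_contiguity(mammal, fish_same))        # True
print(conserved_contiguity(mammal, fish_rearranged))  # False
```

A real comparison would also have to tolerate genes that are missing from one annotation, which this sketch deliberately ignores.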

The functional significance of the overlapping of the 3’UTRs between SPRN and GTP-

binding protein-coding genes is unclear. A few other examples of such anti-parallel

overlapping of untranslated exons of functional genes have been reported (Miyajima et

al. 1989; Batshake and Sundelin 1996; Dan et al. 2002). Untranslated gene fragments

may contain regulatory sequences that affect mRNA stability and translation efficiency

(Beaudoing and Gautheret 2001; Chapter 6.5).

PRNT is not conserved between human and rodents. When I translated

exon 2 of the human PRNT gene (Figure 5.5), I detected two potential ORFs

encoding proteins of 60 and 94 residues, and also several smaller ORFs (not shown).

Makrinou et al. (2002) reported the transcription of PRNT exclusively in testis, as for

the PRND gene, and noted 50% similarity and 42% identity between the potential 94-

residue protein and human Dpl. The presence of disrupted ORFs suggests that human

PRNT is a pseudogene, appearing originally from duplication of PRND. Pseudogenes

are remnants of duplicate genes arising either from tandem duplication or

retrotransposition. While their sequences show similarities to coding regions of known

proteins, they have acquired many stop codons or frameshifts so that they no longer

code for full-length protein. They are usually not transcribed but may be occasionally

resurrected (Harrison and Gerstein 2002), or may acquire additional functions such as a

specific regulatory role (Hirotsune et al. 2003). In the intron of the PRNT pseudogene I

also detected a processed pseudogene sequence with high homology to the mRNA for

isopentenyl-diphosphate delta isomerase 1, disrupted by an Alu insertion. It is likely

that the PRNT pseudogene appeared in the human lineage after the evolutionary split

with rodents but it is also possible that the PRND duplication is more ancient and that

PRNT survives as a pseudogene in other mammalian lineages as well as human, but has

been deleted in rodents.
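Detecting potential ORFs in an exon, as described above for PRNT exon 2, can be sketched as a scan of each reading frame for an ATG followed in-frame by a stop codon. The Python example below is an illustration under stated assumptions, not the procedure used in this thesis: it assumes ORFs start at ATG, scans the three forward frames only, and uses a hypothetical `min_codons` cut-off.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}


def find_orfs(seq: str, min_codons: int = 30):
    """Return (start, end, n_codons) for ATG...stop ORFs in the three forward
    frames. Coordinates are 0-based, end is exclusive and includes the stop
    codon; n_codons counts the ATG but not the stop."""
    seq = seq.upper()
    orfs = []
    for frame in range(3):
        start = None
        for i in range(frame, len(seq) - 2, 3):
            codon = seq[i:i + 3]
            if start is None and codon == "ATG":
                start = i  # first ATG after the previous stop opens an ORF
            elif start is not None and codon in STOP_CODONS:
                n_codons = (i - start) // 3
                if n_codons >= min_codons:
                    orfs.append((start, i + 3, n_codons))
                start = None  # look for the next ORF in this frame
    return sorted(orfs)


# Toy sequence, not PRNT: one five-codon ORF in the third frame.
toy = "CC" + "ATG" + "GCT" * 4 + "TAA" + "G"
print(find_orfs(toy, min_codons=3))
```

The reverse strand could be covered by running the same scan on the reverse complement of the sequence.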

The cross-species comparison indicates that there is no orthology between the human

PRNP gene and its fish homologues. However, the SPRN gene from human and fish

could be orthologous.

5.6.5 Phylogenetic Footprinting of SPRN

The set of motifs found by phylogenetic footprinting of the human, mouse and Fugu SPRN (Table 5.4 and Figure 5.7) represents candidate regulatory regions. Among these, the motifs predicted to bind ATF6, Nurr1 and MAZR are conserved in human, mouse and Fugu.

The whole set of conserved motifs may contain false positives, but the FootPrinter

program may also miss motifs present in a single species, shorter motifs present in

multiple species, motifs containing indels, motifs that fail to meet statistical

significance, and dimers with variable internal sequences (Blanchette and Tompa 2002).
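To make the simplest case concrete, the motifs with parsimony score 0 in Table 5.4 are identical in all three species, so they can be found by intersecting the k-mers of the non-coding sequences. The Python sketch below does only that; it is not the FootPrinter algorithm, which also considers the phylogenetic tree and allows substitutions. The toy sequences, the function names and the convention that each sequence ends immediately before the ORF start (so positions are negative) are assumptions made for illustration.

```python
from collections import defaultdict


def kmer_positions(seq: str, k: int):
    """Map each k-mer to its start positions, counted back from the end of
    seq (-1 is the last base before the assumed ORF start)."""
    seq = seq.upper()
    positions = defaultdict(list)
    for i in range(len(seq) - k + 1):
        positions[seq[i:i + k]].append(i - len(seq))
    return positions


def shared_exact_motifs(upstream: dict, k: int = 6):
    """Return {kmer: {species: [positions]}} for k-mers found in every sequence."""
    tables = {species: kmer_positions(seq, k) for species, seq in upstream.items()}
    common = set.intersection(*(set(table) for table in tables.values()))
    return {motif: {species: tables[species][motif] for species in tables}
            for motif in sorted(common)}


# Toy upstream fragments, not real SPRN sequence.
toy = {
    "human": "AATGGAGACC",
    "mouse": "TGGAGATTTT",
    "fugu": "CCCTGGAGAG",
}
for motif, hits in shared_exact_motifs(toy, k=6).items():
    print(motif, hits)
```

Exact matching of this kind would miss every motif with a score above 0, which is one reason a tree-aware method is needed for the full analysis.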

The SPRN intron is devoid of transposable elements, indicating that selection acts

against integration of transposons in the intron, and suggesting its importance for

regulation of gene activity.

Nurr1 is expressed in mammalian brain and plays an important role in coordinate

neuroendocrine regulation of activity of the hypothalamic/pituitary/adrenal axis

(Murphy and Conneely 1997). Its expression can also be induced in peripheral tissues.

In brain, it is critical for dopaminergic neuron development by activating tyrosine

hydroxylase transcription in a cell-context dependent manner (Kim et al. 2003).

Aberrations in the dopaminergic system are associated with Parkinson's disease and

schizophrenia.

ATF6 is a member of the basic leucine-zipper family of transcription factors. It is

strongly induced by the endoplasmic reticulum (ER) stress response that induces

transcription of genes encoding molecular chaperones and folding enzymes located in

the endoplasmic reticulum (Chapter 1.4.1). Some genes in this pathway are directly

activated by ATF6. Upstream of ATF6 in the ER-stress response is IRE1 (Wang et al.

2000). The ER-stress response pathway is involved in familial Alzheimer's disease

(FAD) pathogenesis. For instance, the FAD-linked PS1 mutants attenuate

autophosphorylation of IRE1 and lead to impaired induction of the ER-stress response.

These mutants also attenuate the ATF6 signalling pathway (Kudo et al. 2002).

Remarkably, much evidence indicates involvement of chaperones in the PrP pathogenic

transformations (Chapter 1.2.4).

MAZR interacts with Bach2, a B-cell and neuron-specific transcription repressor

(Kobayashi et al. 2000).

The most exciting outcome of this analysis is the possibility that SPRN may be involved in the ER-stress response (Chapter 1.4.1). Chaperones and enzymes involved in the ER-stress response give misfolded proteins a second chance to fold properly, and they are also involved in the pathogenesis of protein-folding diseases. Genetic evidence supports involvement of a molecular chaperone (protein X) in prion disease pathogenesis (Chapter 1.2.4). Can one speculate that Sho may, in fact, be protein X (Chapter 7)?

5.6.6 New Members of PrP and Sho Protein Families

The new collection of PrPs, Shos and their homologues permits novel insights about

evolution of their sequences.

The most characteristic feature of all mammalian and fish members of the PrP and Sho

protein families is a stretch of hydrophobic amino acids in the middle of the protein

(Figure 5.8), which is essential for both the function and pathogenic transformation of

mammalian PrP (Prusiner 1998).

Fish proteins from the stPrP set (FustPrP-2, 424 residues; TestPrP-2, 395; FustPrP-1,

461; ZestPrP-3, 561) are much longer than tetrapod PrPs (frog, 216 residues; turtle, 270;

chicken, 273; human, 253) or fish PrP-likes (~170-190 residues), and show sequence

heterogeneity in the repeat region and in the C-terminal region (Figure 5.8).

The fish proteins from the Sho protein family are all of similar length (FuSho, 146

residues; TeSho, 149; ZeSho, 131; FuSho2, 150; TeSho2, 150; ZeSho2, 136; CaSho2,

145). There is an insertion in the Fugu and Tetraodon Sho basic repeats that is not

present in other Shos. However, the Sho2 sequence in this region is different from that

in Shos, lacking the basic region N-terminal to the hydrophobic region. The C-terminal

regions of fish Shos and Sho2s are quite diverged; this is also the sequence region most

diverged between fish and tetrapod Shos.

Thus, the Sho protein family from mammals and fish shows better conservation than the

PrP protein family.

5.6.7 Phylogenetic Analysis

The aim of phylogenetic analysis is to examine evolutionary relationships among the

members of the two protein families.

The human, chicken, turtle and frog PrP sequences cluster together to the exclusion of

all fish sequences (Figure 5.9). Although the inferred evolutionary relationship among

higher vertebrates has human PrP more distantly related to birds and reptiles than to

frog, the most likely tree is not significantly different from others consistent with the

current view on tetrapod evolution (i.e., with HuPrP and XePrP changed over). The

divergence of the amino-acid sequence between the fish proteins related to PrP and the

PrPs of higher vertebrates, suggests no orthology between these proteins. The clustering

pattern of the stPrPs also indicates that the genes coding for these proteins were

duplicated in fish after the evolutionary split from tetrapods.

The Shos and Sho2s lie on two separate branches of the tree (Figure 5.10), implying

that these two genes duplicated before the divergence of fish from tetrapods. Most

importantly, the fish Shos cluster with their mammalian homologues, rather than with

their fish Sho2 paralogues. This clustering pattern strongly suggests orthology between

mammalian and fish Shadoos.

It is established that whole genome and/or gene(s) duplications occurred in several fish

lineages, so that many duplicated fish genes have only one homologue in mammals

(Taylor et al. 2003; Aparicio et al. 2002). Two possible fates of duplicated genes have

been proposed. The classical model of neofunctionalization predicts that one of the

duplicate loci retains its original function while the other duplicate is fixed only if rare

beneficial mutations occur (Ohno 1970). This model could fit current knowledge of

Shos and Sho2s, as they cluster separately in the phylogenetic analysis. The alternative

model proposes that both duplicates are preserved due to subfunctionalization, where

proteins encoded by the duplicates complement each other functionally (Force et al.

1999). This model may fit the fish stPrPs which have sequences more similar to each

other than to those of tetrapod PrPs, grouping separately in the phylogenetic study.

5.6.8 Hypothetical Model for Evolution of PRNP- and SPRN-gene families

Dr. Gready and Prof. Graves constructed a hypothetical model (Figure 5.11), proposing

that the ancestral gene leading to all the PRNP-related and SPRN-related genes was

SPRN-like. First, an ancient pre-vertebrate duplication produced SPRNA and SPRNB

within an environment which may have contained the AO- and GTP-binding protein-

coding genes proximally, and the Rassf2- and Slc23a1-coding genes distally. The model

then proposes physical separation of the SPRN and SPRNB genes by a translocation of

half of the gene cluster to another chromosome. The subsequent history of the two

branches suggests that the genomic environment containing the SPRNB gene was highly

recombinogenic, while that containing the SPRNA gene was stable, leading to the

currently known fish and mammal SPRN orthologues in the same genomic context. It

was predicted that orthologous SPRN genes will be found in the same genomic context

in the other tetrapod lineages. A duplication of the SPRNB gene is then proposed, still

before the divergence of fish from tetrapods, to produce SPRNB1 and SPRNB2

protogenes.

Acquisition of additional sequence to form the complete C-terminal domain at this stage is necessary to explain subsequent gene evolution steps. The C-terminal domain sequence of genes evolved from SPRNB1 and SPRNB2 has been truncated or replaced, leading to the PrP-like and SPRNB in fish. The same occurred independently, and recently, in the duplicate PRNT gene in human. SPRNB2 translocated in the fish lineage, but was deleted in tetrapods, since no genes descended from SPRNB2 have been found in mammals.

Figure 5.11: An evolutionary model for the origin of Sho- and PrP-related coding genes.

After divergence of fish from tetrapods, the model proposes independent duplications of

these protogenes in the two lineages. In the tetrapod lineage a gene duplication of

SPRNB1 produced a gene cluster containing PRNP and PRND genes, between the

Rassf2- and Slc23a1-coding genes. It is not known at what stage the SPRNB2 gene was

deleted in tetrapod evolution, or whether it has simply diverged beyond levels currently

detectable in mammals.

In the fish branch, initial duplications of the SPRNB1 protogene to produce stPrP-2- and

PrP-like-coding genes, and of the SPRNB2 protogene to produce SPRNB and stPrP-1-

coding gene, are proposed. As depicted in Figure 5.11, these gene clusters are already

separated. If the separation occurred after these duplications, translocation of the

SPRNB and stPrP-1 fragment might more conveniently explain the apparently different

contexts observed in Fugu and zebrafish.

This model suggests that the PRNP- and SPRN-gene families evolved from the same

gene.

5.7 Conclusion: Evolvability of PRNP and SPRN

My comparative genomic analysis, together with the complementary phylogenetic

analysis, showed different evolutionary trajectories for PRNP and SPRN. On the one

hand, the dispensable mammalian PRNP appears to have relaxed evolutionary

constraints. It did not align with the fish homologues, there were local rearrangements

from fish to mammals, it accumulates transposable elements extensively and sequences

of proteins belonging to the PrP family vary. In contrast, the GC-rich mammalian SPRN

aligned with its fish homologues in comparative genomic analysis, it harbours no

transposable elements, there is conserved contiguity between fish and mammals and the

protein sequences are conserved. This evolutionary dialectic therefore indicates that the

SPRN gene is more conserved than PRNP, implying that it may have a more prominent

function than the PRNP gene.

An interplay between conservation and change enables perpetuation of life: whereas

maintenance of organization requires conservation, variation allows adaptation

(Radman et al. 1999). Whereas the PRNP and its homologues under weaker

evolutionary constraints may have adapted to different roles in different vertebrate

lineages, the conserved SPRN under stronger selective pressure may have retained

basic, vertebrate-wide conserved function. Many lines of evidence indicate redundancy

between the dispensable PRNP and another gene; perhaps this gene is the more

conserved SPRN. Finally, the pathogenic potential of PRNP could have evolved as a

consequence of relaxed evolutionary constraints.

With two such different players now in hand, new avenues for research appear: the

comparison between the two genes is a way to understand their functions better

(Chapter 7).
