Structural and functional analysis of ataxin-2 and ataxin-3
Mario Albrecht1,*, Michael Golatta2,*, Ullrich Wüllner3 and Thomas Lengauer1
1Max-Planck-Institute for Informatics, Saarbrücken, Germany; 2Institute for Medical Biometry, Informatics, and Epidemiology, University of Bonn, Germany; 3Department of Neurology, University of Bonn, Germany
Spinocerebellar ataxia types 2 (SCA2) and 3 (SCA3) are autosomal-dominantly inherited, neurodegenerative diseases caused by CAG repeat expansions in the coding regions of the genes encoding ataxin-2 and ataxin-3, respectively. To provide a rationale for further functional experiments, we explored the protein architectures of ataxin-2 and ataxin-3. Using structure-based multiple sequence alignments of homologous proteins, we investigated domains, sequence motifs, and interaction partners. Our analyses focused on presumably functional amino acids and the construction of tertiary structure models of the RNA-binding Lsm domain of ataxin-2 and the deubiquitinating Josephin domain of ataxin-3. We also speculate about distant evolutionary relationships of ubiquitin-binding UIM, GAT, UBA and CUE domains and helical ANTH and UBX domain extensions.
Keywords: spinocerebellar ataxia; Machado–Joseph disease; polyglutamine disorder; ubiquitin; valosin-containing protein.
Correspondence to M. Albrecht, Max-Planck-Institute for Informatics, Stuhlsatzenhausweg 85, 66123 Saarbrücken, Germany. E-mail: [email protected]
Abbreviations: A2BP1, ataxin-2 binding protein 1; DRPLA, dentatorubral-pallidoluysian atrophy; DUB, deubiquitinating enzymes; HD, Huntington disease; MJD, Machado–Joseph disease; NLS, nuclear localization signal; OTU, otubains; PABP, poly(A)-binding protein; RMSD, root mean square deviation; SCA, spinocerebellar ataxia; snRNPs, small nuclear ribonucleoproteins; UBP, ubiquitin-specific protease; UCH, ubiquitin C-terminal hydrolase; UIM, ubiquitin-interacting motif; VCP, valosin-containing protein.
*Note: M. Albrecht and M. Golatta contributed equally to this work.
(Received 6 April 2004, accepted 7 June 2004)
Eur. J. Biochem. 271, 3155–3170 (2004) © FEBS 2004 doi:10.1111/j.1432-1033.2004.04245.x
Spinocerebellar ataxia types 2 (SCA2) and 3 (SCA3) are autosomal-dominantly inherited, neurodegenerative disorders [1,2]. SCA3 has also been known as Machado–Joseph disease (MJD), and SCA2 and SCA3 belong to a heterogeneous group of trinucleotide repeat disorders. This group includes Huntington disease (HD), dentatorubral-pallidoluysian atrophy (DRPLA), and other spinocerebellar ataxia types such as SCA1, SCA7 and SCA17 [3–7]. The age of onset of SCA2 and SCA3 is in the third to fourth decade [8]. The disorders share common phenotypic features such as the degeneration of specific vulnerable neuron populations and the presence of intracellular aggregations of the mutant proteins in affected neurons. In contrast, the expression of the disease-associated genes occurs in a great variety of tissues and is not restricted to neuronal cells.
The SCA2 and SCA3/MJD genes have been mapped to chromosomes 12q24.1 and 14q32.1 [1,2]. The common underlying genetic basis of SCA2 and SCA3 is the expansion of a CAG repeat region beyond a certain threshold. These CAG repeats encode a polyglutamine (polyQ) tract in the respective proteins ataxin-2 and ataxin-3. The polyQ stretch in ataxin-2 lies near the N-terminus at the 5′-end of the coding region of exon 1 [9], but the polyQ region of ataxin-3 is contained in exon 10 close to the C-terminus [10]. While ataxin-2 is located predominantly in the Golgi apparatus [11], ataxin-3 is found in both the nucleus and the cytoplasm of cells [12].
To provide a rationale for further experiments, we characterized the protein architectures of ataxin-2 and ataxin-3 and investigated domains, sequence motifs, and interaction partners. To explore the functional implications, we assembled a multiple sequence alignment for the Lsm domain of ataxin-2 homologues including the yeast homologue Pbp1. We also constructed a 3D structural model for the RNA-binding Lsm domain of ataxin-2. Similarly, we used a structure-based multiple sequence alignment of the Josephin domain of ataxin-3 homologues to derive a 3D model of this domain and to analyse specific residues involved in deubiquitination.
Materials and methods
Protein sequences were retrieved from the NCBI [13], Ensembl [14], and SWISS-PROT/TrEMBL (SPTrEMBL) [15] databases and protein domain architectures from the Pfam [16] and SCOP [17] databases. Sequence accession numbers are given in the respective figure legends and Tables S1 and S2. Species names are abbreviated by first letters (Table S3). Protein structures were obtained from the PDB database [18]. The secondary structure assignments of PDB structures were taken from the DSSP database [19]. A single capital letter appended to the actual PDB identifier denotes the chosen structure chain. We used the PSI-BLAST suite of programs [20] to search for homologues (E-value cut-off 0.005) and the web servers PSIPRED [21], SAM-T99 [22], and SSpro2 [23] to predict the secondary structure of proteins and to form a consensus prediction by majority voting [24]. To predict intrinsically unstructured and disordered regions in proteins, we explored the consensus of the results returned by the DisEMBL [25], DISOPRED [26], GlobPlot [27], NORSp [28] and PONDR [10] online servers. The nuclear localization signals in ataxin-3 homologues were identified with the help of the prediction server PSORT II [29].
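The consensus secondary structure prediction by majority voting mentioned above can be illustrated with a minimal sketch. The function name, the three-state alphabet (H, E, C), the fallback to coil when no state has a strict majority, and the toy input strings are all assumptions made for illustration; they are not taken from the servers or from reference [24].

```python
from collections import Counter


def consensus_by_majority(predictions):
    """Per-residue majority vote over equal-length secondary-structure strings."""
    if len({len(p) for p in predictions}) != 1:
        raise ValueError("all predictions must have the same length")
    consensus = []
    for states in zip(*predictions):
        state, count = Counter(states).most_common(1)[0]
        # Assumption: without a strict majority, fall back to coil ('C').
        consensus.append(state if count > len(states) / 2 else "C")
    return "".join(consensus)


# Toy three-state predictions (H = helix, E = strand, C = coil).
# These strings are invented and are not real server output.
psipred = "CHHHHCCEEEC"
sam_t99 = "CHHHCCCEEEC"
sspro2 = "CCHHHCEEEEC"
print(consensus_by_majority([psipred, sam_t99, sspro2]))  # CHHHHCCEEEC
```

With three predictors, a strict majority means at least two of three agree at a position; a three-way disagreement is resolved to coil here, which is only one of several reasonable tie-breaking choices.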
Multiple sequence alignments were assembled by means of T-COFFEE [30] and improved manually by minor adjustments based on structure prediction results and pairwise structure superpositions computed by the program CE [31]. The root mean square deviations (RMSDs) were taken from the CE superpositions. We examined the results of the fold recognition methods available via the online meta-server BioInfo.PL [32], which contacts a dozen other prediction servers (the names of which are listed on the web site http://Bioinfo.PL/Meta/). The associated 3D-Jury system allows for the comparison and evaluation of the predicted 3D models in a consensus view [33]. To model the protein structures of ataxin-2 and ataxin-3, we submitted the constructed sequence–structure alignments to the 3D modelling server WHAT IF [34]. The sequence alignments depicted in the figures were prepared in the SEAVIEW editor [35] and illustrated by the web service ESPript [36]. The protein structure images were drawn in the Accelrys Discovery Studio ViewerLight. The online version of this manuscript contains supplementary material, and our web site will provide additional pictures.
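For readers unfamiliar with the RMSD values reported from the structure superpositions, the following minimal sketch shows how an RMSD is computed from two equally long lists of already superposed atom coordinates (typically Cα atoms). It does not reimplement CE or its alignment step; the function name and the toy coordinates are assumptions for illustration.

```python
import math


def rmsd(coords_a, coords_b):
    """Root mean square deviation between paired, already superposed 3D points."""
    if len(coords_a) != len(coords_b) or not coords_a:
        raise ValueError("need two non-empty coordinate lists of equal length")
    squared = sum(
        (xa - xb) ** 2 + (ya - yb) ** 2 + (za - zb) ** 2
        for (xa, ya, za), (xb, yb, zb) in zip(coords_a, coords_b)
    )
    return math.sqrt(squared / len(coords_a))


# Toy coordinates in angstroms (invented): every atom is shifted by 1 A along x.
model = [(0.0, 0.0, 0.0), (3.8, 0.0, 0.0), (7.6, 0.0, 0.0)]
template = [(1.0, 0.0, 0.0), (4.8, 0.0, 0.0), (8.6, 0.0, 0.0)]
print(round(rmsd(model, template), 3))  # 1.0
```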
Results and discussion
Protein architecture of ataxin-2
Ataxin-2 has 1312 residues (including 22 glutamines of the polyQ stretch) and a molecular mass of ≈ 140 kDa. Ataxin-2 is a highly basic protein except for one acidic region (amino acid 254–475) containing 46 acidic amino acids (Fig. 1). This region covers roughly exons 2–7 and is predicted to consist of two globular domains named Lsm (Like Sm, amino acid 254–345) [37] and LsmAD (Lsm-associated domain, amino acid 353–475). The LsmAD domain of ataxin-2 contains both a clathrin-mediated trans-Golgi signal (YDS, amino acid 414–416) and an endoplasmic reticulum (ER) exit signal (ERD, amino acid 426–428) [11,38]. It is composed mainly of α-helices according to the results from secondary structure prediction servers.
The rest of ataxin-2 outside of the Lsm and LsmAD domains is only weakly conserved in eukaryotic ataxin-2 homologues and is predicted to be intrinsically unstructured according to the consensus result from the DisEMBL, DISOPRED, GlobPlot, NORSp and PONDR online servers. These nonglobular, flexible N- and C-terminal tails (amino acid 1–253 and 476–1312) contain the polyQ region (amino acid 166–187), several highly conserved short sequence motifs as possible protein interaction sites, and conspicuous (R)RG peptides at the C-terminus of the LsmAD domain. One of the sequence motifs constitutes a putative PABP [poly(A)-binding protein] interacting motif PAM2 (amino acid 908–925) [39], and (R)RG peptides are well known to bind RNA in other proteins [40]. The N- and C-terminal tails of ataxin-2 also have a high content of proline (179 prolines out of 1090 amino acids, 16.4%).
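The tail length and proline content quoted above follow directly from the stated residue ranges. The short sketch below reproduces that arithmetic; the variable names are illustrative, and the proline count is simply the figure given in the text, not recomputed from a sequence.

```python
# Residue ranges (1-based, inclusive) of the N- and C-terminal tails as stated in the text.
TAILS = [(1, 253), (476, 1312)]
POLYQ = (166, 187)
PROLINES_IN_TAILS = 179  # count quoted in the text


def span_length(span):
    """Number of residues in an inclusive (start, end) range."""
    start, end = span
    return end - start + 1


tail_length = sum(span_length(t) for t in TAILS)
proline_fraction = PROLINES_IN_TAILS / tail_length

print(tail_length)                # 1090
print(f"{proline_fraction:.1%}")  # 16.4%
print(span_length(POLYQ))         # 22
```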
This property and the low complexity of unstructured sequence regions may lead to several significant, but probably false-positive, hits during a PSI-BLAST search for homologues of ataxin-2. Despite the use of the standard low complexity filter, our PSI-BLAST search with human ataxin-2 homologues found several questionable hits outside globular domains to homologues of the polyglutamine DRPLA gene product atrophin. For instance, starting the PSI-BLAST search with an Arabidopsis thaliana ataxin-2 homologue (SPTrEMBL: Q94AM9), human atrophin is retrieved in the third iteration with an E-value of 5 × 10⁻¹¹. Conversely, using the rat atrophin homologue (SPTrEMBL: Q62901) as the start sequence, human ataxin-2 was detected in the second iteration with an E-value of 8 × 10⁻⁴.
Fig. 1. Protein architectures of human ataxin-2, its yeast homologue Pbp1, and the P. falciparum homologue PF13_0048 of the decapping enzyme DCP2 (DCP2_Pf).
RNA binding of ataxin-2
The Lsm domain of ataxin-2 is typical of RNA-binding Sm and Sm-like proteins, which often form cyclic 6-, 7- or even 14-oligomers [41–43]. Generally, Lsm domain proteins are involved in a variety of essential RNA processing events including RNA modification, pre-mRNA splicing, and mRNA decapping and degradation. Some of them are also important components of spliceosomal small nuclear ribonucleoproteins (snRNPs).
The LsmAD domain is contained in the Pfam database with the name Ataxin-2_N and also occurs in other, as yet uncharacterized, Plasmodium falciparum/yoelii yoelii gene products PF13_0048/PY07327 without an Lsm domain (Fig. 1). Both Plasmodium gene products have an additional N-terminal DCP2 domain (also termed box A), which is always followed by a NUDIX domain [44] in all known DCP2 homologues. This NUDIX domain constitutes the catalytic subunit of the mRNA decapping holoenzyme DCP1–DCP2 [45,46].
The physiological function of ataxin-2 and closely related eukaryotic homologues in RNA processing is as yet quite unexplored [47–50]. Interestingly, ataxin-2 has been observed to interact with A2BP1 (ataxin-2 binding protein 1) [38], whose RNA-binding Caenorhabditis elegans homologue, fox-1, regulates tissue-specific alternative splicing [51]. Disruption of the human A2BP1 gene may cause epilepsy or mental retardation [52]. In addition, ataxin-2 shows significant homology to the yeast protein Pbp1 (Pab1/PABP-binding protein 1), which also contains the Lsm and LsmAD domains; regions outside of these two globular domains are predicted to be mainly unstructured in Pbp1 as in ataxin-2.
Although the C-terminal tail of Pbp1 does not contain a PAM2 motif [39], this yeast protein regulates polyadenylation after pre-mRNA splicing and interacts with the C-terminal part of the yeast homologue PAB1 of the human PABP [53]. A2BP1 and PABP are also evolutionarily related and possess RNA recognition motifs [38]. These observations strongly suggest that ataxin-2 is involved in similar mRNA processing tasks.
Structural modelling of ataxin-2
First, we compiled a list of ataxin-2 homologues including the yeast homologue Pbp1 and several Lsm domains of snRNPs and other Sm and Sm-like proteins from various species. Then, we assembled a structure-based multiple sequence alignment of the Lsm domains, crystallographically determined structures of which reveal a close structural homology between archaeal and eukaryotic proteins (Fig. 2) [42,43,54–65]. This suggests that the function and the RNA-binding mode of the Lsm domain have been preserved during evolution.
The RNA-binding Lsm domain is characterized by a conserved sequence motif consisting of two short segments known as Sm1 and Sm2, which are separated by a variable linker [66,67]. The very strong conservation of certain glycine residues is especially striking and also demonstrates the evolutionary relationship of ataxin-2 to Lsm domain proteins. The amide groups of the glycines are known to stabilize the protein structure when forming hydrogen bonds to adjacent β-strands [55]. The secondary structure predictions of ataxin-2 and its yeast homologue Pbp1 are also in good agreement with the known structure of the Lsm domain as an open β-barrel, consisting of an N-terminal α-helix followed by a strongly bent five-stranded antiparallel β-sheet with a 3₁₀ helical turn in some cases before the fifth β-strand.
The top two alignment rows in Fig. 2 show human ataxin-2 aligned with the Pyrococcus abyssi Sm1 protein (PDB identifier 1m8v, chain A), the crystal structure of which consists of a heptameric ring with a central cavity like other Lsm domain oligomers [65]. This Sm1 protein provides the only Lsm domain structure that is bound to RNA both inside and outside of the doughnut-shaped ring, at an internal and an external binding site. Therefore, we used this alignment of ataxin-2 to Sm1 to model the 3D structure of the Lsm domain of ataxin-2 in complex with RNA and Lsm domains of ataxin-2 protomers (Fig. 3).
Functional analysis of the Lsm domain
We applied the same colour scheme to functionally relevant residues shown in the multiple sequence alignment and the 3D model of ataxin-2 (Figs 2 and 3). Based on the crystal structure of Sm1 from P. abyssi bound to uridine heptamers (U7), we marked several amino acids in Sm1, which are involved in RNA binding [65] and are mostly physico-chemically conserved in ataxin-2 (Sm1/ataxin-2 residue numbers). The residues forming the internal U7 binding site are H37/K299, N39/L302 and R63/K330, while ionic interactions between K22/K284, R63/K330 and D65/S332 stabilize the RNA-binding area. The residues involved in the external U7 binding site are R4/R266, H10/T272 and Y34/Y296, stabilized by a hydrogen bond between H10/T272 and Y34/Y296. It is interesting to note that Sm1 from P. abyssi and from Archaeoglobus fulgidus (PDB identifier 1i4k, chain A) share identical RNA-binding residues except for H10, which is replaced by an asparagine [59,65].
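The statement that the RNA-binding residues are "mostly physico-chemically conserved" can be made concrete with a small sketch over the Sm1/ataxin-2 residue pairs listed above. The amino acid grouping used here (for example, counting histidine as basic) is a deliberately simple assumption for illustration and is not the classification used by the authors.

```python
# Assumed, simplified physico-chemical classes (one-letter amino acid codes).
CLASSES = {
    "basic": "KRH",
    "acidic": "DE",
    "polar": "STNQ",
    "aromatic": "FWY",
    "hydrophobic": "AVLIMC",
    "special": "GP",
}
CLASS_OF = {aa: name for name, members in CLASSES.items() for aa in members}

# Unique Sm1/ataxin-2 residue pairs named in the text (R63/K330 listed once).
PAIRS = [
    ("H37", "K299"), ("N39", "L302"), ("R63", "K330"),  # internal U7 site
    ("K22", "K284"), ("D65", "S332"),                   # stabilizing ionic interactions
    ("R4", "R266"), ("H10", "T272"), ("Y34", "Y296"),   # external U7 site
]


def is_conserved(sm1_residue, ataxin2_residue):
    """True if both residues fall into the same assumed physico-chemical class."""
    return CLASS_OF[sm1_residue[0]] == CLASS_OF[ataxin2_residue[0]]


conserved = [pair for pair in PAIRS if is_conserved(*pair)]
print(f"{len(conserved)} of {len(PAIRS)} pairs conserved")  # 5 of 8 pairs conserved
```

Under this grouping, the non-conserved positions are N39/L302, D65/S332 and H10/T272; a different grouping would shift the count.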
Furthermore, we investigated whether ataxin-2 may also form oligomers through the Lsm domain. To this end, we used the detailed crystal structure analyses of the very similar snRNP heterodimers D1–D2 and D3–B [55]. Because of analogous intermolecular interactions in both dimers, we focused on the complex of D3 with B. This complex is stabilized mainly by the pairing of the fifth β-strand (β5) from D3 with the fourth β-strand (β4) from B (D3/ataxin-2–B/ataxin-2): R69/V335–R73/K330, L71/V337–L71/L328, and L73/F339–L69/S326. In addition, two hydrophobic clusters formed by residues of D3 and B contribute to the stability of the dimer. The first cluster includes F70/V336 and I72/Q338 (both in β5 strand) of D3 and F27/Y289 (β2 strand), L67/M324, V70/I327 and L72/L328 (all in β4 strand) of B. The second cluster consists of P6/M267, L10/L271 (both in α-helix), V18/C279 (β1 strand), L32/F293 (β2 strand), I33/K294 (loop after β2 strand), I68/F334, L71/V337 and L73/F339 (all in β5 strand) of D3 and I41/L304, C43/A306 (both in β3), L69/S326 and L71/L328 (both in β4) of B. Stacking interactions between guanidinium groups of arginines R69/V335 of D3 and R25/G287 and R49/T312 of B as well as an ionic interaction between E21/Q282 of D3 and R65/S322 of B stabilize the dimer further. However, the latter salt bridge is not observed in the D1–D2 complex despite identical amino acids. Altogether, the degree of conservation of amino acids relevant for heterodimerization is only moderate, but may still indicate that ataxin-2 forms Lsm domain oligomers.
Protein architecture of ataxin-3
The longest splice variant of ataxin-3 possesses 376 amino acids (including 22 glutamines of the polyQ stretch, amino acid 296–317) and an approximate molecular weight of 42 kDa. Ataxin-3 consists of a globular deubiquitinating N-terminal Josephin domain (amino acid 1–170) [68,69] and a flexible C-terminal tail containing two ubiquitin-interacting motifs (UIMs) [70] (also termed LALAL motifs and PUBs [71], amino acid 223–240 and 243–260) and the polyQ region (amino acid 296–317) (Fig. 4) [72]. A slightly shorter alternative splice variant of ataxin-3 with 373 amino acids has a third UIM (amino acid 334–351) at the C-terminus. An as yet uncharacterized ataxin-3 paralogue on the X chromosome (sequence identity 70%) is expressed in testis (ataxin-3t) [10]. The Josephin domain is also found without a C-terminal tail in other, as yet uncharacterized, proteins named josephins (Fig. 5) [73].
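As a compact summary of the architecture just described, the sketch below stores the features of the longest ataxin-3 splice variant as inclusive residue ranges and checks that they are ordered, non-overlapping and within the stated protein length. The dictionary layout and names are illustrative assumptions; the numbers are those given in the text.

```python
ATAXIN3_LENGTH = 376  # longest splice variant, as stated in the text

# Feature name -> (start, end), 1-based inclusive residue numbers from the text.
ATAXIN3_FEATURES = {
    "Josephin": (1, 170),
    "UIM1": (223, 240),
    "UIM2": (243, 260),
    "polyQ": (296, 317),
}


def feature_length(name):
    """Number of residues covered by a named feature."""
    start, end = ATAXIN3_FEATURES[name]
    return end - start + 1


def features_are_consistent():
    """Check that features lie inside the protein and do not overlap."""
    spans = sorted(ATAXIN3_FEATURES.values())
    inside = all(1 <= s <= e <= ATAXIN3_LENGTH for s, e in spans)
    disjoint = all(prev[1] < nxt[0] for prev, nxt in zip(spans, spans[1:]))
    return inside and disjoint


print(features_are_consistent())  # True
print(feature_length("polyQ"))    # 22
print(feature_length("UIM1"))     # 18
```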
A highly conserved, putative nuclear localization signal (NLS) is found upstream of the polyQ stretch (RKRR, amino acid 282–285), which may be bipartite in the Caenorhabditis elegans homologue of ataxin-3, consisting of 17 residues (RRDRQKFLERFEKKKEE, amino acid 296–312). This NLS follows a potential casein kinase II (CK-II) phosphorylation site (TSEE, amino acid 277–280), which may determine the rate of the observed ataxin-3 transport into the nucleus [74]. Ataxin-3 may also contain a nuclear export signal (NES) following the Josephin domain (ADQLLQMIRV, amino acid 174–183) based on our comparison with a published sequence profile of nuclear export signals [75]. Furthermore, ataxin-3 contains several conserved sequence motifs similar to NR- and CoRNR-boxes L-x-x-L-L/[IL]-x-x-[IV]-I of transcriptional coactivators and corepressors, respectively [73]. Indeed, ataxin-3 interacts with histones and the histone acetyltransferases CBP, p300, and PCAF, which work as transcriptional coactivators. In particular, dependent on these cofactors, ataxin-3 represses histone acetylation and transcription [76], and altered protein acetylation has already been implicated in polyglutamine disease processes [77]. Generally, the (de-)ubiquitination of histones has been linked to transcriptional regulation [78], which may also explain the observed interactions of ataxin-3.
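Short linear motifs such as those discussed above are commonly located by pattern scanning. The sketch below is a minimal illustration using regular expressions: the NR-box and CoRNR-box patterns are transcribed from the text, while the CK-II pattern [ST]-x-x-[DE] is the commonly cited PROSITE-style consensus and is an assumption here, since the text gives only the matching peptide TSEE. The test sequences are invented toy strings, not real ataxin-3 sequence.

```python
import re

# Motif name -> regular expression ('.' stands for any residue x).
MOTIFS = {
    "CK-II site": r"[ST]..[DE]",  # assumed consensus; TSEE matches it
    "NR-box": r"L..LL",           # L-x-x-L-L, from the text
    "CoRNR-box": r"[IL]..[IV]I",  # [IL]-x-x-[IV]-I, from the text
}


def scan(sequence, pattern):
    """Return (1-based start, matched peptide) for all matches, including overlapping ones."""
    lookahead = re.compile(f"(?=({pattern}))")
    return [(m.start() + 1, m.group(1)) for m in lookahead.finditer(sequence)]


# Toy sequences assembled for illustration only.
print(scan("AATSEERKRR", MOTIFS["CK-II site"]))  # [(3, 'TSEE')]
print(scan("QLAALLK", MOTIFS["NR-box"]))         # [(2, 'LAALL')]
print(scan("LAAVI", MOTIFS["CoRNR-box"]))        # [(1, 'LAAVI')]
```

Such short patterns match frequently by chance, which is one reason the text treats the predicted sites as potential rather than confirmed.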
Ataxin-3 is evolutionarily conserved in eukaryotes including P. falciparum and plants, but not yeast. The P. falciparum homologue PFL1295w of ataxin-3 (ataxin-3_Pf), whose gene expression is upregulated similarly to the P. falciparum josephin homologue PF11_0125 in gametocytes [79–81], constitutes an exception because it has only the second UIM conserved (amino acid 250–267) and has an additional ubiquitin-like UBX domain [82–85] at the C-terminus (amino acid 271–381) instead of the polyQ-containing region [69]. Like human ataxin-3, this ataxin-3 homologue PFL1295w also has a potential casein kinase II phosphorylation site (TSDE, amino acid 278–281) close to basic amino acids, which can be indicative of an NLS (KKIH, amino acid 293–296) near the N-terminus of the UBX domain. In contrast, the prediction server PSORT II returns another region inside the UBX domain as a possible NLS (PRRK, amino acid 339–342). It is unclear which NLS motif may be functionally more relevant because both NLS motifs correspond to amino acids at solvent exposed N-termini of the second and fourth β-strand in the crystal structure of the UBX domain of the cofactor p47 (PDB identifier 1s3s) [86]. Similar to the P. falciparum homologue, the Cryptosporidium parvum homologue of ataxin-3 also possesses only one UIM motif (amino acid 266–283) and a C-terminal UBX domain (amino acid 288–397) instead of a polyQ region.
Fig. 3. 3D model of the Lsm domain of ataxin-2 using three adjacent protomers of the Sm1 protein from P. abyssi as template (PDB identifier 1m8v, chain A, B and G). The model illustrates predicted internal (blue) and external (green) binding sites of ataxin-2 to RNA (grey). α-Helices are shown in red, β-strands are shown in cyan. Only functionally relevant residues of the central ataxin-2 protomer are annotated as follows: dark blue boxes point to residues forming the internal site, and light blue boxes mark amino acids stabilizing the RNA binding area; dark green boxes highlight residues involved in the external site, and light green ones indicate stabilizing hydrogen bonds.
Ubiquitin binding of ataxin-3
Ubiquitination fulfills many cellular functions in cytoplasmic trafficking, guiding specific proteins through the endocytic pathways, and targeting proteins to the proteasome [84,87–93]. Above all, the ubiquitin–proteasomal pathway is involved in processing mutant or damaged proteins that cause neurodegenerative diseases. The small ubiquitin protein can be covalently linked to other proteins as a single molecule or as a polyubiquitin chain.
Recently, the two UIMs between the Josephin domain and the polyQ stretch of ataxin-3 have been shown to be capable of binding tetraubiquitin and polyubiquitinated proteins [68,94–97]. In our previous study, we used the C-terminal ANTH domain extension, which consists of an antiparallel three-helix bundle, to model the structure of the UIMs in the C-terminal tail of ataxin-3 [73]. In fact, novel structure determinations have shown that UIM peptides are α-helices and can form helix bundles in the crystal structure [98]. In contrast, the NMR solution structures of UIM peptides reveal that they are single amphipathic α-helices connected by unstructured linkers [99,100]. The latter observation is in agreement with the observed flexibility of the C-terminal tail of ataxin-3 [72].
Furthermore, the ANTH domain itself is evolutionarily, structurally, and functionally related to a VHS domain [101]. Lately, the structure of the GAT (GGAs and Tom1) domain directly following the VHS domain of Tom1 and GGAs (Golgi-associated, γ-adaptin ear-containing, Arf-binding proteins) was determined crystallographically [102–105]. The GAT domain contains a three-helix bundle, which we found to superimpose very well with the helical bundle of the C-terminal ANTH domain extension (RMSD 3.1 Å, PDB identifiers 1o3x and 1hx8, A chains).
Interestingly, the GAT domains of GGAs and Tom1 have been reported to interact with ubiquitin [106–108]. The corresponding ubiquitin binding site was located to the third α-helix of the GAT three-helix bundle, and hydrophobic amino acids like leucines are important for the interaction (Fig. 5). The same residue type also plays an essential role in binding ubiquitin to the UIM α-helix [98–100] and the third α-helix of the helical bundle in the homologous CUE and UBA domains [109]. However, the sequence similarity is quite low, and thus it is difficult to deduce an evolutionary relationship, although the ubiquitin binding sequence of GGAs and Tom1 resembles a noncanonical UIM whose otherwise strictly conserved serine residue is replaced by an asparagine, except in the case of human GGA3 (Fig. 5).
Further interaction partners of ataxin-3
It has been shown that ataxin-3 interacts with the ubiquitin-like (UBL) domain of the homologous ubiquitin- and proteasome-binding factors hHR23A and hHR23B, whose yeast orthologue is Rad23 [96,110–112]. The latter factors are also involved in the nucleotide excision repair pathway by targeting the ubiquitinated nucleotide excision repair factor XPC/Rad4 to the proteasome [113]. Their UBL domain binds to a UIM helix of the 26S proteasome subunit S5a, and this interaction disrupts the interdomain contacts between the N-terminal ubiquitin-mimicking UBL domain and the two C-terminal ubiquitin-binding UBA domains, thereby inducing the change from a closed to an open protein conformation [109,111,114,115]. Rad23 and the yeast orthologue Rpn10 of S5a serve as alternative ubiquitin receptors for the proteasome [116], and the UBA domains of Rad23 inhibit proteasome-catalysed proteolysis by sequestering Lys48-linked polyubiquitin chains [117,118]. In particular, the NMR solution structures of the UBL domain of hHR23A/B bound to a UIM peptide of S5a [99,119] could be used to model the complex of hHR23A/B and ataxin-3. Similarly, the complex of a UIM of ataxin-3 with ubiquitin could be modelled based on the NMR solution structure of the UIM of the Vps27 protein bound to ubiquitin [100].
Fig. 4. Protein architectures of human ataxin-3, its P. falciparum homologue PFL1295w (ataxin-3_Pf), and human josephin 1.
The C-terminal region of ataxin-3 including the polyQ region interacts with the N-terminal cofactor/substrate-binding adaptor domain of the valosin-containing protein VCP/p97/Cdc48/VAT/ter94 [96,120–123]. VCP is an important multifunctional AAA+ ATPase with two C-terminal ATPase domains after the adaptor domain, which provide the energy for major conformational changes [124]. VCP forms hexamers and works as a molecular chaperone involved in a variety of intracellular functions including cell cycle progression, membrane fusion, vesicle-mediated transport, transcription activation, apoptosis prevention, and ubiquitin-proteasome degradation, modulating polyglutamine-induced neurodegeneration [96,120–123,125–127]. VCP binds the ubiquitin E3 ligase and the chain assembly factor UFD2a/E4B, which is a U box homologue of yeast Ufd2 [128], and interacts with and regulates the degradation of the proteasome-associated ataxin-3, forming a trimeric complex of ataxin-3, VCP, and UFD2a [96,127,129–131]. Interestingly, Ufd2 binds the UBL domain of Rad23 and competes with Rad23 for binding to the Rpn1 proteasome subunit, while the N-terminal UBL domain of the ubiquitin C-terminal hydrolase Ubp6 interacts with Rpn1 without competition with Rad23 [116,132].
Furthermore, VCP also binds the C-terminal UBXdomain of the membrane fusion adaptor p47/SHP1/EYC/Ubx3 [85,86,133], which consists of three domains UBA-SEP-UBX [134]. The crystallographically determined com-plex of the N-terminal adaptor domain of VCP with thisUBX domain (PDB identifier 1s3s) indicates the interactingresidues [86] and could be used to model the putativecomplex of VCP with the C-terminal UBX domain of theataxin-3 homologue from P. falciparum (ataxin-3_Pf). LiketheUBXdomain of p47, ataxin-3_Pf contains the conservedloop that is essential for an interaction with VCP because itinserts into a hydrophobic pocket of VCP [86]. The UBXdomain structure of p47 is extended at its N-terminus by adisordered peptide structure and an additional a-helix of asyet unknown functional relevance [86]. The length of thisa-helix is similar to a UIM a-helix (Fig. 5), and such a UIM
also precedes the UBX domain of ataxin-3_Pf. Therefore,this a-helix of p47 might be related to the second UIM inataxin-3 homologues (recall that the first UIM is missing inataxin-3_Pf). In addition, the arrangement of one UIMhelix followed by aC-terminalUBXdomain is also found inthe cofactor Ubx2 with domain architecture UBA-UAS-
Fig. 5. Multiple sequence alignment of UIM peptides, divided into
groups by horizontal lines from top to bottom: UIM sequences of the
Pfam seed alignment including first, second, and third UIMs of ataxin-3
homologues, UIM-like peptides from GGAs and Tom1, and related
AP180 sequences. The latter are derived from the 3D structure super-
position of the GAT domain of human GGA1 with the AP180 exten-
and 1hx8, respectively). The second group of UIMs in ataxin-3 homologues
also includes the similar N-terminal α-helix of the UBX domain
extension of p47 (PDB identifier 1s3s). For each group, amino acids in
alignment columns with a majority of identical residues are printed on a
black background, and similar amino acids are highlighted in grey.
© FEBS 2004 Analysis of ataxins 2 and 3 (Eur. J. Biochem. 271) 3161
UIM-UBX [133]. The UIM of Ubx2 binds ubiquitin chains, and the UBX domain interacts with VCP. Thus the same interactions can be expected for ataxin-3_Pf.
The C-terminal, presumably VCP-binding, UBX domain of ataxin-3_Pf appears to correspond to the VCP-binding C-terminal part of human ataxin-3, which follows the second UIM and includes the polyQ region [120,123,131]. In addition, the polyQ tract of ataxin-3 has been shown to be indispensable for the interaction with VCP, and its length correlates with the strength of the interaction. These observations raise the question of how human ataxin-3 binds VCP in contrast to its P. falciparum homologue. This is particularly interesting because VCP may suppress polyQ-induced neurodegeneration, and mutations in VCP have been observed to cause cytoplasmic vacuoles followed by cell death because of a dysfunctional second ATPase domain and inclusion body formation [120–123,127,135,136]. We also observed that none of the VCP sequence variations associated with Paget disease of bone and frontotemporal dementia (IBMPFD) [135] is located in the binding interface of a UBX domain with the N-terminal adaptor domain of VCP; instead, they are involved in interactions between protein regions (for details see the online supplement). Therefore, motions of the adaptor domain, which are essential for proper VCP function [124,127], may be impaired by IBMPFD-associated mutations.
According to a recent yeast-2-hybrid screen [137], a josephin homologue from Drosophila melanogaster (CG3781) on the X chromosome interacts with the heat shock protein HSP60b (CG2830), which is involved in spermatogenesis [138,139], suppresses ubiquitination [140] and associates with 38 further proteins including a ubiquitin E3 ligase, but no other deubiquitinating enzyme except josephin (CG8184). Interestingly, HSP40 and HSP70 chaperones have already been observed to associate with VCP, and they also colocalize with intranuclear ataxin-3 aggregates and may play an important role in the disease process and the impairment of the ubiquitin-proteasome system [121,141–149].
Structural modelling of the Josephin domain
Recently, it has been observed that the Josephin domain contains highly conserved amino acids reminiscent of the catalytic residues of a deubiquitinating cysteine protease [69], and first experimental results support this functional hypothesis [68]: decrease of polyubiquitination of 125I-labelled lysozyme by removal of ubiquitin, cleavage of the ubiquitin protease substrate ubiquitin-AMC, and binding of the specific ubiquitin protease inhibitor ubiquitin-aldehyde (Ubal). Mutating the catalytic cysteine in ataxin-3 inhibits these functions [68].
Previously, we modelled the 3D structure of ataxin-3 based on the ANTH domain [150] of the adaptin AP180 as structural template [73]. However, this prediction has to be revised with regard to the N-terminal Josephin domain because of the identified cysteine protease signature [69]. In contrast to our previous prediction [73], which relied on the secondary structure prediction from a single server, we now formed the consensus result of the three state-of-the-art secondary structure prediction servers PSIPRED, SAM-T99, and SSpro2. All three online servers basically returned the same secondary structure for human ataxin-3 and josephin 1, resulting in a much more reliable secondary structure prediction of β-strands besides α-helices. We propose that the increased accuracy of this prediction is due, at least in part, to a substantial growth of protein sequence and structure databases. The predicted β-strands in the Josephin domain corroborate a cysteine protease fold of deubiquitinating enzymes (DUBs) and do not support the ANTH domain structure consisting solely of α-helices. In hindsight, the fold recognition methods applied in the past to predict the structure of ataxin-3 may have been misguided by the pronounced prediction of α-helices only.
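The consensus step can be pictured as a per-residue majority vote over three-state strings (H for α-helix, E for β-strand, C for coil) returned by the individual servers. The following Python sketch is illustrative only: the function name, the rule of falling back to coil when no state has a majority, and the toy input strings are assumptions and do not reproduce the procedure or the data behind the predictions discussed here.

```python
from collections import Counter


def consensus_secondary_structure(predictions):
    """Per-residue majority vote over equally long H/E/C strings.

    A state is accepted only if more than half of the predictors agree;
    otherwise the position falls back to coil ('C'). This fallback rule
    is an assumption made for illustration.
    """
    if not predictions:
        raise ValueError("at least one prediction is required")
    length = len(predictions[0])
    if any(len(p) != length for p in predictions):
        raise ValueError("all predictions must cover the same residues")

    consensus = []
    for states in zip(*predictions):
        state, votes = Counter(states).most_common(1)[0]
        consensus.append(state if votes * 2 > len(predictions) else "C")
    return "".join(consensus)


# Toy strings standing in for the output of three servers (hypothetical data).
toy = ["CHHHHCCEEEC", "CHHHHCEEEEC", "CCHHHCCEEEC"]
print(consensus_secondary_structure(toy))
```

A stricter variant could require unanimous agreement and mark all other positions as undetermined, which would make the regions of disagreement between servers explicit.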
DUBs process ubiquitin proteolytically at the C-terminus and can be divided into at least two evolutionarily related families of cysteine proteases, UBPs (ubiquitin-specific proteases) and UCHs (ubiquitin C-terminal hydrolases) [151,152]. However, new ubiquitin-specific families such as otubains (OTU) and JAMMs with low sequence similarity to known DUBs are still being discovered [151]. A consensus of fold recognition servers now selects both available UCH domain structures of human UCH-L3 [153] and yeast YUH1 [154], which superimpose with a low RMSD of 2.0 Å (PDB identifiers 1uch and 1cmxA, respectively), as best modelling templates with a moderate confidence score for human josephin 1, but still with only a weak score for ataxin-3. The pairwise sequence–structure alignments returned by the structure prediction servers for 3D modelling differ mainly in the central part of the Josephin domain (amino acids 47–117 in ataxin-3) aligned to DUBs. This finding underpins the distant relationship of the Josephin domain to known DUBs. The central part does not contain catalytic residues and is thus less conserved, containing insertions of variable length and structure in other cysteine proteases [155].
Based on a multiple sequence alignment of Josephin domain homologues (Fig. 6), we used the crystallographically determined structure of YUH1 bound to the ubiquitin-like inhibitor Ubal (PDB identifier 1cmx, chains A and B, respectively) to model the tertiary structure of the Josephin domain of ataxin-3 in complex with Ubal (Fig. 7). Thus, the structure of ataxin-3 is predicted to be distinct from the finger–palm–thumb architecture of UBPs such as USP7/HAUSP [156]. Because of the low degree of conservation in the central part, we believe that ataxin-3 and josephin 1 adopt slightly different structures in this part, which are not very similar to YUH1. In addition, we observed that the Josephin domain also resembles the OTU domain because both have a highly conserved histidine three residues downstream of the catalytic cysteine. Interestingly, like ataxin-3, the deubiquitinating OTU domain protein VCIP135 interacts with the N-terminal adaptor domain of VCP through the C-terminal tail including a UBL domain and dissociates p47 from the complex with VCP during ATP hydrolysis of VCP [157,158]. This observation also indicates a close functional relationship of the homologous ubiquitin-like UBL and UBX domains.
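The shared feature noted above, a histidine three residues downstream of a cysteine, can be searched for in any protein sequence with a simple pattern. The Python sketch below is a minimal illustration under stated assumptions: the function name and the toy peptide are hypothetical, and a match to this short pattern alone is of course no evidence of catalytic activity.

```python
import re


def find_cys_his_motifs(sequence):
    """Return 1-based (Cys, His) position pairs where His lies three residues after Cys."""
    # A lookahead is used so that overlapping candidates are all reported.
    return [(m.start() + 1, m.start() + 4)
            for m in re.finditer(r"(?=C..H)", sequence.upper())]


# Hypothetical toy peptide, not a real ataxin-3 or OTU sequence.
toy_sequence = "MAGQCSLHNNAC"
print(find_cys_his_motifs(toy_sequence))
```

In practice, such hits would only be meaningful when the corresponding alignment columns are conserved across homologues, as in the structure-based alignment of Fig. 6.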
Functional analysis of the Josephin domain
The active site of UCHs is divided into two parts as follows (YUH1/ataxin-3 residue numbers) [153,154]: The
N-terminal part consists of a glutamine (Q84/Q9) upstream of a cysteine (C90/C14), both of which form an oxyanion hole to accommodate the negative charge on the substrate carbonyl oxygen during catalysis. The C-terminal part contains a histidine (H166/H119), which is thought to be deprotonated, and an asparagine or aspartate (D181/N134), both of which activate the side chain of the cysteine to unleash a nucleophilic attack on the carbonyl carbon atom of the scissile peptide bond. The cysteine, histidine, and asparagine/aspartate constitute the catalytic triad characteristic of cysteine proteases such as papain.
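The residue correspondences listed above can be collected in a small lookup table and used to check whether a given sequence carries the expected amino acid at each active-site position. The Python sketch below is a hedged illustration: the residue numbers are those quoted in the text, whereas the function names and the poly-alanine placeholder sequence are hypothetical and stand in for real YUH1 or ataxin-3 sequences.

```python
# Active-site positions quoted in the text (YUH1 / ataxin-3 numbering).
ACTIVE_SITE = {
    "oxyanion-hole glutamine": (("Q", 84), ("Q", 9)),
    "catalytic cysteine": (("C", 90), ("C", 14)),
    "catalytic histidine": (("H", 166), ("H", 119)),
    "asparagine/aspartate": (("D", 181), ("N", 134)),
}


def missing_active_site_residues(sequence, column):
    """List the roles whose expected residue is absent from `sequence`.

    `column` selects the numbering: 0 for YUH1, 1 for ataxin-3.
    Positions are 1-based, as in the text.
    """
    missing = []
    for role, pairs in ACTIVE_SITE.items():
        residue, position = pairs[column]
        if position > len(sequence) or sequence[position - 1] != residue:
            missing.append(role)
    return missing


def toy_sequence(column, length=200):
    """Build a hypothetical poly-alanine placeholder carrying only the four residues."""
    chars = ["A"] * length
    for pairs in ACTIVE_SITE.values():
        residue, position = pairs[column]
        chars[position - 1] = residue
    return "".join(chars)


# The placeholder carries all four residues, so no role is reported as missing.
print(missing_active_site_residues(toy_sequence(1), column=1))
# Replacing the only cysteine mimics a mutation of the catalytic cysteine.
mutant = toy_sequence(1).replace("C", "A")
print(missing_active_site_residues(mutant, column=1))
```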
While all four discussed catalytic residues are strictly conserved in the Josephin domain (Fig. 6), a functionally relevant disordered loop (E144–N164/V79–Q100)
Fig. 6. Structure-based multiple sequence alignment of the Josephin domains of ataxin-3 homologues with the crystallographically determined UCH
domains of human UCH-L3 and yeast YUH1. The known DSSP secondary structure assignments of UCH-L3 and YUH1 are shown at the top of
the alignment (curled lines for α-helix, arrows for β-strands). The corresponding consensus secondary structure predictions for human ataxin-3 and
josephin 1 are also depicted. Alignment columns with identical residues are highlighted in purple-coloured boxes, those with more than 50%
physico-chemically similar amino acids in yellow boxes (bold-printed letters). Text labels (including UCH-L3/YUH1 and ataxin-3/josephin 1
residue numbers) point to catalytic residues (four grey-shaded boxes) and to other highly conserved amino acids in the Josephin domain. The PDB/
SPTrEMBL identifiers of UCH-L3 and YUH1 are 1uch/P15374 and 1cmxA/P35127, respectively. NCBI or Ensembl accession numbers for
Josephin domain homologues are given in Table S3.
positioned over the catalytic cleft is aligned in the less conserved central part. This loop maintains an inaccessible active site, but becomes ordered upon binding of Ubal [154]. Therefore, it may control substrate specificity together with further strongly conserved amino acids such as N88/L13, which forms hydrogen bonds with main chain groups of the loop, and Y167/W120 next to the catalytic histidine [154].
Unfortunately, the structure of the central part and the loop function remain unclear for the Josephin domain because of insufficient sequence similarity to UCHs. The Josephin domain is also missing the N-terminal extensions of UCHs, which are involved in substrate recognition [154]. In addition, a functional relevance of a second strictly conserved histidine H17, two highly conserved asparagines N20 and N21, and another identical glutamine Q24 downstream of the catalytic cysteine C14 cannot be derived either from the structural model of the Josephin domain (Figs 6 and 7). However, considering their distance from the active site and location inside the protein, they may be solely important for the stability of the domain fold. This may also hold true for the strictly conserved S135 and P140 after the catalytically active N134. In contrast, it is easy to interpret an alternative splice variant of ataxin-3 [10], which consists of a deletion of the residues from E10 to Q64 including the catalytic cysteine and thus cannot possess proteolytic activity.
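The reasoning about the splice variant amounts to an interval check: any catalytic residue whose position falls inside the deleted range is lost. A minimal Python sketch follows, using the ataxin-3 positions given in the text; the function and variable names are illustrative assumptions.

```python
# Catalytic residues of ataxin-3 as numbered in the text.
ATAXIN3_CATALYTIC = {"Q9": 9, "C14": 14, "H119": 119, "N134": 134}


def residues_lost(deletion_start, deletion_end, positions):
    """Return the labelled residues whose 1-based position lies in the deleted range."""
    return {name: pos for name, pos in positions.items()
            if deletion_start <= pos <= deletion_end}


# Deletion from E10 to Q64, as described for the splice variant.
print(residues_lost(10, 64, ATAXIN3_CATALYTIC))
```

Only the catalytic cysteine lies within the deleted segment, which is already sufficient to abolish the proposed proteolytic mechanism.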
Comparison to other polyQ proteins
The polyQ stretch of both ataxin-2 and ataxin-3 lies in sequence regions whose degree of conservation is very low in contrast to the globular domains and which are predicted to be intrinsically unstructured. This prediction has been confirmed experimentally for ataxin-3 [72], and polyQ tracts themselves also adopt a random coil conformation [159]. We therefore investigated whether the polyQ regions of other polyglutamine disease proteins such as ataxin-1 (SCA1), ataxin-7 (SCA7), atrophin (DRPLA) and huntingtin (HD) are also predicted to be surrounded by disordered structure. For this purpose, we used several online prediction servers (DisEMBL, DISOPRED, GlobPlot, NORSp, PONDR), whose consensus basically indicates that the polyQ tracts are generally located in unstructured regions (Table S4). This is also in agreement with secondary structure prediction results, which do not indicate globular domains consisting of α-helices or β-strands (data not shown), and other computational predictions of locally unfolded regions [160]. Therefore, it is not surprising that mutant polyglutamine proteins can readily form aggregates via the solvent-exposed polyQ region.
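The comparison described above can be sketched in three steps: locate the polyQ tract, merge the per-residue disorder calls of several predictors by majority vote, and measure how much of the tract and its flanks is called disordered. The Python code below is a minimal sketch under explicit assumptions: the minimum tract length of 10 glutamines, the flank of 5 residues, the function names, and the toy sequence and boolean calls are all hypothetical; the code does not query or parse the output of any of the servers named in the text.

```python
import re


def find_polyq_tracts(sequence, min_length=10):
    """Return 1-based inclusive (start, end) spans of glutamine runs."""
    pattern = re.compile(r"Q{%d,}" % min_length)
    return [(m.start() + 1, m.end()) for m in pattern.finditer(sequence)]


def consensus_disorder(predictions):
    """Majority vote over per-residue boolean disorder calls of several predictors."""
    n = len(predictions)
    return [sum(calls) * 2 > n for calls in zip(*predictions)]


def fraction_disordered(span, disorder, flank=5):
    """Fraction of residues called disordered in a span plus `flank` residues on each side."""
    start = max(span[0] - 1 - flank, 0)
    end = min(span[1] + flank, len(disorder))
    window = disorder[start:end]
    return sum(window) / len(window)


# Hypothetical toy protein with a 12-residue polyQ tract.
sequence = "M" + "A" * 9 + "Q" * 12 + "S" * 9
# Hypothetical per-residue calls from three predictors (True = disordered).
calls = [
    [False] * 5 + [True] * 26,
    [False] * 8 + [True] * 23,
    [False] * 31,
]

tracts = find_polyq_tracts(sequence)
disorder = consensus_disorder(calls)
for tract in tracts:
    print(tract, round(fraction_disordered(tract, disorder), 2))
```

A fraction close to 1 for a tract and its flanks would correspond to the situation summarized in Table S4, where the polyQ regions are generally embedded in predicted unstructured sequence.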
Conclusions
We presented a detailed analysis of ataxin-2 homologues including the yeast homologue Pbp1, using a structure-based multiple sequence alignment of Sm and Sm-like proteins and a 3D model of the Lsm domain of ataxin-2. Our comparison revealed a high degree of conservation of chemical properties for RNA-binding residues in the aligned Lsm domains in general and between Sm1 from P. abyssi and human ataxin-2 in particular. Based on this observation, we propose that ataxin-2 is capable of binding RNA by the identified residues. Therefore, an essential function of ataxin-2 homologues in RNA processing should be explored experimentally and could implicate the regulation of polyadenylation of mRNA as it is known for Pbp1. In addition, the similarity of amino acids involved in the formation of Lsm domain oligomers as derived from the D1–D2 and D3–B heterodimers suggests that ataxin-2 may also form such complexes.
Our structural model of the Josephin domain of ataxin-3 confirms the evolutionary relationship with deubiquitinating cysteine proteases of the UCH family. Interestingly, this relates ataxin-3 to another ubiquitin hydrolase termed USP14, which is involved in synaptic dysfunction in ataxic
Fig. 7. 3D model of the deubiquitinating Josephin domain of ataxin-3 using the structure of yeast YUH1 bound to the ubiquitin-like inhibitor Ubal (in
CPK view mode) as template (PDB identifier 1cmx, chains A and B, respectively). Grey-shaded text labels indicate the four catalytic residues (ball-
and-stick view) forming the active site of the ubiquitin hydrolase. The remaining text boxes point to other residues, which are highly conserved in the
Josephin domain. Residues are coloured in agreement with the alignment columns in Fig. 6. The N-terminal extension of YUH1, which is missing
in ataxin-3 homologues, is depicted in the background as thin dark brown protein backbone only. The less conserved central part of ataxin-3 is
shown in green; it could not be modelled reliably using YUH1 as template because of low sequence similarity.
mice [161,162]. Moreover, the polyglutamine disease protein ataxin-1 interacts with the ubiquitin-specific protease USP7/HAUSP, and the length of the polyQ region influences the strength of the interaction [163]. Unfortunately, the central part of the Josephin domain is difficult to model because of low sequence similarity. Therefore, it cannot be deduced whether the ataxin-3 mechanism of ubiquitin recognition works similarly to UCHs.
It is striking that both human ataxin-3 and its P. falciparum homologue ataxin-3_Pf can bind the N-terminal adaptor domain of the molecular chaperone VCP at their C-termini, although they differ considerably and an evolutionary relationship is not apparent: ataxin-3 contains the polyQ region, but the P. falciparum homologue has a ubiquitin-like UBX domain. Another open question is whether a deubiquitinating analogue of ataxin-3 exists in yeast, since the Josephin domain is not found in any yeast protein, but complexes of the ataxin-3 and proteasome binding proteins hHR23A/B, VCP and UFD2a are conserved in yeast. Generally, it remains to be seen how the normal functions of ataxin-2 and ataxin-3 are affected in mutant proteins with an expanded polyglutamine tract.
Acknowledgements
Part of this research was funded by the German Research Foundation
(DFG) under contract no. LE 491/14–1, by the Federal Ministry of
Education and Research (BMBF) under contract no. 01gs0115-NV-
S02T12, and by the European Commission through the EUROSCA
project under contract no. LSHM-CT-2004-503304.
References
1. Kawaguchi, Y., Okamoto, T., Taniwaki, M., Aizawa, M.,
Inoue, M., Katayama, S., Kawakami, H., Nakamura, S.,
Nishimura, M., Akiguchi, I. et al. (1994) CAG expansions in a
novel gene for Machado-Joseph disease at chromosome 14q32.1.
Nat. Genet. 8, 221–228.
2. Pulst, S.M., Nechiporuk, A., Nechiporuk, T., Gispert, S., Chen,
X.N., Lopes-Cendes, I., Pearlman, S., Starkman, S., Orozco-
Diaz, G., Lunkes, A., DeJong, P., Rouleau, G.A., Auburger, G.,
Korenberg, J.R., Figueroa, C. & Sahba, S. (1996) Moderate
expansion of a normally biallelic trinucleotide repeat in spino-
cerebellar ataxia type 2. Nat. Genet. 14, 269–276.
3. Evert, B.O., Wullner, U. & Klockgether, T. (2000) Cell death in
162. Ehlers, M.D. (2003) Ubiquitin and synaptic dysfunction: ataxic
mice highlight new common themes in neurological disease.
Trends Neurosci. 26, 4–7.
163. Hong, S., Kim, S.J., Ka, S., Choi, I. & Kang, S. (2002) USP7, a
ubiquitin-specific protease, interacts with ataxin-1, the SCA1
gene product. Mol. Cell Neurosci. 20, 298–306.
Supplementary material
The following material is available from http://blackwellpublishing.com/products/journals/suppmat/ejb/ejb4245/ejb4245sm.htm
Appendix. Supplementary online material.