Supporting Information
Méheust et al. 10.1073/pnas.1517551113 (www.pnas.org/cgi/content/short/1517551113)

S-Gene Expression Analysis

Given that the analysis using dozens of algal and protist genomic datasets showed that homologs of all of the S genes are expressed, we asked whether some of them might be differentially expressed in response to stress. This question was motivated by two observations: (i) The composite genes entered eukaryote nuclear genomes via primary endosymbiosis, and therefore they may still retain ancient cyanobacterial functions even when fused to novel domains, and (ii) many of the S genes are redox enzymes or encode domains involved in redox regulation, and therefore their roles may involve sensing and/or responding to cellular stress resulting from the oxygen-evolving photosynthetic organelle. To address this issue, we inspected RNA-seq data that had either been generated under conditions of cellular stress or spanned the light–dark transition, taken from organisms that encode particular S genes. For genes present in Chlamydomonas reinhardtii, we used data from ref. 53 that compared triplicate transcript data from this alga grown in standard Tris-acetate-phosphate (TAP) medium (54) or TAP with the addition of 200 mM NaCl. The composite gene families 10, 24, and 18 showed significant differential expression (DE) under salt stress [up-regulation in all three cases (P = 4.14e-4, P = 0.0268, and P = 0.0077, respectively)]. These genes encode bacterial–cyanobacterial domain fusions that are involved in stress responses (i.e., rhodanese domain in 10) and in preventing protein misfolding (i.e., DnaJ/Hsp40 domain at the N terminus of 24). Interestingly, the DnaJ domain is fused to an upstream SCP superfamily region that likely forms an extracellular domain. This novel protein is found only in green algae and plants, and may play a role in responding to salt stress.

For S genes present in the diatom Phaeodactylum tricornutum, we used RNA-seq data from ref. 30 that compared cultures grown under control and nitrogen (N)-depleted conditions. This analysis showed that out of six families present in this alga (2, 1, 28, 20, 35, 61), three are differentially expressed under N stress (28, 35, 61). One of these is family 28, which encodes an N-terminal bacterium-derived, calcium-sensing EF-hand domain fused, intriguingly, to a cyanobacterium-derived region with similarity to the plastid inner-membrane protein import component Tic20, which acts as a translocon channel. This fused protein may use Ca2+ as a signal for protein import, and is found in several stramenopile (brown algal) species. The homolog in P. tricornutum (NCBI gi:219117465) is significantly down-regulated (P = 2.57e-23) under N depletion.

Finally, for S genes present in Arabidopsis thaliana, we used RNA-seq data reflecting light-dependent DE in seedlings, cotyledons, and roots (14). This analysis showed that four distinct S-gene families have DE in plant tissues in the presence of light (14, 19, 1, 18). One of the gene families showing DE is family 1, which is broadly distributed in algae and plants and is composed of a cyanobacterium-derived N-terminal hydrolase domain fused to a non-cyanobacterium-derived LPLAT family (lysophospholipid acyltransferases) domain. This gene was significantly up-regulated in all three plant tissues in the presence of light.
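As an illustration only, the P values quoted in this section can be gathered into a small table and screened against a significance cutoff. The sketch below is not part of the original analysis: the record layout, the function name significant_families, and the cutoff ALPHA = 0.05 are assumptions made for the example, because the text does not state which threshold was applied. Only the organisms, conditions, family numbers, and P values are taken from the section.

```python
# P values are copied from the text; ALPHA is an assumed cutoff, since the
# source does not state the threshold that was actually used.
from typing import NamedTuple


class Record(NamedTuple):
    organism: str
    family: int       # S-gene family number as given in the text
    condition: str    # stress condition compared against the control
    p_value: float    # P value reported in the text


REPORTED = [
    Record("Chlamydomonas reinhardtii", 10, "200 mM NaCl", 4.14e-4),
    Record("Chlamydomonas reinhardtii", 24, "200 mM NaCl", 0.0268),
    Record("Chlamydomonas reinhardtii", 18, "200 mM NaCl", 0.0077),
    Record("Phaeodactylum tricornutum", 28, "N depletion", 2.57e-23),
]

ALPHA = 0.05  # hypothetical threshold, for illustration only


def significant_families(records, alpha=ALPHA):
    """Return the family numbers whose reported P value is below alpha."""
    return [r.family for r in records if r.p_value < alpha]


if __name__ == "__main__":
    for r in REPORTED:
        flag = "DE" if r.p_value < ALPHA else "not DE"
        print(f"family {r.family:>2}  {r.organism:<26} "
              f"{r.condition:<12} P={r.p_value:.3g}  {flag}")
```

Under the assumed cutoff of 0.05 all four reported families pass. Tightening the hypothetical cutoff to 0.01 would drop family 24 (P = 0.0268), which shows how much that call depends on the threshold chosen.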
Fig. S1. Taxonomic distribution of the 67 S genes discovered in our study. The taxonomic distribution of the data is shown with black boxes indicating a presence and white boxes indicating presumed absence in the genome or transcriptome data from the taxon.
Fig. S2. Mapping data for S genes in Picochlorum and A. thaliana. (A) S-gene family 31, which encodes RIBR + DUF1768 (PyrR) and is involved in riboflavin biosynthesis. The CDS derives from the MMETSP database and encodes this 1,848-nt S gene from Picochlorum oklahomensis CCMP2329. The 6,609 RNA-seq reads mapped to it are from the closely related species Picochlorum SE3 (55). The homologous SE3 gene is encoded on Picochlorum contig 185.g609.t1. (B) S-gene family 23, which encodes a TPR repeat/RING and an ATP-dependent protease domain. This CDS also derives from the MMETSP database, and encodes the 1,554-nt S gene from P. oklahomensis CCMP2329. The 9,253 RNA-seq reads mapped to it are from Picochlorum SE3. The homologous SE3 gene is encoded on Picochlorum contig 43.g198.t1. (C) S-gene family 14, which encodes GIY–YIG superfamily and thioredoxin superfamily domains. This genic region is from A. thaliana (AtGrxS16), and the 16,906 unique reads mapped to the exons are also from this species [see Table S2 for Sequence Read Archive (SRA) run accession numbers]. Thin blue lines indicate a spliceosomal intron. The mapping in all cases is colored in green, red, and blue for forward, reverse, and paired-end reads, respectively. The fused domains and their putative annotations are shown. These data are typical for all of the Picochlorum and Arabidopsis mappings and for many of the other plant and algal S genes when sufficient RNA-seq data are available (Table S2). This unambiguous, "deep" mapping across the region that spans the domain fusion argues strongly against misassembly of these (and other) S genes.
Fig. S3. Genomic PCRs that targeted S genes identified in Picochlorum species. The contig number in the Picochlorum SE3 assembly is given for each S gene, as is the S-gene family number (see Table S2 for details). The PCR primers were complementary to regions at the 5′ and 3′ termini of the S genes to span the domain-fusion region. The sizes of these S-gene CDS fragments matched the fragment sizes resulting from PCR amplification as follows: S gene 11 (1,015 nt), S gene 23 (1,419 nt), S gene 4 (1,214 nt), S gene 34 (924 nt), and S gene 12 (1,119 nt). Sanger sequencing of these PCR fragments showed identity to the genomic region in Picochlorum SE3, and BLASTX analysis using the fragments showed that each spanned the domain-fusion region in the respective S gene. The match of the CDS size to the genomic region is explained by the paucity of spliceosomal introns in Picochlorum SE3. These data demonstrate that the tested S genes exist as intact fragments in this green alga.
Fig. S4. Maximum-likelihood (RAxML) (57) tree of species encoding S-gene family 31. This composite gene is limited to the Viridiplantae (Fig. S1) and encodes fused RIBR + DUF1768 (PyrR) domains that are involved in riboflavin biosynthesis. This manually trimmed alignment includes a selection of taxonomically diverse green lineage species and is 524 amino acids long. The intact gene (see also mapping evidence in Fig. S2) was analyzed using the LG + Γ + I evolutionary model; the results of 100 bootstrap replicates are shown at the branches when ≥50%. The topology of this tree is consistent with the expected phylogeny of Viridiplantae (e.g., 56), indicating an ancient origin of this S gene. The NCBI gi numbers are shown after each species name, when available.