www.sciencemag.org/cgi/content/full/1176620/DC1
Supporting Online Material for
Complete Resequencing of 40 Genomes Reveals Domestication Events and Genes in Silkworm (Bombyx)
Qingyou Xia, Yiran Guo, Ze Zhang, Dong Li, Zhaoling Xuan, Zhuo Li, Fangyin Dai, Yingrui Li, Daojun Cheng, Ruiqiang Li, Tingcai Cheng, Tao Jiang, Celine Becquet, Xun Xu, Chun Liu, Xingfu Zha, Wei Fan, Ying Lin, Yihong Shen, Lan Jiang, Jeffrey Jensen, Ines Hellmann, Si Tang, Ping Zhao, Hanfu Xu, Chang Yu, Guojie Zhang, Jun Li, Jianjun
Cao, Shiping Liu, Ningjia He, Yan Zhou, Hui Liu, Jing Zhao, Chen Ye, Zhouhe Du, Guoqing Pan, Aichun Zhao, Haojing Shao, Wei Zeng, Ping Wu, Chunfeng Li, Minhui Pan,
Materials and Methods SOM Text Figs. S1 to S7 Tables S1 to S10 References
Materials and Methods
Sample collection
To include the major silkworm systems kept in laboratories worldwide, we collected strains from diverse geographic regions, including China, Japan, Europe and tropical areas (mostly Southeast Asian: India, Cambodia and Laos), as well as silkworms from the mutant system. All 29 domesticated samples listed in Table S1 are from the Institute of Sericulture and Systems Biology at Southwest University, China. Two important developmental characteristics, voltinism (number of generations per year) and moltinism (number of larval molts per generation), as well as sex, were recorded for each of the 29 domesticated silkworms. Of these, 18 are monovoltine, 8 are bivoltine and the remaining 3 are polyvoltine. We also captured 11 wild silkworms from mulberry fields in China to enable comparative analysis between the domesticated and wild groups.
An advantage of the domesticated silkworm over other lepidopteran species is that many mutations (morphological, biochemical, and behavioral) and many inbred geographic strains (e.g., Chinese, Japanese, Korean, European, and tropical strains) are available and represent important resources for studying artificial selection and silkworm domestication. It is likely that farmers first moved wild silkworms from the field into houses so that they could be reared to produce silk in a predator-free environment. High silk production and ease of handling may then have evolved through artificial selection. Subsequently, the silkworms were carried by humans to different countries through commercial trade. Finally, the domesticated silkworm underwent long-term rearing and breeding by local farmers, forming geographically distinct varieties with specific characteristics (such as voltinism and moltinism) shaped by local climate. Currently, these geographic varieties are maintained in different stock centers and preserved by close inbreeding within each variety.
Library construction and sequencing
Genomic DNA was extracted from silkworm pupae and moths using a standard protocol for genomic DNA extraction. We sequenced only a single individual for each variety of both domesticated and wild silkworms. Libraries were prepared following the manufacturer’s instructions (S1). We used the workflow described in (S2) to perform cluster generation, template hybridization, isothermal amplification, linearization, blocking, denaturation, and hybridization of the sequencing primers. We then applied a base-calling pipeline (SolexaPipeline-0.3) (S1) to detect sequences from the raw fluorescent images.
Public data used
The silkworm reference genome sequence and annotation information were downloaded from the Silkworm Genome Database (S3, S4). We reconstructed silkworm chromosomes by joining genomic scaffolds with gaps of 500 bp of N’s, according to their mapping relationship (S5). Unmapped scaffolds were joined in the same way to form a chromosome UN. Because the insert size of the paired-end (PE) libraries is less than 500 bp, we used 500 bp gaps for convenience of analysis. Additional sequences containing complete CDS were retrieved from NCBI as of Feb. 8th, 2009. We then made a non-redundant annotation file by comparing those two datasets. Microarray-based gene expression data for the domesticated silkworm, whose genome is taken as the reference, were downloaded from BmMDB (S6).
Reads mapping
We used SOAP v1.09 (S7) to map raw single-end (SE) and PE reads onto the finished silkworm reference genome (S5). Reads were classified into three categories: “uniquely aligned” (those with unique alignment positions), “repeatedly aligned” (those that can be mapped to multiple genomic locations with the same least base differences; only one randomly chosen chromosome position was reported) and “unaligned” reads. The same trimming strategy as described in (S2) was applied when dealing with mismatches. To improve accuracy, PCR duplicates were removed by a PERL script that discards pairs with identical outer coordinates.
SNP calling
A four-step procedure was used to detect SNPs. (1) We used SOAPSNP (S8) to calculate the likelihood of each individual’s genotypes. (2) We integrated all the individual likelihood files to produce a pseudo-genome for each site in the total sample of 40 genomes by maximum likelihood estimation (MLE). Sites passing criteria on copy number, sequencing depth, quality score and minor allele count were kept for the subsequent rank sum test adjustment. SNPs that passed the rank sum test (S2) (P ≥ 0.005) were fixed as members of the high quality (HQ) SNP set. (3) For the domesticated strains as a whole, another pseudo-genome was made without filtering. Polymorphic positions that overlapped with HQ SNPs were retained as SNPs for the domesticated silkworms. We followed a similar process for the wild silkworms to obtain their SNP set. (4) We assigned base types back to each individual based on the genotypes of the HQ SNPs and each individual likelihood file. The genotype with the largest likelihood was chosen directly as the consensus genotype in each individual.
Short indel detection
A three-step approach was used to call indels. (1) For each individual, we conducted a second run of SOAP, allowing for gaps. Individual indel sets were obtained with a previously developed pipeline (S2). (2) For each genomic position supported by at least one individual indel set, reads from all the samples were considered together and the position had to pass filtering criteria on the number of supporting reads. The resulting indels were termed high quality indels. (3) We assigned indels back to each individual: from the high quality indel set, we picked the indel sites in each individual that had at least one supporting read.
Experimental validation of SNPs and indels
We used the Sequenom Genotyping Platform (S9) to validate HQ SNPs picked randomly according to the characteristics described in the section “SNP calling”. We genotyped 121 SNP positions across all 40 silkworms (4,840 genotyped sites in total) and confirmed that 117 of the 121 positions were polymorphic.
As a pilot phase for indel validation, we randomly selected 10 high quality indel positions and found these indels 69 times in the 40 samples. We then performed PCR-Sanger dideoxy sequencing at those sites using an AB 3730XL. After manually checking all the intensity trace files, we found that all the polymorphic positions were confirmed by the PCR-sequencing results.
Detection of structural variations (SV)
A three-step strategy was used to detect SVs in the 36 silkworms with PE sequencing reads. (1) SVs were called individually, as described (S2), and regions with at least 2 supporting abnormal read pairs were retained for the second step. (2) We treated all the PE reads from the 36 silkworms as if they came from a single individual and retained for the next step potential candidates with (a) at least 10 abnormally mapped supporting pairs and (b) at least 2 qualified individuals, each with at least 2 supporting pairs. The resulting SVs were termed high quality SVs. (3) We assigned high quality SVs back to each individual. Individual SVs with 80% of their length overlapping any high quality SV were reported.
Calculation of Linkage Disequilibrium (LD)
To measure the LD level in the silkworm population, we calculated the correlation coefficient (r2) of alleles with the software Haploview (S10), using the parameters -maxdistance 200 -dprime -minGeno 0.6 -minMAF 0.1 -hwcutoff 0.001. Curves of averaged r2 against pairwise marker distance were then plotted with R scripts.
Domestication associated site (da-SNP, da-indel) detection
Genomic polymorphic sites where at least 28 domesticated strains and at least 10 wild ones have unique reads, corresponding to a minimal concordance rate of 95%, were chosen to enter the χ2 test for domestication association. Bonferroni-corrected P value thresholds of 2.96×10^-8 and 1.87×10^-6 were then used to screen out significant da-SNPs and da-indels, respectively.
Construction of silkworm phylogeny
Individual SNPs generated after step (4) of the SNP calling section were used to calculate distances between silkworms. The p-distance between two individuals i and j is defined to be
$$D_{ij} = \frac{1}{L} \sum_{l=1}^{L} d_{ij}^{(l)},$$
where L is the length of regions where HQ SNPs can be identified, and given the alleles at position l are A/C, then
$$d_{ij}^{(l)} = \begin{cases} 0, & \text{if genotypes of the two individuals are AA and AA,} \\ 0.5, & \text{if genotypes of the two individuals are AA and AC,} \\ 0.5, & \text{if genotypes of the two individuals are AC and AC,} \\ 1, & \text{if genotypes of the two individuals are AA and CC.} \end{cases}$$
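As an illustration of the p-distance defined above, the following is a minimal Python sketch, not the authors' implementation. It assumes genotypes are given as unphased two-letter strings at SNP sites only, that monomorphic positions contribute zero, and that the region length L is passed in separately; the function names are hypothetical.

```python
def site_distance(g1, g2):
    """Per-site distance between two unphased diploid genotypes, e.g. 'AA', 'AC'."""
    a, b = sorted(g1), sorted(g2)
    hom_a, hom_b = a[0] == a[1], b[0] == b[1]
    if hom_a and hom_b:
        # AA vs AA gives 0, AA vs CC gives 1
        return 0.0 if a == b else 1.0
    # Any comparison involving a heterozygote (AA/AC or AC/AC) gives 0.5,
    # following the four cases listed in the definition above
    return 0.5


def p_distance(genotypes_i, genotypes_j, region_length):
    """Sum per-site distances over SNP sites and divide by the region length L."""
    if len(genotypes_i) != len(genotypes_j):
        raise ValueError("genotype lists must cover the same SNP sites")
    total = sum(site_distance(a, b) for a, b in zip(genotypes_i, genotypes_j))
    return total / region_length


if __name__ == "__main__":
    ind_i = ["AA", "AC", "AC", "CC"]
    ind_j = ["AA", "AA", "AC", "AA"]
    print(p_distance(ind_i, ind_j, region_length=10))
```

Computing this for every pair of individuals yields the symmetric distance matrix that a neighbor-joining program takes as input.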
A phylogenetic tree was then constructed from the distance matrix by the neighbor-joining method, using the software PHYLIP 3.68 (S11). Bootstrap values were calculated from 1,000 replicates.
PCA analysis
Following the procedure of (S12), we considered only autosomal data with n=40 individuals and ignored sites with more than two alleles or missing data (S=14,056,247 SNPs). The genotype of individual i at SNP k was coded as $d_{ik}$ = 0, 1 or 2 if individual i is homozygous for the reference allele, heterozygous, or homozygous for the non-reference allele, respectively. M is an n×S matrix containing the normalized genotypes
$$d'_{ik} = \frac{d_{ik} - E(d_k)}{\sqrt{E(d_k)\,\bigl(1 - E(d_k)/2\bigr)/2}},$$
where $E(d_k)$ is the mean of $d_k$. An n×n matrix of the sample covariance of the individuals was calculated as $X = MM^{T}/S$. The eigenvector decomposition of X was performed using the R function eigen, and the significance of the eigenvectors was determined with a Tracy-Widom test implemented in the program twstats provided with the EIGENSOFT software (S12). We obtained the latitude and longitude of the capital of each province or country of origin with the Google Earth program (for Europe we took the center defined by Google Earth). Correlations between phenotypes and eigenvectors were tested with Kendall’s τ statistic (S13).
Population structure inference
First, ped files were created as input for PLINK (S14, S15) with parameters --ped ped_file --recode12 --geno 0.5 --map output_map.
Then the program frappe (S16, S17) was used to infer population structure and ancestry information of the silkworms. The analysis was based on 13,066,429 SNP sites, and we did not assume any prior information about their ancestry. We ran 10,000 iterations and pre-defined the number of clusters, K, from 2 to 9.
Population history model
To understand the impact of the initial domestication event on observed levels of variation, we fit a simple bottleneck model to the data. The following parameters are assumed: domestication occurred 5,000 years ago, there is one generation per year, and there was a stepwise reduction in variation at the time of domestication. We estimate both the severity of the population reduction and the rate of population growth subsequent to that event.
Two criteria are used to fit the bottleneck model. First, we use the empirically observed level of reduction, determined by the observation that the domesticated strains harbor ~83% of the variation observed in the wild strains (a ratio of 0.015/0.018). Second, we fit the estimated demographic model to the observed site frequency spectrum. To fit a model to both the observed level of reduction and the frequency spectrum, we take a simulation approach. Using the program ms (S18), a grid of parameter values was simulated, with the population size reduction at the time of domestication varying from 1% to 99%, and the exponential rate of growth ranging from no increase in population size to a 1000-fold increase from the time of domestication to the present.
Identification of Genomic Regions of Selective Signals (GROSS)
A sliding window approach was applied to quantify polymorphism levels (θπ, pairwise nucleotide variation as a measure of variability) (S19), selection statistics (Tajima’s D, a measure of selection in the genome) (S20) and genetic differentiation between the domesticated and wild populations (Fst) (S21). Our analysis was performed for 5 kb windows sliding in 500 bp steps, and the SNPs for each population were those from step (3) of the “SNP calling” section. We developed a series of PERL scripts that consider genotype frequencies in the two groups and calculate θπ and Tajima’s D for both groups, and Fst between the two populations, following the formulas for those statistics (S19-21).
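The window statistics described above can be sketched in Python as follows. This is a hedged illustration using the standard formulas for θπ and Tajima's D, not the authors' PERL scripts: it assumes each SNP is summarized by a hypothetical (position, alternate-allele count) pair among n sampled chromosomes, whereas the text works from genotype frequencies, and it leaves out Fst.

```python
import math


def theta_pi(counts, n):
    """Summed pairwise nucleotide diversity over sites.

    counts: alternate-allele counts (one per SNP) among n chromosomes.
    """
    return sum(2.0 * k * (n - k) / (n * (n - 1)) for k in counts)


def tajimas_d(counts, n):
    """Tajima's D for one window; None when it is undefined."""
    seg = [k for k in counts if 0 < k < n]  # keep segregating sites only
    s = len(seg)
    if s == 0 or n < 4:
        return None
    a1 = sum(1.0 / i for i in range(1, n))
    a2 = sum(1.0 / i ** 2 for i in range(1, n))
    b1 = (n + 1) / (3.0 * (n - 1))
    b2 = 2.0 * (n * n + n + 3) / (9.0 * n * (n - 1))
    c1 = b1 - 1.0 / a1
    c2 = b2 - (n + 2) / (a1 * n) + a2 / a1 ** 2
    e1 = c1 / a1
    e2 = c2 / (a1 ** 2 + a2)
    variance = e1 * s + e2 * s * (s - 1)
    if variance <= 0:
        return None
    return (theta_pi(seg, n) - s / a1) / math.sqrt(variance)


def sliding_windows(snps, n, chrom_len, window=5000, step=500):
    """Yield (window_start, theta_pi_per_bp, tajimas_d) for 1-based positions."""
    for start in range(1, chrom_len - window + 2, step):
        end = start + window - 1
        counts = [k for pos, k in snps if start <= pos <= end]
        yield start, theta_pi(counts, n) / window, tajimas_d(counts, n)


if __name__ == "__main__":
    example_snps = [(100, 1), (2300, 5), (5600, 3)]
    for row in sliding_windows(example_snps, n=10, chrom_len=6000):
        print(row)
```

The defaults of 5,000 and 500 mirror the 5 kb windows and 500 bp steps stated in the text; an excess of rare alleles drives D negative, while an excess of intermediate-frequency alleles drives it positive.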
Then we considered the distribution of PiR (defined as the ratio of θπ,domesticated to θπ,wild) and the distribution of TDD (Tajima’s D for domesticated silkworms). We used an empirical procedure and selected windows with significantly low PiR and significantly low TDD values (Z test, P<0.005 for both; Fig. 2A) as candidates of selection signals along the genome. Neighboring windows were joined where possible, forming larger regions (GROSS).
Microarray analysis for genes in GROSS
The microarray data of these genes in GROSS came from the Bombyx mori microarray database (S6). Hierarchical clustering of the data was performed with the program Cluster (S22), and the cluster data were visualized using the program TreeView (S22).
Supporting Text
Data production
We performed whole-genome resequencing for each silkworm variety using the Illumina Genome Analyzer II (GA II) and produced 1.50 billion short reads (averaging 42 bp in length), which corresponds to 63.25 Gb of raw data. In total, we obtained a 118.1X effective depth for all 40 varieties, with an average depth of 3X for each variety (Table S2A). The mean genome coverage for domesticated and wild silkworms was 82.0% and 83.0%, respectively, and the mean gene region coverage was 91.8% and 94.2%, respectively. Mapping results for domesticated and wild strains are summarized in Table S2B. We observed a ~5% higher fraction of bases mapped for domesticated silkworms than for wild ones and a ~0.6% lower mismatch rate for the domesticated strains, both of which may be due to the high genetic divergence between the reference genome and the wild strains. However, a higher average sequencing depth for the wild strains compensates for this difference, and the resulting genome and gene region coverage are comparable between the two groups.
Variation detection
Making full use of the massive number of short reads provided by next-generation sequencing technology, the approach we took in this report can effectively cover around 80% of each individual’s genome at a depth of 3X for the 432 Mb sequence. Guided by a “pool to individual” strategy (see Materials and Methods for details), we can detect high quality SNPs. Of the identified SNPs, 3,504,749 (21.9%) were within genes (introns and exons) and 422,815 (2.64%) were in the coding sequences (CDS) (Table S3A). We estimated that the ratio of synonymous to non-synonymous changes in the CDS was 2.91:1.
We can also identify short indels (1-3 bp) as well as structural variations in a similar way. It would be difficult to confidently detect individual genomic variants at such a depth per individual unless the population-level information of 118.1X coverage was taken into account. We found that only 1,433 (0.46%) of the indels are in the CDS, and 1,014 of these would cause a frameshift, affecting 866 genes (Table S4A). For the structural variations, we found a mean length of 560 bp, and genomic deletions comprise 98.8% of them, which can be explained by the limitation of the short insert size.
Mutation
We calculated mutation rates for SNPs in different functional categories. We found that, for every functional class, the value for wild varieties is higher than for the domesticated ones (Table S3B). This observation is based on the estimate of the population mutation rate θS (S23), which corrects for sample size in the two groups (29 domesticated vs. 11 wild). Accordingly, we also noticed a higher θS value (Mann-Whitney U, P=7.69×10^-6) for indels in wild silkworms compared to domesticated ones (Table S4B). In comparison to Gallus gallus, for which this information is available (S24), silkworms have a two-fold higher level of θS at CDS, intron and genome-wide levels (Table S3B).
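To show how an estimator of this kind corrects for unequal sample sizes, here is a minimal Python sketch of Watterson's estimator (S23). It assumes that both chromosomes of each diploid individual are sampled (58 chromosomes for 29 domesticated and 22 for 11 wild silkworms), which may be optimistic at ~3X depth; the function names and the SNP counts in the usage lines are hypothetical placeholders, not values from the paper.

```python
def harmonic_a(n_chromosomes):
    """a_n = sum over i = 1 .. n-1 of 1/i, the sample-size correction term."""
    return sum(1.0 / i for i in range(1, n_chromosomes))


def watterson_theta(num_segregating_sites, n_chromosomes, region_length):
    """Per-site Watterson's theta: S / (a_n * L)."""
    return num_segregating_sites / (harmonic_a(n_chromosomes) * region_length)


if __name__ == "__main__":
    # Illustrative inputs only: the same number of SNPs in the same region
    # implies a larger theta for the smaller sample.
    print(watterson_theta(1000, 58, 100000))  # domesticated-sized sample
    print(watterson_theta(1000, 22, 100000))  # wild-sized sample
```

Because a_n grows with the number of sampled chromosomes, dividing by it makes groups of 29 and 11 individuals comparable, which raw SNP counts would not be.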
We also estimated θπ values for SNPs in B. mori and found they are 0.0061, 0.0136 and 0.0136 for CDS, intronic regions and the whole genome, respectively. θπ values in B. mandarina are 0.0070, 0.0157 and 0.0153 for these three categories, respectively. Compared with Drosophila simulans (S25), all of these values indicate a lower polymorphism level.
Linkage disequilibrium (LD) pattern
We assessed the linkage disequilibrium (LD) levels in the domesticated and wild silkworm varieties by calculating the pairwise LD measure r2 (S26, see Materials and Methods) and present curves representing LD decay with increasing genomic distance between SNP pairs (Fig. S1). We find that LD decays rapidly in silkworms, with r2 decreasing to half of its maximum at a distance of around 46 bp and 7 bp for the domesticated and wild varieties, respectively. The faster decay of LD in B. mori compared with D. melanogaster [in which LD also decreases rapidly, to half of its maximum value at several hundred bp (S27)] is likely due to a higher recombination rate of 2.97 cM/Mb (S28) in the silkworm genome as compared to 1.59 cM/Mb (S29) for the fruit fly, as well as to high effective population sizes. The relatively slower decay of LD in the domesticated strains is most likely caused by inbreeding within each strain, although population structure, reduced effective population size, and a possible increased rate of positive selection may also have contributed. These results show that association mapping combining multiple domesticated strains is possible but can be confounded by the extensive population structure and inbreeding. By contrast, association mapping based on wild individuals will be difficult due to low levels of LD.
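The LD decay curve discussed above averages r2 over SNP pairs at each distance. A simplified Python sketch follows. It assumes phased 0/1 haplotypes are available, whereas Haploview estimates haplotype frequencies from unphased genotypes, and it assumes the -maxdistance 200 setting is in kilobases; the function names are hypothetical.

```python
from collections import defaultdict


def r_squared(hap_a, hap_b):
    """r2 between two biallelic SNPs from phased 0/1 haplotype vectors."""
    n = len(hap_a)
    p_a = sum(hap_a) / n
    p_b = sum(hap_b) / n
    p_ab = sum(1 for x, y in zip(hap_a, hap_b) if x == 1 and y == 1) / n
    denom = p_a * (1 - p_a) * p_b * (1 - p_b)
    if denom == 0:
        return None  # undefined when either SNP is monomorphic
    return (p_ab - p_a * p_b) ** 2 / denom


def mean_r2_by_distance(positions, haplotypes, max_distance=200_000):
    """Average r2 for every pairwise distance (bp); positions must be sorted."""
    sums = defaultdict(float)
    counts = defaultdict(int)
    for i in range(len(positions)):
        for j in range(i + 1, len(positions)):
            dist = positions[j] - positions[i]
            if dist > max_distance:
                break  # later SNPs are even farther away
            r2 = r_squared(haplotypes[i], haplotypes[j])
            if r2 is not None:
                sums[dist] += r2
                counts[dist] += 1
    return {d: sums[d] / counts[d] for d in sums}


if __name__ == "__main__":
    pos = [1, 2, 4]
    haps = [[0, 0, 1, 1], [0, 0, 1, 1], [0, 1, 0, 1]]
    print(mean_r2_by_distance(pos, haps))
```

Plotting the returned averages against distance gives a decay curve from which the half-maximum distance can be read off.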
As sample size is an important parameter influencing LD patterns, we randomly selected 11 domesticated silkworms for this analysis to match the sample size of the wild group. For chromosome 2, we repeated the analyses for three independent sets of 11 randomly selected domesticated silkworms and found similar results.
Demography of silkworms
In the PCA analysis, the first four principal components correlate significantly with voltinism in the domesticated varieties. Moltinism (number of larval molts per generation) also correlates with eigenvectors 1 and 3. We observed a significant correlation between the latitude of the sample origins and eigenvectors 2 and 4 (Kendall’s τ, P=0.03 and 0.04, respectively) (Table S7), and no association between longitude and any of the principal components. These key traits relating to silkworm biology and yield define genetically distinct subgroups, suggesting that genetic mapping of these traits may be complicated by the general genetic differentiation between strains with different moltinism and voltinism values. Mapping studies may benefit from using varieties with large differences in the relevant moltinism and voltinism traits, but with otherwise little genetic differentiation. After fitting the demographic model (see Materials and Methods), we observed that a 90% reduction in population size in the domesticated variety could account for the observed levels of variability (Fig. S2). The surprisingly high levels of variability in the domesticated variety suggest that a large number of individuals were used in the initial domestication event. An alternative hypothesis is substantial gene flow between the wild and domesticated varieties after domestication, but the very clear differentiation between domesticated and wild varieties suggests that gene flow from the wild to the domesticated varieties may not have been strong. The distinct separation of strains does show that the genetic variation in the domestic strains has been maintained despite local inbreeding.
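The genotype normalization and covariance step behind the PCA results in this section (see Materials and Methods) can be sketched in plain Python as follows. This is an illustrative sketch under stated assumptions, not the authors' R code: genotypes are coded 0/1/2 with no missing data, monomorphic columns are left as zeros, and the function names are hypothetical. The eigendecomposition itself would be done with a linear algebra routine such as the R function eigen named in the methods.

```python
import math


def normalize_genotypes(genotypes):
    """Return the n x S matrix M of normalized genotypes.

    genotypes: list of n rows, each with S values in {0, 1, 2}.
    Each column is centred on its mean E(d_k) and divided by
    sqrt(E(d_k) * (1 - E(d_k)/2) / 2), i.e. sqrt(p * (1 - p)) with p = E(d_k)/2.
    """
    n = len(genotypes)
    s = len(genotypes[0])
    m = [[0.0] * s for _ in range(n)]
    for k in range(s):
        column = [genotypes[i][k] for i in range(n)]
        mean = sum(column) / n
        scale = math.sqrt(mean * (1 - mean / 2) / 2)
        if scale == 0:
            continue  # monomorphic site: carries no information, keep zeros
        for i in range(n):
            m[i][k] = (column[i] - mean) / scale
    return m


def sample_covariance(m):
    """X = M M^T / S, an n x n matrix over individuals."""
    n = len(m)
    s = len(m[0])
    return [[sum(m[i][k] * m[j][k] for k in range(s)) / s for j in range(n)]
            for i in range(n)]


if __name__ == "__main__":
    toy = [[0, 2], [2, 0], [1, 1]]
    for row in sample_covariance(normalize_genotypes(toy)):
        print(row)
```

The leading eigenvectors of X are the principal components whose correlations with voltinism, moltinism and latitude are reported above.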
It is commonly assumed that domestication leads to a significant reduction in variability (S30), because the domesticated species might have arisen from a geographically limited group of individuals and thus been subjected to a bottleneck in population size during domestication, and because they have been subjected to strong artificial selection subsequent to the domestication event. In many domesticated species [e.g., rice (S31) or wheat (S32)] the domesticated species contains much less variability at the nucleotide level than the corresponding wild species. We did not, however, find that these factors have been sufficiently strong in the silkworm to lead to extensive loss of genetic variability.
We also inferred population ancestry with frappe (S16) and no ancestral information was assumed before the calculation. For K=2, the results show a clear domesticated/wild split (Fig. S3). This is consistent with the phylogeny and PCA results derived from our data. When K = 3, a new component including D5, D7, D15, D16 and D24 was separated from the entire domesticated group, also consistent with the same subgroup in the phylogenetic tree. From K = 3 to 4, another sub group emerged including D17-D23, D27 and D28, which clustered together in the phylogeny. When K = 5, the two Japanese high silk production strains stand out as a new group. At K = 6 or above, additional clusters came out as outlier populations which disturb previous organization of the population structure and make little biological sense.
Details of GROSS
To determine if certain SNPs were more common in the domesticated strains, we adopted a complex trait association study methodology (S33). We treated domesticated and wild individuals as phenotypically distinct and conducted a series of association tests for each qualified SNP (Materials and Methods). In total, we found that 1,347 of the polymorphic sites were significantly different (Chi-square; P<2.96×10^-8) in their association with domesticated versus wild varieties (termed domestication-associated SNPs, or da-SNPs), and that 410 (30.4%) of these lie within 298 genes (Table S8).
Looking at the association of the indels with domesticated versus wild varieties, we found that 34 indel sites were significantly different (Chi-square; P<1.87×10^-6) between the two groups (termed domestication-associated indels, or da-indels). We found that more than 45% of all the da-SNPs are located in GROSS; this indicates that da-SNPs, which may be in the initial stages of becoming fixed in the domesticated group, are enriched in GROSS compared to the genomic background.
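A minimal Python sketch of a χ2 association test of this kind is given below. It assumes a 2×2 table of allele counts (reference vs. alternate allele in domesticated vs. wild chromosomes) and no continuity correction; the text does not specify the table layout, so this is an assumption. Likewise, the family-wise α of 0.05 used to back out the number of tests from the stated threshold is an assumption, not a value from the paper.

```python
import math


def chi_square_2x2(a, b, c, d):
    """Pearson chi-square (1 d.f.) and P value for the table [[a, b], [c, d]]."""
    n = a + b + c + d
    row1, row2, col1, col2 = a + b, c + d, a + c, b + d
    if 0 in (row1, row2, col1, col2):
        return 0.0, 1.0  # degenerate table: no evidence of association
    chi2 = n * (a * d - b * c) ** 2 / (row1 * row2 * col1 * col2)
    # Upper tail of the chi-square distribution with one degree of freedom
    p_value = math.erfc(math.sqrt(chi2 / 2.0))
    return chi2, p_value


def implied_number_of_tests(threshold, alpha=0.05):
    """Number of tests m for which alpha / m equals the given threshold."""
    return alpha / threshold


if __name__ == "__main__":
    # A site fixed for different alleles in 29 domesticated (58 chromosomes)
    # and 11 wild (22 chromosomes) silkworms
    chi2, p = chi_square_2x2(58, 0, 0, 22)
    print(chi2, p, p < 2.96e-8)
    print(implied_number_of_tests(2.96e-8))
```

Under these assumptions, a site with fully divergent alleles in the two groups passes the da-SNP threshold comfortably, while sites with similar allele frequencies do not.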
We found that 212 GROSS contain only one gene, which means that approximately 60% (212/354) of all the genes (Table S9) found to be potentially important to domestication are the only gene in their GROSS (Table S10). This indicates that most GROSS genes were probably under selection themselves and are unlikely to have experienced hitchhiking.
Genes likely important for domestication are found in GROSS
In addition to GROSS genes enriched in silk gland, we also found midgut- and testis- enriched genes. While the former is related to metabolism of carbohydrates, amino acids and lipids, which play an important role in food digestion and nutrient absorption, the latter is annotated as having binding, catalytic, and motor activity related to reproduction.
Among 32 midgut-enriched genes, nine participate in dietary protein digestion (serine protease), carbohydrate metabolism (malate dehydrogenase and pyruvate dehydrogenase), substance transport (organic cation transporter, sodium- and chloride-dependent glycine transporter 2, ATP-binding cassette transporter, and zinc transporter 5), and lipid metabolism [fatty acid binding protein (FABP) and scavenger receptor]. The malate dehydrogenase gene in B. mori shares 57% amino acid sequence similarity with its homolog in Escherichia coli, in which the mutant results in decreased activity of its encoded enzyme (S34). FABP is mainly involved in the binding and transport of unsaturated fatty acids, such as linolenic and linoleic acids, both of which are essential to the silkworm and, as in other animals such as humans (S35), can only be obtained through food uptake (mulberry leaves for the silkworms). Artificial diet-based nutrition research has confirmed that silkworms fed on food without those two unsaturated fatty acids show a 60% weight loss compared to those in the control group (S36). The identification of these genes involved in energy metabolism indicates that the energy metabolism process has been under artificial selection during silkworm domestication.
Among 54 testis-enriched genes, five genes are involved in spermatogenesis: spermidine synthase, sperm protein SSP411, t-complex-associated testis expressed 1, intersex, and shaggy. In addition, three genes are related to sperm motility: myosin class II heavy chain, outer dense fiber of sperm tails protein 2, and axonemal dynein intermediate chain inner arm i1. These results provide evidence for possible selective pressure on B. mori reproduction during the domestication process.
Additional notes
Genome-wide single base-pair level genetic variation maps have only been generated for species with small genomes, including yeast (S37), Salmonella (S38), Plasmodium falciparum (S39), and human rhinovirus (S40). For larger genomes, no comprehensive single-base resolution maps are currently available, although high-density SNP maps have been built for human (S41) and mouse (S42), and moderate-density ones for chicken (S24), dog (S43), sheep (S44), and cattle (S45). Our strategy here provides a nearly complete genome level variation map, which gives more reliable information on genetic polymorphisms in a population.
There are two sub-populations of B. mandarina: Chinese wild silkworms (from China, each with 28 chromosomes, the same as B. mori) and Japanese wild silkworms [from Japan, each with 27 chromosomes (S46)]. A common view of silkworm domestication (S47) holds that the domesticated silkworms were tamed from the Chinese wild ones. This statement is the basis of the effort presented in this paper, and mitochondrial results that took advantage of these 40 samples and public data of the Japanese wild silkworm (NCBI Accession Number: NC_003395) do provide compelling evidence for this argument (Li et al., personal communication).
B. mori is not only well adapted to human handling but also wholly dependent on humans for survival; in addition, it is well differentiated trait-wise from its wild cousin. Of equal importance, this event took place in a different geographical region (Asia vs. the Fertile Crescent) (S48) and in a distinctly different culture from the earliest known domestication events. These aspects make silkworm domestication a unique event in agricultural history, deserving the same kind of attention as the domestication of livestock and crop plants. We directly tested for selection related specifically to domestication by comparing variability in domesticated versus wild silkworms, and sorting out genomic regions with a significant difference in polymorphism density between those two groups (e.g., Fig. S7). Although others are in the pipeline, it is unprecedented to have such a source of near-relatives in this clade for comparative genome analysis, which can be aimed not only at identifying genes associated with domestication in the candidate GROSS we detected, but also at annotating and defining regulatory regions, which can complement our knowledge about functional elements in the silkworm genome.
Supporting Figures
Fig. S1. Linkage disequilibrium (LD) patterns. LD measured by r2 decays with pairwise marker distance, suggesting a bottleneck at the time of domestication. The inset shows details of this trend for the first 100 bp. The maxima of r2 for the domesticated and wild varieties, at a pairwise distance of 1 bp, are 0.829 and 0.733, respectively. When LD drops to half of its maximal level, SNP positions are on average 46 bp (r2 = 0.412, domesticated) and 7 bp (r2 = 0.348, wild) apart for the domesticated and wild varieties, respectively.
Fig. S2. A bottleneck model estimation to illustrate silkworm domestication. Simulations showed that a 90% reduction in domesticated population size could account for the maintenance of ~83% of the variation of the wild varieties.
Fig. S3. Population structure for the 40 silkworms. The number of ancestral populations, K, is set from 2 to 5 (top to bottom).
Fig. S4. WEGO result: functional annotation for genes in GROSS.
Fig. S5. A two-way hierarchical cluster analysis of the expression patterns of 159 GROSS genes in different Dazao tissues. Microarray signals for different tissue types (columns) and genes (rows) are shown, with continuous expression levels from dark green (lowest) to bright red (highest). A/MSG: anterior/middle silk gland; PSG: posterior silk gland.
Fig. S6. Comparison of the relative expression of bHLH genes in the silk gland of fifth larval-instar of the reference B. mori strain and a high silk production strain. The relative expression of bHLH genes was assessed by quantitative real-time polymerase chain reaction (qRT-PCR) analysis. BmActin gene was used as internal control and the highest relative quantities were set to 1. We found that bHLH is up-regulated four fold in the higher silk production strain compared to the reference strain on day 3 of the fifth larval instar.
Fig. S7. An example GROSS containing only one gene Sgf-1 which is important to silk production. Density of polymorphism (θπ), test statistics for selection (Tajima’s D), diversity between two populations (Fst), and genome annotation are shown (from top to bottom). Both θπ and Tajima’s D for the domesticated and wild varieties are shown in red and green, respectively.
Supporting Tables
Table S1. Silkworm samples and detailed traits. Voltinism characterizes generation per year and moltinism denotes the number of larval molts per life cycle. (*: “V1” represents monovoltine, “V2” bivoltine and “V3” polyvoltine. #: “M2” represents bimoulting, “M3” trimoulting, “M4” tetramoulting and “M5” pentamoulting.)
Sample ID | Strain name | Sex | Voltinism* and moltinism# | System or location | Other traits and comments | Latitude | Longitude
D01 | J7532 | Male | V2M4 | Japan | High silk production, hybrid strain | 35.69 | 139.69
D02 | J04-010 | Female | V1M4 | Japan | - | 35.69 | 139.69
D03 | J872 | Unknown | V2M4 | Japan | High silk production, hybrid strain | 35.69 | 139.69
D04 | J106 | Male | V2M4 | Japan | - | 35.69 | 139.69
D05 | N4 | Female | V2M4 | Japan | - | 35.69 | 139.69
D06 | Cambodia | Male | V3M4 | Cambodia | - | 11.54 | 104.90
D07 | LaoⅡ | Female | V3M4 | Laos | - | 17.97 | 102.61
D08 | India M3 | Male | V2M3 | India | - | 28.64 | 77.23
D09 | Europe18 | Female | V1M4 | Europe | - | 54.53 | 15.26
D10 | Italy16 | Female | V1M4 | Italy, Europe | - | 41.87 | 12.57
D11 | Soviet Union No.1 | Female | V1M4 | Former SU, Europe | - | 55.76 | 37.62
D12 | 15-010 | Unknown | V1M5 | Mutation | - | NA | NA
D13 | 02-210 | Female | V1M4 | Mutation | - | NA | NA
D14 | 15-001 | Male | V3M3 | Mutation | - | NA | NA
D15 | Mutation M2 | Unknown | V2M2 | Mutation | - | NA | NA
D16 | A06E | Unknown | V2M4 | Guangdong province, China | - | 23.12 | 113.26
D17 | Damao | Unknown | V1M3 | Sichuan province, China | - | 30.66 | 104.08
D18 | Ankang No.4 | Male | V1M3 | Shanxi province, China | - | 34.26 | 108.95
D19 | ZT500 | Female | V1M3 | Gansu province, China | - | 36.07 | 103.75
D20 | Zhugui | Female | V1M4 | Zhejiang province, China | - | 30.27 | 120.15
D21 | Bilian | Female | V1M4 | Jiangsu province, China | - | 32.05 | 118.77
D22 | ZT900 | Female | V1M3 | Sichuan province, China | - | 30.66 | 104.08
D23 | ZT100 | Female | V1M3 | Hunan province, China | - | 28.20 | 112.98
D24 | Sihong15 | Male | V1M4 | Jiangsu province, China | - | 32.05 | 118.77
D25 | Xiaoshiwan | Female | V1M4 | Zhejiang province, China | - | 30.27 | 120.15
D26 | C108 | Female | V2M4 | Chongqing, China | - | 29.55 | 106.55
D27 | Sichuang M3 | Female | V1M3 | Sichuan province, China | - | 30.66 | 104.08
D28 | Qiansanmian | Male | V1M3 | Guizhou province, China | - | 26.59 | 106.73
D29 | Handan | Male | V1M4 | Hebei province, China | - | 38.03 | 114.48
W01 | B. mandarina Ziyang | Unknown | Unknown | Sichuan province, China | - | 30.66 | 104.08
W02 | B. mandarina Nanchong | Unknown | Unknown | Sichuan province, China | - | 30.66 | 104.08
W03 | B. mandarina Hongya | Unknown | Unknown | Sichuan province, China | - | 30.66 | 104.08
W04 | B. mandarina Pengshan | Unknown | Unknown | Sichuan province, China | - | 30.66 | 104.08
W05 | B. mandarina Ankang | Unknown | Unknown | Shanxi province, China | - | 37.87 | 112.57
W06 | B. mandarina Yichang | Unknown | Unknown | Hubei province, China | - | 30.57 | 114.29
W07 | B. mandarina Yancheng | Unknown | Unknown | Jiangsu province, China | - | 32.05 | 118.77
W08 | B. mandarina Luzhou | Unknown | Unknown | Sichuan province, China | - | 30.66 | 104.08
W09 | B. mandarina Hunan | Unknown | Unknown | Hunan province, China | - | 28.20 | 112.98
W10 | B. mandarina Suzhou | Unknown | Unknown | Jiangsu province, China | - | 32.05 | 118.77
W11 | B. mandarina Rongchang | Unknown | Unknown | Chongqing, China | - | 29.55 | 106.55
Table S2. Data production. (A) Sequencing summary.
Table S10. Number of genes per GROSS.
Gene # per GROSS | GROSS #
1 | 212
2 | 42
3 | 9
4 | 4
5 | 3
Supporting References and Notes
S1. http://www.illumina.com/.
S2. J. Wang et al., Nature 456, 60 (2008).
S3. J. Wang et al., Nucleic Acids Res. 33, D399 (2005).
S4. Silkworm Genome Database (http://silkworm.swu.edu.cn/silkdb/ or http://silkworm.genomics.org.cn/).
S5. The International Silkworm Genome Consortium, Insect Biochem. Mol. Biol. 38, 1036 (2008).
S6. BmMDB (http://silkworm.swu.edu.cn/microarray/).
S7. R. Li, Y. Li, K. Kristiansen, J. Wang, Bioinformatics 24, 713 (2008).
S8. R. Li et al., Genome Res. 19, 1124 (2009).
S9. http://www.sequenom.com/.
S10. J. C. Barrett, B. Fry, J. Maller, M. J. Daly, Bioinformatics 21, 263 (2005).
S11. J. Felsenstein, (2005).
S12. N. Patterson, A. L. Price, D. Reich, PLoS Genet. 2, e190 (2006).
S13. M. Kendall, Biometrika 30, 81-89 (1938).
S14. S. Purcell et al., Am. J. Hum. Genet. 81, 559 (2007).
S15. http://pngu.mgh.harvard.edu/~purcell/plink/.
S16. H. Tang, J. Peng, P. Wang, N. J. Risch, Genet. Epidemiol. 28, 289 (2005).
S17. http://med.stanford.edu/tanglab/software/frappe.html.
S18. R. R. Hudson, Bioinformatics 18, 337 (2002).
S19. F. Tajima, Genetics 105, 437 (1983).
S20. F. Tajima, Genetics 123, 585 (1989).
S21. M. Nei, Molecular evolutionary genetics. (Columbia University Press, New York, 1987).
S22. http://rana.stanford.edu/software/.
S23. G. A. Watterson, Theor. Popul. Biol. 7, 256 (1975).
S24. G. K. Wong et al., Nature 432, 717 (2004).
S25. D. J. Begun et al., PLoS Biol. 5, e310 (2007).
S26. W. G. Hill, A. Robertson, Theor. Appl. Genet. 31, 881 (1968).
S27. S. J. Macdonald, T. Pastinen, A. D. Long, Genetics 171, 1741 (2005).
S28. K. Yamamoto et al., Genome Biol. 9, R21 (2008).
S29. M. Beye et al., Genome Res. 16, 1339 (2006).
S30. P. Gepts, R. Papa, Evolution during Domestication. In: ENCYCLOPEDIA OF LIFE SCIENCES. John Wiley & Sons Ltd, Chichester (2002).
S31. Q. Zhu, Mol. Biol. Evol. 24, 875 (2007).
S32. A. Haudry, Mol. Biol. Evol. 24, 1506 (2007).
S33. M. I. McCarthy et al., Nat. Rev. Genet. 9, 356 (2008).
S34. S. K. Wright, R. E. Viola, J. Biol. Chem. 276, 31151 (2001).
S35. G. K. Balendiran et al., J. Biol. Chem. 275, 27045 (2000).
S36. T. Ito, Nutrition and artificial diets of the silkworm, Bombyx mori. (Nihon-Sanshi-Shinbun Press, Tokyo, 1983).
S37. G. Liti et al., Nature 458, 337 (2009).
S38. K. E. Holt et al., Nat. Genet. 40, 987 (2008).
S39. J. Mu et al., Nat. Genet. 39, 126 (2007).
S40. A. C. Palmenberg et al., Science 324, 55 (2009).
S41. K. A. Frazer et al., Nature 449, 851 (2007).
S42. K. A. Frazer et al., Nature 448, 1050 (2007).
S43. K. Lindblad-Toh et al., Nature 438, 803 (2005).
S44. J. W. Kijas et al., PLoS ONE 4, e4668 (2009).
S45. R. A. Gibbs et al., Science 324, 528 (2009).
S46. M. R. Goldsmith, T. Shimada, H. Abe, Annu. Rev. Entomol. 50, 71 (2005).
S47. K. P. Arunkumar, M. Metta, J. Nagaraju, Mol. Phylogenet. Evol. 40, 419 (2006).
S48. C. A. Driscoll, D. W. Macdonald, S. J. O’Brien, Proc. Natl. Acad. Sci. USA 106, 9971 (2009).