Reference Genome Sequencing Of Conifers
PineRefSeq: An adaptive approach to the sequencing of large conifer genomes
Nicholas Wheeler: University of California, Davis
University of California, Davis
Children’s Hospital Oakland Research Institute
Johns Hopkins University
University of Maryland
Indiana University
Texas A&M University
Washington State University
United States Department of Agriculture
National Institute of Food and Agriculture
pinegenome.org/pinerefseq
Our Guiding Principles
EMPOWERMENT. Our goal is to develop the technologies, platforms and bioinformatics infrastructures required to rapidly and inexpensively sequence large and complex genomes of coniferous forest trees. This will allow the forestry community to begin sequencing the many genomes of economic and ecological importance without a dependence on centralized genome centers.
ADAPTIVE. We recognize that sequencing technologies are developing rapidly and that we must have the expertise and flexibility to rapidly adopt new approaches into our overall sequencing strategy.
COMPARATIVE. We recognize the power of comparative genomics approaches in assembling and annotating genome sequences and will use this approach throughout the project.
OPEN ACCESS. We have a policy of sharing all data generated from this project with the research community.
Reference Genome Sequence: For any given organism (species), the complete and ordered “assembly” of DNA, as denoted by the nucleotides A, T, C, and G.
Genome Sequencing – A Short History
A 5-year plan (FY 1991 to 1995) detailing the goals of the U.S. Human Genome Project was presented to members of congressional appropriations committees in mid-February, 1990. According to the document, "a centrally coordinated project, focused on specific objectives, is believed to be the most efficient and least expensive way" to obtain the 3-billion-bp map of the human genome. In the course of the project, especially in the early years, the plan states that "much new technology will be developed that will facilitate biomedical and a broad range of biological research, bring down the cost of many [mapping and sequencing] experiments, and find application in numerous other fields."
Human Genome News, May 1990; 2(1) Five-Year Plan Goes to Capitol Hill
James Watson Francis Collins Craig Venter
Source: Wikipedia Source: Michigan St University Source: Wikipedia
Genome Sequencing – A Short History (continued)
The rate of publication of plant genomes, updated in late 2011
Figures courtesy of the CoGePedia and Phytozome.org websites.
A phylogenetic tree of all plants with published full genomes as of May 13, 2012
Existing & Planned Angiosperm Tree Genome Sequences As of mid-2012
Species                    Common Name       Genome Size (Mbp) [1]  # of Genes [2]  Status [3]
In Progress with Draft Assemblies
Populus trichocarpa        Black Cottonwood  500                    ~40,000         2.0 / 2.2
Eucalyptus grandis         Rose Gum          691                    ~36,000         1.0 / 1.1
Malus domestica            Apple             881                    ~26,000         1.0 / 1.0
Prunus persica             Peach             227                    ~28,000         1.0 / 1.0
Citrus sinensis            Sweet Orange      319                    ~25,000         1.0 / 1.0
Carica papaya              Papaya            372                    -
Amborella trichopoda       Amborella         870                    -
Betula nana                Dwarf Birch       450                    -               1.0 / -
In Progress or Planned – No Published Assemblies
Castanea mollissima        Chinese Chestnut  800                    -
Salix purpurea             Purple Willow     327                    -
Quercus robur              Pedunculate Oak   740                    -
Populus spp. and ecotypes  Various           Various                -
Azadirachta indica         Neem              384                    -
[1] Genome size: Approximate total size, not completely assembled.
[2] Number of Genes: Approximate number of loci containing protein coding sequence.
[3] Status: Assembly / Annotation versions.
Existing and Planned Gymnosperm Tree Genome Sequences As of mid-2012
Species                Common Name     Genome Size (Mbp) [1]  # of Genes [2]  Status [3]
Gymnosperms
Picea abies            Norway Spruce   20,000                 ?               Pending
Picea glauca           White Spruce    22,000                 ?               Pending
Pinus taeda            Loblolly Pine   24,000                 ?               Pending
Pinus lambertiana      Sugar Pine      33,500                 ?               Pending
Pseudotsuga menziesii  Douglas-fir     18,700                 ?               Pending
Larix sibirica         Siberian Larch  12,030                 ?               Pending
Pinus pinaster         Maritime Pine   23,810                 ?               Pending
Pinus sylvestris       Scots Pine      23,000                 ?               Pending
[1] Genome size: Approximate total size, not completely assembled.
[2] Number of Genes: Approximate number of loci containing protein coding sequence.
[3] Status: Assembly / Annotation versions.
See http://www.phytozome.net for all publicly released tree genomes. Conifer genomes will also be posted here as they are completed.
Technological Advances Facilitate Sequence Acquisition
Figure credit: modified from Jill Wegrzyn, UC Davis
Why Do We Need a Conifer Genome Sequence?
• Fundamental Genetic Information
• Phylogenetic Representation
• Ecological Representation
• Development of Genomic Technologies
• Economic Importance
Photo credit: Nicholas Wheeler, UC Davis
Challenges to Sequencing a Conifer Genome
Image Credit: Modified from Daniel Peterson, Mississippi State University
Elements of a Conifer Genome Sequencing Project: Approaches to Resolving Challenges
Figure Credit: Nicholas Wheeler, University of California, Davis
Assembling the Reference Sequence: Based on Whole Genome Shotgun Sequencing
Figure Credit: Nicholas Wheeler, University of California, Davis
Acquiring the Sequence: Target Genome, Appropriate Tissues for DNA & RNA
Haploid: megagametophyte tissue (1N), shotgun sequenced
Diploid: needle tissue (2N), cloned into 40 kb fosmids, pooled and sequenced
Figure Credit: Nicholas Wheeler, University of California, Davis
Sequencing Strategy
Genome equivalents (sequencing depth) shown in the figure: 40X to 60X, 1X to 5X, 1X to 5X, 1X to 5X
Figure Credit: Nicholas Wheeler, University of California, Davis
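Genome equivalents translate into read counts through the usual depth relation: depth = (number of reads × read length) / genome size. The sketch below is only illustrative. It assumes a 22 Gbp genome (the predicted size listed for loblolly pine in the assembly statistics table) and a hypothetical 100 bp read length, which these slides do not specify.

```python
import math

def reads_needed(genome_size_bp: int, depth: float, read_length_bp: int) -> int:
    """Reads required to reach an average depth (genome equivalents)."""
    return math.ceil(genome_size_bp * depth / read_length_bp)

GENOME_SIZE_BP = 22_000_000_000  # assumption: ~22 Gbp predicted genome size
READ_LENGTH_BP = 100             # hypothetical read length, not from the slides

if __name__ == "__main__":
    for depth in (1, 5, 40, 60):
        n = reads_needed(GENOME_SIZE_BP, depth, READ_LENGTH_BP)
        print(f"{depth:>2}X genome equivalents -> {n:,} reads")
```

Under these assumptions, the deep shotgun library needs billions of reads, while the 1X to 5X libraries need far fewer.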
See the Broad Institute website for details on DNA preparation, library construction, and Illumina HiSeq sequencing technology.
Whole Genome Shotgun Sequencing: Millions of Short “Reads”
Figure Credit: Nicholas Wheeler, University of California, Davis
Sequencing DNA from Pools of Fosmid Clones: Fosmid Library Construction
Genomic DNA cloned into fosmid vectors represents a source of stable genomic fragments of approximately 30 to 40 kb.
Figure Credit: Modified from Maxim Koriabine, Children’s Hospital Oakland Research Institute
Sequencing DNA from Pools of Fosmid Clones: Preparing Fosmids for Sequencing
Assembly of complex genomes with a high level of repetitive DNA is facilitated by reducing the complexity of the “puzzle”.
Figure Credit: Modified from Maxim Koriabine, Children’s Hospital Oakland Research Institute
Jumping Libraries: Illumina Mate-Pair, Clone-Free
Jumping Libraries: Fosmid Di-Tag, Cloned
Figure Credit: Modified from Maxim Koriabine, Children’s Hospital Oakland Research Institute
Assembling the Reference Sequence: The Essence of Assembly
This general approach is called OLC or Overlap-Layout-Consensus.
Figure Credit: Nicholas Wheeler, University of California, Davis
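The overlap step of OLC can be illustrated with a toy example. This is a minimal sketch, not the project's assembler: it uses exact suffix-prefix overlaps and a greedy merge as a stand-in for the layout and consensus steps, whereas real OLC assemblers tolerate inexact overlaps. The function names and the example reads are hypothetical.

```python
def overlap(a: str, b: str, min_len: int = 3) -> int:
    """Length of the longest suffix of `a` equal to a prefix of `b`.

    Returns 0 when no overlap of at least `min_len` exists.
    """
    for n in range(min(len(a), len(b)), min_len - 1, -1):
        if a[-n:] == b[:n]:
            return n
    return 0

def greedy_merge(reads, min_len: int = 3):
    """Repeatedly merge the pair of reads with the longest overlap."""
    reads = list(reads)
    while len(reads) > 1:
        best_len, best_i, best_j = 0, None, None
        # All-against-all comparison: this is the quadratic cost of OLC.
        for i, a in enumerate(reads):
            for j, b in enumerate(reads):
                if i != j:
                    n = overlap(a, b, min_len)
                    if n > best_len:
                        best_len, best_i, best_j = n, i, j
        if best_len == 0:
            break  # no remaining overlaps; leave the contigs separate
        merged = reads[best_i] + reads[best_j][best_len:]
        reads = [r for k, r in enumerate(reads) if k not in (best_i, best_j)]
        reads.append(merged)
    return reads

if __name__ == "__main__":
    toy_reads = ["ATGGCGT", "GCGTACC", "TACCTGA"]  # hypothetical reads
    print(greedy_merge(toy_reads))
```

The nested loop compares every read with every other read, which is why the number of overlap computations grows quadratically with the number of reads.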
De Bruijn Graph Assembly Approach
Find all k-mers (short DNA sequences of length k) and build a graph.
• Every k-mer is a node
• Two nodes are linked with an edge if they share k-1 nucleotides (the last k-1 nucleotides of one are the first k-1 nucleotides of the other)
Figure Credit: Modified from Aleksey Zimin
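The graph construction described on this slide can be sketched in a few lines. The example follows the slide's formulation, in which every k-mer is a node; it is an illustration under that assumption, not the assembler used by the project. The function names and the toy reads are hypothetical.

```python
from collections import defaultdict

def kmers(seq: str, k: int):
    """All substrings of length k, in order of occurrence."""
    return [seq[i:i + k] for i in range(len(seq) - k + 1)]

def build_graph(reads, k: int):
    """Map each k-mer node to the nodes whose first k-1 bases match its last k-1."""
    nodes = set()
    for read in reads:
        nodes.update(kmers(read, k))
    by_prefix = defaultdict(set)
    for node in nodes:
        by_prefix[node[:k - 1]].add(node)
    return {node: sorted(by_prefix[node[1:]]) for node in sorted(nodes)}

def walk(graph, start: str) -> str:
    """Follow unambiguous edges from `start`, spelling out a contig."""
    contig, node, seen = start, start, {start}
    while len(graph[node]) == 1 and graph[node][0] not in seen:
        node = graph[node][0]
        seen.add(node)
        contig += node[-1]  # each step adds one new base
    return contig

if __name__ == "__main__":
    graph = build_graph(["ATGGC", "GGCGT"], k=3)  # hypothetical reads
    for node, successors in graph.items():
        print(node, "->", successors)
    print(walk(graph, "ATG"))
```

No pairwise read comparison is needed: overlaps between reads appear implicitly as shared k-mers in the graph.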
Comparing OLC and Graph Assembly Approaches
OLC
Benefits
• Can deal with variable length reads and reads from different sequencing platforms
• Overlaps can be long and thus more reliable
• Overlaps do not have to be exact
• Can resolve repeats of up to read size
Drawbacks
• Computationally intensive; the number of overlaps grows quadratically with the number of reads
Graph
Benefits
• Computationally efficient to find paths in the graph
• Overlaps do not have to be found; they are implicit in the de Bruijn graph
Drawbacks
• The graph is very large; approximately one node per base
• Errors in the reads create spurious branches in the graph, so error correction is required
• The maximum k-mer size is limited by the shortest read length
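The error-sensitivity drawback of the graph approach can be made concrete by counting k-mers. A single substitution in a read can introduce up to k k-mers that do not occur in the genome, and each one becomes an extra node on a spurious branch. The sequences and the value of k below are hypothetical, chosen only for illustration.

```python
def kmer_set(seq: str, k: int) -> set:
    """Set of distinct k-mers in a sequence."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}

def spurious_kmers(true_seq: str, read: str, k: int) -> set:
    """k-mers present in the read but absent from the true sequence."""
    return kmer_set(read, k) - kmer_set(true_seq, k)

if __name__ == "__main__":
    true_seq = "ATGGCGTACCTGA"   # hypothetical error-free sequence
    bad_read = "ATGGCGAACCTGA"   # same sequence with one substitution (T -> A)
    k = 5
    extra = spurious_kmers(true_seq, bad_read, k)
    print(f"{len(extra)} spurious {k}-mers: {sorted(extra)}")
```

This is why reads are usually error-corrected, or low-frequency k-mers filtered, before the graph is built.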
Assembly Strategy
Figure Credit: Nicholas Wheeler, University of California, Davis
Dense Genetic Maps Aid Assembly
[Figure: genetic linkage maps for linkage groups LG9, LG10, LG11, and LG12, each listing SNP, RFLP, and EST marker names alongside their map positions.]
Figure Credit: Courtesy of Andrew Eckert and Pedro Martinez-Garcia, University of California, Davis
Assembling the Reference Sequence: Based on Whole Genome Shotgun Sequencing, Jumping Libraries, & Genetic Maps
Figure Credit: Nicholas Wheeler, University of California, Davis
Transcriptome (RNA) sequencing defines the genes expressed in different pine tissues
Figure Credit: Modified from Keithanne Mockaitis, Indiana University
How Does the Transcriptome Inform the Reference Genome?
Figure Credit: Modified from Jill Wegrzyn, University of California, Davis
What Else Can We Learn From Transcriptomics?
• Genes and Pathways
• Gene Function
• Diagnostics
• Gene Regulation
• Proxy for other Omics
Preliminary Results of Transcriptome Reference Sequencing for Three Conifer Species
Updated summary of transcriptome assemblies from 454 (CCG, JGI) and RNASeq (FS) in Psme (Douglas-fir), Pila (sugar pine), and Pita (loblolly pine).
Library                                 N, quality filtered  Nucleotides  Transcripts Assembled  Mean Contig Length  Unique Transcripts
Pila Needles & Candles 454 (Newbler)    1,096,017            387,174,063  28,910                 955                 49,035
Pila Needle RNASeq (Trinity)                                              33,961
Psme Needles and Candles 454 (Newbler)  1,216,156            419,643,998  25,041                 961                 92,897
Psme Needle RNASeq (Trinity)                                              99,936
Pita Shoot 454 (Newbler)                874,971              205,284,775  62,342                 1,124               48,842
Pita Callus 454 (Newbler)               882,199              344,842,307  37,322
Pita Stem 454 (Newbler)                 934,760              310,498,816  43,234
Genome Annotation
Figure Credit: Modified from Jill Wegrzyn, University of California, Davis
The Functional Annotation Process
Figure Credit: Modified from Jill Wegrzyn, University of California, Davis
Gene Ontology
• Biological Process: a commonly recognized series of events. Examples: cell division, mitosis, organelle fission.
• Molecular Function: an elemental activity or task or job. Examples: protein kinase activity, insulin binding, insulin receptor activity.
• Cellular Component: where a gene product is located. Examples: mitochondrion, mitochondrial matrix, mitochondrial membrane.
Figure Credit: Modified from Jill Wegrzyn, University of California, Davis
Transcriptome Ontology
Figure Credit: Modified from Barakat et al. BMC Plant Biology 2012, 12:38 - http://www.biomedcentral.com/1471-2229/12/38
More Functional Annotation
Identifying and classifying regulatory sequences
Identifying and classifying genes that produce functional RNAs
• tRNA – Protein synthesis
• rRNA – Protein synthesis
• U snRNA – Splicing
• snoRNA – rRNA modification
• miRNA – Gene regulation
A Complete Annotation
Figure Credit: Modified from Jill Wegrzyn, University of California, Davis
Annotating Structural Features of Genome Sequences
• Repetitive sequence content & distribution
• Pseudogenes/gene families
• GC content
• Segmental duplication
• Centromere and telomere structure
Database Resource Requirements: Project Level Genome Browsers
Goal: Provide capacity to capture, archive, curate, distribute, and analyze genomic information.
TreeGenes
Genome Database for Rosaceae
Public Databases: Nucleic Acid Sequence Databases
• ENA / EBI – European Bioinformatics Institute
• DDBJ – DNA Data Bank of Japan
• NCBI / GenBank – National Center for Biotechnology Information
NCBI Entrez
NCBI Entrez – The Life Sciences Search Engine
The Pine Reference Sequence in Use
Version 0.6 Release: June 2012. Version 0.8 Release: January 2013.
[Figure: two charts of unique users by month, Jun-12 through Nov-12. Panel titles: “NCBI BLAST Against Loblolly Draft Reference” and “FTP Downloads of Loblolly Draft Reference”.]
Figure Credit: Modified from Jill Wegrzyn, University of California, Davis
Genome Assembly Statistics for Recently Sequenced Species
Columns: Year, Common Name, Scientific Name, Assembly Size (Gb), Predicted Size (Gb), N50 Contig (kb), N50 Scaffold (kb)
2011 Potato Solanum tuberosum L. 0.7 0.8 31.4 1320.0
2011 Orangutan Pongo abelii/pygmaeus 3.1 3.1 15.5 740.0
2011 Naked Mole Rat Heterocephalus glaber 2.7 19.3 1590.0
2011 Atlantic Cod Gadus morhua 0.8 2.8 690.0
2011 Coral Reef Acropora digitifera 0.4 0.4 10.7 190.0
2012 Gorilla Gorilla gorilla gorilla 2.9 11.9 914.0
2012 Oyster Crassostrea gigas 0.6 0.6 19.4 400.0
2013 Radish Raphanus sativus L. 0.4 0.5 25.0
2012 Wheat Triticum aestivum 5.5 17.0 0.6 0.6
2013 Loblolly Pine Pinus taeda 20.1 22.0 8.2 30.7
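The N50 values in this table follow the standard definition: sort the contigs (or scaffolds) from longest to shortest and report the length at which the running total first reaches half of the assembly size. A minimal sketch of that calculation, using made-up contig lengths rather than data from any of the assemblies above:

```python
def n50(lengths):
    """N50: the length L such that sequences of length >= L hold at least half the total."""
    total = sum(lengths)
    running = 0
    for length in sorted(lengths, reverse=True):
        running += length
        if running * 2 >= total:
            return length
    raise ValueError("lengths must be non-empty")

if __name__ == "__main__":
    toy_contigs_kb = [2, 3, 4, 5, 6, 7, 8, 9, 10]  # hypothetical contig lengths
    print("N50 =", n50(toy_contigs_kb), "kb")
```

A larger N50 indicates that more of the assembly sits in long, contiguous pieces, which is why contig and scaffold N50 are reported separately.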
Key Website Resources
Human Genome Sites
http://www.ornl.gov/sci/techresources/Human_Genome/project/hgp.shtml
http://www.nature.com/nature/supplements/collections/humangenome/index.html
http://www.sciencemag.org/content/300/5617/286.abstract
Resource: Broad Institute for Illumina Sequencing Technology
http://www.broadinstitute.org/scientific-community/science/platforms/genome-sequencing/broadillumina-genome-analyzer-boot-camp
Photo Credits: Slide 5
http://en.wikipedia.org/wiki/Craig_Venter
http://www.humanitiesandhealth.wordpress.com
http://en.wikipedia.org/wiki/James_D._Watson
Figure Credits: Slide 6
http://www.phytozome.net/
http://genomevolution.org/wiki/index.php/Sequenced_plant_genomes
Slide 38
http://dendrome.ucdavis.edu/treegenes/
http://www.rosaceae.org/
Slide 39
http://www.ebi.ac.uk/ena/
http://ddbj.sakura.ne.jp/
http://www.ncbi.nlm.nih.gov/genbank/
Slide 40
http://www.ncbi.nlm.nih.gov/sites/gquery
Slide 43
Download: http://loblolly.ucdavis.edu/bipod/ftp/Genome_Data/genome/pinerefseq/Pita/v0.9/
BLAST: http://dendrome.ucdavis.edu/resources/blast/
Data Release Policy: http://www.pinegenome.org/pinerefseq/Data_Use_Policy.pdf