RNA-sequencing: Taking Advantage of this Measurement Revolution
October 1, 2013
Anne Deslattes Mays
Wellstein/Riegel Laboratory
Mentor: Anton Wellstein, MD, PhD
Talk Outline
• On the Shoulders of Giants
• Timelines
• Personal Genome Project
• RNA-Sequencing
• Causality
• Messenger Therapeutics
Rosalind Franklin: “pioneered use of x-rays to create images of unorganized matter – such as large biological molecules – not just single crystals”
http://www.pbs.org/wgbh/aso/databank/entries/bofran.html
“Franklin made equipment adjustments to produce an extremely fine beam of x-rays. She extracted finer DNA fibers than ever before and arranged them in parallel bundles. Studied fibers’ reactions to humid conditions. … allowed her to discover crucial keys to DNA’s structure…. Wilkins shared this with Watson & Crick at Cambridge without her knowledge…”
Computer Architecture Advances (64-bit)
1961
IBM 7030 Stretch supercomputer: 64-bit data words, 32/64-bit instructions
1976
Cray-1 supercomputer: 64-bit word architecture
1989
Intel i860 RISC processor: “64-bit microprocessor” with a 32-bit architecture; 3D graphics unit capable of 64-bit integer operations
1991
R4000 64-bit microprocessor; SGI graphics workstations used this CPU
1992
DEC introduces the pure 64-bit Alpha architecture
1997
IBM releases RS64, a 64-bit PowerPC (partial)
1999
Intel releases the instruction set for IA-64
2003
AMD Opteron and Athlon 64 processors (AMD64, the first x86-based 64-bit processors); Apple ships the “G5” PowerPC CPU
2013
Apple announces the iPhone 5s, the first 64-bit smartphone in the world (A7, an ARMv8 system on a chip)
Computer Operating Systems (64-bit)
1985
Cray releases UNICOS, a 64-bit implementation of Unix
1993
DEC releases DEC OSF/1 AXP, a Unix-like OS later named Tru64 UNIX
1996
IRIX operating system supports 64-bit
2001
Linux is the first OS to support x86-64 (on a simulator; the chip wasn’t available yet)
2003
Mac OS X 10.3 adds 64-bit integer arithmetic support
2013
iOS 7 on AArch64 processors: 64-bit kernel supporting 64-bit applications
Celera Infrastructure Choice
1998
Brian Reid, Palo Alto IX visit
1998
Benchmarked the TIGR assembler on available architectures: SGI, Sun SPARC, IBM RISC, DEC Tru64 Alpha
1998
DEC’s Tru64 Alpha architecture won out
1998
Compaq buys DEC
George Church opens the GET Conference (April 25, 2013): http://fora.tv/2013/04/25/Harvard_Professor_George_Church_Opens_the_GET_Conference
Cancer Systems Biology: Taking advantage of the measurement revolution
Declining sequencing costs, decreasing computing costs: how do you leverage all this data?
GEO May 25, 2012
GEO June 25, 2013
Here is an example RNA-Seq workflow:
• Experimental Design
• Sample Collection
• Sequencing
• Quality Control / Read Trimming
• Transcript Identification
• Feature Discovery
• Differential Analysis
• Pathway Analysis
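The Quality Control / Read Trimming step can be illustrated with a minimal sketch, assuming Sanger/Illumina 1.8+ FASTQ files (Phred+33 quality encoding) and a simple rule that removes low-quality bases from the 3' end of each read. The threshold `min_q=20` and all function names are hypothetical choices for illustration, not the tool or settings used in this work; dedicated trimmers add sliding windows, adapter removal and more.

```python
def phred33_scores(quality):
    """Decode a Phred+33 FASTQ quality string into integer quality scores."""
    return [ord(ch) - 33 for ch in quality]


def trim_3prime(seq, quality, min_q=20):
    """Drop bases from the 3' end until one reaches min_q (hypothetical threshold)."""
    scores = phred33_scores(quality)
    end = len(scores)
    while end > 0 and scores[end - 1] < min_q:
        end -= 1
    return seq[:end], quality[:end]


def read_fastq(lines):
    """Yield (header, sequence, quality) from complete 4-line FASTQ records."""
    it = iter(lines)
    for header in it:
        seq = next(it).rstrip("\n")
        next(it)  # skip the '+' separator line
        qual = next(it).rstrip("\n")
        yield header.rstrip("\n"), seq, qual


if __name__ == "__main__":
    records = ["@read1", "ACGTACGT", "+", "IIIIII##"]  # hypothetical record
    for header, seq, qual in read_fastq(records):
        print(header, *trim_3prime(seq, qual))
```

In a real run `read_fastq` would be fed an open file handle instead of a list; trimming only from the 3' end reflects the common pattern of Illumina quality decaying along the read.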
What is unique about RNA-Seq?
• Allows you to discover and profile the entire transcriptome of any organism
• No probes or primers to design
• Novel transcripts
• Novel isoforms
• Alternative splice sites
• Rare transcripts
• cSNPs (coding SNPs) – all of this in one experiment
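Novel isoforms and alternative splice sites show up as reads that span exon–exon junctions. A minimal sketch, assuming SAM-style alignments from a spliced aligner (for example TopHat, which is used with Cufflinks), where each intron a read skips appears as an `N` operation in the CIGAR string. The function names and example reads are hypothetical; spliced aligners report junctions themselves, so this only shows where that evidence comes from.

```python
import re
from collections import Counter

# CIGAR operations as defined in the SAM specification.
CIGAR_OP = re.compile(r"(\d+)([MIDNSHP=X])")
CONSUMES_REFERENCE = set("MDN=X")


def splice_junctions(pos, cigar):
    """Return (intron_start, intron_end) pairs, 1-based inclusive, for each N op.

    pos is the 1-based leftmost mapping position (SAM column 4).
    """
    junctions = []
    ref = pos
    for length, op in CIGAR_OP.findall(cigar):
        n = int(length)
        if op == "N":
            junctions.append((ref, ref + n - 1))
        if op in CONSUMES_REFERENCE:
            ref += n
    return junctions


def count_junctions(alignments):
    """Count reads supporting each (chrom, start, end) junction."""
    counts = Counter()
    for chrom, pos, cigar in alignments:
        for start, end in splice_junctions(pos, cigar):
            counts[(chrom, start, end)] += 1
    return counts


if __name__ == "__main__":
    reads = [  # hypothetical alignments: (chrom, pos, cigar)
        ("chr1", 100, "50M1000N50M"),
        ("chr1", 120, "30M1000N70M"),
        ("chr1", 120, "30M1500N70M"),  # same intron start, different intron end
    ]
    print(count_junctions(reads))
```

Two junctions that share one intron boundary but differ at the other, as in the last read, are the kind of evidence that points to an alternative splice site.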
How much RNA-sequencing data, how much computation power, and where do you go to compute?
How much RNA-sequencing data?
1. 20 million paired-end reads ~ 2 GB of data
2. 100 million paired-end reads ~ 10 GB of data
How much computation power?
3. More memory and more processors reduce the time it takes to compute
4. If you outsource the analysis, you will still need to store the results somewhere
Where to compute:
• Amazon Web Services: S3 storage; EC2 (Elastic Compute Cloud) on-demand computational facility
• Georgetown University High Performance Computer Core: matrix.georgetown.edu
• Penn State Galaxy services
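The data volumes on this slide scale linearly: both figures work out to about 100 bytes per read pair. The sketch below reproduces that scaling and, for comparison, estimates uncompressed FASTQ size under assumed parameters (100 bp reads, 40-character headers). Those parameters and the helper names are hypothetical, not taken from the talk. The uncompressed estimate comes out well above 2 GB for 20 million pairs, which would be consistent with compressed files or shorter reads; the slide does not say which.

```python
def bytes_from_reference(n_pairs, ref_pairs=20_000_000, ref_bytes=2e9):
    """Scale linearly from the slide's reference point (20 M pairs ~ 2 GB)."""
    return n_pairs * ref_bytes / ref_pairs


def fastq_bytes_uncompressed(n_pairs, read_len=100, header_len=40):
    """Estimate uncompressed paired-end FASTQ size (assumed read and header lengths).

    Each record has four newline-terminated lines: header, sequence, '+', quality.
    """
    per_record = (header_len + 1) + (read_len + 1) + 2 + (read_len + 1)
    return n_pairs * 2 * per_record  # two mates per read pair


if __name__ == "__main__":
    for pairs in (20_000_000, 100_000_000):
        print(
            f"{pairs:,} pairs: ~{bytes_from_reference(pairs) / 1e9:.0f} GB (slide scaling), "
            f"~{fastq_bytes_uncompressed(pairs) / 1e9:.1f} GB uncompressed FASTQ estimate"
        )
```

Estimates like these help size storage (for example S3) before choosing between local cores and cloud computing.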
Galaxy is a web-based platform built to enable researchers to run their own analyses (and not just for RNA-Seq)
How to visualize mapped results?
• UCSC Genome Browser
• GBrowse (Generic Genome Browser)
• Integrated Genome Browser (IGB)
• Integrative Genomics Viewer (IGV)
These browsers share many file formats, read many of the outputs generated by the analysis programs, and let you generate your own tracks.
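One simple way to generate your own track is a BED file, a format the UCSC Genome Browser, IGB and IGV can all load. A minimal sketch, assuming 1-based inclusive input coordinates that must be converted to BED's 0-based, half-open convention; the feature names and coordinates are hypothetical placeholders, not real annotation.

```python
def to_bed_line(chrom, start_1based, end_1based, name, score=0, strand="+"):
    """Format one BED6 line; BED chromStart is 0-based and chromEnd is exclusive."""
    if start_1based < 1 or end_1based < start_1based:
        raise ValueError("expected 1-based inclusive coordinates with start <= end")
    return f"{chrom}\t{start_1based - 1}\t{end_1based}\t{name}\t{score}\t{strand}"


def bed_track(features, track_name, description):
    """Build a BED track: a UCSC-style track line followed by one line per feature."""
    lines = [f'track name="{track_name}" description="{description}"']
    lines.extend(to_bed_line(*feature) for feature in features)
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    features = [  # hypothetical features: (chrom, start, end, name, score, strand)
        ("chr1", 1001, 1500, "feature_1", 500, "+"),
        ("chr1", 2001, 2300, "feature_2", 250, "-"),
    ]
    print(bed_track(features, "my_track", "hypothetical example"), end="")
```

Saving the printed text with a `.bed` extension is enough for the browsers above to open it as a track alongside the mapped reads.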
What do RNA-Seq reads look like for GAPDH?
Repeat-masked, BLAT-aligned reads (allowing 1 or 2 mismatched bases) viewed in IGB 6.7.2
What does GAPDH look like in terms of quantitation?
Columns: Gene, TOTAL BM (RPKM, 3SEQ Counts, BLAT Reads), HPP (RPKM, 3SEQ Counts, BLAT Reads)
CD34 0.7 340 230 8 8 14
BST1 19.7 5374 31 31
CD133 0.2 173 176 16 16 33
THY1 0 7 4 4
A12 1 0
A5 0 0
ALK 0 9 24 0 0 3
B9 0 0
C1 0 0
C2 0 0
C7 0 0
E7 0 0
E9 2 0
F6 0 0
G12 0 0
GAPDH 3013.2 727831 356289 120.8 5559 2670
H3 0 0
For GAPDH (TOTAL BM : HPP), the BLAT raw read count ratio ≈ the 3SEQ count ratio ≈ 130 to 1, while the RPKM ratio ≈ 24.9
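RPKM (reads per kilobase of transcript per million mapped reads) divides a gene's read count by its length in kilobases and by the library's total mapped reads in millions, so one gene can show very different raw-count and RPKM ratios between libraries of different depth. A minimal sketch using the GAPDH values from the table; the `implied_depth_ratio` helper is hypothetical and assumes (the slide does not say) that the RPKM values were computed from the listed counts with the same gene length in both samples. In that case length cancels and the ratio of the two ratios estimates the relative library sizes.

```python
def rpkm(gene_reads, gene_length_bp, total_mapped_reads):
    """Reads per kilobase of transcript per million mapped reads."""
    return gene_reads * 1e9 / (gene_length_bp * total_mapped_reads)


def implied_depth_ratio(count_ratio, rpkm_ratio):
    """Library-size ratio (sample A / sample B) implied by count and RPKM ratios.

    For one gene of fixed length: rpkm_ratio = count_ratio * (depth_B / depth_A).
    """
    return count_ratio / rpkm_ratio


# GAPDH values copied from the table: TOTAL BM vs HPP.
total_bm = {"RPKM": 3013.2, "3SEQ": 727831, "BLAT": 356289}
hpp = {"RPKM": 120.8, "3SEQ": 5559, "BLAT": 2670}
ratios = {key: total_bm[key] / hpp[key] for key in total_bm}

if __name__ == "__main__":
    for key, value in ratios.items():
        print(f"{key} ratio (TOTAL BM : HPP) = {value:.1f} : 1")
    depth = implied_depth_ratio(ratios["BLAT"], ratios["RPKM"])
    print(f"implied library-size ratio (BLAT-based assumption): {depth:.1f}")
```

Under that assumption, the gap between the ~130:1 count ratio and the RPKM ratio would reflect the TOTAL BM library being several times deeper than the HPP library.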
Given a list of differentially expressed genes, perform enrichment analysis
• Enrichment analysis lets researchers leverage documented experiments that provide evidence for genes’ roles in pathways and functions, helping them determine the results and significance of their own experiments
• DAVID
  – Gene ontology
  – Functional ontology
• REVIGO
  – Output of DAVID may be passed to REVIGO for further interpretation and statistical exploration of the significance of discovered gene sets
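Enrichment tools ask whether a gene set (for example, a GO term) is over-represented in the differentially expressed list relative to a background. A minimal sketch of the classic one-sided hypergeometric (Fisher's exact) test plus fold enrichment, using only the standard library. DAVID reports a modified Fisher exact p-value (its EASE score) along with multiple-testing corrections, so this illustrates the idea rather than DAVID's implementation; the example numbers are hypothetical.

```python
from math import comb


def enrichment_p_value(N, K, n, k):
    """One-sided hypergeometric p-value P(X >= k).

    N: genes in the background, K: background genes annotated to the term,
    n: genes in the differentially expressed list, k: list genes annotated to the term.
    """
    upper = min(n, K)
    hits = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, upper + 1))
    return hits / comb(N, n)


def fold_enrichment(N, K, n, k):
    """Fraction of annotated genes in the list divided by the background fraction."""
    return (k / n) / (K / N)


if __name__ == "__main__":
    # Hypothetical numbers: 20,000 background genes, 200 annotated to a term,
    # 500 differentially expressed genes, 15 of them annotated to the term.
    N, K, n, k = 20_000, 200, 500, 15
    print(f"fold enrichment = {fold_enrichment(N, K, n, k):.1f}")
    print(f"p = {enrichment_p_value(N, K, n, k):.2e}")
```

Because hundreds of terms are usually tested at once, the raw p-values need a multiple-testing correction before a term is called significant.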
Using differentially expressed genes, explore biological pathways
• Differentially expressed genes are loaded into programs such as Pathway Studio or Ingenuity Pathway Analysis (IPA)
• Shortest-path programs and canonical pathway analysis enable a researcher to reverse-engineer the pathways expressed in a healthy response versus a diseased response
• Ideally, a pathway explains the observed phenotype, connecting genotype to gene expression program to phenotype
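Shortest-path analysis treats known interactions as a graph and asks for the fewest steps linking two genes of interest. A minimal sketch, assuming an unweighted directed graph stored as an adjacency dictionary and searched breadth-first; the gene names and edges are hypothetical placeholders, and tools such as Pathway Studio and IPA rely on their own curated knowledge bases and algorithms.

```python
from collections import deque


def shortest_path(graph, source, target):
    """Breadth-first search; return one shortest path as a node list, or None."""
    previous = {source: None}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        if node == target:
            path = []
            while node is not None:
                path.append(node)
                node = previous[node]
            return path[::-1]
        for neighbor in graph.get(node, ()):
            if neighbor not in previous:
                previous[neighbor] = node
                queue.append(neighbor)
    return None


if __name__ == "__main__":
    # Hypothetical interaction graph: edges point from regulator to target.
    graph = {
        "geneA": ["geneB", "geneC"],
        "geneB": ["geneD"],
        "geneC": ["geneD", "geneE"],
        "geneD": ["geneF"],
        "geneE": ["geneF"],
    }
    print(shortest_path(graph, "geneA", "geneF"))
```

Breadth-first search is enough for unweighted edges; if interactions carried confidence weights, Dijkstra's algorithm would be the natural substitute.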
RNA-Sequencing: What is it good for?
• Transcript annotation
  – Mutation identification
  – Isoform determination
  – Alternative splice variation
• Differential gene expression
  – Phenotypically segregating experiments
  – Lets us get at the “how” by examining the response of a particular cell population within an organism to events
  – Good, careful design lets us unfold the dynamics of this response and identify targets for altering disease responses, improving one’s chances of survival
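At its simplest, differential expression compares a gene's depth-normalized abundance between two conditions. A minimal sketch, assuming raw read counts per gene for one sample per condition, normalized to counts per million (CPM) and summarized as a log2 fold change with a pseudocount; the gene names and counts are hypothetical. Fold change alone is not a statistical test: packages such as DESeq and limma (listed under Resources) model variability across biological replicates, which is part of the careful design this slide calls for.

```python
import math


def counts_per_million(counts):
    """Scale raw counts so that each sample sums to one million."""
    total = sum(counts.values())
    return {gene: c * 1e6 / total for gene, c in counts.items()}


def log2_fold_changes(control, treated, pseudocount=1.0):
    """log2((treated CPM + pseudocount) / (control CPM + pseudocount)) per shared gene."""
    a = counts_per_million(control)
    b = counts_per_million(treated)
    return {
        gene: math.log2((b[gene] + pseudocount) / (a[gene] + pseudocount))
        for gene in a
        if gene in b
    }


if __name__ == "__main__":
    control = {"geneA": 10, "geneB": 990}  # hypothetical counts
    treated = {"geneA": 40, "geneB": 960}
    for gene, lfc in log2_fold_changes(control, treated).items():
        print(f"{gene}: log2FC = {lfc:+.2f}")
```

The pseudocount keeps genes with zero counts from producing infinite fold changes, at the cost of shrinking fold changes for very lowly expressed genes.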
Thank You
Dr. Anton Wellstein
Dr. Anna Riegel
Dr. Marcel Schmidt
Dr. Elena Tassi
The entire lab: Elena, Virginie, Ghada, Ivana, Eveline, Khalid, Eric, the entire Wellstein/Riegel laboratory
My Committee:
Dr. Yuri Gusev
Dr. Anatoly Dritschilo
Dr. Michael Johnson
Dr. Christopher Loffredo
Dr. Habtom Ressom
Dr. Terry Ryan (external committee member)
High Performance Core Group, Steve Moore, especially Woonki Chung
Amazon Cloud Services
Dr. Ann Loraine, UNC, IGB developer
Brian Haas, author of the Trinity suite
Keygene
Some Resources
• http://personalgenome.org
• http://rnaseq.uoregon.edu/index.html
• http://dx.doi.org/10.1038/npre.2010.4282.1 (DESeq)
• http://galaxy.psu.edu/
• http://seqanswers.com/
• http://www.broadinstitute.org/igv/
• http://bioviz.org/igb/index.html
• http://www.illumina.com
• http://www.otogenetics.com
• http://www.dnanexus.com
• http://bioconductor.org/packages/2.12/bioc/html/limma.html
• http://trinityrnaseq.sourceforge.net/
• http://trinityrnaseq.sourceforge.net/genome_guided_trinity.html
• http://cufflinks.cbcb.umd.edu/
• http://brb.nci.nih.gov/BRB-ArrayTools.html
• http://www.modernatx.com/
Systems Biology History (Wikipedia)
• Systems biology has roots in
  – Quantitative modeling of enzyme kinetics
  – Mathematical modeling of population growth
  – Simulations to study neurophysiology
  – Control theory and cybernetics
• Theorists
  – Ludwig von Bertalanffy – general systems theory
  – Alan Lloyd Hodgkin and Andrew Fielding Huxley – constructed a mathematical model that explained the action potential propagating along the axon of a neuron
  – Denis Noble – first computer model of the heart pacemaker
Scientific knowledge is limited (and advanced) by the limits (and advancements) of measurement
• Ilya Shmulevich, Genomic Signal Processing: “Validity of the model involves observation and measurement, scientific knowledge is limited by the limits of measurement”
• Erwin Schrödinger, Science, Theory and Man: “It really is the ultimate purpose of all schemes and models to serve as scaffolding for any observations that are at all means observable”