University of IowaIowa Research Online
Theses and Dissertations
2013
Mechanisms Of MicroRNA evolution, regulation and function: computational insight, biological evaluation and practical application
Ryan Michael Spengler
University of Iowa
Copyright 2013 Ryan Spengler
This dissertation is available at Iowa Research Online: http://ir.uiowa.edu/etd/2636
Follow this and additional works at: http://ir.uiowa.edu/etd
Part of the Cell Biology Commons
Recommended Citation
Spengler, Ryan Michael. "Mechanisms Of MicroRNA evolution, regulation and function: computational insight, biological evaluation and practical application." PhD (Doctor of Philosophy) thesis, University of Iowa, 2013. http://ir.uiowa.edu/etd/2636.
MECHANISMS OF MICRORNA EVOLUTION, REGULATION AND FUNCTION:
COMPUTATIONAL INSIGHT, BIOLOGICAL EVALUATION
AND PRACTICAL APPLICATION
by
Ryan Michael Spengler
A thesis submitted in partial fulfillment of the requirements for the Doctor of
Philosophy degree in Molecular and Cellular Biology in the Graduate College of
The University of Iowa
May 2013
Thesis Supervisor: Professor Beverly L. Davidson
Graduate College The University of Iowa
Iowa City, Iowa
CERTIFICATE OF APPROVAL
_______________________
PH.D. THESIS
_______________
This is to certify that the Ph.D. thesis of
Ryan Michael Spengler
has been approved by the Examining Committee for the thesis requirement for the Doctor of Philosophy degree in Molecular and Cellular Biology at the May 2013 graduation.
Thesis Committee: ___________________________________ Beverly L. Davidson, Thesis Supervisor
___________________________________ Adam Dupuy
___________________________________ John Logsdon
___________________________________ Andrew Russo
___________________________________ Yi Xing
ACKNOWLEDGMENTS
First of all, I would like to thank my mentor, Dr. Bev Davidson, for her guidance,
motivation and most of all, patience. I also must acknowledge all the members of the
Davidson Lab, both past and present, who have been an invaluable source of knowledge,
discussion and guidance over the years. In particular, I would like to thank Ryan
Boudreau and Alex Mas Monteys with whom I have closely collaborated and who are
responsible for some of the work presented in this manuscript.
I also thank Dr. Anton McCaffrey, my first research mentor, who took the time to
train me in the basics of molecular biology techniques. He encouraged me to think
outside the box, and guided me as I learned to interpret data and manage my own
research projects.
I owe special thanks to the entire faculty of the Biology Department at Augustana
College, who first taught me to think about science and encouraged me to
explore my own interests. Dr. Kristin Douglas and Dr. Dara Wegman-Geedey were
particularly amazing mentors who guided me in my own research. Dr. Douglas deserves
a special acknowledgement as she first introduced me to microRNAs, which sparked my
interest in the subject and led me to follow that interest in my graduate research.
Finally, words cannot truly describe my appreciation for the love and support my
family has given me over the years. Most of all, my wife, Erin, has been a vital source of
encouragement and the fact that I am writing this manuscript is in large part due to her
always being there for me.
Thank you all.
ABSTRACT
MicroRNAs (miRNAs) are an abundant and diverse class of small, non-protein
coding RNAs that guide the post-transcriptional repression of messenger RNA (mRNA)
targets in a sequence-specific manner. Hundreds, if not thousands, of distinct miRNA
sequences have been described, each of which has the potential to regulate a large number of
mRNAs. Over the last decade, miRNAs have been ascribed roles in nearly all biological
processes in which they have been tested. More recently, interest has grown in understanding
how individual miRNAs evolved, and how they are regulated. In this work, we demonstrate
that Transposable Elements are a source for novel miRNA genes and miRNA target sites. We
find that primate-specific miRNA binding sites were gained through the transposition of Alu
elements. We also find that remnants of Mammalian Interspersed Repeat transposition, which
occurred early in mammalian evolution, provide highly conserved functional miRNA binding
sites in the human genome. We also provide data supporting the idea that long non-coding RNAs
(lncRNAs) can provide a novel miRNA binding substrate which, rather than serving as a target
for miRNA-mediated repression, inhibits the miRNA itself. As such, lncRNAs are proposed to
function as endogenous miRNA “sponges,” competing for miRNA binding and reducing
miRNA-mediated repression of protein-coding mRNA targets. We also explored how A-to-I
editing of the 3’UTRs of mRNA targets can dynamically alter miRNA binding sites.
These studies, together with knowledge gained from the regulatory activity of
endogenous and exogenously added miRNAs, provided a platform for algorithm
development that can be used in the rational design of artificial RNAi triggers with
improved target specificity. The cumulative results from our studies identify and in some
cases clarify important mechanisms for the emergence of miRNAs and miRNA binding sites
on large (over eons) and small (developmental) time scales, and help in translating these gene
silencing processes into practical application.
TABLE OF CONTENTS
LIST OF TABLES
LIST OF FIGURES
LIST OF ABBREVIATIONS
CHAPTER
1. INTRODUCTION
miRNAs: biogenesis
miRNAs: mechanism of action
miRNAs: transcriptional and co-transcriptional control
miRNAs: post-transcriptional control
miRNAs: changing the mature miRNA sequence modulates target profiles
miRNAs: changing mRNA sequence modulates target profiles
Long noncoding RNAs
Exogenous RNAi
Exogenous RNAi: implementation and design
Objectives
Summary
Published work
Data retrieval and parsing
3’UTR analyses
3’UTR TEs
miRNA target prediction and TE annotation
TargetScan MRE predictions
Local position to global coordinate conversion
Intersection of MRE and TE genomic coordinates
Alu-MRE positional enrichment relative to Alu consensus
Generating unique MRE coordinates
MiRNAs have predicted binding sites in 3’UTR-resident TE sequences
Let-7 directly regulates genes through conserved, MIR-element-derived target sites
miRNAs with high Alu-MRE frequency target specific regions in the Alu
miR-24 directly regulates transcripts through Alu-derived target sites
Proliferation of Alu and B1 SINEs resulted in the convergent acquisition of miRNA targets in their respective primate and murine lineages
Potentially-active Alu loci contain miRNA binding motifs
MiRNAs are processed from TE sequences and regulate target genes containing homologous elements
Functional validation of Alu-derived miRNAs
3. LONG INTERGENIC NON-CODING RNAS ARE A POTENTIAL SOURCE OF ENDOGENOUS MICRORNA “SPONGES”
Abstract
Introduction
Methods
Data sources
Prediction and analysis of MRE content in lncRNAs
RNA isolation and RT-PCR
Ago immunoprecipitation
Results
Abundant MRE content is evident in many mouse lncRNAs
Expression pattern of lncRNA, PSMI16
Adult mouse brain
Developing mouse at e14.5
Other adult mouse tissues and cell lines
PSMI16 associates with Ago2
“Modular” exon structure and differential MRE inclusion in PSMI16 alternative isoforms
4. SISPOTR: A TOOL FOR DESIGNING HIGHLY SPECIFIC AND POTENT SIRNAS FOR HUMAN AND MOUSE
Abstract
Introduction
Methods
Dataset and Sequence Retrieval
Formulating POTS
Dataset selection
Establishing weighted probability of repression (PR) values and POTS calculation
Validation of siSPOTR algorithm: efficacy and specificity
Efficacy
Off-targeting potential
Comparison of siSPOTR to other algorithms
Prospective applications to expressed RNAi and genome-wide RNAi libraries
SiSPOTR Online Tool
Discussion
Consideration of Seed Pairing Stability
The Utility of siSPOTR
5. FINAL DISCUSSION
Competitive Endogenous RNAs
Off-targeting and RNAi design
Emerging technologies in the study of miRNA biology
ADENOSINE DEAMINATION IN HUMAN TRANSCRIPTS GENERATES NOVEL MICRORNA BINDING SITES
Abstract
Introduction
Results
Adenosine deamination creates miRNA complementarities
MiR-513 and miR-769-3p/-450b-3p specifically target deamination sites
MiR-513 and miR-769-3p repress deaminated sequences
MiR-769-3p represses DFFA expression specifically in cells that deaminate the DFFA 3’ UTR
Discussion
Materials and Methods
Informatics evaluation of ADAR deamination sites
Vector construction
Luciferase assays
Western blotting
Table 2-1. miRNAs have predicted MREs in potentially-active Alus in the human genome
Table 3-1. Putative lncRNA “sponges” and MRE frequency for conserved miRNAs
Table 4-1. Comparison of siRNA design tools
Table 4-2. The effect of seed position 8 on off-targeting potential by site frequency
Table 4-3. The effect of seed position 8 on off-targeting potential by POTS
Table A-2. A-to-I editing occurs predominantly in noncoding regions of expressed sequences
Figure 1-2. Anatomy of lncRNA loci (Adapted from Rinn & Chang, 2012)
Figure 2-1. TE family composition of putative TE-MREs in human 3’UTRs.
Figure 2-2. TE-MRE composition and unbiased gene function analysis reveal strong functional connections between let-7 and MIR-derived MREs.
Figure 2-3. Genome browser views for let-7 MIR-derived MREs in (A) MYO1F and (B) E2F6.
Figure 2-5. TE-MRE compositions for (A) miR-24-3p and (B) miR-122 show a prominent Alu fraction.
Figure 2-6. Most frequent Alu-MRE sequences map to distinct positions relative to the Alu consensus.
Figure 2-7. Alu-derived MREs respond to miR-24 overexpression.
Figure 2-8. Microarray datasets measuring response to miRNA overexpression to assess functional response of Alu-derived targets on a global scale.
Figure 2-9. The fraction of down-regulated genes with Alu-derived MREs is in proportion to their overall prevalence.
Figure 2-10. Functional miR-24 MREs are independently created in rodent and primate clades due to lineage-specific but homologous TE families.
Figure 2-11. miR-28 is derived from a LINE2c retrotransposon, is highly conserved and regulates transcripts with LINE2-embedded MRE sequences.
Figure 2-12. Alu-derived miR-1285-1 is effectively processed and mediates knockdown of genes with Alu-MREs.
Figure 2-13. Pol III intronic promoters drive intronic miRNA expression.
Figure 3-14. Proposed mechanism for microRNA competitive inhibition by endogenous long non-coding RNA “sponges”.
Figure 3-15. Distribution of MRE frequency in predicted miRNA/lncRNA pairs.
Figure 3-16. PSMI16 (NR_015505) in situ hybridization reveals strong regional expression in adult mouse brain.
Figure 3-17. Strong regional expression of PSMI16 is seen in the developing mouse (14.5 DPC) by in situ hybridization.
Figure 3-18. PSMI16 expression by RT-PCR in (A) adult mouse tissues and (B) cell lines.
Figure 3-19. PSMI16 associates with Ago proteins in mouse neural progenitor cells.
Figure 3-20. Differential MRE incorporation in alternative PSMI16 isoforms.
Figure 4-1. Diagram of on- and off-target silencing by siRNAs.
Figure 4-2. Effect of siRNA off-targeting potential on gene silencing capacity.
Figure 4-3. Formulation and distribution of POTS (potential off-targeting score).
Figure 4-4. Correlation of POTS ranks across tissues.
Figure 4-5. Workflow schematic for designing siRNAs targeting human PPIB using the siSPOTR algorithm.
Figure 4-6. Validation of siSPOTR: efficacy and off-targeting.
Figure 4-7. Spearman rank correlation of final POTS values.
Figure 4-8. Effect of POTS on off-targeting from hairpin-based RNAi expression vectors.
Figure 4-9. Comparison of off-targeting potentials among shRNA libraries.
Figure A-3. miR-513 and miR-769-3p target MAIDs but not the corresponding unedited sequence.
Figure A-4. Endogenous MAIDs are targets for miR-513 and miR-769-3p repression.
Figure 1-2. Anatomy of lncRNA loci. (Adapted from Rinn & Chang, 2012)
Because the functions of most lncRNAs remain largely unknown, lncRNAs are often classified according to their location and orientation relative to nearby protein-coding genes. Gene-proximal lncRNAs often act in cis, regulating expression of the neighboring protein-coding gene. Antisense transcripts initiate transcription within or 3’ of the protein-coding gene and are transcribed in the opposite direction
CHAPTER 2
TRANSPOSABLE ELEMENTS CREATE FUNCTIONAL
MICRORNAS AND MICRORNA TARGET SITES
Abstract
Transposable Elements (TEs) account for nearly one-half of the sequence content
in the human genome. De novo germline transposition into regulatory or coding
sequences of protein-coding genes causes several heritable disorders. Nevertheless, TEs are
prevalent in and around protein-coding genes, prompting inquiry into their possible regulatory
functions. Computational studies revealed miRNA genes and miRNA Recognition
Elements (MREs) residing within TE sequences, but little experimental evidence supports a
functional role for these sequences. In this work, I functionally validate miRNAs and MREs derived
from the most prevalent TE families, including evolutionarily ancient LINE2 and MIR
retrotransposons as well as primate-specific Alu elements.
Introduction
Transposable Elements (TEs or transposons) mobilize and reintegrate within a
host organism’s genome. Different TE classes have diverse structural features,
transposition mechanisms and evolutionary origins. Some elements mobilize via "copy
and paste" mechanisms and others through "cut and paste." Retrotransposons (Type I)
replicate through an RNA intermediate that serves as a template for RNA-dependent DNA
polymerase (a.k.a. reverse transcriptase) activity; the resulting DNA copy is then integrated
into the host genome. Analogous mechanisms are essential in the life cycle of infectious
retroviruses like Human Immunodeficiency Virus (HIV) and Human T-Cell Leukemia
Virus (HTLV). Non-infectious Endogenous Retroviruses (ERVs) are predominant
members of the Long Terminal Repeat (LTR)-containing subclass of retrotransposons.
Non-LTR-containing retrotransposons, including Long and Short Interspersed Nuclear
Elements (LINEs and SINEs, respectively), are the most abundant TE class in the human
genome, accounting for more than 30% of the total DNA content. Additionally, non-LTR
LINE1, Alu and SVA elements are the only TE families that remain active in humans. DNA (Type
II) transposons encode transposases that excise the element and mediate its reintegration elsewhere.
Although no DNA elements are active in the human genome at present, evolutionary
analysis of human DNA element sequences, accounting for 3% of the total genome
content, revealed that primate genomes had abundant DNA transposition until ~37
million years ago (MYA). Together, TEs mobilizing through both mechanisms have
modified the human genome as well as the genomes of most organisms across all
domains of life, and some continue to do so.
Irrespective of the mechanism, transposition of "active" elements is potentially
mutagenic, as TE excision (Type II) or integration (Type I and II) can directly disrupt the
sequence or expression of protein-coding genes. De novo germline insertions of active
Alu, LINE1 and SVA elements have been implicated in more than 60 human diseases,
including β-thalassemia, hemophilia and cystic fibrosis. Additionally, high copy numbers
of Alu and LINE1 elements, both active and inactive, can promote somatic genome
instability and contribute to cancer.
In spite of their mutagenic potential, TEs are often the predominant contributor to a
genome's sequence content. In fact, ~80% of the 17 gigabase-pair (Gb) genome of bread
wheat (Triticum aestivum), which is more than 4.5 times the size of the human genome,
is TE-derived. Conservative estimates place the TE content of the human genome at ~45%.
More recent estimates using an improved TE prediction algorithm suggest that this value
is closer to 65-70%.
Plants and animals have evolved mechanisms, including RNA interference (RNAi), that
protect against potentially deleterious transposition events. In the mammalian
germline, where heritable mutations can accumulate, PIWI-interacting RNAs (piRNAs)
and some endogenous siRNAs (endo-siRNAs) are loaded into Argonaute-family proteins
(PIWI and Ago2, respectively) and guide silencing complexes to complementary TE
sequences. Intriguingly, computational analyses from our lab and others have reported that
miRNAs can be processed from TE-derived genomic loci (Borchert et al, 2006;
gamma (EIF2S3), Mitogen-Activated Protein 3-Kinase 9 (MAP3K9)), were cloned into a
luciferase reporter for functional validation. One let-7a MIR-derived target gene,
MFSD4, also had an Alu-derived miR-24 site and was tested as well. Dose-dependent
luciferase reduction was observed in response to a miR-24 mimic for EIF2S3 and
MAP3K9, as well as for the artificial miR-24 target control, miR-24_2xT (Figure 2-7,
bottom). At the doses used (1 and 10 nM), no significant knockdown was observed for the
other constructs tested or for the psiCHECK no-target control (Figure 2-7 and data not
shown). At higher concentrations, miR-24 caused non-specific changes in both firefly and
Renilla luciferase in the negative control reporter, preventing accurate interpretation of
the 3’UTR reporter data at these doses (data not shown). Blocking endogenous
miR-24 with antisense Anti-miR™ inhibitors produced a dose-dependent increase in
luciferase activity only for the artificial target positive control, consistent with the low
validation rate seen in the overexpression experiments (not shown). If the chosen
candidate genes are representative of miR-24 Alu-derived targets, then only a small
fraction of Alu-derived sites confer miR-24 responsiveness.
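The reporter results above depend on how luciferase readings are normalized, which the text does not spell out. The following is only a minimal Python sketch of one common double normalization, assuming a psiCHECK-style layout in which the 3’UTR is cloned downstream of Renilla luciferase and firefly luciferase serves as the internal control. The function names and the replicate layout are hypothetical.

```python
from statistics import mean

# Hypothetical sketch: assumes a psiCHECK-style reporter in which the cloned
# 3'UTR sits downstream of Renilla luciferase and firefly luciferase is the
# internal control. Each reading is a (renilla, firefly) pair of replicate lists.


def normalized_ratio(renilla, firefly):
    """Mean Renilla/firefly ratio across replicate wells."""
    if len(renilla) != len(firefly) or not renilla:
        raise ValueError("need matching, non-empty replicate lists")
    return mean(r / f for r, f in zip(renilla, firefly))


def relative_activity(utr_mimic, utr_ctrl, empty_mimic, empty_ctrl):
    """Fold change of the 3'UTR reporter (mimic vs. control treatment),
    divided by the same fold change for the empty (no-target) reporter.

    Values well below 1.0 suggest target-specific repression.
    """
    utr_fold = normalized_ratio(*utr_mimic) / normalized_ratio(*utr_ctrl)
    empty_fold = normalized_ratio(*empty_mimic) / normalized_ratio(*empty_ctrl)
    return utr_fold / empty_fold
```

Dividing by the empty-reporter fold change removes effects of the mimic that are not specific to the cloned 3’UTR. It cannot rescue doses at which the mimic distorts both luciferases in the control reporter, which is why such doses are uninterpretable.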
However, because miR-24 predominantly targets a specific region within the Alu
sequence, its behavior may not reflect the general capacity of Alu sequences to support
miRNA-mediated regulation. To test this on a global scale, and to determine whether the
responses vary among miRNAs, I analyzed a set of publicly available microarray
datasets that measured changes in mRNA levels in response to overexpression of
various miRNAs. Genes were annotated according to whether or not they contained
i) a 3’UTR Alu, ii) an Alu-derived target site, or iii) a canonical (non-TE-derived)
target site for the miRNA in question. Comparison of cumulative distribution functions
revealed significant repression of the Alu-containing groups relative to genes lacking
3’UTR-resident Alus and target sites, though not to the degree seen in genes with
canonical sites. For example, compare the cumulative fraction plots for miR-122 and
miR-24 (Figure 2-8 and Figure 2-9). Interestingly, in spite of their generally weaker
knockdown, Alu-derived targets represented 20-30% of the down-regulated target
list (Figure 2-9). Moreover, the two candidates (MAP3K9 and EIF2S3) that responded to
miR-24 overexpression in the luciferase reporter assay were also repressed in the
microarray data, supporting our previous findings (Figure 2-7).
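The cumulative-distribution comparison described above can be sketched in Python as follows. This is a hedged illustration, not the analysis actually run: `log2fc` (gene to log2 fold change), the group sets and the `cutoff` used to call a gene down-regulated are hypothetical inputs, and the one-sided Kolmogorov-Smirnov-type statistic is one common way to summarize a leftward CDF shift. A significance test would still be needed, for example from a library routine such as SciPy's `ks_2samp`.

```python
from bisect import bisect_right

# Hypothetical sketch: `log2fc` maps gene -> log2 fold change after miRNA
# overexpression; gene groups (canonical site, Alu-derived site, no site)
# are plain sets of gene names built from target predictions.


def ecdf(values):
    """Return F(x): the fraction of `values` that are <= x."""
    data = sorted(values)
    n = len(data)
    return lambda x: bisect_right(data, x) / n


def values_for(log2fc, genes):
    """log2 fold changes for the genes in a group that were measured."""
    return [log2fc[g] for g in genes if g in log2fc]


def repression_shift(group, background):
    """One-sided KS-type statistic: max over x of F_group(x) - F_background(x).

    Positive values mean the group's CDF lies left of the background,
    i.e. its genes tend to be more strongly down-regulated.
    """
    if not group or not background:
        raise ValueError("both groups must be non-empty")
    f_group, f_bg = ecdf(group), ecdf(background)
    points = sorted(set(group) | set(background))
    return max(f_group(x) - f_bg(x) for x in points)


def fraction_down_with_site(log2fc, site_genes, cutoff=-0.5):
    """Share of down-regulated genes (log2FC <= cutoff) that carry a site."""
    down = [g for g, fc in log2fc.items() if fc <= cutoff]
    if not down:
        return 0.0
    return sum(g in site_genes for g in down) / len(down)
```

The last function mirrors the observation that Alu-derived targets made up a sizable share of the down-regulated list even though their individual repression was weaker.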
Proliferation of Alu and B1 SINEs resulted in the
convergent acquisition of miRNA targets in their
respective primate and murine lineages
To test the impact of recently evolved, lineage-specific TEs, as well as to improve
the prediction of functionally relevant sites, I repeated our target prediction analysis
using mouse 3’UTR-resident TE sequences. I also searched for the convergent
acquisition of TE-derived target sites to determine whether murine and primate orthologs
independently gained regulatory sites for the same miRNA (Figure 2-10). For this, I
gathered coordinates for mouse 3’UTR-resident TE sequences and used the "lift-over"
utility on the Galaxy web server to convert them to the corresponding human coordinates.
Mouse 3’UTR sequences overlapping TEs with no mappable human counterpart were
then selected, as were human 3’UTR sequences overlapping TEs with no mappable
mouse counterpart. Target sites were then predicted for miRNAs present in both species. I
focused on 3’UTRs with single sites that showed sequence conservation in all species in
which the TE insertion was present. Using 3’UTR reporters, I functionally tested human and
mouse Solute Carrier Family 12, member 8 (SLC12A8), Sideroflexin 2 (SFXN2), UBX
domain-containing protein 2B (UBXN2B) and CDGSH Iron Sulfur Domain 2 (CISD2).
The 3’UTR of chimpanzee SFXN2 (ptrSFXN2) was also cloned, because its miR-24 seed
match carries a single-base mutation. Both mouse and human SFXN2 and SLC12A8
showed significant repression when co-expressed with 15 or 30 nM of miR-24 mimic
(Figure 2-10). No significant response was seen with UBXN2B, CISD2 or ptrSFXN2
(Figure 2-10).
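The selection of lineage-specific TE insertions and the search for convergently gained sites reduce to simple set operations. In this Python sketch, `lifted` stands in for the lift-over output (None where a region did not map to the other genome), and all gene, region and miRNA names are hypothetical placeholders; the conservation filter described above is not modeled.

```python
from typing import Dict, List, Optional, Set, Tuple

# Hypothetical sketch: region ids, intervals and site assignments are
# placeholders. `lifted` stands for the output of a lift-over step, with
# None marking regions that failed to map to the other genome.
Interval = Tuple[str, int, int]  # (chromosome, start, end)


def lineage_specific(regions: Dict[str, Interval],
                     lifted: Dict[str, Optional[Interval]]) -> Dict[str, Interval]:
    """Keep TE-overlapping 3'UTR regions with no mappable counterpart."""
    return {rid: iv for rid, iv in regions.items() if lifted.get(rid) is None}


def convergent_sites(mouse_sites: Dict[str, Set[str]],
                     human_sites: Dict[str, Set[str]],
                     orthologs: Dict[str, str]) -> List[Tuple[str, str, List[str]]]:
    """Ortholog pairs that independently gained TE-derived sites for the same miRNA.

    mouse_sites / human_sites: gene -> miRNAs with a site in a lineage-specific TE.
    orthologs: mouse gene -> human gene.
    """
    hits = []
    for m_gene, h_gene in sorted(orthologs.items()):
        shared = mouse_sites.get(m_gene, set()) & human_sites.get(h_gene, set())
        if shared:
            hits.append((m_gene, h_gene, sorted(shared)))
    return hits
```

Treating a missing lift-over entry the same as an explicit failure keeps the filter conservative about what counts as a mappable counterpart.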
Potentially-active Alu loci contain
miRNA binding motifs
Alu and LINE1 elements are two of the most abundant TEs in the human genome, and
they are also the two TE families that most frequently harbor predicted target sites. Because
both TE families still show evidence of active retrotransposition in humans, they remain
potential sources for novel miRNA binding sites. A recent study assessed mobilization
activity of 89 full-length Alu sequences and found 124 key positions that were 100%
conserved in active elements (Bennett et al, 2008). I predicted miRNA binding sites in
the ~12,000 Alus in the human genome that retained these 124 features, hypothesizing
that these would be the most likely sources of new Alu-derived sites. As expected, many
of the miRNAs with a high frequency of Alu-derived 3’UTR sites, including miR-24 and
miR-122, had a high frequency of sites in the potentially-active Alu sequences (Table 2-1).
Surprisingly, however, MREs for several miRNAs were present in well over 90% of
the potentially-active sequences. For example, over 98% of the potentially-active Alus
contain an MRE for miR-150. This suggests that novel Alu insertions into
3’UTRs may be especially likely to carry MREs for a subset of miRNA families.
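A minimal Python stand-in for this kind of MRE scan is shown below. It assumes canonical seed-matched sites only (8mer, 7mer-m8 and 7mer-A1), so it does not reproduce TargetScan's scoring, and the `key_positions` mapping used to flag potentially active Alus is a hypothetical input rather than the actual set of 124 positions.

```python
from typing import Dict, Iterable, Mapping

# Minimal stand-in for seed-based MRE prediction: canonical 8mer/7mer sites
# only, no context scores or conservation. Target sequences are given in the
# DNA alphabet (sense strand of the 3'UTR or Alu).
_COMPLEMENT = {"A": "T", "U": "A", "T": "A", "G": "C", "C": "G"}


def reverse_complement(seq: str) -> str:
    """Reverse complement, returned in the DNA alphabet."""
    return "".join(_COMPLEMENT[b] for b in reversed(seq.upper()))


def seed_sites(mirna: str) -> Dict[str, str]:
    """Canonical target-site sequences (5'->3') for a mature miRNA."""
    m2_8 = reverse_complement(mirna[1:8])  # pairs with miRNA nt 2-8
    m2_7 = reverse_complement(mirna[1:7])  # pairs with miRNA nt 2-7
    return {"8mer": m2_8 + "A", "7mer-m8": m2_8, "7mer-A1": m2_7 + "A"}


def has_site(seq: str, mirna: str) -> bool:
    """True if seq holds a 7mer-m8 or 7mer-A1 site (an 8mer contains both)."""
    sites = seed_sites(mirna)
    seq = seq.upper()
    return sites["7mer-m8"] in seq or sites["7mer-A1"] in seq


def retains_key_positions(aligned: str, key_positions: Mapping[int, str]) -> bool:
    """True if an element aligned to the consensus matches every key base.

    key_positions maps 0-based alignment columns to required bases
    (hypothetical input, e.g. columns conserved in active elements).
    """
    return all(i < len(aligned) and aligned[i].upper() == base.upper()
               for i, base in key_positions.items())


def fraction_with_site(seqs: Iterable[str], mirna: str) -> float:
    """Fraction of sequences carrying at least one canonical site."""
    seqs = list(seqs)
    return sum(has_site(s, mirna) for s in seqs) / len(seqs) if seqs else 0.0
```

In this scheme, elements would first be filtered with `retains_key_positions` and the surviving sequences passed to `fraction_with_site` for each miRNA.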
MiRNAs are processed from TE sequences and regulate
target genes containing homologous elements
In the examples presented thus far, the miRNA’s origin precedes that of the
corresponding target sites. Although several mechanisms may generate novel miRNA
genes, I focused on TE sequence processing as a source of miRNAs. I hypothesized that
such miRNAs would inherently gain functional targets through homologous TE
sequences resident in 3’UTRs. In this scenario, a new miRNA gene would have an active
source of novel target sites. To test the functionality of these interactions, I focused on
miRNAs whose alignment to the TE overlaps the seed of the miRNA guide strand.
To this end, I generated a list of all human miRNAs with any detectable TE homology
and then selected those in which the TE annotation completely overlapped the proposed
seed sequence.
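The requirement that the TE annotation completely cover the seed is a strand-aware interval test. The Python sketch below assumes 1-based, inclusive genomic coordinates and defines the seed as nucleotides 2-8 of the mature miRNA; the `Interval` type and the function names are hypothetical.

```python
from typing import NamedTuple


class Interval(NamedTuple):
    """Hypothetical genomic interval, 1-based and inclusive."""
    chrom: str
    start: int
    end: int
    strand: str = "+"


def seed_interval(mature: Interval) -> Interval:
    """Genomic span of seed nucleotides 2-8 of a mature miRNA."""
    if mature.strand == "+":
        # The miRNA 5' end is the lowest coordinate on the plus strand.
        return Interval(mature.chrom, mature.start + 1, mature.start + 7, "+")
    # On the minus strand the miRNA 5' end is the highest coordinate.
    return Interval(mature.chrom, mature.end - 7, mature.end - 1, "-")


def seed_inside_te(mature: Interval, te: Interval) -> bool:
    """True if the TE annotation completely covers the seed (TE strand ignored)."""
    seed = seed_interval(mature)
    return seed.chrom == te.chrom and te.start <= seed.start and seed.end <= te.end
```

Ignoring the TE strand matches the criterion as stated, which concerns overlap of the annotation rather than the orientation of the element.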
While most miRNAs with TE homology are of relatively recent origin, one
notable example of broad conservation is miR-28. Annotation of this locus shows that
tandem inverted copies of the 3’ end of an L2c retrotransposon formed the 5’ and 3’ arms
of the miRNA precursor (Figure 2-11). Both L2c sequences have a similar level of
divergence from the L2c consensus (23% and 19.4%), suggesting that both insertions
occurred around the same time. A recently published study suggested that miR-28-5p
binds to and drives endonucleolytic cleavage of LYPD3, interacting with the transcript
through a novel "centered-seed" site within a homologous L2 element (Shin et al, 2010).
In line with this result, luciferase reporters containing the 3’UTR of LYPD3
demonstrated dose-dependent repression in response to a miR-28-5p mimic. Similar
responses were observed using 3’UTR reporters for E2F6, within which miR-28-5p is
also predicted to bind through an L2 sequence (Figure 2-11).
Functional validation of Alu-derived miRNAs
While miR-28 demonstrates functionality of a TE-derived miRNA, its high level
of conservation across eutherian mammals makes it a rare example among the TE-
derived miRNAs. To test if primate-specific TE-derived miRNAs are functional, I used
Alu-derived miR-566 and miR-1285 as case studies. At the start of this study, miR-566
and miR-619 were the only human miRNAs with Alu homology, and only with miR-566
did the Alu encompass the mature miRNA sequence (Figure 2-13A). This miRNA was
initially annotated in a study characterizing the miRNA repertoire of colorectal cells
(Cummins et al, 2006). Using a common stem-loop PCR method, I detected significant
expression of miR-566 in PBMC and HEK293 cells. Furthermore, I could express the
miR-566 genomic locus in the context of an otherwise promoterless cloning plasmid
(Figure 2-13B,C,D). However, miR-566-encoding plasmids were unable to reduce
luciferase reporter expression (not shown), while a commercially available miR-566 Pre-miR™
was functional. I suggest that the apparent expression measured by stem-loop PCR, in the
absence of a functional miRNA, could reflect non-specific priming.
Indeed, northern blots detected the miR-566 precursor, but not a ~20 nt band
corresponding to the mature sequence (not shown).
While the evidence for Alu-derived miR-566 was conflicting, additional Alu-derived
sequences were annotated as miRNAs as high-throughput sequencing technologies matured
and were applied to miRNA discovery. Among the list of putative
Alu-miRNAs, miR-1285 had subjectively more promising deep sequencing support than
miR-566, based on data collected by miRBase and deepBase sequencing data (Kozomara
& Griffiths-Jones, 2011; Yang et al, 2010). A recent study that sought to functionally
validate all known mouse miRNAs cloned each miRNA, along with ~100 flanking nucleotides,
into an expression vector. The authors expressed these constructs in vitro and then
sequenced the small RNA fractions to determine which yielded a functional mature miRNA
(Chiang et al, 2010). The same constructs were also tested for the ability to silence
MRE-containing reporters, and these data were related back to the sequencing results.
Similarly, to test the
functionality of miR-1285, I cloned the pre-miRNAs of the two hsa-miR-1285 loci
(miR-1285-1 and miR-1285-2), with 200bp of flanking sequence, into an expression
plasmid (Figure 2-12A). Interestingly, miR-1285-1 and -2 share a common seed sequence
and homology to Alu elements, but differ notably in secondary structure. To test whether
functional miRNAs could arise from the genomic fragments, I transfected HEK293 cells
with 0, 200 or 400ng of either the miR-1285-1 or the miR-1285-2 plasmid, balanced with a control plasmid lacking a
miRNA. A similar expression plasmid was also generated for hsa-miR-24-1 to serve as a
positive control for a valid miRNA. All plasmids were co-transfected with their
corresponding artificial targets as described previously. Interestingly, miR-1285-1, but
not miR-1285-2, reduced expression of the artificial reporter (Figure 2-12B). This
suggests that the miR-1285-2 locus does not produce a functional miRNA.
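Dose-response reporter data such as these are usually summarized in two steps. Each well is first normalized, and each dose is then expressed relative to the untreated condition, as was done for the let-7 reporters. The minimal sketch below assumes that each well yields one reading from the 3’UTR-fused reporter luciferase and one from a co-expressed normalizer luciferase. The data layout and names are hypothetical.

```python
from statistics import mean, stdev


def percent_of_zero_dose(readings: dict) -> dict:
    """Normalize dual-luciferase readings to the 0-dose condition.

    `readings` maps dose -> list of (reporter, normalizer) luminescence pairs,
    one pair per replicate well, and must include dose 0.
    Returns dose -> (mean percent of 0-dose activity, SD of that percent).
    """
    # Per-well ratio corrects for differences in transfection efficiency.
    ratios = {dose: [rep / norm for rep, norm in pairs] for dose, pairs in readings.items()}
    baseline = mean(ratios[0])
    summary = {}
    for dose in sorted(ratios):
        pct = [100.0 * r / baseline for r in ratios[dose]]
        sd = stdev(pct) if len(pct) > 1 else 0.0
        summary[dose] = (mean(pct), sd)
    return summary
```

With doses of 0, 200 and 400 ng, a functional miRNA locus would be expected to produce percentages that fall with increasing dose, as observed here for miR-1285-1 but not miR-1285-2.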
I also tested whether predicted targets for miR-1285 responded to miR-1285
overexpression. The majority of target sites predicted for miR-1285 were located in Alus
(Figure 2-12C), and so candidates containing Alu-derived sites were tested.
Overexpression of miR-1285 reduced expression of EIF2S3, CHST6 and CBFA2T2 in a
dose-dependent manner. Attempts to block miR-1285 activity using Anti-miRs™ were
ineffective, even for the artificial target. While this could mean that endogenous
miR-1285 is non-functional in the cell lines tested, I was also unable to inhibit the
effects of miR-1285 overexpression with the Anti-miR™ (Figure 2-12E). I hypothesize
that this could be due to nonspecific binding of the Anti-miR™ to the multitude of Alu
sequences in other transcripts expressed at any given time. In any case, the
overexpression data and the proper processing of miR-1285 in the context of the genomic
locus support miR-1285 as a functional Alu-derived miRNA. Furthermore, while
varying functional results were observed for the TE-derived miRNAs tested, the data
from miR-28 and miR-1285 show that these miRNAs deserve further study.
Discussion
In this work, I demonstrate that the most prevalent TE families in the human
genome, namely Alu, MIR and LINE2 elements, provide a functional platform for
miRNA-mediated regulation when resident in mRNA 3’UTRs. I also found that, while
the majority of TE-MREs in human 3’UTRs reside in primate-specific L1 and Alu
elements, sequence conservation was also seen in the MIR-derived let-7 MREs.
Further inquiry into the extent of Alu MRE function would benefit
from high-throughput approaches that measure gene expression changes after modulating
miRNA levels, such as the microarray experiments presented here. The low degree of
sequence divergence among the 3’UTR-resident Alus leads to a preponderance of
predicted MREs for some miRNAs. As a consequence of their limited divergence from
parental Alu sequences, distinct miRNA binding sites cluster in specific Alu primary
sequence regions (Figure 2-6). Although Alu-MREs were, on average, less potent than
canonical (non-TE-derived) sites, evidence from the array data and our luciferase results
shows that some are indeed functional.
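The array comparisons in Figures 2-8 and 2-9 rest on empirical cumulative distributions of fold changes for genes with canonical MREs, Alu-derived MREs or no MRE. The source does not name a statistical test. The two-sample Kolmogorov-Smirnov statistic used in this sketch is one common way to quantify the gap between such curves, not necessarily the test that was applied. Group labels are hypothetical. Because D is unsigned, the direction of the shift (toward down-regulation) still has to be read from the curves themselves.

```python
import bisect


def ecdf(values):
    """Return the empirical cumulative distribution function of `values`."""
    xs = sorted(values)
    n = len(xs)
    return lambda x: bisect.bisect_right(xs, x) / n


def ks_statistic(a, b):
    """Two-sample Kolmogorov-Smirnov D: largest vertical gap between the ECDFs.

    Both ECDFs are step functions that change only at observed values, so
    evaluating at the pooled sample points is sufficient.
    """
    fa, fb = ecdf(a), ecdf(b)
    return max(abs(fa(x) - fb(x)) for x in set(a) | set(b))


def compare_to_background(log2fc, group_of):
    """D statistic for each MRE group versus genes with no MRE.

    `log2fc` maps gene -> log2 fold change after miRNA overexpression;
    `group_of` maps gene -> "canonical" or "alu" (hypothetical labels);
    genes missing from `group_of` are treated as the no-MRE background.
    """
    groups = {}
    for gene, fc in log2fc.items():
        groups.setdefault(group_of.get(gene, "none"), []).append(fc)
    background = groups["none"]
    return {g: ks_statistic(v, background) for g, v in groups.items() if g != "none"}
```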
One possible reason for the lower validation rate of Alu-MREs is that Alus can
associate with Signal Recognition Particle (SRP) proteins through
specific domains (Bovia et al, 1997; Chang et al, 1996; Hsu et al, 1995). If SRP binding
occurs in the context of a 3’UTR-resident Alu, MREs may be shielded, limiting miRNA
access. In a previous study, in vitro-transcribed Chloramphenicol Acetyltransferase
(CAT) mRNAs carrying artificial 5’ or 3’UTR Alus in the sense orientation were
bound by the SRP complex (Hsu et al, 1995). If SRP binding blocks miRNA association,
miRNAs predominantly targeting the antisense Alu would be less affected. This
hypothesis could be tested directly by querying microarray datasets. Genes could be
categorized based on the presence or absence of a 3’UTR Alu, whether the Alu is
transcribed in the sense or antisense orientation, and if the 3’UTR Alu has a predicted
MRE. One could also test genes with known 3’UTR-Alu-SRP interactions to assess the
impact of the SRP on miRNA-mediated silencing. Additionally, candidate genes could be
predicted computationally by searching for sequence features indicative of SRP binding.
One predictor of SRP binding is active transposition (Bennett et al, 2008). Because the
SRP binding domains G25C and G159C in the AluYa5 subfamily are important for
transposition activity, genes with Alus retaining these features could be predicted along
with the associated MREs. Their direct association and the effects on miRNA-mediated
repression could be assessed by first immunoprecipitating SRP9 or SRP14, followed by
RT-PCR of candidate transcripts to confirm the SRP-Alu interaction. The validated 3’UTRs
could subsequently be cloned into luciferase reporters, and site-directed mutagenesis
performed to confirm their role in SRP binding. The candidates emerging from this
secondary screen would then be used to determine whether miRNA-mediated repression was
affected by the presence or absence of SRP, or by the intactness of the G25 and G159
sites. Conversely, 3’UTR Alus lacking these sites could be reverse-engineered to acquire
SRP association sites, and miRNA knockdown efficiency could then be measured.
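The microarray stratification proposed above groups genes by 3’UTR Alu content, Alu orientation and MRE presence. The hypothetical sketch below shows one way to organize it. The input layout and category names are illustrative. Genes with MRE-bearing Alus in both orientations are kept in a separate category rather than being forced into one group.

```python
from statistics import median


def alu_category(alus):
    """Assign a gene to a category based on its 3'UTR Alus.

    `alus` is a list of (orientation, has_mre) tuples, where orientation is
    "sense" or "antisense" relative to the mRNA (hypothetical layout).
    """
    if not alus:
        return "no Alu"
    mre_orients = {orient for orient, has_mre in alus if has_mre}
    if not mre_orients:
        return "Alu, no MRE"
    if mre_orients == {"sense"}:
        return "sense Alu MRE"
    if mre_orients == {"antisense"}:
        return "antisense Alu MRE"
    return "both orientations"


def median_fc_by_category(genes):
    """`genes` maps gene -> (list of Alu tuples, log2 fold change)."""
    by_cat = {}
    for alus, fc in genes.values():
        by_cat.setdefault(alu_category(alus), []).append(fc)
    return {cat: median(fcs) for cat, fcs in by_cat.items()}
```

Under the SRP-shielding hypothesis, after the corresponding miRNA is overexpressed, the "antisense Alu MRE" category would be expected to show a more negative median fold change than the "sense Alu MRE" category.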
On a larger scale, if such interactions are shown to be important for miRNA
recognition, a cross-linking and immunoprecipitation followed by sequencing (CLIP-seq) method
could be developed for the Alu-binding SRP proteins, SRP9 or SRP14. One likely
complication in this experiment would be the predominance of the 7SL noncoding RNA,
which is the canonical SRP binding partner, or free cytoplasmic Alu RNAs. However, the
molecular weight difference between Alu/7SL-bound and mRNA-bound SRP complexes
could help resolve this, since CLIP protocols often include a size-selection step.
Curiously, of the ~1.2 million Alu copies present in the human genome, fewer
than 20 are expected to produce functional miRNAs. Also, the functionality of most of
these Alu-derived miRNAs is untested. In this work, I found that miR-566 and one of the
two miR-1285 loci did not produce a functional miRNA. However, LINE-derived miR-28
is both well conserved and functional, and miR-1285-1 demonstrated effective
processing and silencing. These data show that some TE-derived miRNAs are
indeed functional, but Alu-derived miRNAs, and other miRNAs with low apparent
sequence conservation, deserve closer scrutiny. Ideally, such studies should include a
combination of northern blot and reporter-based assays before concluding that a bona
fide miRNA emerges from the locus in question.
Further complication in identifying true Alu-derived miRNAs comes from a
recent study demonstrating that DICER1 degrades Alu RNAs (Kaneko et al, 2011),
which suggests that some Alu-derived small RNAs may be DICER-dependent degradation
products rather than miRNAs. These results emphasize the importance of functional
experiments, such as those presented here, for validating Alu-derived miRNAs.
While miR-1285 and miR-566 each produced small RNAs, only miR-1285 was capable
of silencing a luciferase reporter in a sequence-dependent manner. For miRNA discovery
studies, the proposed loci should be examined closely to ensure that they
meet the criteria outlined previously (Chiang et al, 2010). The experimental design used for
testing miR-566 and miR-1285 was based on the Chiang et al study, although those authors
additionally performed high-throughput sequencing on small RNA fractions extracted
from cells carrying the miRNA expression constructs. From these data, they found that
the miRNAs most likely to validate functionally were those producing a predominant mature
sequence with a homogeneous 5’ end and a passenger strand with a 2-nt 3’ overhang. The
miRBase repository is actively incorporating small RNA high-throughput sequencing
reads and using evidence such as that proposed in the Chiang et al study to improve the
accuracy of miRNA identification and remove dubious annotations (Kozomara &
Griffiths-Jones, 2011).
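The 5’-end homogeneity criterion can be scored directly from small RNA reads mapped to a candidate hairpin. In the minimal sketch below, the read-count and fraction thresholds are illustrative placeholders, not values from the Chiang et al study. The check for a 2-nt 3’ overhang on the duplex requires the predicted hairpin structure and is omitted.

```python
from collections import Counter


def five_prime_homogeneity(read_starts):
    """Dominant 5' start position and the fraction of reads that share it.

    `read_starts` holds the hairpin coordinate of each small-RNA read's 5' end
    for one arm of a candidate miRNA.
    """
    counts = Counter(read_starts)
    position, n = counts.most_common(1)[0]
    return position, n / len(read_starts)


def passes_homogeneity(read_starts, min_reads=10, min_fraction=0.8):
    """Hypothetical filter; both thresholds are illustrative assumptions."""
    if len(read_starts) < min_reads:
        return False
    return five_prime_homogeneity(read_starts)[1] >= min_fraction
```

For example, `five_prime_homogeneity([10, 10, 10, 11])` returns `(10, 0.75)`: three of the four reads begin at hairpin position 10.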
In summary, I find evidence that some TE-derived miRNAs and miRNA binding
sites are both conserved and functional. I also show that some sequences with low
sequence conservation do respond to miRNA expression, with evidence both from
reporters and global transcript expression profiles. Together, our data support a role for
TEs in the evolution of human miRNA interactions, and suggest that novel miRNA
functions may continue to arise as active transposition persists.
Figure 2-1. TE family composition of putative TE-MREs in human 3’UTRs.
TargetScan was used to predict miRNA target sites in RefSeq-annotated human 3’UTRs, using the human miRNA family seed file provided with TargetScan. TE-MREs were annotated by intersecting unique MRE coordinates with the RepeatMasker track annotations at the UCSC Genome Browser. (A) The TE family annotation from RepeatMasker was used to classify all human TE-MREs, and the percent contribution of the top 5 most prevalent families is shown, representing >87% of all predicted TE-derived target sites. Primate-specific Alu and L1 retrotransposons make up more than half of the sites; the more ancient L2 and MIR elements constitute ~20% of the sites. “Other” represents all other transposable elements; simple repeats and other low-complexity repeats were not considered in this analysis. (B) Seed families were selected for which Alu, L1 or MIR-derived MREs were the most frequent TE-MRE. Seed families were binned according to the fraction of TE-MREs comprised by the majority TE group. The histograms for the three TEs are overlaid, and so should be read as though every bar starts at zero. For example, the histogram shows that for ~60 miRNAs, L1s represent ~25% of the predicted MRE sites. Alus showed a bimodal distribution, because for many miRNAs, Alus represented more than half of the predicted TE-MREs.
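The classification in panel A can be sketched as an interval overlap followed by a tally. This is an illustrative re-implementation under stated assumptions, not the script used for the figure. Each MRE is assigned to the single repeat it overlaps most. Simple and low-complexity repeats are dropped by their RepeatMasker labels. The nested loop is quadratic; at genome scale, a dedicated interval-intersection tool would be used instead.

```python
from collections import Counter

# Repeat labels excluded from the analysis (simple and low-complexity repeats).
EXCLUDED = {"Simple_repeat", "Low_complexity"}


def family_composition(mres, repeats, top_n=5):
    """Percent of TE-overlapping MREs contributed by each repeat family.

    mres: iterable of (chrom, start, end) MRE coordinates; duplicates are collapsed.
    repeats: iterable of (chrom, start, end, family) rows from a RepeatMasker table.
    Each MRE is assigned to the repeat it overlaps most (an assumption).
    """
    repeats = [r for r in repeats if r[3] not in EXCLUDED]
    counts = Counter()
    for chrom, start, end in set(mres):
        best, best_overlap = None, 0
        for r_chrom, r_start, r_end, family in repeats:
            if r_chrom != chrom:
                continue
            overlap = min(end, r_end) - max(start, r_start)
            if overlap > best_overlap:
                best, best_overlap = family, overlap
        if best is not None:
            counts[best] += 1
    total = sum(counts.values())
    if total == 0:
        return {}
    top = counts.most_common(top_n)
    composition = {family: 100.0 * n / total for family, n in top}
    other = total - sum(n for _, n in top)
    if other:
        composition["Other"] = 100.0 * other / total
    return composition
```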
Figure 2-2. TE-MRE composition and unbiased gene function analysis reveal strong functional connections between let-7 and MIR-derived MREs.
The TE family composition of predicted human let-7 MREs revealed that over 40% are of MIR origin. (Top) This was the highest proportion of MIR-derived sites for any miRNA in the dataset. The 192 transcripts with MIR-MREs represent just over 10% of the ~1800 human genes with embedded MIR elements. (Bottom) Gene names for the ~1800 genes were analyzed using ToppFun to find functional groups associated with the genes. Statistical significance is presented as p-values adjusted using Bonferroni correction. Let-7 had the most significant p-values of any functional category, including the non-miRNA categories not shown. Furthermore, it was the only miRNA with significant results for more than two of the prediction methods. The MRE prediction methods and any additional information are color-coded. For mirSVR: C = Conserved, NC = Non-conserved, HE = High Efficacy (predicted), LE = Low Efficacy.
Figure 2-3. Genome browser views for let-7 MIR-derived MREs in (A) MYO1F and (B) E2F6.
Let-7 MREs (yellow box) overlapping a MIR element (red boxes) annotated by RepeatMasker. MYO1F and E2F6 are two candidates where: i) no let-7 MRE is present in the 3’UTR aside from the MIR-MRE shown, and ii) PhyloP conservation scores (Mammal Cons track) showed (subjectively) strong conservation coincident with the binding site.
(A) 3’UTRs of MYCBP, MFSD4, E2F6 and MYO1F, each containing a single MIR-derived let-7 MRE, were cloned into dual-luciferase reporters and co-transfected into HEK293 cells with low doses (0.1, 1.0 nM) of an artificial let-7 mimic (Pre-miR™). Reactions were balanced to 1.0 nM with a non-targeting Pre-miR™ (ctrl). Repression of luciferase activity was observed after 24 hours with all four reporters, as well as with the let7_2xT positive control, but not with the negative control reporter (CTRL). Luciferase activity is plotted as a percent of the activity observed at the 0 nM let-7 Pre-miR™ dose. (B) Luciferase reporters were then co-transfected with a let-7a AntimiR inhibitor (0, 25, 50 nM) into HeLa cells, which express high levels of endogenous let-7a. 48 hours later, all 3’UTR reporters, but not the CTRL reporter, showed increased activity relative to the 0 nM AntimiR dose. N=3; Error bars = SD. * = p ≤ 0.05 (Student’s T-test; two-tailed).
Figure 2-5. TE-MRE compositions for (A) miR-24-3p and (B) miR-122 show a prominent Alu fraction.
TE-derived MREs were predicted for each miRNA. RepeatMasker track annotations were used to tabulate TE family frequencies. The most highly represented families are shown, with all other families grouped into the “Other” category. These results show that Alus account for over 80% of the TE-derived MREs.
Figure 2-6. Most frequent Alu-MRE sequences map to distinct positions relative to the Alu consensus.
The positions within the Alu consensus sequence of miR-125-3p, miR-24 and miR-122 MREs are plotted. MRE position was normalized across all Alus by calculating positions relative to the Alu consensus sequence (see Methods). The high MRE frequency observed for each of the miRNAs is restricted to a narrow range 5-10bp wide. This suggests that little sequence divergence has occurred among Alus in these regions. It also suggests that these miRNAs encounter similar local sequence and structural contexts when binding to similar Alu-derived sites in other mRNA targets.
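The normalization to consensus coordinates is described in Methods, which fall outside this section. As a rough illustration only, this sketch maps an MRE midpoint onto the consensus by linear rescaling within the aligned span reported by RepeatMasker, ignoring indels, and then bins the resulting positions. Both the approximation and all names are assumptions, not the procedure used for the figure.

```python
def consensus_position(mre_mid, geno_start, geno_end, rep_start, rep_end, strand):
    """Approximate Alu-consensus coordinate of an MRE midpoint.

    geno_start/geno_end: genomic span of the Alu copy (0-based, half-open).
    rep_start/rep_end: consensus span that RepeatMasker aligned to that copy.
    Indels are ignored; the offset is simply rescaled by the length ratio.
    """
    scale = (rep_end - rep_start) / (geno_end - geno_start)
    if strand == "+":
        offset = mre_mid - geno_start
    else:  # minus-strand copy: count from the right-hand genomic end
        offset = geno_end - mre_mid
    return rep_start + offset * scale


def bin_positions(positions, width=5):
    """Histogram of consensus positions in fixed-width bins (bin start -> count)."""
    counts = {}
    for p in positions:
        b = int(p // width) * width
        counts[b] = counts.get(b, 0) + 1
    return dict(sorted(counts.items()))
```

Bins of about 5 bp match the narrow 5-10bp peaks described in the caption. A version that accounts for indels would walk the actual RepeatMasker alignment instead of rescaling.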
Figure 2-7. Alu-derived MREs respond to miR-24 overexpression.
Transcripts with Alu-derived MREs showing evidence of local sequence conservation were functionally tested. (Top) The Primate Conservation track shows a rise in conservation score coincident with the miR-24 MRE overlapping the AluSp family sequence. (Bottom) Luciferase reporters expressing EIF2S3 and MAP3K9 3’UTRs were co-transfected into HEK293 cells with miR-24 Pre-miR™ mimics (0, 1, 10 nM). 24 hours later, luciferase assays revealed that EIF2S3 and MAP3K9 reporter expression was reduced in response to miR-24, while that of the negative controls (ctrl) was not. N=4; Error bars = SD. * = p ≤ 0.05 (Student’s T-test; two-tailed).
Figure 2-8. Microarray datasets measuring response to miRNA overexpression to assess functional response of Alu-derived targets on a global scale
Gene annotations from arrays were intersected with TE-MRE predictions for the corresponding miRNAs. (Top) Genes were grouped according to whether they had a canonical (non TE-derived) MRE, Alu MRE or no MRE for the indicated miRNA family, and the empirical cumulative distributions were plotted. Canonical and Alu MRE-containing transcripts were shifted to the left of the non-target set, demonstrating a larger fraction of down-regulated transcripts in these groups relative to Alu MRE sites (Bottom).
Figure 2-9. The fraction of down-regulated genes with Alu-derived MREs is in proportion to their overall prevalence.
Cumulative distribution plots showed greater knockdown of genes with Canonical or Alu-derived MREs compared to background (no MRE). Therefore, in the case of miR-122 (left) and miR-24 (right), between 20 and 30 percent of all MRE-specific knockdown is due to the presence of Alu-derived target sites.
Figure 2-10. Functional miR-24 MREs are independently created in rodent and primate clades due to lineage-specific, but homologous TE families.
Target sites were predicted in human and mouse 3’UTR TEs, limiting to TE integrations specific to each lineage (Top). Homologous genes were then combined, searching for lineage-specific TE-derived MREs for the same miRNA. For miR-24, most of these sites resulted from transposition of B1 SINE elements, which, like Alus in primates, arose from a 7SL RNA ancestral sequence. Candidates were selected which had an Alu-derived miR-24 site in human 3’UTRs and a B1-derived site in mouse. The chosen candidates, SFXN2, SLC12A8 and UBXN2B additionally had binding sites that were conserved in species where the insertion was present. (Bottom) Luciferase reporters expressing candidate 3’UTRs were co-transfected with miR-24 Pre-miRs™. SFXN2 and SLC12A8 reporters showed reduced expression after miR-24 treatment compared to the control treated cells for both human and mouse constructs. UBXN2B showed no response. Chimpanzee SFXN2 had a single base change that disrupted the predicted binding site and a reporter of this 3’UTR did not respond to miR-24 addition. N=3; Error bars = SD. * = p ≤ 0.05 (Student’s T-test; two-tailed).
Table 2-1. miRNAs have predicted MREs in potentially-active Alus in the human genome
miRNA Family Alu (+) Alu (-)
miR-150 98.1% 0.0%
miR-129/129-5p 97.7% 0.0%
miR-590/590-3p 95.6% 0.0%
miR-106/302 95.6% 0.0%
miR-520gh 95.6% 0.3%
miR-17-5p/20/93.mr/106/519.d 95.5% 0.0%
miR-411 95.3% 0.0%
miR-512-3p/1186 94.0% 0.0%
miR-483/483-5p 86.0% 3.8%
miR-1234 83.2% 15.3%
miR-1307 75.6% 0.1%
miR-122 72.4% 0.3%
miR-139-3p 70.9% 0.6%
miR-720.h 67.5% 0.0%
miR-575 61.1% 10.4%
miR-1281 56.4% 0.0%
miR-709/1827 7.7% 97.1%
miR-24 0.6% 96.6%
miR-940 0.2% 95.8%
miR-485/485-5p 0.0% 93.2%
miR-548c-3p 1.8% 91.8%
miR-290-5p/292-5p/371-5p 2.6% 91.6%
miR-661 6.5% 73.7%
miR-566 0.2% 71.6%
miR-766 0.2% 70.6%
miR-508-5p 0.0% 68.0%
miR-1273 22.9% 66.3%
miR-663 0.0% 66.3%
Annotations and sequences of potentially active Alus were taken from Supplemental Table 3 (Bennett et al, 2008).
Figure 2-11. miR-28 is derived from a LINE2c retrotransposon, is highly conserved and regulates transcripts with LINE2-embedded MRE sequences.
LINE2-derived miR-28 is a conserved TE-derived microRNA (A). (B) Homologous LINE2-derived MREs make up the largest proportion of TE-derived targets. (C) LYPD3 has a LINE2c-embedded miR-28 MRE that shows strong conservation localized around the site. (D) 3’UTR luciferase reporters for LYPD3 and E2F6, both of which have predicted L2 target sites, are potently repressed in response to miR-28-5p overexpression. N=4; Error bars = SD. * = p ≤ 0.05 (Student’s T-test; two-tailed).
Figure 2-12. Alu-derived miR-1285-1 is effectively processed and mediates knockdown of genes with Alu-MREs.
(A) Two genomic loci are annotated for Alu-derived miR-1285: miR-1285-1 and miR-1285-2. Both loci, along with ~100 flanking bp, were cloned into an expression plasmid. (B) MiR-1285-1 but not miR-1285-2 repressed the expression of luciferase reporters with 3’UTR-resident target sites (miR1285-2xT). Mutating the seed sequence abolished this activity, indicating that miR-1285-1 is a functional miRNA. (C) Alu-derived miR-1285 has MREs predicted in homologous Alu sequences. (D) Coexpression of the miR-1285 mimic with predicted target 3’UTR reporters led to a reduction in luciferase activity. (E) Anti-miRs™ do not affect miR-1285 activity, likely due to a dilution effect from other transcripts with Alu-derived sequences. N=3; Error bars = SD. * = p ≤ 0.05 (ANOVA; Tukey’s post hoc).
Figure 2-13. Pol III intronic promoters drive intronic miRNA expression.
(A) Representative diagram of the miR-566 genomic locus in several species. (B) Diagram of gHsa-miR-566 and gMmus-Sema3F constructs. The gHsa-miR-566 construct contains the intronic sequence of SEMA3F harboring the primate-specific Alu-derived miR-566 sequence. The gMmus-Sema3F construct contains the equivalent intronic sequence of Sema3F from mouse, which is devoid of intronic miRNA sequence. (C) MiR-566 expression in HEK293 cells. MiR-566 expression was detected by QPCR in HEK293 cells after transfection with gHsa-miR-566 but not in cells transfected with gMmus-Sema3F. MiR-566 levels were normalized to 18S expression and compared to cells transfected with gMmus-Sema3F. Data are mean ± SEM. *, P<0.05, n=4. (D) MiR-566 is expressed independently of Sema3F. MiR-566 and Sema3F expression were determined in HEK293 cells and PBMC cells. Data show expression of both miR-566 and Sema3F in HEK293 cells, while PBMC cells express only miR-566 and not the host gene. MiR-566 and Sema3F levels were normalized to 18S expression. Data are mean ± SEM. *, P<0.05, n=4.
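Normalizing to 18S and comparing against a control transfection, as described in panel C, corresponds to a relative-quantification calculation. The source does not state which calculation was applied. The minimal sketch below uses the widely used 2^-ΔΔCt method and assumes similar amplification efficiencies for the target and reference assays.

```python
from statistics import mean


def fold_change_ddct(ct_target, ct_ref, ct_target_cal, ct_ref_cal):
    """Relative expression by the 2^-ΔΔCt method.

    Each argument is a list of replicate Ct values: target (e.g. miR-566) and
    reference (e.g. 18S) in the test sample, then the same pair in the
    calibrator sample (e.g. gMmus-Sema3F-transfected cells). Assumes roughly
    100% amplification efficiency for both assays.
    """
    d_ct_sample = mean(ct_target) - mean(ct_ref)
    d_ct_cal = mean(ct_target_cal) - mean(ct_ref_cal)
    return 2.0 ** -(d_ct_sample - d_ct_cal)
```

For example, a target Ct three cycles lower in the test sample than in the calibrator, with equal reference Ct values, gives an 8-fold increase: `fold_change_ddct([25.0], [10.0], [28.0], [10.0])` returns `8.0`.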
CHAPTER 3
LONG INTERGENIC NON-CODING RNAS ARE A POTENTIAL
SOURCE OF ENDOGENOUS MICRORNA “SPONGES”
Abstract
MicroRNAs (miRNAs) classically bind to the 3’ Untranslated Regions (3’UTRs)
of protein-coding genes, playing important roles in diverse cellular processes. Exploring
the function of individual miRNAs has relied on molecular tools that reduce the miRNA’s
expression or activity. One effective method uses constructs expressing a reporter gene
with a 3’UTR containing several miRNA binding sequences. These miRNA “sponges”
compete for miRNA binding to endogenous targets. Similarly, I find that some
endogenous long non-coding RNAs (lncRNAs) contain numerous binding sites for a
miRNA. Therefore, I propose that these lncRNAs function as endogenous miRNA
“sponges,” regulating the activity of one or more miRNAs through competitive
inhibition.
Introduction
Long non-coding RNAs (lncRNAs) are an enigmatic class of novel RNA species
roughly defined as being longer than 200 nucleotides and having no evidence of coding
potential (Rinn & Chang, 2012). Although the name and definition are rather nondescript
and arbitrary, in the few years since their discovery several distinct groups have emerged,
generally defined according to their position and orientation in relation to nearby
protein-coding genes (Figure 1-2). For lncRNAs falling within or proximal to
protein-coding genes, this classification proved somewhat useful, as many are thought to
act in cis, regulating expression of the neighboring transcripts. Recently, interest has
grown in understanding the function of long intergenic non-coding RNAs (lincRNAs),
which have been implicated in the coordination of epigenetic
processes. Most long non-coding RNAs are restricted to the nucleus, supporting their role
in epigenetic and transcriptional control, but some are predominantly cytoplasmic and are
capped, spliced and polyadenylated like mRNAs. Because microRNAs (miRNAs)
classically regulate protein-coding mRNAs by binding to non-coding 3’UTRs, I
hypothesized that some mRNA-like lncRNAs may be substrates for miRNA binding.
One of the general mechanisms by which lncRNAs can regulate transcriptional or
epigenetic states is by acting as decoys for protein regulatory factors (Wang & Chang,
2011). For example, growth arrest-specific 5 (Gas5) is an lncRNA that forms a structure
mimicking a DNA glucocorticoid response element (GRE). In this way, Gas5 competes
for binding with the DNA binding domain of the glucocorticoid receptor (Kino et al,
2010).
In this study, I propose a mechanism by which lncRNAs can act as decoys for
Argonaute (Ago)-bound miRNAs. Binding between a miRNA and mRNA classically
occurs through the 3’UTR of the mRNA as the coding region is typically a less-effective
substrate (Garcia et al, 2011). LncRNAs are effectively “UTRs”, thereby providing, at
least theoretically, large non-coding platforms for miRNA binding. LncRNAs may
provide a platform for multiple miRNA recognition elements (MREs) that would
impede miRNA:mRNA interaction (Figure 3-14). To find lncRNAs that provide miRNA
“sponges” of biological relevance, I predicted MREs in lncRNAs that are expressed at
high levels in mouse Embryonic Stem Cells (ESCs) by microarray, or in particular
regions of the mouse brain by in situ hybridization (ISH). From this, I found many
candidate lncRNAs with more than 10, and some with as many as 40, MREs for a
single miRNA family. I characterize one such lncRNA with 23 binding
sites for the miR-15/16 family in the longest annotated isoform. Interestingly, I find
evidence for alternative lncRNA isoforms, arising from alternative splicing or
transcription start site choice, that lack many of the predicted miR-15/16 binding
sites; such isoform choice could regulate the degree to which a miRNA is sequestered.
Together, our findings suggest a mechanism for lncRNA-mediated regulation of miRNA
activity.
Methods
Data sources
Accession numbers for the mouse “Brain” and “ESC” long non-coding RNAs
were taken from the supplementary data provided in two previous studies (Dinger et al, 2008;
Mercer et al, 2008). Using these accession numbers, sequences were obtained from the
UCSC Genome browser (mm9). Mouse miRNA data, including seed sequences,
conservation level and seed family annotations were obtained from the TargetScan
website (http://www.targetscan.org/) (Grimson et al, 2007).
In situ hybridization (ISH) data for PSMI16 in adult mouse brain were obtained
from the Allen Brain Atlas (http://mouse.brain-map.org/) (Ng et al, 2009). PSMI16 data
series were found using its accession number from the RIKEN database, 6720401G13Rik.
The same search term was used to obtain ISH data for the e14.5 mouse embryo, available
from the Eurexpress transcriptome atlas (http://www.eurexpress.org/ee/) (Diez-Roux et
al, 2011).
Prediction and analysis of MRE content in lncRNAs
Target sites for mouse miRNAs were predicted in lncRNA sequences from both
datasets independently using the standalone Perl implementation of the TargetScan 5.1
algorithm (Lewis et al, 2005). For the purposes of representing the distribution of MRE
frequency, only one representative of a miRNA seed family was used. Additionally, for
lncRNAs with multiple isoforms, the sequence with the highest MRE frequency for a
given miRNA was represented.
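The per-lncRNA tally described above can be sketched as follows. This is a simplified stand-in for the TargetScan 5.1 script, not TargetScan itself. It counts only the canonical 8mer, 7mer-m8 and 7mer-A1 seed sites and keeps, for each lncRNA, the isoform with the most sites. Function names and the input layout are hypothetical.

```python
COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A", "N": "N"}


def revcomp(seq):
    """Reverse complement of a DNA/RNA sequence, returned as DNA."""
    seq = seq.upper().replace("U", "T")
    return "".join(COMPLEMENT[b] for b in reversed(seq))


def count_seed_sites(target, mirna):
    """Count 8mer, 7mer-m8 and 7mer-A1 seed sites in a target sequence.

    Each site is anchored on the 6-nt core that pairs with miRNA nt 2-7, so
    an 8mer is counted once rather than as both a 7mer-m8 and a 7mer-A1.
    """
    t = target.upper().replace("U", "T")
    core = revcomp(mirna[1:7])  # pairs with miRNA nucleotides 2-7
    m8 = revcomp(mirna[7])      # base that pairs with miRNA nucleotide 8
    n = 0
    i = t.find(core)
    while i != -1:
        has_m8 = i > 0 and t[i - 1] == m8
        has_a1 = i + 6 < len(t) and t[i + 6] == "A"
        if has_m8 or has_a1:
            n += 1
        i = t.find(core, i + 1)
    return n


def max_sites_per_lncrna(isoforms, mirna):
    """For each lncRNA, report the site count of its highest-scoring isoform.

    `isoforms` maps lncRNA ID -> list of isoform sequences.
    """
    return {gene: max(count_seed_sites(seq, mirna) for seq in seqs)
            for gene, seqs in isoforms.items()}
```

Genes whose counts reach 10 or more for a given seed family would then correspond to the candidate lncRNA decoys discussed in the Results.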
RNA isolation and RT-PCR
Tissues were harvested from wild-type C57Bl/6 mice after deep anesthesia with isoflurane
and euthanasia by cervical dislocation. Tissues were immediately placed
in ~300 µl RNAlater (Life Technologies) and stored at 4°C overnight.
Total RNA was extracted using TRIzol reagent (Life Technologies) according to
the manufacturer’s protocol. ~1 ml of TRIzol was added to the mouse tissues after
removing the RNAlater, and tissues were homogenized on ice using a micropestle. RNA
samples were quantified by spectrophotometry, and 1.0 μg of total RNA was treated for 1.5 hr
with DNase I to remove genomic contamination (DNA-free kit, Ambion®). Unless
otherwise indicated, cDNA was generated from ~500ng of the RNA using the High
Capacity cDNA Reverse Transcription Kit with random primers (Life Technologies).
Ago immunoprecipitation
Immunoprecipitations were performed using Dynabeads (Invitrogen). Beads were
prepared according to the manufacturer’s protocol, binding either the Ago or IgG control
antibodies. NPC or HEK293 cells were lysed using RIPA buffer with RNase and protease
inhibitors added. The immunoprecipitations were also performed using the manufacturer’s
protocol with the following details. Cell lysates were incubated with the Dynabeads for
two hours at 4°C. After incubation and placing the samples on the magnet to separate the
bound beads, ~100 µl of the supernatant was retained as input. Three washes were then
performed with lysis buffer. After the final wash, the beads were separated, buffer
removed and 1ml of TRIzol was added directly to the beads and the reserved supernatant.
RNA was isolated as above. RT-PCR was performed to detect PSMI16 as above, except
that 200ng of RNA was used because of low RNA yield in the IP.
Results
Abundant MRE content is evident in many mouse
lncRNAs
To predict the extent of miRNA binding to lncRNAs, I evaluated MRE content in
a set of lncRNA sequences with previously-defined expression patterns in mouse brain or
embryonic stem cells (ESCs). ~460,000 combined seed matches were found in 849 and
945 sequences, representing confidently-expressed lncRNAs in the “brain” or ESC
datasets, respectively. Because the average lncRNA is expressed at lower levels than a
typical mRNA, I hypothesized that, to compete effectively for miRNA binding, lncRNA
decoys would need numerous binding sites for the miRNAs they regulate. To uncover
candidate interaction pairs, I tabulated MRE frequency for all predicted miRNA:lncRNA
interactions. As a control, target prediction was repeated on all sequences after
performing a randomized dinucleotide shuffle. 148 brain and 63 ESC-expressed lncRNAs
had at least 10 MREs for one or more miRNAs (Figure 3-15). By contrast, only three
such events were predicted in the shuffled control datasets. Many well-conserved
miRNAs, including pro-oncogenic miR-27, the tumor-suppressive and developmentally
important miR-15/16 and miR-302 families, and brain-enriched miR-128 and miR-338,
had at least 10 sites predicted in one or more lncRNAs (Table 3-1). Interestingly, many
lncRNAs had numerous target sites predicted for several different miRNAs. For example,
the lncRNA NR_015505 (BC066100, in Table 3-1) has 11 MREs for miR-338, 12 for
miR-302 and 23 for the miR-15/16 family, suggesting a potential to coordinately regulate
multiple miRNAs. However, because the miR-15/16 family has nearly twice as many
MREs as any other miRNA in this transcript, I hypothesized that these miRNAs would
be the most likely candidates for competitive inhibition. Therefore, for the purposes of
this study and for simplicity, I refer to NR_015505 throughout the text as Putative
Sponge for miRNA-16 (PSMI16).
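The dinucleotide-shuffled control used above preserves the dinucleotide composition of each lncRNA while destroying its sequence order. The source does not state which shuffling tool was used. One standard way to preserve dinucleotide counts exactly is to walk a random Eulerian path through the graph of observed dinucleotides. The sketch below assumes that approach.

```python
import random
from collections import defaultdict


def dinucleotide_shuffle(seq, rng=None):
    """Random sequence with exactly the same dinucleotide counts as `seq`.

    Builds the multigraph of observed dinucleotides (one edge a->b per
    adjacent pair), picks random "last exit" edges that form a tree leading
    to the final base, and then walks a random Eulerian path.
    """
    rng = rng or random.Random()
    if len(seq) < 3:
        return seq
    edges = defaultdict(list)
    for a, b in zip(seq, seq[1:]):
        edges[a].append(b)
    first, last = seq[0], seq[-1]

    # Choose a last-exit edge for every vertex except `last`, resampling
    # until those edges lead to `last` without forming a cycle.
    while True:
        exit_edge = {v: rng.choice(outs) for v, outs in edges.items() if v != last}
        if all(_reaches(v, last, exit_edge) for v in exit_edge):
            break

    # Shuffle each vertex's remaining edges; its exit edge is used last.
    order = {}
    for v, outs in edges.items():
        outs = list(outs)
        if v in exit_edge:
            outs.remove(exit_edge[v])
            rng.shuffle(outs)
            outs.append(exit_edge[v])
        else:
            rng.shuffle(outs)
        order[v] = outs

    # Walk the Eulerian path, starting from the original first base.
    result, pos, v = [first], defaultdict(int), first
    for _ in range(len(seq) - 1):
        v_next = order[v][pos[v]]
        pos[v] += 1
        result.append(v_next)
        v = v_next
    return "".join(result)


def _reaches(v, target, exit_edge):
    """Follow exit edges from v; True if `target` is reached without a cycle."""
    seen = set()
    while v != target:
        if v in seen:
            return False
        seen.add(v)
        v = exit_edge[v]
    return True
```

Each shuffled sequence keeps the original first and last bases, length and dinucleotide counts. It can be passed through the same MRE-counting step as the real lncRNAs, so that any excess of high-MRE transcripts in the real data cannot be explained by base composition alone.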
Expression pattern of lncRNA, PSMI16
Adult mouse brain
PSMI16 was one of only two lncRNAs that had more than 10 sites for any conserved miRNA and was expressed in both the brain and ESC datasets. The “Brain” lncRNA dataset was derived from lncRNAs with ISH data available from the Allen Brain Atlas. The ISH data for NR_015505 revealed the highest expression levels in regions within the cerebellum and hippocampus (Figure 3-16). Closer inspection of the hippocampal expression pattern revealed that expression was restricted to the granule cell layer of the dentate gyrus and the pyramidal layer of fields CA1, CA2 and CA3. Similar regional restriction was seen in the cerebellum, where high expression levels were observed only in the periphery of the granular layer of the cerebellar cortex. While I had hypothesized that high MRE content would help overcome the low expression often observed for lncRNAs, these data show that PSMI16 levels may be quite high in certain regions, in addition to its carrying 23 MREs for miR-16.
Developing mouse at E14.5
Because PSMI16 was also expressed in the ESC dataset, I asked whether the lncRNA was also expressed in the developing mouse. An ISH data series showing PSMI16 expression in a mouse embryo at 14.5 days post coitum (DPC) was found in the Eurexpress Transcriptome Atlas database (Diez-Roux et al, 2011), revealing moderate to high expression in the nervous system, as in the adult brain (Figure 3-17).
Other adult mouse tissues and cell lines
To characterize PSMI16 further, I tested its expression in adult mouse tissues using semi-quantitative RT-PCR. Expression was detected in all tissues tested, with particularly high levels in colon, thymus, lung, pineal gland and ovaries (Figure 3-18). These data show that the lncRNA is expressed in many embryonic and adult tissues. I also tested several mouse cell lines for PSMI16 expression, including brain-derived endothelial (BEND3), neuroblastoma (N2A), and neural progenitor (NPC) cells. N2A cells showed the lowest level of expression, so NPC and BEND3 cells were used for further studies (Figure 3-18). In these latter experiments, oligo-dT primers were used in the RT reactions, creating a cDNA library of polyadenylated transcripts. I was able to amplify PSMI16 from oligo-dT libraries not only with the same primer set as before, but also with a set amplifying a near full-length product. This suggests that PSMI16 is a polyadenylated transcript.
PSMI16 associates with Ago2
Although PSMI16 was expressed at high levels in many biological settings, to function as a miRNA target decoy it should also associate with miRNA-containing complexes. Specifically, a target decoy should associate with RISC, within which miRNAs are bound by Ago proteins. To test whether PSMI16 associates with this complex, I performed RNA immunoprecipitation (RIP) using an antibody against Ago2 on NPC cell lysates, which had appreciable levels of PSMI16 and miR-16 (Figure 3-5 and Sarah Fineberg, unpublished data). Because PSMI16 is rodent-specific, human HEK293 cells were used as a negative control. RNA was extracted from bound and unbound fractions, and RT-PCR for PSMI16 was performed. As expected, HEK293 cells showed no detectable PSMI16 expression (Figure 3-19). PSMI16 was detected in both the Ago2 IP and IgG control supernatants, confirming expression of PSMI16 in these cells and its integrity through the course of the experiment. In the IP samples, however, a single specific band was seen only in the Ago2-bound fraction from mouse NPCs, demonstrating that PSMI16 associates with Argonaute proteins in NPCs.
“Modular” exon structure and differential MRE
inclusion in PSMI16 alternative isoforms
Annotations available at the UCSC Genome Browser indicate that PSMI16 is a ~5.7-kb transcript with 20 exons and several alternative isoforms arising from differential splicing, promoter use, or 3’ end choice. The structure of the primary isoform, drawn with exons roughly to scale in Figure 3-20, reveals that the predicted target sites for miR-16 are present in 13 of the exons within the first (5’) two-thirds of the transcript. Interestingly, many of the exons are near-identical copies of one another, as demonstrated by a multiple sequence alignment of the miR-16 binding sites with ~18 flanking bases (Figure 3-20, bottom). Remarkably, in addition to the full-length transcript, a novel short isoform that excluded as many as 12 predicted binding sites was amplified in BEND3 cells (Figure 3-20). In addition, an annotated variant (AK030946) skips exons 4-13, retaining only 4 miR-16 MREs (Figure 3-20). These data suggest an intriguing mechanism by which the repetitive MRE-containing exons could serve as “modular” units, allowing fine control over MRE frequency and, consequently, the level of miRNA repression.
Discussion
During the course of these experiments, five highly publicized articles were published in quick succession demonstrating various biological systems in which this miRNA “sponging” mechanism plays an important role (Cesana et al, 2011; Karreth et al, 2011; Poliseno et al, 2010; Sumazin et al, 2011; Tay et al, 2011). These RNAs were termed competing endogenous RNAs (ceRNAs). The first study demonstrated that a PTEN pseudogene had no coding potential but retained many of the same miRNA binding sites as its protein-coding counterpart (Poliseno et al, 2010). Compared with the example proposed in this chapter, the pseudogene ceRNA would likely have a more precise impact on PTEN levels, because it competes for multiple miRNAs, all of which bind the PTEN transcript. Interestingly, the same group later reported that the protein-coding PTEN transcript itself functions as a ceRNA in a coding-independent manner, adding to the complexity of this system (Tay et al, 2011). Finally, an mRNA-like non-coding RNA similar to PSMI16 was shown to function as a ceRNA during muscle differentiation (Cesana et al, 2011): linc-MD1 “sponges” miR-133 and miR-135, which themselves regulate transcription factors that activate muscle-specific gene expression. Together, these studies add considerable complexity to an already complex system of post-transcriptional gene regulation.
PSMI16 may still prove to be an interesting case study, since regulation of its alternative isoforms adds an intriguing layer of complexity to the ceRNA story. However, the linchpin of the PSMI16 story was finding a measurable cellular response to miR-16 expression, which ultimately remained elusive. I proposed measuring levels of previously established miR-16 target genes after manipulating the levels of PSMI16 or altering the accessibility of its MRE sites. However, I was unable to validate any previously described miR-16 targets in BEND3 cells: overexpression of miR-16 using Pre-miR™ mimics and inhibition with Anti-miR™ inhibitors had no effect on the levels of the five genes tested (not shown). Artificial reporters for miR-16 showed that the Pre-miR™ and Anti-miR™ treatments were working properly, suggesting that the target genes tested are not responsive to miR-16 in these cells. Therefore, any response observed from manipulating PSMI16 levels would likely be non-specific in this setting.
Figure 3-14. Proposed mechanism for microRNA competitive inhibition by endogenous long non-coding RNA “sponges”.
(Bottom right) In a typical setting, a pri-miRNA is transcribed in the nucleus and processed sequentially by the RNase III enzymes Drosha and Dicer, and the mature guide miRNA is loaded into an Ago protein (Ago2 is depicted). (Top right) The miRNA guides the Ago-containing RISC machinery to complementary binding sites in the 3’UTR of a protein-coding mRNA, leading to reduced protein output through transcript destabilization or translational inhibition. (Top left) A proposed long non-coding RNA (lncRNA) “sponge” is transcribed in the nucleus, then possibly capped, spliced and polyadenylated before being exported to the cytoplasm, where the miRNP complexes are located. An lncRNA with numerous miRNA binding sites is proposed as a means to compete effectively for miRNA binding. (Bottom center) With miRNP complexes sequestered on the lncRNA, translation of target mRNAs resumes.
Figure 3-15. Distribution of MRE frequency in predicted miRNA/lncRNA pairs.
Modified histograms summarizing the frequency of binding sites predicted for all possible miRNA (492) x lncRNA interactions in the “Brain” (1665) or “ESC” (1333) datasets. Control lncRNA datasets (dotted lines) were generated by a randomized dinucleotide shuffling of each sequence.
Table 3-1. Putative lncRNA “sponges” and MRE frequency for conserved miRNAs
¹ “Broadly-Conserved” or “Conserved” based on TargetScan miRNA family annotations
² Coordinates are from the mm10 assembly (GRCm38.p2)
Figure 3-16. PSMI16 (NR_015505) In situ hybridization reveals strong regional expression in adult mouse brain.
In situ data were obtained from the Allen Brain Atlas for the adult mouse brain (Ng et al, 2009), and the images below were modified by Ryan Spengler. PSMI16 is listed under its RIKEN identifier, 6720401G13Rik. Sagittal and coronal section data series are available. Expression level filters were applied, and images are shown for (Top left) a coronal section (position 222) and (Top right) a sagittal section (position 36). The highest expression levels (Bottom; orange structures) were seen in the hippocampal formation (HPF; green structures), specifically in the granule cell layer of the dentate gyrus and the pyramidal layer of fields CA1, CA2 and CA3. (Bottom right) High expression was also seen in the granule cell layer of the cerebellar cortex (CBX; yellow structures).
Figure 3-17. Strong regional expression of PSMI16 is seen in the developing mouse (14.5 DPC) by in situ hybridization.
In situ expression strength values (left) and images (right) are taken from the Eurexpress website (Diez-Roux et al, 2011). Expression strength is provided by the database and represents a numeric depiction of subjective assessments of signal intensity in the regions falling under the anatomical system categories shown on the graph. Moderate to high expression is seen in the nervous system, as in the adult mouse (Figure 3-3). (Right) In situ hybridization of PSMI16 is shown in a sagittal section of the mouse embryo. Select areas annotated as being of “High” expression are indicated on the image.
Figure 3-18. PSMI16 expression by RT-PCR in (A) adult mouse tissues and (B) cell lines.
(A) RT-PCR was performed on several tissues from the adult mouse using random primers for the RT step and specific primers to detect PSMI16. The ~100-bp product was detectable in all tissues tested, with particularly high levels seen in the thymus, lung, pineal gland, ovary and kidney. (B) NPC, BEND3 and N2A cells were tested for expression of PSMI16. Oligo-dT primers were used for the RT step to test whether PSMI16 is likely polyadenylated. Expression was highest in NPC and BEND3 cells, as detected by the primer set used in A (lower bands, bottom right). A nearly full-length product was also detected in NPC and BEND3 cells (top bands). Both bands suggest that PSMI16 is polyadenylated.
Figure 3-19. PSMI16 associates with Ago proteins in mouse neural progenitor cells.
An Ago2 antibody was used to immunoprecipitate Ago2 and bound RNAs from mouse NPCs and human HEK293 cells (negative control). RNA was purified and reverse transcribed from bound (IP) and unbound (Supernatant) samples. PCR was performed on the cDNAs (30 cycles) using a primer set for PSMI16 or a β-actin control. A specific band was seen in the Ago2 IP fraction of the NPCs but not the HEK293 cells, showing that endogenous PSMI16 associates with Ago2 in these cells.
Figure 3-20. Differential MRE incorporation in alternative PSMI16 isoforms.
(Top) 23 miR-16 MREs are predicted in the PSMI16 reference sequence (NR_015505), spread across 13/20 exons. Alternative inclusion of MRE-containing exons is apparent in annotated isoforms, including AK030946 (shown above) which incorporates only 4 sites. The novel short isoform cloned incorporates 12. (Bottom) Multiple sequence alignment of MRE-containing exons reveals high sequence similarity.
CHAPTER 4
SISPOTR: A TOOL FOR DESIGNING HIGHLY SPECIFIC AND
POTENT SIRNAS FOR HUMAN AND MOUSE
Abstract
RNA interference (RNAi) is a powerful and widely used gene silencing tool for basic biological research and is being developed as a therapeutic avenue to suppress disease-causing genes. However, the specificity and safety of RNAi strategies remain under scrutiny because small interfering RNAs (siRNAs) induce off-target silencing. Currently, the tools available for designing siRNAs are biased towards efficacy rather than specificity. Prior work from our laboratory and others supports the potential to design highly specific siRNAs by limiting the promiscuity of their seed sequences (positions 2-8 of the small RNA), the primary determinant of off-targeting. Here, a bioinformatic approach to predict off-targeting potential was established using publicly available siRNA data from more than 50 microarray experiments. With this, we developed a specificity-focused siRNA design algorithm and accompanying online tool which, upon validation, identifies candidate sequences with minimal off-targeting potential and potent silencing capacity. This tool offers researchers unique functionality and output compared to currently available siRNA design programs. Furthermore, this approach can greatly improve genome-wide RNAi libraries and, most notably, provides the only broadly applicable means to limit off-targeting from RNAi expression vectors.
Introduction
RNAi is mediated by small RNAs (~21 nucleotides) which are loaded into the RNA-Induced Silencing Complex (RISC), generating a functional complex capable of base-pairing with and repressing target transcripts (Provost et al, 2002). Scientists have devised strategies to co-opt the cellular RNAi machinery to silence virtually any gene of interest using siRNAs, which may be chemically synthesized or expressed in the context of stem-loop RNAs [e.g. short hairpin RNAs (shRNAs)]. RNAi tools are vital for functional genomics studies, which enrich our understanding of basic biological processes. In addition, RNAi-based therapeutics hold exciting potential to treat numerous human ailments by suppressing disease-associated genes (Davidson & McCray, 2011). However, the utility of RNAi is appreciably limited by our ability to design siRNAs that are both potent and specific. Considerable evidence indicates that siRNAs bind to and regulate unintended mRNAs, an effect known as off-target silencing (Chi et al, 2003; Jackson et al, 2003; Semizarov et al, 2003). Although most siRNA design algorithms use BLAST to identify off-target transcripts with near-perfect complementarity, off-targeting primarily occurs when the seed region (nucleotides 2-8 of the small RNA) pairs with sequences within the 3’UTRs of unintended mRNAs, thereby inducing translational repression and transcript destabilization, similar to canonical microRNA-based silencing (Guo et al, 2010; Jackson et al, 2006; Lewis et al, 2003). Notably, short stretches of complementarity, as little as 6 bp, may be sufficient to initiate off-target silencing (Birmingham et al, 2006) (Figure 4-1A).
Numerous reports indicate that seed-based off-targeting generates false positives in RNAi screens and dictates the toxicity potential of siRNAs (Anderson et al, 2008; Fedorov et al, 2006; Ma et al, 2006; Schultz et al, 2011). Anderson et al. reported that the extent of siRNA off-targeting correlates with the frequency of seed complements (hexamers) present in the 3’UTRome (Figure 4-1B) (Anderson et al, 2008). Upon evaluating subsets of siRNAs with differing off-targeting potential (low, medium and high, based on 3’UTR hexamer distributions), they found that the low subset had significantly diminished microarray off-target signatures and fewer adverse effects on cell viability than the other subsets. These findings established seed complement hexamer frequency as a key criterion for designing highly specific siRNAs, and some siRNA design algorithms have since incorporated seed-specificity guidelines (Birmingham et al, 2007; Jackson & Linsley, 2010; Naito et al, 2004). However, these algorithms remain strongly biased towards silencing efficacy, and because numerous potency-based filters are applied ahead of specificity guidelines, few candidate siRNAs with low off-targeting potential seeds emerge. This is reflected in recent literature and genome-wide RNAi libraries, where only 10% of siRNAs fall into the low off-targeting range previously established by Anderson et al. (Boudreau et al, 2011; Moffat et al, 2006). While potency-based design is rational, only a fraction of the functional siRNAs for a given target transcript are predicted, and in many instances, highly functional siRNAs do not satisfy several design rules.
In recent work from our laboratory, we aimed to improve the safety profile of therapeutic RNAi by designing hairpin-based vectors containing siRNAs with low off-targeting potential (Boudreau et al, 2011). We implemented a design scheme that focuses on seed specificity yet promotes efficacy. This approach proved successful in identifying therapeutic sequences which effectively silence target gene expression, induce minimal off-targeting and are well tolerated in mouse and non-human primate brains (McBride et al, 2011). These promising results prompted us to extend the utility of this approach by developing a user-friendly tool to facilitate the selection of siRNAs with low off-targeting potential for broader application in therapeutic development and basic biological research. Here, we describe a specificity-biased design algorithm which employs an improved means to score off-targeting potential, and demonstrate its effectiveness and unique functionality in comparison to current publicly available tools.
Methods
Dataset and Sequence Retrieval
Pre-processed microarray datasets, annotations and sequences were obtained from previously published supplementary materials (Garcia et al, 2011). These represent a compilation of microarray data from seven earlier reports describing gene expression changes in siRNA- or miRNA-treated HeLa cells.
TargetScan 6.0 was used to determine the frequencies of seed complement binding sites (i.e. 6mer, 7A1, 7M8 and 8mer) for all 16,384 possible heptamers (corresponding to positions 2-8 of the small RNA) in each RefSeq 3’UTR sequence (Garcia et al, 2011).
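To make the four site types concrete, the sketch below classifies seed matches in a single 3’UTR using the canonical definitions: 6mer (pairing to positions 2-7), 7A1 (positions 2-7 plus an A opposite position 1), 7M8 (positions 2-8) and 8mer (positions 2-8 plus that A). It is a minimal illustration assuming DNA-letter input and one site per occurrence of the 6-nt core; it is not TargetScan's implementation, and the function and variable names are hypothetical.

```python
from itertools import product

COMP = {"A": "T", "C": "G", "G": "C", "T": "A"}

# All 4**7 = 16,384 possible seed heptamers (small-RNA positions 2-8).
HEPTAMERS = ["".join(p) for p in product("ACGT", repeat=7)]


def revcomp(s):
    """Reverse complement of a DNA string."""
    return "".join(COMP[b] for b in reversed(s))


def site_counts(seed7, utr):
    """Count 8mer, 7M8, 7A1 and 6mer sites for one heptamer in one 3'UTR.

    seed7 holds small-RNA positions 2-8 (DNA letters). In the UTR (5'->3'),
    a site reads [comp(p8)] [revcomp(p2..p7)] [A opposite p1]; every
    occurrence of the 6-nt core is classified by its flanking bases.
    """
    core = revcomp(seed7[:6])  # pairs with positions 2-7
    m8 = COMP[seed7[6]]        # UTR base expected opposite position 8
    counts = {"8mer": 0, "7M8": 0, "7A1": 0, "6mer": 0}
    for i in range(len(utr) - 5):
        if utr[i:i + 6] != core:
            continue
        has_m8 = i > 0 and utr[i - 1] == m8
        has_a1 = i + 6 < len(utr) and utr[i + 6] == "A"
        if has_m8 and has_a1:
            counts["8mer"] += 1
        elif has_m8:
            counts["7M8"] += 1
        elif has_a1:
            counts["7A1"] += 1
        else:
            counts["6mer"] += 1
    return counts
```

Summing these counts (or counting transcripts with at least one site of each type) over every 3’UTR, for each entry of HEPTAMERS, would give per-heptamer site frequencies of the kind used below.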
Human (GRCh37/hg19) and mouse (NCBI37/mm9) 3’UTR sequences, and
corresponding gene symbols and accession numbers were obtained from the UCSC Table
Browser (http://genome.ucsc.edu/) using RefSeq annotations (Fujita et al, 2011;
Karolchik et al, 2004; Kent et al, 2002; Lander et al, 2001 ; Pruitt et al, 2005).
Formulating POTS
Dataset selection
Expression data for endogenous microRNAs were excluded from the training and validation sets, because several publications have suggested avoiding these seed sequences in RNAi sequence design (Garcia et al, 2011; Wang et al, 2009). The GSE5814 dataset was also excluded, because 77 of its experiments tested siRNAs with the same seed sequence. Strand-biasing analyses were performed to determine whether the sense or antisense strand induced detectable off-targeting in each experiment. Pairwise t-tests compared genes with at least one 7mer-or-longer site (≥1 8mer, 7M8 or 7A1) for either the sense or antisense strand seed sequence to those with no predicted 3’UTR target site of any type, including 6mer sites. Experiments exhibiting highly significant repression mediated by the sense strand (one-tailed P ≤ 6E-5) and little to no evidence of antisense-mediated repression (P > 0.05) were removed from further analyses. Of the remaining studies, the Dharmacon2008 dataset qualitatively showed the most diversity in seed off-targeting potential, and it was set aside for downstream validation.
Establishing weighted probability of repression (PR) values
and POTS calculation
Following the dataset filtering described above, 53 microarray datasets from
three independent studies (Dharmacon2006, GSE5291 and GSE5769) were used as
training data to establish POTS. For each microarray dataset, transcripts with a single
predicted 3’UTR seed binding site for either the sense or antisense strand of the given
siRNA were considered. This was done to account for possible loading of the sense
strand which may also mediate off-targeting. Transcripts with multiple target sites (8mer,
7M8, 7A1 or 6mer) for either strand were ignored so that the silencing potential for single
sites for each site type could be determined. Background data for each microarray
consisted of the remaining transcripts with no predicted 3’UTR seed binding sites for
either siRNA strand. Transcripts containing seed binding sites were parsed into groups
based on seed site type, and cumulative distributions of gene expression values were
generated for each transcript set.
PR values were calculated as a measure of the increased probability of repression
imparted by the presence of the single seed binding sites, relative to background
expectations. Statistical analyses were first performed on the datasets collectively to
identify the log2 fold-change value corresponding to the most significant divergence of
repressive potentials across all site types. For this, the data were analyzed at discrete
intervals (0.05 log2 fold-change increments), comparing the mean differences in
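cumulative fractions (paired-samples t-test) for each site type set relative to the respective background values across all experiments. Fisher’s method was used to summarize p-values at each interval. The most significant interval (log2 fold change = -0.3; χ2 = 176.4; df = 8; P < 6E-34) was used to calculate the PR value for each site type t as

$$PR_t = F_t(-0.3) - F_{\mathrm{bg}}(-0.3)$$

where F_t(-0.3) and F_bg(-0.3) are the cumulative fractions of transcripts with a log2 fold change ≤ -0.3 among transcripts with a single site of type t and among background transcripts, respectively.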
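Fisher's method combines k independent p-values into X² = -2 Σ ln pᵢ, which under the null hypothesis follows a χ² distribution with 2k degrees of freedom; df = 8 therefore corresponds to four combined p-values, presumably one per site type. The stdlib-only sketch below (function names are hypothetical; `scipy.stats.combine_pvalues` with `method="fisher"` performs the same combination) uses the closed-form χ² upper tail for even df, which makes it easy to check that the reported statistic and P value agree.

```python
import math


def fisher_combine(pvalues):
    """Fisher's method: return (X2, df) with X2 = -2 * sum(ln p) and df = 2k."""
    x2 = -2.0 * sum(math.log(p) for p in pvalues)
    return x2, 2 * len(pvalues)


def chi2_sf_even_df(x2, df):
    """Upper-tail chi-squared probability for even df (closed form):
    P = exp(-x/2) * sum_{i=0}^{df/2 - 1} (x/2)**i / i!
    """
    if df % 2:
        raise ValueError("closed form requires an even number of degrees of freedom")
    half = x2 / 2.0
    term = total = 1.0
    for i in range(1, df // 2):
        term *= half / i  # (x/2)**i / i!, built incrementally
        total += term
    return math.exp(-half) * total
```

For example, `chi2_sf_even_df(176.4, 8)` evaluates to approximately 5.9E-34, in line with the reported P < 6E-34.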
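These PR values were multiplied by the seed binding site frequencies (N_t for site type t) in the 3’UTRome and summed to compute a weighted Potential Off-Targeting Score (POTS) using the following equation:

$$\mathrm{POTS} = \sum_{t \in \{\mathrm{8mer},\ \mathrm{7M8},\ \mathrm{7A1},\ \mathrm{6mer}\}} PR_t \, N_t$$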
To generate the final POTS used in the siSPOTR tool, PR values were calculated for both the validation and training datasets, and the median value for each site type served as the final PR value. In addition, 8mer, 7M8, 7A1 and 6mer site counts for all 16,384 heptamers were calculated from TargetScan 6.0 (Garcia et al, 2011) predictions based on human and mouse RefSeq-annotated 3’UTRs.
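The PR and POTS steps can be sketched as follows, assuming PR is the difference between the cumulative fraction of single-site transcripts and that of background transcripts at a log2 fold change of -0.3, and that the final PR for each site type is the median across experiments. Function names and the toy numbers in the usage note are hypothetical; the input format (lists of log2 fold changes per site type) is an assumption for illustration.

```python
from statistics import median

SITE_TYPES = ("8mer", "7M8", "7A1", "6mer")
THRESHOLD = -0.3  # log2 fold change at which PR values are read off


def cumulative_fraction(log2fc, threshold=THRESHOLD):
    """Fraction of transcripts with log2 fold change <= threshold."""
    return sum(x <= threshold for x in log2fc) / len(log2fc)


def pr_values(single_site_fc, background_fc, threshold=THRESHOLD):
    """PR per site type for one experiment: F_t(threshold) - F_bg(threshold).

    single_site_fc maps each site type to the log2 fold changes of transcripts
    carrying exactly one site of that type; background_fc holds transcripts
    with no predicted site for either strand.
    """
    f_bg = cumulative_fraction(background_fc, threshold)
    return {t: cumulative_fraction(single_site_fc[t], threshold) - f_bg
            for t in SITE_TYPES}


def median_pr(per_experiment):
    """Final PR per site type: the median across experiments."""
    return {t: median(pr[t] for pr in per_experiment) for t in SITE_TYPES}


def pots(site_freqs, pr):
    """POTS: sum over site types of PR_t times the site frequency N_t."""
    return sum(pr[t] * site_freqs[t] for t in SITE_TYPES)
```

As a usage sketch, `pr_values` would be called once per microarray experiment, `median_pr` applied to the resulting list, and `pots` evaluated for each heptamer with its 3’UTRome site counts (for instance, hypothetical counts of 50 8mer, 100 7M8, 100 7A1 and 300 6mer sites).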
Tissue-specific POTS analysis
Expression profiles from 177 human cell lines and tissues based on the
U133A/GNF1H gene atlas were obtained from the BioGPS FTP site (http://biogps.org)
(Su et al, 2004; Wu et al, 2009). For each dataset, genes with median expression values
of greater than 100 for their corresponding probe sets were considered to be expressed. A
tissue-specific POTS (tsPOTS) was calculated for each tissue, as described above, but
limiting the 3’UTRs to expressed genes when calculating site type frequencies. Spearman
correlations were performed to evaluate variability in the rank-order of seed sequences by
tsPOTS, as compared to POTS calculated based on all human 3’UTRs.
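The tsPOTS comparison reduces to two operations: restricting 3’UTRs to genes whose median expression exceeds 100, and comparing seed rank orders with Spearman's correlation. A minimal stdlib sketch follows; it computes Spearman's rho as the Pearson correlation of average ranks (tied values share their mean rank), omits p-values, and uses hypothetical function names. In practice, `scipy.stats.spearmanr` would be the usual library choice.

```python
def average_ranks(values):
    """1-based ranks; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=values.__getitem__)
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return ranks


def spearman(x, y):
    """Spearman's rho computed as the Pearson correlation of average ranks."""
    rx, ry = average_ranks(x), average_ranks(y)
    n = len(rx)
    mx, my = sum(rx) / n, sum(ry) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    sx = sum((a - mx) ** 2 for a in rx) ** 0.5
    sy = sum((b - my) ** 2 for b in ry) ** 0.5
    return cov / (sx * sy)


def expressed_genes(median_expression, cutoff=100):
    """Genes whose median probe-set expression is greater than the cutoff."""
    return {g for g, v in median_expression.items() if v > cutoff}
```

Here, the tissue-specific POTS would be recomputed from site counts over the 3’UTRs of `expressed_genes(...)` only, and `spearman` applied to the per-heptamer tsPOTS and genome-wide POTS values. The same function could also compare POTS with observed off-target counts.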
Validating siSPOTR
Efficacy
The 2431 siRNAs in the Huesken Dataset were stepwise filtered according to the
siSPOTR design scheme (i.e. strand-biasing, GC-content and POTS rank). For a
comparison of efficacy, we used siDesign Center (Dharmacon), a highly utilized siRNA
design tool which focuses primarily on potency. Target gene coding sequences were
obtained using the Genbank Accessions provided in the Huesken siRNA Dataset and
76
were used as input sequences into the siDesign Center tool for siRNA design using
default settings. The top ten hits by siDesign Center were considered the top candidates
and were intersected with the Huesken siRNA dataset. Gene silencing efficacies for
overlapping siRNAs were recorded and plotted.
Ranking off-targeting potential
To evaluate the ability of the PR values to estimate the relative extent of off-targeting, POTS values were calculated for the validation set, using the median value for each site type determined from the training set. Target site frequencies were calculated as described above, using human RefSeq 3’UTR sequences for transcripts present on the array. POTS values were determined as the sum-product of the 8mer, 7M8, 7A1 and 6mer site frequencies and their respective PR values.
Cumulative distribution plots for gene expression values were generated by parsing the transcripts by site type, without restricting the analysis to transcripts with single sites. The number of down-regulated transcripts over background was calculated as described above, by subtracting the background fraction at the same point. Seeds were ranked according to these values and compared to the rank order of their estimated POTS values using Spearman rank correlation. Visual inspection of the correlation plot showed seven qualitatively distinct outliers in the right tail of the POTS distribution (red dots, Figure 4-6D). Spearman’s rank correlation coefficients and p-values were calculated with and without these samples included.
Suppression signatures
Microarray data for the validation datasets were processed on a per-target-gene basis (i.e. GAPDH, PPIB, and No Target groups) to distinguish off-targeting from gene expression changes resulting from on-target silencing. The microarray data for each group were evaluated to identify genes that were down-regulated by more than three standard deviations from the mean, across the datasets, for a given gene. These gene lists and accompanying gene expression values were imported into Partek Genomics Suite (Partek GS, Saint Louis, MO) and used to perform hierarchical clustering by row (columns were ordered by increasing POTS), allowing visualization of the suppression signatures as heatmaps. Heatmaps were partitioned to separate low-POTS and high-POTS siRNAs for each group. A qualitative measure of suppression signature size was defined as the area of the broadest dark blue regions for each lane and plotted on a common x-axis.
siRNA Design Tool Comparison
We obtained RefSeq coding sequences for the sixteen therapeutically-relevant
gene targets (Table 1). These sequences were used as input at each of the indicated
siRNA tool websites [siDesign Center (Dharmacon,
http://www.dharmacon.com/designcenter/DesignCenterPage.aspx), siRNA Target Finder
(Genscript, https://www.genscript.com/ssl-bin/app/rnai), DSIR (Commissariat à l'Energie
Atomique; France, http://biodev.cea.fr/DSIR/DSIR.html), and Applied Biosystems SVM
dinucleotide motifs which are relatively rare within mammalian genomes. The POTS=50
value is highlighted, representing an estimated but relevant cut-off which is employed
henceforth for demonstrative purposes throughout this manuscript. This value is
noteworthy since all 14 of the previously validated low off-targeting potential siRNAs
tested by Anderson et al. have POTS<50 (Anderson et al, 2008). Furthermore, our evaluation of 750 siRNAs and accompanying in vitro cytotoxicity data supports POTS<50 as a conservative cut-off associated with an improved likelihood of tolerability (data not shown) (Fedorov et al, 2006). The siSPOTR specificity feature serves primarily to rank the off-targeting potential of siRNAs; as with the efficacy scores provided by potency-based siRNA design algorithms, no firm cut-off for POTS values exists.
The importance of weighting seed site types is particularly evident when seeds sharing the same core hexamer vary greatly in the number of genes containing the more potent 7- and 8-mer sites. For example, the seeds CGCGATa and CGCGATc each have 302 potential off-target transcripts (based on 3’UTR hexamer counts) but respectively have 40 and 201 transcript 3’UTRs with 7- or 8-mer sites. This 5-fold difference creates a considerable disparity in the off-targeting potentials of these seeds, resulting in a two-fold difference in their POTS values (Table 4-2, Table 4-3). This illustrates the importance of considering position 8, which dictates the sequence of the most potent seed site types (i.e. 7M8 and 8mer). We calculated the mean site type frequencies for all possible heptamers binned by POTS values, revealing a roughly 5- to 10-fold reduction in the more potent site types for low-POTS heptamers relative to those with medium-to-high POTS (e.g. for 8mers, mean values of ~45 compared to >350, respectively).
Finally, as a means to further refine our prediction of off-targeting potential, we considered the degree to which POTS is influenced by variation in gene expression across tissues. For this, transcriptional profiling data from 177 different human cell lines and tissues (BioGPS) were used to calculate tissue-specific POTS for all possible heptamers. Although gene expression patterns vary greatly across tissues, the tissue-specific POTS ranks for each heptamer correlate strongly with POTS ranks calculated from all human 3’UTRs (r2 > 0.95; Figure 4-4). These data indicate that organism-wide application of POTS is suitable.
siSPOTR design example
We provide a stepwise example illustrating the use of siSPOTR for designing siRNAs targeting the human PPIB coding sequence (CDS; Figure 4-5). The 648-nt target sequence is first divided to produce all 631 possible 21-mer siRNA target sites, and the strand-biasing and GC-content filters described above are applied before POTS values are determined for the resulting siRNAs. In this example, among the 113 PPIB-targeted siRNAs that satisfy the strand-biasing and GC-content criteria, seven are represented in the siRNA validation datasets described below, allowing visualization of the measured off-targeting associated with their respective POTS values of 25, 29, 40, 407, 410, 510 and 560 (Figure 4-6).
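The window-enumeration and ranking steps of this design example can be sketched as follows. This is a simplified illustration under stated assumptions: the guide (antisense) strand of each candidate is taken as the reverse complement of its 21-nt target window, its seed is guide nucleotides 2-8, the strand-biasing and GC-content filters are omitted because their criteria are not reproduced in this section, and `pots_table` is a hypothetical lookup from heptamer to POTS (for example, one built from the site counts and PR values described in the Methods).

```python
COMP = {"A": "T", "C": "G", "G": "C", "T": "A"}


def revcomp(s):
    """Reverse complement of a DNA string."""
    return "".join(COMP[b] for b in reversed(s))


def candidate_sites(cds, k=21):
    """Yield (start, target_site, guide_seed) for every k-nt window of a CDS.

    The guide strand is assumed to be the reverse complement of the window,
    and its seed is guide nucleotides 2-8.
    """
    for start in range(len(cds) - k + 1):
        site = cds[start:start + k]
        guide = revcomp(site)
        yield start, site, guide[1:8]


def rank_by_pots(cds, pots_table, k=21):
    """Rank candidate sites by the POTS of their guide seed (low to high)."""
    scored = [(pots_table[seed], start, site)
              for start, site, seed in candidate_sites(cds, k)]
    return sorted(scored)
```

In a fuller pipeline, the strand-biasing and GC-content filters would be applied to the output of `candidate_sites` before ranking, and low-POTS candidates (for instance, below the demonstrative POTS=50 cut-off) would be retained.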
Validation of siSPOTR algorithm:
efficacy and specificity
Efficacy
We gauged the capacity of siSPOTR to identify potent siRNA sequences among
the siRNAs in the Huesken dataset (Figure 4-6A). The siRNAs satisfying the strand-
biasing and GC-content criteria were rank ordered by POTS (low to high), yielding seven
siRNAs with POTS<50. Here, this relatively low number results from fewer sequences
passing the strand-biasing filter, since the capacity for introducing duplex instability
using G-U base-pairs, as described above, is not applicable to these pre-existing siRNAs.
Surprisingly, these seven siRNAs each had >80% silencing efficacy, with a mean
comparable to that of siRNAs within the database that were identified among the top hits
generated by siDesign Center (Dharmacon), a widely-used siRNA design website.
Although siDesign Center yields more hits within this database, only two of these siRNAs have POTS<50. Indeed, siSPOTR identified five siRNAs not among the
siDesign Center hits (Figure 4-6A, Venn diagram), highlighting the unique output
potential of the siSPOTR algorithm.
Off-targeting potential
We next evaluated the predictive power of POTS to estimate the extent of off-
target gene silencing observed among microarray experiments for 40 unique siRNAs
targeting GAPDH, PPIB, or “No Target”. These 40 experiments were selected because
the siRNAs encompass a broad range of POTS with relatively equal representation across
low, medium and high scores. To improve our ability to discern sequence-specific off-
targeting from gene expression changes associated with on-target silencing, the datasets
were grouped by target gene prior to calculating differential gene expression and
establishing “suppression signatures” for each siRNA. Furthermore, each of these 40 siRNAs exhibits greater than 85% silencing efficacy, reducing the potential for detecting gene expression changes due to varying degrees of on-target silencing within groups. In support of the POTS approach, our analyses of these datasets reveal smaller sequence-specific “suppression signatures” among the low off-targeting potential siRNAs
(POTS<50), relative to siRNAs with higher POTS (Figure 4-6B). Notably, 13 of 28
higher POTS siRNAs produced greater “suppression signatures” than the largest one
observed among the low POTS siRNAs (Figure 4-6C). It is important to note that our
analyses (data not shown) and previously published data support that these “suppression
signatures” consist of down-regulated transcripts that are enriched for 3’UTR seed
binding motifs, suggesting that most are likely to be direct siRNA off-targets (Burchard
et al, 2009; Jackson et al, 2006).
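The off-target counts used in these analyses follow the definition given in the Figure 4-6D legend: transcripts with ≤ -0.3 Log2 fold-change whose 3'UTRs contain a 7- or 8-mer seed binding site. A minimal sketch of that count follows, assuming `log2fc` and `utrs` are hypothetical dictionaries keyed by transcript ID and that the heptamer seed spans positions 2-8 of the antisense strand.

```python
COMPLEMENT = str.maketrans("ACGU", "UGCA")


def revcomp(rna: str) -> str:
    """Reverse complement of an RNA string."""
    return rna.translate(COMPLEMENT)[::-1]


def has_7_or_8mer_site(seed7: str, utr: str) -> bool:
    """True if `utr` holds a 7M8/8mer site (pairs with 2-8) or a 7A1 site (pairs with 2-7, plus A)."""
    utr = utr.upper().replace("T", "U")
    m8_site = revcomp(seed7)            # every 8mer site contains this 7M8 site
    a1_site = revcomp(seed7[:6]) + "A"  # A opposite guide position 1
    return m8_site in utr or a1_site in utr


def down_regulated_offtargets(seed7: str, log2fc: dict[str, float],
                              utrs: dict[str, str], cutoff: float = -0.3) -> set[str]:
    """Transcripts with log2 fold change <= cutoff and a 7- or 8-mer seed site in the 3'UTR."""
    return {gene for gene, fc in log2fc.items()
            if fc <= cutoff and gene in utrs and has_7_or_8mer_site(seed7, utrs[gene])}
```

Here `log2fc` would hold one siRNA's microarray profile; repeating the call for each siRNA yields per-siRNA off-target counts of the kind that are ranked against POTS below.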
The prospect of using POTS to accurately rank off-targeting potentials among
these 40 siRNAs was also assessed. Spearman rank correlation of the POTS scores and
numbers of down-regulated off-targets observed for each siRNA indicated a positive
correlation of modest significance (Figure 4-6D, dotted line, P = 0.05). As depicted by
this plot, a few higher POTS siRNAs have low numbers of off-targets (red dots);
however, none of the low POTS siRNAs showed high numbers of off-targets. Indeed,
removing the overt outliers among the higher POTS siRNAs produces a highly
significant correlation (solid line, P < 1E-8), providing further evidence that POTS is a
reliable predictor of siRNA off-targeting potentials. These data, in conjunction with the
efficacy validation, establish the robust capability of siSPOTR to identify highly specific
and effective siRNAs.
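Spearman rank correlation is the Pearson correlation computed on ranks, so it depends only on the ordering of POTS values and off-target counts, not on their scale. The sketch below uses only the Python standard library and assigns tied values their average rank; it returns the coefficient only. For P values such as those reported above, `scipy.stats.spearmanr` returns both the coefficient and a P value.

```python
from math import sqrt


def average_ranks(values):
    """1-based ranks; tied values share the average of the ranks they span."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def spearman_rho(x, y):
    """Pearson correlation of the average ranks of x and y."""
    rx, ry = average_ranks(x), average_ranks(y)
    mx, my = sum(rx) / len(rx), sum(ry) / len(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    sx = sqrt(sum((a - mx) ** 2 for a in rx))
    sy = sqrt(sum((b - my) ** 2 for b in ry))
    if sx == 0 or sy == 0:
        raise ValueError("Spearman correlation is undefined when all values are tied")
    return cov / (sx * sy)
```

A call such as `spearman_rho(pots_scores, offtarget_counts)`, with both lists hypothetical and ordered by siRNA, gives the coefficient; repeating it after dropping flagged outliers mirrors the comparison between the dotted and solid regression lines.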
Finally, we reasoned that training on more datasets (i.e. combining the training
and validation sets described above) could generate a more accurate POTS for ranking
siRNA off-targeting potentials. As expected, the Spearman rank correlation of POTS
scores and numbers of down-regulated off-targets observed for each siRNA showed even
greater significance (Figure 4-7). These improved POTS values are used henceforth.
Comparison of siSPOTR to other algorithms
We subsequently compared the abilities of our design strategy and other
publicly available algorithms, particularly those that incorporate seed specificity
parameters, to identify siRNAs with low off-targeting potential seeds (i.e. low POTS).
The coding sequences of 16 therapeutically-relevant genes (of varying sizes; comprising
in total ~50 kb) were used as input, and the number of candidate siRNAs with POTS<50
was determined for each algorithm. Our design scheme identified more low off-targeting
potential siRNAs than the other algorithms, yielding at least four siRNAs with POTS<50 (a
typical starting number for initial efficacy screening) for all 16 input genes, whereas each
of the other algorithms failed to do so for at least 8 of the 16 genes (Table 4-1).
This observation emphasizes a considerable limitation of current siRNA design tools that
are strongly biased towards potency, highlighting the unique functionality that siSPOTR
provides to researchers seeking siRNAs with low off-targeting potentials.
Prospective applications to expressed RNAi and genome-wide RNAi libraries
The siSPOTR algorithm provides an attractive approach for limiting off-targeting
from hairpin-based RNAi expression systems, which, unlike siRNAs, are not amenable to
chemical modifications that may reduce seed-based off-targeting (Bramsen et al, 2010;
Jackson et al, 2006; Vaish et al, 2011). Recently, we published microarray data supporting that
RNAi vectors expressing siRNAs with low off-targeting potentials (based on 3’UTR
hexamer frequencies) show reduced off-targeting relative to sequences with more
promiscuous seeds (Boudreau et al, 2011). To ascertain whether POTS can be a reliable
indicator of off-targeting from expressed RNAi, we evaluated the association of POTS
with off-targeting for the expressed RNAi sequences tested in this previous study (eight
constructs with POTS ranging from 11 to 653). Hierarchical clustering of differentially
expressed genes (N=827, P<0.0001) among the various RNAi sequences reveals that the
clustering distance relative to the control (i.e. promoter-only vector) increases in
agreement with rising POTS values (Figure 4-8), indicating that low POTS RNAi
sequences induce fewer gene expression changes as compared to sequences with higher
POTS values. These data substantiate the utility of siSPOTR for improving the specificity
of RNAi expression vectors.
Next, we investigated the feasibility of generating a genome-wide shRNA library
using this algorithm. Genome-wide RNAi screens are broadly used to discover genes
implicated in biological pathways and phenotypes; however, these screens can be plagued
by off-target effects producing false leads (Ma et al, 2006; Schultz et al, 2011). Although
bioinformatic approaches show some practicality for distinguishing off-targets from bona
fide targets (Sigoillot et al, 2012; Zhang et al), careful attention to sequence selection
may greatly reduce off-targeting among libraries. There are currently several RNAi
libraries available in synthetic siRNA or expressed forms (e.g. shRNAs). Here, we
demonstrate the potential of our siRNA design scheme to generate genome-wide RNAi
libraries with high specificity (based on POTS and BLAST, see methods). Our
prospective shRNA library (“Low POTS”) consists of 235,121 sequences (up to 10
shRNAs per target gene; POTSmedian=37) and provides at least 4 shRNAs with POTS<50
for more than 78% of all RefSeq mRNAs (Figure 4-9). These sequences have nearly 10-fold
lower off-targeting potential than those offered in a publicly available shRNA library
[178,265 sequences; POTSmedian=322; The RNAi Consortium (TRC)], which covers only
0.70% of RefSeq mRNAs with at least 4 shRNAs having POTS<50. A histogram of the
POTS distributions for each of these libraries reveals a clear disparity, with >90% of the
sequences having improved POTS relative to the TRC library, which follows a near-random
distribution mirroring POTS for all possible heptamers.
For genome-wide siRNA design, the “low POTS” library coverage is even broader (data
not shown), providing an additional means to enhance specificity in combination with
chemical modifications to the seed (Bramsen et al, 2010; Jackson et al, 2006; Vaish et
al, 2011).
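The two library-level summaries used above, the median POTS and the fraction of genes covered by at least four shRNAs with POTS<50, can be computed as in the following sketch. The inputs are hypothetical: `library` is a list of (gene ID, POTS) pairs, one per shRNA, and `genes` is the reference gene list (e.g. RefSeq mRNA IDs).

```python
from collections import Counter
from statistics import median


def library_summary(library, genes, cutoff=50.0, min_per_gene=4):
    """Return (median POTS, fraction of `genes` with >= min_per_gene sequences below cutoff)."""
    library = list(library)  # allow any iterable; it is traversed twice below
    low_per_gene = Counter(gene for gene, pots in library if pots < cutoff)
    covered = sum(1 for gene in genes if low_per_gene[gene] >= min_per_gene)
    return median(p for _, p in library), covered / len(genes)
```

Applying the same function to two libraries against the same `genes` list gives directly comparable coverage values, which is the comparison summarized in Figure 4-9.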
SiSPOTR Online Tool
Based on these observations, we developed an online tool employing the
siSPOTR algorithm to assist users with designing RNAi sequences with low off-targeting
potential for application in human and mouse (https://sispotr.icts.uiowa.edu). The
siSPOTR tool searches user-defined target sequences for siRNAs that pass strand-biasing
and GC% filters and outputs candidate siRNAs rank-ordered by POTS from lowest to
highest. For convenience, the sequences are ready-to-order with the necessary nucleotide
substitutions made to the sense strand to promote proper strand-loading. In addition,
DNA oligonucleotide sequences for generating corresponding shRNAs are supplied to
assist users with generating RNAi expression vectors. The output also provides detailed
off-targeting information for each siRNA including i) the number of 3’UTRs containing
each seed site type, ii) the putative off-target transcripts, and iii) counts of each seed site
type on a per transcript basis. The siSPOTR tool also alerts the user if the siRNA seed
sequence matches that of a known miRNA, as such an instance may confound
experimental results given the regulatory roles miRNAs play in numerous biological
processes and pathways. Furthermore, recognizing the ease of purchasing pre-validated
siRNAs and shRNAs, we provide an accompanying online tool which allows users to
input siRNA sequences to obtain POTS values and the detailed off-targeting information
described above. These tools will provide researchers with dependable means to
minimize and evaluate off-targeting concerns associated with RNAi experiments.
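The per-transcript site counts in this output rest on classifying each seed match by site type. A minimal sketch follows, assuming the site definitions used in this chapter: the 6mer core pairs with guide positions 2-7, a 7M8 site adds a match to position 8, a 7A1 site adds an A opposite position 1, and an 8mer has both. The `pots_sketch` function assumes POTS is a PR-weighted sum of the number of 3'UTRs containing each site type; that form and the `HYPOTHETICAL_PR` weights are placeholders, not the published PR values shown in Figure 4-3C.

```python
from collections import Counter

COMP = {"A": "U", "C": "G", "G": "C", "U": "A"}

# Placeholder weights only; the published PR values appear in Figure 4-3C.
HYPOTHETICAL_PR = {"8mer": 0.5, "7M8": 0.3, "7A1": 0.2, "6mer": 0.1}


def revcomp(rna: str) -> str:
    """Reverse complement of an RNA string."""
    return "".join(COMP[base] for base in reversed(rna))


def site_types(seed7: str, utr: str) -> Counter:
    """Count 8mer, 7M8, 7A1 and 6mer sites in one 3'UTR for a guide seed (positions 2-8)."""
    utr = utr.upper().replace("T", "U")
    core = revcomp(seed7[:6])  # target bases pairing with guide positions 2-7
    m8 = COMP[seed7[6]]        # target base pairing with guide position 8
    counts = Counter()
    start = utr.find(core)
    while start != -1:
        end = start + len(core)
        has_m8 = start > 0 and utr[start - 1] == m8
        has_a1 = end < len(utr) and utr[end] == "A"
        if has_m8 and has_a1:
            counts["8mer"] += 1
        elif has_m8:
            counts["7M8"] += 1
        elif has_a1:
            counts["7A1"] += 1
        else:
            counts["6mer"] += 1
        start = utr.find(core, start + 1)
    return counts


def utrs_per_site_type(seed7: str, utrs) -> Counter:
    """Number of 3'UTRs containing at least one site of each type."""
    totals = Counter()
    for utr in utrs:
        totals.update(site_types(seed7, utr).keys())  # each UTR counted once per type
    return totals


def pots_sketch(seed7: str, utrs, pr=HYPOTHETICAL_PR) -> float:
    """Assumed form: PR-weighted sum of the number of 3'UTRs containing each site type."""
    return sum(pr[site] * n for site, n in utrs_per_site_type(seed7, utrs).items())
```

Each site is assigned to exactly one type (an 8mer is not also counted as a 7M8, 7A1 or 6mer), which keeps the per-type counts additive.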
Discussion
Consideration of Seed Pairing Stability
A recent report from the Bartel laboratory evaluated the impact of seed-pairing
stability (SPS) and target abundance (TA; levels of potential binding sites in the cellular
transcriptome) on seed-mediated silencing by small RNAs (miRNAs and siRNAs)
(Garcia et al, 2011). Their data support that seeds with weak SPS inherently have higher
TA, and that both factors limit seed-based silencing potency, presumably from weaker
binding and a dilution effect associated with the increased number of targets. In contrast
to the siSPOTR approach, the authors propose that designing siRNAs with weak SPS and
high TA seeds may minimize off-targeting potential. While the potency of such seeds
may be low on average, the possibility of repressing considerably more off-targets exists.
A comparison of the low POTS approach to the weak SPS strategy may be warranted.
When accounting for repressive potentials in addition to the numbers of predicted off-
targets, it is likely that siRNAs having weak SPS would consistently have higher numbers
of off-targets expected to be down-regulated, relative to low POTS siRNAs. Nevertheless,
considering SPS in siRNA design is warranted, and we have added SPS values to
the siSPOTR output, so that users may avoid higher SPS seeds among siRNAs with
comparable POTS values.
The Utility of siSPOTR
Off-target effects (e.g. false discovery rates and toxicity) pose a problem for gene
silencing technologies, particularly for RNAi therapeutics, thus supporting the need for
developing a user-friendly tool to assist researchers in designing siRNAs which are
highly specific and efficacious. Here, and in prior work from our laboratory and others’,
we demonstrate that focusing on seed specificity in siRNA design may mitigate off-
targeting by 5- to 10-fold, as supported by predictive analyses and transcriptional
profiling data from RNAi studies (Anderson et al, 2008; Boudreau et al, 2011). Unlike
other siRNA design strategies, siSPOTR yields numerous candidate sequences with low
off-targeting potentials, providing a broad and attractive approach towards alleviating
off-target concerns. Other means to address off-targeting have been previously described.
For example, in basic biological research, scientists may employ “same seed” controls
(i.e. containing the same seed sequence as the experimental siRNA, but central
mismatches to prevent silencing of the target of interest) to discern on-target versus off-target
effects (Boudreau et al, 2011). Furthermore, research supports that off-targeting
from synthetic siRNAs can be reduced by chemical modifications or using lower doses
(Bramsen et al, 2010; Caffrey et al, 2011; Jackson et al, 2006; Vaish et al, 2011; Wang
et al, 2009); however, specificity could be enhanced further by employing seeds with low
POTS. By contrast, for expressed RNAi forms (e.g. shRNAs), our approach provides the
only broadly applicable methodology to limit off-targeting potential. Although sequence-
specific effects on hairpin expression, stability, and processing may also contribute to off-
targeting potential, our data support that POTS provides a good predictor of off-targeting
for RNAi expression vectors. This is important particularly since dosing from RNAi
expression vectors cannot be as readily controlled, and shRNA-induced toxicities have
been reported by several groups (Boudreau et al, 2009a; Grimm et al, 2006; Martin et al,
2011; McBride et al, 2008). Given the extensive use of RNAi expression systems in the
laboratory and in therapeutic development, siSPOTR will serve as a valuable tool to the
research community.
SiSPOTR can easily be used in conjunction with other siRNA design algorithms
(e.g. those weighted towards efficacy) to query their outputs for off-targeting potentials
and information. For instance, one can use Applied Biosystems’ hyperfunctional (i.e.
highly potent) siRNA design tool to identify hyperfunctional candidate sequences which
can subsequently be input into the siSPOTR tool to retrieve their POTS values (Wang et
al, 2009). This combined approach aims to ascertain siRNAs with a highly desirable
balance of potency and low off-targeting potential, providing an attractive means to
identify therapeutic siRNAs for disease-relevant targets, particularly larger genes which
have numerous low POTS siRNAs available (Table 4-1).
SiSPOTR allows users to query the identities of predicted seed-based off-target
transcripts as a means to avoid potentially important cellular genes (e.g. those involved in
cell cycle and viability). Off-target identity is an important contributor to the overall
detrimental effects caused by disrupting gene networks, and the resulting tolerability for a
given siRNA. However, declaring a predicted off-target to be important remains difficult
due to a dependence on numerous variables [e.g. experimental system (i.e. cell type),
duration and extent of knockdown, identities of other off-targets (e.g. a two-hit model),
etc.]. Nevertheless, although researchers should consider the identities of predicted off-
targets, it stands to reason that minimizing the off-targeting potential of the siRNA seed
will inherently reduce the likelihood of unintentionally silencing important genes and
further limit downstream events associated with cascading gene networks.
Finally, siSPOTR currently supports RNAi sequence design only for human and mouse
experimental systems; however, all low POTS heptamers contain CpG motifs, which are
consistently sparse throughout mammalian genomes. Furthermore, the ranking of
heptamers by POTS for mouse and human reveals a significant correlation (r2>0.938, plot
not shown), suggesting that siSPOTR is likely applicable to other mammalian species.
Figure 4-1. Diagram of on- and off-target silencing by siRNAs.
(A) Cartoon depicting a siRNA duplex designed to exhibit proper strand-biasing [i.e. strong G-C (blue) and weak A/G-U (red) binding at the respective 5’ and 3’ ends of the sense strand] and contain a low off-targeting potential seed (green highlight). Upon loading into RISC, the antisense strand may direct on-target silencing (intended) and off-target silencing (unintended). (B) Schematic highlighting the relationship between the frequencies of seed complement binding sites in the 3’UTRome and the off-targeting potential for siRNAs. Contributed by Ryan Boudreau.
Figure 4-2. Effect of siRNA off-targeting potential on gene silencing capacity.
A siRNA database composed of 2431 randomly designed siRNAs (targeting 31 unique mRNAs) and accompanying silencing data (Huesken et al, 2005) was used to determine whether low off-targeting potential siRNAs (i.e. those having <2000 potential off-targets based on seed complement hexamer distributions in human RefSeq 3’UTRs; blue) have similar capacities for gene silencing relative to the remaining 2068 siRNAs (mid-to-high off-targeting potentials; red). Roughly 1 in 4 of the low off-targeting potential siRNAs achieved >80% silencing (a commonly accepted threshold for potency), and overall their average efficiencies were comparable to the remaining siRNAs (~66% and 69% knockdown, respectively; dotted lines). (Contributed by Ryan Spengler).
Figure 4-3. Formulation and distribution of POTS (potential off-targeting score).
(A) Illustration of seed site types, with seed sequences highlighted in green. The adenosine corresponding to position 1 is highlighted in yellow and represents a defining feature for the 7A1 and 8mer binding site types. (B) The effect of seed site type on off-target silencing was determined using data from 54 microarray experiments testing unique siRNAs in HeLa cells. Cumulative distribution plots for gene expression values are shown for transcripts grouped by the binding site type present. Only transcripts containing single sites of a given type were considered. ***Student t-test indicated that the most significant divergence of the repressive potentials among these site types occurs at ≤ -0.3 Log2 fold-change (P<0.001). (C) Schematic illustrating how POTS is calculated using seed site type frequency and probability of repression (PR) values, shown above each respective site type. (D) The distribution of POTS scores, based on human 3’UTR sequences, for all possible 16,384 heptamers is plotted. POTS<50 is highlighted to indicate a relevant cut-off which is employed for purposes of this manuscript (refer to ‘Results’ section for further information regarding the relevance of this value). (Panels A and C contributed by Ryan Boudreau; Panels B and D contributed by Ryan Spengler).
Figure 4-4. Correlation of POTS ranks across tissues.
Tissue-specific POTS values for 177 human cell lines and tissues (BioGPS) were calculated based on genes expressed (median of probeset expression values ≥100) in those tissues. (A) POTS values calculated using all 3’UTR sequences (Overall POTS) were correlated with those calculated by the 177 expression profiles (Spearman rank correlation). The histogram and box plot show the variation of correlation coefficients (r2) for each pairwise comparison (error bars = 2-98th percentile). (B) The scatter-plot shows the correlation of Overall POTS scores with the tissue-specific POTS distributions with the worst calculated correlations (r2 0.9982-0.9986).
Figure 4-5. Workflow schematic for designing siRNAs targeting human PPIB using the siSPOTR algorithm.
All possible 631 siRNAs targeting the human PPIB coding sequence (CDS) were filtered based on strand biasing [i.e. strong G-C (blue) and weak A/G-U (red) binding at the respective 5’ and 3’ ends of the sense strand] and GC-content, and the number of siRNAs passing each criterion is provided. Note: the asterisk denotes a cytosine base in the 3’ end of the target site; this base can be converted to a uridine to produce a weak G:U base-pairing in the resulting siRNA duplex. The heptamer seed sequence used for POTS determination is highlighted. (Contributed by Ryan Boudreau).
Figure 4-6. Validation of siSPOTR: efficacy and off-targeting.
(A) SiRNA efficacy was evaluated using a database of 2431 randomly designed siRNAs with accompanying silencing data. The number of siRNAs passing each stage of our stepwise filtering process is indicated along with the number of potent sequences among them (i.e. those with >80% silencing efficacy). *siDesign Center (Dharmacon) was used for comparison by inputting the relevant target gene sequences into the online tool (N=29) and intersecting the top ten hits for each gene with the 2431 siRNAs. The box and whiskers plot shows the max and min gene silencing values (whiskers) and the upper and lower quartiles (box). The accompanying Venn diagram shows that siSPOTR identified five unique and effective sequences not present among the siDesign Top Hits. (B-D) Microarray data from experiments testing 40 unique siRNAs were used to assess the reliability of POTS as an indicator for off-targeting potential. (B) Heatmaps representing sequence-specific gene “suppression signatures” unique to each siRNA were generated using hierarchical clustering of significantly down-regulated genes (>3 standard deviations from the mean) among the datasets on a per target gene basis (i.e. GAPDH, PPIB and No Target), and columns were ordered and parsed by POTS for each group.
Figure 4-6. Continued. (C) A qualitative representation of “suppression signature” size (i.e. sum of dark blue regions) for each column is shown. The red dotted line marks the largest “suppression signature” among the siRNAs with POTS<50. (D) Spearman rank correlation of the POTS scores and numbers of down-regulated off-targets (i.e. transcripts with 3’UTRs containing 7- and 8-mer seed binding sites and ≤ -0.3 Log2 fold-change) observed for each siRNA is plotted. Linear regression lines, including correlation coefficients and p-values, for all data points (dotted line) and black dots (solid line) are provided. Red dots represent overt outliers. (Panels A, B and C contributed by Ryan Boudreau; Panels A and D contributed by Ryan Spengler).
Figure 4-7. Spearman rank correlation of final POTS values.
Spearman rank correlation of the POTS scores and numbers of down-regulated off-targets (i.e. transcripts with 3’UTRs containing 7- and 8-mer seed binding sites and ≤ -0.3 Log2 fold-change) observed for each siRNA is plotted. The data comprise the training and validation groups combined. Linear regression lines, including correlation coefficients and p-values, for all data points (dotted line) and black dots (solid line) are provided. Red dots represent overt outliers.
Figure 4-8. Effect of POTS on off-targeting from hairpin-based RNAi expression vectors.
HEK293 cells were transfected with U6 promoter-only or U6-driven hairpin-based RNAi expression plasmids (n = 4 for each treatment), and RNA was harvested 72 h later for microarray analysis. Two-way ANOVA was performed to detect differentially expressed genes among the treatment groups. Hierarchical clustering of differentially expressed genes (P < 0.0001, 827 genes) was performed to visualize the relationships among the treatment groups. Notably, all of the low POTS sequences (green) exhibit gene expression profiles that are more closely related to the U6 control, as compared to the remaining sequences which have medium (yellow) to high (red) POTS values. (Contributed by Ryan Boudreau).
At least 4 siRNAs? 16 of 16 7 of 16 2 of 16 8 of 16 0 of 16
** POTS<50 serves as a relevant cut-off for purposes of this manuscript (refer to ‘Results’ section for further information regarding the relevance of this value).
N/A indicates that the online tool was unable to process transcripts of this length.
Contributed by Ryan Boudreau.
Table 4-2. The effect of seed position 8 on off-targeting potential by site frequency.
All possible 7mer (nt 2-8) seed sequences were grouped according to their common core 6mer (nt 2-7). The number of 3’UTRs containing any 6mer binding motif was counted. The number of these putative targets containing at least one 8mer, 7M8 or 7A1 site, given the variant base at position 8, was also tallied. The ratio between the maximum and minimum number of genes among the four heptamers was then calculated for each group. The groups with the 10 highest ratios are indicated in the table above.
# 3'UTRs with 8mer, 7M8 or 7A1 Binding Site Given N at Seed position 8
Table 4-3. The effect of seed position 8 on off-targeting potential by POTS
The same as Table 4-2, except here the POTS values for the core 6mer sequence given A, T, G or C at position 8 are provided. The ratio between the maximum and minimum POTS in each seed group is provided.
Figure 4-9. Comparison of off-targeting potentials among shRNA libraries.
A histogram and complementing table presenting the POTS distributions and genome-wide coverage of shRNA library sequences are shown for our “Low POTS” library (green) and the TRC library (red). The POTS distribution of all possible heptamers (blue) serves as a reference. The range encompassing 90% of all sequences for each shRNA library is indicated. Yellow highlights intersect to emphasize the coverage disparities at a key point; POTS<50 provides a conservative cut-off for low off-targeting potential, and at least 4 siRNAs are desired for a given gene when generating a library or performing initial efficacy screening.
CHAPTER V
FINAL DISCUSSION
Competitive Endogenous RNAs
Experimental manipulation of miRNA activity has long relied on the ability to
block or sequester miRNA binding through the use of synthetic antagomirs and expressed
miRNA sponges. These molecular tools showed that, at least in principle, miRNA
activity can be regulated in a competitive manner. In Chapter 3, lncRNAs were proposed
as endogenous miRNA “sponges,” serving as endogenous analogs to the artificial
inhibitory tools. As described briefly in that chapter, recently published functional
evidence suggests that competitive endogenous RNAs (ceRNAs) take on many forms,
including long intergenic noncoding RNAs (lincRNAs) similar to PSMI16, pseudogenes,
and even protein-coding mRNAs (Cesana et al, 2011; Hansen et al, 2013; Karreth et al,
2011; Poliseno et al, 2010; Sumazin et al, 2011; Tay et al, 2011). Based upon these
reports, “ceRNA” describes a functionally diverse array of RNA classes, much in the
same way that “RNAi” describes a general process mediated by miRNA, endo-siRNA,
piRNA and the like. Future work will likely involve functional characterization of more
ceRNA:miRNA interactions, along with the physiological or pathophysiological
pathways in which they function. Additionally, other RNA species, such as the recently-
reported circRNAs (Hansen et al, 2013), may also be ceRNAs.
The observation that pseudogenes like PTENP1 function as ceRNAs adds another
connection between transposons and miRNAs. PTENP1 is an example of a “processed”
pseudogene. Processed pseudogenes are created when a mature, spliced transcript is
reverse transcribed and integrated into the genome by retrotransposon- or retrovirus-
encoded proteins. For example, PTENP1 formed when a LINE1 element mobilized and
reverse transcribed a fully processed copy of the PTEN gene. In the Poliseno et al. study,
many of the conserved MRE sites (e.g. miR-19, -20, -21, -26 and -214) from the PTEN
3’UTR were still intact in PTENP1, thus imparting its ceRNA activity. Interestingly,
PTENP1 is present only in apes, as no syntenic locus is found in rhesus (Old World
Monkey), marmoset (New World Monkey) or mouse genomes. This exemplifies how
primate-specific transposition activity can alter the activity of conserved miRNAs.
The fact that PTENP1 retains many MREs from the parent PTEN transcript also
reveals an important nuance differentiating pseudogene ceRNAs from other ceRNAs.
Most mRNAs, like PTEN, are coordinately regulated by multiple miRNAs, and a
pseudogene could compete for them. This means that pseudogene ceRNAs would likely
have the most potent effect on the expression levels of the parent gene and any other gene
bound by the same set of miRNAs. On the other hand, lincRNA ceRNAs like PSMI16
have numerous binding sites for a given miRNA. I would hypothesize that lincRNAs
would globally impact the targets of a miRNA family, whereas pseudogenes would
regulate its parent gene more specifically.
Off-targeting and RNAi design
We took advantage of mRNA transcript degradation by miRNA-like interactions
to detect off-target effects from exogenous RNAi triggers after their delivery to cells and
tissues. We found that the extent of miRNA-mediated changes on cell expression profiles
was robust, and in some cases, these broad transcriptional perturbations caused cell
toxicity. It stands to reason then, that rational design of RNAi triggers with low off-
targeting potential would reduce the probability of generalized transcriptional
disturbances and subsequent toxicity.
Although in general we can reduce off-targeting probability with our siSPOTR
algorithm, we also found that some low off-targeting potential sequences induced toxicity
in vivo. This suggests that not all off-targeting can be avoided, and that empirical testing
of RNAi triggers is required to assess their overall safety. Future research to further
improve predictions of RNAi specificity would benefit from closer analysis of sequences
deemed toxic in the literature. We are currently working to find ways to “switch” the off-
targeting profile of exogenous, artificial miRNA triggers found to be toxic. We have
found that, given an antisense RNA with a “low POTS” seed that induces unintended
toxicity, single base changes to the seed sequence change the off-target profile. Because we
assume that at least one of the original sequence’s off-targets is problematic when
suppressed, we switch to another low POTS seed, which avoids most, if not all, of the original
off-target genes. To test this, we are currently working with an artificial miRNA that
effectively silences expression of Huntingtin (HTT), but which induces behavioral
deficits in wild-type C57BL/6 mice. Because similar constructs targeting HTT have been
tested in nearly identical experimental settings, achieving comparable levels of HTT
repression (Boudreau et al, 2009b; McBride et al, 2008), we hypothesize that sequence-
specific off-target effects are causing this phenotype. So far, experiments performed by
Alex Mas Monteys using constructs I designed to alter seed sequences while retaining
potency, reveal that directed single base mutations in the toxic miRNA’s seed preserve
HTT silencing efficacy in vitro. Bioinformatic target site predictions indicate that very
few seed-mediated targets overlap between the toxic trigger and the modified ones. The
next step is to inject the original or modified sequences into C57BL/6 mice as before and
see whether the mutations correct the toxic phenotype. Notably, if toxicity persists, this is
likely due to hitting other target transcripts whose expression level must be maintained at
or near 100% for cell viability.
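One simple way to quantify how little the predicted targets of the original and modified seeds overlap is the Jaccard index of the two predicted target sets. The sketch below is a simplified illustration, not the target prediction used in this work: it assumes a gene counts as a predicted target when its 3'UTR (RNA alphabet, in a hypothetical `utrs` dictionary keyed by gene) contains a 7M8/8mer or 7A1 site for the heptamer seed.

```python
COMPLEMENT = str.maketrans("ACGU", "UGCA")


def revcomp(rna: str) -> str:
    """Reverse complement of an RNA string."""
    return rna.translate(COMPLEMENT)[::-1]


def seed_targets(seed7: str, utrs: dict[str, str]) -> set[str]:
    """Genes whose 3'UTR holds a 7M8/8mer site (guide 2-8) or a 7A1 site (guide 2-7 plus A)."""
    m8_site = revcomp(seed7)
    a1_site = revcomp(seed7[:6]) + "A"
    return {gene for gene, utr in utrs.items() if m8_site in utr or a1_site in utr}


def target_overlap(seed_a: str, seed_b: str, utrs: dict[str, str]) -> float:
    """Jaccard index of the predicted seed-mediated target sets of two seeds."""
    a, b = seed_targets(seed_a, utrs), seed_targets(seed_b, utrs)
    union = a | b
    return len(a & b) / len(union) if union else 0.0
```

Note that a change at seed position 8 leaves positions 2-7 intact, so 7A1 targets remain shared; changes within positions 2-7 alter every site type and would be expected to reduce the overlap further.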
If the single base mutations prove effective in mitigating the toxic phenotypes that
manifest from seed-mediated off-target effects, it follows that the same changes could be
made to reduce the off-target potential of high POTS sequences. As discussed in Chapter
4, commercial suppliers of pre-designed RNAi sequences focus on designing the most
potent sequences for their customers. Also mentioned in Chapter 4, based on the relative
rarity of low POTS seeds, most of the sequences designed for potency and not for seed
specificity will likely have high off-targeting potential. However, we found that all low
POTS seed sequences contained at least one “CG” dinucleotide. This dinucleotide is
known to be relatively infrequent in mammalian genomes. On average, every additional
“CG” dinucleotide in a 7-mer motif results in a 10-fold reduction in 7-mer frequency
(Garcia et al, 2011). Therefore, we expect that if we start with a highly-potent RNAi
sequence with relatively high off-target potential, a single base change in the seed to
introduce a “CG” dinucleotide could greatly reduce its off-targeting potential. If these
mutations have minimal impact on silencing efficacy, as we have seen with the HTT
sequences thus far, we could greatly increase our ability to design low off-targeting
sequences, and perhaps even increase our stringency in screening for potency.
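Candidate single-base seed changes that introduce a “CG” dinucleotide can be enumerated directly, as in the minimal sketch below. Positions are reported on the guide strand, where the heptamer seed spans positions 2-8. Each such change also creates a mismatch against the intended target, so silencing efficacy still has to be tested empirically, as with the HTT sequences; ranking the variants by POTS would additionally require a POTS table, which is assumed rather than provided here.

```python
def cg_count(seq: str) -> int:
    """Number of CG dinucleotides in a sequence."""
    return seq.count("CG")


def cg_introducing_variants(seed7: str) -> list[tuple[int, str, str, str]]:
    """All single-base substitutions of a heptamer seed that add at least one CG.

    Returns (guide position, original base, new base, variant seed) tuples.
    """
    baseline = cg_count(seed7)
    variants = []
    for i, base in enumerate(seed7):
        for alt in "ACGU":
            if alt == base:
                continue
            variant = seed7[:i] + alt + seed7[i + 1:]
            if cg_count(variant) > baseline:
                variants.append((i + 2, base, alt, variant))  # seed index 0 = guide position 2
    return variants
```

For the seed AGCUAGC, for example, the function returns three candidates: A→C at position 2, U→G at position 5 and A→C at position 6.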
Emerging technologies in the study of miRNA biology
The data presented in this work, as well as many of the cited publications, have
revealed that the mechanisms underlying miRNA biogenesis and function are far more
complex than represented in the canonical pathways outlined in Chapter 1. Integrating
these newer pathways and determining the relative breadth of each to various biological
systems will be important tasks in the future. For example, Ago HITS-CLIP and similar
technologies will be essential for verifying to what extent and in which biological settings
TE-derived or lncRNA-resident MREs are actually occupied by Ago complexes. Based
on the “off-targeting” phenomenon observed with exogenous RNAi triggers, it is clear
that the RNAi machinery can be pushed to silence biologically irrelevant targets at
sufficient doses. HITS-CLIP will provide a better picture of what is actually engaged by
RISC machinery under physiological conditions.
On the other hand, the physiological role of the Ago-bound complexes also
remains an open question. Our current understanding of miRNA function is largely based
upon perturbations of individual miRNA levels in cell culture models. Less clear is what
function miRNAs play in a relatively static setting of terminally-differentiated cells. Even
in comparing miRNA binding profiles in normal versus disease states, the question will
remain as to which changes are causative and which are reactionary to the disease state. I
believe that in order to effectively use and interpret HITS-CLIP to study these kinds of
questions, we should first understand how Ago binding profiles relate to gene expression
changes in acute disease settings. For example, what happens to Ago binding profiles
during acute ischemia brought on by stroke or myocardial infarction? Furthermore, how
does the response differ between these settings, in which very different mRNA and
miRNA profiles are intrinsically present? Following a common theme in biology, it seems
likely that some miRNAs will be involved in an immediate reactionary phase, followed by
another group guiding a return to homeostasis. Among the most interesting questions will
be to what extent the concentrations of miRNAs and of their targets influence one
another's activities, given that lncRNAs, pseudogenes and mRNAs appear to compete for
miRNA binding.
Alternatively, in a setting such as B-cell chronic lymphocytic leukemia (B-CLL), where
the miR-15/16 family is deleted in nearly half of all cases (Calin, 2002), HITS-CLIP and
other high-throughput techniques would help uncover which physiological changes result
from loss of miR-15 and -16, and which come from the resulting void being filled by the
miRNAs that remain. As expected, many validated targets of miR-15/16 are upregulated
in response to the chromosomal deletion. However, the sudden disappearance of such a
highly expressed miRNA would also likely increase the effective silencing capacity of the
remaining miRNAs. In a simplified scenario, given a loss of the miR-15/16 family with no
net change in the expression of the miRNA machinery or of other mature miRNAs, more
Ago proteins would be free to engage the remaining miRNAs.
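The reasoning in this simplified scenario can be made explicit with a toy model. It rests on an assumption that is not established for B-CLL cells: Ago is limiting and is loaded in proportion to mature miRNA abundance. With total Ago A, mature miRNA abundances m_i, and f the fraction of the mature miRNA pool contributed by the deleted miR-15/16 family (abundance m_15/16), the Ago loaded with miRNA i before (L_i) and after (L_i') the deletion would be:

```latex
L_i = A \, \frac{m_i}{\sum_j m_j},
\qquad
f = \frac{m_{15/16}}{\sum_j m_j},
\qquad
L_i' = A \, \frac{m_i}{\sum_j m_j - m_{15/16}} = \frac{L_i}{1 - f}.
```

For a purely hypothetical f = 0.2, each remaining miRNA would gain 1/0.8 = 1.25-fold Ago loading. If Ago is not limiting, loading tracks each m_i alone and no such redistribution is expected, which is one reason the HITS-CLIP comparison below is informative.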
The extent to which these miRNAs contribute to the observed gene expression changes
remains an open question. Ago HITS-CLIP could be performed to compare B-CLL cells
with the miR-15/16 deletion with B-CLL cells lacking the deletion or normal B cells.
Analysis of the Ago binding profile should show a complete loss of the miR-15/16-
dependent peaks. If the remaining miRNAs do indeed have increased binding potential
with the miR-15/16 locus deleted, then there should be a concomitant increase in the
number or height of peaks associated with these remaining miRNAs. Performing RNA-seq
on total RNA from the same cells will also be important for comparative HITS-CLIP, to
account for peak changes that simply reflect changes in mRNA expression levels.
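One simple way to make that correction is to subtract the log2 fold change in host mRNA expression from the log2 fold change in peak height. This sketch assumes peak heights and transcript levels have already been library-normalized. The function names, the pseudocount, and this particular normalization are illustrative choices rather than an established pipeline. Count-based statistical models would handle replicates and variance more carefully.

```python
import math


def log2_fold_change(condition: float, reference: float, pseudocount: float = 1.0) -> float:
    """log2(condition / reference), with a pseudocount so zero counts do not break the ratio."""
    return math.log2((condition + pseudocount) / (reference + pseudocount))


def expression_adjusted_peak_change(peak_deletion: float, peak_reference: float,
                                    mrna_deletion: float, mrna_reference: float,
                                    pseudocount: float = 1.0) -> float:
    """Change in Ago HITS-CLIP peak height not explained by a change in host mRNA level.

    peak_*: library-normalized CLIP reads in the peak (e.g. reads per million)
    mrna_*: library-normalized RNA-seq expression of the transcript carrying the peak
    A positive value suggests more Ago occupancy than the mRNA change alone predicts.
    """
    peak_lfc = log2_fold_change(peak_deletion, peak_reference, pseudocount)
    mrna_lfc = log2_fold_change(mrna_deletion, mrna_reference, pseudocount)
    return peak_lfc - mrna_lfc


# Hypothetical values: the peak doubles while its host mRNA is unchanged ...
print(expression_adjusted_peak_change(79, 39, 49, 49))  # 1.0
# ... versus a peak that doubles only because its host mRNA doubled.
print(expression_adjusted_peak_change(79, 39, 99, 49))  # 0.0
```

Working in log space lets the mRNA effect be removed by subtraction. The pseudocount keeps peaks that are absent in one condition, such as lost miR-15/16 peaks, from producing infinite ratios.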
What has become quite clear over the past several years is that a close partnership
between computational and molecular biologists is essential for truly understanding the
function of these small non-coding RNAs. No miRNA or miRNA:target interaction exists
in a vacuum, and microarray, RNA-seq and HITS-CLIP techniques will help to delineate
some of these more complex interactions. At the same time, the role of the biologist
becomes all the more important: to frame a setting and a biological question in which
these techniques can be effectively employed, correctly interpreted and ultimately validated.
As the miRNA field moves forward, largely guided by high-throughput
sequencing technology, researchers should approach the roles that miRNAs play with a
degree of naivety. Reading through a 2004 review in Cell, entitled "MicroRNAs: genomics,
biogenesis, mechanism and function" (Bartel, 2004), it is apparent that the prior
assumptions guiding current research in these areas have changed very little in the nearly
a decade since the review's publication. Although such assumptions are not necessarily
invalid, following them indiscriminately has relegated many important observations to
the status of puzzling curiosities. Setting aside strict a priori assumptions, careful
interpretation of the information gleaned from the new technologies mentioned above
could illuminate the importance of intriguing observations such as miRNAs upregulating
gene expression (Vasudevan et al, 2007), "isomiR" production (Guo & Lu, 2010; Martí
et al, 2010), or even apparent loading into Ago and functional silencing
mediated by miRNA precursors (Tan et al, 2009). In general, miRNA studies primarily
report miRNA-mediated repression of target genes containing 3'UTR MREs, based on the
most commonly annotated miRNA isoform. What remains unclear is whether this reflects
a research bias, with researchers choosing to study only canonical interactions, or
whether non-canonical pathways are genuinely rare in nature. Ultimately,
the coming years should prove exciting for the miRNA field, and it has been a privilege
to play some small part in contributing to knowledge and discourse in this area.
APPENDIX
ADENOSINE DEAMINATION IN HUMAN TRANSCRIPTS
GENERATES NOVEL MICRORNA BINDING SITES¹
Abstract
Animals regulate gene expression at multiple levels, contributing to the
complexity of the proteome. Among these regulatory events are post-transcriptional gene
silencing, mediated by small noncoding RNAs (e.g., microRNAs), and adenosine-to-inosine
(A-to-I) editing, catalyzed by adenosine deaminases that act on double-stranded
RNA (ADARs). Recent data suggest that these regulatory processes are connected at a
fundamental level. A-to-I editing can affect Drosha processing or directly alter the
microRNA (miRNA) sequences responsible for mRNA targeting. Here, we analyzed the
previously reported adenosine deaminations occurring in human cDNAs, and asked if
there was a relationship between A-to-I editing events in the mRNA 3’ untranslated
regions (UTRs) and mRNA::miRNA binding. We find significant correlations between
A-to-I editing and changes in miRNA complementarities. In all, over 3,000 of the 12,723
distinct adenosine deaminations assessed were found to form 7-mer complementarities
(known as seed matches) to a subset of human miRNAs. In 200 of the ESTs, we also
noted editing within a specific 13-nucleotide motif. Strikingly, deamination of this motif
simultaneously creates seed matches to three (otherwise unrelated) miRNAs. Our results
suggest the creation of miRNA regulatory sites as a novel function for ADAR activity.
Consequently, many miRNA target sites may only be identifiable through examining
expressed sequences.
Introduction
A-to-I RNA editing, catalyzed by dsRNA-specific ADARs, refers to the conversion
of adenosine to inosine in double-stranded or stem-loop regions of precursor mRNAs
(Bass, 2002). Experimental evidence demonstrates that, whether found in a codon,
anticodon or mature miRNA, inosine, like guanine, preferentially base-pairs with
cytosine (Yoshida et al, 1968). Several characterized examples of amino acid changes
created by adenosine deamination show that ADARs can regulate gene expression by
directing the synthesis of distinct proteins from a single open reading frame (Bass, 2002;
Burns et al, 1997). Recent work by Li et al. confirms that editing events occur at a much
higher frequency within noncoding regions (Li et al, 2009). Comparisons of human EST
and genomic sequences have identified thousands of distinct ADAR deaminations
occurring in many different genes (Levanon et al). Possible functions for editing events
include altered splicing, RNA localization, nuclear retention, mRNA stability and
translational efficiency (reviewed in Chen & Carmichael, 2008). Interestingly, most
editing sites occur in Alu elements (Athanasiadis et al, 2004; Hundley et al, 2008; Kim et
al; Levanon et al, 2004), the majority of which are in UTRs (Hundley et al, 2008;
Levanon et al).
Experimental evidence suggests that miRNA-mediated post-transcriptional gene
silencing and A-to-I editing are interrelated (Kawahara et al, 2007a; Kawahara et al,
2007b; Luciano et al, 2004; Scadden, 2005). MiRNA transcripts have been found to
undergo ADAR deamination, with editing affecting Drosha processing, Dicer processing
or mRNA targeting (Kawahara et al, 2008; Kawahara et al, 2007b; Yang et al, 2006).
Work by Kawahara and colleagues showed that ADAR deamination of the seed region of
miR-376 alters the gene set regulated by the edited versus the unedited miRNA
(Kawahara et al, 2007b). In this work, we asked if A-to-I editing of the target mRNA,
rather than the miRNA, could impact mRNA::miRNA binding by creating seed matches.
We examined the 12,723 previously reported distinct ADAR editing sites (Levanon et al,
2004) and found that A-to-I editing creates perfect complementarities to human miRNA seeds.
Results
Adenosine deamination creates miRNA complementarities
ADAR-mediated conversion of adenosine to inosine allows inosine:cytosine
pairing because inosine is chemically similar to, and functionally equivalent to, guanosine
(Figure A-1A). The preferential base pairing of inosine with cytosine, a well-established
means of regulating RNA:RNA interactions by altering sequence complementarity, was
described several decades ago in codon:anticodon interactions (Yoshida et al, 1968).
More recently, direct ADAR deamination of a miRNA (miR-376) was found to alter
miRNA target selection (Kawahara et al, 2007b). Over 12,000 A-to-I editing sites have
been identified in human mRNAs, with nearly 90% of these occurring in UTRs
(Athanasiadis et al, 2004; Kim et al, 2004; Levanon et al, 2004). Because 3' UTRs are
widely accepted as the predominant site of miRNA:mRNA association, we asked whether
deamination of 3' UTR A-to-I editing sites (Levanon et al, 2004) significantly altered
their complementarity to currently annotated human miRNAs.
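For computational screening, the pairing behavior described above can be captured by reading inosine as guanosine before applying ordinary base-pairing rules. The sketch below does exactly that. The function and set names are hypothetical, and treating inosine strictly as G is a simplification: inosine can also form weaker pairs with other bases, as it does at the tRNA wobble position.

```python
# Base pairs written as (mRNA base, miRNA base).
WATSON_CRICK = {("A", "U"), ("U", "A"), ("G", "C"), ("C", "G")}
GU_WOBBLE = {("G", "U"), ("U", "G")}


def as_pairing_base(base: str) -> str:
    """Read inosine (I) as guanosine (G), which is how it pairs in this model."""
    return "G" if base == "I" else base


def pairs(mrna_base: str, mirna_base: str, allow_wobble: bool = False) -> bool:
    """True if the two bases pair under Watson-Crick rules (plus G:U if allowed)."""
    pair = (as_pairing_base(mrna_base), as_pairing_base(mirna_base))
    return pair in WATSON_CRICK or (allow_wobble and pair in GU_WOBBLE)


print(pairs("A", "C"))                     # False: unedited adenosine vs cytosine
print(pairs("I", "C"))                     # True: inosine pairs like guanosine
print(pairs("I", "U", allow_wobble=True))  # True: I:U treated as a G:U wobble
```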
Although miRNAs are generally ~21-22 nt in length, their association with target
mRNAs is typically mediated through a seven base pair (bp) interaction involving
nucleotides 2-8 (5' to 3') of the mature miRNA (Lai, 2002). This 7 nt sequence constitutes
a miRNA "seed," and its reverse complement in a target mRNA constitutes a "seed match"
(Lewis et al, 2005). Using a simple 7 bp seed scan of the 100 nt immediately 5' and 3' of
each of the 12,723 distinct deamination sites (Levanon et al, 2004), we identified miRNA
seed matches that were created or lost by editing. All sites were screened once with a
central adenosine (unedited; reveals lost matches) and once with a central guanosine
(edited; reveals created matches) (Figure A-1B). Using this approach, we identified seed
matches to 30 miRNA families that were significantly enriched (p ≤ 1.8 × 10⁻⁵) in
sequences bearing a central G (Table A-1 and Table A-2).
Strikingly, over 3,000 of the 12,723 sites form perfect miRNA seed complements if
deaminated. We term these miRNA associating if deaminated (MAID) sites and find
that most are localized to the 3' UTR (Table A-2). While editing can also destroy sites
(not shown), we focus here on MAIDs and their ability to confer miRNA-mediated
regulation.
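The created/lost screen described above reduces to comparing seed-match counts in the unedited window (central A) with counts in the edited window (central G). The following is a minimal sketch under that reading. Helper names are hypothetical, and the window is synthetic with short flanks for readability. In the actual screen each window was 201 nt, so the center index would be 100. The two seeds are taken from the text, rewritten 5' to 3'.

```python
COMPLEMENT = {"A": "U", "U": "A", "G": "C", "C": "G"}


def reverse_complement(rna: str) -> str:
    return "".join(COMPLEMENT[base] for base in reversed(rna))


def seed_from_mature(mature: str) -> str:
    """Seed = nucleotides 2-8 (5'->3') of the mature miRNA."""
    return mature[1:8]


def count_occurrences(seq: str, kmer: str) -> int:
    """Count occurrences of kmer in seq, including overlapping ones."""
    k = len(kmer)
    return sum(1 for i in range(len(seq) - k + 1) if seq[i:i + k] == kmer)


def classify_site(window: str, center: int, seed_match: str):
    """Compare a window with its central A (unedited) against the same window with G (edited)."""
    if window[center] != "A":
        raise ValueError("expected an unedited adenosine at the center position")
    edited = window[:center] + "G" + window[center + 1:]  # inosine read as G
    unedited_hits = count_occurrences(window, seed_match)
    edited_hits = count_occurrences(edited, seed_match)
    if edited_hits > unedited_hits:
        return "created"  # a MAID for this miRNA
    if edited_hits < unedited_hits:
        return "lost"
    return None


# miR-513 seed 3' GGACACU 5' and miR-769-3p seed 3' CUAGGGU 5', written 5'->3'.
mir513_match = reverse_complement("UCACAGG")  # "CCUGUGA"
mir769_match = reverse_complement("UGGGAUC")  # "GAUCCCA"
window = "UUUUU" + "CCUGUAAUCCCAG" + "UUUUU"  # synthetic; editable A at index 10
print(classify_site(window, 10, mir513_match))  # created
print(classify_site(window, 10, mir769_match))  # created
```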
MiR-513 and miR-769-3p/-450b-3p specifically target deamination sites
We first examined the two greatest outliers, miR-513 and miR-769-3p/-450b-3p, in
more detail. In the 12,723-sequence dataset representing unedited sequences, the average
number of seed matches to miR-769-3p/-450b-3p at any 7 nt position was 0.79 (max = 4).
This contrasts strongly with the 252 miR-769-3p/-450b-3p seed matches unique to the
edited 3' UTR dataset (Table A-1 and Figure A-2A). Similarly, the average number of
seed matches to miR-513 at any position in the unedited sequences was 0.63 (max = 4),
versus 257 matches unique to the edited 3' UTR flanking sequence and edit site. Therefore,
for these mRNAs, miR-513 and miR-769-3p/-450b-3p preferentially target deaminated sequences.
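A per-position tally of this kind, analogous to the profiles in Figure A-2A, can be sketched as follows. The sketch assumes that all windows share the same length and the same central edit position, as the 201 nt windows in this analysis do. The toy windows and helper names are hypothetical.

```python
def positional_profile(windows, seed_match):
    """Number of windows with a seed match starting at each position (equal-length windows)."""
    k = len(seed_match)
    n_positions = len(windows[0]) - k + 1
    profile = [0] * n_positions
    for window in windows:
        for i in range(n_positions):
            if window[i:i + k] == seed_match:
                profile[i] += 1
    return profile


def edit_center(window, center):
    """Replace the central adenosine with G, the base inosine is read as."""
    return window[:center] + "G" + window[center + 1:]


# Hypothetical toy data: three identical synthetic windows, editable A at index 10.
windows = ["UUUUUCCUGUAAUCCCAGUUUUU"] * 3
unedited = positional_profile(windows, "CCUGUGA")
edited = positional_profile([edit_center(w, 10) for w in windows], "CCUGUGA")

# Background level in the unedited set (average and maximum matches per position).
print(sum(unedited) / len(unedited), max(unedited))
# Spike at the motif start in the edited set, and matches unique to the edited set.
print(edited[5], sum(edited) - sum(unedited))
```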
Upon closer examination, we found that ~190 of the matches to the miR-513 seed
(3' GGACACU 5') and the miR-769-3p/-450b-3p seed (3' CUAGGGU 5') were created by
a single deamination within a common 12 nt motif (5' CCUGUIAUCCCA 3') (Figure A-2B).
Finding an invariant guanine immediately 3' of these 12 nt, and allowing for a single
GU wobble (an adenosine or guanine) immediately 3' of the deamination site, extended
the miR-513/-769-3p/-450b-3p MAID to 5' CCUGUIRUCCCAG 3' (R = A or G). In total,
the simple scanning approach identified 288 distinct sites within this 13 nt motif that,
when edited, form seed matches to miR-513 and miR-769-3p/-450b-3p (Figure A-2C).
Thus, MAIDs containing miR-513 and miR-769-3p/-450b-3p seed matches are
significantly enriched in a subset of the deamination sites originally identified by
Levanon et al. (Levanon et al, 2004) (not shown). Of note, this result was reproduced
using a standalone TargetScan program (Lewis et al, 2005) without using conservation of
seed matches as a ranking criterion.
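Because the MAID is a fixed 13 nt pattern, candidate sites can also be found directly with a regular expression over unedited sequence. In that sequence the future inosine is still an adenosine, and IUPAC R expands to A or G. A G at the R position would pair with the miRNA as the G:U wobble described above. This is a minimal sketch. The pattern, the offset constant and the function name are hypothetical illustrations, not the scanning code used in the study.

```python
import re

# Unedited form of the MAID motif 5'-CCUGUIRUCCCAG-3': the position that becomes
# inosine is still an A, and R (IUPAC) is A or G. The lookahead allows overlapping hits.
MAID_UNEDITED = re.compile(r"(?=CCUGUA[AG]UCCCAG)")
EDIT_OFFSET = 5  # 0-based position of the editable adenosine within the 13-nt motif


def find_maid_sites(seq: str) -> list:
    """Return 0-based positions of adenosines whose deamination would complete the MAID."""
    rna = seq.upper().replace("T", "U")  # accept DNA or RNA input
    return [m.start() + EDIT_OFFSET for m in MAID_UNEDITED.finditer(rna)]


print(find_maid_sites("UUUUUCCUGUAAUCCCAGUUUUU"))  # [10]
print(find_maid_sites("ttttcctgtagtcccag"))        # [9]
print(find_maid_sites("CCUGUGAUCCCAG"))            # [] (already edited)
```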
MiR-513 and miR-769-3p repress deaminated sequences
To test whether MAID sequences could serve as miR-513 and/or miR-769-3p/-450b-3p
targets, we constructed a series of luciferase reporters possessing either unedited sequences
or 'edited' 13 bp MAIDs specific to miR-513/miR-769-3p/-450b-3p downstream of Renilla
1 Expected numbers are based on seed match occurrence in the 200 nt flanking each adenosine deamination.
2 miR-513 and miR-769-3p are often (~80%) complementary to the same deaminated sequences.
3 The two miR-518 family seeds, AAAGCGC and AAGCGCT, are complementary to the same 26 deaminated sequences.
Table A-2. A-to-I editing occurs predominantly in noncoding regions of expressed sequences

Category | Number of edited sequences | % of total edited sequences | % of edited sequences with miR seed matches
Total edited sequences | 12,723 | 100.00% |
Total successfully mapped¹ | 8,014 (99.75%/0.25%) | 62.99% |
Total MAIDs overall | 3,058 | 24.04% | 100%
Total MAIDs mapped² | 1,918 (99.64%/0.36%) | 15.08% | 62.72%
Total lost sequences overall | 2,538 | 19.95% | 100.00%
Total lost sequences mapped | 1,605 (99.75%/0.25%) | 12.61% | 63.24%
1 12,723 available editing sequences (201 nt) were mapped to human transcripts (previous mapping data of these sequences were not currently available) obtained from the UCSC table browser (RefSeq hg18) using megablast (arguments: -W 196; -S 1; -F F; -p 100). Due to the highly repetitive nature of the sequences used for this analysis, positive identification required 100% identity and 100% coverage of the editing sequence. Using these criteria, ~63% of the 12,723 sequences could be mapped.
2 Sequences with miRNA sites created (MAIDs) or lost were queried for mapping to coding or noncoding regions. Of the sequences mapped to transcripts using our methods, the vast majority fell within noncoding regions (%noncoding/%coding presented in the first numerical column).
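The mapping criterion in footnote 1 (100% identity over 100% of each 201 nt query) can be expressed as a filter over megablast hits. This sketch assumes the hits were written in the common 12-column tab-separated BLAST format. The original analysis does not state its output format, and the function and helper names, along with the example hit lines, are hypothetical.

```python
def passes_strict_mapping(hit_line: str, query_length: int = 201) -> bool:
    """Accept a hit only at 100% identity over 100% of the query.

    Assumes the standard 12-column tab-separated BLAST/megablast output:
    qseqid sseqid pident length mismatch gapopen qstart qend sstart send evalue bitscore
    """
    fields = hit_line.rstrip("\n").split("\t")
    pident = float(fields[2])
    aln_length, mismatches, gap_opens = int(fields[3]), int(fields[4]), int(fields[5])
    qstart, qend = int(fields[6]), int(fields[7])
    full_coverage = qstart == 1 and qend == query_length and aln_length == query_length
    return pident == 100.0 and mismatches == 0 and gap_opens == 0 and full_coverage


def make_hit(pident, length, mismatches, qstart, qend):
    """Build a hypothetical tabular hit line for illustration."""
    return "\t".join(["site_1", "transcript_A", pident, length, mismatches, "0",
                      qstart, qend, "1000", "1200", "1e-100", "372"])


print(passes_strict_mapping(make_hit("100.00", "201", "0", "1", "201")))  # True
print(passes_strict_mapping(make_hit("100.00", "196", "0", "1", "196")))  # False: partial coverage
print(passes_strict_mapping(make_hit("99.50", "201", "1", "1", "201")))   # False: one mismatch
```

Requiring both full coverage and zero mismatches is deliberately strict. As footnote 1 notes, the repetitive nature of these sequences makes looser criteria prone to ambiguous placements.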
(A) A cartoon depicting adenosine, deaminated adenosine (inosine), and guanine. In some tRNAs, inosine routinely serves as a member of the anticodon, where it is recognized as a guanine. (B) A characterized deamination site occurring in the 3' UTR of DNA Fragmentation Factor α (DFFA) is shown in both an edited and unedited state. In this work, each 7 nt sequence (red) occurring within 100 nt of > 12,000 distinct deamination sites (blue) was screened against all annotated human miRNA seed sequences. The miR-513 seed (yellow) illustrates how target mRNA deamination can mediate miRNA binding. Contributed by Glen Borchert.
(A) 12,719 unique EST sequences (www.cgen.com), each consisting of a central A-to-I deamination and 100 nt flanks (i.e. n100 (A or I) n100), were screened for complementarity to human miRNAs. All human miRNA seed matches were identified within the individual 201 nt sequences originally identified as an A-to-I transition by Compugen (statistical significance is addressed in Table 2). The top two panels represent all miR-769-3p (and miR-450b-3p) seed matches occurring at each position in both the unedited (left) and edited (right) states. The lower panels represent all miR-513 seed matches occurring in unedited (left) and edited (right) states. (B) A cartoon of miR-513 and miR-769-3p/-450b-3p complementarities to a miRNA Associating If Deaminated (MAID) site in both unedited (left) and edited (right) states is shown. Perfect seed matches to miR-769-3p/-450b-3p (blue) and miR-513 (yellow) are significantly enriched in sequences containing characterized deaminations (red). Vertical lines indicate complementary base pairing. (C) Venn diagram depicting the overlap between miR-513 and miR-769-3p/-450b-3p target sites matching the full MAID motif (5' CCUGUIRUCCCAG 3'). Importantly, nearly 100 additional sequences are identified by allowing a single GU wobble immediately 3' to the deamination. Original analysis by Glen Borchert. Reanalysis and editing by Ryan Spengler.
Figure A-3. miR-513 and miR-769-3p target MAIDs but not the corresponding unedited sequence.
(A) A diagram shows hairpin expression vectors and MAID reporter constructs. The pAL-513 and -769-3p expression vectors carry miR-513 and miR-769-3p hairpins downstream of the miR-517 Pol III promoter. TAAT, TGAT, and TGGT reporters contain 3 tandem copies of the 13 bp MAID sequence in the 3' UTR of Renilla luciferase for testing activity in the unedited (TAAT) or edited (TGAT, TGGT) states. Guanines mimicking A-to-I edits are bolded and underscored. (B) Renilla luciferase activity (normalized to firefly luciferase and presented as percent of mock-transfected control) following co-transfection of miR-513, miR-769-3p, pooled miR-513 and miR-769-3p inhibitors and/or control miRNA inhibitor with the indicated reporters into HEK 293 cells (n = 3) is illustrated. *, p < 0.005. Contributed by Glen Borchert and Brian Gilmore.
Figure A-4. Endogenous MAIDs are targets for miR-513 and miR-769-3p repression.
(A) A cartoon depicts the DFFA 3'UTR and the localization of nine distinct MAIDs (lines above the 3'UTR). (B) Alignment of the nine DFFA 3'UTR MAIDs commonly deaminated in ESTs is represented. MAID sequences are shaded. Four MAIDs contain GU wobbles from consensus (bold). (C) Alignment of DFFA_1 sequences from independent DFFA clones isolated from HEK 293 cells and NB7 cells is shown. DFFA_1 was deaminated in NB7 cells (bold) but not in HEK 293 cells. RT reactions were performed using a thermostable reverse transcriptase. (D) A diagram of DFFA 3'UTR reporter constructs is shown. In DFFA-Edited (-E) and DFFA-Unedited (-U), the Renilla 3' UTRs are the cloned DFFA 3'UTRs from NB7 and HEK 293 cells, respectively. DFFA-E nucleotides differing from DFFA-U are bolded and underscored (compare NB7_2 and 293_1, detailed in panel C). (E) Luciferase assays performed as in Figure A-3B, except with the reporter constructs illustrated (n = 3). *, p < 0.005. Contributed by Glen Borchert and Brian Gilmore.
(A) Relative miR-769-3p RNA levels in HEK 293, A549, HT1080 and NB7 cell lines are shown as determined by quantitative PCR. MiR-513 was not detected in these cell lines. (B) MiR-769-3p over-expression reduces DFFA levels specifically in NB7 cells. Endogenous DFFA protein levels in NB7 and HEK293 cells were determined by western blot densitometry. The ratio of DFFA levels in NB7 / HEK293 is shown. (C) Western blot analysis of endogenous DFFA in HEK 293 and NB7 cell lysates following transfection of miR-769 as indicated. Representative blots for DFFA and β-catenin (loading control) are shown. Relative DFFA levels were calculated as band intensity ratios of DFFA to β-catenin and normalized to mock (leftmost bar in each graph). 400, 200 and 100 denote 400 ng, 200 ng and 100 ng of miR-769 expression vector, respectively. Contributed by Glen Borchert and Brian Gilmore.
REFERENCES
Anderson EM, Birmingham A, Baskerville S, Reynolds A, Maksimova E, Leake D, Fedorov Y, Karpilow J, Khvorova A (2008) Experimental validation of the importance of seed complement frequency to siRNA specificity. RNA 14: 853-861
Athanasiadis A, Rich A, Maas S (2004) Widespread A-to-I RNA editing of Alu-containing mRNAs in the human transcriptome. PLoS Biol 2: e391
Azuma-Mukai A (2008) Characterization of endogenous human Argonautes and their miRNA partners in RNA silencing. Proc Natl Acad Sci USA 105: 7964-7969
Bass BL (2002) RNA editing by adenosine deaminases that act on RNA. Annu Rev Biochem 71: 817-846
Bennett EA, Keller H, Mills RE, Schmidt S, Moran JV, Weichenrieder O, Devine SE (2008) Active Alu retrotransposons in the human genome. Genome Res 18: 1875-1883
Berezikov E, Chung WJ, Willis J, Cuppen E, Lai EC (2007) Mammalian mirtron genes. Mol Cell 28: 328-336
Birmingham A, Anderson E, Sullivan K, Reynolds A, Boese Q, Leake D, Karpilow J, Khvorova A (2007) A protocol for designing siRNAs with high functionality and specificity. Nat Protoc 2: 2068-2078
Birmingham A, Anderson EM, Reynolds A, Ilsley-Tyree D, Leake D, Fedorov Y, Baskerville S, Maksimova E, Robinson K, Karpilow J, Marshall WS, Khvorova A (2006) 3' UTR seed matches, but not overall identity, are associated with RNAi off-targets. Nat Methods 3: 199-204
Blankenberg D, Von Kuster G, Coraor N, Ananda G, Lazarus R, Mangan M, Nekrutenko A, Taylor J (2010) Galaxy: a web-based genome analysis tool for experimentalists. Curr Protoc Mol Biol Chapter 19: Unit 19 10 11-21
Bohnsack MT, Czaplinski K, Gorlich D (2004) Exportin 5 is a RanGTP-dependent dsRNA-binding protein that mediates nuclear export of pre-miRNAs. RNA 10: 185-191
Borchert G, Gilmore B, Spengler R, Xing Y, Lanier W, Bhattacharya D, Davidson B (2009) Adenosine deamination in human transcripts generates novel microRNA binding sites. Hum Mol Genet 18: 4801-4807
Borchert GM, Lanier W, Davidson BL (2006) RNA polymerase III transcribes human microRNAs. Nat Struct Mol Biol 13: 1097-1101
Boudreau RL, Martins I, Davidson BL (2009a) Artificial MicroRNAs as siRNA Shuttles: Improved Safety as Compared to shRNAs In vitro and In vivo. Mol Ther 17: 169-175
Boudreau RL, McBride JL, Martins I, Shen S, Xing Y, Carter BJ, Davidson BL (2009b) Nonallele-specific silencing of mutant and wild-type huntingtin demonstrates therapeutic efficacy in Huntington's disease mice. Mol Ther 17: 1053-1063
Boudreau RL, Spengler RM, Davidson BL (2011) Rational Design of Therapeutic siRNAs: Minimizing Off-targeting Potential to Improve the Safety of RNAi Therapy for Huntington's Disease. Mol Ther 19: 2169-2177
Boudreau RL, Spengler RM, Hylock RH, Kusenda BJ, Davis HA, Eichmann DA, Davidson BL (2013) siSPOTR: a tool for designing highly specific and potent siRNAs for human and mouse. Nucleic Acids Res 41
Bovia F, Wolff N, Ryser S, Strub K (1997) The SRP9/14 subunit of the human signal recognition particle binds to a variety of Alu-like RNAs and with higher affinity than its mouse homolog. Nucleic Acids Res 25: 318-326
Bracken CP (2008) A double-negative feedback loop between ZEB1-SIP1 and the microRNA-200 family regulates epithelial-mesenchymal transition. Cancer Res 68: 7846-7854
Bramsen JB, Pakula MM, Hansen TB, Bus C, Langkjaer N, Odadzic D, Smicius R, Wengel SL, Chattopadhyaya J, Engels JW, Herdewijn P, Wengel J, Kjems J (2010) A screen of chemical modifications identifies position-specific modification by UNA to most potently reduce siRNA off-target effects. Nucleic Acids Res 38: 5761-5773
Burchard J, Jackson AL, Malkov V, Needham RH, Tan Y, Bartz SR, Dai H, Sachs AB, Linsley PS (2009) MicroRNA-like off-target transcript regulation by siRNAs is species specific. RNA 15: 308-315
Burns CM, Chu H, Rueter SM, Hutchinson LK, Canton H, Sanders-Bush E, Emeson RB (1997) Regulation of serotonin-2C receptor G-protein coupling by RNA editing. Nature 387: 303-308
Caffrey DR, Zhao J, Song Z, Schaffer ME, Haney SA, Subramanian RR, Seymour AB, Hughes JD (2011) siRNA off-target effects can be reduced at concentrations that match their individual potency. PLoS One 6: e21503
Calin GA (2002) Frequent deletions and down-regulation of micro- RNA genes miR15 and miR16 at 13q14 in chronic lymphocytic leukemia. Proc Natl Acad Sci USA 99: 15524-15529
Cesana M, Cacchiarelli D, Legnini I, Santini T, Sthandier O, Chinappi M, Tramontano A, Bozzoni I (2011) A long noncoding RNA controls muscle differentiation by functioning as a competing endogenous RNA. Cell 147: 358-369
Chang DY, Hsu K, Maraia RJ (1996) Monomeric scAlu and nascent dimeric Alu RNAs induced by adenovirus are assembled into SRP9/14-containing RNPs in HeLa cells. Nucleic Acids Res 24: 4165-4170
Chen J, Bardes EE, Aronow BJ, Jegga AG (2009) ToppGene Suite for gene list enrichment analysis and candidate gene prioritization. Nucleic Acids Res 37: W305-311
Chen LL, Carmichael GG (2008) Gene regulation by SINES and inosines: biological consequences of A-to-I editing of Alu element inverted repeats. Cell Cycle 7: 3294-3301
Chi JT, Chang HY, Wang NN, Chang DS, Dunphy N, Brown PO (2003) Genomewide view of gene silencing by small interfering RNAs. Proc Natl Acad Sci U S A 100: 6343-6346
Chiang HR, Schoenfeld LW, Ruby JG, Auyeung VC, Spies N, Baek D, Johnston WK, Russ C, Luo S, Babiarz JE, Blelloch R, Schroth GP, Nusbaum C, Bartel DP (2010) Mammalian microRNAs: experimental evaluation of novel and previously annotated genes. Genes Dev 24: 992-1009
Cummins JM, He Y, Leary RJ, Pagliarini R, Diaz LA, Jr., Sjoblom T, Barad O, Bentwich Z, Szafranska AE, Labourier E, Raymond CK, Roberts BS, Juhl H, Kinzler KW, Vogelstein B, Velculescu VE (2006) The colorectal microRNAome. Proc Natl Acad Sci U S A 103: 3687-3692
Davidson BL, McCray PB, Jr. (2011) Current prospects for RNA interference-based therapies. Nat Rev Genet 12: 329-340
Davis BN, Hilyard AC, Lagna G, Hata A (2008) SMAD proteins control DROSHA-mediated microRNA maturation. Nature 454: 56-61
Davis-Dusenbery BN, Hata A (2010) Mechanisms of control of microRNA biogenesis. J Biochem 148: 381-392
Diez-Roux G, Banfi S, Sultan M, Geffers L, Anand S, Rozado D, Magen A, Canidio E, Pagani M, Peluso I, Lin-Marq N, Koch M, Bilio M, Cantiello I, Verde R, De Masi C, Bianchi SA, Cicchini J, Perroud E, Mehmeti S, Dagand E, Schrinner S, Nürnberger A, Schmidt K, Metz K, Zwingmann C, Brieske N, Springer C, Hernandez AM, Herzog S, Grabbe F, Sieverding C, Fischer B, Schrader K, Brockmeyer M, Dettmer S, Helbig C, Alunni V, Battaini MA, Mura C, Henrichsen CN, Garcia-Lopez R, Echevarria D, Puelles E, Garcia-Calero E, Kruse S, Uhr M, Kauck C, Feng G, Milyaev N, Ong CK, Kumar L, Lam M, Semple CA, Gyenesei A, Mundlos S, Radelof U, Lehrach H, Sarmientos P, Reymond A, Davidson DR, Dollé P, Antonarakis SE, Yaspo ML, Martinez S, Baldock RA, Eichele G, Ballabio A (2011) A high-resolution anatomical atlas of the transcriptome in the mouse embryo. PLoS Biol 9: e1000582
Dinger ME, Amaral PP, Mercer TR, Pang KC, Bruce SJ, Gardiner BB, Askarian-Amiri ME, Ru K, Soldà G, Simons C, Sunkin SM, Crowe ML, Grimmond SM, Perkins AC, Mattick JS (2008) Long noncoding RNAs in mouse embryonic stem cell pluripotency and differentiation. Genome Res 18: 1433-1445
Doench JG, Sharp PA (2004) Specificity of microRNA target selection in translational repression. Genes Dev 18: 504-511
Farh KK, Grimson A, Jan C, Lewis BP, Johnston WK, Lim LP, Burge CB, Bartel DP (2005) The widespread impact of mammalian MicroRNAs on mRNA repression and evolution. Science 310: 1817-1821
Fedorov Y, Anderson EM, Birmingham A, Reynolds A, Karpilow J, Robinson K, Leake D, Marshall WS, Khvorova A (2006) Off-target effects by siRNA can induce toxic phenotype. RNA 12: 1188-1196
Friedman RC, Farh KK, Burge CB, Bartel DP (2009) Most mammalian mRNAs are conserved targets of microRNAs. Genome Res 19: 92-105
Fujita PA, Rhead B, Zweig AS, Hinrichs AS, Karolchik D, Cline MS, Goldman M, Barber GP, Clawson H, Coelho A, Diekhans M, Dreszer TR, Giardine BM, Harte RA, Hillman-Jackson J, Hsu F, Kirkup V, Kuhn RM, Learned K, Li CH, Meyer LR, Pohl A, Raney BJ, Rosenbloom KR, Smith KE, Haussler D, Kent WJ (2011) The UCSC Genome Browser database: update 2011. Nucleic Acids Res 39: D876-882
Garcia DM, Baek D, Shin C, Bell GW, Grimson A, Bartel DP (2011) Weak seed-pairing stability and high target-site abundance decrease the proficiency of lsy-6 and other microRNAs. Nat Struct Mol Biol 18: 1139-1146
Giardine B, Riemer C, Hardison RC, Burhans R, Elnitski L, Shah P, Zhang Y, Blankenberg D, Albert I, Taylor J, Miller W, Kent WJ, Nekrutenko A (2005) Galaxy: a platform for interactive large-scale genome analysis. Genome Res 15: 1451-1455
Goecks J, Nekrutenko A, Taylor J (2010) Galaxy: a comprehensive approach for supporting accessible, reproducible, and transparent computational research in the life sciences. Genome Biol 11: R86
Gregory RI (2004) The Microprocessor complex mediates the genesis of microRNAs. Nature 432: 235-240
Griffiths-Jones S, Grocock RJ, van Dongen S, Bateman A, Enright AJ (2006) miRBase: microRNA sequences, targets and gene nomenclature. Nucleic Acids Res 34: D140-144
Grimm D, Streetz KL, Jopling CL, Storm TA, Pandey K, Davis CR, Marion P, Salazar F, Kay MA (2006) Fatality in mice due to oversaturation of cellular microRNA/short hairpin RNA pathways. Nature 441: 537-541
Guo L, Lu Z (2010) Global expression analysis of miRNA gene cluster and family based on isomiRs from deep sequencing data. Comput Biol Chem 34: 165-171
Han J (2004) The Drosha-DGCR8 complex in primary microRNA processing. Genes Dev 18: 3016-3027
Han J (2006) Molecular basis for the recognition of primary microRNAs by the Drosha-DGCR8 complex. Cell 125: 887-901
Hsu K, Chang DY, Maraia RJ (1995) Human signal recognition particle (SRP) Alu-associated protein also binds Alu interspersed repeat sequence RNAs. Characterization of human SRP9. J Biol Chem 270: 10179-10186
Hubbard TJ, Aken BL, Beal K, Ballester B, Caccamo M, Chen Y, Clarke L, Coates G, Cunningham F, Cutts T, Down T, Dyer SC, Fitzgerald S, Fernandez-Banet J, Graf S, Haider S, Hammond M, Herrero J, Holland R, Howe K, Howe K, Johnson N, Kahari A, Keefe D, Kokocinski F, Kulesha E, Lawson D, Longden I, Melsopp C, Megy K, Meidl P, Ouverdin B, Parker A, Prlic A, Rice S, Rios D, Schuster M, Sealy I, Severin J, Slater G, Smedley D, Spudich G, Trevanion S, Vilella A, Vogel J, White S, Wood M, Cox T, Curwen V, Durbin R, Fernandez-Suarez XM, Flicek P, Kasprzyk A, Proctor G, Searle S, Smith J, Ureta-Vidal A, Birney E (2007) Ensembl 2007. Nucleic Acids Res 35: D610-617
Huesken D, Lange J, Mickanin C, Weiler J, Asselbergs F, Warner J, Meloon B, Engel S, Rosenberg A, Cohen D, Labow M, Reinhardt M, Natt F, Hall J (2005) Design of a genome-wide siRNA library using an artificial neural network. Nat Biotechnol 23: 995-1001
Hundley HA, Krauchuk AA, Bass BL (2008) C. elegans and H. sapiens mRNAs with edited 3' UTRs are present on polysomes. RNA 14: 2050-2060
Jackson AL, Bartz SR, Schelter J, Kobayashi SV, Burchard J, Mao M, Li B, Cavet G, Linsley PS (2003) Expression profiling reveals off-target gene regulation by RNAi. Nat Biotechnol 21: 635-637
Jackson AL, Burchard J, Leake D, Reynolds A, Schelter J, Guo J, Johnson JM, Lim L, Karpilow J, Nichols K, Marshall W, Khvorova A, Linsley PS (2006) Position-specific chemical modification of siRNAs reduces "off-target" transcript silencing. RNA 12: 1197-1205
Jackson AL, Burchard J, Schelter J, Chau BN, Cleary M, Lim L, Linsley PS (2006) Widespread siRNA "off-target" transcript silencing mediated by seed region sequence complementarity. RNA 12: 1179-1187
Jackson AL, Linsley PS (2010) Recognizing and avoiding siRNA off-target effects for target identification and therapeutic application. Nat Rev Drug Discov 9: 57-67
Kaneko H, Dridi S, Tarallo V, Gelfand BD, Fowler BJ, Cho WG, Kleinman ME, Ponicsan SL, Hauswirth WW, Chiodo VA, Karikó K, Yoo JW, Lee DK, Hadziahmetovic M, Song Y, Misra S, Chaudhuri G, Buaas FW, Braun RE, Hinton DR, Zhang Q, Grossniklaus HE, Provis JM, Madigan MC, Milam AH, Justice NL, Albuquerque RJ, Blandford AD, Bogdanovich S, Hirano Y, Witta J, Fuchs E, Littman DR, Ambati BK, Rudin CM, Chong MM, Provost P, Kugel JF, Goodrich JA, Dunaief JL, Baffi JZ, Ambati J (2011) DICER1 deficit induces Alu RNA toxicity in age-related macular degeneration. Nature 471: 325-330
Karolchik D, Hinrichs AS, Furey TS, Roskin KM, Sugnet CW, Haussler D, Kent WJ (2004) The UCSC Table Browser data retrieval tool. Nucleic Acids Res 32: D493-496
Karreth FA, Tay Y, Perna D, Ala U, Tan SM, Rust AG, DeNicola G, Webster KA, Weiss D, Perez-Mancera PA, Krauthammer M, Halaban R, Provero P, Adams DJ, Tuveson DA, Pandolfi PP (2011) In vivo identification of tumor-suppressive PTEN ceRNAs in an oncogenic BRAF-induced mouse model of melanoma. Cell 147: 382-395
Kawahara Y, Megraw M, Kreider E, Iizasa H, Valente L, Hatzigeorgiou AG, Nishikura K (2008) Frequency and fate of microRNA editing in human brain. Nucleic Acids Res 36: 5270-5280
Kawahara Y, Zinshteyn B, Chendrimada TP, Shiekhattar R, Nishikura K (2007a) RNA editing of the microRNA-151 precursor blocks cleavage by the Dicer-TRBP complex. EMBO Rep 8: 763-769
Kawahara Y, Zinshteyn B, Sethupathy P, Iizasa H, Hatzigeorgiou AG, Nishikura K (2007b) Redirection of silencing targets by adenosine-to-inosine editing of miRNAs. Science 315: 1137-1140
Kent WJ, Sugnet CW, Furey TS, Roskin KM, Pringle TH, Zahler AM, Haussler D (2002) The human genome browser at UCSC. Genome Res 12: 996-1006
Khvorova A, Reynolds A, Jayasena SD (2003) Functional siRNAs and miRNAs exhibit strand bias. Cell 115: 209-216
Kim DD, Kim TT, Walsh T, Kobayashi Y, Matise TC, Buyske S, Gabriel A (2004) Widespread RNA editing of embedded alu elements in the human transcriptome. Genome Res 14: 1719-1725
Kim YK, Kim VN (2007) Processing of intronic microRNAs. EMBO J 26: 775-783
Kino T, Hurt DE, Ichijo T, Nader N, Chrousos GP (2010) Noncoding RNA gas5 is a growth arrest- and starvation-associated repressor of the glucocorticoid receptor. Sci Signal 3: ra8
Kozomara A, Griffiths-Jones S (2011) miRBase: integrating microRNA annotation and deep-sequencing data. Nucleic Acids Res 39: D152-157
Krutzfeldt J, Rajewsky N, Braich R, Rajeev KG, Tuschl T, Manoharan M, Stoffel M (2005) Silencing of microRNAs in vivo with 'antagomirs'. Nature 438: 685-689
Lai EC (2002) Micro RNAs are complementary to 3' UTR sequence motifs that mediate negative post-transcriptional regulation. Nat Genet 30: 363-364
Lal A, Navarro F, Maher CA, Maliszewski LE, Yan N, O'Day E, Chowdhury D, Dykxhoorn DM, Tsai P, Hofmann O, Becker KG, Gorospe M, Hide W, Lieberman J (2009) miR-24 Inhibits cell proliferation by targeting E2F2, MYC, and other cell-cycle genes via binding to "seedless" 3'UTR microRNA recognition elements. Mol Cell 35: 610-625
Lander ES, Linton LM, Birren B, Nusbaum C, Zody MC, Baldwin J, Devon K, Dewar K, Doyle M, FitzHugh W, Funke R, Gage D, Harris K, Heaford A, Howland J, Kann L, Lehoczky J, LeVine R, McEwan P, McKernan K, Meldrim J, Mesirov JP, Miranda C, Morris W, Naylor J, Raymond C, Rosetti M, Santos R, Sheridan A, Sougnez C, Stange-Thomann N, Stojanovic N, Subramanian A, Wyman D, Rogers J, Sulston J, Ainscough R, Beck S, Bentley D, Burton J, Clee C, Carter N, Coulson A, Deadman R, Deloukas P, Dunham A, Dunham I, Durbin R, French L, Grafham D, Gregory S, Hubbard T, Humphray S, Hunt A, Jones M, Lloyd C, McMurray A, Matthews L, Mercer S, Milne S, Mullikin JC, Mungall A, Plumb R, Ross M, Shownkeen R, Sims S, Waterston RH, Wilson RK, Hillier LW, McPherson JD, Marra MA, Mardis ER, Fulton LA, Chinwalla AT, Pepin KH, Gish WR, Chissoe SL, Wendl MC, Delehaunty KD, Miner TL, Delehaunty A, Kramer JB, Cook LL, Fulton RS, Johnson DL, Minx PJ, Clifton SW, Hawkins T, Branscomb E, Predki P, Richardson P, Wenning S, Slezak T, Doggett N, Cheng JF, Olsen A, Lucas S, Elkin C, Uberbacher E, Frazier M, Gibbs RA, Muzny DM, Scherer SE, Bouck JB, Sodergren EJ, Worley KC, Rives CM, Gorrell JH, Metzker ML, Naylor SL, Kucherlapati RS, Nelson DL, Weinstock GM, Sakaki Y, Fujiyama A, Hattori M, Yada T, Toyoda A, Itoh T, Kawagoe C, Watanabe H, Totoki Y, Taylor T, Weissenbach J, Heilig R, Saurin W, Artiguenave F, Brottier P, Bruls T, Pelletier E, Robert C, Wincker P, Smith DR, Doucette-Stamm L, Rubenfield M, Weinstock K, Lee HM, Dubois J, Rosenthal A, Platzer M, Nyakatura G, Taudien S, Rump A, Yang H, Yu J, Wang J, Huang G, Gu J, Hood L, Rowen L, Madan A, Qin S, Davis RW, Federspiel NA, Abola AP, Proctor MJ, Myers RM, Schmutz J, Dickson M, Grimwood J, Cox DR, Olson MV, Kaul R, Raymond C, Shimizu N, Kawasaki K, Minoshima S, Evans GA, Athanasiou M, Schultz R, Roe BA, Chen F, Pan H, Ramser J, Lehrach H, Reinhardt R, McCombie WR, de la Bastide M, Dedhia N, Blocker H, Hornischer K, Nordsiek G, Agarwala R, Aravind L, Bailey JA, Bateman A, Batzoglou S, Birney E, Bork P, Brown DG, Burge CB, Cerutti L, Chen HC, Church D, Clamp M, Copley RR, Doerks T, Eddy SR, Eichler EE, Furey TS, Galagan J, Gilbert JG, Harmon C, Hayashizaki Y, Haussler D, Hermjakob H, Hokamp K, Jang W, Johnson LS, Jones TA, Kasif S, Kasprzyk A, Kennedy S, Kent WJ, Kitts P, Koonin EV, Korf I, Kulp D, Lancet D, Lowe TM, McLysaght A, Mikkelsen T, Moran JV, Mulder N, Pollara VJ, Ponting CP, Schuler G, Schultz J, Slater G, Smit AF, Stupka E, Szustakowski J, Thierry-Mieg D, Thierry-Mieg J, Wagner L, Wallis J, Wheeler R, Williams A, Wolf YI, Wolfe KH, Yang SP, Yeh RF, Collins F, Guyer MS, Peterson J, Felsenfeld A, Wetterstrand KA, Patrinos A, Morgan MJ, de Jong P, Catanese JJ, Osoegawa K, Shizuya H, Choi S, Chen YJ (2001) Initial sequencing and analysis of the human genome. Nature 409: 860-921
Lehnert S, Van Loo P, Thilakarathne PJ, Marynen P, Verbeke G, Schuit FC (2009) Evidence for co-evolution between human microRNAs and Alu-repeats. PLoS One 4: e4456
Levanon EY, Eisenberg E, Yelin R, Nemzer S, Hallegger M, Shemesh R, Fligelman ZY, Shoshan A, Pollock SR, Sztybel D, Olshansky M, Rechavi G, Jantsch MF (2004) Systematic identification of abundant A-to-I editing sites in the human transcriptome. Nat Biotechnol 22: 1001-1005
Lewis BP, Burge CB, Bartel DP (2005) Conserved seed pairing, often flanked by adenosines, indicates that thousands of human genes are microRNA targets. Cell 120: 15-20
Lewis BP, Shih IH, Jones-Rhoades MW, Bartel DP, Burge CB (2003) Prediction of mammalian microRNA targets. Cell 115: 787-798
Li JB, Levanon EY, Yoon JK, Aach J, Xie B, Leproust E, Zhang K, Gao Y, Church GM (2009) Genome-wide identification of human RNA editing sites by parallel DNA capturing and sequencing. Science 324: 1210-1213
Liang H, Landweber LF (2007) Hypothesis: RNA editing of microRNA target sites in humans? RNA 13: 463-467
Luciano DJ, Mirsky H, Vendetti NJ, Maas S (2004) RNA editing of a miRNA precursor. RNA 10: 1174-1177
Lund E, Guttinger S, Calado A, Dahlberg JE, Kutay U (2004) Nuclear export of microRNA precursors. Science 303: 95-98
Ma Y, Creanga A, Lum L, Beachy PA (2006) Prevalence of off-target effects in Drosophila RNA interference screens. Nature 443: 359-363
MacRae IJ (2006) Structural basis for double-stranded RNA processing by Dicer. Science 311: 195-198
Martin JN, Wolken N, Brown T, Dauer WT, Ehrlich ME, Gonzalez-Alegre P (2011) Lethal toxicity caused by expression of shRNA in the mouse striatum: implications for therapeutic design. Gene Ther
Martí E, Pantano L, Bañez-Coronel M, Llorens F, Miñones-Moyano E, Porta S, Sumoy L, Ferrer I, Estivill X (2010) A myriad of miRNA variants in control and Huntington's disease brain regions detected by massively parallel sequencing. Nucleic Acids Res 38: 7219-7235
Matveeva O, Nechipurenko Y, Rossi L, Moore B, Saetrom P, Ogurtsov AY, Atkins JF, Shabalina SA (2007) Comparison of approaches for rational siRNA design leading to a new efficient and transparent method. Nucleic Acids Res 35: e63
McBride JL, Boudreau RL, Harper SQ, Staber PD, Monteys AM, Martins I, Gilmore BL, Burstein H, Peluso RW, Polisky B, Carter BJ, Davidson BL (2008) Artificial miRNAs mitigate shRNA-mediated toxicity in the brain: Implications for the therapeutic development of RNAi. Proc Natl Acad Sci U S A 105: 5868-5873
McBride JL, Pitzer MR, Boudreau RL, Dufour B, Hobbs T, Ojeda SR, Davidson BL (2011) Preclinical safety of RNAi-mediated HTT suppression in the rhesus macaque as a potential therapy for Huntington's disease. Mol Ther 19: 2152-2162
Mercer TR, Dinger ME, Sunkin SM, Mehler MF, Mattick JS (2008) Specific expression of long noncoding RNAs in the mouse brain. Proc Natl Acad Sci U S A 105: 716-721
Miller V, Gouvion C, Davidson B, Paulson H (2004) Targeting Alzheimer's disease genes with RNA interference: an efficient strategy for silencing mutant allele. Nucleic Acids Res 32: 661-668
Moffat J, Grueneberg DA, Yang X, Kim SY, Kloepfer AM, Hinkle G, Piqani B, Eisenhaure TM, Luo B, Grenier JK, Carpenter AE, Foo SY, Stewart SA, Stockwell BR, Hacohen N, Hahn WC, Lander ES, Sabatini DM, Root DE (2006) A lentiviral RNAi library for human and mouse genes applied to an arrayed viral high-content screen. Cell 124: 1283-1298
Monteys AM, Spengler RM, Wan J, Tecedor L, Lennox KA, Xing Y, Davidson BL (2010) Structure and activity of putative intronic miRNA promoters. RNA 16: 495-505
Naito Y, Yamada T, Ui-Tei K, Morishita S, Saigo K (2004) siDirect: highly effective, target-specific siRNA design software for mammalian RNA interference. Nucleic Acids Res 32: W124-129
Newman MA, Thomson JM, Hammond SM (2008) Lin-28 interaction with the let-7 precursor loop mediates regulated microRNA processing. RNA 14: 1539-1549
Ng L, Bernard A, Lau C, Overly CC, Dong HW, Kuan C, Pathak S, Sunkin SM, Dang C, Bohland JW, Bokil H, Mitra PP, Puelles L, Hohmann J, Anderson DJ, Lein ES, Jones AR, Hawrylycz M (2009) An anatomic gene expression atlas of the adult mouse brain. Nat Neurosci 12: 356-362
Nielsen CB, Shomron N, Sandberg R, Hornstein E, Kitzman J, Burge CB (2007) Determinants of targeting by endogenous and exogenous microRNAs and siRNAs. RNA 13: 1894-1910
Okamura K, Hagen JW, Duan H, Tyler DM, Lai EC (2007) The mirtron pathway generates microRNA-class regulatory RNAs in Drosophila. Cell 130: 89-100
Osenberg S, Dominissini D, Rechavi G, Eisenberg E (2009) Widespread cleavage of A-to-I hyperediting substrates. RNA 15: 1632-1639
Packer AN, Xing Y, Harper SQ, Jones L, Davidson BL (2008) The bifunctional microRNA miR-9/miR-9* regulates REST and CoREST and is downregulated in Huntington's disease. J Neurosci 28: 14341-14346
Piriyapongsa J, Jordan IK (2007) A family of human microRNA genes from miniature inverted-repeat transposable elements. PLoS One 2: e203
Piriyapongsa J, Marino-Ramirez L, Jordan IK (2007) Origin and evolution of human microRNAs from transposable elements. Genetics 176: 1323-1337
Piskounova E, Polytarchou C, Thornton JE, LaPierre RJ, Pothoulakis C, Hagan JP, Iliopoulos D, Gregory RI (2011) Lin28A and Lin28B inhibit let-7 microRNA biogenesis by distinct mechanisms. Cell 147: 1066-1079
Poliseno L, Salmena L, Zhang J, Carver B, Haveman WJ, Pandolfi PP (2010) A coding-independent function of gene and pseudogene mRNAs regulates tumour biology. Nature 465: 1033-1038
Provost P, Dishart D, Doucet J, Frendewey D, Samuelsson B, Radmark O (2002) Ribonuclease activity and RNA binding of recombinant human Dicer. EMBO J 21: 5864-5874
Pruitt KD, Tatusova T, Maglott DR (2005) NCBI Reference Sequence (RefSeq): a curated non-redundant sequence database of genomes, transcripts and proteins. Nucleic Acids Res 33: D501-504
Ray DA, Batzer MA (2005) Tracking Alu evolution in New World primates. BMC Evol Biol 5: 51
Rinn JL, Chang HY (2012) Genome regulation by long noncoding RNAs. Annu Rev Biochem 81: 145-166
Saito K, Ishizuka A, Siomi H, Siomi MC (2005) Processing of pre-microRNAs by the Dicer-1-Loquacious complex in Drosophila cells. PLoS Biol 3: e235
Scadden AD (2005) The RISC subunit Tudor-SN binds to hyper-edited double-stranded RNA and promotes its cleavage. Nat Struct Mol Biol 12: 489-496
Schirle NT, MacRae IJ (2012) The crystal structure of human Argonaute2. Science 336: 1037-1040
Schultz N, Marenstein DR, De Angelis DA, Wang WQ, Nelander S, Jacobsen A, Marks DS, Massague J, Sander C (2011) Off-target effects dominate a large-scale RNAi screen for modulators of the TGF-beta pathway and reveal microRNA regulation of TGFBR2. Silence 2: 3
Schwarz DS, Hutvagner G, Du T, Xu Z, Aronin N, Zamore PD (2003) Asymmetry in the assembly of the RNAi enzyme complex. Cell 115: 199-208
Semizarov D, Frost L, Sarthy A, Kroeger P, Halbert DN, Fesik SW (2003) Specificity of short interfering RNA determined through gene expression signatures. Proc Natl Acad Sci U S A 100: 6347-6352
Shin C, Nam JW, Farh KK, Chiang HR, Shkumatava A, Bartel DP (2010) Expanding the microRNA targeting code: functional sites with centered pairing. Mol Cell 38: 789-802
Sigoillot FD, Lyman S, Huckins JF, Adamson B, Chung E, Quattrochi B, King RW (2012) A bioinformatics method identifies prominent off-targeted transcripts in RNAi screens. Nat Methods 9: 363-366
Smalheiser NR, Torvik VI (2005) Mammalian microRNAs derived from genomic repeats. Trends Genet 21: 322-326
Smalheiser NR, Torvik VI (2006a) Alu elements within human mRNAs are probable microRNA targets. Trends Genet 22: 532-536
Smalheiser NR, Torvik VI (2006b) Complications in mammalian microRNA target prediction. Methods Mol Biol 342: 115-127
Su AI, Wiltshire T, Batalov S, Lapp H, Ching KA, Block D, Zhang J, Soden R, Hayakawa M, Kreiman G, Cooke MP, Walker JR, Hogenesch JB (2004) A gene atlas of the mouse and human protein-encoding transcriptomes. Proc Natl Acad Sci U S A 101: 6062-6067
Sumazin P, Yang X, Chiu HS, Chung WJ, Iyer A, Llobet-Navas D, Rajbhandari P, Bansal M, Guarnieri P, Silva J, Califano A (2011) An extensive microRNA-mediated network of RNA-RNA interactions regulates established oncogenic pathways in glioblastoma. Cell 147: 370-381
Tan GS, Garchow BG, Liu X, Yeung J, Morris JP, Cuellar TL, McManus MT, Kiriakidou M (2009) Expanded RNA-binding activities of mammalian Argonaute 2. Nucleic Acids Res 37: 7533-7545
Tay Y, Kats L, Salmena L, Weiss D, Tan SM, Ala U, Karreth F, Poliseno L, Provero P, Di Cunto F, Lieberman J, Rigoutsos I, Pandolfi PP (2011) Coding-independent regulation of the tumor suppressor PTEN by competing endogenous mRNAs. Cell 147: 344-357
Vaish N, Chen F, Seth S, Fosnaugh K, Liu Y, Adami R, Brown T, Chen Y, Harvie P, Johns R, Severson G, Granger B, Charmley P, Houston M, Templin MV, Polisky B (2011) Improved specificity of gene silencing by siRNAs containing unlocked nucleobase analogs. Nucleic Acids Res 39: 1823-1832
Vasudevan S, Tong Y, Steitz JA (2007) Switching from repression to activation: microRNAs can up-regulate translation. Science 318: 1931-1934
Vert JP, Foveau N, Lajaunie C, Vandenbrouck Y (2006) An accurate and interpretable model for siRNA efficacy prediction. BMC Bioinformatics 7: 520
Wahlstedt H, Daniel C, Enstero M, Ohman M (2009) Large-scale mRNA sequencing determines global regulation of RNA editing during brain development. Genome Res 19: 978-986
Wang KC, Chang HY (2011) Molecular mechanisms of long noncoding RNAs. Mol Cell 43: 904-914
Wang X, Wang X, Varma RK, Beauchamp L, Magdaleno S, Sendera TJ (2009) Selection of hyperfunctional siRNAs with improved potency and specificity. Nucleic Acids Res 37: e152
Wheeler DL, Barrett T, Benson DA, Bryant SH, Canese K, Chetvernin V, Church DM, DiCuccio M, Edgar R, Federhen S, Geer LY, Kapustin Y, Khovayko O, Landsman D, Lipman DJ, Madden TL, Maglott DR, Ostell J, Miller V, Pruitt KD, Schuler GD, Sequeira E, Sherry ST, Sirotkin K, Souvorov A, Starchenko G, Tatusov RL, Tatusova TA, Wagner L, Yaschenko E (2007) Database resources of the National Center for Biotechnology Information. Nucleic Acids Res 35: D5-12
Wu C, Orozco C, Boyer J, Leglise M, Goodale J, Batalov S, Hodge CL, Haase J, Janes J, Huss JW, 3rd, Su AI (2009) BioGPS: an extensible and customizable portal for querying and organizing gene annotation resources. Genome Biol 10: R130
Yang JH, Shao P, Zhou H, Chen YQ, Qu LH (2010) deepBase: a database for deeply annotating and mining deep sequencing data. Nucleic Acids Res 38: D123-130
Yang W, Chendrimada TP, Wang Q, Higuchi M, Seeburg PH, Shiekhattar R, Nishikura K (2006) Modulation of microRNA processing and expression through RNA editing by ADAR deaminases. Nat Struct Mol Biol 13: 13-21
Yi R, Doehle BP, Qin Y, Macara IG, Cullen BR (2005) Overexpression of exportin 5 enhances RNA interference mediated by short hairpin RNAs and microRNAs. RNA 11: 220-226
Yoshida M, Kaziro Y, Ukita T (1968) The modification of nucleosides and nucleotides. X. Evidence for the important role of inosine residue in codon recognition of yeast alanine tRNA. Biochim Biophys Acta 166: 646-655
Zhang XD, Santini F, Lacson R, Marine SD, Wu Q, Benetti L, Yang R, McCampbell A, Berger JP, Toolan DM, Stec EM, Holder DJ, Soper KA, Heyse JF, Ferrer M (2011) cSSMD: assessing collective activity for addressing off-target effects in genome-scale RNA interference screens. Bioinformatics 27: 2775-2781