High Throughput Single-stranded Plasma DNA Sequencing Using Thermostable Group II Intron Reverse Transcriptase

Douglas C. Wu and Alan M. Lambowitz
Institute of Cellular and Molecular Biology and Department of Molecular Biosciences, The University of Texas at Austin

Abstract
High-throughput DNA sequencing (DNA-seq) of cell-free DNA (cfDNA) in plasma and other bodily fluids is a powerful method for non-invasive prenatal testing and for the diagnosis (i.e., liquid biopsy) of cancer and other diseases. In healthy individuals, cfDNA in human plasma consists largely of ~160-bp DNA fragments derived from nucleosomes released by apoptosis of lymphoid and myeloid cells in blood. By contrast, in a variety of pathological conditions, plasma is enriched in DNA fragments released from dying cells in the affected tissues. These fragments can be identified by tissue-specific differences in nucleosome spacing, transcription factor (TF) occupancy (ref. 1), and DNA methylation sites (ref. 2), thereby providing diagnostic information. In cancer patients, a significant proportion of cfDNA originates from tumor cells and retains epigenetic features of the tumor tissue type (3-93% in one study, depending on the type of cancer; ref. 3). These features of cfDNA are best analyzed by single-stranded DNA sequencing (ssDNA-seq), which is better suited than conventional dsDNA-seq methods to the analysis of fragmented and nicked DNA (ref. 1). However, established methods for ssDNA-seq, which were originally developed for the analysis of ancient DNA, are inefficient, costly, and time-consuming (ref. 4). Here, we present an efficient method for ssDNA-seq that takes advantage of the beneficial properties of thermostable group II intron reverse transcriptases (TGIRTs), and we apply it to the analysis of cfDNA in human plasma. By exploiting a novel template-switching activity of TGIRTs to add DNA-seq adaptors, DNA-seq libraries can be constructed from small amounts of starting material in <2 h. In future research, we will use this method to analyze clinical samples in order to develop new cfDNA-based diagnostic procedures.

(A) Streamlined Protocol
(1) Plasma DNA (~2 ng): nicked double-stranded target DNA, (+) and (-) strands.
(2) Dephosphorylation and denaturation, yielding single-stranded target DNAs with 3'-OH ends.
(3) Template-switching by TGIRT from an R2 RNA (3'-blocker)/R2R DNA (3'-N overhang) duplex onto the target DNA, followed by alkaline treatment and cDNA clean-up.
(4) Adaptor ligation by thermostable 5' App DNA/RNA ligase: R1R adaptor (5'-App, 3'-blocker) ligated to the cDNA, followed by cDNA clean-up.
(5) PCR amplification with P5 and Barcode+P7 primers.
Step times shown in the schematic: 20 min, 20 min, and 60 min.

(B) Recovers Broad Size Range of DNA Fragments
[Plot: percent reads vs. insert size (50-400 bp), ssDNA-seq (ref. 1) vs. TGIRT-seq.]

(C) Nucleosome Positioning

(D) Cell of Origin Inferred by Transcription Factor Footprinting

[Remaining plots are not recoverable from the text. Each compares ssDNA-seq (ref. 1) with TGIRT-seq: adjusted WPS vs. distance to CTCF start site (-1000 to +1000 bp) for long (120-180 bp) and short (35-80 bp) fragments; density vs. distance to the nearest peak (-500 to +500 bp) between ssDNA-seq and TGIRT-seq peaks; fraction of peaks vs. distance to the nearest peak (100-500 bp); per-tissue values (scale 0-1) for blood, blood vessel, heart, kidney, lung, muscle, and prostate.]

Conclusion
• TGIRTs enable efficient ssDNA-seq that can be used to analyze cfDNA in human plasma and other bodily fluids.
• Identification of protein-binding features of cfDNA that provide information about the tissue of origin and have potential diagnostic applications.
• Enables effective library construction from degraded samples, bisulfite-treated DNA, and ancient DNA.

References
1. Snyder et al., Cell 2016
2. Sun et al., PNAS 2015
3. Butler et al., PLoS One 2015
4. Gansauge and Meyer, Nat Protoc 2013
5. Thurman et al., Nature 2012
6. Qin et al., RNA 2016

Grant Support and Conflict-of-interest Statement
Supported by NIH grants GM37949 and GM37951 and Welch Foundation Grant F-1607.
Thermostable group II intron reverse transcriptase (TGIRT) enzymes and methods for their use are the subject of patents and patent applications that have been licensed by the University of Texas at Austin and East Tennessee State University to InGex, LLC. A.M.L. and the University of Texas are minority equity holders in InGex, LLC, and A.M.L. and other present and former Lambowitz laboratory members receive royalty payments from sales of TGIRT enzymes and licensing of intellectual property.
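The insert-size comparison in panel B and the ~160-bp nucleosomal peak described in the Abstract both come down to tabulating fragment lengths. Below is a minimal sketch under several assumptions. Insert lengths are taken to be already extracted, for example from the TLEN field of properly paired alignments. The function names are hypothetical. The short (35-80 bp) and long (120-180 bp) bins are copied from the figure legend. This is not the poster's own analysis code.

```python
from collections import Counter

# Size bins copied from the figure legend (inclusive, in bp).
SHORT_BIN = (35, 80)
LONG_BIN = (120, 180)


def length_distribution(lengths, lo=50, hi=400):
    """Percent of all reads at each insert length in [lo, hi] (inclusive).

    Lengths outside [lo, hi] are left out of the result but still count in
    the denominator, so percentages are relative to the whole library.
    """
    total = len(lengths)
    if total == 0:
        return {}
    counts = Counter(n for n in lengths if lo <= n <= hi)
    return {n: 100.0 * c / total for n, c in sorted(counts.items())}


def size_class(length):
    """Return 'short', 'long', or None if the length is in neither bin."""
    if SHORT_BIN[0] <= length <= SHORT_BIN[1]:
        return "short"
    if LONG_BIN[0] <= length <= LONG_BIN[1]:
        return "long"
    return None


def mode_length(lengths):
    """Most frequent insert length (expects a non-empty list)."""
    return Counter(lengths).most_common(1)[0][0]
```

For plasma cfDNA, `mode_length` would be expected to land near the ~160-bp nucleosomal size. Plotting `length_distribution` for two libraries side by side gives a comparison like panel B.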
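The adjusted WPS profiles around CTCF sites build on the windowed protection score of Snyder et al. (ref. 1). The sketch below assumes that definition as we understand it: for a window centered on a position, WPS is the number of fragments that completely span the window minus the number with an endpoint inside it. The 120-180 bp fragment bin matches the poster's legend. The 120-bp window is an assumption. The poster's "adjustment" step is not described, so it is not implemented here. Coordinates are assumed to be 0-based and half-open on a single chromosome. The function names are hypothetical.

```python
def wps(fragments, positions, window=120):
    """Raw windowed protection score at each position.

    fragments: list of (start, end) half-open intervals on one chromosome.
    A fragment adds +1 if it covers the whole window with both ends outside
    it, and -1 if either terminal base falls inside the window.
    """
    half = window // 2
    scores = []
    for p in positions:
        w0, w1 = p - half, p - half + window  # window is [w0, w1)
        score = 0
        for start, end in fragments:
            last = end - 1  # last base covered by the fragment
            if start < w0 and last >= w1:
                score += 1  # fragment protects the entire window
            elif w0 <= start < w1 or w0 <= last < w1:
                score -= 1  # fragment endpoint lies inside the window
        scores.append(score)
    return scores


def l_wps(fragments, positions):
    """WPS on long (120-180 bp) fragments only, with a 120-bp window."""
    long_frags = [(s, e) for s, e in fragments if 120 <= e - s <= 180]
    return wps(long_frags, positions, window=120)
```

For a site at position `p`, calling `l_wps(frags, range(p - 1000, p + 1001))` gives a profile over the same ±1 kb span as the CTCF plot. The nested loop is O(fragments × positions), which is fine for illustration. Genome-wide use would call for sorted fragments or prefix sums.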
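The two "distance to the nearest peak" plots reduce to simple lookups on sorted positions. One plot needs, for each peak called in one dataset, the closest peak in another dataset. The other needs the spacing between neighboring peaks within one dataset. A minimal sketch using binary search, assuming peaks are integer positions on one chromosome. The function names and the tie-breaking rule are illustrative choices, not taken from the poster.

```python
from bisect import bisect_left


def nearest_peak_offsets(query_peaks, reference_peaks):
    """Signed distance (reference - query, bp) to the closest reference peak.

    Ties are broken toward the upstream (smaller-coordinate) reference peak.
    """
    ref = sorted(reference_peaks)
    if not ref:
        raise ValueError("reference_peaks must be non-empty")
    offsets = []
    for q in query_peaks:
        i = bisect_left(ref, q)  # first reference peak >= q
        candidates = []
        if i > 0:
            candidates.append(ref[i - 1])  # closest peak below q
        if i < len(ref):
            candidates.append(ref[i])  # closest peak at or above q
        nearest = min(candidates, key=lambda r: (abs(r - q), r))
        offsets.append(nearest - q)
    return offsets


def adjacent_spacing(peaks):
    """Distances between consecutive peaks after sorting."""
    s = sorted(peaks)
    return [b - a for a, b in zip(s, s[1:])]
```

A histogram of `nearest_peak_offsets(tgirt_peaks, reference_peaks)` corresponds to the signed -500 to +500 bp density plot. A histogram of `adjacent_spacing` corresponds to the 100-500 bp peak-spacing plot. Both variable names here are hypothetical.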