Use of Thermostable Group II Intron Reverse Transcriptases (TGIRTs) for Single-Stranded DNA-seq of Cell-Free DNA in Human Plasma

(A) Streamlined Protocol

Abstract

High Throughput Single-stranded Plasma DNA Sequencing Using Thermostable Group II Intron Reverse Transcriptase

Douglas C. Wu and Alan M. Lambowitz
Institute of Cellular and Molecular Biology and Department of Molecular Biosciences, The University of Texas at Austin

(B) Recovers a Broad Size Range of DNA Fragments

(C) Nucleosome Positioning

(D) Cell of Origin Inferred by Transcription Factor Footprinting

Conclusion

•TGIRTs enable efficient ssDNA-seq that can be used for analysis of cfDNA in human plasma and other bodily fluids.

•Identification of protein binding features of cfDNA that provide information about the tissue of origin and have potential diagnostic applications

•Enables effective library construction from degraded samples, bisulfite-treated DNA and ancient DNA

References
1. Snyder et al., Cell 2016
2. Sun et al., PNAS 2015
3. Butler et al., PLoS One 2015

Supported by NIH grants GM37949 and GM37951 and Welch Foundation grant F-1607.

Thermostable group II intron reverse transcriptase (TGIRT) enzymes and methods for their use are the subject of patents and patent applications that have been licensed by the University of Texas at Austin and East Tennessee State University to InGex, LLC. A.M.L. and the University of Texas are minority equity holders in InGex, LLC, and A.M.L. and other present and former Lambowitz laboratory members receive royalty payments from sales of TGIRT enzymes and licensing of intellectual property.

High-throughput DNA sequencing (DNA-seq) of cell-free DNA (cfDNA) in plasma and other bodily fluids is a powerful method for non-invasive prenatal testing and for diagnosis of cancer and other diseases (i.e., liquid biopsy). In healthy individuals, cfDNA in human plasma consists largely of ~160-bp DNA fragments derived from nucleosomes released by apoptosis of lymphoid and myeloid cells in blood. By contrast, in a variety of pathological conditions, plasma is enriched in DNA fragments released from dying cells in the affected tissues. These fragments can be identified by tissue-specific differences in nucleosome spacing, transcription factor (TF) occupancy (ref 1), and DNA methylation sites (ref 2), thereby providing diagnostic information. In cancer patients, a significant proportion of cfDNA (3-93% in one study, depending on the type of cancer) originates from tumor cells and retains epigenetic features of the tumor tissue type (ref 3). These features of cfDNA are best analyzed by single-stranded DNA sequencing (ssDNA-seq), which is better suited to fragmented and nicked DNA than conventional dsDNA-seq methods (ref 1). However, established methods for ssDNA-seq, which were originally developed for the analysis of ancient DNA, are inefficient, costly and time-consuming (ref 4). Here, we present an efficient method for ssDNA-seq that takes advantage of the beneficial properties of thermostable group II intron reverse transcriptases (TGIRTs), and we describe its use for the analysis of cfDNA in human plasma. By exploiting a novel template-switching activity of TGIRTs to add DNA-seq adaptors, DNA-seq libraries can be constructed from small amounts of starting material in <2 h. In future research, we will use this method to analyze clinical samples and develop new cfDNA-based diagnostic procedures.
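As a rough illustration of how fragment sizes can be summarized, the sketch below bins insert lengths into the short (35-80 bp) and long (120-180 bp) classes that label the poster's figure panels and reports the percentage of reads in each class. The function names and the example lengths are hypothetical; this is not the authors' analysis pipeline.

```python
from collections import Counter

# Size classes taken from the poster's figure labels.
SHORT_BIN = (35, 80)    # short fragments, in bp
LONG_BIN = (120, 180)   # long (nucleosome-sized) fragments, in bp


def classify_insert(length):
    """Return 'short', 'long', or 'other' for one insert length in bp."""
    if SHORT_BIN[0] <= length <= SHORT_BIN[1]:
        return "short"
    if LONG_BIN[0] <= length <= LONG_BIN[1]:
        return "long"
    return "other"


def percent_reads_by_class(insert_sizes):
    """Percentage of reads falling in each size class."""
    classes = ("short", "long", "other")
    total = len(insert_sizes)
    if total == 0:
        return {name: 0.0 for name in classes}
    counts = Counter(classify_insert(n) for n in insert_sizes)
    return {name: 100.0 * counts.get(name, 0) / total for name in classes}


# Made-up insert sizes for illustration only (not data from the poster).
example_sizes = [42, 60, 150, 167, 165, 171, 300, 95]
summary = percent_reads_by_class(example_sizes)
print(summary)
```

In practice the insert sizes would come from paired-end alignments; how they are extracted depends on the aligner and file format and is left out of this sketch.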

4. Gansauge and Meyer, Nat Prot 2013
5. Thurman et al., Nature 2012
6. Qin et al., RNA 2016

Figure 1: Overview of TGIRT-seq on plasma DNA.

[Figure 1 steps: (1) Plasma DNA; (2) Dephosphorylation & denaturation; (3) Template-switching by TGIRT, then alkaline treatment and cDNA clean-up; (4) Adaptor ligation by thermostable 5’ AppDNA/RNA ligase, then cDNA clean-up; (5) PCR amplification. Labels in the diagram: Target DNA (+), Target DNA (-), DNA nick, 5’ P, 3’ OH, R2 RNA (3’-Blocker), R2R DNA (3’-N), R1R (5’-App, 3’-Blocker), R1, R2, P5, Barcode+P7, ~2 ng, 20 min, 20 min, 60 min.]

Grant Support and Conflict-of-interest statement

[Plot: ssDNA-seq (ref 1) vs. TGIRT-seq across tissues (Blood, Blood vessel, Heart, Kidney, Lung, Muscle, Prostate); axis range 0.00 to 1.00]

[Plot: Percent reads vs. Insert Size (bp), 50 to 400 bp; ssDNA-seq (ref 1) and TGIRT-seq]

[Plots: Adjusted WPS vs. Distance to CTCF start site (bp), -1000 to 1000; Long (120-180 bp) and Short (35-80 bp) fragments]

[Plot: Density vs. Distance to the Nearest Peak (bp), -500 to 500; ssDNA-seq (ref 1) & TGIRT-seq]

[Plot: Fraction of Peaks vs. Distance to the Nearest Peak (bp), 100 to 500; ssDNA-seq (ref 1) and TGIRT-seq]
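The "Adjusted WPS" axis in the CTCF plots refers to a windowed protection score (WPS). The poster does not define it, so the sketch below assumes the commonly used definition associated with ref 1: the number of fragments that completely span a window centered on a position, minus the number of fragments with an endpoint inside that window. The function name, the 120-bp default window, the inclusive boundary handling, and the example coordinates are all assumptions. The adjustment step implied by "Adjusted" is not reproduced here.

```python
def windowed_protection_score(fragments, position, window=120):
    """Raw WPS at `position` (a sketch under the assumptions stated above).

    fragments: iterable of (start, end) coordinates, inclusive, start <= end.
    window:    window width in bp, centered on `position` (assumed default).
    """
    half = window // 2
    win_start = position - half
    win_end = position + half
    score = 0
    for start, end in fragments:
        if start <= win_start and end >= win_end:
            # Fragment covers the whole window: counts as protection.
            score += 1
        elif win_start <= start <= win_end or win_start <= end <= win_end:
            # Fragment has an endpoint inside the window: counts against.
            score -= 1
        # Fragments entirely outside the window do not contribute.
    return score


# Made-up fragment coordinates for illustration only.
example_fragments = [(100, 266), (90, 270), (120, 290), (150, 316)]
example_score = windowed_protection_score(example_fragments, position=200)
print(example_score)
```

A shorter window would be the natural choice when scoring the short (35-80 bp) fragment class, since a 120-bp window can never be spanned by fragments that small.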
