Informative Priors on Fetal Fraction Increase Power of The Noninvasive Prenatal Screen

Hanli Xu 1,8 PhD, Shaowei Wang 2,8,9 MD, Lin-Lin Ma 2 MD, Shuai Huang 2 MD, Lin Liang 2 MD, Qian Liu 3 PhD, Yang-Yang Liu 3 BS, Ke-Di Liu 3,4 PhD, Ze-Min Tan 3 PhD, Hao Ban 5,6 MD, Yongtao Guan 5,6,7,9 PhD, and Zuhong Lu 1,9 PhD

1 Department of Biomedical Engineering, Southeast University
2 Department of Obstetrics and Gynecology, Beijing Hospital
3 Beijing USCI Medical Laboratory
4 Department of Biochemistry & Cambridge Systems Biology Centre, University of Cambridge
5 USDA/ARS Children’s Nutrition Research Center
6 Department of Pediatrics, Baylor College of Medicine
7 Department of Molecular and Human Genetics, Baylor College of Medicine
8 These authors contributed equally.
9 Correspondence: SW: [email protected], YG: [email protected], ZL: [email protected]

Mailing address of YG: 1100 Bates Room 2070, Houston TX 77030. Tel: 713-798-0362, Fax: 713-798-7098

Running title: Informative priors increase power of the NIPS
Figure 1: (a) Both the X and Y chromosomes contribute to fetal fraction estimates, as expected, and the Y chromosome is more informative than the X chromosome. (b) Observed and fitted density of fetal fractions: histogram of the observed fetal fractions with the fitted Beta density (solid line) and LogN density (dashed line). The x-axis is on the log10 scale.
Figure 2: Observed fetal fraction vs. its posterior mean for different priors. The left panel is for the Beta prior and the right panel for the LogN prior. Fetal fractions are plotted on the log10 scale for clarity. In both panels, dashed lines were obtained from priors $g^*_1$ and solid lines from priors $g^*_2$.
[Figure 3 plot: left panel, x-axis "Positives (original dataset)", y-axis "Positives (down-sampled dataset)"; right panel, x-axis "Positives (original dataset)", y-axis "Negatives (down-sampled dataset)"; all axes 0 to 100.]
Figure 3: Power comparison between the Bayes factor (solid line) and the Z score (dashed line). The left panel plots counts of positives in the original datasets vs. counts of positives in the down-sampled datasets that are also positive in the original dataset. The right panel plots counts of positives in the original datasets vs. counts of negatives in the down-sampled datasets among samples that are positive in the original dataset.
Figure 4: Z values, fetal fractions, and Bayes factors. The x-axis is the Z value and the y-axis is the fetal fraction. Points whose log10 Bayes factors are > 1 are colored black, with sizes proportional to the log10 Bayes factor; the remaining points are colored light gray and drawn at the same size. Points associated with chromosomes 13, 18, and 21 are marked in blue, green, and red, respectively.
[Figure 5 plot: x-axis "Fetal fraction from sex chromosome", y-axis "Fetal fraction from aneuploidy", both 0.00 to 0.25.]
Figure 5: Annals of Z-test positive calls. Each dot corresponds to a significant Z test (Z-score > 4); its x-coordinate is the fetal fraction estimated from the sex chromosomes, its y-coordinate is the fetal fraction estimated from the trisomy, and its size is proportional to its log10 Bayes factor. Gray dots are putative female fetuses; black dots are putative male fetuses. See the main text for a description of the dots drawn as triangles, circles, squares, and diamonds.
Patients were enrolled, their peripheral blood was drawn, and cell-free DNA was extracted and sequenced (Patients and Data). The study was approved by the institutional review board of the Beijing Hospital, Beijing, China. We performed routine quality control of sequencing reads, mapped reads to the reference genome HG19, and retained uniquely mapped reads for further analysis (Reads quality control). After quality control, 3405 patients had more than 4M uniquely mapped reads. Because the coverage was low (0.1X), we binned reads for statistical processing before estimating chromosomal dosages. We examined the coefficients of variation for different bin sizes (Figure S6) and chose 100Kb as the optimal bin size, balancing the number of bins against the coefficient of variation (Optimal bin size). We used overlapping bins, with adjacent bins overlapping by 50Kb; the full set of bins is the union of two subsets of non-overlapping bins, and we found that the overlapping bins improved the stability of the chromosomal dosage estimates. Out of concern for reference bias, we developed a hidden Markov model (HMM) to detect and remove bins that suggest putative copy number variation (CNV) at the population level (Remove maternal CNV). We then applied the HMM separately to each sample and each chromosome, and marked the bins that indicate maternal CNV; these bins were excluded when computing chromosomal dosages. (Figure S7 demonstrates the effective detection of a maternal CNV that caused a false positive.) The remaining bins were normalized by their mean coverage to obtain bin dosages. The chromosomal dosage is the average of the bin dosages on the chromosome.
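A minimal sketch of the overlapping-bin layout, assuming the 100Kb bins are built by averaging 1Kb building-block dosages (see Optimal bin size) and ignoring the removal of N-containing and zero-coverage blocks. The function name `overlapping_bin_dosages` is hypothetical.

```python
import numpy as np


def overlapping_bin_dosages(block_dosage, bin_kb=100, step_kb=50):
    """Average consecutive 1Kb block dosages into overlapping bins.

    With bin_kb=100 and step_kb=50, each bin spans 100 blocks and adjacent bins
    share 50 blocks (50Kb). Bins at even indices form one non-overlapping tiling;
    bins at odd indices form a second tiling shifted by 50Kb.
    """
    block_dosage = np.asarray(block_dosage, dtype=float)
    starts = range(0, len(block_dosage) - bin_kb + 1, step_kb)
    return np.array([block_dosage[s:s + bin_kb].mean() for s in starts])


# Toy chromosome of 300 consecutive 1Kb blocks: bins start at 0, 50, 100, 150, 200Kb.
blocks = np.arange(300, dtype=float)
bins = overlapping_bin_dosages(blocks)
print(len(bins))   # 5
print(bins[:2])    # [49.5 99.5]
```

Taking every other bin recovers either of the two non-overlapping subsets, which is the sense in which the full set is their union.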
We controlled for GC bias and chromosomal bias before estimating chromosomal dosages. To this end we processed only the autosomes and left out the sex chromosomes: unlike autosomes, the bin dosages on the sex chromosomes depend on the fetal sex and the fetal fraction even for euploid fetuses. In our data, bin dosages correlate with bin GC content in a nonlinear fashion (Figure S8), which makes linear regression ineffective for correcting GC bias. We instead applied the smoothing spline method [1] to remove GC bias (Account for GC bias). For each bin we computed the mean and variance of the GC-corrected bin dosages over all samples, including the possible trisomy samples, and used them as the mean and variance of normal controls. Because of the low prevalence of trisomy, using all samples as euploid controls introduced little bias. For each sample, we regressed the per-bin mean out of the GC-corrected bin dosages using weighted linear regression, with the inverse of the variance as the weight. The residuals were averaged over all bins on each chromosome, separately for each individual, to obtain centered chromosomal dosages. Finally, we computed the sample standard deviation (SSD) of each autosome from the centered chromosomal dosages of that chromosome across all samples. For each sample and each autosome, we thus obtained a centered chromosomal dosage (denoted by $x$) and the SSD (denoted by $\sigma$) to test for trisomy.
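The centering step can be sketched as below. This is an illustration under stated assumptions, not the pipeline's code: the weighted regression is assumed to fit an intercept plus a slope on the per-bin mean, the toy data are simulated, and the helper name `centered_chromosomal_dosages` is hypothetical. A Z score for one sample and chromosome is then $Z = x/\sigma$.

```python
import numpy as np


def centered_chromosomal_dosages(D, chrom):
    """Return {chromosome: (x, ssd)} from GC-corrected bin dosages.

    D     : (n_samples, n_bins) array of GC-corrected bin dosages.
    chrom : (n_bins,) array of chromosome labels, one per bin.
    x     : centered chromosomal dosage of every sample on that chromosome.
    ssd   : sample standard deviation of x across all samples.
    """
    m = D.mean(axis=0)                         # per-bin mean over all samples
    v = D.var(axis=0, ddof=1)                  # per-bin variance over all samples
    sw = np.sqrt(1.0 / v)                      # sqrt of inverse-variance weights
    X = np.column_stack([np.ones_like(m), m])  # assumed design: intercept + bin mean
    resid = np.empty_like(D)
    for i, d in enumerate(D):
        # Weighted least squares: minimise sum_j (d_j - a - b*m_j)^2 / v_j
        coef, *_ = np.linalg.lstsq(X * sw[:, None], d * sw, rcond=None)
        resid[i] = d - X @ coef
    out = {}
    for c in np.unique(chrom):
        x = resid[:, chrom == c].mean(axis=1)  # average residual over the chromosome's bins
        out[c] = (x, x.std(ddof=1))
    return out


# Toy data: 200 samples, 3 chromosomes of 50 bins, one sample with extra dosage on "21".
rng = np.random.default_rng(1)
chrom = np.repeat(np.array(["13", "18", "21"]), 50)
base = 1 + 0.2 * rng.standard_normal(chrom.size)          # bin-specific baseline coverage
D = base + 0.01 * rng.standard_normal((200, chrom.size))
D[0, chrom == "21"] += 0.05
x, ssd = centered_chromosomal_dosages(D, chrom)["21"]
print(x[0] / ssd)   # Z score of the perturbed sample
```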
Patients and Data
Beginning March 15, 2016, we enrolled pregnant women who were undergoing routine obstetrical care at Beijing Hospital. The institutional review board of the Beijing Hospital approved the study. All experiments were performed in accordance with relevant guidelines and regulations. Written informed consent was obtained from all patients. To be eligible for the study, pregnant women had to be at least 18 years of age and carrying a fetus with a gestational age of at least 8 weeks. All patients underwent NIPS at Beijing Scisoon Medical Laboratory, a partner clinical laboratory of Beijing Hospital accredited by the Beijing Health and Family Planning Commission. Study inclusion also required access to pregnancy and delivery records and to the newborn physical examination. At enrollment, a 10 ml peripheral venous blood sample was drawn into a cell-free DNA blood-collection tube. The tube was deidentified and labeled with a unique bar code. Samples were shipped the same day to the Beijing Scisoon Medical Laboratory.
Upon receipt, samples were inspected and cell-free plasma was isolated by double centrifugation at 1,600 × g for 10 min followed by 16,000 × g for 10 min. Plasma samples were frozen at −80 °C until cell-free DNA extraction and sequencing. Cell-free DNA was extracted from 600 µl of plasma using the Magnetic Serum/Plasma Circulating DNA kit (TIANGEN) according to the manufacturer's instructions. Libraries were prepared with the NEXTflex Rapid DNA-Seq Kit (BIOO). After accurate quantification of adapter-ligated molecules using KAPA Library Quantification Kits, libraries were pooled and sequenced on the Illumina NextSeq 500 system according to the manufacturer's standard protocols. Single-end 75-bp reads were obtained at a target coverage of 0.1X.
Reads quality control
PCR duplicates were removed. Reads containing more than one N were removed. A read was also removed if it contained a run of five consecutive nucleotides with an average Phred score below 20. The remaining reads were mapped to the reference genome (HG19) using BWA, allowing at most one mismatch. Only uniquely mapped reads were retained for further analysis. A sample passes QC if it has at least 4M reads that pass QC; some samples have as many as 10M reads. In our down-sampling experiment, we randomly (uniformly) sampled 3M reads.
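The two base-level filters above (more than one N; a five-base window with mean Phred score below 20) can be expressed as a small predicate. Phred+33 quality encoding is an assumption, duplicate removal and BWA mapping are not covered, and `passes_read_qc` is a hypothetical name.

```python
def passes_read_qc(seq, qual, offset=33, window=5, min_mean_q=20.0):
    """Apply the two base-level read filters described above.

    seq  : read sequence, e.g. "ACGTN..."
    qual : FASTQ quality string; Phred+33 encoding is assumed (offset=33)
    """
    if seq.upper().count("N") > 1:            # more than one ambiguous base
        return False
    q = [ord(ch) - offset for ch in qual]     # decode Phred scores
    for i in range(len(q) - window + 1):      # every run of `window` consecutive bases
        if sum(q[i:i + window]) / window < min_mean_q:
            return False
    return True


print(passes_read_qc("ACGTACGTAC", "IIIIIIIIII"))   # True  (all Q40)
print(passes_read_qc("ANGTNCGTAC", "IIIIIIIIII"))   # False (two Ns)
print(passes_read_qc("ACGTACGTAC", "IIII!!!!!I"))   # False (a run of Q0 bases)
```

A single low-quality base does not by itself fail the read; only a five-base stretch whose average drops below 20 does.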
Optimal bin size
The reference genome was divided into 1 Kb non-overlapping bins. Bins containing one or more N were removed. We obtained the base-pair coverage and computed the bin coverage by summing the base-pair coverage within the bin. Bins with zero coverage were removed. For each sample, we normalized the bin coverages by their mean; the normalized bin coverage is called the bin dosage. Using all samples, we computed the mean bin dosage for each bin. We wanted to determine the optimal bin size using the 1 Kb bins as building blocks, balancing bin size against the number of bins. We tried bin sizes of 2K, 5K, 10K, 20K, 50K, 100K, 200K, 500K, and 1000K. For each target bin size, we averaged adjacent 1Kb building-block bins to obtain bin dosages for the new bins and computed the coefficient of variation (the ratio of the standard deviation to the mean).
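A sketch of the bin-size scan, assuming the coefficient of variation is computed across the aggregated bins of a dosage vector (the text does not say whether it is taken across bins or across samples). The toy input contains only Poisson counting noise at roughly 1.33 reads per Kb, an assumption derived from 0.1X coverage with 75-bp reads; real data add GC and mappability effects. `cv_by_bin_size` is a hypothetical name.

```python
import numpy as np


def cv_by_bin_size(block_dosage, sizes_kb=(2, 5, 10, 20, 50, 100, 200, 500, 1000)):
    """Coefficient of variation of bin dosages for each candidate bin size.

    block_dosage: dosages of consecutive 1Kb building-block bins. Adjacent blocks are
    averaged into bins of k Kb, and CV = standard deviation / mean across those bins.
    """
    block_dosage = np.asarray(block_dosage, dtype=float)
    cv = {}
    for k in sizes_kb:
        n_bins = len(block_dosage) // k
        if n_bins < 2:
            continue                      # too few bins to estimate a spread
        binned = block_dosage[: n_bins * k].reshape(n_bins, k).mean(axis=1)
        cv[k] = binned.std(ddof=1) / binned.mean()
    return cv


# Toy input: Poisson read counts at about 1.33 reads per Kb, normalised by their mean.
rng = np.random.default_rng(0)
counts = rng.poisson(1.33, size=2_000_000)
cv = cv_by_bin_size(counts / counts.mean())
for k, value in cv.items():
    print(f"{k:>5} Kb  CV = {value:.3f}")
```

Under pure counting noise the CV shrinks roughly as one over the square root of the expected reads per bin, so larger bins always look better on CV alone; the trade-off against the number of bins is what makes an intermediate size such as 100Kb attractive.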
Remove maternal CNV
Let $\{x_m\}$ be a sequence of centered bin dosages for a chromosome. If a bin harbors a maternal CNV, its dosage changes from $0$ to $1-h$ in the case of a maternal copy number gain of 1, or to $h-1$ in the case of a maternal copy number loss of 1, where $h$ is the fetal fraction, usually $< 0.20$. We detect such bins using a hidden Markov model (HMM), a simplified version of a previous model [2]. The latent variable at each bin takes one of three states, copy number loss, neutral, and gain, denoted by $-1$, $0$, and $1$ respectively. The latent states along each chromosome form a Markov chain with transition probabilities

$$P(Z_{m+1} = k_{m+1} \mid Z_m = k_m) = r\,\alpha_{k_{m+1}} + (1-r)\,\mathbf{1}_{k_{m+1} = k_m},$$

where $\mathbf{1}_{\cdot}$ is the indicator function, $r$ is a probability such that $1/r$ is the mean length (in number of bins) of a maternal CNV, and $\alpha_k$ (for $k = -1, 0, 1$) are the fractions of bins with copy number loss, neutral, or gain. The initial distribution is $P(Z_1 = k) = \alpha_k$. The emission is modeled as

$$P(x_m \mid Z_m = k) = \frac{1}{\sqrt{2\pi}\,\sigma_k} \exp\left(-\frac{(x_m - \mu_k)^2}{2\sigma_k^2}\right), \qquad k = -1, 0, 1.$$

We fit the HMM using the expectation maximization (EM) algorithm. The forward probabilities can be computed recursively as

$$F(m+1, Z_{m+1}) = P(x_1, \ldots, x_m, x_{m+1}, Z_{m+1}) = P(x_{m+1} \mid Z_{m+1}) \sum_{Z_m} F(m, Z_m)\, P(Z_{m+1} \mid Z_m), \qquad (1)$$

with $F(1, Z_1) = P(Z_1)\,P(x_1 \mid Z_1)$. The backward probabilities can be computed recursively as

$$B(m, Z_m) = P(x_L, \ldots, x_{m+1} \mid Z_m) = \sum_{Z_{m+1}} P(x_{m+1} \mid Z_{m+1})\, B(m+1, Z_{m+1})\, P(Z_{m+1} \mid Z_m), \qquad (2)$$

where $L$ is the number of bins and $B(L, Z_L) = 1$. We can then compute the posterior probabilities of the latent states,

$$z_{mk} = P(Z_m = k \mid x_1, \ldots, x_L) \propto F(m, Z_m = k)\, B(m, Z_m = k). \qquad (3)$$

We update $\alpha_k = \sum_m z_{mk} \big/ \sum_{m,k} z_{mk}$. To update $\mu_k$ and $\sigma_k$, we compute $x_m z_{mk}$ for each $k$ to obtain an $L$-vector, and take the mean of that vector as the estimate of $\mu_k$ and its SSD as the estimate of $\sigma_k$. We found that fixing $\mu_k = k$ for $k = -1, 0, 1$ is convenient and sufficient, because a maternal CNV changes bin dosages by a magnitude of about 1. Finally, we need to specify $r$; after trial and error, we found that $r = 0.001$ worked well.
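The forward-backward recursions in Eqs. (1) to (3) can be sketched with per-bin scaling to avoid numerical underflow on long chromosomes. Assumptions: $\mu_k = k$ is fixed as in the text, $\sigma_k$ and $\alpha$ are supplied rather than estimated (only the $\alpha$ update is shown), and the masking rule in the usage lines (neutral posterior below 0.5) is hypothetical, as are all function names.

```python
import numpy as np

STATES = np.array([-1.0, 0.0, 1.0])   # copy-number loss, neutral, gain; mu_k = k


def transition_matrix(alpha, r):
    """P[i, j] = P(Z_{m+1} = j | Z_m = i) = r * alpha_j + (1 - r) * 1[i == j]."""
    return r * np.tile(alpha, (3, 1)) + (1 - r) * np.eye(3)


def emission(x, sigma):
    """E[m, k] = N(x_m; mu_k = k, sigma_k^2); sigma may be a scalar or a length-3 array."""
    return np.exp(-0.5 * ((x[:, None] - STATES) / sigma) ** 2) / (np.sqrt(2 * np.pi) * sigma)


def posterior_states(x, alpha, sigma, r=0.001):
    """Scaled forward-backward; returns z[m, k] = P(Z_m = k | x_1, ..., x_L)."""
    x = np.asarray(x, dtype=float)
    alpha = np.asarray(alpha, dtype=float)
    L = len(x)
    P = transition_matrix(alpha, r)
    E = emission(x, sigma)
    F = np.zeros((L, 3))
    B = np.ones((L, 3))                    # B(L, Z_L) = 1
    c = np.zeros(L)                        # per-bin scaling factors
    F[0] = alpha * E[0]                    # F(1, Z_1) = P(Z_1) P(x_1 | Z_1)
    c[0] = F[0].sum()
    F[0] /= c[0]
    for m in range(1, L):                  # Eq. (1), rescaled at every bin
        F[m] = E[m] * (F[m - 1] @ P)
        c[m] = F[m].sum()
        F[m] /= c[m]
    for m in range(L - 2, -1, -1):         # Eq. (2), using the same scaling factors
        B[m] = P @ (E[m + 1] * B[m + 1]) / c[m + 1]
    z = F * B                              # Eq. (3), then normalise each row
    return z / z.sum(axis=1, keepdims=True)


def update_alpha(z):
    """EM update alpha_k = sum_m z_mk / sum_{m,k} z_mk."""
    return z.sum(axis=0) / z.sum()


# Usage on simulated centered bin dosages with a maternal duplication in bins 100-149.
rng = np.random.default_rng(0)
x = 0.1 * rng.standard_normal(300)
x[100:150] += 1.0
z = posterior_states(x, alpha=[0.001, 0.998, 0.001], sigma=0.2)
masked = np.flatnonzero(z[:, 1] < 0.5)     # hypothetical rule: mask bins unlikely to be neutral
```

Scaling each forward vector to sum to one, and dividing the backward step by the same factor, leaves the normalized posterior in Eq. (3) unchanged while keeping every quantity in floating-point range.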
Account for GC bias
We first ranked all autosomal bins by their GC content. We then smoothed the bin dosages along the reordered bins using the smoothing spline method, subtracted the smoothed values from the original bin dosages, and finally restored the bins to their original order. The processed bin dosages are called GC-corrected bin dosages. Different chromosomal regions have different baseline coverages; accounting for GC bias mitigates, but is far from removing, these baseline differences. We therefore computed, for each bin, the mean $m$ and variance $v$ of the GC-corrected bin dosages over all samples, and regressed $m$ out of the GC-corrected bin dosages using weighted linear regression with weights $1/v$.
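The text uses a smoothing spline [1]; the sketch below swaps in a Whittaker smoother (penalized second differences), a closely related discrete smoother that needs only NumPy. It follows the same rank, smooth, subtract, and restore steps. The penalty `lam`, the function name, and the toy GC effect are illustrative assumptions, and a genome-wide run would need a sparse solver or a dedicated spline routine.

```python
import numpy as np


def gc_correct(dosage, gc, lam=1e4):
    """Remove a smooth GC trend from one sample's bin dosages.

    Bins are ranked by GC content, a Whittaker smoother (a discrete stand-in for a
    smoothing spline) is fit along that ranking, the smoothed trend is subtracted,
    and the result is returned in the original bin order. Dense solve: only
    practical for a few thousand bins.
    """
    dosage = np.asarray(dosage, dtype=float)
    order = np.argsort(gc, kind="stable")      # rank bins by GC content
    y = dosage[order]
    n = y.size
    D = np.diff(np.eye(n), n=2, axis=0)        # (n-2) x n second-difference operator
    trend = np.linalg.solve(np.eye(n) + lam * D.T @ D, y)
    corrected = np.empty_like(dosage)
    corrected[order] = y - trend               # undo the sort
    return corrected


# Toy data with a nonlinear GC effect plus noise.
rng = np.random.default_rng(0)
gc = rng.uniform(0.35, 0.55, 1000)
trend = 5 * (gc - 0.45) ** 2 - 0.5 * (gc - 0.45)
dosage = 1 + trend + 0.01 * rng.standard_normal(gc.size)
corrected = gc_correct(dosage, gc)
print(np.corrcoef(corrected, trend)[0, 1])    # should be near 0 after correction
```

Because the second-difference penalty leaves constants and straight lines unpenalized, the smoother preserves the mean, so the corrected dosages are centered at zero, matching the "subtract the smoothed value" step.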
Infer fetal fraction from sex chromosome dosages
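Let $h$ be the fetal fraction. For a male fetus, the mother contributes two X copies at fraction $1-h$ and the fetus contributes one X and one Y copy at fraction $h$, so the expected X chromosome dosage is $2-h$ and the expected Y chromosome dosage is $h$. Defining $\rho = \text{Y-dosage}/\text{X-dosage}$, we have $\rho = h/(2-h)$, which leads to $h = 2\rho/(1+\rho)$. Note that when $0 \le \rho < 1$, we have $0 \le h < 1$.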
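A short numerical check of the inversion $h = 2\rho/(1+\rho)$. The function name is hypothetical, and the two dosages are assumed to be on the same per-copy scale so that only their ratio matters.

```python
def fetal_fraction_from_sex_chromosomes(x_dosage, y_dosage):
    """h = 2*rho / (1 + rho), with rho = Y-dosage / X-dosage (male fetus)."""
    rho = y_dosage / x_dosage
    return 2 * rho / (1 + rho)


# Round trip: a male fetus at h = 0.10 gives expected dosages X = 2 - h, Y = h.
h = 0.10
print(fetal_fraction_from_sex_chromosomes(2 - h, h))   # approximately 0.1
```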
Fit probability densities to fetal fractions
The inferred fetal fractions were divided into two groups using the K-means method. We used K-means to avoid arbitrariness in specifying a threshold, although a visually identified threshold would not change the results much. The threshold inferred by K-means is 0.028. The group with fetal fractions < 0.028 comprises putative female fetuses, and the other group putative male fetuses. The fetal fractions of putative female fetuses were fit to a density $g_0(h)$, those of putative male fetuses to $g_1(h)$, and all fetal fractions to $g_2(h)$. After visual inspection, we chose Beta and Log-Normal (LogN) distributions to fit $g_0$, $g_1$, and $g_2$. The parameters were determined by matching the first two moments of the analytical and empirical distributions. We arrived at $g^B_0(h) = \text{Beta}(h; 44.6, 4628.4)$, $g^B_1(h) = \text{Beta}(h; 6.3, 65.2)$, $g^L_0(h) = \text{LogN}(h; -4.66, 0.13)$, $g^L_1(h) = \text{LogN}(h; -2.49, 0.36)$, $g^B_2(h) = \text{Beta}(h; 1.1, 20.8)$, and $g^L_2(h) = \text{LogN}(h; -3.58, 1.12)$.
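The grouping and moment matching can be sketched as follows. Assumptions: K-means is initialized at the extremes of the data; the Beta fit uses the standard method-of-moments formulas $a = m\,c$, $b = (1-m)\,c$ with $c = m(1-m)/v - 1$ for sample mean $m$ and variance $v$; and LogN(h; μ, σ) is taken to mean that log h has mean μ and standard deviation σ, so it is matched on the log scale (the text does not state the parameterization). The simulated fetal fractions are not the study data, so the threshold will not equal 0.028; all function names are hypothetical.

```python
import numpy as np


def kmeans_threshold(h, n_iter=100):
    """Two-cluster K-means in one dimension; returns the midpoint of the two centroids."""
    h = np.asarray(h, dtype=float)
    centers = np.array([h.min(), h.max()])     # start from the two extremes
    for _ in range(n_iter):
        labels = np.abs(h[:, None] - centers).argmin(axis=1)
        new = np.array([h[labels == k].mean() for k in range(2)])
        if np.allclose(new, centers):
            break
        centers = new
    return centers.mean()                      # the 1-D decision boundary


def beta_by_moments(h):
    """Method-of-moments Beta(a, b) fit."""
    m, v = np.mean(h), np.var(h, ddof=1)
    c = m * (1 - m) / v - 1
    return m * c, (1 - m) * c


def lognormal_by_moments(h):
    """LogN(mu, sigma) with mu, sigma the mean and SD of log h (assumed parameterization)."""
    logh = np.log(h)
    return logh.mean(), logh.std(ddof=1)


rng = np.random.default_rng(0)
female = rng.beta(44.6, 4628.4, size=5000)    # simulated, not the study data
male = rng.beta(6.3, 65.2, size=5000)
h = np.concatenate([female, male])
t = kmeans_threshold(h)
g0_beta = beta_by_moments(h[h < t])           # putative female group
g1_beta = beta_by_moments(h[h >= t])          # putative male group
g2_logn = lognormal_by_moments(h)             # all fetal fractions
print(t, g0_beta, g1_beta, g2_logn)
```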
Compute Bayes factors
Under the null $H_0$, $h \sim f_0(h)$ and $P(D \mid H_0) = \int P(D \mid h) f_0(h)\, dh$. Under the alternative $H_1$, $h \sim f_1(h)$ and $P(D \mid H_1) = \int P(D \mid h) f_1(h)\, dh$. We compute the Bayes factor

$$\mathrm{BF} = \frac{P(D \mid H_1)}{P(D \mid H_0)}.$$

Recall that our data are a pair $D = (x, \sigma)$, where $x$ is the centered chromosomal dosage and $\sigma$ is the SSD of the chromosomal dosages of control samples. Treating $|x|$ as a one-sample estimate of $h$, we have the likelihood

$$P(D \mid h) = \frac{1}{\sqrt{2\pi}\,\sigma} \exp\left(-\frac{(|x| - h)^2}{2\sigma^2}\right).$$

Integrating out the prior on $h$ gives

$$P(D \mid H_j) = \int P(D \mid h) f_j(h)\, dh = \int \frac{1}{\sqrt{2\pi}\,\sigma} \exp\left(-\frac{(|x| - h)^2}{2\sigma^2}\right) f_j(h)\, dh.$$

This Bayes factor calculation can be applied to both trisomy and monosomy alternative models.
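The marginal likelihoods have no closed form under Beta or LogN priors, so they must be evaluated numerically. The sketch below uses the trapezoidal rule on a fine grid over (0, 1) and works in log space to avoid underflow for large Z; the grid, the integration rule, and all function names are assumptions, since the text does not say how the integrals were computed. The example pairs the null prior $g^B_0$ with the alternative $g^B_2$, one of the combinations compared in Figures S4 and S5.

```python
import math

import numpy as np


def log_beta_pdf(h, a, b):
    return (math.lgamma(a + b) - math.lgamma(a) - math.lgamma(b)
            + (a - 1) * np.log(h) + (b - 1) * np.log1p(-h))


def log_lognormal_pdf(h, mu, sigma):
    return -np.log(h * sigma * math.sqrt(2 * math.pi)) - (np.log(h) - mu) ** 2 / (2 * sigma ** 2)


def log_marginal(x, sd, log_prior, grid):
    """log P(D | H_j) = log of the integral of N(|x|; h, sd^2) f_j(h) dh (trapezoidal rule)."""
    log_lik = -0.5 * ((abs(x) - grid) / sd) ** 2 - math.log(math.sqrt(2 * math.pi) * sd)
    d = np.diff(grid)
    w = np.zeros_like(grid)
    w[:-1] += d / 2                            # trapezoid weights
    w[1:] += d / 2
    t = log_lik + log_prior(grid) + np.log(w)
    t_max = t.max()
    return t_max + math.log(np.exp(t - t_max).sum())   # log-sum-exp


def log10_bayes_factor(x, sd, log_prior_alt, log_prior_null, grid=None):
    if grid is None:
        grid = np.linspace(1e-6, 1 - 1e-6, 200_001)
    diff = log_marginal(x, sd, log_prior_alt, grid) - log_marginal(x, sd, log_prior_null, grid)
    return diff / math.log(10)


null_beta = lambda h: log_beta_pdf(h, 44.6, 4628.4)   # g_0^B
alt_beta = lambda h: log_beta_pdf(h, 1.1, 20.8)       # g_2^B
for x in (0.0095, 0.03, 0.06):
    print(x, round(log10_bayes_factor(x, 0.005, alt_beta, null_beta), 2))
```

With these priors and σ = 0.005, a dosage near the typical null fetal fraction yields a log10 Bayes factor below 0, while a dosage of 0.06 yields overwhelming support for the alternative. `log_lognormal_pdf` can be passed in the same way to use the LogN priors.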
Simulate chromosomal dosages
The goal is to simulate pairs $(x, s)$, where $x$ is the centered chromosomal dosage and $s$ is the SSD of euploid control samples, such that $Z = x/s$ has a wide spread and $s$ varies within a realistic range. We simulated $z \sim \text{Unif}(1, 10)$ and $s \sim \text{Unif}(0.003, 0.008)$ and set $x = z \times s$. The range of $s$ is close to what was observed in real data (Figure S1).
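The simulation scheme takes only a few lines of NumPy; the seed handling and the function name `simulate_dosage_pairs` are illustrative assumptions.

```python
import numpy as np


def simulate_dosage_pairs(n, seed=0):
    """Draw (x, s) pairs: z ~ Unif(1, 10), s ~ Unif(0.003, 0.008), x = z * s."""
    rng = np.random.default_rng(seed)
    z = rng.uniform(1, 10, size=n)
    s = rng.uniform(0.003, 0.008, size=n)
    return z * s, s


x, s = simulate_dosage_pairs(1000)
print((x / s).min(), (x / s).max())   # Z values spread over roughly 1 to 10
```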
References
[1] Hastie T, Tibshirani R. Generalized Additive Models. Statistical Science. 1986;1(3):297–318.
[2] Xu H, Guan Y. Detecting Local Haplotype Sharing and Haplotype Association. Genetics. 2014;197(3):823–838.
Figure S1: Sample standard deviation of each chromosome vs. the number of bins on the chromosome, for different coverages. These SSDs are for autosomes, ordered on the x-axis by the number of bins on each chromosome. Different point shapes correspond to different coverages: black dots, all reads; dark gray diamonds, 4M subsampled reads; light gray triangles, 3M subsampled reads.
[Figure S2 plot: x-axis "maternal age" (20 to 50), y-axis "Density".]
Figure S2: Distribution of the maternal age.
[Figure S3 plot: x-axis "Null Prior: Beta(44.6, 4628.4)", y-axis "Sharp Null".]
Figure S3: Comparison of Bayes factors between an informative null and the sharp null. The informative null, Beta(x; 44.6, 4628.4), is on the x-axis and the sharp null on the y-axis. Values are on the log10 scale.
Figure S4: Comparison of Bayes factors between two choices of informative priors under the null. The prior for the alternative model is Beta(x; 1.1, 20.8). The priors for the null model are Beta(x; 44.6, 4628.4) on the x-axis and LogN(x; −4.66, 0.13) on the y-axis. Values are on the log10 scale.
Figure S5: Comparison of Bayes factors between two choices of priors under the alternative. The prior for the null model is Beta(x; 44.6, 4628.4). The priors for the alternative model are Beta(x; 6.3, 65.2) on the x-axis and Beta(x; 1.1, 20.8) on the y-axis. Values are on the log10 scale.
Figure S6: Optimal bin size. For each bin size (x-axis, in Kb), the coefficient of variation (y-axis) was computed. The red dot near the inflection point marks 100Kb.
Figure S7: Removal of maternal copy number variation. The top panel shows an example of a maternal duplication on chromosome 10, which produced a false positive call for fetal trisomy 10. The bottom panel shows that the maternal duplication is marked and masked by our HMM.

Figure S8: GC content vs. normalized bin coverage. In both panels, each point represents a 100Kb bin; the x-axis is GC percent × 100 and the y-axis is the normalized bin coverage. Both values are averaged over all samples. The top panel is before spline smoothing and the bottom panel after.