A VARIATIONAL BAYES BETA MIXTURE MODEL FOR FEATURE SELECTION IN DNA METHYLATION STUDIES

ZHANYU MA*,‡ and ANDREW E. TESCHENDORFF†,§,¶

*KTH-Royal Institute of Technology, School of Electrical Engineering, SE-100 44, Stockholm, Sweden
†Statistical Genomics Group, Paul O'Gorman Building, UCL Cancer Institute, University College London, 72 Huntley Street, London WC1E 6BT, United Kingdom
‡[email protected]
§a.teschendorff@ucl.ac.uk
¶Corresponding author.

Received 13 September 2012
Revised 21 November 2012
Accepted 4 January 2013
Published 14 March 2013

Journal of Bioinformatics and Computational Biology, Vol. 11, No. 4 (2013) 1350005 (19 pages)
© Imperial College Press
DOI: 10.1142/S0219720013500054

An increasing number of studies are using beadarrays to measure DNA methylation on a genome-wide basis. The purpose is to identify novel biomarkers in a wide range of complex genetic diseases including cancer. A common difficulty encountered in these studies is distinguishing true biomarkers from false positives. While statistical methods aimed at improving the feature selection step have been developed for gene expression, relatively few methods have been adapted to DNA methylation data, which is naturally beta-distributed. Here we explore and propose an innovative application of a recently developed variational Bayesian beta-mixture model (VBBMM) to the feature selection problem in the context of DNA methylation data generated from a highly popular beadarray technology. We demonstrate that VBBMM offers significant improvements in inference and feature selection in this type of data compared to an Expectation-Maximization (EM) algorithm, at a significantly reduced computational cost. We further demonstrate the added value of VBBMM as a feature selection and prioritization step in the context of identifying prognostic markers in breast cancer. A variational Bayesian approach to feature selection of DNA methylation profiles should thus be of value to any study undergoing large-scale DNA methylation profiling in search of novel biomarkers.

Keywords: Feature selection; beta mixture; DNA methylation; variational Bayes.
1. Introduction
It is clear that there is an urgent need to identify novel biomarkers, e.g. prognostic markers, for complex genetic diseases like cancer.1 However, identifying reliable biomarkers from large-scale genome-wide molecular profiling studies is a notoriously difficult problem.1 From a methods perspective, this problem is known as feature selection. The main aim of any feature selection procedure is to identify those features which are more likely to be truly associated with a phenotype of interest. One of the key difficulties of feature selection in the genomics context is the high-dimensional nature of the data, which typically encompass on the order of $10^4$–$10^6$ features (e.g. genes or single nucleotide polymorphisms (SNPs)) and may therefore generate a significant number of false positives.2 Using stringent statistical significance measures may also result in unacceptably large false negative rates, especially in the context of quantitative data such as gene expression or DNA methylation.3 These problems are further compounded by the presence of confounding factors, which may artificially inflate or deflate statistical significance levels.2,4 Therefore, statistical approaches that aim to extract meaningful features while filtering out false positives have received considerable attention.5–12 One of the most popular methods in the gene expression field has been to filter features based on variance, on the assumption that features exhibiting low variability are more likely to represent noise.5 Others have advocated a semi-supervised approach in which features are first selected using a supervised algorithm and then further selected based on an unsupervised dimensional reduction method such as principal component analysis (PCA) or nonnegative matrix factorization (NMF).6 Subsequently, it was realized that similar improvements in feature selection could be achieved by studying higher-order statistical moments (e.g. skewness or kurtosis) of the molecular profiles (specifically, gene expression profiles).7,8,13 Indeed, novel clinical subtypes and associated biomarkers in prostate and breast cancer were identified using these more advanced feature selection methods.13,14 These novel molecular subclasses and biomarkers in prostate and breast cancer are now well established,15,16 which attests to the power and potential clinical impact that such feature selection methods can have.
Studying higher-order statistical moments (e.g. kurtosis) of molecular profiles (i.e. the expression profile of a gene across a set of samples, or a CpG methylation profile) is not equivalent to, but is similar to, the problem of identifying structure in the molecular profile of a given feature.7 Intuitively, a feature exhibiting a striking bi-modality (hence a non-Gaussian distribution) may be of more interest than a feature which exhibits a highly variable but Gaussian profile, especially if the bi-modality is correlated with a phenotype of interest. Indeed, the bi-modality of such a feature is more likely to describe genuine biology and to represent a feature that has not been corrupted by biological noise or technical factors.7 We have previously demonstrated this idea of performing feature selection by studying the structure of individual molecular profiles, together with its proof-of-concept, in the gene expression context7 using a variational Bayesian Gaussian Mixture Model.17
The main purpose of this manuscript is to explore the analogous problem of feature selection in the context of DNA methylation data. DNA methylation is an epigenetic mark, a covalent modification of DNA, which normally happens at CpG dinucleotides, and which plays an important role in cellular differentiation processes and in complex genetic disease.18–24 Indeed, DNA methylation markers have been proposed as early detection, diagnostic and prognostic markers in a wide range of different diseases including cancer.23 Catalyzing this increased interest in epigenomics are significant advances in beadarray technology that now allow routine measurement of DNA methylation at many thousands of CpG dinucleotides.25,26 These beadarrays quantify DNA methylation in terms of a $\beta$-value, which represents the relative proportion of methylation at the CpG site, thus taking values between 0 (unmethylated) and 1 (fully methylated).25 Although some studies have considered using the logit-transform $y = \log_2[\beta/(1-\beta)]$ instead,27 owing to its more homoscedastic nature, it was shown in Zhuang et al.28 that the logit-basis can lead to worse inference as it can aggravate the effects of outliers (i.e. $\beta$ values close to 0 or 1): from a biological perspective an outlier at $\beta = 0.999$ is not more interesting than one at $\beta = 0.9$, yet on the logit scale they would be widely separated. Although normalization and clustering methods designed for beta-valued DNA methylation data have recently been investigated,29–36 there is still a significant shortage of feature selection methods.28,37
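The outlier argument above can be checked numerically. The following is a minimal sketch, not part of the original analysis; the helper name `logit2` is introduced here purely for illustration.

```python
import math

def logit2(beta):
    """Base-2 logit of a methylation beta-value strictly between 0 and 1."""
    return math.log2(beta / (1.0 - beta))

# Two highly methylated measurements that are close on the beta scale.
b_low, b_high = 0.9, 0.999
gap_beta = b_high - b_low                    # roughly 0.1 on the beta scale
gap_logit = logit2(b_high) - logit2(b_low)   # several log2 units apart
print(f"beta gap: {gap_beta:.3f}, logit gap: {gap_logit:.2f}")
```

On the beta scale the two values differ by about 0.1, whereas on the logit scale they are separated by nearly 7 log2 units, which is the sense in which the logit transform can exaggerate outliers near the boundaries.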
Thus, the second purpose of this manuscript is to explore the application of a recently developed Variational Bayes beta-mixture model (VBBMM)38 to DNA methylation data. To the best of our knowledge, this is the first application of a VBBMM model to this type of data. To assess VBBMM on this data, we first benchmark its performance against an analogous EM+BIC algorithm.36,39,40 Although the advantages of using a variational Bayesian approach over EM+BIC are well understood,41,42 it is important to investigate the relative performance of these methods in novel contexts. To perform the comparison between methods, we focus on DNA methylation data where effect sizes are small so as to provide a more challenging scenario for the algorithms. Specifically, we use DNA methylation data from whole blood samples from ovarian cancer patients and age-matched healthy controls, where the differences in DNA methylation between cases and controls are driven by relatively small changes in blood cell type composition, as demonstrated by us previously.43 Clearly, in the opposite extreme case where effect sizes are fairly large, for instance when comparing normal to cancer epithelial tissue, both types of algorithm are expected to yield similar results. The restriction to small effect sizes is also of particular interest since the evidence so far points towards epidemiological and disease risk DNA methylation markers of relatively small effect sizes.23,24,43,44
This manuscript is organized as follows. In Sec. 2 we describe the DNA methylation data sets and review the VBBMM. In Sec. 3 we first compare VBBMM to EM+BIC in the context of DNA methylation data, and clearly demonstrate the improved sensitivity and positive predictive value that VBBMM offers over EM+BIC. We subsequently apply VBBMM to the problem of feature selection in large-scale DNA methylation studies and demonstrate its added value in the context of identifying prognostic markers in breast cancer. Section 4 presents our conclusions.
2. Data and Methods
2.1. The Illumina Infinium DNAm assay
All DNA methylation data sets have been generated using Illumina's Infinium Human Methylation 27k Beadchips25 and have already been presented elsewhere.43,44 The Beadchips interrogate the methylation status of approximately 27,000 CpGs. In this work we used the normalized data as described in Refs. 43 and 44. Let $i$ denote the CpG and $j$ the sample. The normalized methylation values of the CpGs follow an approximate $\beta$-valued distribution, with $\beta$ constrained to lie between 0 (unmethylated locus) and 1 (methylated). This follows from the definition of $\beta$ as the ratio of methylated to combined intensity values, i.e.
$$\beta_{ij} = \frac{M_{ij}}{U_{ij} + M_{ij} + e}, \tag{1}$$
where $U_{ij}$ and $M_{ij}$ are the unmethylated and methylated intensity values of the probe (averaged over bead replicates) and $e$ is a small correction term to regularize probes of low total signal intensity (i.e. probes with $U_{ij} + M_{ij} \approx 0$ after background subtraction). Thus, our data matrices $X_{ij}$ are such that $X_{ij} = \beta_{ij}$, where $\beta_{ij}$ is the normalized methylation value as given above.
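As an illustration of Eq. (1), the sketch below computes a $\beta$-value from a pair of probe intensities. The function name and the default offset of 100 are assumptions made here for the example; the text only states that $e$ is a small correction term.

```python
def beta_value(methylated, unmethylated, offset=100.0):
    """Beta-value of Eq. (1): M / (U + M + e).

    `offset` plays the role of e; the default of 100 is an assumed,
    illustrative value and is not taken from the paper.
    """
    return methylated / (unmethylated + methylated + offset)

# A strongly methylated probe, and a probe with no signal after
# background subtraction (the offset keeps the ratio well defined).
high = beta_value(methylated=9000.0, unmethylated=1000.0)
empty = beta_value(methylated=0.0, unmethylated=0.0)
print(round(high, 4), empty)
```

The second call shows why the correction term matters: without it, a probe with zero total intensity would produce a division by zero.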
2.2. The data
Data Set 1: DNAm of whole blood samples from ovarian cancer patients before treatment and age-matched healthy controls
We consider a DNAm data set over 25642 CpGs and consisting of 261 whole blood samples, 113 of these from women with ovarian cancer (cases) and 148 from age-matched healthy women (controls).43 We previously showed that there are many CpGs which are differentially methylated between cases and controls, and that there was an enrichment for differentially methylated CpGs mapping to markers of lymphocytes and granulocytes (a total of 138 CpGs) (Supp. Table S1),45 reflecting an increase in the granulocyte to lymphocyte ratio in the presence of ovarian cancer.43 Because the DNA methylation changes reflect changes in blood cell type composition, the associated effect sizes are small, making it an ideal scenario in which to evaluate feature selection methods.
Data Set 2: DNAm of breast cancer tissue samples
This Illumina 27k DNAm data set is defined over 24589 CpGs and 113 breast cancer tissue samples.46 Of the 113 patients, 59 died of the disease or disease-related causes (overall survival) and 54 remained alive until end of study or were lost to follow-up.
Data Set 3: DNAm of breast cancer tissue samples
An independent Illumina 27k DNAm data set over 27578 CpGs and 103 breast cancer tissue samples.47 Since survival information was not available for these samples, we used relapse free survival as a surrogate. Of the 103 patients, relapse information was available for 82 samples, of which 18 relapsed and 64 did not.
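2.3. The variational Bayes Beta Mixture Model (VBBMM)
2.3.1. The variational Bayesian beta-mixture model
Background to the variational Bayes method can be found elsewhere.41,48 Here we briefly review the VBBMM, full details of which are described in Ma et al.38 The probability density function of the beta distribution is
$$\mathrm{Beta}(x; u, v) = \frac{1}{\mathrm{beta}(u, v)}\, x^{u-1} (1 - x)^{v-1}, \quad u, v > 0, \tag{2}$$
where $\mathrm{beta}(u, v)$ is the beta function $\mathrm{beta}(u, v) = \frac{\Gamma(u)\Gamma(v)}{\Gamma(u+v)}$ and $\Gamma(\cdot)$ is the gamma function defined as $\Gamma(z) = \int_0^\infty t^{z-1} e^{-t}\, dt$. The shape of the beta distribution depends on the two shape parameters $u, v$. Assuming a mixture model and a set of i.i.d. observations $\mathbf{X} = \{\mathbf{x}_1, \ldots, \mathbf{x}_N\}$, the likelihood is given as
$$f(\mathbf{X}; \boldsymbol{\Pi}, \mathbf{U}, \mathbf{V}) = \prod_{n=1}^{N} f(\mathbf{x}_n; \boldsymbol{\Pi}, \mathbf{U}, \mathbf{V}), \tag{3}$$
with
$$\begin{aligned}
f(\mathbf{x}; \boldsymbol{\Pi}, \mathbf{U}, \mathbf{V}) &= \sum_{i=1}^{I} \pi_i\, \mathrm{Beta}(\mathbf{x}; \mathbf{u}_i, \mathbf{v}_i) && (4)\\
&= \sum_{i=1}^{I} \pi_i \prod_{l=1}^{L} \mathrm{Beta}(x_l; u_{li}, v_{li}), && (5)
\end{aligned}$$
and where $\mathbf{x} = \{x_1, \ldots, x_L\}$, $\boldsymbol{\Pi} = \{\pi_1, \ldots, \pi_I\}$, $\mathbf{U} = \{\mathbf{u}_1, \ldots, \mathbf{u}_I\}$ and $\mathbf{V} = \{\mathbf{v}_1, \ldots, \mathbf{v}_I\}$. Here $\{\mathbf{u}_i, \mathbf{v}_i\}$ denote the parameter vectors of the $i$th mixture component and $u_{li}, v_{li}$ are the (scalar) parameters of the beta distribution for element $x_l$.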
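To make Eqs. (2)–(5) concrete, the following sketch evaluates the log-likelihood of a beta mixture for scalar observations (the case $L = 1$, which corresponds to a single CpG profile). It uses only the standard library; the function names are chosen here for illustration and do not come from the VBBMM implementation of Ma et al.

```python
import math

def log_beta_pdf(x, u, v):
    """Log of Eq. (2), with the beta function written via log-gamma."""
    log_norm = math.lgamma(u + v) - math.lgamma(u) - math.lgamma(v)
    return log_norm + (u - 1.0) * math.log(x) + (v - 1.0) * math.log(1.0 - x)

def mixture_log_likelihood(data, weights, us, vs):
    """Log of Eq. (3) for scalar data, summing Eq. (4) over observations."""
    total = 0.0
    for x in data:
        # Per-component terms ln(pi_i) + ln Beta(x; u_i, v_i).
        comps = [math.log(w) + log_beta_pdf(x, u, v)
                 for w, u, v in zip(weights, us, vs)]
        # Log-sum-exp keeps the sum over components numerically stable.
        m = max(comps)
        total += m + math.log(sum(math.exp(c - m) for c in comps))
    return total

# Hypothetical two-component mixture on three made-up beta-values.
ll = mixture_log_likelihood([0.1, 0.15, 0.85], [0.6, 0.4], [2.0, 8.0], [8.0, 2.0])
print(ll)
```

Observations must lie strictly inside (0, 1); values of exactly 0 or 1 would need to be clipped before taking logarithms.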
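In order to perform the Bayesian analysis one seeks a conjugate prior for the beta distribution. It can be shown that the conjugate prior is
$$f(u, v) = \frac{1}{C(\alpha_0, \beta_0, \nu_0)} \left[ \frac{\Gamma(u + v)}{\Gamma(u)\Gamma(v)} \right]^{\nu_0} e^{-\alpha_0 (u - 1)}\, e^{-\beta_0 (v - 1)}, \tag{6}$$
where $\alpha_0$, $\beta_0$, $\nu_0$ are free positive parameters and $C(\alpha_0, \beta_0, \nu_0)$ is a normalization factor such that $\int_0^\infty \int_0^\infty f(u, v)\, du\, dv = 1$. Indeed this leads to the posterior
$$f(u, v \mid \mathbf{X}) = \frac{1}{C(\alpha_N, \beta_N, \nu_N)} \left[ \frac{\Gamma(u + v)}{\Gamma(u)\Gamma(v)} \right]^{\nu_N} e^{-\alpha_N (u - 1)}\, e^{-\beta_N (v - 1)}, \tag{7}$$
with
$$\nu_N = \nu_0 + N, \qquad \alpha_N = \alpha_0 - \sum_{n=1}^{N} \ln x_n, \qquad \beta_N = \beta_0 - \sum_{n=1}^{N} \ln(1 - x_n). \tag{8}$$
However, this expression is analytically intractable. In Ref. 38 a variational solution was proposed by approximating the conjugate prior as
$$f(u, v) \approx f(u) f(v), \tag{9}$$
where
$$f(u; \mu, \alpha) = \frac{\alpha^{\mu}}{\Gamma(\mu)}\, u^{\mu - 1} e^{-\alpha u}, \tag{10}$$
$$f(v; \nu, \beta) = \frac{\beta^{\nu}}{\Gamma(\nu)}\, v^{\nu - 1} e^{-\beta v}. \tag{11}$$
The same form of approximation then applies to the posterior distribution as
$$f(u, v \mid \mathbf{X}) \approx f(u \mid \mathbf{X}) f(v \mid \mathbf{X}). \tag{12}$$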
Next, a hierarchical model for Bayesian estimation can be constructed following the principles of graphical models.49 For each observation $\mathbf{x}_n$, let the corresponding $\mathbf{z}_n = [z_{n1}, \ldots, z_{nI}]^{T}$ be the indicator vector with one element equal to 1 and the rest equal to 0. Denoting $\mathbf{Z} = \{\mathbf{z}_1, \ldots, \mathbf{z}_N\}$ and assuming the indicator vectors are independent given the mixing coefficients, the conditional distribution of $\mathbf{Z}$ given $\boldsymbol{\Pi}$ is
$$f(\mathbf{Z} \mid \boldsymbol{\Pi}) = \prod_{n=1}^{N} \prod_{i=1}^{I} \pi_i^{z_{ni}}. \tag{13}$$
Introducing the Dirichlet distribution as the prior distribution of the mixing coefficients, the probability function of $\boldsymbol{\Pi}$ can be written as
$$f(\boldsymbol{\Pi}) = \mathrm{Dir}(\boldsymbol{\Pi} \mid \mathbf{c}) = C(\mathbf{c}) \prod_{i=1}^{I} \pi_i^{c_i - 1}, \tag{14}$$
where $C(\mathbf{c}) = \frac{\Gamma(\hat{c})}{\Gamma(c_1) \cdots \Gamma(c_I)}$ and $\hat{c} = \sum_{i=1}^{I} c_i$.
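Finally, the logarithm of the full joint density function of the data $\mathbf{X}$ and all the i.i.d. latent variables $\mathcal{Z} = \{\mathbf{U}, \mathbf{V}, \boldsymbol{\Pi}, \mathbf{Z}\}$ is given by38
$$\begin{aligned}
\mathcal{L}(\mathbf{X}, \mathcal{Z}) &= \ln f(\mathbf{X}, \mathbf{Z}, \mathbf{U}, \mathbf{V}, \boldsymbol{\Pi}) && (15)\\
&= \text{const.} + \sum_{n=1}^{N} \sum_{i=1}^{I} z_{ni} \Big\{ \ln \pi_i + \sum_{l=1}^{L} \ln \frac{\Gamma(u_{li} + v_{li})}{\Gamma(u_{li})\Gamma(v_{li})} && (16)\\
&\qquad + \sum_{l=1}^{L} \big[ (u_{li} - 1) \ln x_{ln} + (v_{li} - 1) \ln(1 - x_{ln}) \big] \Big\} && (17)\\
&\quad + \sum_{l=1}^{L} \sum_{i=1}^{I} \big[ (\mu_{li} - 1) \ln u_{li} - \alpha_{li} u_{li} \big] && (18)\\
&\quad + \sum_{l=1}^{L} \sum_{i=1}^{I} \big[ (\nu_{li} - 1) \ln v_{li} - \beta_{li} v_{li} \big] + \sum_{i=1}^{I} (c_i - 1) \ln \pi_i. && (19)
\end{aligned}$$
The specific update rules for the parameters can be found in Ma et al.38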
2.3.2. Algorithm comparison
We benchmark the variational Bayesian beta mixture model against an analogous beta mixture model implementation that uses the Expectation Maximization (EM) algorithm for parameter estimation and the Bayesian Information Criterion (BIC) for model selection.50
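For readers unfamiliar with BIC-based model selection, the sketch below shows how the number of mixture components could be chosen once EM has returned a maximized log-likelihood for each candidate model. It assumes the common convention BIC $= -2 \ln \hat{L} + k \ln N$ (lower is better) and counts $k = 3I - 1$ free parameters for an $I$-component beta mixture on scalar data (two shape parameters per component plus $I - 1$ mixing weights). The exact convention used in Ref. 50 is not stated in the text, and the log-likelihood values below are made up for illustration.

```python
import math

def bic(log_lik, n_components, n_obs):
    """BIC = -2 ln L + k ln N for a scalar beta mixture (assumed convention)."""
    n_params = 3 * n_components - 1  # 2 shapes per component + (I - 1) weights
    return -2.0 * log_lik + n_params * math.log(n_obs)

def select_by_bic(log_liks, n_obs):
    """Pick the component count with the lowest BIC.

    log_liks maps number of components -> maximized log-likelihood.
    """
    return min(log_liks, key=lambda k: bic(log_liks[k], k, n_obs))

# Hypothetical log-likelihoods for one CpG profile over 261 samples.
weak_gain = {1: 100.0, 2: 104.0, 3: 105.0}    # small improvement from extra components
strong_gain = {1: 100.0, 2: 120.0, 3: 121.0}  # clear improvement at two components
print(select_by_bic(weak_gain, 261), select_by_bic(strong_gain, 261))
```

With the weak-gain values, the penalty term outweighs the likelihood improvement and a single component is selected, which illustrates how BIC can favor structureless models when effect sizes are small.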
3. Experimental Results
3.1. Improved sensitivity of the variational Bayesian BMM on DNA methylation data
To assess the VBBMM we benchmarked it against an analogous EM+BIC beta mixture model. In our first test we compared the two algorithms in their ability to detect biological structure in DNA methylation profiles (i.e. bi-modality or multi-modality which correlates with a biological phenotype). To this end we studied DNA methylation profiles of whole blood samples from 113 ovarian cancer cases and 148 age-matched healthy controls (DataSet1, Methods), focusing on a subset of 138 CpGs which map to genes marking lymphocytes and granulocytes (Supp. Table S1),43 the two main cell constituents of whole blood. Since the granulocyte to lymphocyte ratio is increased in the blood of ovarian cancer patients,48,51,52 the CpGs associated with these genes should be differentially methylated. From the point of view of unsupervised clustering, the DNA methylation profiles of each of these CpGs should exhibit structure, i.e. the optimal model should be one with at least two clusters, with the clusters correlating with the case/control phenotype. Thus, we ran the VB and EM+BIC algorithms separately on each of the 138 CpG methylation profiles and recorded the optimal number of clusters. Using the VB mixture model, 120 of these 138 CpGs (i.e. 87%) exhibited structure, in stark contrast to the EM+BIC model where only 29 (21%) did (Table 1). All but three of the CpGs that showed structure under the EM+BIC model also did so under the VB approach. In contrast, 94 CpGs showed structure only under the VB model.
Although the selected CpGs should show structure on biological grounds, one still needs to demonstrate that the inferred clustering structure is of biological relevance, and in particular that the CpGs identified to have structure only under the VB model are more correlated with the phenotype of interest than those identified under EM+BIC. Thus, we asked how well the specific clusters, inferred using the two algorithms, correlated with case/control status. To evaluate the concordance between the clustering output and a binary phenotype one needs a correlative measure which can handle clustering solutions of more than two clusters. The adjusted Rand Index (ARI)53,54 has been used extensively for this purpose (see e.g. Ref. 55 for the rationale of using the Rand Index). The ARI can be viewed as a Rand Index corrected for random chance, with values further away from 0 reflecting stronger statistical significance.54 The ARI analysis showed that the clusters inferred using the VB approach were indeed more strongly associated with case/control status (Fig. 1(a)). A typical example of a DNA methylation profile where the VB algorithm predicted structure but where EM+BIC did not confirmed that the inferred clusters correlate significantly with the phenotype of interest (Fig. 1(b)). Thus, we can conclude that not only does the VB model identify more structure in DNA methylation data, thus potentially allowing for improved feature selection, but that the inferred clusters themselves are more strongly associated with the biological phenotype.
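For reference, the ARI between a clustering and a phenotype labeling can be computed from pair counts as sketched below. This is a generic standard-library implementation of the usual pair-counting formula, written here for illustration; it is not the authors' code and assumes at least two samples.

```python
from collections import Counter
from math import comb

def adjusted_rand_index(labels_a, labels_b):
    """Adjusted Rand Index between two labelings of the same samples."""
    n = len(labels_a)
    # Pairs of samples placed together in both labelings.
    sum_ij = sum(comb(c, 2) for c in Counter(zip(labels_a, labels_b)).values())
    # Pairs placed together within each labeling separately.
    sum_a = sum(comb(c, 2) for c in Counter(labels_a).values())
    sum_b = sum(comb(c, 2) for c in Counter(labels_b).values())
    expected = sum_a * sum_b / comb(n, 2)   # agreement expected by chance
    max_index = 0.5 * (sum_a + sum_b)
    if max_index == expected:               # both labelings are trivial
        return 1.0
    return (sum_ij - expected) / (max_index - expected)

# Toy example: a two-cluster solution that tracks case/control status,
# and a one-cluster (structureless) solution for the same samples.
status = ["C", "C", "C", "N", "N", "N"]
print(adjusted_rand_index([1, 1, 1, 2, 2, 2], status))
print(adjusted_rand_index([1, 1, 1, 1, 1, 1], status))
```

A clustering that reproduces the phenotype gives an ARI of 1, while a single-cluster solution gives an ARI of 0 against any non-trivial phenotype, consistent with structureless features being assigned ARI = 0.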
Table 1. Distribution of the 138 CpGs in DataSet1 by the optimal number of clusters in their DNA methylation profiles, as estimated using the EM+BIC and VB algorithms. In both cases the maximum number of clusters considered was 6.

Clusters    1    2    3    4    5    6
EM+BIC    109   27    2    0    0    0
VB         18   27   51   33    9    0

3.2. The variational Bayesian BMM improves the positive predictive value
To further compare the algorithms, we adopted a discovery/test set partition strategy. The data was split into a 50% discovery and 50% test set and features selected from the discovery set using either EM+BIC or VB. A total of 50 different discovery/test set partitions were considered. In line with our previous results, the VB algorithm offered substantially improved power to detect CpGs with clustering structure in their methylation profiles (Fig. 2(a)) and the clustering itself was also more strongly associated with the phenotype of interest (Figs. 2(b) and 2(c)). The improvement of VB over EM+BIC was more substantial owing to the smaller sample size of the discovery set. Importantly, those CpGs selected to exhibit structure and significant Rand Index values under the VB model in the discovery sets also exhibited much higher ARI values in the corresponding test sets, compared to those features selected under the EM+BIC model (Fig. 2(d)). This indicates that the VB model improves the reproducibility and is far superior to EM+BIC in identifying the most relevant features. Indeed, many of the CpGs predicted to exhibit clustering under EM+BIC in the discovery set were not replicated in the test set.
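The repeated partitioning described above can be sketched as follows. The text does not say whether the splits were stratified by case/control status, so this example assumes simple random halving; the function name and the seed are illustrative.

```python
import random

def discovery_test_splits(n_samples, n_splits=50, seed=0):
    """Yield (discovery, test) index lists for repeated random 50/50 splits.

    Assumes unstratified random halving; the paper does not specify
    how its partitions were drawn.
    """
    rng = random.Random(seed)
    indices = list(range(n_samples))
    half = n_samples // 2
    for _ in range(n_splits):
        shuffled = rng.sample(indices, n_samples)  # a random permutation
        yield sorted(shuffled[:half]), sorted(shuffled[half:])

# 50 partitions of the 261 whole blood samples of DataSet1.
splits = list(discovery_test_splits(261))
print(len(splits), len(splits[0][0]), len(splits[0][1]))
```

Features would then be selected on each discovery half and their ARI re-evaluated on the matching test half.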
Fig. 1. (a) Adjusted Rand Index (ARI, y-axis) is compared between CpGs exhibiting structure under EM+BIC (BIC), CpGs exhibiting structure only under the VB model (OnlyVB) and CpGs exhibiting structure under the VB model (VB). The number of CpGs in each class is given in brackets. The P-values are from a Wilcoxon rank sum test comparing each of the OnlyVB and VB categories to the BIC category. (b) An example DNA methylation profile of a CpG for which the VB model inferred structure but the EM+BIC model did not. The clusters inferred using the VB model are shown in different colors. Squares denote controls (N), diamonds denote cases (C). The distribution of cases and controls in each cluster is given together with the associated Fisher-test P-value.

3.3. Improved positive predictive value on feature selection from all 27k CpGs
So far, our analysis has focused on 138 CpGs which map to lymphocyte and granulocyte markers and which therefore should be discriminatory of cancer/normal status, as explained in Ref. 43. To show that the improved positive predictive value of VB over EM+BIC is independent of this prior selection of CpGs, we next considered all 25,642 CpGs on the Illumina Infinium Beadchip. The data set was split into two mutually exclusive partitions of 130 (74 healthy and 56 cases) and 131 (74 healthy and 57 cases) samples. Because of the high computational cost associated with running EM+BIC individually on each of 25,642 CpGs, we used t-test P-values to first rank and select 1000 features in each partition. Of the two separate 1000 CpG lists, 537 overlapped. Running EM+BIC and VB separately on each of these two lists of 1000 features, we next selected the CpGs exhibiting clustering structure. Only 14 of the 537 overlapping CpGs exhibited structure in both partitions under EM+BIC, in stark contrast to 428 overlapping CpGs under the VB model. Comparing the adjusted Rand Indices of these subsets of CpGs in the corresponding test set partition further showed that these were higher for the CpGs selected under the VB model (Fig. 3). In all cases, the ARI values were significantly higher than random, demonstrating once again that the inferred clusters correlate significantly with cancer/normal status.
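The pre-filtering step can be sketched as below: rank CpGs by a two-sample t-statistic within each partition, keep the top k, and intersect the two lists. The text does not specify the t-test variant, so a pooled-variance Student t-statistic is assumed here; with the same samples for every CpG, ranking by |t| is then equivalent to ranking by P-value. The CpG identifiers and values are made up.

```python
import math
from statistics import mean, variance

def abs_t_statistic(cases, controls):
    """Absolute pooled-variance two-sample t-statistic (assumed variant)."""
    n1, n2 = len(cases), len(controls)
    pooled = ((n1 - 1) * variance(cases) + (n2 - 1) * variance(controls)) / (n1 + n2 - 2)
    return abs(mean(cases) - mean(controls)) / math.sqrt(pooled * (1.0 / n1 + 1.0 / n2))

def top_features(profiles, is_case, k):
    """Return the k CpG ids with the largest |t| between cases and controls."""
    scores = {}
    for cpg, values in profiles.items():
        cases = [x for x, c in zip(values, is_case) if c]
        controls = [x for x, c in zip(values, is_case) if not c]
        scores[cpg] = abs_t_statistic(cases, controls)
    return set(sorted(scores, key=scores.get, reverse=True)[:k])

# Toy partition with three cases followed by three controls.
is_case = [True, True, True, False, False, False]
profiles = {
    "cg_a": [0.80, 0.82, 0.79, 0.20, 0.22, 0.21],  # clearly differential
    "cg_b": [0.50, 0.52, 0.48, 0.51, 0.49, 0.50],  # no difference
}
selected = top_features(profiles, is_case, k=1)
print(selected)
```

Applying `top_features` to each partition and taking the set intersection of the two results gives the overlapping candidate list on which the mixture models are then run.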
Fig. 2. (a) Boxplots comparing the fraction of the 138 CpGs that exhibit structure under the EM+BIC (BIC) and VB algorithms. The boxplots show the distributions of these fractions across 50 different discovery sets. Since the 138 CpGs were selected to exhibit structure associated with cancer/normal status, we denote this fraction as the power obtained by the feature selection algorithm (Power). (b) Boxplots of the adjusted Rand Index (ARI) values for the 138 CpGs, averaged over the 50 discovery sets. (c) Boxplots of the mean ARI values over features exhibiting clustering in the discovery set. Here, each boxplot shows the distribution of this mean ARI across the 50 distinct discovery sets. (d) Boxplots of the ARI values, averaged over selected features (i.e. those exhibiting clustering in the discovery set), as evaluated in the corresponding evaluation/test set. Boxplots represent data over the 50 distinct discovery-test set partitions. In panels a, c and d the P-value is from an unpaired two-tailed Wilcoxon rank sum test, in panel b from a paired two-tailed rank sum test.

3.4. Application to identifying prognostic markers
To demonstrate the practical utility of the VBBMM in an omic context, we considered the problem of identifying prognostic DNA methylation markers. Prognostic DNA methylation markers have been identified in many cancers.46,47,56,57 As with blood-based diagnostic markers, the expected effect sizes of prognostic markers are small; however, unlike diagnostic markers, we would expect a much smaller number of DNAm markers to correlate with clinical outcome.46,47 Thus, this represents a challenging scenario which may benefit from application of a clustering algorithm in the feature selection process. Indeed, we posited that identification of prognostic markers may benefit from an additional clustering step, similar to the improvements we noted previously in the gene expression context.7
Because of the computational cost of running a beta mixture model for $\sim 10^4$ features, we here adopted the following three-step feature selection strategy:
(1) First, an initial feature selection is performed using standard statistics and standard corrections for multiple testing. This yields an initial candidate list.
(2) Second, on this candidate feature list, we run the VBBMM algorithm to identify those features exhibiting structure which is compatible (i.e. correlated) with the phenotype of interest.
(3) Third, a new statistic based on the inferred structure (the ARI) is introduced to rerank the candidate features. We note that this procedure penalizes structureless features and places them at the bottom of the list. This yields a new final ranked list of candidate biomarkers.
Our hypothesis is that steps 2 and 3 improve the ranking of the features, promoting true positives to the top of the list, while penalizing and eliminating false positives, thus allowing more robust biomarkers to be identified. To determine the robustness of the candidate biomarkers we use an independent validation set.
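The reranking in step 3 can be sketched as a stable sort of the candidate list by ARI. The tie-breaking rule (keep the initial order among equal ARI values) is an assumption made for this example, as are the CpG identifiers and ARI values.

```python
def rerank_by_ari(candidates, ari):
    """Rerank candidate CpGs by decreasing ARI.

    candidates: CpG ids in their initial rank order (best first).
    ari: dict mapping CpG id -> ARI of its inferred clusters against the
         phenotype; structureless features carry ARI = 0.
    Python's sort is stable, so features with equal ARI keep their initial
    relative order (an assumed tie-breaking rule).
    """
    return sorted(candidates, key=lambda cpg: -ari.get(cpg, 0.0))

# Hypothetical candidate list from step 1 and ARI values from step 2.
initial = ["cg1", "cg2", "cg3", "cg4"]
ari = {"cg1": 0.0, "cg2": 0.3, "cg3": 0.0, "cg4": 0.5}
reranked = rerank_by_ari(initial, ari)
print(reranked)
```

Features with ARI = 0 end up at the bottom of the list, which mirrors the penalization of structureless features described in step 3.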
To test this idea, we used the breast cancer samples of DataSet2 (113 samples) as a discovery set, ranking all 24589 CpGs according to a Cox-regression (with overall survival as endpoint). As a candidate feature list, we selected the top 634 ranked CpGs at an estimated false discovery rate (FDR) $< 0.25$. Thus, about a quarter of the 634 CpGs are expected to be false positives. Next, we applied the VB algorithm to each of these 634 CpGs, computed their ARI, and finally prioritized them according to their ARI value. We verified that many of the top 100 reranked CpGs had statistically significant or marginally significant ARI values (Supp. Fig. 1). A total of 129 of the 634 CpGs were deemed structureless (ARI $= 0$) by the VB algorithm. Thus, we compared the group of 100 highly reranked CpGs (i.e. highest ARI) to the 100 lowest reranked ones (i.e. with ARI $= 0$), to determine which subset validated better in an independent data set (DataSet3). Absolute Cox statistics were significantly higher for the top 100 ARI-reranked CpGs compared to those with ARI $= 0$ or those highly ranked only by Cox-statistics (Fig. 4).

Fig. 3. Boxplots of adjusted Rand Index (ARI) values for CpGs, selected by t-tests and EM+BIC (indicated as BIC) or VB from one partition, as evaluated in the mutually exclusive partition. In the EM+BIC case there were only 14 overlapping CpGs exhibiting structure in each partition, while in the VB case there were 428. In the VB case, we also plot 428 "null" ARI values obtained by taking the 95% quantile from 100 randomizations of the phenotype labels. P-values given are from a Wilcoxon rank sum test comparing the distribution of ARI values between BIC and VB.
As another benchmark, we also compared the absolute Cox-statistics in the test set against those of 100 CpGs selected using a sparse NMF58 in the discovery set. NMF was applied in a semi-supervised context, analogous to the semi-supervised PCA algorithm of Bair et al.6 Specifically, NMF was applied to the data matrix of the 634 top Cox-ranked CpGs (FDR $< 0.25$), followed by selection of 100 CpGs with the strongest weights in the basis NMF component with the largest absolute Cox-statistic in the discovery set. We observed that the top 100 ARI-reranked CpGs also had higher Cox-statistics in the test set than those selected via NMF (Fig. 4).
The prognostic CpGs highly ranked under the VB model also showed a stronger level of consistency than those with zero ARI values (Fig. 5 & Supp. Fig. 2). In fact, the reranking induced by the ARI identified 28 hypomethylated prognostic CpGs among the top ranked features (Fig. 5 & Supp. Fig. 2), twice as many as those with ARI $= 0$. In contrast, if CpGs were ranked only according to their Cox-statistics, we observed that this ranking did not influence the cross-validation accuracy (Fig. 5 & Supp. Fig. 2). Importantly, twice as many CpGs validated among the top 100 VB/ARI reranked CpGs than among the top 100 Cox-ranked ones (Fig. 5 & Supp. Fig. 2), supporting the view that the structural inference step can improve the prioritization of relevant features in discovery/training sets. This result was robust to a more stringent significance threshold used in the test set (Supp. Fig. 3). We also observed that feature selection using the combined Cox+ARI strategy was superior to that provided by the top (NMF1) and top two (NMF2) NMF components (Fig. 5).

Fig. 4. Comparison of the absolute Cox-statistics (Wald z-statistics) in the test set (DataSet3) of the top 100 CpGs ranked according to the Cox-statistic (Cox) in the discovery set (DataSet2) against the top 100 CpGs ranked according to the combined Cox+ARI strategy (Cox+high ARI) (in the discovery set), 100 CpGs with zero ARI (ARI $= 0$), and finally also 100 CpGs selected from the NMF basis component correlating strongest with survival in the discovery set (NMF1). Wilcoxon rank sum test P-values between (Cox+high ARI) and the other two classes, as well as between (Cox+high ARI) and NMF, are given.
Of note, among the 28 validated hypomethylated CpGs (SuppTab. 2), 4 mapped
to genes (ATP2B3, SLC25A31, NOS3, ITPR2) involved in the KEGG Calcium
Signaling Pathway, a 15-fold enrichment (Odds Ratio = 15.3, Fisher's exact test
P = 0.0002). Interestingly, calcium signaling is required for activation of the
Epithelial to Mesenchymal transition (EMT) pathway and the associated silencing of
the E-Cadherin gene,59 and so, given that activation of EMT and low expression of
E-Cadherin are hallmarks of poor prognosis in breast cancer,14,60,61 the observed
hypomethylation of genes in the calcium signaling pathway in poor outcome breast
cancers is consistent with their overexpression and the observed overactivation of
EMT in these cancers.
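For readers who want to see how an enrichment of this kind is computed, the following sketch evaluates a 2x2 table with a one-sided Fisher's exact test (hypergeometric upper tail) and the sample odds ratio. The function name `enrichment_test` and the background counts in the example are hypothetical placeholders; only the "4 of 28" comes from the text, so the output is not expected to reproduce the odds ratio or P-value quoted above. Whether the original test was one- or two-sided is not stated, and statistical packages often report a conditional estimate of the odds ratio rather than the simple sample ratio used here.

```python
from math import comb


def enrichment_test(sel_in, sel_out, bg_in, bg_out):
    """Return (sample odds ratio, one-sided Fisher P-value) for a 2x2 table.

    sel_in / sel_out: selected features inside / outside the pathway.
    bg_in / bg_out:   remaining (non-selected) features inside / outside it.
    """
    n_selected = sel_in + sel_out
    n_pathway = sel_in + bg_in
    total = n_selected + bg_in + bg_out

    cross = sel_out * bg_in
    odds_ratio = (sel_in * bg_out) / cross if cross else float("inf")

    # P(X >= sel_in) when n_selected features are drawn without replacement.
    tail = sum(
        comb(n_pathway, x) * comb(total - n_pathway, n_selected - x)
        for x in range(sel_in, min(n_selected, n_pathway) + 1)
    )
    return odds_ratio, tail / comb(total, n_selected)


# 4 of 28 selected CpGs map to the pathway (from the text);
# the background counts below are hypothetical placeholders.
odds, p_value = enrichment_test(sel_in=4, sel_out=24, bg_in=200, bg_out=14000)
print(f"odds ratio = {odds:.1f}, one-sided P = {p_value:.2g}")
```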
4. Discussion and Conclusions
We have here proposed a novel feature selection algorithm which is specific to DNA
methylation data generated with Illumina beadarrays. The algorithm is based on the
hypothesis that features with bi- or multi-modal DNA methylation profiles and for
which the structure is correlated to a phenotype of interest are more likely to be true
positives. As such, they are more likely to exhibit larger absolute statistics in
independent data. We have verified this in the context of cancer diagnostic markers in
whole blood and prognostic markers in breast cancer, both challenging scenarios in
which effect sizes are small.
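The degree to which inferred cluster structure is correlated with a phenotype can be quantified with the adjusted Rand index (ARI). The sketch below is a plain pair-counting implementation, given as an illustration rather than as the code used in this work. The function name and the convention of returning 0 for degenerate inputs are assumptions of the sketch.

```python
from collections import Counter
from math import comb


def adjusted_rand_index(clusters, labels):
    """Pair-counting adjusted Rand index between two labelings of the same samples."""
    n = len(clusters)
    if n != len(labels):
        raise ValueError("clusters and labels must have the same length")
    if n < 2:
        return 0.0

    # Pairs of samples co-assigned in the joint table, in clusters, and in labels.
    joint_pairs = sum(comb(c, 2) for c in Counter(zip(clusters, labels)).values())
    cluster_pairs = sum(comb(c, 2) for c in Counter(clusters).values())
    label_pairs = sum(comb(c, 2) for c in Counter(labels).values())

    expected = cluster_pairs * label_pairs / comb(n, 2)
    maximum = 0.5 * (cluster_pairs + label_pairs)
    if maximum == expected:
        # Both partitions are trivial; return 0 by convention (assumption).
        return 0.0
    return (joint_pairs - expected) / (maximum - expected)


# A bimodal feature whose two clusters match the phenotype perfectly,
# and a unimodal feature (a single cluster), which carries no structure.
phenotype = ["case", "case", "control", "control"]
print(adjusted_rand_index([0, 0, 1, 1], phenotype))
print(adjusted_rand_index([0, 0, 0, 0], phenotype))
```

A feature for which only one mixture component is inferred receives ARI = 0 under this definition, which is consistent with treating unimodal features as uninformative in the reranking.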
Fig. 5. Barplots showing the number of CpGs declared as significantly associated with clinical outcome in
the breast cancer training set and which are also significant in the test set. These numbers are shown
separately for those hypermethylated (hyperM) and hypomethylated (hypoM) in poor prognosis breast cancers,
and across six different feature selection strategies. (1) Cox+ARI: Top 100 CpGs ranked according
to Cox statistic and then reranked by ARI. (2) Cox+(ARI = 0): 100 CpGs ranked high by Cox-statistic
but with ARI = 0. (3) Cox-high: Top 100 CpGs ranked highest by Cox-statistic regardless of ARI. (4) Cox-low:
100 CpGs ranked high by Cox-statistic but only marginally significant. (5) NMF1: 100 CpGs with
largest weights in the basis NMF component with the most significant Cox-statistic. (6) NMF2: 100 CpGs
with largest weights in the two basis NMF components with the most significant Cox-statistics (top
50 CpGs selected from each).
We have also seen that incorporating a clustering inference step in a univariate
fashion to prioritize ranked features outperformed feature selection done via a
popular multivariate dimensional reduction method (sparse NMF). Although we recently
demonstrated the power of NMF as an unsupervised dimensional reduction method,28
it is important to realize that NMF is not designed to identify individual features since
these are not easily inferred from the estimated NMF basis vectors. Thus, NMF, being
a multivariate dimensional reduction method, lacks the plasticity and hence power
required to identify all the features correlating with the phenotype of interest.
To infer structure in the DNA methylation profiles we used the VBBMM.
The advantages of using a variational Bayes approach over EM+BIC are well
documented,41,48 and here we have confirmed, in the novel context of DNA
methylation data, that VBBMM significantly outperforms EM+BIC in terms of
sensitivity and positive predictive value. Importantly, these improvements are obtained
at a reduced computational cost. For instance, with EM+BIC, it took about 10
minutes to run the algorithm for one feature, with up to six mixture components and
using 10 different initializations on an Intel Core i7-2720QM CPU at
2.20 GHz. In contrast, using VB, it took only about 20 to 30 seconds to run one
feature, with a maximum of six components and a total of 10 different runs/initializations.
Thus, the VB framework speeds up the analysis by more than a factor of 10,
which is an important consideration if it is to be applied as a feature selection step in
large omic data sets. For instance, upcoming epigenome-wide association studies
(EWAS) are using methylation beadchips with over 100,000 features.62 With VB, parallelizing
the computation on an 8-core workstation would therefore take approximately four
days, or if access to a 30–40 node cluster is possible, the computation would take less
than a day. In contrast, using EM+BIC it would take 10 days even on a 30–40 node
cluster. Another important consideration is that the variational inference framework
may allow for further substantial speed enhancements (by factors of ~10), through
use of information-geometric optimization methods.63,64 Hence, with VBBMM, even
a 100,000 feature data matrix would be manageable with an 8-core workstation. We
can conclude therefore that variational Bayes methods make the application of
beta-mixture models practical, in contrast to the EM+BIC framework which makes the
associated lengthy computations far less manageable.
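The back-of-the-envelope arithmetic behind these run-time estimates can be sketched as follows. The sketch assumes that features are fitted independently, that parallelization is perfect with no overhead, and that each cluster node contributes a single worker. The per-feature time of 25 seconds for VB is taken as the midpoint of the quoted 20 to 30 seconds. These are assumptions of the illustration, and the function name `wall_clock_days` is hypothetical.

```python
SECONDS_PER_DAY = 86_400.0


def wall_clock_days(n_features, seconds_per_feature, n_workers):
    """Wall-clock days for independent per-feature fits with ideal parallelism."""
    return n_features * seconds_per_feature / n_workers / SECONDS_PER_DAY


N_FEATURES = 100_000   # order of magnitude quoted in the text
VB_SECONDS = 25.0      # midpoint of "about 20 to 30 seconds" (assumption)
EM_SECONDS = 600.0     # "about 10 minutes"

print(f"per-feature speed-up: {EM_SECONDS / VB_SECONDS:.0f}x")
for method, seconds in (("VB", VB_SECONDS), ("EM+BIC", EM_SECONDS)):
    for workers in (8, 30, 40):
        days = wall_clock_days(N_FEATURES, seconds, workers)
        print(f"{method:7s} {workers:3d} workers: {days:6.1f} days")
```

Under these assumptions the VB estimate for 8 workers is roughly 3.6 days, in line with the "approximately four days" quoted above. The cluster figures depend on how many cores each node provides, which is not stated, so they should be read as rough guides only.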
It is important to also point out that in this work we have only explored the
VBBMM as a feature selection step, i.e. inferring structure in one-dimensional DNA
methylation profiles to identify true positives more reliably. One may also wish to
apply the VBBMM to cluster samples over more than one feature/dimension.
However, analytically, it is not yet possible to fully incorporate the covariance
structure of the features in the inference procedure, which thus precludes application
to clustering over more than one dimension. We leave this interesting and
challenging question for a future investigation.
In summary, given that DNA methylation biomarkers for improved prognosis
and/or early detection of cancers are likely to be characterized by small effect
sizes,23,43,44 it is important to have powerful statistical algorithms in place that can
help discern true from false positives. Thus, the variational Bayes beta mixture
model presented here should be of interest to any study embarking on DNA
methylation profiling, including upcoming EWAS.23
Note Added
Legends to Supplementary Data
Supplementary Table S1
The list of 138 CpGs mapping to genes that are overexpressed/underexpressed in
granulocytes and lymphocytes. We provide the CpG probe identifier, the Entrez ID,
the gene symbol and whether the gene is over/under expressed in granulocytes/lymphocytes.
Supplementary Table S2
The list of 30 validated prognostic CpGs in breast cancer (28 hypomethylated in poor
prognosis and 2 hypermethylated in poor prognosis). We provide the CpG ID, the
Entrez ID, the gene symbol, the hazard ratio (HR), associated Cox z-statistic, P-value
and number of samples with clinical annotation in both discovery and test sets.
Supplementary Figure S1
Observed adjusted Rand Index values for the top 100 CpGs ranked according to the
Cox-statistic and adjusted Rand Index in the discovery set (DataSet2). The null
distributions of the adjusted Rand Index values are shown as boxplots in black and
were estimated from 1000 Monte Carlo runs (randomising phenotype labels). The
observed values are colored in red if the adjusted Rand Index value is significant at a
nominal P < 0.05 level, or in green if less significant. For the top 100 CpGs, all
adjusted Rand Index values had P-values less than 0.12.
Supplementary Figure S2
(A–B) Scatterplots of Cox-statistics (z-stat) in the discovery set (DataSet2) (x-axis)
against those in the validation set (DataSet3) (y-axis). (A) shows the statistics for
the top 100 ranked CpGs according to a Cox-regression and the adjusted Rand Index
from the VB clustering in the discovery set. (B) shows the statistics for the 100 CpGs
bottom ranked by the adjusted Rand Index in the discovery set. (C–D) As (A–B), but
panel (C) shows the scatterplot for the top 100 CpGs ranked only according to the
Cox-statistic in the discovery set, while panel (D) shows the scatterplot for the
bottom 100 Cox-ranked features. In all panels, the number of CpGs passing statistical
significance (Cox P-value P < 0.1) in the test set is indicated in black.
Supplementary Figure S3
Same as Supplementary Figure S2, but now using a Cox P-value threshold of
P < 0.05 in the test set.
Acknowledgments
AET is supported by a Heller Research Fellowship. ZM is partly supported by internal
KTH funding. We wish to thank Martin Widschwendter for useful discussions.
References
1. Sawyers CL, The cancer biomarker problem, Nature 452:548–552, 2008.
2. Leek JT, Storey JD, Capturing heterogeneity in gene expression studies by surrogate variable analysis, PLoS Genet 3:1724–1735, 2007.
3. Pawitan Y, Michiels S, Koscielny S, Gusnanto A, Ploner A, False discovery rate, sensitivity and sample size for microarray studies, Bioinformatics 21:3017–3024, 2005.
4. Leek JT, Storey JD, A general framework for multiple testing dependence, Proc Natl Acad Sci USA 105:18718–18723, 2008.
5. Bourgon R, Gentleman R, Huber W, Independent filtering increases detection power for high-throughput experiments, Proc Natl Acad Sci USA 107:9546–9551, 2010.
6. Bair E, Tibshirani R, Semi-supervised methods to predict patient survival from gene expression data, PLoS Biol 2:E108, 2004.
7. Teschendorff AE, Naderi A, Barbosa-Morais NL, Caldas C, PACK: Profile analysis using clustering and kurtosis to find molecular classifiers in cancer, Bioinformatics 22:2269–2275, 2006.
8. Li L, Chaudhuri A, Chant J, Tang Z, PADGE: Analysis of heterogeneous patterns of differential gene expression, Physiol Genomics 32:154–159, 2007.
9. Wang J, Wen S, Symmans WF, Pusztai L, Coombes KR, The bimodality index: A criterion for discovering and ranking bimodal signatures from cancer gene expression profiling data, Cancer Inform 7:199–216, 2009.
10. Hellwig B, Hengstler JG, Schmidt M, Gehrmann MC, Schormann W, Rahnenführer J, Comparison of scores for bimodality of gene expression distributions and genome-wide evaluation of the prognostic relevance of high-scoring genes, BMC Bioinformatics 11:276, 2010.
11. Bessarabova M, Kirillov E, Shi W, Bugrim A, Nikolsky Y, Nikolskaya T, Bimodal gene expression patterns in breast cancer, BMC Genomics 11:S8, 2010.
12. Mpindi JP, Sara H, Haapa-Paananen S, Kilpinen S, Pisto T, Bucher E, Ojala K, Iljin K, Vainio P, Björkman M, Gupta S, Kohonen P, Nees M, Kallioniemi O, GTI: A novel algorithm for identifying outlier gene expression profiles from integrated microarray datasets, PLoS One 6:e17259, 2011.
13. Tomlins SA, Rhodes DR, Perner S, Dhanasekaran SM, Mehra R, Sun XW, Varambally S, Cao X, Tchinda J, Kuefer R, Lee C, Montie JE, Shah RB, Pienta KJ, Rubin MA, Chinnaiyan AM, Recurrent fusion of TMPRSS2 and ETS transcription factor genes in prostate cancer, Science 310:644–648, 2005.
14. Teschendorff AE, Journée M, Absil PA, Sepulchre R, Caldas C, Elucidating the altered transcriptional programs in breast cancer using independent component analysis, PLoS Comput Biol 3:e161, 2007.
15. Colombo PE, Milanezi F, Weigelt B, Reis-Filho JS, Microarrays in the 2010s: The contribution of microarray-based gene expression profiling to breast cancer classification, prognostication and prediction, Breast Cancer Res 13:212, 2011.
16. Mosquera JM, Mehra R, Regan MM, Perner S, Genega EM, Bueti G, Shah RB, Gaston S, Tomlins SA, Wei JT, Kearney MC, Johnson LA, Tang JM, Chinnaiyan AM, Rubin MA, Sanda MG, Prevalence of TMPRSS2-ERG fusion prostate cancer among men undergoing prostate biopsy in the United States, Clin Cancer Res 15:4706–4711, 2009.
17. Teschendorff AE, Wang Y, Barbosa-Morais NL, Brenton JD, Caldas C, A variational Bayesian mixture modelling framework for cluster analysis of gene-expression data, Bioinformatics 21:3025–3033, 2005.
18. Baylin SB, Ohm JE, Epigenetic gene silencing in cancer – A mechanism for early oncogenic pathway addiction? Nat Rev Cancer 6:107–116, 2006.
19. Feinberg AP, Ohlsson R, Henikoff S, The epigenetic progenitor origin of human cancer, Nat Rev Genet 7:21–33, 2006.
20. Jones PA, Baylin SB, The epigenomics of cancer, Cell 128:683–692, 2007.
21. Petronis A, Epigenetics as a unifying principle in the aetiology of complex traits and diseases, Nature 465:721–727, 2010.
22. Feinberg AP, Epigenomics reveals a functional genome anatomy and a new approach to common disease, Nat Biotechnol 28:1049–1052, 2010.
23. Rakyan VK, Down TA, Balding DJ, Beck S, Epigenome-wide association studies for common human diseases, Nat Rev Genet 12:529–541, 2011.
24. Teschendorff AE, Jones A, Fiegl H, Sargent A, Zhuang JJ, Kitchener HC, Widschwendter M, Epigenetic variability in cells of normal cytology is associated with the risk of future morphological transformation, Genome Med 4:24, 2012.
25. Bibikova M, Fan JB, Genome-wide DNA methylation profiling, Wiley Interdiscip Rev Syst Biol Med 2:210–223, 2010.
26. Sandoval J, Heyn H, Moran S, Serra-Musach J, Pujana MA, Bibikova M, Esteller M, Validation of a DNA methylation microarray for 450,000 CpG sites in the human genome, Epigenetics 6:692–702, 2011.
27. Du P, Zhang X, Huang CC, Jafari N, Kibbe WA, Hou L, Lin SM, Comparison of beta-value and M-value methods for quantifying methylation levels by microarray analysis, BMC Bioinformatics 11:587, 2010.
28. Zhuang J, Widschwendter M, Teschendorff AE, A comparison of feature selection and classification methods in DNA methylation studies using the Illumina 27k platform, BMC Bioinformatics 13:59, 2012.
29. Barfield RT, Kilaru V, Smith AK, Conneely KN, CpGassoc: An R function for analysis of DNA methylation microarray data, Bioinformatics 28(9):1280–1281, 2012.
30. Kilaru V, Barfield RT, Schroeder JW, Smith AK, Conneely KN, MethLAB: A graphical user interface package for the analysis of array-based DNA methylation data, Epigenetics 7:225–229, 2012.
31. Laurila K, Oster B, Andersen CL, Lamy P, Orntoft T, Yli-Harja O, Wiuf C, A beta-mixture model for dimensionality reduction, sample classification and analysis, BMC Bioinformatics 12:215, 2011.
37. Sun H, Wang S, Penalized logistic regression for high-dimensional DNA methylation data with case-control studies, Bioinformatics 28:1368–1375, 2012.
38. Ma Z, Leijon A, Bayesian estimation of beta mixture models with variational inference, IEEE Trans Pattern Anal Machine Intel 33(11):2160–2173, 2011.
39. Dempster AP, Laird NM, Rubin DB, Maximum likelihood from incomplete data via the EM algorithm, J Roy Stat Soc B 39:1–38, 1977.
40. Schwarz G, Estimating the dimension of a model, Ann Stat 6:461–464, 1978.
41. Attias H, Inferring parameters and structure of latent variable models by variational Bayes, Proc 15th Conf Uncertainty in Artificial Intelligence, pp. 21–30, 1999.
42. MacKay DJ, Developments in probabilistic modelling with neural networks-ensemble learning, Neural Networks: Artificial Intelligence and Industrial Applications. Proc 3rd Annual Symp on Neural Networks, Springer, Nijmegen, pp. 191–198, 1995.
43. Teschendorff AE, Menon U, Gentry-Maharaj A, Ramus SJ, Gayther SA, Apostolidou S, Jones A, Lechner M, Beck S, Jacobs IJ, Widschwendter M, An epigenetic signature in peripheral blood predicts active ovarian cancer, PLoS One 4:e8274, 2009.
44. Teschendorff AE, Menon U, Gentry-Maharaj A, Ramus SJ, Weisenberger DJ, Shen H, Campan M, Noushmehr H, Bell CG, Maxwell AP, Savage DA, Mueller-Holzner E, Marth C, Kocjan G, Gayther SA, Jones A, Beck S, Wagner W, Laird PW, Jacobs IJ, Widschwendter M, Age-dependent DNA methylation of genes that are suppressed in stem cells is a hallmark of cancer, Genome Res 20:440–446, 2010.
45. Palmer C, Diehn M, Alizadeh AA, Brown PO, Cell-type specific gene expression profiles of leukocytes in human peripheral blood, BMC Genomics 7:115, 2006.
46. Zhuang J, Jones A, Lee SH, Ng E, Fiegl H, Zikan M, Cibula D, Sargent A, Salvesen HB, Jacobs IJ, Kitchener HC, Teschendorff AE, Widschwendter M, The dynamics and prognostic potential of DNA methylation changes at stem cell gene loci in women's cancer, PLoS Genet 8:e1002517, 2012.
47. Fackler MJ, Umbricht CB, Williams D, Argani P, Cruz LA, Merino VF, Teo WW, Zhang Z, Huang P, Visvanathan K, Marks J, Ethier S, Gray JW, Wolff AC, Cope LM, Sukumar S, Genome-wide methylation analysis identifies genes specific to breast cancer hormone receptor status and risk of recurrence, Cancer Res 71:6195–6207, 2011.
48. Bishop CM, Pattern Recognition and Machine Learning, Springer, New York, 2006.
49. Jordan MI, Learning in Graphical Models, MIT Press, Boston, 1999.
50. Ma Z, Leijon A, Beta mixture models and the application to image classification, Proc Int Conf Image Processing, pp. 2045–2048, 2009.
51. Satomi A, Murakami S, Ishida K, Mastuki M, Hashimoto T, Sonoda M, Significance of increased neutrophils in patients with advanced colorectal cancer, Acta Oncol 34(1):69–73, 1995.
52. Yamanaka T, Matsumoto S, Teramukai S, Ishiwata R, Nagai Y, Fukushima M, The baseline ratio of neutrophils to lymphocytes is associated with patient prognosis in advanced gastric cancer, Oncology 73(3–4):215–220, 2007.
53. Rand WM, Objective criteria for the evaluation of clustering methods, J American Stat Assoc 66(336):846–850, 1971.
54. Hubert L, Comparing partitions, J Classif 2:193–218, 1985.
55. Yeung KY, Fraley C, Murua A, Raftery AE, Ruzzo WL, Model-based clustering and data transformations for gene expression data, Bioinformatics 17:977–987, 2001.
56. TCGA, Comprehensive genomic characterization defines human glioblastoma genes and core pathways, Nature 455:1061–1068, 2008.
57. TCGA, Integrated genomic analyses of ovarian carcinoma, Nature 474:609–615, 2011.
58. Gaujoux R, Seoighe C, A flexible R package for nonnegative matrix factorization, BMC
59. Wu CH, Tang SC, Wang PH, Lee H, Ko JL, Nickel-induced epithelial-mesenchymal transition by reactive oxygen species generation and E-cadherin promoter hypermethylation, J Biol Chem 287:25292–25302, 2012.
60. Creighton CJ, Li X, Landis M, Dixon JM, Neumeister VM, Sjolund A, Rimm DL, Wong H, Rodriguez A, Herschkowitz JI, Fan C, Zhang X, He X, Pavlick A, Gutierrez MC, Renshaw L, Larionov AA, Faratian D, Hilsenbeck SG, Perou CM, Lewis MT, Rosen JM, Chang JC, Residual breast cancers after conventional therapy display mesenchymal as well as tumor-initiating features, Proc Natl Acad Sci USA 106:13820–13825, 2009.
61. Creighton CJ, Chang JC, Rosen JM, Epithelial-mesenchymal transition (EMT) in tumor-initiating cells and its clinical implications in breast cancer, J Mammary Gland Biol Neoplasia 15:253–260, 2010.
62. Bibikova M, Barnes B, Tsan C, Ho V, Klotzle B, Le JM, Delano D, Zhang L, Schroth GP, Gunderson KL, Fan JB, Shen R, High density DNA methylation array with single CpG site resolution, Genomics 98:288–295, 2011.
63. Girolami M, Calderhead B, Riemann manifold Langevin and Hamiltonian Monte Carlo methods, J Royal Stat Society: Series B (Statistical Methodology) 73(2):123–214, 2011.
64. Hensman J, Rattray M, Lawrence ND, Fast variational inference in the conjugate exponential family, arXiv preprint arXiv:1206.5162, 2012.
Zhanyu Ma received his M.Eng. degree in Signal and Information
Processing from BUPT (Beijing University of Posts and
Telecommunications), China, and his Ph.D. degree in Electrical
Engineering from KTH (Royal Institute of Technology), Sweden,
in 2007 and 2011, respectively. Since 2012, he has been a postdoctoral
researcher in the School of Electrical Engineering, KTH, Sweden.
His research interests include statistical modeling and machine
learning related topics with a focus on applications in speech
processing, image processing, and bioinformatics.
Andrew E. Teschendorff received his B.Sc. in Mathematical
Physics from Edinburgh University and his Ph.D. in Theoretical
Particle Physics from the University of Cambridge, UK. He now
leads the Statistical Cancer Genomics group at the UCL Cancer
Institute, University College London, UK. His research interests
include statistical genomics and epigenomics with a focus on
applications to cancer, as well as network physics.