Journal meeting

Journal meeting. BACKGROUND ▪ Epigenome-wide association studies (EWAS) of human disease are becoming increasingly common. DNA methylation (DNAm) is of.

Jan 18, 2018


Robyn Jennings

Page 1

Journal meeting

Page 2

BACKGROUND

▪ Epigenome-wide association studies (EWAS) of human disease are becoming increasingly common. DNA methylation (DNAm) is of particular interest because it is dynamic across the lifetime, affected by environmental insults, and previously implicated in developmental disorders and cancer.

▪ In these studies, DNAm levels are measured genome-wide at thousands to millions of sites in hundreds of individuals to identify loci where these levels are associated with quantitative traits or disease.

▪ Whole blood is a heterogeneous collection of different cell types, each with a very different DNA methylation profile. Observed whole blood DNAm profiles are therefore mixtures of the cell type profiles.

▪ In a seminal paper, Houseman et al. describe a statistical method that can accurately estimate relative proportions of cell type components in whole blood.

▪ Using practically the same statistical approach, Guintivano et al. describe a method for estimating neuron and non-neuron components in brain samples.

▪ However, currently there are no published statistical solutions to parsing age effects by cell type from observed whole blood DNAm measurements.
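A toy illustration of the mixture point above, as a minimal Python sketch. The cell type names, the three CpGs and every beta value are made up for illustration; nothing here comes from the paper's data. It only shows that two whole-blood samples built from identical cell-type profiles still differ when their cell proportions differ.

```python
# Hypothetical per-cell-type methylation (beta) values at three CpGs.
# All numbers are invented for illustration.
cell_profiles = {
    "granulocytes": [0.90, 0.10, 0.50],
    "lymphocytes": [0.20, 0.80, 0.50],
}


def mix(profiles, proportions):
    """Whole-blood beta at each CpG as a proportion-weighted average."""
    n_cpgs = len(next(iter(profiles.values())))
    return [
        sum(proportions[cell] * profiles[cell][i] for cell in profiles)
        for i in range(n_cpgs)
    ]


# Same cell-type profiles, different composition (hypothetical proportions).
sample_a = mix(cell_profiles, {"granulocytes": 0.5, "lymphocytes": 0.5})
sample_b = mix(cell_profiles, {"granulocytes": 0.7, "lymphocytes": 0.3})

for name, betas in (("sample_a", sample_a), ("sample_b", sample_b)):
    print(name, [round(b, 2) for b in betas])
```

The first two CpGs differ between the samples although no cell type changed its methylation; the third CpG, which has the same value in both cell types, is unaffected by composition.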

Page 3

Supplementary Table 1: Studies included in the cellular composition analyses

Dataset (Reference)      N      GEO ID     Primary Outcome              Median Age [IQR] (yrs)
Hannum et al. (2012)     641    GSE40279   Age                          65.0 [54.0-75.0]
Liu et al. (2013)        335    GSE42861   Arthritis                    55.0 [46.0-61.5]
Alisch et al. (2012)     78     GSE36064   Age                          3.1 [2.0-4.7]
Harris et al. (2012)     25     GSE32148   Inflammatory Bowel Disease   17.0 [15.0-44.0]
Heyn et al. (2012)       19     GSE30870   Age                          0 [newborns]
Total                    1098

MATERIALS AND METHODS

Overall design: Six healthy male blood donors, age 38 ± 13.6 years, were included in the study. From each individual, global DNA methylation levels were analyzed in whole blood, peripheral blood mononuclear cells (PBMC) and granulocytes as well as for seven isolated cell populations (CD4+ T cells, CD8+ T cells, CD56+ NK cells, CD19+ B cells, CD14+ monocytes, neutrophils, and eosinophils), n=60 samples analyzed in total.

Cell-sorted data

Dataset (Reference)      N      GEO ID     Primary Outcome   Mean Age (yrs)
Reinius et al. (2012)    60     GSE35069   Age               38 ± 13.6

Differential DNA methylation in purified human blood cells: implications for cell lineage and studies on disease susceptibility. PLoS One 2012;7(7):e41361.

Validation set

Page 4

Introduction

Age (blood samples)

DNA methylation

Cell type difference

Page 5

Results and Discussion: DNAm profiles show large between cell type differences

▪ They downloaded Illumina HumanMethylation450 BeadChip (Illumina 450k) data from flow-sorted neutrophils (granulocytes), lymphocytes (CD8+ and CD4+ T cells, CD56+ natural killer cells and CD19+ B cells) and CD14+ monocytes from six adult male samples (mean age 38 ± 13.6 years) as previously described and confirmed that sorted blood cell types have unique DNAm profiles.

▪ In fact 63.5% of the CpGs on the Illumina 450k array showed differences with P < 0.05 across these cell types.

Supplementary figure

Page 6

Results and Discussion: Varying cell composition may explain apparent age-associated differences

▪ They selected a subset of 600 cell-type-specific CpGs (Figure 1) and then used these to estimate cell type proportions in whole blood samples.

▪ The algorithm fits a nonlinear random effects model at each of these CpGs, estimating the coefficient for each cellular component, and then uses these coefficients to predict the relative proportion of each cellular component in peripheral blood samples.
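A rough sketch of the idea behind the proportion estimate, not the authors' algorithm and not the minfi implementation. Instead of the random effects model, it assumes a plain mixture (whole blood = proportion-weighted sum of cell-type profiles) and recovers the proportions by non-negative least squares, solved here with projected gradient descent. The function name, the four CpGs and all beta values are hypothetical.

```python
def estimate_proportions(profiles, observed, steps=2000):
    """Non-negative least squares by projected gradient descent.

    profiles: one list of reference betas per cell type (same CpG order)
    observed: whole-blood betas at those CpGs
    """
    k, n = len(profiles), len(observed)
    # Safe step size: the trace of A'A is an upper bound on its largest eigenvalue.
    step = 1.0 / sum(v * v for p in profiles for v in p)
    w = [1.0 / k] * k  # start from equal proportions
    for _ in range(steps):
        # Residual of the current mixture at each CpG.
        resid = [
            sum(w[j] * profiles[j][i] for j in range(k)) - observed[i]
            for i in range(n)
        ]
        # Gradient of 0.5 * sum(resid^2) with respect to each proportion.
        grad = [sum(resid[i] * profiles[j][i] for i in range(n)) for j in range(k)]
        # Gradient step, then clip at zero to keep proportions non-negative.
        w = [max(0.0, w[j] - step * grad[j]) for j in range(k)]
    return w


# Hypothetical reference profiles at four cell-type-specific CpGs.
profiles = [
    [0.9, 0.1, 0.1, 0.5],  # "granulocytes"
    [0.1, 0.9, 0.1, 0.5],  # "lymphocytes"
    [0.1, 0.1, 0.9, 0.5],  # "monocytes"
]
# Whole-blood betas generated from proportions 0.6 / 0.3 / 0.1.
observed = [0.58, 0.34, 0.18, 0.50]

estimated = estimate_proportions(profiles, observed)
print([round(p, 3) for p in estimated])
```

With real data the reference profiles would come from the sorted-cell samples, and the estimates would be noisy rather than exact.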

Page 7

Results and Discussion: Varying cell composition may explain apparent age-associated differences

▪ Nonlinear random effects model?

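For example, it is a method for describing the variability in methylation differences between cell types and for investigating the causes of that variability.

In the nonlinear random effects model, each cell type methylation parameter is assumed to lie on a population distribution that can be described by a mean and a variance, so an individual's parameter is expressed as the population's typical value plus a deviation from it.

With the model estimated in this way, the cell type parameters (coefficients) can be fed in and a simulation run to predict cell type composition.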

Page 8

Results and Discussion: In sorted samples, cell type explains a larger percentage of variability than age

▪ Given these results, for the purposes of their analysis, they assumed that, for the selected 600 CpGs, the cell-type-specific DNAm profiles are the same at all ages.

▪ Although they know this assumption does not hold true for all CpGs, the results of this section suggest that it is reasonable for most CpGs, and for their 600-CpG profile in particular.

Page 9

Results and Discussion: Cell type proportions change with age following monotonic patterns

▪ They observed consistent age-related changes for the proportions of each cell type (Figure 2).

▪ These results are in line with previously published findings related to T cells, namely the involution of the thymus, where T lymphocytes mature.

▪ This process begins very early in life and continues with age - the size of the thymus drops approximately 3% per year until the mid-60s, and is approximately 5% the size of the thymus in a newborn, suggesting that the number, and diversity, of T cells decreases with age.

▪ However, they also note that these age-cell count relationships, although monotonic, were nonlinear with an inflection point around 40 years (Figure 2).

Dataset (Reference)      Median Age [IQR] (yrs)
Hannum et al. (2012)     65.0 [54.0-75.0]
Liu et al. (2013)        55.0 [46.0-61.5]
Alisch et al. (2012)     3.1 [2.0-4.7]
Harris et al. (2012)     17.0 [15.0-44.0]
Heyn et al. (2012)       0 [newborns]

Page 10

Results and Discussion: Cellular composition correlates strongly with global DNAm profiles

▪ Given that blood cell types have very different DNAm profiles and that cell type proportions change across age (Figure 2), they assessed if cell composition was a major source of variability in the five peripheral blood data sets.

▪ They computed the first two principal components of the epigenome-wide DNAm profiles across the five studies and compared them to the first principal component of the cell proportion estimates (Figure 3).

▪ The correlation between DNAm variance and composition variance was apparent within each study, often to a stronger degree.

▪ These observed correlations therefore empirically demonstrate that cell composition is a very large source of variability in DNAm data derived from peripheral blood.

Figure 3. Cellular composition is a major source of variability in DNAm datasets in whole blood. Principal components (PCs) (A) 1 and (B) 2 of the 456,655 DNAm probes (y-axis) and the first PC of the empirical cell counts (x-axis) are highly correlated. The first PC of the DNAm data explains 10.9% of the variance, and the second explains 9.3% of the variance.


Page 11

Results and Discussion: PCA (Principal Component Analysis)

[Figure 1: scatter of elliptically distributed 2D points (x-axis, y-axis) with the two principal directions e1 and e2]

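1. What is PCA (Principal Component Analysis)?

PCA is a method for finding the principal components of a distribution of data.

More concretely, suppose n data points (x1,y1), (x2,y2), ..., (xn,yn) are distributed in an elliptical shape on a 2D coordinate plane, as in the figure.

What is the best way to describe the characteristics of this distribution with two vectors? As in the figure, it is to describe the distribution with the two vectors e1 and e2. Knowing the direction and magnitude of e1 and of e2 is the simplest and most effective way to grasp the shape of the distribution.

PCA does not analyze the components of each individual data point; when many data points together form a distribution, it analyzes the principal components of that distribution.

Here a principal component means the direction vector along which the variance of the data is largest. In Figure 1, the variance (spread) of the data is largest along e1. The direction that is perpendicular to e1 and has the next-largest variance is e2.

In other words, PCA analyzes a data set by taking the axis along which the variance is largest as the first principal component, the axis with the second-largest variance as the second principal component, and so on. Linearly transforming the data into this new coordinate system reduces the data set to a lower dimension.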
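The e1/e2 picture can be checked numerically. A minimal sketch in Python for the 2D case only, using the closed-form eigen-decomposition of the 2x2 covariance matrix; the function name pca_2d and the seven points are made up for illustration and are unrelated to the DNAm data.

```python
import math


def pca_2d(points):
    """Return ((variance1, e1), (variance2, e2)) for a list of (x, y) points."""
    n = len(points)
    mx = sum(x for x, _ in points) / n
    my = sum(y for _, y in points) / n
    # Sample covariance matrix [[sxx, sxy], [sxy, syy]].
    sxx = sum((x - mx) ** 2 for x, _ in points) / (n - 1)
    syy = sum((y - my) ** 2 for _, y in points) / (n - 1)
    sxy = sum((x - mx) * (y - my) for x, y in points) / (n - 1)
    # Eigenvalues of a symmetric 2x2 matrix in closed form.
    centre = (sxx + syy) / 2
    radius = math.hypot((sxx - syy) / 2, sxy)
    var1, var2 = centre + radius, centre - radius
    # Angle of the direction with the largest variance.
    theta = 0.5 * math.atan2(2 * sxy, sxx - syy)
    e1 = (math.cos(theta), math.sin(theta))
    e2 = (-math.sin(theta), math.cos(theta))  # perpendicular to e1
    return (var1, e1), (var2, e2)


# Hypothetical points stretched along the line y = x.
points = [(-2, -2), (-1, -1), (0, 0), (1, 1), (2, 2), (-0.5, 0.5), (0.5, -0.5)]
(var1, e1), (var2, e2) = pca_2d(points)
explained = var1 / (var1 + var2)
print("e1 =", e1, "share of variance =", round(explained, 3))
```

Here e1 points along the diagonal and carries about 95% of the total variance, so projecting the points onto e1 alone keeps most of the information. The slide's 10.9% and 9.3% figures for the DNAm data are the same kind of variance share.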

Page 12

Results and Discussion: Confounding between cell composition and age leads to false positives

▪ A simple linear regression model including the cell composition percentages as covariates has been suggested as a way to adjust for the confounding.

▪ They applied this method to the data from Hannum et al. and Alisch et al. and found that the adjusted estimates are, on average, closer to 0.

▪ However, at this level of confounding it is not clear that this naïve approach will in fact produce unbiased adjusted estimates (Figure 4A).

▪ They therefore tried two alternative approaches. First they applied the Remove Unwanted Variation (RUV) method, an analysis that estimates and adjusts for unknown surrogate variables as done by Leek and Storey.

▪ This resulted in much greater, but not complete, attenuation of the age association estimates (Figure 4B).

▪ Next they obtained age association estimates from fitting the model to data from sorted CD4+ T cells and granulocytes.

▪ Note that in these data, cell composition is not a confounder and they see minimal evidence of age association (Figure 4C,D).

Figure 4. Confounding between cellular composition and age at the CpG level. Comparisons between resulting t-statistics for age on DNA methylation levels in Hannum et al. using (A) naïve (for example, including cell composition estimates as covariates in regression models), (B) two-step Remove Unwanted Variation (RUV), (C) flow-sorted CD4+ T cells and (D) flow-sorted monocytes compared to the effect of age on DNAm in a univariate model. Here, analysis with RUV attenuates the association between DNAm and age. The solid lines indicate the resulting t-statistic cutoff for false discovery rate <5%; no probes were significant at this threshold in the cell-sorted data. All panels contain probes present on both the Illumina 450k and 27k (n = 24,692) to facilitate comparisons to age associations in the flow-sorted cellular populations.
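A toy example of the confounding described on this slide, assuming simulated, noise-free data in which every number is hypothetical. Methylation depends only on the lymphocyte proportion, and that proportion declines with age. Regressing methylation on age alone gives a non-zero slope; adding the proportion as a covariate (the naïve adjustment) removes it. The ols helper is a hand-rolled solver written for this sketch, not a library call.

```python
def ols(columns, y):
    """Least-squares coefficients for y ~ intercept + columns (normal equations)."""
    n = len(y)
    X = [[1.0] + [col[i] for col in columns] for i in range(n)]
    p = len(X[0])
    # Augmented normal-equation system [X'X | X'y].
    A = [
        [sum(X[i][r] * X[i][c] for i in range(n)) for c in range(p)]
        + [sum(X[i][r] * y[i] for i in range(n))]
        for r in range(p)
    ]
    # Gauss-Jordan elimination with partial pivoting.
    for c in range(p):
        pivot = max(range(c, p), key=lambda r: abs(A[r][c]))
        A[c], A[pivot] = A[pivot], A[c]
        for r in range(p):
            if r != c:
                f = A[r][c] / A[c][c]
                A[r] = [a - f * b for a, b in zip(A[r], A[c])]
    return [A[r][p] / A[r][r] for r in range(p)]


ages = [10, 20, 30, 40, 50, 60, 70, 80]
# Person-to-person variation in composition that is unrelated to age.
wiggle = [0.02, -0.02, -0.02, 0.02, 0.02, -0.02, -0.02, 0.02]
# Hypothetical lymphocyte proportion: declines with age.
lymph = [0.5 - 0.003 * a + w for a, w in zip(ages, wiggle)]
# Whole-blood methylation driven by composition only; age has no direct effect.
meth = [0.2 + 0.6 * l for l in lymph]

unadjusted = ols([ages], meth)       # [intercept, age slope]
adjusted = ols([ages, lymph], meth)  # [intercept, age slope, proportion slope]
print("age slope, unadjusted:", round(unadjusted[1], 6))
print("age slope, adjusted:  ", round(adjusted[1], 6))
```

In this idealized case the adjustment works exactly because the proportions are known without error. With estimated proportions and strong confounding, as noted above, the adjusted estimates are not guaranteed to be unbiased.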

Page 13

Conclusion

► Whole blood has been one of the most widely used source tissues in EWAS. Here they demonstrate that, in these studies, cellular composition explains much of the observed variability in DNAm.

► Therefore, when the outcome of interest correlates with cell composition, as age does, failure to account for cellular heterogeneity may result in many false positives.

► They therefore recommend that users of the Illumina 450k array studying whole blood perform the cell composition estimation (using, for example, the estimateCellCounts function they have added to the minfi Bioconductor package) and check for possible confounding. If confounding is present, they recommend the use of their table (available in the FlowSorted.Blood.450k Bioconductor package) that summarizes cell-type variability for each CpG.