
MCR-19-0450 revised Supplementary Information...11. Kim J, Woo AJ, Chu J, Snow JW, Fujiwara Y, Kim CG, et al. A Myc network accounts for similarities between embryonic stem and cancer

Nov 15, 2020

Supplementary Information

Network inference analysis identifies SETDB1 as a key regulator for reverting colorectal cancer cells into differentiated normal-like cells

Soobeom Lee, Chansu Lee, Chae Young Hwang, Dongsan Kim, Younghyun Han, Sung Noh Hong, Seok-Hyung Kim, and Kwang-Hyun Cho*

*Correspondence to: [email protected]


This file includes:

Supplementary Materials and Methods

Supplementary References

Supplementary Figures and Figure Legends (S1-S8)

Supplementary Tables (S1-S5)

(Supplementary Table 2 is provided as a separate file in a spreadsheet format.)


Supplementary Materials and Methods

Gene expression data processing

To avoid errors arising from heterogeneous Affymetrix platforms, we collected 17 independent gene expression datasets of normal colon mucosa profiled on the Affymetrix GeneChip Human Genome U133 Plus 2.0 array (GEO-integrated normal dataset, n = 318; GSE71571, GSE23194, GSE37364, GSE41328, GSE33114, GSE23878, GSE20916, GSE15960, GSE9254, GSE4183, GSE8671, GSE11831, GSE32323, GSE9348, GSE4107, GSE10714, GSE18105) from the Gene Expression Omnibus (GEO). The data were normalized with the robust multi-array average (RMA) method as implemented in the ‘affy’ package (1) and batch-corrected with the ComBat method as implemented in the ‘sva’ package (2) in R. Likewise, 11 gene expression datasets of tumor samples profiled on the Affymetrix GeneChip Human Genome U133 Plus 2.0 array (GEO-integrated cancer dataset, n = 2,166; GSE33113, GSE13067, GSE13294, GSE14333, GSE17536, GSE20916, GSE2109, GSE23878, GSE35896, GSE37892, GSE39582), together with the KFSYSCC dataset (3), were downloaded and processed by the same methods.

In addition, we used paired data from 98 adjacent normal colon mucosa samples and their matched tumor tissue samples profiled on the Affymetrix Human Genome U219 Array (GEO-paired dataset, n = 196; GSE44076). For The Cancer Genome Atlas (TCGA), we used transcriptome data in the ‘HTSeq - Fragments Per Kilobase of transcript per Million mapped reads upper quartile (FPKM-UQ)’ format, downloaded from the GDC Data Portal using the ‘TCGAbiolinks’ package (4) (TCGA dataset, n = 698).

A single-cell RNA-seq dataset was downloaded from GEO (Single-cell RNA-seq dataset, n = 432; GSE81861), and we used only the epithelial samples annotated by the previous study (5). In brief, the data were generated with the SMARTer Ultra-Low RNA Kit for Illumina (Clontech, 634948) and sequenced on the HiSeq 2000 platform (Illumina). Of the 1,591 cells from tumors and matched normal mucosa, the previous study retained 626 cells after quality control. To further restrict the analysis to detectably expressed genes, we removed genes that were detected in fewer than one tenth of all samples. Details of the datasets used in this study are given in Table S1.
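The final gene-filtering step described above can be sketched in a few lines of Python. This is only an illustration, not the authors' code: it assumes that a gene counts as detected in a sample when its value is above zero, and the function name `filter_detected_genes`, the `min_fraction` parameter, and the toy matrix are hypothetical.

```python
from typing import Dict, List


def filter_detected_genes(
    expr: Dict[str, List[float]], min_fraction: float = 0.1
) -> Dict[str, List[float]]:
    """Keep genes detected in at least `min_fraction` of the samples.

    Assumption: a value above zero means the gene was detected in that sample.
    """
    kept = {}
    for gene, values in expr.items():
        detected = sum(1 for v in values if v > 0)
        # Genes detected in fewer than one tenth of samples are dropped.
        if detected >= min_fraction * len(values):
            kept[gene] = values
    return kept


# Hypothetical toy matrix: gene -> expression across 10 cells.
toy = {
    "GENE_A": [0, 0, 0, 0, 0, 0, 0, 0, 0, 0],  # never detected
    "GENE_B": [3, 0, 0, 0, 0, 0, 0, 0, 0, 0],  # detected in 1 of 10 cells
    "GENE_C": [1, 2, 0, 5, 0, 0, 7, 0, 0, 1],  # detected in 5 of 10 cells
}
filtered = filter_detected_genes(toy)
```

With this threshold a gene detected in exactly one tenth of the samples is retained, which matches the wording "fewer than one tenth" for removal.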

Gene sets

Tissue-specific gene sets were obtained from the Human Protein Atlas (6) (HPA), the Genotype-Tissue Expression project (7) (GTEx), and the Functional Annotation of the Mammalian Genome (8) (FANTOM) project. From HPA, we used the tissue-specific gene lists that the study labels as “elevated genes” for each tissue. From GTEx, we selected tissue-specific gene sets consisting of genes with at least five-fold higher expression in a particular tissue than in all other tissues. Likewise, we defined tissue-specific gene sets by the same rule from the FANTOM5 gene expression data. The stem/progenitor-associated gene sets were collected individually from previous studies (5,9-13). The detailed gene lists used in this study are given in Table S2.
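As a rough illustration of the five-fold rule used for GTEx and FANTOM5, the sketch below assigns a gene to a tissue when its expression there is at least five times its expression in every other tissue. The reading of "all other tissues" as "each other tissue individually" is an assumption, and the function name, data layout, and toy values are hypothetical.

```python
from typing import Dict, List


def tissue_specific_genes(
    expr: Dict[str, Dict[str, float]], fold: float = 5.0
) -> Dict[str, List[str]]:
    """Map each tissue to the genes expressed at least `fold` times higher
    there than in every other tissue (assumed interpretation of the rule)."""
    result: Dict[str, List[str]] = {}
    for gene, by_tissue in expr.items():
        for tissue, value in by_tissue.items():
            others = [v for t, v in by_tissue.items() if t != tissue]
            if others and value > 0 and all(value >= fold * v for v in others):
                result.setdefault(tissue, []).append(gene)
    return {tissue: sorted(genes) for tissue, genes in result.items()}


# Hypothetical expression values: gene -> {tissue: expression}.
toy = {
    "GENE_A": {"colon": 50.0, "liver": 2.0, "lung": 9.0},
    "GENE_B": {"colon": 20.0, "liver": 10.0, "lung": 1.0},
    "GENE_C": {"colon": 0.5, "liver": 30.0, "lung": 6.0},
}
gene_sets = tissue_specific_genes(toy)
```

Here GENE_A qualifies for colon and GENE_C for liver, while GENE_B is only two-fold higher in colon than in liver and is assigned to no tissue.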


Supplementary References

1. Gautier L, Cope L, Bolstad BM, Irizarry RA. affy--analysis of Affymetrix GeneChip data at the probe level. Bioinformatics 2004;20(3):307-15 doi 10.1093/bioinformatics/btg405.

2. Leek JT, Johnson WE, Parker HS, Jaffe AE, Storey JD. The sva package for removing batch effects and other unwanted variation in high-throughput experiments. Bioinformatics 2012;28(6):882-3 doi 10.1093/bioinformatics/bts034.

3. Guinney J, Dienstmann R, Wang X, de Reynies A, Schlicker A, Soneson C, et al. The consensus molecular subtypes of colorectal cancer. Nat Med 2015;21(11):1350-6 doi 10.1038/nm.3967.

4. Colaprico A, Silva TC, Olsen C, Garofano L, Cava C, Garolini D, et al. TCGAbiolinks: an R/Bioconductor package for integrative analysis of TCGA data. Nucleic Acids Res 2016;44(8):e71 doi 10.1093/nar/gkv1507.

5. Park NI, Guilhamon P, Desai K, McAdam RF, Langille E, O'Connor M, et al. ASCL1 Reorganizes Chromatin to Direct Neuronal Fate and Suppress Tumorigenicity of Glioblastoma Stem Cells. Cell Stem Cell 2017;21(3):411 doi 10.1016/j.stem.2017.08.008.

6. Uhlen M, Fagerberg L, Hallstrom BM, Lindskog C, Oksvold P, Mardinoglu A, et al. Proteomics. Tissue-based map of the human proteome. Science 2015;347(6220):1260419 doi 10.1126/science.1260419.

7. Consortium GT. Human genomics. The Genotype-Tissue Expression (GTEx) pilot analysis: multitissue gene regulation in humans. Science 2015;348(6235):648-60 doi 10.1126/science.1262110.

8. Consortium F, the RP, Clst, Forrest AR, Kawaji H, Rehli M, et al. A promoter-level mammalian expression atlas. Nature 2014;507(7493):462-70 doi 10.1038/nature13182.

9. Wong DJ, Liu H, Ridky TW, Cassarino D, Segal E, Chang HY. Module map of stem cell genes guides creation of epithelial cancer stem cells. Cell Stem Cell 2008;2(4):333-44 doi 10.1016/j.stem.2008.02.009.

10. Loh KM, Ang LT, Zhang J, Kumar V, Ang J, Auyeong JQ, et al. Efficient endoderm induction from human pluripotent stem cells by logically directing signals controlling lineage bifurcations. Cell Stem Cell 2014;14(2):237-52 doi 10.1016/j.stem.2013.12.007.

11. Kim J, Woo AJ, Chu J, Snow JW, Fujiwara Y, Kim CG, et al. A Myc network accounts for similarities between embryonic stem and cancer cell transcription programs. Cell 2010;143(2):313-24 doi 10.1016/j.cell.2010.09.010.

12. Ben-Porath I, Thomson MW, Carey VJ, Ge R, Bell GW, Regev A, et al. An embryonic stem cell-like gene expression signature in poorly differentiated aggressive human tumors. Nat Genet 2008;40(5):499-507 doi 10.1038/ng.127.

13. Xie R, Everett LJ, Lim HW, Patel NA, Schug J, Kroon E, et al. Dynamic chromatin remodeling mediated by polycomb proteins orchestrates pancreatic differentiation of human embryonic stem cells. Cell Stem Cell 2013;12(2):224-37 doi 10.1016/j.stem.2012.11.023.

14. Li H, Courtois ET, Sengupta D, Tan Y, Chen KH, Goh JJL, et al. Reference component analysis of single-cell transcriptomes elucidates cellular heterogeneity in human colorectal tumors. Nat Genet 2017;49(5):708-18 doi 10.1038/ng.3818.


Supplementary Figure legends


Supplementary Figure 1. Loss of colon-specific gene expression in colorectal cancer cells and its clinical relevance.

A, Violin plots showing GSVA scores (bottom axis) of adult tissue-associated gene sets across datasets. –log10(P value) (top axis) is overlaid as red dots. B, Kaplan-Meier analysis of overall survival (n = 623, P = 0.011) and disease-free survival (n = 545, P = 0.0011). The GSVA score of the colon-specific gene set was dichotomized into high and low using the median as the cutoff. C, Box plots showing the GSVA score of the colon-specific gene set according to the TNM staging assessed by TCGA: the extent of the tumor (left), the extent of spread to the lymph nodes (middle), and the presence of distant metastasis (right). The colon-specific gene set from the Human Protein Atlas was used for the above analyses. (*P < 0.05, **P < 0.01, ***P < 0.001)
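The median dichotomization used for the Kaplan-Meier analysis in panel B can be sketched as follows. This is a minimal sketch under stated assumptions: scores strictly above the median are labeled "high" (the legend does not say how ties are handled), and the function name and patient scores are hypothetical.

```python
from statistics import median
from typing import Dict


def dichotomize_by_median(scores: Dict[str, float]) -> Dict[str, str]:
    """Label each sample 'high' or 'low' relative to the median score.

    Assumption: samples exactly at the median fall into the 'low' group.
    """
    cutoff = median(scores.values())
    return {
        sample: ("high" if value > cutoff else "low")
        for sample, value in scores.items()
    }


# Hypothetical GSVA scores for four patients.
toy_scores = {"P1": -0.42, "P2": 0.10, "P3": 0.35, "P4": -0.05}
groups = dichotomize_by_median(toy_scores)
```

The resulting group labels would then be passed to a survival analysis routine; that step is outside the scope of this sketch.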


Supplementary Figure 2. Acquisition of stem/progenitor gene expression program in colorectal cancer cells.

A, Heatmap colored by the overlapping score among stem/progenitor-associated gene sets. B, Violin plots showing GSVA scores (bottom axis) of stem/progenitor-associated gene sets across datasets. –log10(P value) (top axis) is overlaid as red dots. GEO-integrated dataset (top-right), GEO-paired dataset (bottom-left), TCGA dataset (bottom-middle), and Single-cell RNA-seq (bottom-right). C, t-SNE visualization plots colored by the GSVA score of the colon tissue-specific gene set (left column), the intestinal stem cell-specific gene set (middle column), and the embryonic stem cell-specific gene set (right column). Each dot indicates a sample; dashed circles indicate samples in which the GSVA scores of intestinal stem cells and embryonic stem cells did not increase together. In those samples, the GSVA score of embryonic stem cells, but not that of intestinal stem cells, was particularly increased.


Supplementary Figure 3. Differentially expressed gene set analysis and pseudotime analysis of single-cell RNA sequencing data.

A, Heatmap of Spearman correlation coefficients for the 74 gene sets showing differential expression between normal colon epithelial cells and colorectal cancer epithelial cells. Differentially expressed gene sets were defined by an adjusted P value (FDR) below 1e-5. B, t-SNE visualization plots of the single-cell RNA-seq data. The first plot shows disease states (top), and the three plots below are colored by the GSVA score of the colon tissue-specific gene set (second row), the intestinal stem cell-specific gene set (third row), and the embryonic stem cell-specific gene set (fourth row), respectively. C, Scatter plot colored by cell type based on the previous study (14). The previous study classified the normal cells into enterocytes, goblet cells, stem/TA (stem/transit-amplifying) cells, non-stem cells, and unknown cells, and classified the cancer cells into stem/TA cells, enterocyte-like cells, goblet-like cells, and unknown cells. We labeled the cancer cell subtypes according to the annotation of normal cells: ‘enterocyte-like’ as ‘enterocyte’ and ‘goblet-like’ as ‘goblet’. Shape indicates whether a cell is a normal cell (star) or a cancer cell (circle). Color indicates differentiation state. D-E, Dimensional reduction using D, the SLICER algorithm or E, Monocle. Points are colored by PC1 of each state-specific gene set: colon (left), intestinal stem cell (middle), and embryonic stem cell (right). The smoothed profiles of PC1 for the state-specific gene sets over pseudotime are shown on the right. F, Transcription entropy measured with the StemID algorithm, plotted along pseudotime. G, Z-score-normalized imputed gene expression levels of various stem cell markers along pseudotime inferred by SLICER (top) and Monocle (bottom).
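Panel G reports Z-score-normalized expression. As a hedged illustration of that normalization, the sketch below rescales one gene's values to zero mean and unit standard deviation. It assumes the population standard deviation and per-gene scaling, neither of which is stated in the legend; the function name and input values are hypothetical.

```python
from statistics import mean, pstdev
from typing import List


def zscore(values: List[float]) -> List[float]:
    """Return (x - mean) / sd for each value, using the population sd.

    Assumption: a gene with zero variance maps to all zeros.
    """
    mu = mean(values)
    sd = pstdev(values)
    if sd == 0:
        return [0.0 for _ in values]
    return [(v - mu) / sd for v in values]


# Hypothetical imputed expression of one marker gene across eight cells.
expression = [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]
z = zscore(expression)
```

Cells would typically be ordered by pseudotime before plotting the normalized values; the ordering itself comes from SLICER or Monocle and is not reproduced here.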


Supplementary Figure 4. Identification of core transcription factors of normal colon epithelial cells and their reduced gene expression level and transcriptional activity in colorectal cancer cells.

A, Summary of the inferred gene regulatory networks (GRNs). B, The number of genes in the colon-associated gene sets. The union of the gene sets was used for the MARINA analysis. C, Violin plot showing GSVA scores of the colon-associated gene sets used in the MARINA analysis. D, Venn diagram of differentially expressed genes (DEGs) that are more highly expressed in normal colon mucosa than in CRC tissues. DEGs were defined separately in the GEO-paired dataset, the GEO-integrated dataset, and the TCGA dataset. Among the DEGs, the genes that are highly expressed in normal colon mucosa in all three datasets were used in the MARINA analysis. We called these genes normal-specific genes. E-H, Bar plots showing –log10(P value) of putative master regulators. We identified potential TFs with a P value below 1e-10. For both the GEO-paired normal GRN and the GEO-integrated normal GRN, five TFs significantly included normal-specific genes and colon-specific genes among their target genes. These five TFs were defined as the core TFs and are colored in red. E, Regulators of normal-specific genes in the GEO-paired normal GRN; F, regulators of colon-specific genes in the GEO-paired normal GRN; G, regulators of normal-specific genes in the GEO-integrated normal GRN; H, regulators of colon-specific genes in the GEO-integrated normal GRN. I-K, Gene expression level of the core TFs, transcriptional activity inferred by the VIPER algorithm, and average expression level of the target genes of the core TFs in I, the GEO-integrated dataset, J, the TCGA dataset, and K, the GEO-paired dataset.


Supplementary Figure 5. SETDB1 as a negative modulator that hinders transcriptional activity of the core TFs and its clinical relevance in CRC patients.

A, GO-term enrichment among the genes that showed negative correlation with colon-associated gene expression. The number of genes belonging to each GO term is labeled. B, Interaction score with the core TFs based on the STRING database (left axis, yellow) and conditional mutual information (CMI) score (right axis, blue). The STRING score was calculated from the combined score of the STRING database. The CMI score represents the number of target genes whose regulatory interaction with the core TFs was altered depending on the expression level of a potential modulator. For each of the 15 potential modulators, we counted the target genes that lost their interaction with the core TFs in CRC samples with high expression of the potential modulator. The target genes of the core TFs were inferred from the GEO-integrated normal GRN, and the 300 CRC samples with the highest expression and the 300 CRC samples with the lowest expression of each potential modulator were used. C, Heatmap showing the gene expression level of the core TFs in four stem-like CRC cell lines. The data were produced by microarray analysis of the four cell lines (n = 1). D, qRT-PCR analysis confirming the knockdown efficiency upon siRNA transfection in Caco2 cells. E, Imputed gene expression level of SETDB1 compared between normal stem/TA cells and cancer stem/TA-like cells, based on the annotation of the previous study (12). F, Imputed gene expression level of SETDB1 in single-cell RNA-seq. G-I, Correlation between imputed SETDB1 expression and the PC1 value of each state-specific gene set in the single-cell RNA-seq data: G, colon; H, intestinal stem cell; I, embryonic stem cell. J, Kaplan-Meier analysis of overall survival (n = 623, P = 0.48) and disease-free survival (n = 545, P = 0.00074) by SETDB1 expression. The SETDB1 expression level was dichotomized into high and low using the third quantile as the cutoff. K, Kaplan-Meier analysis of overall survival (n = 623, P = 0.064) and disease-free survival (n = 545, P = 7.8e-05). The GSVA score of the colon-specific gene set was dichotomized into high and low using the median as the cutoff and further divided by the SETDB1 expression level using its third quantile.
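The sample selection described for panel B (the 300 CRC samples with the highest and the 300 with the lowest expression of a potential modulator) can be illustrated with a short sketch. It covers only the selection step, not the CMI computation, and the function name, the `n` parameter, and the toy values are hypothetical.

```python
from typing import Dict, List, Tuple


def split_by_modulator(
    expression: Dict[str, float], n: int = 300
) -> Tuple[List[str], List[str]]:
    """Return (high, low): the n samples with the highest and the n samples
    with the lowest expression of a candidate modulator."""
    if 2 * n > len(expression):
        raise ValueError("need at least 2 * n samples for disjoint groups")
    ranked = sorted(expression, key=expression.get)  # ascending expression
    return ranked[-n:], ranked[:n]


# Hypothetical modulator expression in six CRC samples, using n = 2.
toy = {"S1": 5.1, "S2": 2.3, "S3": 9.8, "S4": 0.7, "S5": 7.4, "S6": 3.9}
high, low = split_by_modulator(toy, n=2)
```

Requiring at least 2n samples keeps the two groups disjoint, so that interactions lost in the high-expression group can be contrasted with the low-expression group.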


Supplementary Figure 6. Gene regulation of in vitro stem-like colorectal cancer cells and immunofluorescence confocal imaging analysis.

A, SETDB1, KRT20, and MYC mRNA levels were measured by qRT-PCR in Caco2 cells stably expressing three different shRNA sequences targeting SETDB1. Further analyses were performed with shRNA#1. B, mRNA expression levels of the core TFs were compared in three cell lines. The relative abundance of mRNA is presented in arbitrary units normalized to β-actin mRNA expression. C, Overexpression of the core TFs was assessed by qRT-PCR analysis (arbitrary units). All qRT-PCR analyses were conducted in triplicate and are presented as mean ± s.e.m. D, qRT-PCR analysis of gene expression in sorted cells. Relative fold change was calculated over the mock-treated group of each cell line (M, mock; 5F, overexpression of the five core TFs). (n = 3, mean ± s.e.m.) (*P < 0.05, **P < 0.01, ***P < 0.001). E, Immunofluorescence of KRT20 (red) and Ki-67 (green) with DAPI nuclear counterstain (blue). Scale bar, 20 μm. Caco2 (top), HCT116 (middle), and SW480 (bottom).
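The summary statistics quoted for panel D (relative fold change over the mock-treated group, reported as mean ± s.e.m. for n = 3) can be sketched as below. This is an illustrative calculation only: it assumes the input values are already normalized to a reference gene, divides each treated replicate by the mean of the mock replicates, and uses hypothetical numbers and names.

```python
from math import sqrt
from statistics import mean, stdev
from typing import List, Tuple


def fold_change_summary(
    treated: List[float], mock: List[float]
) -> Tuple[float, float]:
    """Return (mean fold change, s.e.m.) of treated replicates over mock.

    Assumption: each treated replicate is divided by the mean of the mock
    replicates; s.e.m. is the sample standard deviation divided by sqrt(n).
    """
    mock_mean = mean(mock)
    folds = [value / mock_mean for value in treated]
    return mean(folds), stdev(folds) / sqrt(len(folds))


# Hypothetical triplicate measurements (arbitrary units).
fold_mean, fold_sem = fold_change_summary([2.0, 2.4, 2.2], [1.0, 1.1, 0.9])
```

Other conventions exist (for example pairing each treated replicate with its own mock control), so this should be read as one plausible way to arrive at a mean ± s.e.m. fold change.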


Supplementary Figure 7. Restoration of tissue-specific gene expression profiles and transcriptional activity of the core TFs in differentiated normal-like cells.

A, Single-cell RNA-seq data of Caco2 cell lines stably expressing shSETDB1 or shScramble were reconstructed using the SLICER algorithm. Dimensional reduction was performed by LLE. B, Scatter plots colored by PC1 of each state-specific gene set. C-D, Average expression level of the target genes inferred from C, the GEO-integrated dataset and D, the GEO-paired dataset. E-F, Z-score-normalized average expression of the target genes of the core TFs in single-cell RNA-seq along pseudotime. The target genes of the core TFs were inferred from E, the GEO-integrated normal GRN and F, the GEO-paired normal GRN. G-H, Average expression of the target genes inferred from G, the GEO-integrated normal GRN and H, the GEO-paired normal GRN, in bulk RNA sequencing data.


Supplementary Figure 8. Bulk RNA sequencing data analysis and drug responsiveness of SETDB1-downregulated cells.

A, Heatmap showing the colon-specific genes used in GSEA. B, ESC-specific genes used in GSEA. C, GSVA scores of intestinal differentiation-related gene signatures, intestinal stem cell-related gene signatures, and Wnt signaling-related gene signatures in Caco2, HCT116, and SW480 cells. D, Gene expression of the epithelial cell markers E-cadherin, ZO-1, and ZO-2 in Caco2, HCT116, and SW480 cells. P values were determined by unpaired two-tailed Student’s t-test. E, Percent growth over time of Caco2 cells treated with the cytotoxic drugs 5-FU and oxaliplatin after SETDB1 downregulation. F, Percent growth inhibition of Caco2 cells after 5 days of cytotoxic drug treatment.


Tables and their legends

Supplementary Table 1. Public datasets of normal colon mucosa and colorectal cancer tissues used for network analysis.

Columns: GSE ID | Platform | Number of samples | Sample type | Reference

GEO-integrated dataset

GSE71571 | [HG-U133_Plus_2] | 44 | Normal colon mucosa | Thomas SS, Makar KW, Li L, Zheng Y, Yang P, Levy L, Rudolph RY, Lampe PD, Yan M, Markowitz SD, et al. (2015) Gene Expression Omnibus GSE71571 (https://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSE71571)
GSE23194 | [HG-U133_Plus_2] | 11 | Normal colon mucosa | Olivero M, Galatola G, Laudi C, Crispi S, Cantarella D, Morino M, Calogero RA, Direnzo MF (2010) Gene Expression Omnibus GSE23194 (https://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSE23194)
GSE37364 | [HG-U133_Plus_2] | 38 | Normal colon mucosa | Valcz G, Patai AV, Kalmár A, Péterfia B et al. (2014) Gene Expression Omnibus GSE37364 (https://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSE37364)
GSE41328 | [HG-U133_Plus_2] | 10 | Normal colon mucosa | Lin G, He X, Ji H, Shi L et al. (2006) Gene Expression Omnibus GSE41328 (https://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSE41328)
GSE33114 | [HG-U133_Plus_2] | 6 | Normal colon mucosa | de Sousa E Melo F, Colak S, Buikhuisen J, Koster J et al. (2011) Gene Expression Omnibus GSE33114 (https://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSE33114)
GSE23878 | [HG-U133_Plus_2] | 24 | Normal colon mucosa | Uddin S, Ahmed M, Hussain A, Abubaker J et al. (2011) Gene Expression Omnibus GSE23878 (https://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSE23878)
GSE20916 | [HG-U133_Plus_2] | 44 | Normal colon mucosa | Skrzypczak M, Goryca K, Rubel T, Paziewska A et al. (2010) Gene Expression Omnibus GSE20916 (https://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSE20916)
GSE15960 | [HG-U133_Plus_2] | 6 | Normal colon mucosa | Galamb O, Spisák S, Sipos F, Tóth K et al. (2010) Gene Expression Omnibus GSE15960 (https://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSE15960)
GSE9254 | [HG-U133_Plus_2] | 19 | Normal colon mucosa | LaPointe LC, Dunne R, Brown GS, Worthley DL et al. (2008) Gene Expression Omnibus GSE9254 (https://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSE9254)
GSE4183 | [HG-U133_Plus_2] | 8 | Normal colon mucosa | Galamb O, Györffy B, Sipos F, Spisák S et al. (2008) Gene Expression Omnibus GSE4183 (https://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSE4183)
GSE8671 | [HG-U133_Plus_2] | 32 | Normal colon mucosa | Sabates-Bellver J, Van der Flier LG, de Palo M, Cattaneo E et al. (2007) Gene Expression Omnibus GSE8671 (https://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSE8671)


GSE11831 | [HG-U133_Plus_2] | 17 | Normal colon mucosa | Csillag C, Nielsen OH, Borup R, Nielsen FC et al. (2007) Gene Expression Omnibus GSE11831 (https://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSE11831)
GSE32323 | [HG-U133_Plus_2] | 17 | Normal colon mucosa | Khamas A, Ishikawa T, Shimokawa K, Mogushi K et al. (2012) Gene Expression Omnibus GSE32323 (https://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSE32323)
GSE9348 | [HG-U133_Plus_2] | 12 | Normal colon mucosa | Hong Y, Downey T, Eu KW, Koh PK et al. (2010) Gene Expression Omnibus GSE9348 (https://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSE9348)
GSE4107 | [HG-U133_Plus_2] | 10 | Normal colon mucosa | Hong Y, Ho KS, Eu KW, Cheah PY. (2007) Gene Expression Omnibus GSE4107 (https://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSE4107)
GSE10714 | [HG-U133_Plus_2] | 3 | Normal colon mucosa | Galamb O, Sipos F, Solymosi N, Spisák S et al. (2008) Gene Expression Omnibus GSE10714 (https://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSE10714)
GSE18105 | [HG-U133_Plus_2] | 17 | Normal colon mucosa | Matsuyama T, Ishikawa T, Mogushi K, Yoshida T et al. (2010) Gene Expression Omnibus GSE18105 (https://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSE18105)
GSE33113 | [HG-U133_Plus_2] | 90 | Colorectal cancer | de Sousa E Melo F, Colak S, Buikhuisen J, Koster J et al. (2010) Gene Expression Omnibus GSE33113 (https://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSE33113)
GSE13067 | [HG-U133_Plus_2] | 74 | Colorectal cancer | Jorissen RN, Lipton L, Gibbs P, Chapman M et al. (2008) Gene Expression Omnibus GSE13067 (https://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSE13067)
GSE13294 | [HG-U133_Plus_2] | 155 | Colorectal cancer | Jorissen RN, Lipton L, Gibbs P, Chapman M et al. (2008) Gene Expression Omnibus GSE13294 (https://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSE13294)
GSE14333 | [HG-U133_Plus_2] | 157 | Colorectal cancer | Jorissen RN, Gibbs P, Christie M, Prakash S et al. (2009) Gene Expression Omnibus GSE14333 (https://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSE14333)
GSE17536 | [HG-U133_Plus_2] | 177 | Colorectal cancer | Smith JJ, Deane NG, Wu F, Merchant NB et al. (2010) Gene Expression Omnibus GSE17536 (https://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSE17536)
GSE20916 | [HG-U133_Plus_2] | 101 | Colorectal cancer | Skrzypczak M, Goryca K, Rubel T, Paziewska A et al. (2010) Gene Expression Omnibus GSE20916 (https://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSE20916)
GSE2109 | [HG-U133_Plus_2] | 293 | Colorectal cancer | (2005) Gene Expression Omnibus GSE2109 (https://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=gse2109)


GSE35896 | [HG-U133_Plus_2] | 62 | Colorectal cancer | Schlicker A, Beran G, Chresta CM, McWalter G et al. (2012) Gene Expression Omnibus GSE35896 (https://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSE35896)
GSE37892 | [HG-U133_Plus_2] | 130 | Colorectal cancer | Laibe S, Lagarde A, Ferrari A, Monges G et al. (2012) Gene Expression Omnibus GSE37892 (https://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSE37892)
GSE39582 | [HG-U133_Plus_2] | 585 | Colorectal cancer | Marisa L, de Reyniès A, Duval A, Selves J et al. (2013) Gene Expression Omnibus GSE39582 (https://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSE39582)
KFSYSCC | [HG-U133_Plus_2] | 307 | Colorectal cancer |

GEO-paired dataset

GSE44076 | [HG-U219] | 196 (98, 98) | Both normal and cancer | Sanz-Pamplona R, Berenguer A, Cordero D, Molleví DG et al. (2014) Gene Expression Omnibus GSE44076 (https://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSE44076)

Single cell RNA-seq dataset

GSE81861 | Illumina HiSeq 2000 | 432 (160 normal, 272 cancer) | Both normal and cancer | Li H, Courtois ET, Sengupta D, Tan Y et al. (2017) Gene Expression Omnibus GSE81861 (https://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSE81861)

TCGA dataset

- | | 698 (51 normal, 647 cancer) | Both normal and cancer |


Supplementary Table 2. State-specific gene sets used for gene set enrichment analysis. Supplementary Table 2 is provided as a separate file in a spreadsheet format.


Supplementary Table 3. Statistics for the differences in gene expression level and transcriptional activity of the core transcription factors between normal cells and cancer cells.

GEO-integrated dataset

Gene expression level of the core transcription factors
CDX2 ELF3 HNF4G PPARG VDR
statistic 10.69473218 -0.27991229 18.9951185 15.7165582 35.70163876
dm 0.536065923 -0.01096842 0.949724098 0.844554083 1.296283306
p.value 3.96E-26 0.77956812 3.15E-75 3.90E-53 1.17E-225

Transcriptional activity of the core transcription factors (inferred based on GEO-integrated normal GRN)
CDX2 ELF3 HNF4G PPARG VDR
statistic 41.72789397 35.73218089 40.40599845 56.91114407 39.2084793
dm 1.556328825 1.335448951 1.564461287 1.68932893 1.479790124
p.value 8.36E-289 5.66E-226 9.28E-275 0 4.08E-262

Transcriptional activity of the core transcription factors (inferred based on GEO-paired normal GRN)
CDX2 ELF3 HNF4G PPARG VDR
statistic 46.39576685 46.62009775 45.79826922 62.61198087 49.68043703
dm 1.5641431 1.561384625 1.58824364 1.599375646 1.615947558
p.value 0 0 0 0 0

Average gene expression of target genes of the core transcription factors (target genes inferred based on GEO-integrated normal GRN)
CDX2 ELF3 HNF4G PPARG VDR
statistic 38.30887338 28.94046032 34.93541657 46.41666984 35.78563691
dm 0.832416882 0.590634435 0.658375335 0.873905633 0.672654616
p.value 1.12E-252 6.20E-159 7.84E-218 0 1.60E-226

Average gene expression of target genes of the core transcription factors (target genes inferred based on GEO-paired normal GRN)
CDX2 ELF3 HNF4G PPARG VDR
statistic 35.09524829 32.77176544 36.92188928 51.67591474 40.47372016
dm 1.095832079 1.093964678 0.834339496 0.747114108 1.208036378
p.value 1.84E-219 4.69E-196 3.09E-238 0 1.78E-275

GEO-paired dataset

Gene expression level of the core transcription factors
CDX2 ELF3 HNF4G PPARG VDR
statistic 2.20562556 1.02542367 6.863472471 8.208834982 10.59242257
dm 0.271255102 0.128129592 0.749781633 0.860363265 1.001171429
p.value 0.028584775 0.306440214 8.83E-11 3.06E-14 5.57E-21

Transcriptional activity of the core transcription factors (inferred based on GEO-integrated normal GRN)
CDX2 ELF3 HNF4G PPARG VDR
statistic 14.7116911 12.88483083 14.12615278 20.13903245 13.47782902
dm 1.562898255 1.342930046 1.513515714 2.145258496 1.400957751
p.value 2.12E-33 7.52E-28 1.27E-31 2.03E-49 1.19E-29

Transcriptional activity of the core transcription factors (inferred based on GEO-paired normal GRN)
CDX2 ELF3 HNF4G PPARG VDR
statistic 15.72344139 14.78518976 15.57701892 15.31153259 17.67934757
dm 1.488905195 1.435685995 1.539381305 1.603325312 1.693519582
p.value 1.85E-36 1.27E-33 5.10E-36 3.23E-35 2.68E-42

Average gene expression of target genes of the core transcription factors (target genes inferred based on GEO-integrated normal GRN)
CDX2 ELF3 HNF4G PPARG VDR
statistic 14.69408576 13.91245304 16.4467919 23.23589747 18.91836501
dm 1.193446551 1.151831565 0.854344098 1.115462689 1.448346815
p.value 9.02E-43 5.31E-39 1.36E-51 7.52E-89 1.02E-64

Average gene expression of target genes of the core transcription factors (target genes inferred based on GEO-paired normal GRN)
CDX2 ELF3 HNF4G PPARG VDR
statistic 13.55443476 10.69792629 14.81879947 22.85192316 13.30928806
dm 1.103688083 0.704405943 0.821973535 1.159182815 0.783557499
p.value 2.59E-37 7.83E-25 2.21E-43 1.15E-86 3.60E-36

TCGA dataset Gene expression level of the core transcription factors

CDX2 ELF3 HNF4G PPARG VDR statistic 1.850584092 1.61995657 7.51564191 4.886970402 11.07323336 dm 0.326161729 0.152009336 0.841597477 0.599766811 1.024741445 p.value 0.064652966 0.105694577 1.75E-13 1.27E-06 2.32E-26

Transcriptional activity of the core transcription factors (inferred based on GRN-integrated normal GRN)

CDX2 ELF3 HNF4G PPARG VDR
statistic 10.42687764 8.512523836 11.47089852 14.20163248 10.92893464
dm 0.710771495 0.57664583 0.802963512 0.911107815 0.637488983
p.value 9.44E-24 1.04E-16 5.09E-28 2.20E-40 9.06E-26

Transcriptional activity of the core transcription factors (inferred based on GRN-paired normal GRN)

CDX2 ELF3 HNF4G PPARG VDR
statistic 13.25263534 12.20855149 16.74341816 16.14569644 15.41648611
dm 0.808598482 0.788867198 0.945253546 0.797628655 0.912970501
p.value 6.58E-36 3.37E-31 3.95E-53 4.81E-50 2.38E-46

Average gene expression of target genes of the core transcription factors (target genes inferred based on GRN-integrated normal GRN)

CDX2 ELF3 HNF4G PPARG VDR
statistic 16.67919501 14.13717554 14.60667781 21.99732743 13.801814
dm 1.049325423 0.776367367 0.611777134 1.041892253 0.765029478
p.value 2.50E-39 1.18E-31 4.42E-33 1.33E-54 1.23E-30

Average gene expression of target genes of the core transcription factors (target genes inferred based on GRN-paired normal GRN)

CDX2 ELF3 HNF4G PPARG VDR
statistic 13.6229712 12.68273866 12.90965964 18.15256124 16.75612911
dm 1.207318893 1.189132501 0.803333128 0.955618561 1.484688643
p.value 4.31E-30 3.08E-27 6.32E-28 1.09E-43 1.47E-39

Single cell RNA-seq dataset

Gene expression level of the core transcription factors

CDX2 ELF3 HNF4G PPARG VDR
statistic 26.82099459 6.558394416 3.99202638 15.35351893 25.01643239
dm 0.624101918 0.648971563 0.027649821 0.40625563 0.471720276
p.value 7.64E-94 1.56E-10 7.70E-05 9.89E-43 6.63E-86

Transcriptional activity of the core transcription factors (inferred based on GRN-integrated normal GRN)

CDX2 ELF3 HNF4G PPARG VDR
statistic 17.15267181 16.80751284 16.32338629 15.88131642 15.61006688
dm 1.107044628 1.1128358 1.085570903 1.01851108 1.020206844
p.value 1.27E-50 4.29E-49 5.83E-47 5.03E-45 7.64E-44

Transcriptional activity of the core transcription factors (inferred based on GRN-paired normal GRN)

CDX2 ELF3 HNF4G PPARG VDR
statistic 19.73716781 16.35767972 19.47534317 14.917533 16.95096755
dm 1.264901405 1.068227106 1.263164344 0.933674995 1.094555468
p.value 3.32E-62 4.12E-47 5.03E-61 7.46E-41 9.94E-50

Average gene expression of target genes of the core transcription factors (target genes inferred based on GRN-integrated normal GRN)

CDX2 ELF3 HNF4G PPARG VDR
statistic 21.403009 20.78112317 21.38801285 23.54853846 20.10508557
dm 0.444017111 0.585597828 0.185252537 0.419437979 0.381470771
p.value 1.01E-69 6.48E-67 1.18E-69 2.29E-79 7.28E-64

Average gene expression of target genes of the core transcription factors (target genes inferred based on GRN-paired normal GRN)

CDX2 ELF3 HNF4G PPARG VDR
statistic 23.85053526 21.28693763 24.15705599 20.66630722 21.37577609
dm 0.668953341 0.790239087 0.349984256 0.221123007 0.817355903
p.value 1.02E-80 3.38E-69 4.38E-82 2.14E-66 1.34E-69

Supplementary Table 4. Protein activity analysis using VIPER.

Protein NES -log10(p-value) p-value

WEE1 -3.549173254 3.412914859 0.000386443

GPX1 -3.529568401 3.380658178 0.000416238

SMYD3 -3.496855425 3.32718486 0.000470777

UBA2 -3.406411922 3.181623746 0.000658228

PPP1R14B -3.384150606 3.146308922 0.000713988

AJUBA -3.360652919 3.109252278 0.000777585

COPS7B -3.346616469 3.087223833 0.000818043

SPRY4 -3.340551877 3.077731084 0.000836121

NTMT1 -3.327013997 3.056594707 0.00087782

SLC39A10 -3.324082464 3.052027614 0.0008871

CDK4 -3.319172692 3.044386435 0.000902846

MIF -3.299914018 3.014508613 0.000967145

ZC3HC1 -3.272621706 2.972426221 0.00106555

CTU2 -3.269790845 2.968078639 0.00107627

TTLL4 -3.263550441 2.958506267 0.001100256

CSNK1G2 -3.262768669 2.957308197 0.001103295

TESC -3.225970608 2.901196258 0.001255462

PSMC2 -3.213261916 2.881945186 0.001312366

NUP62 -3.178945011 2.830289926 0.001478121

UBE2I -3.176390627 2.826464082 0.0014912

CDC25B -3.171750382 2.819520905 0.001515232

CBFB -3.168591755 2.814799665 0.001531794

RYK -3.167660269 2.813408135 0.00153671

TRIM28 -3.164372195 2.808498953 0.001554179

HDAC2 -3.125509695 2.750808447 0.001774972

PTPN12 -3.12046471 2.743364186 0.001805659

CETN2 -3.09995667 2.71320917 0.00193549

CCND1 -3.096410869 2.708012698 0.001958787

B3GALT6 -3.094509079 2.705227669 0.001971389

HSP90AB1 -3.092394554 2.702132824 0.001985488

NEK2 -3.088926825 2.697061333 0.002008809

CBX4 -3.083839599 2.689630166 0.002043477

MEAF6 -3.073105391 2.673984535 0.002118437

LIG1 -3.061932706 2.657749321 0.002199129

PHF19 -3.050742301 2.64153897 0.002282764

NAA10 -3.049310578 2.639468638 0.002293672

RIPK2 -3.036815442 2.621435325 0.002390918

BARD1 -3.031084 2.613184665 0.002436774

RIOK1 -3.023942285 2.602922432 0.00249504

BAZ1B -3.019047477 2.595900775 0.002535708

SKP2 -3.012528704 2.586564547 0.002590809

MLLT1 -3.01123508 2.584713851 0.002601873

DPY30 -3.010932476 2.584281035 0.002604468

PPIL1 -2.979939312 2.540147079 0.002883055

CDK2 -2.957723001 2.508749716 0.003099205

SOX4 -2.951944718 2.500616123 0.003157795

TP53BP1 -2.951433627 2.499897352 0.003163025

MET -2.951262412 2.499656587 0.003164779

YEATS2 -2.940995485 2.485240726 0.003271593

PREB -2.939821908 2.483595604 0.003284009

MED1 -2.927324054 2.466110494 0.003418924

CAD -2.925303682 2.463289801 0.003441202

CKS2 -2.915733925 2.449951541 0.00354853

RNF220 -2.915366513 2.44944018 0.003552711

TRIM32 -2.914844582 2.448713853 0.003558657

PSMB4 -2.910008741 2.441989443 0.003614186

SETDB1 -2.908148943 2.439405832 0.003635751

ILF3 -2.906677651 2.437362917 0.003652894

KDM2B -2.90550857 2.435740246 0.003666568

PPIL3 -2.902915395 2.432142915 0.003697065

NEK6 -2.898683319 2.426277859 0.003747332

PTPN11 -2.896536526 2.423305464 0.003773067

DCAF13 -2.866747047 2.382250949 0.004147143

UBE2T -2.863994598 2.378475647 0.004183351

KDM1A -2.861129166 2.374548608 0.00422135

RPAP2 -2.859811426 2.37274377 0.00423893

FKBP14 -2.855745765 2.367179645 0.004293588

SOCS7 -2.851646271 2.361575938 0.004349347

CDK6 -2.84732014 2.355669752 0.0044089

DUSP10 -2.84439913 2.351686135 0.004449527

CLK2 -2.818722963 2.316816819 0.004821511

NMT1 -2.80985171 2.304830751 0.004956433

NAA20 -2.802728833 2.295229774 0.005067225

UBE3D -2.792322055 2.281238957 0.005233124

MRE11A -2.789695691 2.277714945 0.00527576

PRMT1 -2.775125945 2.258215673 0.005518033

GLMN -2.760101668 2.238197088 0.005778338

SMYD2 -2.754726887 2.231057582 0.005874115

IRAK1 -2.726853737 2.194218028 0.006394138

PRKRA -2.723022064 2.189178038 0.006468774

MARCH9 -2.719483733 2.184529105 0.006538391

DSTYK -2.707772829 2.16917808 0.006773637

EHMT2 -2.700965147 2.160279524 0.006913858

ADCY3 -2.687279218 2.142446147 0.007203671

ITGA2 -2.685822757 2.140552712 0.007235146

PRDX1 -2.668693518 2.118347781 0.00761469

CCNB1IP1 -2.668563486 2.118179666 0.007617638

FGFRL1 -2.658955696 2.105776624 0.007838327

WDR75 -2.655533109 2.10136717 0.007918316

EIF2A -2.638851789 2.079942759 0.008318734

C1QTNF1 -2.619141017 2.0547703 0.00881515

FUT1 -2.612330994 2.046109206 0.008992714

UGGT2 -2.60748243 2.039953966 0.009121075

ENY2 -2.606052504 2.038140464 0.009159242

IQGAP3 -2.598206665 2.02820445 0.009371207

AASDHPPT -2.590843001 2.018901321 0.009574116

UHRF1 -2.586514431 2.013442733 0.009695211

USP36 -2.582924337 2.008921053 0.009796681

KAT2A -2.582495456 2.008381225 0.009808865

ACP1 -2.57914256 2.004163488 0.00990459

Only multi-sample VIPER (msVIPER) results with a p-value less than 0.01 are shown. NES, normalized enrichment score.
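The numeric columns of this table are mutually consistent: each listed p-value matches the two-sided standard-normal tail probability of the NES, and the middle column matches the negative base-10 logarithm of that p-value. The sketch below checks this for two rows copied from the table. That the p-values were derived from the NES in exactly this way is an assumption made here for illustration, and the function name `two_sided_p_from_nes` is hypothetical.

```python
import math


def two_sided_p_from_nes(nes: float) -> float:
    """Two-sided tail probability of a score under a standard normal null.

    Assumption: the NES behaves like a z-score. erfc(|z| / sqrt(2)) is
    equal to 2 * (1 - Phi(|z|)), where Phi is the standard normal CDF.
    """
    return math.erfc(abs(nes) / math.sqrt(2.0))


# (NES, -log10(p-value), p-value) copied from the WEE1 and SETDB1 rows.
ROWS = {
    "WEE1": (-3.549173254, 3.412914859, 0.000386443),
    "SETDB1": (-2.908148943, 2.439405832, 0.003635751),
}

for name, (nes, neg_log10_p, p_listed) in ROWS.items():
    p = two_sided_p_from_nes(nes)
    # Print recomputed values next to the listed ones for comparison.
    print(name, p, p_listed, -math.log10(p), neg_log10_p)
```

Under this reading, a more negative NES gives a smaller p-value and a larger value in the middle column, which is the ordering the table follows.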

Supplementary Table 5. Primer sequences for the quantitative PCR and target sequences of small interfering RNA (siRNA) and short hairpin RNA (shRNA).

Primer sequences for the RT-PCR

Name Forward primer (5’->3’) Reverse primer (5’->3’)

AQP8 GCC ATC AAT GAG AAG ACA AAG G CAC CTA ATG AGC AGT CCA ACA A
B-actin AGA GCT ACG AGC TGC CTG AC AGC ACT GTG TTG GCG TAC AG
CA1 CCA AAC ATG ACA CCT CTC TGA A AGC TCG GCA GAA TAT TTG ACT C
CD44 CTG CCG CTT TGC AGG TGT A CAT TGT GGG CAA GGT GCT ATT
CDX2 TTC ACT ACA GTC GCT ACA TCA CC TCT GGG ACA CTT CTC AGA GGA C
CEACAM5 CAG ATC AGG GGA AAA TCT GAA C TCG TGA CTG TGG TCC TAT TGA G
EHMT2 TGC GTG CTG TTA TTC CTG TC TGA TCT TCT CTG TGC GGA TG
ELF3 CAA AGA GTA CTG GGA CTG TCT CG CTC AGC TTC TCG TAG GTC ATG TT
FABP1 GCA GAG CCA GGA AAA CTT TG TCT CCC CTG TCA TTG TCT CC
FABP2 AAG CTT GCA GCT CAT GAC AA TCC ATT GTC TGT CCG TTT GA
GAPDH TGA TGA CAT CAA GAA GGT GGT GAA G TCC TTG GAG GCC ATG TGG GCC AT
HDAC2 GAG GTG GCT ACA CAA TCC GTA ACA CCA GGT GCA TGA GGT AAC
HNF4G GTT TCT TCA GAC GCA GCA TTC AAC TTC AGC TTG TGC CAG TGT
KAT2A GCT GAC CAC GTA TCC CAC TT ATG CAT TTC CGC AGT AGC TT
KDM1A TCA ACT CTC TCC CTT AAG CAC TG CAC AGC TAT CAC TTC ACA TCC TG
KRT20 ACG CCA GAA CAA CGA ATA CC ACG ACC TTG CCA TCC ACT AC
LGR5 CTC CCA GGT CTG GTG TGT TG GAG GTC TAG GTA GGA GGT GAA G
MYC GGC TCC TGG CAA AAG GTC A CTG CGT AGT TGT GCT GAT GT
PPARG ATC TCT CCG TAA TGG AAG ACC A AGG CTC TTC ATG AGG CTT ATT G
PRMT1 ACC GCA ACT CCA TGT TTC A CCA CCT TCC CCT TGA TGA T
SETDB1 CCG GCC TAC AGA AAT AAT TGA G CAA GGT TCC TTT ATG CAG ATC C
SMYD2 GTG TAC CAC CAA GGA CAA GGA T AGA TCT CCA GCA GCT CAC TAG G
SMYD3 AGA ACT GAA GGC ACA CTG GAA TTC ATC ACT TGA ACC CCT CTG
VDR AGA TGA CCC TTC TGT GAC CCT A ATG GCA CTT GAC TTC AGC AGT A

Primer sequences for the ChIP-qPCR

Name Forward primer (5’->3’) Reverse primer (5’->3’)

KRT20 TAG AGG CAT TTC TTG GAG GCT TT CGC ATG GGC TAA AAG AAT ACA CT
FABP1 CTG GGC CTC CTC TAC TCT GT GTG ACA CTA GGC CAT CCA GG
E-cadherin TGT AAA AGC CCT TTC TGA TCC CA TTG CTA GGG TCT AGG TGG GTT AT
MUC2 CTC TAG TAG CTG CAT GTG TTC CC ATC GGA GGT CAT CTC TTT ATG GC

Target sequences of siRNA

Name Sense (5’->3’) Antisense (5’->3’) PMID

EHMT2 CACACACCGACCAGAGA UCUCUGGUCGGUGUGUG 20940408
HDAC2 CGGGGCAACAACA UGUUGUUGCCCCG 16642021
KAT2A GGAAAUGCAUCCUGCAGAU AUCUGCAGGAUGCAUUUCC 25770212
KDM1A CACAAGGAAAGCAGAAGA UCUUCUGCUUUCCUUGUG 16079795
PRMT1 GGACAUGACAUCCAAAGAU AUCUUUGGAUGUCAUGUCC 18951090
SETDB1 CGGGGGCGCCAAAACAA UUGUUUUGGCGCCCCCG 26471002
SMYD2 GAUUUGAUUCAGAGUGACA UGUCACUCUGAAUCAAAUC 22787429
SMYD3 AGCCUGAUUGAAGAUUUGA UCAAAUCUUCAAUCAGGCU 22194464
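In each siRNA row, the antisense strand is the reverse complement of the sense strand in the RNA alphabet. A minimal Python sketch of that consistency check follows. The sequences are copied from the table; the helper name `reverse_complement_rna` is hypothetical and not part of the original methods.

```python
# Base-pairing table for the RNA alphabet (A-U, C-G).
RNA_COMPLEMENT = str.maketrans("ACGU", "UGCA")


def reverse_complement_rna(seq: str) -> str:
    """Complement every base, then reverse the strand."""
    return seq.translate(RNA_COMPLEMENT)[::-1]


# (sense, antisense) pairs copied from the siRNA table.
SIRNA_PAIRS = {
    "EHMT2": ("CACACACCGACCAGAGA", "UCUCUGGUCGGUGUGUG"),
    "HDAC2": ("CGGGGCAACAACA", "UGUUGUUGCCCCG"),
    "KAT2A": ("GGAAAUGCAUCCUGCAGAU", "AUCUGCAGGAUGCAUUUCC"),
    "KDM1A": ("CACAAGGAAAGCAGAAGA", "UCUUCUGCUUUCCUUGUG"),
    "PRMT1": ("GGACAUGACAUCCAAAGAU", "AUCUUUGGAUGUCAUGUCC"),
    "SETDB1": ("CGGGGGCGCCAAAACAA", "UUGUUUUGGCGCCCCCG"),
    "SMYD2": ("GAUUUGAUUCAGAGUGACA", "UGUCACUCUGAAUCAAAUC"),
    "SMYD3": ("AGCCUGAUUGAAGAUUUGA", "UCAAAUCUUCAAUCAGGCU"),
}

for name, (sense, antisense) in SIRNA_PAIRS.items():
    # True when the two listed strands pair perfectly along their length.
    print(name, reverse_complement_rna(sense) == antisense)
```

A check like this is a quick way to catch transcription errors when the sequences are copied into an order form or a script.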

Target sequences of shRNA

Name Target sequence Catalog Number

shSETDB1#1 CCGGGCTCAGATGATAACTTCTGTACTCGAGTACAGAAGTTATCATCTGAGCTTTTTG TRCN0000276169

shSETDB1#2 CCGGGCTCAGATGATAACTTCTGTACTCGAGTACAGAAGTTATCATCTGAGCTTTTTTG TRCN0000148112
shSETDB1#3 CCGGAGTTAGAGACATGGGTAATACCTCGAGGTATTACCCATGTCTCTAACTTTTTTG TRCN0000276105
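The listed shRNA oligos read as a stem-loop: a CCGG leader, a 21-nt sense stem, a CTCGAG loop, the reverse complement of the stem, and a T-rich tail. The Python sketch below splits two of the listed sequences along that layout and checks that the two stems pair. The layout is inferred here from the sequences themselves and is an assumption, not something the table states; `split_hairpin` and its parameters are hypothetical names.

```python
# Base-pairing table for the DNA alphabet.
DNA_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement_dna(seq: str) -> str:
    """Complement every base, then reverse the strand."""
    return seq.translate(DNA_COMPLEMENT)[::-1]


def split_hairpin(oligo: str, stem_len: int = 21,
                  leader: str = "CCGG", loop: str = "CTCGAG"):
    """Split an oligo into (sense stem, antisense stem, tail).

    Assumed layout: leader + stem + loop + stem + tail.
    """
    if not oligo.startswith(leader):
        raise ValueError("oligo does not start with the assumed leader")
    sense_start = len(leader)
    loop_start = sense_start + stem_len
    antisense_start = loop_start + len(loop)
    if oligo[loop_start:antisense_start] != loop:
        raise ValueError("assumed loop not found at the expected position")
    sense = oligo[sense_start:loop_start]
    antisense = oligo[antisense_start:antisense_start + stem_len]
    tail = oligo[antisense_start + stem_len:]
    return sense, antisense, tail


# Target sequences copied from the shRNA table.
SHRNA = {
    "shSETDB1#1": "CCGGGCTCAGATGATAACTTCTGTACTCGAGTACAGAAGTTATCATCTGAGCTTTTTG",
    "shSETDB1#3": "CCGGAGTTAGAGACATGGGTAATACCTCGAGGTATTACCCATGTCTCTAACTTTTTTG",
}

for name, oligo in SHRNA.items():
    sense, antisense, tail = split_hairpin(oligo)
    # The stems should be reverse complements if the layout holds.
    print(name, sense, tail, reverse_complement_dna(sense) == antisense)
```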