From ChIP-chip to ChIP-Seq: the study of mammalian transcription factor binding sites and epigenetics I519 Introduction to Bioinformatics, Fall, 2012

Jan 03, 2016

Page 1

From ChIP-chip to ChIP-Seq: the study of mammalian transcription factor binding sites and epigenetics

I519 Introduction to Bioinformatics, Fall, 2012

Page 2

From ChIP-chip to ChIP-Seq

ChIP-chip: ChIP on tiled microarrays

ChIP-sequencing (ChIP-seq) combines chromatin immunoprecipitation (ChIP) and massively parallel sequencing to identify mammalian DNA sequences bound by transcription factors in vivo.

Page 3

Chromatin immunoprecipitation (ChIP)

Formaldehyde (CH2O) is a very reactive dipolar compound (the carbon atom is the electrophilic center). Amino and imino groups of proteins (e.g., the side chains of lysine and arginine) and of nucleic acids (e.g., cytosine) react with formaldehyde, leading to the formation of a Schiff base (reaction I). The Schiff base can then react with a second amino group to form a crosslink, for example:

between the side chains of two lysines

between lysine and cytosine

Page 4

ChIP-Seq workflow

Nature Methods 4, 613–614 (2007)

Solexa sequencing technology provided short reads of approximately 30 base pairs, which were ideal for characterizing ChIP-derived fragments.

Page 5

Advantages of ChIP-Seq

– Single base-pair resolution of direct sequencing

– ChIP-seq data are likely to have less noise or artifacts

– Potential binding regions need not be specified prior to the experiment

– Lower cost, minimal hands-on processing and a requirement for fewer replicate experiments as well as less input material

Epigenetics meets next-generation sequencing. Epigenetics. 2008 Nov;3(6):318-21

Page 6

Next generation sequencing (NGS) techniques

Platform                      454 Sequencing   Illumina/Solexa                          ABI SOLiD
Sequencing chemistry          Pyrosequencing   Polymerase-based sequence-by-synthesis   Ligation-based sequencing
Amplification approach        Emulsion PCR     Bridge amplification                     Emulsion PCR
Paired end (PED) separation   3 kb             200-500 bp                               3 kb
Mb per run                    100 Mb           1300 Mb                                  3000 Mb
Time per PED run              <0.5 day         4 days                                   5 days
Read length (update)          250-400 bp       35, 75 and 100 bp                        35 and 50 bp
Cost per run                  $8,438 USD       $8,950 USD                               $17,447 USD
Cost per Mb                   $84.39 USD       $5.97 USD                                $5.81 USD

Page 7

Tools for extracting transcription factor targets from ChIP-Seq data

CisGenome uses a conditional binomial model to identify enriched regions when a control data set is provided (Nat. Biotechnol. 26:1293–1300, 2008)

MACS (Model-based Analysis of ChIP-Seq) uses the control dataset to model the background tag distribution across the genome with a Poisson distribution (Genome Biol, 9:R137, 2008)

PeakSeq enables systematic scoring of ChIP-seq experiments relative to controls (Nat. Biotechnol. 27:66–75, 2009)

QuEST (Quantitative Enrichment of Sequence Tags) (Nat. Methods, 5:829–834, 2008)

GLITR (GLobal Identifier of Target Regions) identifies enriched regions in target data by calculating a fold-change based on random samples of control (input chromatin) data
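The statistical ideas named above can be illustrated for a single genomic window. The following is a minimal sketch, not the implementation of any of the listed tools: the function names, the `scale` parameter (ratio of ChIP to control sequencing depth) and the example counts are assumptions made for illustration. It computes a Poisson tail probability, in the spirit of the MACS background model, and a conditional binomial tail probability, in the spirit of the CisGenome model.

```python
import math


def poisson_sf(k, lam):
    """Return P(X >= k) for X ~ Poisson(lam)."""
    if k <= 0:
        return 1.0
    term = math.exp(-lam)  # P(X = 0)
    cdf = term
    for i in range(1, k):
        term *= lam / i    # P(X = i) derived from P(X = i - 1)
        cdf += term
    return max(0.0, 1.0 - cdf)


def binomial_sf(k, n, p):
    """Return P(X >= k) for X ~ Binomial(n, p)."""
    k = max(k, 0)
    return sum(math.comb(n, i) * p**i * (1 - p) ** (n - i)
               for i in range(k, n + 1))


def poisson_pvalue(chip_count, control_count, scale=1.0):
    """Poisson test: the expected ChIP count is the scaled control count.

    Assumption: the background rate is taken from the same window of the
    control; real tools estimate it more carefully.
    """
    return poisson_sf(chip_count, control_count * scale)


def conditional_binomial_pvalue(chip_count, control_count, scale=1.0):
    """Conditional binomial test.

    Given the total n of ChIP plus control tags in the window, the ChIP
    count under the null hypothesis follows Binomial(n, scale / (1 + scale)).
    """
    n = chip_count + control_count
    p0 = scale / (1.0 + scale)
    return binomial_sf(chip_count, n, p0)


if __name__ == "__main__":
    # Hypothetical window: 20 ChIP tags, 5 control tags, equal depth.
    print(poisson_pvalue(20, 5))
    print(conditional_binomial_pvalue(20, 5))
```

Both tests ask the same question, namely whether the ChIP count is larger than the control predicts; they differ in whether the control count is treated as a fixed rate (Poisson) or as a random quantity (conditional binomial).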

Page 8

PeakSeq: Nat. Biotechnol. 27:66–75, 2009

Why peak detection is difficult

The signal for a given transcription factor is the 'convolution' of various effects: the density of mappable bases in a region, the underlying chromatin structure and the actual signal from transcription factor binding. Some fraction of the peaks in the ChIP-seq signal map for a transcription factor might be due to the open chromatin structure rather than to transcription factor binding, so the signal must be compared against one from a control.
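A small sketch of why the control matters: a window with many ChIP tags is only interesting if the control does not show a similar pile-up. The code below uses simple library-size scaling and a pseudocount, which are assumptions of this sketch rather than the procedure of PeakSeq or any other tool named in these slides; the function name and the counts are hypothetical.

```python
def fold_enrichment(chip_counts, control_counts, pseudocount=1.0):
    """Per-window ChIP/control ratio after scaling the control to ChIP depth.

    chip_counts and control_counts are tag counts for the same windows.
    The pseudocount avoids division by zero in empty control windows.
    """
    if len(chip_counts) != len(control_counts):
        raise ValueError("both samples must cover the same windows")
    chip_total = sum(chip_counts)
    control_total = sum(control_counts)
    if chip_total == 0 or control_total == 0:
        raise ValueError("both samples need at least one tag")
    scale = chip_total / control_total  # library-size normalization
    return [
        (chip + pseudocount) / (control * scale + pseudocount)
        for chip, control in zip(chip_counts, control_counts)
    ]


if __name__ == "__main__":
    # Hypothetical counts for four windows. Windows 1 and 3 both look like
    # peaks in the ChIP sample, but window 3 is equally high in the control.
    chip = [2, 30, 3, 25]
    control = [2, 3, 3, 22]
    for index, ratio in enumerate(fold_enrichment(chip, control)):
        print(index, round(ratio, 2))
```

In this made-up example only window 1 stands out after normalization; the apparent peak in window 3 is matched by the control and is the kind of signal that open chromatin or mappability could produce.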

Page 9

PeakSeq scoring procedure

Nat. Biotechnol. 27:66–75, 2009

Page 10

High-Resolution Profiling of Histone methylations in the human genome

Ref: Cell, 129(4):823-837, 2007

Generated high-resolution maps for the genome-wide distribution of 20 histone lysine and arginine methylations and others across the human genome using the Solexa 1G sequencing technology. (The cells were digested with MNase to generate mainly mononucleosomes, with a minor fraction of dinucleosomes, for histone modification mapping.)

Typical patterns of histone methylations exhibited at promoters, insulators, enhancers, and transcribed regions are identified.

– The monomethylations of H3K27, H3K9, H4K20, H3K79, and H2BK5 are all linked to gene activation.

– Trimethylations of H3K27, H3K9, and H3K79 are linked to repression.

– H2A.Z (a histone variant) associates with functional regulatory elements, and CTCF marks boundaries of histone methylation domains.

– …

Page 11

BS-seq for epigenetic profiling

BS-seq (bisulphite sequencing) combines bisulphite treatment of genomic DNA with ultra-high-throughput sequencing

Cytosine DNA methylation is important in regulating gene expression and in silencing transposons and other repetitive sequences
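The principle behind BS-seq is that bisulphite converts unmethylated cytosine to uracil, which is read as thymine after amplification, while methylated cytosine is protected and still reads as cytosine. The sketch below simulates this on a single forward-strand sequence and then recovers the methylation calls by comparing the read with the reference. The function names and the toy sequence are assumptions for illustration; real pipelines also handle the reverse strand, incomplete conversion and read alignment.

```python
def bisulfite_convert(sequence, methylated_positions):
    """Simulate bisulphite treatment of the forward strand.

    Unmethylated C becomes T; C at a methylated position stays C.
    Positions are 0-based indices into the sequence.
    """
    methylated = set(methylated_positions)
    return "".join(
        "T" if base == "C" and index not in methylated else base
        for index, base in enumerate(sequence)
    )


def call_methylation(reference, read):
    """Call methylation at every reference C covered by an aligned read.

    Returns a dict mapping position to True (read shows C, methylated)
    or False (read shows T, unmethylated). Other bases are skipped.
    """
    calls = {}
    for index, (ref_base, read_base) in enumerate(zip(reference, read)):
        if ref_base != "C":
            continue
        if read_base == "C":
            calls[index] = True
        elif read_base == "T":
            calls[index] = False
    return calls


if __name__ == "__main__":
    reference = "ACGTCCGA"                     # toy sequence, C at 1, 4, 5
    read = bisulfite_convert(reference, [1])   # only position 1 is methylated
    print(read)
    print(call_methylation(reference, read))
```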

Page 12

Bisulphite sequencing

Page 13

References

Genome-wide profiles of STAT1 DNA association using chromatin immunoprecipitation and massively parallel sequencing. Nature Methods 4, 651–657 (2007)