ChIP-seq QC Xiaole Shirley Liu STAT115, STAT215

Dec 21, 2015

Transcript
Page 1:

ChIP-seq QC

Xiaole Shirley Liu

STAT115, STAT215

Page 2:

Initial QC

• FASTQC
• Mappability
• Uniquely mapped reads
• Uniquely mapped locations
• Uniquely mapped locations / Uniquely mapped reads
• Good to keep one read / location in peak calling

Page 3:

Peak Calls

• Tag distribution along the genome ~ Poisson distribution (λ_BG = total tags / genome size)

• ChIP-seq shows local biases in the genome
  – Chromatin and sequencing bias
  – 200-300bp control windows have too few tags
  – But can look further

Dynamic λ_local = max(λ_BG, [λ_ctrl, λ_1k,] λ_5k, λ_10k)

[Figure: ChIP and Control tag tracks with 300bp, 1kb, 5kb, and 10kb windows]

http://liulab.dfci.harvard.edu/MACS/
Zhang et al, Genome Bio, 2008
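The dynamic local background above can be sketched in Python. This is a minimal illustration, not the MACS implementation: the function names, the rescaling of each control window count to the candidate peak width, and the example numbers are all assumptions made for the sketch.

```python
import math

def poisson_sf(k, lam):
    """P(X >= k) for X ~ Poisson(lam), computed as 1 - P(X <= k - 1).

    Plain summation; precision is limited for very small p-values.
    """
    if k <= 0:
        return 1.0
    term = math.exp(-lam)  # P(X = 0)
    cdf = term
    for i in range(1, k):
        term *= lam / i    # P(X = i) from P(X = i - 1)
        cdf += term
    return max(0.0, 1.0 - cdf)

def dynamic_lambda(lambda_bg, control_counts, peak_width):
    """Largest expected tag count among the genome background and control windows.

    control_counts maps a window size in bp (e.g. 5000, 10000) to the number
    of control tags in that window centered on the candidate peak. Each count
    is rescaled to the peak width so all rates are comparable (an assumption).
    """
    scaled = [count * peak_width / size for size, count in control_counts.items()]
    return max([lambda_bg] + scaled)

# Hypothetical numbers: genome-wide rate of 1 tag per 300bp window,
# 50 control tags within 5kb and 80 within 10kb of the candidate peak.
lam = dynamic_lambda(1.0, {5_000: 50, 10_000: 80}, peak_width=300)  # max(1.0, 3.0, 2.4) = 3.0
p_value = poisson_sf(12, lam)  # chance of seeing >= 12 ChIP tags under lam
```

Taking the maximum makes the test conservative: a region with locally high control signal needs more ChIP tags to reach the same p-value.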

Page 4:

Peak Call Statistics

• P-value and FDR
• Simulation: random sampling of reads?
• FDR = A / B, BH correction or Q-value
• P-value / FDR changes with sequencing depth
• Fold change does not

[Figure: MAT: Quality Control. Background vs. enriched DNA (<1% enriched), with regions A and B marked]
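For the BH correction named on this slide, a minimal Python sketch of the standard Benjamini-Hochberg adjustment is given here. The function name and the example p-values are hypothetical; a real pipeline would more likely call an existing statistics library.

```python
def benjamini_hochberg(pvalues):
    """Return BH-adjusted p-values in the same order as the input."""
    n = len(pvalues)
    order = sorted(range(n), key=lambda i: pvalues[i])  # indices, smallest p first
    adjusted = [0.0] * n
    running_min = 1.0
    # Walk from the largest p-value down so adjusted values stay monotone.
    for rank in range(n, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * n / rank)
        adjusted[i] = running_min
    return adjusted

# Hypothetical peak p-values
adjusted = benjamini_hochberg([0.01, 0.04, 0.03, 0.005])
```

Peaks whose adjusted value falls below the chosen FDR threshold are kept. Because the adjustment depends on the number of tests and on the p-values themselves, it shifts with sequencing depth, as the slide notes.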

Page 5:

ChIP-seq QC

• Number of peaks with good FDR and fold change
• FRiP score:
  – Fraction of reads in peaks
  – Often higher for histone modifications than transcription factors
  – Often increases slightly with increasing read depth
• Overlap with union of peaks in public DNase-seq data
  – In a working ChIP-seq, > 70% of peaks overlap the union DHS
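The FRiP score can be computed directly from read positions and peak intervals. The sketch below is illustrative only: the function name is hypothetical, each read is represented by a single position, and peaks are assumed to be non-overlapping half-open intervals (start inclusive, end exclusive).

```python
from bisect import bisect_right
from collections import defaultdict

def frip(reads, peaks):
    """Fraction of reads in peaks.

    reads: list of (chrom, position)
    peaks: list of (chrom, start, end), assumed non-overlapping, half-open
    """
    by_chrom = defaultdict(list)
    for chrom, start, end in peaks:
        by_chrom[chrom].append((start, end))
    starts = {}
    for chrom, intervals in by_chrom.items():
        intervals.sort()
        starts[chrom] = [s for s, _ in intervals]

    in_peaks = 0
    for chrom, pos in reads:
        if chrom not in by_chrom:
            continue
        # Last peak whose start is <= pos, if any
        idx = bisect_right(starts[chrom], pos) - 1
        if idx >= 0 and pos < by_chrom[chrom][idx][1]:
            in_peaks += 1
    return in_peaks / len(reads) if reads else 0.0

# Hypothetical toy data: two of the four reads fall inside a peak
peaks = [("chr1", 100, 200), ("chr1", 500, 600)]
reads = [("chr1", 150), ("chr1", 200), ("chr1", 550), ("chr2", 150)]
score = frip(reads, peaks)
```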

Page 6:

DNase-seq

• Captures all regulatory sequences in the prostate genome


Sabo et al, Nat Methods 2006; Thurman et al, Nature 2012

Page 7:

ChIP-seq QC

• Evolutionary conservation
  – Can be used for ChIP QC
• Conserved sites more functional?
  – Majority of functional sites not conserved


Odom et al, Nat Genet 2007

Page 8:

Enrichment Distribution

• CEAS (Shin et al, Bioinfo, 2009)
  – Meta-gene profiles: TF and histone marks
  – % of peaks at promoter, exons, introns, and distal intergenic sequences
  – SitePro of signal at specific sites

• Replicate agreement: > 60% or > 0.6

Page 9:

ChIP-seq Downstream Analysis

Page 10:

Target Gene Assignment

[Figure: Yeast TF Regulatory Network (labels: Protein, Gene, Regulate, Transcribe)]

Page 11:

Human TF Binding Distribution

• Most TF binding sites are outside promoters
• How to assign targets?
• Nearest distance?
• Binding within 10KB?
• Number of binding sites?
• Other knowledge?

Page 12:

Higher Order Chromatin Interactions

Chromatin conformation capture

Page 13:

Hi-C

Interactions follow exponential decay with distance

Lieberman-Aiden et al, Science 2009

Page 14:

How to Assign Targets for Enhancer Binding Transcription Factors?

• Regulatory potential: sum of binding sites weighted by distance to TSS with exponential decay

• Decay modeled from Hi-C experiments
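A regulatory potential score of this kind can be sketched as follows. The exact decay function is not given on the slide, so the simple exp(-d / decay_length) form, the 10kb decay length, and the 100kb distance cutoff are assumptions for illustration, not the published parameters.

```python
import math

def regulatory_potential(tss, peak_summits, decay_length=10_000, max_distance=100_000):
    """Sum of binding sites near a TSS, each weighted by exponential decay.

    tss: TSS coordinate (bp) of the gene
    peak_summits: summit coordinates (bp) of binding sites on the same chromosome
    decay_length, max_distance: hypothetical parameters chosen for this sketch
    """
    total = 0.0
    for summit in peak_summits:
        distance = abs(summit - tss)
        if distance <= max_distance:
            total += math.exp(-distance / decay_length)
    return total

# One site at the TSS, one 10kb away, one 200kb away (ignored by the cutoff)
score = regulatory_potential(1_000, [1_000, 11_000, 201_000])
```

A gene with several nearby sites scores higher than a gene with a single distant site, which avoids the binary choice between nearest-gene and fixed-window assignment.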

Page 15:

Direct Target Identification

• Binary decision?
• Rank product of regulatory potential and differential expression
• BETA
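The rank product idea can be sketched in a few lines of Python. This is an illustration of the general technique, not the BETA code: the function names are hypothetical, ranks are normalized by the number of genes, and ties are broken by input order for simplicity.

```python
def ranks_descending(scores):
    """Map each gene to its rank (1 = largest score). Ties keep input order."""
    ordered = sorted(scores, key=lambda gene: -scores[gene])
    return {gene: rank for rank, gene in enumerate(ordered, start=1)}

def rank_product(reg_potential, diff_expr):
    """Combine two gene rankings; a smaller product is a stronger candidate.

    reg_potential: gene -> regulatory potential score
    diff_expr: gene -> differential expression strength (e.g. absolute statistic)
    """
    genes = [g for g in reg_potential if g in diff_expr]
    n = len(genes)
    rp_rank = ranks_descending({g: reg_potential[g] for g in genes})
    de_rank = ranks_descending({g: diff_expr[g] for g in genes})
    products = {g: (rp_rank[g] / n) * (de_rank[g] / n) for g in genes}
    return sorted(products.items(), key=lambda item: item[1])

# Hypothetical scores for three genes
candidates = rank_product(
    {"A": 3.0, "B": 1.0, "C": 2.0},
    {"A": 5.0, "B": 6.0, "C": 0.5},
)
```

Ranking replaces a binary target call with an ordered list, so a gene needs both strong nearby binding and a strong expression change to reach the top.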

Page 16:

Is My Factor an Activator, Repressor, or Both?

• Most labs have differential expression profiling of the transcription factor together with TF ChIP-seq

• Do genes with higher regulatory potential show more up- or down-regulation than all the genes in the genome?

Page 17:

ChIP-chip/seq Motif Finding

• ChIP-chip gives 10-5000 binding regions ~200-1000bp long. Precise binding motif?
  – Raw data is like perfect clustering, plus enrichment values

• MDscan
  – High ChIP ranking => true targets, contain more sites
  – Search TF motif from highest ranking targets first (high signal / background ratio)
  – Refine candidate motifs with all targets

Page 18:

Similarity Defined by m-match

For a given w-mer and any other random w-mer

TGTAACGT 8-mer

TGTAACGT matched 8

AGTAACGT matched 7

TGCAACAT matched 6

TGACACGG matched 5

AATAACAG matched 4

m-matches for TGTAACGT

Pick a reasonable m to call two w-mers similar
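The match counts listed above can be reproduced with a short Python sketch. The function names are hypothetical, and the threshold m = 6 in the usage line is only an example value.

```python
def match_count(a, b):
    """Number of positions at which two equal-length w-mers agree."""
    if len(a) != len(b):
        raise ValueError("w-mers must have the same length")
    return sum(x == y for x, y in zip(a, b))

def is_m_match(a, b, m):
    """True if the two w-mers agree at m or more positions."""
    return match_count(a, b) >= m

seed = "TGTAACGT"
others = ["TGTAACGT", "AGTAACGT", "TGCAACAT", "TGACACGG", "AATAACAG"]
counts = [match_count(seed, w) for w in others]      # [8, 7, 6, 5, 4]
similar = [w for w in others if is_m_match(seed, w, 6)]
```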

Page 19:

MDscan Seeds

[Figure: ChIP-chip selected upstream sequences, ordered with higher enrichment at the top. A 9-mer seed (ATTGCAAAT) is grouped with similar 9-mers (TTTGCAAAT, TTTGCGAAT) into a seed motif pattern; other 9-mer groups from the same sequences are shown.]

Page 20:

Update Motifs With Remaining Seqs

[Figure: Seed1 m-matches (labels: Extreme High Rank; All ChIP-selected targets)]

Page 21:

Refine the Motifs

[Figure: Seed1 m-matches (labels: Extreme High Rank; All ChIP-selected targets)]

Page 22:

Further Refine Motifs

• Could also be used to examine known motif enrichment

• Is motif enrichment correlated with ChIP-seq enrichment?

• Is motif more enriched in peak summits than peak flanks?

• Motif analysis could identify transcription factor partners of ChIP-seq factors

Page 23:

Estrogen Receptor

• Carroll et al, Cell 2005
• Overactive in > 70% of breast cancers
• Where does it go in the genome?
• ChIP-chip on chr21/22, motif and expression analysis found its “pioneering factor” FoxA1

[Figure: ER with an unknown partner TF ("TF??")]

Page 24:

Estrogen Receptor (ER) Cistrome in Breast Cancer

• Carroll et al, Nat Genet 2006

• ER may function far away (100-200KB) from genes

• Only 20% of ER sites have PhastCons > 0.2

• ER has different effects depending on its collaborators

[Figure: ER with collaborators AP1 and NRIP]

Page 26:

Cell Type-Specific Binding

• The same TF binds to very different locations in different tissues and conditions. Why?

• TF concentration?
• Collaborating factors, esp. pioneering factors
• Interesting observations about pioneering factors

Page 27:

Summary

• ChIP-seq identifies genome-wide in vivo protein-DNA interaction sites

• ChIP-seq peak calling: shift reads, then calculate correct enrichment and FDR

• Functional analysis of ChIP-seq data:
  – Strong vs weak binding, conserved vs non-conserved
  – Target identification
  – Motif analysis

• Cell type-specific binding → Epigenetics
