Starting Monday

Jan 08, 2016

Page 1: Starting Monday

Starting Monday

M Oct 29 – Back to BLAST and Orthology (readings posted): we will focus on the BLAST algorithm and the different types and applications of BLAST; in lab we will predict orthologs using reciprocal genome-scale BLAST searches

W Oct 31 – Phylogenetic Profiles (an example of unsupervised machine learning) and supervised machine learning approaches and applications

M Nov 5 – Phylogeny (Phylogeny Lab)

W Nov 7 – Metabolic reconstruction and modeling

***2-3 pg paper on preliminary results due***

Today: ChIP-chip and ChIP-seq analysis

Page 2: Starting Monday

Chromatin immunoprecipitation (ChIP)

1. Chemical or light-based crosslinking added to living cells

2. Shear DNA by sonication or digestion

3. IP by specific Ab or Ab against protein tag

Page 3: Starting Monday

ChIP-on-chip (tiled genomic microarrays)

[Figure; axis labels: Signal Intensity, Array Probes]

Peak resolution is a function of:
- shearing size
- probe resolution
- ChIP enrichment

Page 4: Starting Monday

ChIP-seq

[Figure; axis label: Read Counts]

Page 5: Starting Monday

Page 6: Starting Monday

1. Map reads to the reference genome

2. Convert to ‘tag’ counts: sequence coverage at each base pair in the genome

3. Find peaks of high tag count (using a fixed/sliding window with a count threshold) or based on the bimodal peak distribution

4. Convert bimodal peaks into summits (by shifting 3’ tag positions OR by extending the tag signal to the estimated size of the fragments)

5. Identify summits that represent fragment enrichment relative to control

6. Assign a confidence score (p-value, enrichment score, and/or FDR)
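
Steps 2 to 6 above can be sketched in a few lines of Python. This is only a toy illustration, not any published peak caller: it assumes reads are already mapped (step 1), uses fixed non-overlapping windows, extends each tag to an assumed fragment length, and scores enrichment with a Poisson tail probability against a depth-scaled control. The function names, the default parameters (`fragment_len`, `window`, `min_tags`) and the floor of 1.0 on the expected count are all assumptions made for this example.

```python
import math

def extend_tags(tags, fragment_len, genome_len):
    """Steps 2 and 4: per-base coverage after extending each tag
    (5' position, strand) to the assumed fragment length."""
    coverage = [0] * genome_len
    for pos, strand in tags:
        if strand == "+":
            start, end = pos, pos + fragment_len
        else:
            start, end = pos - fragment_len + 1, pos + 1
        for i in range(max(0, start), min(genome_len, end)):
            coverage[i] += 1
    return coverage

def poisson_sf(k, lam):
    """P(X >= k) for X ~ Poisson(lam)."""
    if k <= 0:
        return 1.0
    term = math.exp(-lam)      # P(X = 0)
    cdf = term
    for i in range(1, k):      # accumulate P(X <= k - 1)
        term *= lam / i
        cdf += term
    return max(0.0, 1.0 - cdf)

def call_peaks(chip_tags, control_tags, genome_len,
               fragment_len=100, window=200, min_tags=5):
    """Steps 3, 5 and 6 with fixed windows (hypothetical defaults)."""
    coverage = extend_tags(chip_tags, fragment_len, genome_len)
    # Scale the control to the ChIP sequencing depth.
    scale = len(chip_tags) / len(control_tags) if control_tags else 1.0
    peaks = []
    for start in range(0, genome_len, window):
        end = min(start + window, genome_len)
        n_chip = sum(1 for pos, _ in chip_tags if start <= pos < end)
        if n_chip < min_tags:              # step 3: count threshold
            continue
        n_ctrl = sum(1 for pos, _ in control_tags if start <= pos < end)
        expected = max(n_ctrl * scale, 1.0)  # assumed floor on background
        summit = max(range(start, end), key=coverage.__getitem__)
        peaks.append({"start": start, "end": end, "summit": summit,
                      "chip_tags": n_chip,
                      "p_value": poisson_sf(n_chip, expected)})
    return peaks
```

Real tools differ in nearly every one of these choices (window scheme, background model, multiple-testing correction), which is part of why their peak lists differ.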

Page 7: Starting Monday

Types of ‘control’ data for ChIP experiments

1. ‘Input’ DNA = sheared but no IP

2. No-antibody mock IP

3. Untagged strain

There is almost always some background in the mock IP; the hope is to have enrichment of IP material over background.

* Certain artifacts can give the appearance of real peaks in control experiments.

Page 8: Starting Monday

Pepke et al. 2009

The read count / tag profile is generally smoothed before peak calling (e.g. with a running average), and the ‘summit’ is then inferred from the dual read peaks

* Using a method that incorporates a measured background model is probably very important
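
As a rough illustration of the smoothing and summit inference described above, the sketch below applies a running average to per-position tag counts on each strand and places the summit midway between the two strand-specific modes. The function names and the window width are assumptions for this example, and this is not the specific procedure of any tool reviewed by Pepke et al.

```python
def running_average(values, width):
    """Smooth a list of counts with a centered running average."""
    half = width // 2
    smoothed = []
    for i in range(len(values)):
        lo, hi = max(0, i - half), min(len(values), i + half + 1)
        smoothed.append(sum(values[lo:hi]) / (hi - lo))
    return smoothed

def summit_from_strands(plus_counts, minus_counts, width=5):
    """Return (summit, shift): the midpoint between the smoothed
    plus-strand and minus-strand modes, and the distance between them."""
    plus = running_average(plus_counts, width)
    minus = running_average(minus_counts, width)
    plus_mode = max(range(len(plus)), key=plus.__getitem__)
    minus_mode = max(range(len(minus)), key=minus.__getitem__)
    return (plus_mode + minus_mode) // 2, minus_mode - plus_mode
```

The distance between the two modes gives an estimate of the fragment size, which is the quantity used when tags are shifted or extended.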

Page 9: Starting Monday
Page 10: Starting Monday

3 types of peaks

1. Sharp & narrow (100s of bp), e.g. site-specific TF

2. Broader but defined (kb), e.g. RNA Polymerase

3. Very broad (regional, 1000s of kb), e.g. heterochromatin histone marks

• Methods that use bimodal peak profiles to identify summits work less well for biologically wider peaks/loci

Page 11: Starting Monday

Hidden Markov Models for Identifying Bound Fragments

HMMs are trained on known data to recognize different states (e.g. bound vs. unbound fragments) and the probability of moving between those states

Example: ChIP-chip data from a tiling microarray identifying regions bound to a transcription complex with a known 50 bp binding sequence.

You expect that a bound fragment will have high signal on the array and that the bound fragment will be 2-3 probes long.

Once trained, an HMM can be used to identify the ‘hidden’ states in an unknown dataset, based on the known characteristics of each state (‘emission probabilities’) and the probability of moving between states (‘transition probabilities’)

Example: “A hidden Markov model for analyzing ChIP-chip experiments on genome tiling arrays and its application to p53 binding sequences.” Li, Meyer, Liu, 2005.

Page 12: Starting Monday

Example: ChIP-chip data from a tiling microarray identifying regions bound to a transcription complex with a known 50 bp binding sequence.

You expect that a bound fragment will have high signal on the array and that the bound fragment will be 2-3 probes long.

[Figure: HMM state diagram with four states, in order: Unbound 25mer, Bound 25mer, Bound 25mer, Bound 25mer]

I = Intensity units > 10,000; i = Intensity units < 10,000

Unbound 25mer: P( I ) = 0.2, P( i ) = 0.8
Bound 25mer (each of the three): P( I ) = 0.8, P( i ) = 0.2

Probabilities on the arrows between states, in the order extracted from the figure: 0.5, 0.5, 1.0, 0, 0.7, 0.3, 1.0

Page 13: Starting Monday

Example: ChIP-chip data from a tiling microarray identifying regions bound to a transcription complex with a known 50 bp binding sequence.

You expect that a bound fragment will have high signal on the array and that the bound fragment will be 2-3 probes long.

[Figure: four-state HMM diagram (Unbound 25mer, Bound 25mer, Bound 25mer, Bound 25mer) with its probabilities labeled]

Emission Probabilities
Unbound 25mer: P( I ) = 0.2, P( i ) = 0.8
Bound 25mer (each of the three): P( I ) = 0.8, P( i ) = 0.2

Transition Probabilities (arrows between states, in the order extracted from the figure): 0.5, 0.5, 1.0, 0, 0.7, 0.3, 1.0

Given the data, an HMM will consider many different models (assignments of hidden states to the probes) and give back the optimal one
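
One common way to recover the most probable sequence of hidden states is the Viterbi algorithm. The sketch below applies it to the four-state example. The emission probabilities and the 10,000 intensity cutoff come from the slide. The assignment of the transition probabilities to specific arrows is an assumption, since the diagram itself did not survive in this transcript: the values were matched to arrows so that a bound run is 2-3 probes long. The start distribution (always beginning in the unbound state) and the state names `Bound1` to `Bound3` are also assumptions.

```python
import math

STATES = ["Unbound", "Bound1", "Bound2", "Bound3"]

# Emission probabilities from the slide: I = intensity > 10,000, i = below.
EMIT = {
    "Unbound": {"I": 0.2, "i": 0.8},
    "Bound1": {"I": 0.8, "i": 0.2},
    "Bound2": {"I": 0.8, "i": 0.2},
    "Bound3": {"I": 0.8, "i": 0.2},
}

# ASSUMED arrow assignment for the slide's values 0.5, 0.5, 1.0, 0, 0.7, 0.3, 1.0.
TRANS = {
    "Unbound": {"Unbound": 0.5, "Bound1": 0.5},
    "Bound1": {"Bound2": 1.0, "Unbound": 0.0},
    "Bound2": {"Bound3": 0.7, "Unbound": 0.3},
    "Bound3": {"Unbound": 1.0},
}

START = {"Unbound": 1.0}  # ASSUMED: the array starts in an unbound region


def _log(p):
    return math.log(p) if p > 0 else float("-inf")


def discretize(intensities, cutoff=10000):
    """Convert raw probe intensities into the symbols I / i."""
    return "".join("I" if x > cutoff else "i" for x in intensities)


def viterbi(observations):
    """Most probable state path for a non-empty string of I / i symbols."""
    scores = [{s: _log(START.get(s, 0.0)) + _log(EMIT[s][observations[0]])
               for s in STATES}]
    backpointers = []
    for obs in observations[1:]:
        prev, cur, ptr = scores[-1], {}, {}
        for s in STATES:
            best = max(STATES,
                       key=lambda r: prev[r] + _log(TRANS[r].get(s, 0.0)))
            cur[s] = (prev[best] + _log(TRANS[best].get(s, 0.0))
                      + _log(EMIT[s][obs]))
            ptr[s] = best
        scores.append(cur)
        backpointers.append(ptr)
    # Trace back from the best final state.
    state = max(STATES, key=lambda s: scores[-1][s])
    path = [state]
    for ptr in reversed(backpointers):
        state = ptr[state]
        path.append(state)
    return path[::-1]
```

For instance, `viterbi(discretize(probe_intensities))` labels each probe in a hypothetical list `probe_intensities` as unbound or as the first, second or third probe of a bound fragment. Log probabilities are used so that long arrays do not underflow.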

Page 14: Starting Monday

Evaluated 11 different peak-calling algorithms using 3 real datasets and default parameters (mimicking “non-expert users”)

- Methods with smaller (more stringent) peak lists often return peaks that are also identified by other methods

“many programs call similar peaks, though default parameters are tuned to different levels of stringency”
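
A simple way to check this kind of agreement on your own data is to ask what fraction of one program's peaks overlap any peak from another program. The sketch below is a minimal, quadratic-time version that assumes peaks are half-open `(start, stop)` intervals on a single chromosome; the function names are made up for this example and it is not the comparison procedure used in the evaluation.

```python
def overlaps(a, b):
    """True if half-open intervals a and b share at least one base."""
    return a[0] < b[1] and b[0] < a[1]


def fraction_recovered(peaks_a, peaks_b):
    """Fraction of peaks in peaks_a that overlap some peak in peaks_b."""
    if not peaks_a:
        return 0.0
    hits = sum(1 for p in peaks_a if any(overlaps(p, q) for q in peaks_b))
    return hits / len(peaks_a)
```

If the stringent list is largely contained in the lenient one, `fraction_recovered(stringent, lenient)` will be close to 1 while the reverse call will be much lower.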

Page 15: Starting Monday

Page 16: Starting Monday

Output: list of peak locations (start & stop) and p-values

The challenge is that peaks do not show precisely where the protein binds.

Different programs vary in the width of the identified peaks

Can apply the same type of motif finding to a set of IP’d regions to identify motifs shared by the regions.
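
As a very rough stand-in for a real motif finder, the sketch below counts exact k-mers that occur in most of the IP'd regions. This is plain k-mer counting, not the motif-finding method used in the course: it ignores reverse complements, degenerate positions and background composition, and the function name and default thresholds are assumptions for this example.

```python
from collections import Counter


def shared_kmers(regions, k=6, min_fraction=0.8):
    """Return k-mers present in at least min_fraction of the sequences."""
    counts = Counter()
    for seq in regions:
        seq = seq.upper()
        # Use a set so each region counts a given k-mer only once.
        counts.update({seq[i:i + k] for i in range(len(seq) - k + 1)})
    needed = min_fraction * len(regions)
    return sorted(kmer for kmer, n in counts.items() if n >= needed)
```

Because peaks are wider than the actual binding site, a shared k-mer found this way only suggests where within each region the protein may bind.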

Page 17: Starting Monday

Other approaches

ChIP-exo

DNaseI hypersensitive sites

Micrococcal nuclease sensitive sites (nucleosome mapping)

Page 18: Starting Monday

What can you do with the data?

1. Motif finding: look for motifs shared in bound regions (e.g. XX)

2. Association of bound loci with neighboring genes and elements
- functional enrichment of neighboring genes
- other non-random association among neighboring genes, e.g. shared expression profiles, expression dependency on factor in question

3. Locus distribution across the genome
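
Item 2 can be sketched as two small steps: assign each peak summit to its nearest gene, then test whether the resulting gene set is enriched for a functional category. The example below assumes genes are represented by a single coordinate (e.g. a start site) on one chromosome and uses a hypergeometric tail probability for enrichment; the function names and data layout are assumptions for this illustration, and a real analysis would also correct for testing many categories.

```python
import bisect
import math


def nearest_gene(summit, genes):
    """genes: non-empty list of (position, name) sorted by position.
    Returns the (position, name) entry closest to the summit."""
    positions = [pos for pos, _ in genes]
    i = bisect.bisect_left(positions, summit)
    candidates = genes[max(0, i - 1):i + 1]  # neighbor on each side
    return min(candidates, key=lambda g: abs(g[0] - summit))


def hypergeom_sf(k, total, in_category, drawn):
    """P(X >= k): chance of seeing at least k category genes among
    `drawn` genes picked at random from `total` genes, of which
    `in_category` belong to the category."""
    ways = sum(math.comb(in_category, x) * math.comb(total - in_category, drawn - x)
               for x in range(k, min(in_category, drawn) + 1))
    return ways / math.comb(total, drawn)
```

A small tail probability suggests that the genes near bound loci share the function more often than expected by chance.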