NASC Normalisation and Analysis of the Affymetrix Data David J Craigon

Mar 28, 2015


Madison Rose
Transcript
Page 1: NASC Normalisation and Analysis of the Affymetrix Data David J Craigon.

NASC

Normalisation and Analysis of the Affymetrix Data

David J Craigon

Page 2: NASC Normalisation and Analysis of the Affymetrix Data David J Craigon.

NASC

What I am not going to talk about

• General microarray topics
• Biology

Page 3: NASC Normalisation and Analysis of the Affymetrix Data David J Craigon.

NASC

The introduction

Page 4: NASC Normalisation and Analysis of the Affymetrix Data David J Craigon.

NASC

Affymetrix workflow

Biological sample of some sort
Extract mRNA
Amplify
Label and Fragment
Hybridise to a chip
Scan chip
Find features in scan
Analyse down to one number per gene

Page 5: NASC Normalisation and Analysis of the Affymetrix Data David J Craigon.

NASC

What do we want to find out?

• We want to find out how much mRNA of each type was in the original sample

Page 6: NASC Normalisation and Analysis of the Affymetrix Data David J Craigon.

NASC

Biological sample of some sort
Extract mRNA
Amplify
Label and Fragment
Hybridise to a chip
Scan chip
Find features in scan
Analyse down to one number per gene

Each of these steps needs to be proportional

Page 7: NASC Normalisation and Analysis of the Affymetrix Data David J Craigon.

NASC

Biological sample of some sort
Extract mRNA
Amplify
Label and Fragment
Hybridise to a chip
Scan chip
Find features in scan
Analyse down to one number per gene

This talk is about this bit

Page 8: NASC Normalisation and Analysis of the Affymetrix Data David J Craigon.

NASC

Affymetrix Chips

• On an Affymetrix chip each oligo takes up a “square”

• The RNA extracted from the plant is first amplified, then labelled. The label allows the scanner to see it.

• The RNA is then hybridised to the array. Matching RNA for that square sticks to the square, and can be seen by the scanner.

• By observing the intensity of a square, the amount of RNA bound to that oligo can be calculated

Page 9: NASC Normalisation and Analysis of the Affymetrix Data David J Craigon.

NASC

Design of the oligos

• Series of oligos designed for one gene

• Each oligo comes in two versions…

5’ 3’

Page 10: NASC Normalisation and Analysis of the Affymetrix Data David J Craigon.

NASC

Match and mismatch

• The exact match is a section of the mRNA sequence you wish to probe for

• The mismatch is identical except for one base difference from its exact match counterpart, and is used to calculate a background.

• There are typically 11 “probe pairs” scattered around the chip, together called a probe set.

• By combining the expression values for a probe set, a value for the expression of mRNA can be found.

Page 11: NASC Normalisation and Analysis of the Affymetrix Data David J Craigon.

NASC

EXP, DAT, CEL, CHP files

• EXP file - experiment file
• DAT file - the picture, like a TIFF.
• CEL file - an unnormalised number for each probe.
• CHP file - one number for each probe set

Page 12: NASC Normalisation and Analysis of the Affymetrix Data David J Craigon.

NASC

What do you think of it so far?

So far…

• What we want to find out is the amount of each mRNA in the starting sample.
• The mRNA hybridises to a series of probes.
• We can get a number for each probe from the CEL file.

Page 13: NASC Normalisation and Analysis of the Affymetrix Data David J Craigon.

NASC

The rest of this talk

We are going to go through four distinct ways of determining “Signal” values from CEL file data

• MAS 4
• MAS 5
• MBEI (dChip)
• RMA

Page 14: NASC Normalisation and Analysis of the Affymetrix Data David J Craigon.

NASC

Mismatch probes in detail

Page 15: NASC Normalisation and Analysis of the Affymetrix Data David J Craigon.

NASC

All about mismatch probes

Mismatch probe:      ATGCTGTACAATCGCTTGATACTGG
Target sequence:     ATGCTGTACAATAGCTTGATACTGG
Perfect match probe: ATGCTGTACAATAGCTTGATACTGG
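As a small illustration of the slide above, the sketch below compares the three 25-base sequences and reports where they differ. The helper name `mismatch_positions` is made up for this example, and the assignment of the three sequences to their labels is read off the slide order, so treat it as an assumption.

```python
# Sequences as they appear on the slide (label assignment assumed from slide order).
MISMATCH = "ATGCTGTACAATCGCTTGATACTGG"
TARGET = "ATGCTGTACAATAGCTTGATACTGG"
PERFECT_MATCH = "ATGCTGTACAATAGCTTGATACTGG"


def mismatch_positions(a, b):
    """Return the 1-based positions at which two equal-length sequences differ."""
    if len(a) != len(b):
        raise ValueError("sequences must have the same length")
    return [i + 1 for i, (x, y) in enumerate(zip(a, b)) if x != y]


if __name__ == "__main__":
    print(len(PERFECT_MATCH), mismatch_positions(PERFECT_MATCH, MISMATCH))
```

On these sequences the single difference sits at the middle base of the 25-mer, which is what you would expect from a probe pair designed to differ by one base.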

Page 16: NASC Normalisation and Analysis of the Affymetrix Data David J Craigon.

NASC

Why do we have mismatch probes?

• Mismatch probes (MM) are trying to detect background.

• The mismatch probes are supposed to detect things that are close but not an exact match.

• It is assumed that these things also bind to the perfect match (PM), erroneously.

Page 17: NASC Normalisation and Analysis of the Affymetrix Data David J Craigon.

NASC

Yes folks, it’s Expression Method No 1!

• The original method that was used by MAS 4

Page 18: NASC Normalisation and Analysis of the Affymetrix Data David J Craigon.

NASC

MAS 4 Algorithm

$$\mathrm{AvDiff} = \frac{1}{\#A} \sum_{j \in A} (PM_j - MM_j)$$

For a probe set:

• A is the set of probe pairs you haven’t thrown away due to being outliers, and #A is how many of them there are

• j runs over the probe pairs in A

• In English, the formula is very simple- throw away the outliers, then simply average the differences between PM and MM of the probes you’ve got left.
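The AvDiff idea can be sketched in a few lines of Python. The slide does not say how outliers are chosen, so the rule used here (drop probe pairs whose difference lies more than `n_sd` standard deviations from the mean difference) is an assumption for illustration only, as are the function and parameter names.

```python
from statistics import mean, pstdev


def avdiff(pm, mm, n_sd=3.0):
    """Average PM - MM difference over the probe pairs that are not outliers.

    pm, mm: equal-length lists of intensities for one probe set.
    n_sd:   hypothetical outlier cut-off; not taken from the slides.
    """
    diffs = [p - m for p, m in zip(pm, mm)]
    centre = mean(diffs)
    spread = pstdev(diffs)
    # A is the set of differences we keep.
    kept = [d for d in diffs if spread == 0 or abs(d - centre) <= n_sd * spread]
    return mean(kept)


if __name__ == "__main__":
    pm = [110, 210, 310, 410, 1100]
    mm = [100, 200, 300, 400, 1000]
    print(avdiff(pm, mm), avdiff(pm, mm, n_sd=1.5))
```

Note that nothing stops the result from being negative when MM is mostly larger than PM, which is one reason a difference-based average is awkward to interpret.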

Page 19: NASC Normalisation and Analysis of the Affymetrix Data David J Craigon.

NASC

Problems with the MAS4 algorithm

• Better fit with log(PM) preferred

Page 20: NASC Normalisation and Analysis of the Affymetrix Data David J Craigon.

NASC

Expression Method No 2!

• MAS 5 method.
• Still used by GCOS - the current Affymetrix supplied method.

Page 21: NASC Normalisation and Analysis of the Affymetrix Data David J Craigon.

NASC

Normalisation Procedure

• Before any work is done with the “CEL” data, the CEL file is normalised.
• Corrects for intra-chip differences

Page 22: NASC Normalisation and Analysis of the Affymetrix Data David J Craigon.

NASC

Normalisation Procedure

• Divides the chip into K zones (by default, 16 zones)

• Select the lowest 2% of probes (of any description) in each zone

• Assume these are “switched off”

Page 23: NASC Normalisation and Analysis of the Affymetrix Data David J Craigon.

NASC

Normalisation Procedure

• Calculate Mean, SD of these “switched off” probes for each section.

• Used as background.

• Each point’s local background is a weighted average of the zone backgrounds, weighted by how far the point is from each zone

• Subtract background from each probe.
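The zone-based background steps can be sketched as below. This is a minimal illustration, not the Affymetrix implementation: the function names are invented here, and the inverse-squared-distance weighting with a `smooth` constant is an assumption about how "weighted by distance" is realised.

```python
from statistics import mean


def zone_background(zone_values, fraction=0.02):
    """Mean of the lowest `fraction` of intensities in one zone ("switched off" probes)."""
    ordered = sorted(zone_values)
    n_low = max(1, int(len(ordered) * fraction))
    return mean(ordered[:n_low])


def local_background(x, y, zone_centres, zone_bgs, smooth=100.0):
    """Distance-weighted average of the zone backgrounds at chip position (x, y).

    Closer zones get larger weights; `smooth` (an assumed constant) stops the
    weight blowing up when the point sits exactly on a zone centre.
    """
    weights = [
        1.0 / ((x - cx) ** 2 + (y - cy) ** 2 + smooth)
        for cx, cy in zone_centres
    ]
    return sum(w * b for w, b in zip(weights, zone_bgs)) / sum(weights)


if __name__ == "__main__":
    bgs = [zone_background(range(100)), zone_background(range(50, 150))]
    print(local_background(5, 0, [(0, 0), (10, 0)], bgs))
```

The corrected intensity for a probe is then its raw value minus `local_background` at its position.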

Page 24: NASC Normalisation and Analysis of the Affymetrix Data David J Craigon.

NASC

MAS 5 Algorithm

$$\mathrm{Signal} = \mathrm{TukeyBiweight}\{\log(PM_j - IM_j)\}$$

For a probe set:

• Tukey’s Biweight is an average that minimises the effect of outliers.

• IM is the “ideal mismatch”. This is the same as the MM intensity, except in the case where the MM is greater than the PM, in which case a new MM value is calculated based on other probes nearby
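To make the formula concrete, here is a sketch of a one-step Tukey biweight and the resulting signal. The tuning constants `c` and `eps`, the use of log base 2, and the function names are assumptions for this example; the ideal mismatch values are taken as given rather than computed.

```python
from math import log2
from statistics import median


def tukey_biweight(values, c=5.0, eps=1e-4):
    """One-step Tukey biweight: a weighted mean that down-weights outliers."""
    centre = median(values)
    mad = median([abs(v - centre) for v in values])
    weights = []
    for v in values:
        u = (v - centre) / (c * mad + eps)
        # Points far from the median get weight 0, points near it weight close to 1.
        weights.append((1 - u * u) ** 2 if abs(u) < 1 else 0.0)
    return sum(w * v for w, v in zip(weights, values)) / sum(weights)


def signal_log_scale(pm, im):
    """Biweight of log2(PM - IM) over one probe set; assumes PM > IM for every pair."""
    return tukey_biweight([log2(p - i) for p, i in zip(pm, im)])


if __name__ == "__main__":
    print(tukey_biweight([1.0, 1.0, 1.0, 1.0, 100.0]))
    print(signal_log_scale([12, 20, 36], [4, 4, 4]))
```

Compared with a plain mean, the single value of 100 in the first call has no influence on the result, which is the point of using the biweight.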

Page 25: NASC Normalisation and Analysis of the Affymetrix Data David J Craigon.

NASC


Page 26: NASC Normalisation and Analysis of the Affymetrix Data David J Craigon.

NASC

MAS4 to MAS5 comparison

MAS 5:

$$\mathrm{Signal} = \mathrm{TukeyBiweight}\{\log(PM_j - IM_j)\}$$

MAS 4:

$$\mathrm{AvDiff} = \frac{1}{\#A} \sum_{j \in A} (PM_j - MM_j)$$

Page 27: NASC Normalisation and Analysis of the Affymetrix Data David J Craigon.

NASC

Signal Normalisation

• To try to eliminate chip-to-chip variability.
• Sort the signal values and remove the top and bottom 2%
• Calculate a scaling factor to adjust this middle 96%’s mean to 100 (configurable, and variable)
• Multiply all signal values by the scaling factor
• Affymetrix state that scaling factors should be similar for arrays to be comparable
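The scaling steps above can be sketched as follows. The function name and defaults are chosen for illustration; the target of 100 and the 2% trim come from the slide.

```python
def scaling_factor(signals, target=100.0, trim=0.02):
    """Factor that brings the mean of the middle (1 - 2*trim) of signals to `target`."""
    ordered = sorted(signals)
    k = int(len(ordered) * trim)          # how many values to drop at each end
    middle = ordered[k:len(ordered) - k]
    return target / (sum(middle) / len(middle))


if __name__ == "__main__":
    signals = [50.0] * 98 + [0.0, 10000.0]
    sf = scaling_factor(signals)
    scaled = [s * sf for s in signals]
    print(sf, scaled[:3])
```

Because the extreme values are trimmed before the mean is taken, one very bright probe set does not drag the scaling factor down for the whole chip.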

Page 28: NASC Normalisation and Analysis of the Affymetrix Data David J Craigon.

NASC

Expression Method No 3!

• The MBEI method of Li and Wong.
• Found in dChip, so often known as the dChip method.

Page 29: NASC Normalisation and Analysis of the Affymetrix Data David J Craigon.

NASC

Observation


Page 30: NASC Normalisation and Analysis of the Affymetrix Data David J Craigon.

NASC

Observation

• The probes are vastly variable in effectiveness

• Li and Wong point out that the difference between probes is much greater than the difference between arrays!

• They contend that any proper model should take this into account.

Page 31: NASC Normalisation and Analysis of the Affymetrix Data David J Craigon.

NASC

MBEI model

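$$MM_{ij} = v_j + \theta_i \alpha_j + \varepsilon$$

$$PM_{ij} = v_j + \theta_i \alpha_j + \theta_i \Phi_j + \varepsilon$$

$$\therefore\; PM_{ij} - MM_{ij} = \theta_i \Phi_j + \varepsilon$$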

Page 32: NASC Normalisation and Analysis of the Affymetrix Data David J Craigon.

NASC

MBEI model

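$$MM_{ij} = v_j + \theta_i \alpha_j + \varepsilon$$

$$PM_{ij} = v_j + \theta_i \alpha_j + \theta_i \Phi_j + \varepsilon$$

$$\therefore\; PM_{ij} - MM_{ij} = \theta_i \Phi_j + \varepsilon$$

• $v_j$: Baseline response due to noise

• $\theta_i$: Expression value (the thing we are interested in)

• $\Phi_j$: Rate of increase of PM probe as signal increases (separate for each probe)

• $\alpha_j$: Rate of increase of MM probe as signal increases (really? See later)

• $\varepsilon$: Error term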

Page 33: NASC Normalisation and Analysis of the Affymetrix Data David J Craigon.

NASC

Model is fitted over all chips

• Processes an entire experiment at once

• Model is fitted by minimising the residual sum of squares

• In their paper on the subject they talk a lot about how you can use this model to detect outliers, scratches on the array, etc. I’m not going to talk about that.
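One way to minimise the residual sum of squares for PM − MM = θΦ + ε is to alternate between solving for the expression values and the probe rates. The sketch below does that for a single probe set; it is an illustration under assumptions (alternating least squares, probe rates rescaled so their squares sum to the number of probes, no outlier handling), not dChip's code, and all names are invented for the example.

```python
def fit_mbei(diffs, n_iter=50):
    """Fit diffs[i][j] ~ theta[i] * phi[j] by alternating least squares.

    diffs[i][j] is PM - MM for array i and probe j of one probe set.
    Assumes the data are not all zero.
    """
    n_arrays, n_probes = len(diffs), len(diffs[0])
    phi = [1.0] * n_probes
    theta = [0.0] * n_arrays

    def update_theta():
        ss = sum(p * p for p in phi)
        return [sum(d * p for d, p in zip(row, phi)) / ss for row in diffs]

    for _ in range(n_iter):
        theta = update_theta()
        ss = sum(t * t for t in theta)
        phi = [
            sum(diffs[i][j] * theta[i] for i in range(n_arrays)) / ss
            for j in range(n_probes)
        ]
        # Fix the overall scale, which the product theta * phi cannot determine.
        scale = (n_probes / sum(p * p for p in phi)) ** 0.5
        phi = [p * scale for p in phi]
    return update_theta(), phi


if __name__ == "__main__":
    true_theta, true_phi = [100.0, 200.0, 400.0], [0.5, 1.0, 1.5]
    data = [[t * p for p in true_phi] for t in true_theta]
    theta, phi = fit_mbei(data)
    print(theta, phi)
```

Because every array shares the same probe rates, the fit needs the whole experiment at once, which is why the model is fitted over all chips.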

Page 34: NASC Normalisation and Analysis of the Affymetrix Data David J Craigon.

NASC

RMA paper observations

Page 35: NASC Normalisation and Analysis of the Affymetrix Data David J Craigon.

NASC

A spiked in experiment from the RMA paper

• It would be useful if we had an experiment where we “knew the answer”

• Run a series of experiments with a fixed background, but spike in some artificial RNA for a series of probes, at different concentrations.

Page 36: NASC Normalisation and Analysis of the Affymetrix Data David J Craigon.

NASC

Mismatch probes

• Mismatch probes are supposed to calculate what similar things hybridise to probes, to detect background for PM probes.

• The background should be at a relatively low level most of the time…

Page 37: NASC Normalisation and Analysis of the Affymetrix Data David J Craigon.

NASC

Yikes!

• Actually MM>PM between 33% and 40% of the time!

Page 38: NASC Normalisation and Analysis of the Affymetrix Data David J Craigon.

NASC

Mismatch probes

• Mismatch probes are supposed to calculate what similar things hybridise to probes, to detect background for PM probes.

• The amount of this stuff shouldn’t depend on how much “interesting” RNA there is about…

Page 39: NASC Normalisation and Analysis of the Affymetrix Data David J Craigon.

NASC

Man the lifeboats!

Page 40: NASC Normalisation and Analysis of the Affymetrix Data David J Craigon.

NASC

Some observations from the RMA paper

… perfect match probes appear to be additive (in the log scale)

Page 41: NASC Normalisation and Analysis of the Affymetrix Data David J Craigon.

NASC

• The amount of signal does affect mismatch probes.

• Clearly some of the useful mRNA is hybridising to the MM probes.

• This kind of shock has led to some people abandoning the use of MM probes altogether!

Page 42: NASC Normalisation and Analysis of the Affymetrix Data David J Craigon.

NASC

What’s going on?


Page 43: NASC Normalisation and Analysis of the Affymetrix Data David J Craigon.

NASC

Perfect match probes

… in RMA, in the log scale, they assume that probe effects are effectively additive

Page 44: NASC Normalisation and Analysis of the Affymetrix Data David J Craigon.

NASC

How RMA (roughly) works

Page 45: NASC Normalisation and Analysis of the Affymetrix Data David J Craigon.

NASC

RMA process

• Normalise array
• Fit model

Page 46: NASC Normalisation and Analysis of the Affymetrix Data David J Craigon.

NASC

Normalisation procedure involves adjusting distributions
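The slide only says that distributions are adjusted. The published RMA method does this with quantile normalisation, so the sketch below assumes that is what is meant: every array is forced to share the same distribution, namely the average of the sorted arrays. The function name is invented for this example and ties are handled naively.

```python
def quantile_normalise(arrays):
    """Give every array the same intensity distribution.

    arrays: list of equal-length lists, one list of probe intensities per chip.
    """
    n = len(arrays[0])
    sorted_arrays = [sorted(a) for a in arrays]
    # Reference distribution: mean of the k-th smallest value across chips.
    reference = [
        sum(s[k] for s in sorted_arrays) / len(arrays) for k in range(n)
    ]
    result = []
    for a in arrays:
        order = sorted(range(n), key=lambda idx: a[idx])
        normalised = [0.0] * n
        for rank, idx in enumerate(order):
            normalised[idx] = reference[rank]   # keep each probe's rank, swap its value
        result.append(normalised)
    return result


if __name__ == "__main__":
    print(quantile_normalise([[5, 2, 3], [4, 1, 6]]))
```

Each probe keeps its rank within its own chip, but the values attached to those ranks become identical across chips.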

Page 47: NASC Normalisation and Analysis of the Affymetrix Data David J Craigon.

NASC

RMA process

• Normalise array
• Fit model

Page 48: NASC Normalisation and Analysis of the Affymetrix Data David J Craigon.

NASC

Fit model

Correct background using estimate from all mismatch probes for each array.

Fit model:

$$PM = \theta + \alpha + \varepsilon$$

• $PM$: Background corrected PM value (on the log scale)

• $\theta$: Log scale expression value

• $\alpha$: Additive probe affinity effect for this probe over all slides
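The slide does not say how the additive model is fitted. The published RMA method uses median polish, so the sketch below assumes that; the function name, the fixed iteration count, and the input layout are choices made for this example.

```python
from statistics import median


def median_polish(log_pm, n_iter=10):
    """Fit log_pm[i][j] ~ theta[i] + alpha[j] by repeatedly removing medians.

    log_pm[i][j]: log-scale, background corrected PM for array i and probe j.
    Returns (theta, alpha): per-array expression values and per-probe effects.
    """
    n_arrays, n_probes = len(log_pm), len(log_pm[0])
    resid = [list(row) for row in log_pm]
    row_eff = [0.0] * n_arrays
    col_eff = [0.0] * n_probes
    overall = 0.0
    for _ in range(n_iter):
        # Sweep array (row) medians out of the residuals.
        for i in range(n_arrays):
            m = median(resid[i])
            row_eff[i] += m
            resid[i] = [r - m for r in resid[i]]
        m = median(col_eff)
        overall += m
        col_eff = [c - m for c in col_eff]
        # Sweep probe (column) medians out of the residuals.
        for j in range(n_probes):
            m = median([resid[i][j] for i in range(n_arrays)])
            col_eff[j] += m
            for i in range(n_arrays):
                resid[i][j] -= m
        m = median(row_eff)
        overall += m
        row_eff = [r - m for r in row_eff]
    theta = [overall + r for r in row_eff]
    return theta, col_eff


if __name__ == "__main__":
    data = [[t + a for a in (-1, 0, 1)] for t in (5, 7, 9)]
    print(median_polish(data))
```

Using medians rather than means keeps a single badly behaved probe or array from dominating the fit.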

Page 49: NASC Normalisation and Analysis of the Affymetrix Data David J Craigon.

NASC

In summary then…

• There are various ways you can get from a CEL file to expression estimates.

• These models are derived by considering the behaviour of PM and MM probes

• Both dChip and RMA show better results than the standard Affy algorithm

• MM probes in particular behave contrary to how you would expect.

Page 50: NASC Normalisation and Analysis of the Affymetrix Data David J Craigon.

NASC

Enough theory- how do you actually do these things?

• The MAS5 algorithm can be performed using (erm) MAS5!

• dChip is a piece of software that will be making an appearance later this afternoon, and can do the MBEI algorithm

• The RMA authors have a piece of software called RMAExpress, which does RMA for Windows.

• All of these algorithms can be done using the Bioconductor package in R.

Page 51: NASC Normalisation and Analysis of the Affymetrix Data David J Craigon.

NASC