Microarray Preprocessing MBP1010H, Department of Medical Biophysics Irakli (Erik) Dzneladze
Feb 24, 2016
Affymetrix microarray processing in R
• Four separate processing steps: background correction, normalization, PM correction, and summary expression value computation
• A single affy function (expresso) runs the selected algorithm for each step in sequence
Affy processing pipeline
Step 1 - background correction
• The scanner picks up background noise in every image
• Background noise may be due to unbound fluorescent dyes (e.g. Cy3 and Cy5) used to label the RNA on the chip
• This background is quantified and subtracted from probe intensity values
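The idea of quantifying and subtracting a background can be sketched in a few lines of plain Python. This is only an illustration, not one of the affy background correction algorithms: the function name, the choice to estimate background from the dimmest probes, the 2% fraction, and the floor value are all assumptions made for the example.

```python
def subtract_background(intensities, background_fraction=0.02, floor=1.0):
    """Subtract a crude global background estimate from probe intensities."""
    ordered = sorted(intensities)
    # Assumption: the dimmest few percent of probes approximate pure background.
    n_background = max(1, int(len(ordered) * background_fraction))
    background = sum(ordered[:n_background]) / n_background
    # Assumption: floor the result so a later log2 transform stays defined.
    corrected = [max(value - background, floor) for value in intensities]
    return background, corrected


# Hypothetical raw probe intensities from one chip.
probe_intensities = [120.0, 95.0, 4000.0, 310.0, 101.0, 2200.0, 99.0, 650.0]
background, corrected = subtract_background(probe_intensities)
print(background)
print(corrected)
```

With only eight probes the estimate falls back to the single dimmest probe, so that probe ends up at the floor value after subtraction.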
Step 2 - normalization
• The hybridization step cannot be perfectly controlled.
• Even though RNA is quantified prior to hybridization, it is impossible to get exactly the same amount of RNA to hybridize to each chip
• The result is chip-to-chip differences in the overall distribution of probe intensity values
• The purpose of normalization is to minimize these systematic differences between chips so that individual chips can be compared to each other
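One common way to remove such chip-to-chip differences is quantile normalization, which forces every chip to share the same intensity distribution. The slides do not say which method is intended, so the choice of technique here is an assumption, and the sketch below is plain Python rather than the affy implementation. Ties are broken by probe position, which a production implementation would handle more carefully.

```python
def quantile_normalize(chips):
    """Give every chip the same intensity distribution, keeping probe ranks."""
    n_probes = len(chips[0])
    sorted_chips = [sorted(chip) for chip in chips]
    # Reference distribution: the mean across chips at each rank.
    reference = [
        sum(chip[rank] for chip in sorted_chips) / len(chips)
        for rank in range(n_probes)
    ]
    normalized = []
    for chip in chips:
        # Probe indices ordered from dimmest to brightest on this chip.
        order = sorted(range(n_probes), key=lambda i: chip[i])
        result = [0.0] * n_probes
        for rank, probe_index in enumerate(order):
            result[probe_index] = reference[rank]
        normalized.append(result)
    return normalized


# Hypothetical data: the second chip is uniformly twice as bright as the first.
chips = [
    [5.0, 2.0, 3.0, 4.0],
    [10.0, 4.0, 6.0, 8.0],
]
normalized = quantile_normalize(chips)
print(normalized)
```

In this toy example the two chips differ only by an overall brightness factor, so they become identical after normalization while each probe keeps its rank within its chip.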
Step 3 - PM correction
• Affymetrix GeneChips contain both perfect match (PM) and mismatch (MM) probes
• MM probes quantify non-specific binding and cross-hybridization
• Originally, the MM signal was subtracted from the PM signal to correct for non-specific binding and cross-hybridization
• Many researchers prefer to ignore the MM probes entirely and use the uncorrected PM probes alone
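The two options described above, subtracting MM from PM or using PM alone, can be compared with a short plain-Python sketch. The function name and the method labels "pm_only" and "subtract_mm" are hypothetical names chosen for this example, and the intensities are made up.

```python
def pm_correct(pm, mm, method="pm_only"):
    """Return corrected probe signals for one probe set."""
    if len(pm) != len(mm):
        raise ValueError("pm and mm must contain one value per probe pair")
    if method == "pm_only":
        # Ignore the MM probes and keep the PM signal unchanged.
        return list(pm)
    if method == "subtract_mm":
        # Original approach: remove the MM signal from each PM signal.
        return [p - m for p, m in zip(pm, mm)]
    raise ValueError(f"unknown method: {method}")


# Hypothetical probe pairs; in the last pair the MM probe is brighter than PM.
pm_signal = [1200.0, 300.0, 150.0]
mm_signal = [200.0, 100.0, 180.0]

pm_only = pm_correct(pm_signal, mm_signal, method="pm_only")
subtracted = pm_correct(pm_signal, mm_signal, method="subtract_mm")
print(pm_only)
print(subtracted)
```

The last probe pair shows one practical difficulty with plain subtraction: when the MM signal exceeds the PM signal, the corrected value is negative and cannot be log-transformed without further handling.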
Step 4 - summary expression value computation
• Each gene is represented by one or more probe sets
• Each probe set includes 11-20 probe pairs
• The expression value for a gene is a summary of the corresponding probe-level data
• i.e. probe-level intensity values are taken to correlate with “gene expression”
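As a minimal sketch of summarization, each probe set can be collapsed to the median of its log2 probe intensities. The median is only one possible summary and is an assumption for this example (the affy summary methods differ), and the probe set names and intensities below are made up.

```python
import math
from statistics import median


def summarize_probe_set(probe_intensities):
    """Collapse one probe set to a single log2-scale expression value."""
    log_values = [math.log2(value) for value in probe_intensities]
    # The median is robust to a single unusually bright or dim probe.
    return median(log_values)


# Hypothetical probe sets, shortened to five probes each for readability.
probe_sets = {
    "gene_a_at": [256.0, 512.0, 1024.0, 512.0, 4096.0],
    "gene_b_at": [64.0, 32.0, 64.0, 128.0, 16.0],
}
expression = {
    name: summarize_probe_set(values) for name, values in probe_sets.items()
}
print(expression)
```

The unusually bright fifth probe in the first probe set has no effect on its summary value, which illustrates why a robust summary is often preferred over a plain mean.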
Instructions can be found in the manual
Some terms used in the manual
Preprocessing Example - Agilent Platform (Cy3 and Cy5)