Source: …duan/class/bioinformatics/Notes/6_Microarray.pdf
DNA microarray and array data analysis

What is DNA Microarray
DNA microarray is a new technology to measure the level of the mRNA gene products of a living cell.
A microarray chip is a rectangular chip on which is imposed a grid of DNA spots. These spots form a two-dimensional array.
Each spot in the array contains millions of copies of some DNA strand, bonded to the chip.
Chips are made tiny so that a small amount of RNA is needed from experimental cells.

DNA Microarray
Many applications in both basic and clinical research
  -- determining the role a gene plays in a pathway, disease, diagnostics and pharmacology, …
There are three main platforms for performing microarray analyses.
  -- cDNA arrays (generic, multiple manufacturers)
  -- Oligonucleotide arrays (genechips) (Affymetrix)
  -- cDNA membranes (radioactive detection)

cDNA Microarray
Spot cloned cDNAs onto a glass/nylon microscope slide
  -- usually PCR amplified segments of plasmids
Complementary hybridization
  -- CTAGCAGG actual gene
  -- GATCGTCC cDNA (Reverse transcriptase)
  -- CUAGCAGG mRNA
Label 2 mRNA samples with 2 different colors of fluorescent dye
  -- control vs. experimental
Mix two labeled mRNAs and hybridize to the chip
Make two scans, one for each color
Combine the images to calculate ratios of amounts of each mRNA that bind to each spot
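
The base-pairing example on the cDNA Microarray slide can be checked with a few lines of Python. This is only a sketch: the function names are hypothetical, and the strands are paired position by position exactly as the slide writes them (no reversal). When both strands are written 5' to 3', the complementary strand would additionally be reversed.

```python
# Illustrative helpers (hypothetical names) for the base-pairing example on the slide.
DNA_COMPLEMENT = {"A": "T", "T": "A", "C": "G", "G": "C"}


def complement_dna(strand: str) -> str:
    """Return the base-by-base complement of a DNA strand (no reversal)."""
    return "".join(DNA_COMPLEMENT[base] for base in strand)


def transcribe_coding_strand(strand: str) -> str:
    """Write the mRNA matching a coding strand: same letters, with T replaced by U."""
    return strand.replace("T", "U")


gene = "CTAGCAGG"                       # "actual gene" from the slide
cdna = complement_dna(gene)             # strand that pairs with the gene sequence
mrna = transcribe_coding_strand(gene)   # message carrying the gene sequence

print(cdna)  # GATCGTCC
print(mrna)  # CUAGCAGG
```

The cDNA and the mRNA are complementary to each other, which is why a labeled sample binds to the matching spot on the chip.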
[Figure: Spotted Microarray Process, with CTRL and TEST samples]
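
The last step of the spotted-array process, combining the two scans into one ratio per spot, might look like the sketch below. The slides only say "ratios". Taking log2 of the ratio, and clamping very low intensities to a floor, are assumptions added here for illustration, and the intensities are made up.

```python
import math


def log2_ratios(test_intensity, control_intensity, floor=1.0):
    """Per-spot log2(test / control); intensities below `floor` are clamped to it."""
    ratios = []
    for test, ctrl in zip(test_intensity, control_intensity):
        test = max(test, floor)   # avoid log of zero or negative values
        ctrl = max(ctrl, floor)
        ratios.append(math.log2(test / ctrl))
    return ratios


# Made-up intensities for four spots (not real data).
test_scan = [800.0, 200.0, 1500.0, 50.0]
control_scan = [400.0, 200.0, 375.0, 200.0]

for spot, value in enumerate(log2_ratios(test_scan, control_scan)):
    print(f"spot {spot}: log2 ratio = {value:+.2f}")
```

On the log2 scale a doubling and a halving are symmetric (+1 and -1), which is the usual reason for preferring it to the raw ratio.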

cDNA Array Experiment Movie

Affymetrix
Uses 25 base oligos synthesized in place on a chip (20 pairs of oligos for each gene)
cRNA labeled and scanned in a single “color”
  -- one sample per chip
Can have as many as 760,000 probes on a chip
Arrays get smaller every year (more genes)
Chips are expensive (Human Genome U133A Plus 2.0 ~$500)
Proprietary system: “black box” software, can only use their chips

GeneChip® Human Gene 1.0 ST Array
Sample is added to a hybridization cocktail along with spiked control transcripts and is loaded onto an array chip.
[Figure labels: Hybridization cocktail; Affymetrix Array Chip]
Chip is placed in a hybridization oven and incubated overnight.
Chips are placed in the fluidics station, where they are washed, stained and washed again (2.5 hours).
After staining, the signal intensities are measured with a laser scanner (15 min).
Data is acquired by the computer as soon as the scan has been completed.
The chip image data file (or “.dat” file) is the first part of data acquisition and appears on the computer screen upon completion of the laser scan.
Here, we zoom in to see an individual probe set that has been highlighted.
[Figure label: Probe set]
The first image is “sample1.dat”. Note the pixel-to-pixel variation within a probe cell.
A “*.cel” file is automatically generated when the “*.dat” image first appears on the screen. Note that this derivative file has homogeneous signal intensity within its probe cells.
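
The step from a “*.dat” image to a “*.cel” file can be pictured as replacing all pixels of a probe cell with one summary value. The slides do not say which statistic the vendor software uses. The function name and the 75th-percentile default below are assumptions for illustration only, and the pixel values are made up.

```python
import math


def summarize_cell(pixels, percentile=75):
    """Collapse the pixel intensities of one probe cell into a single value.

    Uses a nearest-rank percentile: the smallest pixel value such that at
    least `percentile` percent of the pixels are at or below it.
    """
    ordered = sorted(pixels)
    rank = max(1, math.ceil(percentile / 100 * len(ordered)))
    return ordered[rank - 1]


# Made-up pixel intensities for a single probe cell (not real scanner output).
cell_pixels = [100, 104, 98, 110, 102, 96, 105, 101]

cell_value = summarize_cell(cell_pixels)
print(cell_value)  # 104
```

Every pixel in the cell is then reported with this one value, which is why the derived image looks uniform inside each probe cell.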

Unsupervised learning
  -- Clustering and pattern detection
Gene regulatory region prediction based on co-regulated genes
Linkage between gene expression data and gene sequence/function databases
…

Data preprocessing
Data preparation or pre-processing
  -- Normalization
  -- Feature selection
       -- Based on the quality of the signal intensity
       -- Based on the fold change
       -- T-test
       -- …
…

Normalization
Need to scale the red sample so that the overall intensities for each chip are equivalent
[Figure: two plots, with axes labeled control, Sample 1 and Sample 2]
What can we tell from the two plots?

Normalization
To ensure the data are comparable, normalization attempts to correct the following variables:
  -- Number of cells in the sample
  -- Total RNA isolation efficiency
  -- Signal measurement sensitivity
  -- …
Can use simple/complicated math
  -- Normalization by global scaling (bring each image to the same average brightness)
  -- Normalization by sectors
  -- Normalization to housekeeping genes
  -- …
Active research area
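
Normalization by global scaling, the simplest method listed above, can be sketched as follows. The function name and the choice of the arithmetic mean as "average brightness" are assumptions, and the intensities are made up.

```python
def global_scale(intensities, target_mean):
    """Rescale one chip so that its mean intensity equals `target_mean`."""
    current_mean = sum(intensities) / len(intensities)
    factor = target_mean / current_mean
    return [value * factor for value in intensities]


# Made-up intensities for two chips; chip_b was scanned "brighter" overall.
chip_a = [100.0, 200.0, 300.0]
chip_b = [300.0, 500.0, 1000.0]

# Bring chip_b to the same average brightness as chip_a.
target = sum(chip_a) / len(chip_a)
scaled_b = global_scale(chip_b, target)

print([round(value, 1) for value in scaled_b])
print(round(sum(scaled_b) / len(scaled_b), 1))  # same mean as chip_a
```

A single scaling factor per chip corrects overall brightness only. It cannot remove intensity-dependent or position-dependent effects, which is where methods such as normalization by sectors come in.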
[Figure: scatter plot, SP22 (normal) vs SP23 (normal); both axes logarithmic, 1 to 100000]
[Figure: scatter plot, SP33 (normal) vs SP34 (failure); both axes logarithmic, 1 to 100000]

Basic Data Analysis
Biological markers
Fold change (relative change in intensity for each gene)
[Figure labels: Mn-SOD; Annexin IV; Aminoacylase 1]
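
A fold-change filter for candidate markers could be sketched as below. The gene names, intensities and the two-fold threshold are all hypothetical. The slides do not give a cutoff.

```python
def flag_by_fold_change(control, test, threshold=2.0):
    """Return {gene: test/control ratio} for genes changed at least `threshold`-fold.

    A gene is flagged when its ratio is >= threshold (up) or <= 1/threshold (down).
    """
    flagged = {}
    for gene, ctrl_value in control.items():
        ratio = test[gene] / ctrl_value
        if ratio >= threshold or ratio <= 1.0 / threshold:
            flagged[gene] = ratio
    return flagged


# Made-up normalized intensities for three hypothetical genes.
control = {"gene_A": 100.0, "gene_B": 400.0, "gene_C": 250.0}
test = {"gene_A": 450.0, "gene_B": 100.0, "gene_C": 300.0}

for gene, ratio in flag_by_fold_change(control, test).items():
    print(f"{gene}: fold change = {ratio:.2f}")
```

Fold change ignores how variable a gene is across replicates, so it is usually combined with a statistical test rather than used alone.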

Microarrays: An Example
Leukemia: Acute Lymphoblastic (ALL) vs Acute Myeloid (AML), Golub et al, Science, v.286, 1999
  -- 72 examples (38 train, 34 test), about 7,000 probes
  -- well-studied (CAMDA-2000), good test example

Null hypothesis is a hypothesis set up to be nullified in order to support an alternative hypothesis.
Hypothesis testing is to test the viability of the null hypothesis for a set of experimental data
Example:
  -- Test whether the time to respond to a tone is affected by the consumption of alcohol
  -- Hypothesis: µ1 - µ2 = 0
       µ1 is the mean time to respond after consuming alcohol
       µ2 is the mean time to respond otherwise
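
One way to test the null hypothesis µ1 - µ2 = 0 from two samples is a two-sample t statistic. The slides do not say which test is meant for this example. Welch's form (which allows unequal variances) is an assumption here, and the reaction times are invented for illustration.

```python
import math
import statistics


def welch_t(sample_1, sample_2):
    """Welch's t statistic for H0: mu1 - mu2 = 0 (unequal variances allowed)."""
    mean_1 = statistics.mean(sample_1)
    mean_2 = statistics.mean(sample_2)
    var_1 = statistics.variance(sample_1)   # sample variance (n - 1 denominator)
    var_2 = statistics.variance(sample_2)
    standard_error = math.sqrt(var_1 / len(sample_1) + var_2 / len(sample_2))
    return (mean_1 - mean_2) / standard_error


# Made-up reaction times in seconds (illustrative only, not real measurements).
after_alcohol = [0.52, 0.61, 0.58, 0.66, 0.55]
otherwise = [0.45, 0.50, 0.48, 0.47, 0.52]

t_statistic = welch_t(after_alcohol, otherwise)
print(f"t = {t_statistic:.2f}")
```

A large absolute value of t means the observed difference in means is large relative to its standard error, which counts as evidence against the null hypothesis.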

Z-test
Theorem: If the $x_i$, $i = 1, \dots, n$, are independent and each has a normal distribution with mean $\mu$ and variance $\sigma^2$, then $U = \sum a_i x_i$ has a normal distribution with mean $E(U) = \mu \sum a_i$ and variance $D(U) = \sigma^2 \sum a_i^2$.
In particular, $\sum x_i / n \sim N(\mu, \sigma^2/n)$.
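
The result for the sample mean follows from the theorem by choosing equal weights. As a short worked step, writing $D(\cdot)$ for the variance and assuming the $x_i$ are independent, set $a_i = 1/n$ for every $i$:

```latex
\begin{align*}
\bar{x} &= \sum_{i=1}^{n} \frac{1}{n}\, x_i, \\
E(\bar{x}) &= \mu \sum_{i=1}^{n} \frac{1}{n} = \mu, \\
D(\bar{x}) &= \sigma^2 \sum_{i=1}^{n} \frac{1}{n^2} = \frac{\sigma^2}{n},
\end{align*}
```

so $\bar{x} \sim N(\mu, \sigma^2/n)$ and the standard deviation of the mean is $\sigma/\sqrt{n}$.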
Z test: H: $\mu = \mu_0$ ($\mu_0$ and $\sigma_0$ are known, assume $\sigma = \sigma_0$)
What would one conclude about the null hypothesis that a sample of N = 46 with a mean of 104 could reasonably have been drawn from a population with the parameters of $\mu$ = 100 and $\sigma$ = 8? Use