Microarray and functional genomics
Wenjing Tao
University of Missouri
Mar 29, 2015
Microarray: a high-throughput, whole-genome approach
A microarray is a tool for analyzing gene expression. It consists of a small membrane or glass slide containing samples of many genes arranged in a regular pattern.
48 grids, with 31k probes in total
Each grid contains 650 probes (48 × 650 = 31,200)
Microarray terminology
• Feature - an array element
• Probe - a feature corresponding to a defined sequence (immobilized on a solid surface in an ordered array)
• Target - a pool of nucleic acids of unknown sequence
Microarrays provide opportunities to:
- Find the genes and assign them functions
- Predict protein structures and functions
- Reconstruct metabolic, signaling, and other pathways
- Reconstruct informational networks
- Link genotype to phenotype
- Use genotype/phenotype to predict relevant outcomes
- Make cross-species comparisons
Kinds of array features
Synthetic oligonucleotides:
Affymetrix GeneChip
Long oligo array
PCR products from:
Cloned cDNAs
Genomic DNA
cDNA & oligonucleotide arrays
100-300 µm spots (cDNA arrays); 20-25-mer probes (oligonucleotide arrays)
Schulze and Downward, 2001 Nat Cell Biol 3, 190
cDNA and long oligo array experiment
[Figure: experimental workflow. Probes: PCR products from ORFs or ESTs, or designed long oligos, transferred from microtiter plates onto microarray slides. Targets: RNA from Target 1 and Target 2, reverse transcription (RT), labeling with fluorescent dye, hybridization, scan.]
Affymetrix GeneChip
Fluorescent microarray images are composites of two false-color, laser-scanned images:
- GREEN: Reference DNA hybridized to the target DNA.
- RED: Test DNA hybridized to the target DNA.
- YELLOW: Test and Reference DNA hybridized equally to the target DNA.
- BLACK: neither the Reference nor the Test DNA hybridized to the target DNA.
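The color key above can be sketched as a small classifier over the two channel intensities. This is only an illustration: the function name, the background cutoff, and the fold threshold are assumptions made for the example, not values from the slides or from any scanner software.

```python
def spot_color(test_signal, reference_signal, background=200.0, fold=2.0):
    """Return the false color a spot would show in the merged image.

    test_signal      : red-channel intensity (Test DNA)
    reference_signal : green-channel intensity (Reference DNA)
    background, fold : illustrative thresholds (assumptions)
    """
    if test_signal < background and reference_signal < background:
        return "black"    # neither sample hybridized above background
    if test_signal >= fold * reference_signal:
        return "red"      # Test DNA dominates
    if reference_signal >= fold * test_signal:
        return "green"    # Reference DNA dominates
    return "yellow"       # both hybridized to a similar extent


for red, green in [(1000, 100), (100, 1000), (800, 900), (50, 60)]:
    print(red, green, spot_color(red, green))
```

In a real image the colors blend continuously; hard thresholds are used here only to make the four cases explicit.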
Image file post-processing
• Single slide normalization – GenePix Pro 4.1
• Slide-slide and dye-swap comparison – TMEV & MIDAS
• Cross-slide quality evaluation – GeneSpring + R script for CV filter
• Mixed linear model analysis of variance to identify significantly differentially expressed genes – R or SAS program
• Data analysis in the post-genomic era (gene annotation, ontology and pathway analysis) – KOG, COG, KEGG, TAIR, Onto-Tools, GenMAPP…
• Data validation – qPCR or Northern blot
Whole genome approaches to biological questions
• Gene expression
• Gene variation
• Gene function
NSF-DBI-0211842, PI: Henry Nguyen
Functional Genomics of Root Growth and Root Signaling under drought
http://rootgenomics.missouri.edu/prgc/research.html
Drought-stress inducible genes and their possible functions in stress tolerance and response.
Yamaguchi-Shinozaki et al. JIRCAS Working Report, 2002
Dr. Henry Nguyen’s lab, Plant Sciences, University of Missouri
Objectives
Characterize the transcript profiles of the apical and basal regions of the root growth zone under water-deficit conditions using maize long oligonucleotide arrays:
- To identify genes contributing to root growth maintenance under water-deficit conditions
- To determine genes responsible for the progressive inhibition of root elongation under water-deficit conditions
- To compare differential gene expression in the root region showing progressive inhibition of elongation under water stress with that in the region of normal growth deceleration in well-watered roots
Pair-wise comparison of maize root segments using oligo array
[Figure: numbered root segments of WW48 and WS48 roots used in the pair-wise comparisons.]
Characterization of the maize long oligo array
• The maize oligo array, printed at the University of Arizona, contains 56,311 70-mer oligonucleotide probes, representing >30,000 identifiable unique maize genes. 16,915 oligos do not have any annotation.
• The 70-mer oligonucleotides were developed in conjunction with Operon/Qiagen, based on the TIGR Maize Database
Slide features and dye-swap experiment
Dye swap: WS/WW = Cy5/Cy3 on one slide, WS/WW = Cy3/Cy5 on the other
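A dye swap lets the dye bias cancel when the two slides are combined. The sketch below shows the arithmetic for a single gene, assuming the forward slide labels WS with Cy5 and the reverse slide labels WS with Cy3; the function and argument names are hypothetical.

```python
import math


def dye_swap_log_ratio(fwd_cy5, fwd_cy3, rev_cy5, rev_cy3):
    """Return a dye-balanced log2(WS/WW) for one gene.

    Forward slide: WS labeled with Cy5, WW with Cy3.
    Reverse slide: WS labeled with Cy3, WW with Cy5.
    """
    m_forward = math.log2(fwd_cy5 / fwd_cy3)  # log2(WS/WW) + dye bias
    m_reverse = math.log2(rev_cy5 / rev_cy3)  # log2(WW/WS) + dye bias
    # Subtracting cancels the dye bias; halving averages the two slides.
    return (m_forward - m_reverse) / 2.0


# Made-up example: true WS/WW = 4, and Cy5 reads twice as bright as Cy3.
print(dye_swap_log_ratio(fwd_cy5=800, fwd_cy3=100, rev_cy5=200, rev_cy3=400))
```

With these invented intensities the forward slide alone would suggest an 8-fold change, while the combined estimate recovers log2(4) = 2.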
Two-color microarray data features
1. Channel A intensity vs. channel B intensity
2. Log channel A intensity vs. log channel B intensity
3. R-I (ratio-intensity) plot
4. Z-score histogram
5. Box plot
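The quantities behind the R-I plot and the z-score histogram can be computed as follows. This is a sketch using one common definition of the R-I plot (log2 ratio against log10 of the intensity product); the orientation of the ratio (A over B) and the function names are assumptions.

```python
import math
from statistics import mean, stdev


def ri_values(channel_a, channel_b):
    """Return (log10(A*B), log2(A/B)) pairs: the x and y values of an R-I plot."""
    return [(math.log10(a * b), math.log2(a / b))
            for a, b in zip(channel_a, channel_b)]


def z_scores(log_ratios):
    """Standardize log ratios: subtract the mean, divide by the sample SD."""
    mu = mean(log_ratios)
    sd = stdev(log_ratios)
    return [(m - mu) / sd for m in log_ratios]


# Made-up intensities for three spots.
points = ri_values([800, 500, 300], [200, 500, 600])
print(points)
print(z_scores([y for _, y in points]))
```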
Flip-dye consistency checking
- processed data count: 27852 (only slides A)
- pre-filtering corr. coeff: 0.11360581
- post-filtering data count: 26747
- confidence factor: 0.9647781
- dispersion factor: 0.035401408
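One way to picture a flip-dye consistency filter: on a dye-swapped slide pair the two Cy5/Cy3 log ratios of a gene should be equal and opposite, so their sum should sit near zero. The sketch below keeps genes whose sum lies within a chosen number of standard deviations of the mean. It is an illustration under that assumption, not necessarily the rule MIDAS applies, and the function name and cutoff are hypothetical.

```python
import math
from statistics import mean, stdev


def flip_dye_filter(ratios_a, ratios_b, sd_cut=2.0):
    """Keep genes whose Cy5/Cy3 ratios on the two swapped slides agree.

    ratios_a, ratios_b : dict mapping gene -> Cy5/Cy3 ratio on each slide
    sd_cut             : allowed deviation in SD units (an assumption)
    """
    genes = sorted(set(ratios_a) & set(ratios_b))
    # For a consistent gene, log2(ratio_a) + log2(ratio_b) is close to 0.
    sums = {g: math.log2(ratios_a[g]) + math.log2(ratios_b[g]) for g in genes}
    values = list(sums.values())
    mu, sd = mean(values), stdev(values)
    return [g for g in genes if abs(sums[g] - mu) <= sd_cut * sd]


# Made-up data: nine consistent genes and one inconsistent gene.
slide_a = {f"g{i}": 2.0 for i in range(9)}
slide_b = {f"g{i}": 0.5 for i in range(9)}
slide_a["bad"], slide_b["bad"] = 4.0, 4.0
print(flip_dye_filter(slide_a, slide_b))
```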
Summary of the evaluation of replicates (technical & biological)
• ~50,000 of the 56,311 probes have intensity >200 (in at least one channel).
• Confidence of dye-swap is > 96%
• The 99.9% confidence limit was estimated by testing the coefficient of variation (CV) for replicates
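The CV used for replicate evaluation is the standard deviation divided by the mean. A minimal sketch of a CV filter follows; the cutoff value and the function names are illustrative assumptions, not the settings used in the study.

```python
from statistics import mean, stdev


def cv(values):
    """Coefficient of variation: sample standard deviation / mean."""
    return stdev(values) / mean(values)


def filter_by_cv(replicates, max_cv=0.5):
    """Keep genes whose replicate intensities have CV <= max_cv.

    replicates : dict mapping gene -> list of replicate intensities
    max_cv     : hypothetical cutoff chosen for this example
    """
    return {gene: vals for gene, vals in replicates.items()
            if cv(vals) <= max_cv}


# Made-up replicate intensities.
data = {"stable": [100, 110, 90], "noisy": [10, 200, 50]}
print(filter_by_cv(data))
```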
Mixed linear model analysis of two-color microarray data: producing lists of differentially expressed genes with low false discovery rates
To obtain accurate and precise estimates of gene expression differences between treatment and control, gene effects are analyzed with all blocking factors considered simultaneously, using a linear mixed ANOVA model.
The analysis has two steps:
First, a global mixed model is applied:
Log2(signal values) = treat + dye + treat*dye + tech_reps_effect + array_effect (within treat*dye and tech_reps_effect)
Second, the residuals from the first model are analyzed with a model fitted to each gene individually:
Residuals = treat + dye + tech_reps_effect + array_effect (within tech_reps_effect)
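The two-step logic can be illustrated with a deliberately simplified, fixed-effects stand-in. It is not a mixed-model fit as done in R or SAS. It assumes a balanced dye-swap design in which every global term (treat, dye, treat*dye, technical replicate, array) is constant within one array-by-dye channel, so subtracting the channel mean removes all of them at once. The per-gene treatment effect is then the difference between mean residuals. All names and data below are hypothetical.

```python
from collections import defaultdict
from statistics import mean


def stage1_residuals(observations):
    """Step 1 (simplified): remove global effects by centering each channel.

    observations: list of dicts with keys
        gene, array, dye, treat, log2_signal
    """
    by_channel = defaultdict(list)
    for obs in observations:
        by_channel[(obs["array"], obs["dye"])].append(obs["log2_signal"])
    channel_mean = {key: mean(vals) for key, vals in by_channel.items()}
    return [dict(obs, residual=obs["log2_signal"]
                 - channel_mean[(obs["array"], obs["dye"])])
            for obs in observations]


def stage2_treatment_effect(residual_obs, treated="WS", control="WW"):
    """Step 2 (simplified): per-gene treated-minus-control mean residual."""
    per_gene = defaultdict(lambda: defaultdict(list))
    for obs in residual_obs:
        per_gene[obs["gene"]][obs["treat"]].append(obs["residual"])
    return {gene: mean(groups[treated]) - mean(groups[control])
            for gene, groups in per_gene.items()}
```

Because step 1 centers each channel, the estimates are relative to the average gene, which is reasonable only when most genes are unchanged. A full analysis would fit the random terms (technical replicates, arrays) with mixed-model software and attach significance tests to each gene.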
Gene function categorization of significantly differentially expressed genes (KOG analysis)
- NOT ASSIGNED: 69%
- METABOLISM: 11%
- CELLULAR PROCESSES AND SIGNALING: 10%
- POORLY CHARACTERIZED: 6%
- INFORMATION STORAGE AND PROCESSING: 4%
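Percentages like these come from tallying the KOG group assigned to each gene. A minimal sketch, assuming a hypothetical mapping from gene ID to KOG group in which genes without a KOG hit are stored as None:

```python
from collections import Counter


def category_percentages(assignments):
    """Return the rounded percentage of genes in each KOG group.

    assignments : dict mapping gene -> group name, or None when unassigned
    """
    counts = Counter(group if group is not None else "NOT ASSIGNED"
                     for group in assignments.values())
    total = sum(counts.values())
    return {group: round(100.0 * n / total) for group, n in counts.items()}


# Made-up assignments for ten genes.
example = {f"gene{i}": None for i in range(7)}
example.update({"gene7": "METABOLISM", "gene8": "METABOLISM",
                "gene9": "POORLY CHARACTERIZED"})
print(category_percentages(example))
```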
Information storage and processing
- Transcription: 34%
- Translation, ribosomal structure and biogenesis: 23%
- RNA processing and modification: 16%
- Replication, recombination and repair: 14%
- Chromatin structure and dynamics: 13%
Metabolism
- Carbohydrate transport and metabolism: 18%
- Inorganic ion transport and metabolism: 16%
- Energy production and conversion: 15%
- Amino acid transport and metabolism: 14%
- Lipid transport and metabolism: 14%
- Secondary metabolites biosynthesis, transport and catabolism: 14%
- Nucleotide transport and metabolism: 5%
- Coenzyme transport and metabolism: 4%
Cellular processes and signaling
- Signal transduction mechanisms: 31%
- Posttranslational modification, protein turnover, chaperones: 30%
- Intracellular trafficking, secretion, and vesicular transport: 11%
- Cell wall/membrane/envelope biogenesis: 9%
- Defense mechanisms: 8%
- Cytoskeleton: 6%
- Cell cycle control, cell division, chromosome partitioning: 3%
- Extracellular structures: 1%
- Nuclear structure: 1%
- Cell motility: 0%
Summary
• The microarray is a high-throughput tool for identifying novel genes
• We have identified about 1,900 genes related to drought response and root growth maintenance
• Combined with functional analysis, these results should help identify pathways and genes related to drought stress tolerance
• This knowledge will lead to novel approaches for improving drought tolerance in maize.