Introduction to microarray analysis and tools
Module B: Survey of Microarray Analysis Tools
Commercial Tools
Agnes Viale, Ph.D.
Genomics Core Lab, MSKCC
Microarray assay life cycle
Biological Question
Sample Preparation
Microarray Hybridization
Microarray Detection
Data Analysis & Modeling
M. Schena and R. Davis, Microarray Biochip Technology
Plan
I- GeneChip Operating System (GCOS)
II- GeneSpring
III- Submission to public repository
IV- NetAffx
Affymetrix GeneChip: Definitions
[Figure: probe set layout, 5' to 3', 600 bp; PM (perfect match) and MM (mismatch) probes]
Software name
– Microarray Suite 4.0 (MAS 4.0) = Empirical algorithm
– Microarray Suite 5.0 (MAS 5.0) = Statistical algorithm
– GeneChip Operating System (GCOS)
Data comparability
• Affy arrays
1995, 1998, 2001, 2003 (2.0 platform)
Changes between successive array generations, in order:
– Redesign the oligos; change probe set names
– Keep same name; change the manufacturing process
– Redesign the oligos; change probe set names
• Software
GeneChip Software, MAS 4.0, MAS 5.0, GCOS
Changes between successive versions, in order:
– New algorithm
– New algorithm
– Same algorithm; different data management
Signal intensity
1- GeneChip software: uses all probe pairs
\[ \mathrm{AvDiff} = \frac{1}{|A|} \sum_{j \in A} (PM_j - MM_j) \]
A: probe pairs selected by the software
2- MAS 4.0: excludes outlier pairs, i.e. pairs whose PM-MM value is more than 3 SD from the mean PM-MM value
- not a robust average
- negative Average Difference if MM > PM
3- MAS 5.0: weighted mean of avg. (PM-MM)
- Probe intensities are preprocessed for global background.
- PM-IM intensities are log transformed.
- Robust mean of probe set values is taken using the Tukey Biweight.
\[ \mathrm{signal} = \mathrm{Tukey\ Biweight}\{\log(PM_j - MM_j^{*})\} \]
MM*: idealized mismatch value (IM)
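To make the difference between the plain average and the robust mean concrete, here is a minimal Python sketch of both summaries. It is an illustration under stated assumptions, not the Affymetrix implementation: the function names are hypothetical, the tuning constants `c=5.0` and `eps=1e-4` are assumed values, the intensities are made up, and the background correction and the computation of the idealized mismatch are left out.

```python
import math
from statistics import median


def avg_diff(pm, mm):
    """Average Difference: mean of PM_j - MM_j over the probe pairs given (the set A)."""
    diffs = [p - m for p, m in zip(pm, mm)]
    return sum(diffs) / len(diffs)


def tukey_biweight(values, c=5.0, eps=1e-4):
    """One-step Tukey biweight mean; c and eps are assumed tuning constants."""
    med = median(values)
    mad = median(abs(v - med) for v in values)  # median absolute deviation
    num = den = 0.0
    for v in values:
        u = (v - med) / (c * mad + eps)
        w = (1.0 - u * u) ** 2 if abs(u) < 1.0 else 0.0  # far values get weight 0
        num += w * v
        den += w
    return num / den


def mas5_style_signal(pm, im):
    """Robust mean of log2(PM_j - IM_j), returned on the linear scale.

    Assumes every PM_j - IM_j is positive.
    """
    logs = [math.log2(p - i) for p, i in zip(pm, im)]
    return 2.0 ** tukey_biweight(logs)


pm = [120.0, 130.0, 125.0, 900.0]  # made-up intensities; the last pair is an outlier
mm = [100.0, 110.0, 105.0, 100.0]
print(avg_diff(pm, mm))            # pulled upward by the outlier pair
print(mas5_style_signal(pm, mm))   # stays close to the typical pair
```

A single outlier pair dominates the plain average, while the biweight gives it zero weight. That is the sense in which the Average Difference is "not a robust average".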
MAS 4.0 / MAS 5.0
[Figure: MAS 5.0 and MAS 4.0 output; callouts: Detection p-value, Change p-value]
MAS 5.0 / GCOS
Feature                                                     GCOS Server   GCOS Client   MAS
Centralized Data Sharing                                    +             -             -
User Access, Roles, Security                                +             -             -
Barcode Support, Automation                                 (+++ over GCOS Server and Client)   -
MIAME Standard Template                                     +             +             -
Publishing to AADM Database                                 +             +             -
Manage & Associate Projects, Experiments, Samples & Data    +             +             -
Gene Expression Data Analysis (Statistical Algorithm, CHP)  +             +             +
Instrument Control / Data Acquisition                       -             +             +
Raw files and comparison files
[Diagram: file types .EXP, .DAT, .CEL, .CHP, .CHP; raw CHP file A, raw CHP file B]
GCOS
Software status window
Data window
Files window
.Exp file
.DAT file
.DAT = scanned image of the array
Using the .DAT file
1- To identify defective arrays
2- To identify image problems
.CEL file
.CEL = computerized version of the .DAT file
The .CEL file is used to generate the .CHP file
Raw .CHP file
Comparison .CHP file
.RPT file
• Data set QC
GeneChip built-in control 1: % present genes
GeneChip built-in control 2: 3'/5' ratio for "housekeeping" genes
Right click a .CHP file
=> Report (.RPT) file
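The two built-in controls can also be recomputed outside GCOS once the detection calls and the 3' and 5' probe set signals are exported. The following Python sketch shows one way to do it. The function names, the example values, and the cutoffs `min_present` and `max_ratio` are hypothetical and are not taken from the report file; choose cutoffs appropriate to your array type and tissue.

```python
def percent_present(calls):
    """Share of probe sets called Present ('P'), in percent."""
    return 100.0 * sum(1 for c in calls if c == "P") / len(calls)


def ratio_3_to_5(signal_3prime, signal_5prime):
    """3'/5' signal ratio for one housekeeping gene."""
    return signal_3prime / signal_5prime


def qc_report(calls, signal_3prime, signal_5prime, min_present=30.0, max_ratio=3.0):
    """Flag an array using hypothetical cutoffs (not values from the slides)."""
    present = percent_present(calls)
    ratio = ratio_3_to_5(signal_3prime, signal_5prime)
    return {
        "percent_present": present,
        "ratio_3_to_5": ratio,
        "pass": present >= min_present and ratio <= max_ratio,
    }


# Made-up detection calls and housekeeping-gene signals for one array
report = qc_report(["P", "P", "A", "M", "P", "A"], signal_3prime=1500.0, signal_5prime=1000.0)
print(report)
```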
Access to probe cell information
Generate a comparison file
- Drag and drop the experimental CEL file
- Choose the baseline file
- Enter the output file name
Scatter plot
[Figure: scatter plot with 2X up and 2X down lines and a "background box"]
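The 2X up and 2X down lines on the scatter plot correspond to a simple ratio test between the experimental and baseline signals. A minimal Python sketch of that test follows. The function name and the probe set values are hypothetical, and the sketch does not handle the low-intensity region marked by the background box, where ratios are unreliable.

```python
def fold_change_call(baseline, experiment, fold=2.0):
    """Classify one probe set relative to the fold-change lines of a scatter plot.

    Assumes both signals are positive.
    """
    ratio = experiment / baseline
    if ratio >= fold:
        return "up"
    if ratio <= 1.0 / fold:
        return "down"
    return "unchanged"


# (baseline signal, experiment signal) per probe set; made-up values
signals = {
    "probe_a": (100.0, 250.0),
    "probe_b": (100.0, 50.0),
    "probe_c": (100.0, 150.0),
}
calls = {name: fold_change_call(b, e) for name, (b, e) in signals.items()}
print(calls)
```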
Next steps
Data export in Excel
Data export to third party software
• Applied Maths, GenExplore™
• BioDiscovery, GeneSight
• GeneData AG, Expressionist
• LION Bioscience AG
• Molecular Applications Group, Stingray™
• MolecularWare, Inc., ArrayAnalyzerDB
• Partek, Inc., Partek Pro 2000
• Rosetta Inpharmatics, Resolver™
• Scanalytics, Inc., MicroArray Suite
• Silicon Genetics, GeneSpring™
• Spotfire, Inc.
• Media Cybernetics, Array-Pro®
• Microarray software developed by Stanford University
• TIGR (The Institute for Genomic Research) offers software tools (free for academic institutions) for array analysis
• OmniViz, Inc., OmniViz Pro
• Xpogen Inc., PathlinX
• …
Plan
I- Introduction and potential applications of array platform
II- Existing platforms
III- Experimental design
IV- Steps involved in data analysis
• Data set QC
• Normalization
• Feature (gene) filtering
• Replicate analysis
• Clustering
• Statistical tests
• Pathway
GeneSpring interface
Choice of “genome”
Data import
TXT files from a GeneChip array, from a spotted array, or from any other type of array, as long as you have a "signal" associated with an identifier (gene, transcript, protein, other)
Samples information
• Sample-centric system (not experiment centric)
• Sample attributes format is MIAME compliant
Minimum information about a microarray experiment (MIAME): toward standards for microarray data. Nat Genet. 2001 Dec;29(4):365-71.
MIAME goal: to specify the minimum information that must be reported about a microarray experiment in order to ensure its interpretability, as well as potential verification of the results
• MIAME format required for microarray data publication
1. Experimental design: the set of the hybridisation experiments as a whole
2. Array design: each array used and each element (spot) on the array
3. Samples: samples used, the extract preparation and labeling
4. Hybridizations: procedures and parameters
5. Measurements: images, quantitation, specifications
6. Controls: types, values, specifications
[Figure: 6 parts in MIAME; labels: Experiment, Array, Sample, Hybridisation, Normalisation, Analysis]
Experiment parameters
Parameters can be used for gene filtering with a statistical test
Gene filtering
Gene lists
Union/Intersection of gene lists
Venn Diagram
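Outside GeneSpring, the union and intersection of gene lists map directly onto set operations. A short Python sketch with made-up gene lists follows; the list contents and the `venn_regions` helper are hypothetical.

```python
def venn_regions(list_a, list_b):
    """Split two gene lists into the three regions of a two-set Venn diagram."""
    a, b = set(list_a), set(list_b)
    return {
        "only_a": a - b,
        "both": a & b,      # intersection
        "only_b": b - a,
        "union": a | b,
    }


# Made-up gene lists, e.g. the results of two different filters
filter_1 = ["TP53", "MYC", "EGFR", "KRAS"]
filter_2 = ["MYC", "EGFR", "BRCA1"]

regions = venn_regions(filter_1, filter_2)
for name, genes in regions.items():
    print(name, sorted(genes))
```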
Statistical analysis
Statistical analysis output
• Venn diagram
• Clustering
• Pathway analysis
• …
Clustering tools in GS
Hierarchical clustering
Experiments and samples
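As an illustration of what hierarchical clustering does with expression profiles, here is a small agglomerative sketch in plain Python. It is not GeneSpring's implementation. The choice of 1 minus the Pearson correlation as the distance, the choice of average linkage, the function names, and the profiles are all assumptions made for the example.

```python
from math import sqrt


def pearson_distance(x, y):
    """1 - Pearson correlation: 0 for identical shapes, 2 for opposite shapes."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    return 1.0 - cov / (sx * sy)


def hierarchical_clustering(profiles):
    """Average-linkage agglomerative clustering.

    profiles: dict mapping a name to a list of expression values.
    Returns the merges in order, as (cluster_1, cluster_2, distance) tuples.
    """
    clusters = [(name,) for name in profiles]
    merges = []

    def linkage(c1, c2):
        dists = [pearson_distance(profiles[a], profiles[b]) for a in c1 for b in c2]
        return sum(dists) / len(dists)

    while len(clusters) > 1:
        # Find the closest pair of clusters and merge it
        d, i, j = min(
            (linkage(clusters[i], clusters[j]), i, j)
            for i in range(len(clusters))
            for j in range(i + 1, len(clusters))
        )
        merges.append((clusters[i], clusters[j], d))
        merged = clusters[i] + clusters[j]
        clusters = [c for k, c in enumerate(clusters) if k not in (i, j)] + [merged]
    return merges


# Made-up profiles across four samples
profiles = {
    "g1": [1.0, 2.0, 3.0, 4.0],
    "g2": [2.0, 4.0, 6.0, 8.0],
    "g3": [4.0, 3.0, 2.0, 1.0],
    "g4": [8.0, 6.0, 4.0, 3.0],
}
for left, right, dist in hierarchical_clustering(profiles):
    print(left, right, round(dist, 3))
```

The same function clusters samples instead of genes if the dictionary maps sample names to their values across genes.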
Projection methods
• Principal component analysis (PCA)
• Multi-Dimensional Scaling (MDS)
• Not clustering methods, but can be used to determine or visualize cluster structure if present
Data submission to public repository
Do you submit your data to a MIAME-compliant public microarray database?
Response                          Response %   Response total
Always                            9.70%        6
Sometimes                         19.40%       12
Only if requested by publisher    38.70%       24
Never                             33.90%       21
Total respondents: 62

Which database are you submitting your data to?
Response                            Response %   Response total
Gene Expression Omnibus (GEO, NIH)  43.50%       27
ArrayExpress (EMBL)                 29%          18
Other (please specify)              33.90%       21
Total respondents: 62
Data submission to GEO
3-step process:
1- Submission of the platform (array type)
2- Submission of the samples (MIAME)
ID_REF            VALUE     DETECTION   Detection p-value
AFFX-MurIL2_at    13.4      A           0.953518
AFFX-MurIL10_at   17.3      A           0.843268
AFFX-MurIL4_at    18.1      A           0.749204
AFFX-MurFAS_at    15.8      A           0.425962
AFFX-BioB-5_at    730.6     P           0.001593
AFFX-BioB-M_at    1952.8    P           0.000044
AFFX-BioB-3_at    1267.6    P           0.000147
AFFX-BioC-5_at    3155.5    P           0.00007
AFFX-BioC-3_at    2296.3    P           0.000052
AFFX-BioDn-5_at   2987.8    P           0.000044
AFFX-BioDn-3_at   16968.8   P           0.00006
AFFX-CreX-5_at    31299.5   P           0.000044
AFFX-CreX-3_at    47550     P           0.000044
AFFX-BioB-5_st    117.8     A           0.165861
AFFX-BioB-M_st    155       A           0.108979
AFFX-BioB-3_st    179.6     A           0.327079
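Before submitting a sample table like the one above, it can be worth checking that each row parses and that the detection call agrees with its p-value. The Python sketch below does this for a few rows copied from the table. The parser, the function names, and the thresholds `alpha1=0.04` and `alpha2=0.06` are assumptions for illustration; use the thresholds configured in your own analysis software.

```python
SAMPLE_TABLE = """\
ID_REF VALUE DETECTION P_VALUE
AFFX-MurIL2_at 13.4 A 0.953518
AFFX-BioB-5_at 730.6 P 0.001593
AFFX-BioB-M_at 1952.8 P 0.000044
AFFX-BioB-5_st 117.8 A 0.165861
"""


def parse_rows(text):
    """Parse whitespace-separated rows (header skipped) into dictionaries."""
    rows = []
    for line in text.strip().splitlines()[1:]:
        id_ref, value, call, p_value = line.split()
        rows.append({
            "id_ref": id_ref,
            "value": float(value),
            "call": call,
            "p_value": float(p_value),
        })
    return rows


def call_from_p(p_value, alpha1=0.04, alpha2=0.06):
    """Present / Marginal / Absent call from a detection p-value (assumed thresholds)."""
    if p_value < alpha1:
        return "P"
    if p_value < alpha2:
        return "M"
    return "A"


rows = parse_rows(SAMPLE_TABLE)
mismatches = [r["id_ref"] for r in rows if call_from_p(r["p_value"]) != r["call"]]
print(len(rows), "rows parsed;", len(mismatches), "inconsistent calls")
```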
3- Submission of a "series" (experiment)
NetAffx
Def: comprehensive resource of functional annotations and public database
Accession number
Access to NetAffx
Free registration
Updated every quarter
Quick query input
• Keyword
• Gene symbol
• Public DB number
• Probe set name
Quick query output
GO: Gene Ontology
Pathway information
Pathway Diagram
Detailed information
• GeneChip Array Information
• Probe design information
• Genomic Alignment of target sequence
• Public domain and Genome references
• Functional annotations
• Sequence
GeneChip Array Information
Probe design/ Genomic Alignment
Link to UC Santa Cruz Genome Browser
Public domain
Functional Annotations
Sequence information
Batch query
Batch query output
Export data to Excel
Gene ontology browser
Gene ontology browser
Useful links and readings
• DNA Microarrays, Bowtell and Sambrook, CSHL
• A Biologist's Guide to Analysis of DNA Microarray Data, Steen Knudsen
• http://ihome.cuhk.edu.hk/%7Eb400559/array.html
• DNA Microarray (genome chip), Leming Shi: http://www.gene-chips.com/
Conclusion
Proteomics
Human Genetics (Genotyping)
Clinical Database
Genomics
Basic Research
Animal Models of Human Cancer
Pathway Analysis
GLOBAL UNDERSTANDING OF MOLECULAR BASIS OF CANCER