Analysis of GO annotation at cluster level
by H. Bjørn Nielsen
Slides from Agnieszka S. Juncker
The DNA Array Analysis Pipeline
• Question / Experimental Design
• Array design / Probe design
• Buy Chip/Array
• Sample Preparation / Hybridization
• Image analysis
• Normalization
• Expression Index Calculation
• Comparable Gene Expression Data
• Statistical Analysis / Fit to Model (time series)
• Advanced Data Analysis: Clustering, PCA, Classification, Promoter Analysis, Meta analysis, Survival analysis, Regulatory Network
GO annotations
Gene Ontology
Gene Ontology (GO) is a collection of controlled vocabularies describing the biology of a gene product in any organism
There are 3 independent sets of vocabularies, or ontologies:
• Molecular Function (MF), e.g. "DNA binding" and "catalytic activity"
• Cellular Component (CC), e.g. "organelle membrane" and "cytoskeleton"
• Biological Process (BP), e.g. "DNA replication" and "response to stimulus"
Gene Ontology structure
GO structure, example 2
KEGG pathways
• KEGG PATHWAYS: a collection of manually drawn pathway maps representing our knowledge of the molecular interaction and reaction networks, for a large selection of organisms
• 1. Metabolism: Carbohydrate, Energy, Lipid, Nucleotide, Amino acid, Other amino acid, Glycan, PK/NRP, Cofactor/vitamin, Secondary metabolite, Xenobiotics
• 2. Genetic Information Processing
• 3. Environmental Information Processing
• 4. Cellular Processes
• 5. Human Diseases
• 6. Drug Development
KEGG pathway example 1
KEGG pathway example 2
Cluster analysis and GO
Analysis example:
• Partitioning clustering of genes into e.g. 10 clusters based on expression profiles
• Assignment of GO terms to genes in clusters
• Looking for GO terms overrepresented in clusters
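The three steps above can be sketched in a few lines of Python. This is a minimal illustration only, not the course's own implementation: the cluster assignments, gene names (g1 to g12) and GO annotations are made-up toy data, the function names are hypothetical, and the overrepresentation test assumed here is a one-sided hypergeometric test with no multiple-testing correction.

```python
from collections import Counter
from math import comb

def hypergeom_tail(k, N, K, n):
    """P(X >= k) when n genes are drawn from N genes, K of which carry the term."""
    favourable = sum(comb(K, i) * comb(N - K, n - i)
                     for i in range(k, min(K, n) + 1))
    return favourable / comb(N, n)

def enriched_terms(clusters, annotations, alpha=0.05):
    """Return (cluster, term, count, p-value) for terms overrepresented in a cluster."""
    all_genes = [g for genes in clusters.values() for g in genes]
    N = len(all_genes)
    # How many genes in the whole data set carry each GO term
    background = Counter(t for g in all_genes for t in annotations.get(g, ()))
    hits = []
    for cluster_id, genes in clusters.items():
        # How many genes in this cluster carry each GO term
        in_cluster = Counter(t for g in genes for t in annotations.get(g, ()))
        for term, k in in_cluster.items():
            p = hypergeom_tail(k, N, background[term], len(genes))
            if p <= alpha:
                hits.append((cluster_id, term, k, p))
    return sorted(hits, key=lambda hit: hit[3])

# Toy data (hypothetical): 12 genes partitioned into 3 clusters
clusters = {
    "c1": ["g1", "g2", "g3", "g4"],
    "c2": ["g5", "g6", "g7", "g8"],
    "c3": ["g9", "g10", "g11", "g12"],
}
annotations = {
    "g1": {"DNA replication", "response to stimulus"},
    "g2": {"DNA replication"},
    "g3": {"DNA replication"},
    "g4": {"DNA replication"},
    "g5": {"response to stimulus"},
    "g6": {"response to stimulus"},
    "g9": {"response to stimulus"},
    "g10": {"response to stimulus"},
    "g12": {"response to stimulus"},
}

results = enriched_terms(clusters, annotations)
for cluster_id, term, k, p in results:
    print(f"{cluster_id}: {term} ({k} genes), p = {p:.4f}")
```

In the toy data only "DNA replication" in cluster c1 passes the threshold, because all four annotated genes fall in that one cluster. With real data, many cluster/term pairs are tested, so the p-values would normally need a correction for multiple testing.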
Hypergeometric test
• The hypergeometric distribution arises from sampling without replacement from a fixed population.
• Example: an urn contains 100 balls, 20 of which are white, and we draw 10 balls
• We want to calculate the probability of drawing 7 or more white balls out of the 10, given the distribution of balls in the urn
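The urn example can be worked through directly. The sketch below assumes sampling without replacement and sums the hypergeometric probabilities P(X = i) = C(K, i) · C(N − K, n − i) / C(N, n) for i = 7, ..., 10; the function name and the symbols N, K, n and k are chosen here for illustration and do not come from the slides.

```python
from math import comb

def hypergeom_tail(k, N, K, n):
    """Return P(X >= k): at least k white balls when drawing n balls
    without replacement from an urn of N balls, K of them white."""
    total = comb(N, n)  # number of ways to draw n balls from the urn
    favourable = sum(comb(K, i) * comb(N - K, n - i)
                     for i in range(k, min(K, n) + 1))
    return favourable / total

# Urn from the example: 100 balls, 20 white, 10 drawn, 7 or more white
p_urn = hypergeom_tail(k=7, N=100, K=20, n=10)
print(f"P(7 or more white) = {p_urn:.3g}")
```

The probability comes out at roughly 0.0004, so drawing 7 or more white balls by chance is very unlikely. In the GO setting the urn is the set of all genes, the white balls are the genes annotated with a given term, and the draw is one cluster.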
Yeast cell cycle
[Figure: time series experiment with sampling at successive time points; gene expression profiles for Gene1 and Gene2 plotted against time]
Exercise
Find it on the course page