The Identification of Circadian Clock Genes By Data Mining Microarray Data
Atreyi Banerjee and Martin Hunt
The University of Leicester
June 27, 2008
Outline
• Introduction
• How to find circadian clock genes
• Promoter Analysis
What is circadian rhythm?
• Circadian: circa (about) + dies (a day)
• A circadian rhythm is a self-sustained cycle with a period of about 24 hours that controls rest/activity, time awareness, photosynthesis, etc.
• Common among eukaryotes (Neurospora, Drosophila, mammals)
• Reserved for living organisms (daily traffic congestion is not a circadian rhythm)
• Circannual: 1-year period (e.g. migration)
Circadian rhythm properties
• Circadian rhythm properties are conserved across the plant and animal kingdoms
• Basic properties of circadian rhythm: an endogenous free-running period of about 24 hours; synchronization by external stimuli; a period that is unchanged with temperature
• Advantage: we can learn from studying simple organisms (Drosophila, Neurospora, mouse)
• Mechanisms are similar but the genes are different
• The main cycling genes: PER, TIM, CLK, CYC, BMAL
Drosophila
• Affymetrix gene chip (Drosgenome 1) assay
• Identifying circadian genes
• Clustering and heatmap
• Promoter analysis
Drosophila circadian oscillator
Circadian clock control in Drosophila
ADD REFERENCE
Experimental design
• Drosophila were entrained in a 12:12 hour light:dark (LD) cycle
• They were then left in complete darkness and sampled every 4 hours
• The final dataset included 4 replicate chips covering the time points CT0, CT4, CT8, CT12, CT16 and CT20
Promoter analysis
• Aim: detect genes that share the same regulatory mechanism
• Extract the 5’ untranslated region (5’ UTR) of each gene
• Find the over-represented motifs in the sequences
• Find the cis-regulatory modules (combinations of binding sites) in sets of co-expressed or co-regulated genes
• Obtain the putative transcription factor binding sites (TFBS)
• Functional analysis
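A minimal sketch of what "over-represented motif" can mean in practice, written in Python for illustration (the slides use TOUCAN, whose statistics are not reproduced here). It counts how many sequences in a cluster contain a motif and scores the count with a hypergeometric tail probability against a background set. The function names, the toy sequences and the example hexamer are all assumptions made for this illustration, not data from the study.

```python
from math import comb


def hypergeom_tail(k, total, total_hits, sample_size):
    """P(X >= k) when sample_size items are drawn without replacement
    from `total` items, of which `total_hits` are hits."""
    upper = min(total_hits, sample_size)
    numerator = sum(
        comb(total_hits, i) * comb(total - total_hits, sample_size - i)
        for i in range(k, upper + 1)
    )
    return numerator / comb(total, sample_size)


def motif_enrichment(motif, cluster_seqs, background_seqs):
    """Return (cluster hits, background hits, p-value).

    A hit is a sequence containing the motif at least once (exact match,
    forward strand only). The background must include the cluster.
    """
    cluster_hits = sum(motif in seq for seq in cluster_seqs)
    background_hits = sum(motif in seq for seq in background_seqs)
    p_value = hypergeom_tail(
        cluster_hits, len(background_seqs), background_hits, len(cluster_seqs)
    )
    return cluster_hits, background_hits, p_value


# Hypothetical toy sequences, not real 5' UTRs.
cluster = ["AACACGTGTT", "CACGTGAAAA", "TTTTTTTTTT"]
background = cluster + ["GGGGGGGGGG"] * 7
hits, bg_hits, p_value = motif_enrichment("CACGTG", cluster, background)
print(hits, bg_hits, round(p_value, 4))
```

Real motif scanners work with position weight matrices rather than exact strings, so this only conveys the counting idea behind an enrichment test.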
Effects of clock mutations on enhancers regulating circadian gene expression
Stempfl, T. et al. Genetics 2002;160:571-593
TOUCAN software
• An interactive Java display
• Maps genes onto the sequence set space
• Flexibility to use any identifier (Affy ID, EMBL, RefSeq, etc.)
• Performs statistical tests for finding regulatory sequences, selecting parts of sequences, and finding CpG islands in metazoan genomes
Predict instances of known motifs with MotifScanner
The Significant motifs found in each cluster
Predict cis-regulatory modules with MotifSampler
The co-expression of Dorsal 2 and Myf showing
The cis-regulatory modules in each cluster
The cis-regulatory module in genes listed with p-values
Genscan output of cluster 1
List of unknown TFBS found in each cluster
de novo discovery of unknown TFBS
• The MotifSampler tool in TOUCAN was used to find unknown motifs, which could be novel transcription factor binding sites
• The 5’ UTR sequences were also extracted from Ensembl BioMart
• The over-represented TFBS were extracted from MATCH and OTFBS
• Dorsal 2 and Myf were over-represented modules
• ARNT, also found in cycle (an important clock gene), was located
• Genscan predicted genes in each cluster
Outline
• Introduction
• Identifying circadian clock genes
• Promoter Analysis
Identifying circadian genes: an outline
Microarray experiment
↓
Data (spreadsheet)
↓
Process data in R
↓
Data analysis in R
↓
List of circadian genes
Identifying circadian genes: an outline
Four methods considered, all of which were implemented in R:
GeneCycle based
• The Fisher Method (Wichert et al. 2004)
• The Robust Method (Ahdesmaki et al. 2005)
“Sine wave” based
• The M&R Method (McDonald & Rosbash 2001)
• The Sine Method
The Fisher Method
Implemented by the R package GeneCycle, based on Fourier methods and Fisher’s g test
Time Series:
CT0 = 1.2
CT4 = 4.9
CT8 = 9.5
CT12 = 0.4
CT16 = 1.5
CT20 = −42
→ Fisher’s g test → p-value = 0.3213
Repeat this process for each time series
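To make the idea concrete, here is a hedged Python sketch of Fisher's g test for one time series (the authors used the R package GeneCycle; this is not its code). It computes the periodogram at the Fourier frequencies, takes g as the largest ordinate divided by their sum, and applies Fisher's exact formula for the p-value. The function names are assumptions, and because GeneCycle's conventions may differ, the sketch is not expected to reproduce the p-value shown on the slide.

```python
import cmath
import math


def periodogram(series):
    """Squared DFT magnitudes at Fourier frequencies k = 1..floor((N-1)/2)."""
    n = len(series)
    m = (n - 1) // 2
    powers = []
    for k in range(1, m + 1):
        coeff = sum(
            x * cmath.exp(-2j * math.pi * k * t / n) for t, x in enumerate(series)
        )
        powers.append(abs(coeff) ** 2)
    return powers


def fisher_g_test(series):
    """Return (g, p_value): g is the share of periodogram power held by
    the strongest frequency; p_value is P(g >= observed) under white noise."""
    powers = periodogram(series)
    m = len(powers)
    total = sum(powers)
    if m == 0 or total == 0:
        raise ValueError("series is too short or constant")
    g = max(powers) / total
    # Fisher's exact formula: sum over j = 1..floor(1/g) of
    # (-1)^(j-1) * C(m, j) * (1 - j*g)^(m-1)
    p_value = 0.0
    for j in range(1, min(m, int(1 / g)) + 1):
        p_value += (-1) ** (j - 1) * math.comb(m, j) * max(0.0, 1 - j * g) ** (m - 1)
    return g, min(1.0, max(0.0, p_value))


# The example series from the slide, CT0 to CT20.
example = [1.2, 4.9, 9.5, 0.4, 1.5, -42]
g_stat, p_val = fisher_g_test(example)
print(round(g_stat, 3), round(p_val, 3))
```

With only six time points there are just two usable Fourier frequencies, so g can never fall below 0.5 and the test has little power on a single series.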
The Fisher Method: FDR
Oops! We’ve carried out over 6000 tests at once. The solution: false discovery rate (FDR) control, implemented by the R package fdrtool
Definition
The FDR is the proportion of false positives we expect among the results we call significant
0.011, 0.021, 0.042, 0.045, 0.056, 0.065, 0.066, . . .
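A short Python sketch of FDR control, for illustration only. The slides use the R package fdrtool; the code below instead shows the Benjamini-Hochberg step-up adjustment, a simpler and different procedure swapped in because it fits in a few lines. The function name and the input p-values are hypothetical.

```python
def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted p-values in the original order.

    Calling a test significant when its adjusted value is <= q controls
    the expected proportion of false positives at level q.
    """
    n = len(p_values)
    order = sorted(range(n), key=lambda i: p_values[i])
    adjusted = [0.0] * n
    running_min = 1.0
    # Walk from the largest p-value down, enforcing monotonicity.
    for rank in range(n, 0, -1):
        idx = order[rank - 1]
        running_min = min(running_min, p_values[idx] * n / rank)
        adjusted[idx] = running_min
    return adjusted


# Hypothetical p-values, one per gene.
raw = [0.5, 0.001, 0.03, 0.2]
adjusted = benjamini_hochberg(raw)
significant = [i for i, q in enumerate(adjusted) if q <= 0.05]
print(adjusted, significant)
```

Only the gene with raw p-value 0.001 survives at the 5% level here; 0.03 would pass an uncorrected threshold but not the adjusted one.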
The Robust Method
Also implemented by the R package GeneCycle
The M&R Method
The Sine Method
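The slides do not spell out the Sine Method, so the following Python sketch is only one plausible reading of a "sine wave" based score, not the authors' implementation. It fits a 24-hour sinusoid to a time series by least squares and reports amplitude, peak time and the fraction of variance explained. It assumes evenly spaced samples covering whole periods (as with CT0 to CT20), and all names are hypothetical.

```python
import math


def fit_sine(times_h, values, period_h=24.0):
    """Fit mean + a*cos(wt) + b*sin(wt) and return (amplitude, peak_time_h, r_squared).

    Assumes evenly spaced samples spanning whole periods, so the cosine
    and sine regressors are orthogonal and the coefficients are simple sums.
    """
    n = len(values)
    mean = sum(values) / n
    centred = [v - mean for v in values]
    omega = 2 * math.pi / period_h
    a = 2 / n * sum(c * math.cos(omega * t) for t, c in zip(times_h, centred))
    b = 2 / n * sum(c * math.sin(omega * t) for t, c in zip(times_h, centred))
    fitted = [a * math.cos(omega * t) + b * math.sin(omega * t) for t in times_h]
    ss_total = sum(c * c for c in centred)
    ss_resid = sum((c - f) ** 2 for c, f in zip(centred, fitted))
    r_squared = 1 - ss_resid / ss_total if ss_total > 0 else 0.0
    amplitude = math.hypot(a, b)
    peak_time_h = (math.atan2(b, a) / omega) % period_h
    return amplitude, peak_time_h, r_squared


# Hypothetical expression profile peaking at CT8.
times = [0, 4, 8, 12, 16, 20]
profile = [5 + 2 * math.cos(2 * math.pi * (t - 8) / 24) for t in times]
amplitude, peak_time, r_squared = fit_sine(times, profile)
print(round(amplitude, 3), round(peak_time, 3), round(r_squared, 3))
```

A gene could then be called circadian when the explained variance exceeds some cutoff, which would itself need a significance assessment.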
Heatmap: The Fisher Method
heatmap of Fisher method
Heatmap: The Robust Method
The Numbers
How many genes are in common between methods, etc.
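Counting genes shared between methods is a set intersection. The Python sketch below shows one way to tabulate it; the method names come from the slides, but the gene identifiers are placeholders and the counts are not results from the study.

```python
from itertools import combinations


def pairwise_overlap(gene_lists):
    """Map each pair of method names to the number of genes both report."""
    sets = {name: set(genes) for name, genes in gene_lists.items()}
    return {(a, b): len(sets[a] & sets[b]) for a, b in combinations(sets, 2)}


def common_to_all(gene_lists):
    """Genes reported by every method."""
    return set.intersection(*(set(genes) for genes in gene_lists.values()))


# Placeholder identifiers, not real results.
calls = {
    "Fisher": ["gene_A", "gene_B", "gene_C"],
    "Robust": ["gene_B", "gene_C", "gene_D"],
    "Sine": ["gene_C", "gene_E"],
}
overlaps = pairwise_overlap(calls)
shared = common_to_all(calls)
print(overlaps, shared)
```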
Fisher Vs Sine Methods
what’s so different about them?
Conclusions
• Why only use sine waves as a model?
• Is FDR really better than multiple testing?
• Why use GeneCycle?
Conclusions
• All methods find some circadian clock genes
• . . . and some false positives
• Best approach: use many methods
• There is always a new, better method around the corner . . .