1
Searching for Periodic Gene Expression Patterns Using Lomb-Scargle Periodograms
http://research.stowers-institute.org/efg/2004/CAMDA
Critical Assessment of Microarray Data Analysis Conference, November 11, 2004
Earl F. Glynn, Stowers Institute
Arcady R. Mushegian, Stowers Institute & Univ. of Kansas Medical Center
Jie Chen, Stowers Institute & Univ. of Missouri Kansas City
A weak diurnal period is visible in the “mean” data profile.
23
Bozdech’s Plasmodium dataset:
2. Apply Lomb-Scargle Algorithm
Periodic Expression Patterns
[Figure: probe i3518_1, N = 46. Panels: expression vs. time [hours]; Time Interval Variability (frequency histogram of log10(delta T)); Lomb-Scargle Periodogram (normalized power spectral density, with significance levels from p = 0.05 to p = 1e-06); Peak Significance (probability). Period at peak = 45.7 hours; p = 1.48e-008 at peak.]
[Figure: probe opfi17638, N = 46. Same four panels. Period at peak = 45.7 hours; p = 1.19e-008 at peak.]
Examples of highly-significant periodic expression profiles.
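A minimal sketch of the calculation behind the periodogram and peak-significance panels, assuming the standard normalized Lomb-Scargle periodogram and the false-alarm probability p = 1 - (1 - e^(-z))^M, where z is the peak power and M is the number of independent frequencies. The sample times, the synthetic 48-hour cosine, the frequency grid, and the choice M = N are illustrative assumptions, not values taken from the slides or from the Bozdech data.

```python
import math


def lomb_scargle_power(times, values, freq):
    """Normalized Lomb-Scargle power at one frequency (cycles per hour)."""
    n = len(values)
    mean = sum(values) / n
    variance = sum((v - mean) ** 2 for v in values) / (n - 1)
    omega = 2.0 * math.pi * freq
    # The offset tau makes the sine and cosine terms orthogonal,
    # which is what lets the method handle uneven spacing.
    s2 = sum(math.sin(2.0 * omega * t) for t in times)
    c2 = sum(math.cos(2.0 * omega * t) for t in times)
    tau = math.atan2(s2, c2) / (2.0 * omega)
    cos_terms = [math.cos(omega * (t - tau)) for t in times]
    sin_terms = [math.sin(omega * (t - tau)) for t in times]
    yc = sum((v - mean) * c for v, c in zip(values, cos_terms))
    ys = sum((v - mean) * s for v, s in zip(values, sin_terms))
    cc = sum(c * c for c in cos_terms)
    ss = sum(s * s for s in sin_terms)
    return (yc * yc / cc + ys * ys / ss) / (2.0 * variance)


def peak_p_value(power, n_independent):
    """p = 1 - (1 - exp(-power))^M, evaluated in a numerically stable way."""
    if power <= 0.0:
        return 1.0
    return -math.expm1(n_independent * math.log1p(-math.exp(-power)))


# Hypothetical hourly samples with two missing time points (N = 46).
times = [float(h) for h in range(1, 49) if h not in (23, 29)]
values = [math.cos(2.0 * math.pi * t / 48.0) for t in times]

# Assumed frequency grid: 0.005 to 0.200 cycles per hour.
freqs = [k * 0.001 for k in range(5, 201)]
powers = [lomb_scargle_power(times, values, f) for f in freqs]

best = max(range(len(freqs)), key=powers.__getitem__)
best_period = 1.0 / freqs[best]
p_peak = peak_p_value(powers[best], n_independent=len(times))  # M = N is an assumption

print(f"Period at peak = {best_period:.1f} hours, p = {p_peak:.3g}")
```

No imputation or resampling is needed for the two missing time points; the periodogram is evaluated directly on the observed times.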
24
Bozdech’s Plasmodium dataset:
2. Apply Lomb-Scargle Algorithm
Aperiodic/Noise Expression Patterns
[Figure: probe j167_5, N = 35. Panels: expression vs. time [hours]; Time Interval Variability (frequency histogram of log10(delta T)); Lomb-Scargle Periodogram (normalized power spectral density, with significance levels from p = 0.05 to p = 1e-06); Peak Significance (probability). Period at peak = 17.8 hours; p = 0.998 at peak.]
[Figure: probe f35105_2, N = 45. Same four panels. Period at peak = 32 hours; p = 0.516 at peak.]
25
Bozdech’s Plasmodium dataset:
2. Apply Lomb-Scargle Algorithm
Small “N”
[Figure: probe f58149_1, N = 39. Panels: expression vs. time [hours]; Time Interval Variability (frequency histogram of log10(delta T)); Lomb-Scargle Periodogram (normalized power spectral density, with significance levels from p = 0.05 to p = 1e-06); Peak Significance (probability). Period at peak = 48 hours; p = 8.54e-006 at peak.]
[Figure: probe n170_1, N = 32. Same four panels. Period at peak = 64 hours; p = 2.74e-005 at peak.]
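One possible reading of the small-N examples: with the usual sample-variance normalization, the normalized Lomb-Scargle power cannot exceed (N - 1)/2, so the smallest attainable p value grows as N shrinks. The sketch below assumes that normalization, the false-alarm formula p = 1 - (1 - e^(-z))^M, and M = N; none of these choices is stated on the slide, and the function names are hypothetical.

```python
import math


def peak_p_value(power, n_independent):
    """p = 1 - (1 - exp(-power))^M, evaluated in a numerically stable way."""
    if power <= 0.0:
        return 1.0
    return -math.expm1(n_independent * math.log1p(-math.exp(-power)))


def smallest_attainable_p(n_points):
    """Lower bound on p for a perfectly sinusoidal profile with N points."""
    max_power = (n_points - 1) / 2.0  # upper bound on normalized power
    return peak_p_value(max_power, n_independent=n_points)  # M = N is an assumption


for n_points in (46, 39, 32):
    print(f"N = {n_points}: smallest attainable p is about {smallest_attainable_p(n_points):.2g}")
```

Under these assumptions, a probe with fewer usable time points cannot reach the significance of a complete profile, however clean its rhythm.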
26
Bozdech’s Plasmodium dataset:
2. Apply Lomb-Scargle Algorithm
Signal and Noise Mixture
[Figure: “p” histogram. Number of probes vs. log10(p) for the complete Bozdech set of 6875 probes. Annotations, left to right: Periodic Probes; Aperiodic Probes or Noise.]
27
Bozdech’s Plasmodium dataset:
3. Apply Multiple-Hypothesis Testing
Significance level = 1E-4
Correction methods, ordered from more false negatives to more false positives: Bonferroni, Holm, Hochberg, Benjamini & Hochberg FDR, None.
[Figure: Multiple Testing Correction Methods. Log10(p) vs. rank order of sorted p values for the bonferroni, holm, hochberg, fdr, and none methods (using R's p.adjust methods).]
28
Bozdech’s Plasmodium dataset:
3. Apply Multiple-Hypothesis Testing
p Adjustment Method         Significance Level
                            0.05    0.01    0.001   0.0001  0.00001
Bonferroni                  3707    3050    1461    13      0
Holm                        3995    3351    1705    13      0
Hochberg                    4009    3359    1723    15      0
Benjamini & Hochberg FDR    5618    5315    4906    4358    3584
None                        5648    5351    4961    4456    3823
A priori plan: Use Benjamini & Hochberg FDR level of 0.0001.
Observed number of periodic probes consistent with biological observation of ~60% of Plasmodium genome being transcriptionally active during the intraerythrocytic developmental cycle.
Unclear how to apply Bozdech’s ad hoc “Overview” criteria for use with Lomb-Scargle method: “70% power in max frequency with top 75% of max frequency magnitude.”
The best 3711 Lomb-Scargle “p” values contained 3449 (92.9%) of the Overview probes.
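The slides used R's p.adjust for the corrections compared above. The following is a pure-Python sketch of the same four adjustments (Bonferroni, Holm, Hochberg, Benjamini & Hochberg FDR), written from the textbook definitions rather than taken from the authors' scripts. The function name and the five example p values are hypothetical.

```python
def adjust_p_values(p_values, method):
    """Return adjusted p values in the original order.

    method is one of "none", "bonferroni", "holm", "hochberg", "fdr".
    """
    n = len(p_values)
    if method == "none":
        return list(p_values)
    if method == "bonferroni":
        return [min(1.0, p * n) for p in p_values]

    order = sorted(range(n), key=lambda i: p_values[i])  # ascending p
    adjusted = [0.0] * n
    if method == "holm":
        # Step-down: running maximum of (n - rank) * p from the smallest p.
        running = 0.0
        for rank, i in enumerate(order):
            running = max(running, (n - rank) * p_values[i])
            adjusted[i] = min(1.0, running)
        return adjusted
    if method in ("hochberg", "fdr"):
        # Step-up: running minimum taken from the largest p downward.
        running = 1.0
        for rank in range(n - 1, -1, -1):
            i = order[rank]
            factor = (n - rank) if method == "hochberg" else n / (rank + 1)
            running = min(running, factor * p_values[i])
            adjusted[i] = running
        return adjusted
    raise ValueError(f"unknown method: {method}")


# Hypothetical p values, deliberately unsorted.
example = [0.04, 0.01, 0.05, 0.02, 0.03]
methods = ["bonferroni", "holm", "hochberg", "fdr", "none"]
counts = {
    m: sum(p <= 0.06 for p in adjust_p_values(example, m))
    for m in methods
}
print(counts)
```

The counts of probes called significant never decrease along the order Bonferroni, Holm, Hochberg, FDR, None, which is the same ordering the table shows at every significance level.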
• Dominant frequency band corresponds to 48-hr period
• Are “weak” bands indicative of complex expression, perhaps a diurnal component, or an asymmetric “duty cycle”?
33
Summary
Lomb-Scargle Method              Fourier Method
Weights data points              Weights frequency intervals
No special requirement           Requires uniform spacing
No special processing            Missing data imputed
No special requirement           2^N points for FFT; 0 padding
Known statistical properties     Permutation tests needed to assess statistical properties
Use “p” values                   Ad hoc scoring rules
Need estimate of number of       Usually only look at “independent” Fourier frequencies
“independent frequencies” but
explore using continuum
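To make the Fourier-side requirements in the summary concrete, here is a small sketch of the preprocessing an FFT-based analysis would need: imputing missing values and zero-padding to a power-of-two length. Linear interpolation is an assumed imputation method chosen for illustration; the slides do not say which method was used, and both function names are hypothetical.

```python
def impute_linear(values):
    """Fill interior gaps (None) by linear interpolation between observed neighbors.

    Assumes the first and last values are observed.
    """
    filled = list(values)
    known = [i for i, v in enumerate(values) if v is not None]
    for left, right in zip(known, known[1:]):
        for i in range(left + 1, right):
            frac = (i - left) / (right - left)
            filled[i] = values[left] + frac * (values[right] - values[left])
    return filled


def pad_to_power_of_two(values):
    """Append zeros until the length is the next power of two."""
    size = 1
    while size < len(values):
        size *= 2
    return list(values) + [0.0] * (size - len(values))


profile = [1.0, None, 3.0, 2.0, 0.5]  # hypothetical profile with one missing point
prepared = pad_to_power_of_two(impute_linear(profile))
print(prepared)
```

The Lomb-Scargle method needs neither step, because it is evaluated directly on the observed time points.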
34
Conclusions
• Lomb-Scargle periodogram is an effective tool to identify periodic gene expression profiles
• Results comparable with Fourier analysis
• Lomb-Scargle can help when data are missing or not evenly spaced
We wanted to validate the Lomb-Scargle method before applying it to our somitogenesis problem, since the Fourier technique would be difficult to use.
Scargle (1982): “surprising result is that the … spectrum of a process can be estimated … [with] only the order of the samples ...”
35
Conclusions
• Conclusions should not be drawn using the individual p-value calculated for each profile. A multiple comparison procedure, such as the False Discovery Rate (FDR), must be used to control the error rate.
• Expression profiles may be more complex than simple cosine curves
• Power spectra of non-sinusoid rhythms are more difficult to interpret
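A small illustration of why non-sinusoidal rhythms complicate the spectrum, using synthetic, evenly spaced data and a plain discrete Fourier sum rather than the Lomb-Scargle method or the Bozdech profiles. The three 48-sample profiles and the function name are assumptions made for this example only.

```python
import cmath
import math


def harmonic_magnitudes(samples, max_harmonic):
    """Magnitude of the discrete Fourier sum at harmonics 1..max_harmonic.

    Assumes the record holds exactly one cycle of the rhythm.
    """
    n = len(samples)
    mean = sum(samples) / n
    magnitudes = []
    for k in range(1, max_harmonic + 1):
        coeff = sum(
            (x - mean) * cmath.exp(-2j * math.pi * k * i / n)
            for i, x in enumerate(samples)
        )
        magnitudes.append(abs(coeff) / n)
    return magnitudes


PERIOD = 48  # samples per cycle (hypothetical hourly sampling)
cosine = [math.cos(2.0 * math.pi * h / PERIOD) for h in range(PERIOD)]
symmetric = [1.0 if h < 24 else 0.0 for h in range(PERIOD)]   # 50% duty cycle
asymmetric = [1.0 if h < 12 else 0.0 for h in range(PERIOD)]  # 25% duty cycle

for name, profile in (("cosine", cosine), ("symmetric", symmetric), ("asymmetric", asymmetric)):
    mags = harmonic_magnitudes(profile, 3)
    print(name, [round(m, 3) for m in mags])
```

The cosine puts all of its power in the fundamental, the symmetric on/off profile adds only odd harmonics, and the asymmetric profile also shows a second harmonic, that is, a peak at half the true period that could be mistaken for a separate rhythm.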