Introduction Method Simulation study Discussion Bibliography Miscellaneous
Functional data analysis for activity profiles from wearable devices
Ian McKeague
Joint work with Hsin-wen Chang
Institute of Statistical Science, Academia Sinica
September 16, 2019
Outline
• Motivation: inference for sensor data from wearable devices
• Activity profiles based on sensor data (no pre-alignment)
• Empirical likelihood based confidence bands and functional ANOVA testing for mean activity profiles
• Monotonic functional data: no need for smoothing
• Application: accelerometer data from NHANES
Wearable device data
• Inexpensive wearable sensors generate massive amounts of real-time data, with potentially exciting applications to physiological monitoring and health care delivery (mHealth).
• Inferential methods for comparing treatments based on wearable device (outcome) data are not well developed.
• Serious challenges: unmeasured time-dependent confounders (e.g., circadian and dietary patterns), highly non-stationary data, difficulty aligning across subjects, missing data, ...
• Connection to precision medicine: reinforcement learning for mHealth (Murphy et al., 2017): http://papers.nips.cc/paper/7179-action-centered-contextual-bandits.pdf
Tradeoff between exploration and exploitation. Assumes stationarity.
Example: blood pressure monitoring
Example: sweat monitoring (really!)
Noninvasive alternative to blood glucose monitoring (Nyein et al. 2019)
Example: real-time sweat measurements
Patches worn on the forehead, forearm, underarm, and back, and sweat parameters monitored simultaneously.
Example: physical activity monitoring
• National Health and Nutrition Examination Survey (NHANES)
• Accelerometer ‘counts’ recorded for 7 consecutive days in 1-minute epochs using an ActiGraph device
• Goal: to compare groups of subjects using their activity profiles
• Activity profile: the amount of time that activity exceeds a given level, viewed as a function of that level
• Typically in the physical activity literature, activity is classified using selected thresholds (e.g., as “sedentary”)
Example: gene therapy for mitochondrial disease
5,000 children are born with mitochondrial disease each year in the US.
Columbia RCT: 40 patients. Accelerometer: activPAL.
ActiGraph accelerometer readings (NHANES)
[Figure: two panels of ActiGraph intensity counts plotted against time (unit = 1 minute), over 0 to 10,000 minutes.]
Activity profiles as monotonic functional data
Sensor readings $X(t)$, $t \in [0, \tau]$, generate an activity profile:
\[ T_a = \mathrm{Leb}(\{t \in [0, \tau] : X(t) > a\}), \quad a \in \mathbb{R}, \]
where Leb denotes Lebesgue measure, so $T_a$ is the total time spent above level $a$.
Figure: sensor readings $X(t)$ over 25 minutes, with activity $T_a = 9$ minutes above level $a = 0.1$ and $T_a = 16$ minutes above level $a = -0.1$.
Note: Need to avoid pre-alignment of sensor data among subjects (needed in standard FDA approaches).
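As a rough illustration of the definition above, the following Python sketch approximates $T_a$ from equally spaced readings by counting the epochs in which the reading exceeds $a$. The function name `activity_profile`, the `epoch` argument, and the toy readings are assumptions made for illustration; they are not taken from the NHANES analysis.

```python
def activity_profile(readings, levels, epoch=1.0):
    """Approximate T_a for each level a: time (in epoch units) with reading > a."""
    return [epoch * sum(1 for x in readings if x > a) for a in levels]


# Made-up readings in 1-minute epochs (hypothetical values).
readings = [0.3, -0.2, 0.15, 0.05, -0.05, 0.4, -0.3, 0.0]
levels = [-0.1, 0.1]

profile = activity_profile(readings, levels)
print(profile)

# The profile is non-increasing in the level a.
print(all(p >= q for p, q in zip(profile, profile[1:])))

# Reordering the readings in time does not change the profile.
print(activity_profile(readings[::-1], levels) == profile)
```

Because only the time spent above each level matters, any reordering of the readings in time leaves the profile unchanged, which is why no pre-alignment across subjects is needed.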
Raw and mean activity profiles from the NHANES data
Functional ANOVA for activity profiles
Goal: Compare $k$ mean activity profiles $\mu_j(a) = E\,T_{aj}$, $j = 1, \ldots, k$.
Functional ANOVA: tests $\mu_1(\cdot) = \ldots = \mu_k(\cdot)$ vs. omnibus alternative
• $T_{aj} = \mathrm{Leb}(\{t \in [0, \tau] : X_j(t) > a\})$, where $X_j = \{X_j(t), t \in [0, \tau]\}$ are the sensor readings in the $j$th group and $\tau$ is the total study time
• Observe $n_j$ iid copies $\{T_{aj1}, \ldots, T_{ajn_j},\ a \in [\alpha_1, \alpha_2]\}$ of the activity profile $T_{aj}$. Weaker than iid observations of $X_j$. $[\alpha_1, \alpha_2]$ is the range of device readings of interest
• Approach based on a nonparametric likelihood ratio procedure: empirical likelihood (EL)
Empirical likelihood (EL)
• EL involves forming a ratio of two nonparametric likelihoods subject to constraints on the parameters of interest
• Two early papers: [Thomas and Grunkemeier, 1975], [Owen, 1988]
• Produces highly accurate confidence regions [Owen, 2001] and tests with optimal power [Kitamura et al., 2012]
Empirical likelihood
Observe $X_1, \ldots, X_n \sim$ iid $F$, with $\mu = \mu(F)$ a parameter of interest.
NP likelihood ratio:
\[ R(\mu_0) = \frac{\sup\{L(F) : \mu(F) = \mu_0\}}{\sup\{L(F)\}} \]
$L(F) = \prod_{i=1}^{n} p_i$ is the NP likelihood, $p_i$ = point mass (of $F$) at $X_i$.
Hypothesis tests: Accept $\mu(F) = \mu_0$ when $R(\mu_0) \ge r_0$ for some threshold $r_0$.
Confidence regions: $\{\mu : R(\mu) \ge r_0\}$
EL for means
$\mu = E(X)$
\[ R(\mu) = \max\left\{ \prod_{i=1}^{n} n p_i : \sum_{i=1}^{n} p_i X_i = \mu,\ p_i \ge 0,\ \sum_{i=1}^{n} p_i = 1 \right\} \]
Chi-squared calibration: Wilks-type theorem for $-2 \log R(\mu_0)$.
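For computation, the maximization over $(p_1, \ldots, p_n)$ is usually reduced to a one-dimensional problem by a Lagrange-multiplier argument (a standard result, see [Owen, 2001]). A sketch, assuming $\mu$ lies strictly between the smallest and largest $X_i$ and writing $\lambda$ for the multiplier (notation introduced here):

\[ p_i = \frac{1}{n\{1 + \lambda (X_i - \mu)\}}, \quad \text{where } \lambda \text{ solves } \sum_{i=1}^{n} \frac{X_i - \mu}{1 + \lambda (X_i - \mu)} = 0, \]
\[ -2 \log R(\mu) = 2 \sum_{i=1}^{n} \log\{1 + \lambda (X_i - \mu)\}. \]

At $\mu = \bar{X}$ the solution is $\lambda = 0$, so every $p_i = 1/n$ and $R(\bar{X}) = 1$.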
EL for quantiles
Estimating equation: $E(m(X, \mu)) = 0$, where for the $\alpha$-quantile
\[ m(X, \mu) = 1\{X \le \mu\} - \alpha. \]
\[ R(\mu) = \max\left\{ \prod_{i=1}^{n} n p_i : \sum_{i=1}^{n} p_i\, m(X_i, \mu) = 0,\ p_i \ge 0,\ \sum_{i=1}^{n} p_i = 1 \right\} \]
Chi-squared calibration: Wilks theorem still applies: replace $X_i - \mu_0$ by $m(X_i, \mu_0)$.
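In the quantile case the maximization can be carried out by hand, because $m(X_i, \mu)$ takes only two values. A sketch, writing $r = \#\{i : X_i \le \mu\}$ (notation introduced here) and assuming $0 < r < n$: the constraints force total mass $\alpha$ on the $r$ observations at or below $\mu$ and mass $1 - \alpha$ on the remaining $n - r$, and the product is maximized by spreading each mass evenly within its group. This gives

\[ R(\mu) = \left(\frac{n\alpha}{r}\right)^{r} \left(\frac{n(1-\alpha)}{n-r}\right)^{n-r}, \qquad -2 \log R(\mu) = 2\left[ r \log\frac{r}{n\alpha} + (n-r) \log\frac{n-r}{n(1-\alpha)} \right], \]

which is the binomial likelihood ratio statistic for testing that the proportion of observations at or below $\mu$ equals $\alpha$.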
Activity profiles: relevant references
• Functional data literature
  • Wald-type ANOVA tests requiring curve registration [Gorecki and Smaga, 2018]
  • EL-based tests in a concurrent linear model [Wang et al., 2018], requiring curve registration and smoothing
  • Curve registration/alignment only useful on raw sensor data
  • Time warping alters the activity profiles!
• Physical activity literature
  • Only considers activity profiles at a few activity levels
  • e.g., the time spent in sedentary behavior could be represented by the accumulated amount of time below 100 counts/minute [Matthews et al., 2008]
  • The levels are typically chosen in an ad hoc fashion
Our contribution
EL-based functional ANOVA test for comparing groups of subjects based on their activity profile data, i.e., an omnibus test of
\[ H_0 : \mu_1(\cdot) = \ldots = \mu_k(\cdot) \]
• greater efficiency using EL
• avoids issues in pre-aligning sensor data
• no smoothing needed (as activity profiles are monotonic)
• analyze entire activity profiles
• EL-based simultaneous confidence bands
• approach also applies to the quantiles of activity profiles
Approach applies to functions of bounded variation
Example: Area covered by Arctic sea ice (Nature, Sept 2019)
Example: Canadian temperature data (Ramsay & Silverman)
Canadian temperature data
Average daily temperature at 35 Canadian weather stations.
EL-based ANOVA test for activity profiles
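• For an activity level $a$, construct the local EL ratio as
\[ R(a) = \frac{\sup\left\{ \prod_{j=1}^{k} L(F_{aj}) : \mu_1(a) = \ldots = \mu_k(a) \right\}}{\sup\left\{ \prod_{j=1}^{k} L(F_{aj}) \right\}}, \]
where $L(F_{aj})$ is the NP likelihood based on observation of $T_{aj}$
• To test $H_0$ we propose the maximally selected EL statistic:
\[ K_n = \sup_{a \in [\alpha_1, \alpha_2]} \left[ -2 \log R(a) \right]. \]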
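One way to compute the local statistic numerically uses the fact that the constrained supremum can be taken over the unknown common mean: $-2 \log R(a)$ equals the minimum over $\mu$ of the sum of the $k$ one-sample statistics $-2 \log R_j(\mu)$. The Python sketch below follows that route under several assumptions of its own: the function names are hypothetical, the supremum over $a$ is replaced by a maximum over a finite grid, the one-sample statistic is computed through its Lagrange dual by bisection, and the minimization over $\mu$ uses a ternary search (the summed statistic is convex in $\mu$). It is an illustration only, not the authors' implementation.

```python
import math


def neg2_log_el(values, mu, iters=100):
    """One-sample -2 log EL ratio for the mean, via the Lagrange dual."""
    z = [v - mu for v in values]
    zmin, zmax = min(z), max(z)
    if zmin == 0.0 and zmax == 0.0:
        return 0.0                       # degenerate sample sitting exactly at mu
    if zmin >= 0.0 or zmax <= 0.0:
        return math.inf                  # mu is not strictly inside the data range
    eps = 1e-10
    lo, hi = (-1.0 + eps) / zmax, (-1.0 + eps) / zmin   # keeps 1 + lam * z_i > 0
    for _ in range(iters):               # bisection: the score is decreasing in lam
        lam = 0.5 * (lo + hi)
        if sum(zi / (1.0 + lam * zi) for zi in z) > 0.0:
            lo = lam
        else:
            hi = lam
    lam = 0.5 * (lo + hi)
    return 2.0 * sum(math.log1p(lam * zi) for zi in z)


def local_el_anova(samples, iters=60):
    """-2 log R(a) at one level: minimise the summed statistics over a common mean."""
    lo = max(min(s) for s in samples)
    hi = min(max(s) for s in samples)
    if lo >= hi:
        return math.inf                  # no common mean is feasible for all groups

    def total(mu):
        return sum(neg2_log_el(s, mu) for s in samples)

    for _ in range(iters):               # ternary search on a convex function
        m1, m2 = lo + (hi - lo) / 3.0, hi - (hi - lo) / 3.0
        if total(m1) <= total(m2):
            hi = m2
        else:
            lo = m1
    return total(0.5 * (lo + hi))


def max_el_statistic(groups):
    """Grid version of K_n; groups[j][i][g] is T_a for group j, subject i, level g."""
    n_levels = len(groups[0][0])
    return max(
        local_el_anova([[subject[g] for subject in group] for group in groups])
        for g in range(n_levels)
    )


# Toy profiles on a 3-level grid (made-up numbers, non-increasing in the level).
group1 = [[9.0, 5.0, 1.0], [8.0, 4.0, 2.0], [7.0, 6.0, 1.5], [9.5, 4.5, 0.5]]
group2 = [[8.5, 5.5, 1.2], [7.5, 4.2, 1.8], [9.2, 5.8, 0.8], [8.0, 4.8, 1.6]]
k_n = max_el_statistic([group1, group2])
print(k_n)
```

The sketch assumes each group has positive sample variance at every grid level, in line with the variance condition used for the limit theory.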
Wilks type theorem for the EL-based ANOVA test
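Suppose $n_j/n \to \gamma_j > 0$ and $\inf_{a \in [\alpha_1, \alpha_2]} \mathrm{Var}(T_{aj}) > 0$, for each $j$.
Then, under $H_0$,
\[ K_n \xrightarrow{d} \sup_{a \in [\alpha_1, \alpha_2]} \sum_{j=1}^{k} w_j(a) \left[ \frac{U_j(a)}{\sqrt{w_j(a)}} - U(a) \right]^2, \qquad U(a) = \sum_{j=1}^{k} \sqrt{w_j(a)}\, U_j(a), \]
where the $U_j$ are independent zero-mean Gaussian processes, and the weights $w_j(a) \propto \gamma_j / \mathrm{Var}(T_{aj})$ are normalized to sum to 1 across the groups.
Proof: Bracketing-entropy CLT for stochastic processes with monotone sample paths furnishes $U_j$ as the limit of the process $\sqrt{n_j}\{\hat{\mu}_j(\cdot) - \mu_j(\cdot)\}/\sigma_j(\cdot)$.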
Nonparametric bootstrap calibration
• The limiting distribution can be bootstrapped by replacing $U_j(a)$ by its nonparametric bootstrap
\[ U_j^*(a) = \sqrt{n_j}\{\hat{\mu}_j^*(a) - \hat{\mu}_j(a)\}/\hat{\sigma}_j(a) \]
and replacing other unknowns by their estimates.
• $\hat{\mu}_j^*(a)$ is obtained by evaluating $\hat{\mu}_j(a)$ after resampling with replacement from $\{T_{aj1}, \ldots, T_{ajn_j}\}$, with each $T_{aji}$ regarded as a function of $a$
• Let $M_n^*$ denote the resulting bootstrap version of the limit
• Simulate $M_n^*$ by repeatedly resampling
• Compare the empirical quantiles of these bootstrapped values $M_n^*$ with our test statistic $K_n$
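The resampling steps above can be sketched in Python as follows. This is a minimal illustration under assumptions of its own: the function names are hypothetical, the activity levels are restricted to a finite grid, $\gamma_j$ is estimated by $n_j/n$, variances by pointwise sample variances (assumed positive at every grid level), and the p-value is taken as the share of bootstrap draws at least as large as $K_n$. The default of 200 resamples is chosen only to keep the example fast.

```python
import math
import random


def pointwise_mean(curves):
    n = len(curves)
    return [sum(c[g] for c in curves) / n for g in range(len(curves[0]))]


def pointwise_sd(curves, mean):
    n = len(curves)
    return [math.sqrt(sum((c[g] - mean[g]) ** 2 for c in curves) / (n - 1))
            for g in range(len(mean))]


def bootstrap_sup_draws(groups, n_boot=200, seed=0):
    """Draws of M*_n: the limit functional with U_j replaced by its bootstrap U*_j."""
    rng = random.Random(seed)
    k, n_levels = len(groups), len(groups[0][0])
    n_total = sum(len(grp) for grp in groups)
    means = [pointwise_mean(grp) for grp in groups]
    sds = [pointwise_sd(grp, mu) for grp, mu in zip(groups, means)]
    # Estimated weights w_j(a), proportional to (n_j / n) / Var(T_aj), summing to 1 over j.
    weights = []
    for g in range(n_levels):
        raw = [len(groups[j]) / n_total / sds[j][g] ** 2 for j in range(k)]
        weights.append([r / sum(raw) for r in raw])
    draws = []
    for _ in range(n_boot):
        u_star = []
        for j, grp in enumerate(groups):
            resampled = [rng.choice(grp) for _ in grp]     # resample whole curves
            star = pointwise_mean(resampled)
            u_star.append([math.sqrt(len(grp)) * (star[g] - means[j][g]) / sds[j][g]
                           for g in range(n_levels)])
        sup = 0.0
        for g in range(n_levels):
            w = weights[g]
            u_bar = sum(math.sqrt(w[j]) * u_star[j][g] for j in range(k))
            value = sum(w[j] * (u_star[j][g] / math.sqrt(w[j]) - u_bar) ** 2
                        for j in range(k))
            sup = max(sup, value)
        draws.append(sup)
    return draws


def bootstrap_p_value(k_n, draws):
    """Share of bootstrap draws at least as large as the observed statistic."""
    return sum(d >= k_n for d in draws) / len(draws)


# Toy data: random non-increasing curves on a 4-level grid (made up for illustration).
toy_rng = random.Random(1)


def toy_curve():
    return sorted((10.0 * toy_rng.random() for _ in range(4)), reverse=True)


groups = [[toy_curve() for _ in range(8)], [toy_curve() for _ in range(12)]]
draws = bootstrap_sup_draws(groups)
print(bootstrap_p_value(3.0, draws))   # 3.0 stands in for an observed K_n
```

Resampling whole curves, rather than individual grid values, keeps the dependence across activity levels intact, which is what the supremum over $a$ requires.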
Simulation study
• Compare our approach with tests from R package fdANOVA that apply to generic functional data:
  • Fmaxb: a maximally-selected $F$-statistic
  • GPF: an integrated $F$-statistic
  • TRP: random projections
• Striking differences in performance if groups are unbalanced
• Simulation model:
  • Generate $X_j(\cdot)$ as positive part of a scaled OU process; multiply the resulting $T_{aj}$ by an independent beta r.v.
  • $k = 3$, each group/scenario with distinct OU/beta parameters.
Table: Empirical rejection rates (percentages) for functional ANOVA tests under various scenarios and sample sizes, based on 1000 Monte Carlo replications, 1000 bootstrap samples, and a nominal level of 5%.
NHANES data revisited
Sample means of raw accelerometer readings (in 4 consecutive days) comparing veterans aged 75-and-older (aqua) and veterans aged 65–74 (coral). Differences apparent even without curve alignment/smoothing.
Confidence bands of mean activity profiles
Right: EL (black), Wald-type (red), and MFD (blue) 95% simultaneous confidence bands for the mean activity profile (estimate in gray) of veterans aged 75-and-older, showing that the EL band is narrower than the Wald-type band and similar to the MFD band at most activity levels.
MFD (mean of functional data) band: uses local linear smoothing with
Applying the various tests
Table: p-values from various functional ANOVA tests: veterans aged 75-and-older (group 1), non-veterans aged 75-and-older (group 2), veterans aged 65–74 (group 3), and non-veterans aged 65–74 (group 4).
Comparison      EL test    GPF        Fmaxb      TRP
all groups      < 0.001    < 0.001    < 0.001    < 0.001
group 1 vs 2    0.010      0.060      0.016      0.033
group 3 vs 4    0.345      0.416      0.365      0.579
group 1 vs 3    < 0.001    < 0.001    < 0.001    < 0.001
group 2 vs 4    < 0.001    < 0.001    < 0.001    < 0.001
Conclusion
• We have developed a new functional ANOVA test based on a maximally-selected local empirical likelihood statistic
• Approach applies generally to functional data with sample paths of bounded variation. Smoothing avoided.
• Simulation study shows that the new test is more accurate and more powerful than standard FDA approaches
• We applied the proposed method to wearable device data from NHANES and obtained more significant results than existing functional ANOVA tests
• Directions for future work: gaps in sensor observations, activity profiles regressed on high-dimensional predictors ...
Thank you!
Degras, D. A. (2011). Simultaneous confidence bands for nonparametric regression with functional data. Statistica Sinica, 21(4):1735–1765.
Degras, D. A. (2017). Simultaneous confidence bands for the mean of functional data. Wiley Interdisciplinary Reviews: Computational Statistics, 9(3):e1397.
Gorecki, T. and Smaga, L. (2018). fdANOVA: an R software package for analysis of variance for univariate and multivariate functional data. Computational Statistics. https://doi.org/10.1007/s00180-018-0842-7.
Kitamura, Y., Santos, A., and Shaikh, A. M. (2012). On the asymptotic optimality of empirical likelihood for testing moment restrictions. Econometrica, 80(1):413–423.
Matthews, C. E., Chen, K. Y., Freedson, P. S., Buchowski, M. S., Beech, B. M., Pate, R. R., and Troiano, R. P. (2008). Amount of time spent in sedentary behaviors in the United States, 2003–2004. American Journal of Epidemiology, 167(7):875–881.
Owen, A. B. (1988). Empirical likelihood ratio confidence intervals for a single functional. Biometrika, 75(2):237–249.
Owen, A. B. (2001). Empirical Likelihood. Chapman & Hall/CRC, Boca Raton.
Thomas, D. R. and Grunkemeier, G. L. (1975). Confidence interval estimation of survival probabilities for censored data. Journal of the American Statistical Association, 70:865–871.
Wang, H., Zhong, P.-S., Cui, Y., and Li, Y. (2018). Unified empirical likelihood ratio tests for functional concurrent linear models and the phase transition from sparse to dense functional data. Journal of the Royal Statistical Society: Series B (Statistical Methodology), 80(2):343–364.
Specifying [α1, α2]
• In practice $\alpha_1$ and $\alpha_2$ may be specified by practitioners based on a range of accelerometer readings available in the particular context
• They could also be chosen in a data-driven fashion, say $\alpha_1 = \inf\{a : \hat{\mu}(a) < 0.95\tau\}$ and $\alpha_2 = \sup\{a : \hat{\mu}(a) > 0.05\tau\}$; this is what we use in our simulation studies and data analysis
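The data-driven rule can be sketched on a grid of activity levels as below. The function name `data_driven_range`, the grid, and the toy mean profile are assumptions for illustration; in particular, the infimum and supremum are approximated by the smallest and largest grid levels that satisfy each condition.

```python
def data_driven_range(levels, mean_profile, tau):
    """Grid approximation of alpha1 = inf{a : mean < 0.95 tau}, alpha2 = sup{a : mean > 0.05 tau}."""
    below = [a for a, m in zip(levels, mean_profile) if m < 0.95 * tau]
    above = [a for a, m in zip(levels, mean_profile) if m > 0.05 * tau]
    if not below or not above:
        raise ValueError("grid does not reach the 5% / 95% cut-offs")
    return min(below), max(above)


# Toy estimated mean profile over tau = 100 minutes (made-up numbers).
levels = [0, 1, 2, 3, 4, 5]
mean_profile = [100.0, 96.0, 80.0, 40.0, 6.0, 2.0]
alpha1, alpha2 = data_driven_range(levels, mean_profile, tau=100.0)
print(alpha1, alpha2)
```

The rule trims off levels at which nearly all, or nearly none, of the study time is spent above the level, where the mean profile carries little information.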