Biometrics 68, 825–836

September 2012. DOI: 10.1111/j.1541-0420.2012.01744.x

Evolutionary Factor Analysis of Replicated Time Series

Giovanni Motta1,∗ and Hernando Ombao2,∗∗

1 Department of Quantitative Economics, Maastricht University, P.O. Box 616, 6200 MD Maastricht, The Netherlands

2 Department of Statistics, University of California Irvine, Irvine, California 92697, U.S.A.
*email: [email protected]

∗∗email: [email protected]

Summary. In this article, we develop a novel method that explains the dynamic structure of multi-channel electroencephalograms (EEGs) recorded from several trials in a motor–visual task experiment. Preliminary analyses of our data suggest two statistical challenges. First, the variance at each channel and cross-covariance between each pair of channels evolve over time. Moreover, the cross-covariance profiles display a common structure across all pairs, and these features consistently appear across all trials. In the light of these features, we develop a novel evolutionary factor model (EFM) for multi-channel EEG data that systematically integrates information across replicated trials and allows for smoothly time-varying factor loadings. The individual EEG series share common features across trials, thus suggesting the need to pool information across trials, which motivates the use of the EFM for replicated time series. We explain the common co-movements of EEG signals through the existence of a small number of common factors. These latent factors are primarily responsible for processing the visual–motor task and, through the loadings, drive the behavior of the signals observed at different channels. The estimation of the time-varying loadings is based on the spectral decomposition of the estimated time-varying covariance matrix.

Key words: Electroencephalography; Factor models; Local stationarity; Principal components.

1. Introduction

In this article, we develop a novel statistical method based on time-varying principal components for analyzing replicated nonstationary multichannel electroencephalograms (EEGs), which were collected for the purpose of investigating the brain dynamics during processing of a motor–visual task. The dataset consists of 62-channel EEGs recorded from one healthy subject in a hand-guided visual–motor experiment. These EEGs were recorded over several replicated identical trials. Preliminary analyses of the data demonstrate that the multiple channels are cross-correlated and that the EEGs exhibit nonstationary behavior. In Figure 1, we plot two series observed at two different locations, the time-varying variance of one series, and the time-varying covariance between the two series. The covariance matrix is time-varying, indicating that the series have time-varying moments and hence are nonstationary.

Moreover, the entries of the covariance matrix vary over time in a very similar way. This suggests the use of dimension reduction techniques that exploit the co-movements. For this reason, we projected the 62 series on the main three evolutionary (i.e., time-varying) principal components, obtaining the so-called evolutionary common components. The similarity between the original series and the estimated common components (which are extracted from the 62 × 62 time-varying covariance matrix using only three factors) confirms the validity of our dimension reduction method for nonstationary time series.

The multichannel EEGs display nonstationary behavior because their variances and cross-covariances appear to change slowly over time (within a trial). To account for this feature, we shall adopt the concept of local stationarity introduced by Dahlhaus (1997), which provides a rigorous framework for the treatment of nonstationarity. The definition of local stationarity, based on rescaled time u = t/T, where T is the length of the time series, gives a meaningful asymptotic theory of statistical inference. In addition to nonstationarity, the EEGs appear to be highly cross-correlated (strong correlation between channels). In fact, when considered jointly as a multivariate time series, the EEGs exhibit a common behavior driven by some latent factors.

Our proposed methodology shares a common goal with that developed in Prado, West, and Krystal (2001), namely the application of nonstationary latent factor models for multivariate EEG series. In particular, as in Prado et al. (2001), we adopt time-varying parameters to model the nonstationarity. However, there are three differences in the approaches. First, the time-varying coefficients in Prado et al. (2001) are stochastic, whereas in our approach they are deterministic. We believe that this is an important feature of our approach because it is reasonable to assume that the loadings that weight the latent factors change over time (during the experiment) in a deterministic way. Second, the loadings in the model by Prado et al. (2001) have an auto-regressive prior, whereas our loadings are completely nonparametric and thus our model is more flexible. However, if the parametric model is

© 2012, The International Biometric Society


[Figure 1 comprises four panels: the EEG series at location 18 for the left condition at trial n. 10; the EEG series at location 23 for the left condition at trial n. 10; the time-varying variance at location 18 for the left condition; and the time-varying covariance between locations 18 and 23 for the left condition.]

Figure 1. The subject was commanded to move the joystick many times, with a total of R = 118 trials. For each trial, a time series is recorded for an interval of one second. The start of the time series is 500 milliseconds before the command. The end of the time series is 500 milliseconds after the command. Here, one trial will have a total of T = 512 time points. On the left side, we plot the EEG data Y_i(t, r) versus the estimated common components X̂_i(t, r), for i = 18, 23, t = 1, ..., T and r = 10. On the right-top side, we plot the time-varying variance of Y_18(t, r), whereas the right-bottom side is the estimated time-varying covariance between Y_18(t, r) and Y_23(t, r) in rescaled time u ∈ (0, 1). We only report the results for the left condition. The shapes of the time-varying covariances corresponding to the right condition are very similar to those reported in this figure. This figure appears in color in the electronic version of this article.


adequate in representing the latent factors, then the approach in Prado et al. (2001) gives further significant interpretation of the dynamics of the factors. Third, the latent factors in our model are allowed to vary across several trials. Our approach can thus be easily extended to the situation where one needs to consider trial-specific effects by adding some random term (perhaps as a multiplier) to the time-varying loadings. One would imagine that this is also possible in the approach of Prado et al. (2001) by adding another layer or hierarchy in the Bayesian analysis.

A traditional tool for dimension reduction is Principal Components Analysis (PCA). For nonstationary time series, Ombao and Ho (2006) employed the spectral representation of a locally stationary multichannel signal and used a localized version of PCA in the frequency domain to extract the relevant nonredundant information from massive high-dimensional EEG signals. In a related approach, Ombao, von Sachs, and Guo (2005) used smooth localized complex exponentials (SLEX) waveforms, and developed a method to extract nonredundant spectral information by applying a time-varying eigenvalue–eigenvector decomposition to the time-varying SLEX spectral density matrix.

In this article, we propose a new factor model for high-dimensional nonstationary time series under the setting where there are several multichannel time series obtained from replicated trials. The nonstationarity in the data is fully explained by the factor loadings, which are smooth functions of time and estimated by the eigenvectors of a nonparametric estimator of the time-varying covariance matrix. Our approach, inspired by Motta, Hafner, and von Sachs (2011), has the following desirable properties. From the point of view of modeling, our model does not assume any particular parametric form for the factors and hence does not suffer from model misspecification. From the estimation viewpoint, our methodology is flexible as the underlying model is nonparametric and thus fully adaptive. We also benefit from an automatic (data-driven) bandwidth selection procedure (see Section 3.1 below). From a practical viewpoint, our method systematically integrates variation across trials and could be extended to allow for testing of differences across experimental conditions and patient groups. From the theoretical point of view, our methodology gives mean-squared consistent estimators. It combines information from several trials in a computationally efficient manner, in the sense that we benefit from a faster rate of convergence of the estimators. The increase in the speed of the rate is given by the square root of the number of trials.

The main contributions of this article are as follows. We develop a parsimonious representation of multivariate time series with evolutionary second-order properties (i.e., time-varying covariance). Moreover, we rigorously develop a nonparametric method for the estimation of, and inference on, the stochastic properties of the time-varying weights that load the underlying factors.

Throughout the article we use bold-unslanted letters for matrices, bold-slanted letters for vectors, and unbold (normal) letters for scalars. Random operators are denoted with a superscript when they depend on diverging sizes (namely, T, P, and R). When needed, vectors and matrices are indexed by subscripts denoting their size. We denote by tr(·) the trace operator, by rk(A) the rank of a matrix A, by I_n the identity matrix of dimension n, by O_{m,n} the null matrix of dimension m × n, and by ‖·‖ the Frobenius (Euclidean) norm, that is, ‖A‖ = √tr(A′A). We call trials the (independent) R replications of the same experiment, and channels (or, equivalently, locations) the spots where the P electrodes are located. They are indexed, respectively, as r = 1, ..., R, and i = 1, ..., P. The time series index is t = 1, ..., T, where T is the sample size.

2. Background on Evolutionary Factor Analysis

Linear factor models are both intuitively appealing and highly relevant for analyzing EEG data because they are a model-based dimension-reduction approach for characterizing the complex temporal dynamics of a high-dimensional multivariate time series using only a few common factors. Thus, this approach has a strong potential for providing a statistically principled basis for making inference on massive high-dimensional brain data.

In traditional factor models, the process is treated as being temporally stationary, which appears to be restrictive because, over long time periods, factor loadings are highly unlikely to remain constant. A promising approach is to model unconditional variances and covariances via nonparametric estimation, which imposes very little structure on the unconditional covariance matrix while being very easy to estimate.

Motta et al. (2011) generalized the asymptotic theory of factor analysis given by Bai (2003) to the locally stationary case from both the identification and the estimation points of view. Given a P × 1 vector of observations Y, they consider a factor model with a P × q matrix of time-varying factor loadings Λ:

Y_P^T(t) = X_P^T(t) + Z_P(t) = Λ_P(t/T) F(t) + Z_P(t),  t = 1, ..., T,  (1)

with P × 1 common components X, q × 1 factors F, and P × 1 idiosyncratic components Z. The common component X describes the co-movements of all the series; the idiosyncratic component Z is specific to each particular series. The basic idea is to consider the loadings as smooth functions of rescaled time, rendering the process nonstationary while the factors remain stationary. However, the assumption that the loadings are smooth permits considering the process as locally stationary and enables us to estimate the model using nonparametric methods. Similarly to Ombao and Ho (2006), the technique employed to model the nonstationarity is the rescaling of time to the unit interval, but the difference is that model (1) has a factor structure. The time-varying covariance matrix Σ_P(t/T) of Y_P^T(t) in (1) is estimated consistently by smoothing the cross-products of the observations. Then, to estimate the loadings, the factors, and the common components of model (1), Motta et al. (2011) applied a time-varying (i.e., a time-localized version of) PCA to the estimator Σ̂_P(t/T) of Σ_P(t/T) in the time domain. One limitation of the above approaches is that they do not directly handle multivariate nonstationary time series data that are obtained repeatedly across several independent trials. In this article, we develop a novel model that builds on the ideas and framework of cutting-edge work on evolutionary factor analysis.


3. Extracting Evolutionary Principal Components from Replicated Time Series

In this section, we set up a model for high-dimensional EEG datasets observed over repeated trials. We assume that the observations Y(t, r) at time t and rth trial can be represented by a factor model of the form

Y_P(t, r) = X_P(t, r) + Z_P(t, r),  t = 1, ..., T, r = 1, ..., R,

where T is the sample size and R is the number of trials. The processes Y_P(t, r), X_P(t, r), and Z_P(t, r) are P × 1 vectors of observed time series, common components, and idiosyncratic errors, respectively. We allow the common component to have a reduced-rank structure with time-varying loadings. More precisely, following Motta et al. (2011), we assume that there exists a function Λ_P(u) defined on the unit interval u ∈ (0, 1) such that

X_P^T(t, r) = Λ_P(t/T) F(t, r),  (2)

where Λ is a P × q matrix of smoothly varying loadings, and where F is a zero-mean q × 1 vector of orthogonal common factors with variance Σ_F. Equation (2) implies that the common component X_P^T(t, r) is a sequence of stochastic processes whose structure depends on T, the sample size. As a consequence, Y_P^T(t, r) is also nonstationary, and the model is written as a sequence of models

Y_P^T(t, r) = X_P^T(t, r) + Z_P(t, r),  t = 1, ..., T, r = 1, ..., R.  (3)

Hence, defining Σ_P^Z = Var{Z_P(t, r)}, we can define, for all u ∈ (0, 1), a matrix-valued smooth function Σ_P(u) given by

Σ_P(u) = Σ_P^X(u) + Σ_P^Z,  (4)

where Σ_P^X(u) := Λ_P(u) Σ_F Λ_P′(u),  (5)

such that the covariance matrix of Y_P^T(t, r) can be defined as

Σ_P(t/T) := Var{Y_P^T(t)} = Σ_P^X(t/T) + Σ_P^Z,  t = 1, ..., T,

where Σ_P^X(t/T) := Λ_P(t/T) Σ_F Λ_P′(t/T) is the time-varying covariance matrix of the common components X_P^T(t, r), t = 1, ..., T, r = 1, ..., R. The loadings are estimated by the eigenvectors of an estimator of Σ_P(u). To estimate the loadings consistently, we need Σ_P(u) to be continuously differentiable. This is guaranteed by assuming that the entries of the P × q matrix of loadings Λ_P(u) are continuously differentiable in u. Moreover, to identify the number q of factors, we assume that for all u ∈ [0, 1], the P × q matrix of loadings Λ_P(u) has rank q.
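To make the model concrete, the following sketch simulates data from (3) with smooth loadings and the covariance structure (4)-(5). All numerical choices here (the sinusoidal loading curves, the noise scale, the dimensions) are illustrative assumptions, not part of the model.

```python
import numpy as np

rng = np.random.default_rng(0)
P, q, T, R = 12, 3, 256, 40  # channels, factors, time points, trials (illustrative)

# Smooth P x q loading curves Lambda_P(u) evaluated on rescaled time u = t/T.
u = (np.arange(T) + 1) / T
curves = np.stack([np.ones_like(u), np.sin(np.pi * u), np.cos(np.pi * u)])  # (q, T)
A = rng.normal(size=(P, q))
Lam = np.einsum('pj,jt->tpj', A, curves)  # Lam[t] = Lambda_P(t/T), shape (T, P, q)

# Stationary orthogonal factors with Sigma_F = I_q, and idiosyncratic noise
# with Sigma_Z = 0.25 * I_P (so its eigenvalues stay bounded in P).
F = rng.normal(size=(T, R, q))
Z = 0.5 * rng.normal(size=(T, R, P))
Y = np.einsum('tpj,trj->trp', Lam, F) + Z  # model (3), all R trials at once

# Time-varying covariance implied by (4)-(5) at a fixed time point t0:
t0 = T // 2
Sigma_t0 = Lam[t0] @ Lam[t0].T + 0.25 * np.eye(P)
```

By construction, the loadings are identical across the R trials, while the factors and noise are redrawn per trial, matching Remark 2 below.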

The covariance matrix Σ_P^Z of the idiosyncratic components is a sequence of covariance matrices with uniformly bounded eigenvalues; that is, sup_{P ∈ N} v_{1P}^Z < ∞, where v_{1P}^Z denotes the largest eigenvalue of Σ_P^Z. Moreover, the factor process F(t, r) and the idiosyncratic errors Z_P(t, r) are orthogonal at all leads, lags, and trials; that is, E{Z_P(t, r) F(t − k, s)′} = O_{P,q} for all r, s = 1, ..., R, all P ∈ N, and all t, k ∈ Z. A more detailed list of assumptions on our model is given in the Web Appendix. Under these assumptions, we say that the observations in (3) follow an evolutionary factor model (EFM).

Remark 1 (Evolutionary common components). The common components process X_P^T is a triangular array of P-dimensional random vectors whose structure depends not only on t, but also on T. In the locally stationary setting, letting T tend to infinity does not mean extending the data to the future anymore. In the rescaled-time framework, letting T tend to infinity means that we have in the sample X_P^T(1, r), ..., X_P^T(T, r) more and more 'observations' for each value of Λ(t/T). For local estimation at time t, this implies that increasing T is equivalent to increasing the sampling rate in a local neighborhood of t/T on the domain [0, 1].

Remark 2 (Stochastic versus deterministic). In our model, the stochastic processes Y_P(t, r), F(t, r), X_P(t, r), and Z_P(t, r) depend on the trial r, r = 1, ..., R, whereas the deterministic functions Λ_P(u), Σ_F, Σ_P(u), Σ_P^X(u), and Σ_P^Z do not.

Remark 3 (Stationary factors). The assumption that the process F(t, r) is stationary is not a serious constraint. For example, suppose that F(t, r) is a nonstationary factor process with time-varying representation F(t, r) = Ξ(t/T) η(t, r), where η(t, r) is a stationary orthonormal white noise process and Ξ(u) is a smooth function of time. Then the common component X(t, r) can be rewritten as X(t, r) = Λ(t/T) Ξ(t/T) η(t, r) = Λ̃(t/T) η(t, r), where Λ̃(u) := Λ(u) Ξ(u). Thus the process can be represented as in (2) with the factor process F(t, r) = η(t, r) satisfying the above assumptions.

Remark 4 (Loadings and covariances). It is well known that EEG signals exhibit high variability over trials, and that averaging over a large number of trials allows one to remove the noise and thus to recover the underlying signal. EEG recordings obtained following stimulus presentation have a low amplitude in comparison with the background activity. Consequently, they are barely visible in single trials. The usual way of improving the ratio between the signal power and the background EEG power, that is, the signal-to-noise ratio (SNR), is by averaging the response of several trials (see Chiappa 1997).

In this respect, the fact that the loadings do not depend on r is a crucial assumption. This implies that there is a dynamic behavior within a trial but that this dynamic behavior does not change from one trial to the next. This assumption is crucial for estimation (see later), as Λ is estimated from the average over trials of the estimated covariance matrices.
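The gain from trial averaging described above can be checked numerically. In this sketch (synthetic numbers, not the EEG data of this article), a low-amplitude evoked response buried in noise becomes visible after averaging R trials, with the residual noise standard deviation shrinking by roughly √R.

```python
import numpy as np

rng = np.random.default_rng(1)
T, R = 512, 118
t = np.arange(T) / T
signal = 0.5 * np.sin(2 * np.pi * 3 * t)    # hypothetical evoked response
noise = rng.normal(scale=2.0, size=(R, T))  # background activity, per trial
trials = signal + noise                      # single trials: SNR well below 1

avg = trials.mean(axis=0)                    # trial average
err_single = np.std(trials[0] - signal)      # roughly the noise scale, ~2
err_avg = np.std(avg - signal)               # shrinks by about sqrt(118)
```

With R = 118 trials, the residual error of the average is about an order of magnitude smaller than that of any single trial.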

3.1 Estimation Steps

In this section, we describe the estimation steps. Then, in Section 3.2, we derive the asymptotic properties of our estimators.

(1) At each time point, we estimate the covariance matrix by averaging over trials the cross-products of the observations. That is, the P × P pre-estimator Σ̂_P^T(t) of the covariance matrix Σ_P at time t is the average (over trials) of the P × P matrices {Y_P^T(t, r) Y_P^T(t, r)′}:

Σ̂_P^T(t) := (1/R) ∑_{r=1}^R Y_P^T(t, r) Y_P^T(t, r)′,  t = 1, ..., T.

Then, we define the estimator of Σ_P(u) in (4) at rescaled time u as the smoothed version (over rescaled time) of the estimates Σ̂_P^T(s) for those values s/T around u:

Σ̂_P^{TR}(u; h) := (1/T) ∑_{s=1}^T Σ̂_P^T(s) K_h(u − s/T),  u ∈ (0, 1),  (6)

where K_h(·) := (1/h) K(·/h) is the rescaled version of a second-order kernel, and h ≡ h_{TR} is a sequence of smoothing bandwidths that tends to zero as T, R → ∞.

(2) Before estimating the P × q matrix of loadings, we need to determine the number q of factors, that is, the number of columns of the estimated matrix of loadings. The estimation of q is based on the P × P diagonal matrix V̂_P^{TR}(u) containing the P eigenvalues of Σ̂_P^{TR}(u). This will be detailed in Section 4, where we apply our methodology to the data.

(3) Extract the P × q matrix of eigenvectors Λ̂_P^{TR}(u) corresponding to the q largest eigenvalues of the P × P matrix Σ̂_P^{TR}(u), collected in the q × q diagonal matrix V̂_q^{PTR}(u):

Σ̂_P^{TR}(u) Λ̂_P^{TR}(u) = Λ̂_P^{TR}(u) V̂_q^{PTR}(u),  u ∈ (0, 1).  (7)

The matrix V̂_q^{PTR}(u) is the upper-left q × q corner of the P × P matrix V̂_P^{TR}(u). Although V̂_q^{PTR}(u) is of finite size (q × q), its asymptotic behavior depends on P.

(4) Define the q × 1 principal components at time t for the rth trial as the projection of the data Y_P^T(t, r) at time t and trial r on the orthonormal eigenvectors Λ̂_P^{TR}(t/T):

F̂^{PTR}(t, r) = (1/P) Λ̂_P^{TR}(t/T)′ Y_P^T(t, r).  (8)

(5) Define the estimated common components as the product estimator

X̂_P^{TR}(t, r) = Λ̂_P^{TR}(t/T) F̂^{PTR}(t, r).
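Steps (1)-(5) can be sketched directly in code. This is a minimal illustration of the estimator, not the authors' implementation: the kernel, the per-time eigendecomposition, and the normalization Λ̂′Λ̂/P = I_q follow (6)-(8), but boundary corrections and the data-driven choice of q and h (see Remark 5 and Section 4) are omitted.

```python
import numpy as np

def efm_estimate(Y, q, h):
    """Sketch of steps (1)-(5). Y has shape (T, R, P); returns the estimated
    loadings (T, P, q), factors (T, R, q), and common components (T, R, P)."""
    T, R, P = Y.shape
    # Step (1): per-time covariance, averaged over trials.
    S_t = np.einsum('trp,trq->tpq', Y, Y) / R                # (T, P, P)
    # Kernel smoothing over rescaled time, eq. (6), Epanechnikov kernel.
    u = np.arange(1, T + 1) / T
    def K(x):
        return np.where(np.abs(x) <= 1, 0.75 * (1 - x**2), 0.0)
    W = K((u[:, None] - u[None, :]) / h) / h                 # (T, T) weights
    S_u = np.einsum('ts,spq->tpq', W, S_t) / T               # smoothed Sigma(u)
    # Steps (2)-(3): eigendecomposition at each time; keep the top q.
    Lam = np.empty((T, P, q))
    for t in range(T):
        vals, vecs = np.linalg.eigh(S_u[t])                  # ascending order
        idx = np.argsort(vals)[::-1][:q]
        Lam[t] = np.sqrt(P) * vecs[:, idx]  # normalized so Lam'Lam / P = I_q
    # Step (4): principal components, eq. (8).
    F = np.einsum('tpq,trp->trq', Lam, Y) / P
    # Step (5): estimated common components.
    X = np.einsum('tpq,trq->trp', Lam, F)
    return Lam, F, X
```

For instance, with Y built from the replicated trials, `Lam, F, X = efm_estimate(Y, q=3, h=0.1)` returns the estimated loadings, factors, and common components at every time point; the bandwidth value is an arbitrary placeholder here.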

Remark 5 (Bandwidth and kernel selection). The estimator Σ̂_P(u; h) in (6) depends on the bandwidth sequence h. In our application to EEG data, the bandwidth h was selected adaptively from the data using the local plug-in algorithm of Gasser, Kneip, and Kohler (1991) and Brockmann, Gasser, and Herrmann (1993). The technical computation of the bandwidth-selection procedure is described in Herrmann (1997). The basic idea of plug-in estimation is to obtain a large-sample approximation to the mean integrated squared error (MISE) of the estimator of the entries σ_{ij}(u; h) of Σ̂_P(u; h); then to minimize the resulting analytical expression with respect to h to obtain the asymptotically optimal bandwidth; and finally to replace the unknown terms in this bandwidth by their estimators. The smoother we used is the Epanechnikov kernel, which is of the form K(x) = (3/4)(1 − x²) 1{|x| ≤ 1}. Among all nonnegative kernels with compact support, this kernel minimizes the asymptotic MISE of Σ̂_P(u; h).
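As a quick numerical sanity check (illustration only), the Epanechnikov kernel above integrates to one and has zero first moment, the two defining properties of a second-order kernel:

```python
import numpy as np

def epanechnikov(x):
    """K(x) = 0.75 * (1 - x^2) on |x| <= 1, zero elsewhere."""
    x = np.asarray(x, dtype=float)
    return np.where(np.abs(x) <= 1.0, 0.75 * (1.0 - x * x), 0.0)

# Riemann-sum check of the second-order kernel properties on a fine grid.
x = np.linspace(-1.0, 1.0, 200001)
dx = x[1] - x[0]
mass = epanechnikov(x).sum() * dx                 # integral of K(x):   ~1
first_moment = (x * epanechnikov(x)).sum() * dx   # integral of x K(x): ~0
```

The rescaled version K_h(x) = (1/h) K(x/h) used in (6) inherits unit mass for any bandwidth h > 0.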

3.2 Asymptotic Estimation Theory

In this section, we study the asymptotic properties of the estimators defined in Section 3.1. In particular, we focus on the estimators Σ̂_P^{TR}(u), V̂_q^{PTR}(u), and Λ̂_P^{TR}(t/T). The properties of these three estimators are highlighted in Sections 3.2.1, 3.2.2, and 3.2.3, respectively, while the formal results and their proofs are given in the Appendix and the Web Appendix, respectively. The properties of the estimators F̂^{PTR}(t, r) and X̂_P^{TR}(t, r) are highlighted in Section 3.2.4 and detailed in the Web Appendix.

3.2.1 Evolutionary covariance. The first result is about the consistency of our nonparametric estimator of the time-varying covariance Σ_P(u). In Proposition 1 (see Appendix), we show that the estimator Σ̂_P^{TR}(u; h) converges to Σ_P(u) as T and R tend to infinity. We emphasize that, for the consistency of the estimator Σ̂_P^{TR}(u; h), we only need T → ∞, whereas the divergence of the number of trials R is only assumed to improve even further the quality of the estimates.

Remark 6 (Increased rate of convergence with multiple trials). The result in Proposition 1 shows that the possibility of observing a collection of time series generated from the same experiment (multiple trials) increases the speed of convergence of Σ̂ to Σ by √R. As a consequence, we do not need to smooth too much within a trial, because we have many trials and thus we essentially smooth by averaging over trials. By not smoothing too much, we control bias and preserve high resolution in rescaled time. In the Web Appendix we show that if h_T is the optimal bandwidth for the single-trial estimator, then the optimal bandwidth for Σ̂_P^{TR} is h_{TR} = R^{−1/5} h_T. Hence the multi-trial situation helps us to get estimates which are better localized in time.

The estimation of the covariance matrix is the first step toward estimating factor models, and it is from this estimator that all the other estimators are derived. The consistency of the estimators of the number of factors and of the loadings depends on, respectively, the consistency of the eigenvalues and the eigenvectors of Σ̂_P^{TR}(u). The eigenvalues are used to estimate the number of factors, whereas the eigenvectors are estimates of the evolutionary loadings. The orthonormal eigenvectors are such that

P^{−1} Λ̂_P^{TR}(u)′ Λ̂_P^{TR}(u) = I_q for all u ∈ (0, 1);  (9)

then by (7) we can rewrite the q × q matrix of estimated eigenvalues as

(1/P) V̂_q^{PTR}(u) = P^{−1} Λ̂_P^{TR}(u)′ {(1/P) Σ̂_P^{TR}(u)} Λ̂_P^{TR}(u).  (10)

In Section 3.2.2 below, we illustrate two important properties of the matrix V̂_q^{PTR}(u) in (10). First, the size of this matrix is an estimate of the number of factors. Second, its rescaled version (1/P) V̂_q^{PTR}(u) converges to a well-defined matrix as T and P tend to infinity.


Remark 7 (Double asymptotics). The asymptotic results presented in Sections 3.2.2–3.2.4 below (and detailed in the Appendix and the Web Appendix) hold for P, T → ∞. We use the concept of double asymptotics, where both the cross-section size and the sample size go to infinity simultaneously. We apply the techniques in Motta et al. (2011) to our context, where we need to systematically integrate common information across all replicated trials. We emphasize that for the consistency of the estimators we only need P, T → ∞, whereas the divergence of the number of trials R is only assumed to improve even further the quality of the estimates. We also emphasize that P, T, and R are allowed to grow to infinity without any restriction. For ease of presentation, from now on we will suppress the dependency of the estimators on the bandwidth sequence h_{TR}.

3.2.2 Evolutionary eigenvalues. Let $v_{1P}(u) \ge v_{2P}(u) \ge \cdots \ge v_{PP}(u)$ be the set of $P$ eigenvalues of $\Sigma_P(u)$, the evolutionary covariance matrix of the observations. In Proposition 2 (see Appendix) we show that, under the assumptions of our model, only the largest $q$ eigenvalues $v_{1P}(u), v_{2P}(u), \ldots, v_{qP}(u)$ diverge as $P$ increases, while the remaining $P - q$ stay bounded. This is an important property that we utilize to estimate the number of factors, because the eigenvalues of $\widehat{\Sigma}_P(u)$ are estimates of the eigenvalues of $\Sigma_P(u)$. Analogously, let $\widehat{v}_{1P}(u) \ge \widehat{v}_{2P}(u) \ge \cdots \ge \widehat{v}_{PP}(u)$ be the set of $P$ eigenvalues, in decreasing order, of $\widehat{\Sigma}_P(u)$. Estimating $q$ is equivalent to fixing the size of the matrix $\widehat{V}^{PTR}_q(u) = \mathrm{diag}\{\widehat{v}_{1P}(u), \widehat{v}_{2P}(u), \ldots, \widehat{v}_{qP}(u)\}$. In Section 4.2.2, we consider a sequence of estimated evolutionary covariances and look at the number $q$ of eigenvalues that diverge as the size of the estimated covariance increases.

In Proposition 3 (see Appendix) we show that the matrix $\frac{1}{P}\widehat{V}^{PTR}_q(u)$ converges to $V(u)$, a well-defined $q$-dimensional diagonal matrix. To give an idea of the proof of Proposition 3, we can decompose the overall error $\|\frac{1}{P}\widehat{V}^{PTR}_q(u) - V(u)\|$ as follows:

(i) estimation error: $\|\frac{1}{P}\widehat{V}^{PTR}_q(u) - \frac{1}{P}V^{P}_q(u)\|$;
(ii) approximation error: $\|\frac{1}{P}V^{P}_q(u) - V(u)\|$.

The estimation error in (i) depends on the difference between the largest $q$ eigenvalues of $\frac{1}{P}\widehat{\Sigma}_P(u)$ and the largest $q$ eigenvalues of $\frac{1}{P}\Sigma_P(u)$, collected in the matrix $\frac{1}{P}V^{P}_q(u)$. The fact that it tends to zero follows from the consistency of the estimator $\widehat{\Sigma}^{TR}_P(u)$. In contrast to the stochastic estimation error, the approximation error in (ii) is purely deterministic; it depends only on $P$ (neither on $T$ nor on $R$) and comes from the approximate factor structure of our model, namely the assumption of uniformly bounded eigenvalues.

The next result concerns the asymptotic behavior of the time-varying eigenvectors.

3.2.3 Evolutionary eigenvectors. Analogously to the stationary case, the loadings can only be estimated up to a transformation, because only the product $\Sigma^{X}_P(u) = \Lambda_P(u)\,\Sigma_F\,\Lambda_P(u)'$ is identifiable.

In our Theorem 1 (see Appendix), we show that the $P \times q$ matrix of estimated loadings $\widehat{\Lambda}^{TR}_P(u)$ converges to a linear transformation $\Lambda_P(u)H(u)$ of the true loading matrix $\Lambda_P(u)$, where $H(u)$ is a $q \times q$ invertible matrix.

3.2.4 Estimated factors and common components. Finally, in the Web Appendix we also prove the consistency of the estimated factors and common components. In particular, we show that the vector $\widehat{F}^{PTR}(t, r)$ of estimated factors converges to $H(\frac{t}{T})^{-1} F(t, r)$, where $H(\frac{t}{T})^{-1}$ is the inverse of $H(\frac{t}{T})$. Moreover, the estimated common component $\widehat{X}^{TR}_i(t) := \widehat{\lambda}^{TR}_i(\frac{t}{T})'\,\widehat{F}^{TR}(t, r)$ converges in probability to $X^{T}_i(t, r)$, the common component of the $i$th series at time $t$. Note that, unlike the estimators of $\Lambda_P(\frac{t}{T})$ or $F(t, r)$, the estimated common component is identified, because the indeterminacy of $\Lambda_P(\frac{t}{T})$ and $F(t, r)$ due to the $q \times q$ transformation matrix $H(\frac{t}{T})$ cancels out in the product between $\widehat{\lambda}^{TR}_i(\frac{t}{T})'$ (which converges to $\lambda_i(\frac{t}{T})' H(\frac{t}{T})$) and $\widehat{F}^{TR}(t, r)$ (which converges to $H(\frac{t}{T})^{-1} F(t, r)$).

Remark 8 (Asymptotic framework with bounded number of channel locations). Our results are based on assumptions that reflect empirical observations on the EEG data, which are well represented by an approximate factor model where: (i) a large $P$ is needed to recover the common structure (cross-section direction), whereas (ii) a large $R$ is useful to improve the quality of the estimates (time direction) and for inference. Principal components analysis is a dimension-reduction technique that finds "a natural justification" when (i) the data are thought of as being generated according to an approximate factor model, and (ii) $P$ is large. A framework with bounded $P$ would be justified if the data showed strong evidence for a reduced-rank structure of the covariance of the observations, that is, if the $P \times P$ covariance matrix of the data $\Sigma_P$ had $q$ nonzero eigenvalues and $P - q$ eigenvalues that are exactly zero, rather than bounded. This is rarely observed in empirical applications. For example, in our application we find that $\widehat{\Sigma}_P(r)$, the $62 \times 62$ sample covariance of the observations, has rank 62 for all $r = 1, \ldots, 118$. The case in which $\Sigma_P$ (the covariance matrix of $Y$) has rank $q$ can be well represented by $Y \equiv X = \Lambda F$ or, put differently, $Z \equiv 0$. In this case the approximation error is identically zero for any $P$, and the $q$ eigenvectors corresponding to the $q$ largest eigenvalues of $\Sigma_P$ consistently estimate (up to transformation) $\Lambda$ for any $P$.

4. Analysis of the Multichannel EEG Data

4.1 Data Description and Preprocessing
We analyzed the multichannel EEG data, collected in a visual–motor experiment, using our novel EFM. The EEG signals were recorded over a montage of P = 62 channels (see Figure 1) at a sampling rate of 512 Hz. A band-pass filter of 0.02–100 Hz was applied before analysis. The subject was required to make quick displacements of a hand-held joystick from a central position, either to the right or to the left of center, as instructed by a visual cue. The visual cue was randomly selected for each trial. There were a total of R = 118 trials for each of the two conditions. Each trial had a length of T = 512 time points recorded over a period of 1000 milliseconds, covering (−500, 500) milliseconds, where the reference point of 0 is the time of presentation of the visual cue. For each trial, both linear and quadratic trends were removed and the EEGs were further filtered using a 4th-order low-pass Butterworth filter with stopband at 60 Hz. From the observed time series, we estimated the time-varying covariances in Figure 1 (bottom-right) according to the



Figure 2. Left-top: 62 time-varying eigenvalues contained in the $P \times P$ diagonal matrix $\widehat{V}_P(u)$, $u \in (0, 1)$. Left-bottom: time-varying trace ratio $\widehat{\rho}_j(u)$, $u \in (0, 1)$, $j = 1, \ldots, q$, defined in (11). The lowest curve is $\widehat{\rho}_1(u)$, the curve in the middle is $\widehat{\rho}_2(u)$, and the highest curve is $\widehat{\rho}_3(u)$. Because $\widehat{\rho}_3(u) \ge 0.75$ for all $u$, we select $q = 3$ factors. Right: sequences of eigenvalues of $\widehat{V}_p$, the average over time of $\widehat{V}_p(u)$, obtained from the estimated time-varying covariances $\widehat{\Sigma}_p(u)$, $p = 5, \ldots, P = 62$. There are $q = 3$ diverging eigenvalues. We report only the eigenvalues for the left condition; the shapes of the eigenvalues for the right condition are very similar to those reported in this figure.

methodology described in Section 3.1. To estimate the loadings and the common components, we need to select the number of factors.
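The preprocessing described in Section 4.1 (quadratic detrending followed by a 4th-order low-pass Butterworth filter) can be sketched as follows; the toy 10 Hz signal, the least-squares detrending, and the zero-phase `filtfilt` call are our illustrative choices, not the authors' exact pipeline:

```python
import numpy as np
from scipy import signal

fs = 512                                   # sampling rate in Hz
t = np.arange(512) / fs                    # one trial: T = 512 time points

# Toy single-channel trial: 10 Hz oscillation plus linear/quadratic trend
# and a little noise (illustrative stand-in for a real EEG trace).
rng = np.random.default_rng(2)
y = (np.sin(2 * np.pi * 10 * t)
     + 0.5 * t + 0.3 * t**2
     + 0.1 * rng.normal(size=t.size))

# Remove linear and quadratic trends by a least-squares quadratic fit.
coef = np.polynomial.polynomial.polyfit(t, y, deg=2)
y_detrended = y - np.polynomial.polynomial.polyval(t, coef)

# 4th-order low-pass Butterworth filter with cutoff at 60 Hz; zero-phase
# filtering via filtfilt is an assumption, not stated in the paper.
b, a = signal.butter(4, 60, btype="low", fs=fs)
y_filtered = signal.filtfilt(b, a, y_detrended)
```

After these two steps the slow trend is gone while the 10 Hz component, which lies well below the 60 Hz cutoff, is preserved.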

4.2 Determining the Number of Factors
To estimate the number of factors, we adopted two criteria, both based on the estimated evolutionary eigenvalues $\widehat{V}_P(u)$ of the evolutionary covariance $\widehat{\Sigma}_P(u)$. The first criterion is based on the explained variance, whereas the second defines the estimated number of factors as the number of diverging eigenvalues of the covariance matrix.

4.2.1 Time-varying trace ratio. The plots in Figure 2 suggest that there are a total of three primary underlying brain factors involved in visual–motor processing. To have a relative measure of explained variance, we define the time-varying trace ratio as
\[
\widehat{\rho}_j(u) := \frac{\sum_{k=1}^{j} \widehat{v}_k(u)}{\sum_{i=1}^{P} \widehat{v}_i(u)}, \qquad j = 1, \ldots, q = 3, \tag{11}
\]

which is plotted on the bottom-left side of Figure 2. The plots suggest that three factors explain more than 75% of the global variability. These three factors appear to be consistent with recording the relevant neural processes for these visual–motor actions in Bedard and Sanes (2009). These electrode sites include the frontal leads (FC) to measure activity related to premotor processing, the central leads (C) to measure activity related to motor performance, and the parietal (P) and occipital (O) leads to measure activity related to visual–motor transformations.
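The trace-ratio criterion (11) amounts to a cumulative-variance threshold applied uniformly over time. A minimal sketch with hypothetical eigenvalue curves (the toy eigenvalues and the 0.75 threshold applied at every time point are illustrative assumptions):

```python
import numpy as np

rng = np.random.default_rng(3)
P, n_u = 62, 100

# Hypothetical time-varying eigenvalues vhat_k(u) on a grid of n_u points:
# sorted in decreasing order, with the first three made dominant.
eigvals = np.sort(rng.uniform(0.1, 1.0, size=(n_u, P)), axis=1)[:, ::-1]
eigvals[:, :3] *= 50

# Time-varying trace ratio rho_j(u) of eq. (11): cumulative variance share.
rho = np.cumsum(eigvals, axis=1) / eigvals.sum(axis=1, keepdims=True)

# Smallest q whose ratio exceeds 0.75 at every time point.
q_hat = int(np.argmax(rho.min(axis=0) >= 0.75)) + 1
```

With three dominant eigenvalue curves, the rule selects `q_hat = 3`, mirroring the choice made from Figure 2.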

4.2.2 Diverging eigenvalues of the evolutionary covariance. To estimate the number of factors $q$, we apply the result of Proposition 2 in the Appendix (see also Section 3.2.2 above).

We computed, for a grid of time points, the evolutionary covariance matrix estimator $\widehat{\Sigma}_P(u)$ in (6). Then we computed the eigenvalues of the upper-left $p \times p$ submatrices $\widehat{\Sigma}_p(u)$, $p = 1, \ldots, P$. On the right side of Figure 2, we report the plot of the average over rescaled time of the eigenvalues $\widehat{v}_{jp}(u)$ of $\widehat{\Sigma}_p(u)$, $j = 1, \ldots, p$, $p = 1, \ldots, P$. The horizontal axis indicates the number of cross-sectional units $p$, which is of course largest when the whole sample $P = 62$ is considered. There are $q = 3$ diverging eigenvalues, whereas all the others stay bounded as $P$ increases.
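The diverging-eigenvalue diagnostic of Section 4.2.2 can be sketched on a toy covariance with a hypothetical three-factor part (the matrix `Sigma` and all sizes are illustrative, not the EEG estimates):

```python
import numpy as np

rng = np.random.default_rng(4)
P, q = 62, 3

# Toy covariance with a rank-q factor part plus identity noise.
Lam = rng.normal(size=(P, q))
Sigma = Lam @ Lam.T + np.eye(P)

# Eigenvalues of the upper-left p x p submatrices, p = 5, ..., P.
# Under the factor structure, the largest q eigenvalues grow with p,
# while the (q+1)-st stays bounded (here it equals 1 exactly).
tops = []
for p in range(5, P + 1):
    ev = np.linalg.eigvalsh(Sigma[:p, :p])[::-1]   # descending order
    tops.append(ev[:q + 1])
tops = np.array(tops)                              # shape (58, q+1)
```

Plotting the columns of `tops` against `p` reproduces the qualitative pattern on the right side of Figure 2: three curves that keep growing and a fourth that stays flat.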

4.3 Spatio-temporal Factor Loadings
Because we know the exact location of the 62 electrodes, we can draw conclusions about the locations of the corresponding sixty-two underlying (time-varying) loadings. It is in this sense that the loadings are (temporally and) spatially varying. Some typical time-varying loadings of the factors



Figure 3. Estimated time-varying loadings over $q = 3$ factors. The entries are the functions $|\widehat{\lambda}_{ij}(u)|$ for $i = 18, 23$, $j = 1, 2, 3$, $u \in (0, 1)$. Bold lines: $j = 1$; solid lines: $j = 2$; dashed lines: $j = 3$. Left: left move; right: right move. Top panels: location 18; bottom panels: location 23.

associated with the left and right movements are displayed in Figure 3. The absolute values of the weights were then collapsed into four distinct time periods: (−500, 0) milliseconds; (0, 50) milliseconds; (50, 250) milliseconds; and (250, 500) milliseconds. In Figure 4, we plot the resulting spatially and temporally varying loadings. As the loadings corresponding to the first factor are constant over time, we focus on Factors 2 and 3. Moreover, we only plot the two central time periods, that is, (0, 50) milliseconds and (50, 250) milliseconds. The first period corresponds to the prestimulus interval, the 500 milliseconds before the visual stimulus was presented; the second corresponds to the very short period immediately following stimulus presentation; the third and fourth correspond to the post-stimulus response, with the fourth period typically capturing the moment when the participant actually responded by moving the joystick. Very often, event-related potentials display interesting patterns at about 100 and 200 milliseconds poststimulus. Studies have reported the N100 and P200 components (or the N1–P2 complex) that appear following auditory, somatosensory, and visual stimuli (see Warnke, Remschmidt, and Hennighausen 1994). Thus, the third time period corresponds to processing the instruction presented. Hand movement, in response to the stimulus, takes place around 300–350 milliseconds post-stimulus presentation.

The weights corresponding to factor 1 appear to be uniformly distributed over the scalp topography. This suggests that factor 1 corresponds to a process that engages the entire brain cortex. Moreover, the weights visually appear to be almost constant over time, suggesting that factor 1 can be interpreted as corresponding to "baseline" brain activity that is persistent in most cognitive tasks. These spatial distributions on the scalp are consistent across the leftward and rightward movements, suggesting that they represent a stable physiological process. However, our formal test detected some statistically significant differences between the leftward and rightward movements, although these differences appear mild relative to those captured by factors 2 and 3. While still a matter of debate within the neuroscience community, this factor can be interpreted as a "default" brain connectivity. Here, we note that our model was able to objectively extract this feature from the data. This could perhaps serve as an addition to a growing list of empirical evidence for the default network.

In contrast, the weights for factor 2 appear to be concentrated in the temporal and fronto-central regions and provide connections between the left and right brain hemispheres. This very interesting result suggests contra-lateral connectivity, or co-activation, associated with this visuo–motor task. Our model was able to capture this activation in the cortical regions that are primarily involved in visual and motor processing.

Figure 4. Spatio-temporal factor loadings. The first row shows the factor loadings that weight the second factor, while the second row shows those weighting the third factor. The first two columns correspond to the left condition (time periods (0, 50) and (50, 250) milliseconds), and the last two to the right condition (same periods). This figure appears in color in the electronic version of this article.

The weights for factor 3 are concentrated on regions that were not implicated by factor 2 and capture the anterior–posterior pathways, in contrast to the contra-lateral inter-hemispheric pathway captured by factor 2. These results are also corroborated by data from other neuroimaging modalities, which we now describe. Several empirical studies suggest functional changes in the contra-lateral primary motor cortex, the supplementary motor area (SMA), and the premotor cortex of both hemispheres. However, to the best of our knowledge, this is the first analysis that has objectively demonstrated and confirmed these results using factor analysis of time series data. This is consistent with the analysis of the experiments of Roland et al. (1980), which show that the SMA is a higher-order supra-motor center involved in the generation and programming of complex movements. We also note convergent findings from electrical stimulation and lesion studies in animals suggesting that voluntary motor control is hierarchically organized within the cortex (see Goldberg 1985; Wise 1985).

We also formally tested for the differences between the loadings for the rightward versus leftward movements using the Wilcoxon rank sum test. In the same spirit as in Figure 4, the weights were collapsed into four distinct time periods (the same as before). More precisely, let $\widehat{\lambda}^{\mathrm{left}}_{ij}(t, r)$ and $\widehat{\lambda}^{\mathrm{right}}_{ij}(t, r)$ be the loadings corresponding to location $i$, factor $j$, period $t$, trial $r$, and the left and right directions, respectively (with $i = 1, \ldots, P = 62$; $j = 1, 2, 3$; $t = 1, 2, 3, 4$; $r = 1, \ldots, R = 118$). Then, for each $i$, $j$, and $t$ we tested the two-sided hypothesis that the (absolute values of the) $R \times 1$ vectors $\widehat{\lambda}^{\mathrm{left}}_{ij}(t) = \{\widehat{\lambda}^{\mathrm{left}}_{ij}(t, 1), \ldots, \widehat{\lambda}^{\mathrm{left}}_{ij}(t, R)\}'$ and $\widehat{\lambda}^{\mathrm{right}}_{ij}(t) = \{\widehat{\lambda}^{\mathrm{right}}_{ij}(t, 1), \ldots, \widehat{\lambda}^{\mathrm{right}}_{ij}(t, R)\}'$ come from distributions with equal medians. We applied the test both individually at the significance level $\alpha = 0.05$ and simultaneously (a multiple comparison), using Bonferroni's correction, at the level $\alpha/P$. The two sets of data are assumed to come from continuous distributions that are identical except possibly for a location shift, but are otherwise arbitrary.

Table 1
Number of locations where the differences between left loadings and right loadings are statistically significant, according to the Wilcoxon rank sum test, at the significance level $\alpha$ (in brackets are the number of locations where the differences between left loadings and right loadings are statistically significant at the level $\alpha/P$)

Period       Factor 1   Factor 2   Factor 3
[−500, 0]    12 (0)      2 (0)     13 (0)
[0, 50]       0 (0)      2 (1)      2 (0)
[50, 250]    26 (15)    23 (4)     10 (1)
[250, 500]    0 (0)     21 (8)     21 (0)
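The per-location testing scheme (a Wilcoxon rank sum test at each of the $P$ locations, with and without Bonferroni correction) can be sketched as follows; the simulated "loadings," the five shifted locations, and all parameters are hypothetical stand-ins for the trial-wise loading estimates:

```python
import numpy as np
from scipy.stats import ranksums

rng = np.random.default_rng(5)
P, R, alpha = 62, 118, 0.05

# Hypothetical |loadings| over R trials at P locations, left vs right
# condition; the first five locations get a genuine median shift.
left = rng.normal(1.0, 0.3, size=(P, R))
right = rng.normal(1.0, 0.3, size=(P, R))
right[:5] += 0.5

# Wilcoxon rank sum test per location, counted individually at level alpha
# and simultaneously at the Bonferroni-corrected level alpha / P.
pvals = np.array([ranksums(left[i], right[i]).pvalue for i in range(P)])
n_sig = int(np.sum(pvals < alpha))
n_sig_bonf = int(np.sum(pvals < alpha / P))
```

The pair `(n_sig, n_sig_bonf)` corresponds to the two counts reported in each cell of Table 1 (the unbracketed and bracketed numbers, respectively).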

The results are reported in Tables 1 and 2. The number of locations $i$ where the differences between $|\widehat{\lambda}^{\mathrm{left}}_{ij}(t)|$ and $|\widehat{\lambda}^{\mathrm{right}}_{ij}(t)|$ are statistically significant is reported in Table 1. In Table 2 we report, only for those locations $i$ where the test rejects the null of equal distributions between the left and right conditions, three quantiles of the distributions of the $R \times 1$ vectors $\widehat{\delta}_{ij}(t) = \{|\widehat{\lambda}^{\mathrm{right}}_{ij}(t, 1)| - |\widehat{\lambda}^{\mathrm{left}}_{ij}(t, 1)|, \ldots, |\widehat{\lambda}^{\mathrm{right}}_{ij}(t, R)| - |\widehat{\lambda}^{\mathrm{left}}_{ij}(t, R)|\}'$.

Table 1 indicates that the most significant differences between the leftward and rightward movements appear after the stimulus. For this reason, we show in Table 2 only the results for the intervals 50–250 and 250–500 milliseconds.

In the interval 50–250 milliseconds poststimulus presentation, the factor 2 loadings at the left anterior channels were larger for the rightward condition. Interestingly, on the same anterior channels, the loadings for the leftward condition were actually larger. For the contra-lateral pathway, it appears that



Table 2
Indexes of the locations where the right loadings are significantly different from the left loadings (see Table 1). For these indexes, we report three quantiles of the distributions of the differences between right loadings and left loadings (in bold are the locations where the differences between left loadings and right loadings are statistically significant at the level $\alpha/P$).

                 Factor 1                     Factor 2                     Factor 3
Period       i   Q0.1   Q0.5   Q0.9      i   Q0.1   Q0.5   Q0.9      i   Q0.1   Q0.5   Q0.9
[50,250]     1  −0.51   0.11   0.72      3  −0.79   0.36   1.31      6  −1.06  −0.24   0.93
             3  −0.60  −0.07   0.28      4  −1.44  −0.34   0.99     12  −1.23  −0.16   0.66
             4  −0.39   0.17   0.65      6  −0.84   0.36   1.29     13  −1.12  −0.20   0.62
             5  −0.53   0.19   0.75      7  −1.21   0.45   1.62     14  −1.19  −0.20   1.01
             6  −0.68  −0.08   0.39      8  −1.35  −0.19   1.12     22  −1.03  −0.17   0.51
             7  −0.75  −0.24   0.25      9  −1.22  −0.24   0.89     24  −1.02  −0.11   0.74
             8  −0.35   0.32   0.77     10  −1.05  −0.23   0.72     25  −1.19  −0.19   0.95
             9  −0.42   0.25   0.74     13  −0.75   0.11   1.12     41  −0.84   0.19   1.14
            10  −0.45   0.18   0.72     14  −0.84   0.23   1.12     61  −0.93  −0.15   0.76
            14  −0.69  −0.10   0.45     15  −1.02   0.29   1.16     62  −1.12  −0.18   0.68
            15  −0.80  −0.20   0.29     18  −1.24  −0.19   0.87
            16  −0.97  −0.26   0.20     19  −0.90  −0.23   0.70
            17  −0.36   0.24   0.92     23  −0.72   0.16   1.02
            18  −0.23   0.15   0.73     25  −1.27  −0.11   0.84
            19  −0.37   0.11   0.55     27  −1.00  −0.15   0.81
            23  −0.51  −0.08   0.43     29  −0.81  −0.13   0.53
            24  −0.61  −0.18   0.24     33  −0.98  −0.21   0.61
            25  −0.69  −0.21   0.29     34  −1.28  −0.29   0.58
            26  −0.38   0.12   0.61     42  −1.02  −0.18   0.65
            27  −0.31   0.08   0.55     43  −1.15  −0.30   0.71
            28  −0.30   0.06   0.40     44  −0.55   0.02   1.04
            33  −0.45  −0.08   0.34     45  −0.70   0.12   0.88
            34  −0.50  −0.14   0.39     48  −0.59   0.16   0.89
            35  −0.40   0.12   0.41
            43  −0.51  −0.13   0.44
            44  −0.44   0.05   0.47

[250,500]                                1  −1.05  −0.17   0.60      6  −1.24  −0.16   0.85
                                         3  −0.63   0.28   1.29      9  −1.08   0.12   1.45
                                         4  −1.57  −0.43   0.59     10  −0.90   0.23   1.10
                                         5  −1.14  −0.22   0.80     14  −1.13  −0.28   0.81
                                         6  −0.67   0.32   1.38     15  −1.41  −0.21   0.84
                                         7  −1.05   0.37   1.45     16  −1.61  −0.22   1.06
                                         8  −1.94  −0.39   0.70     17  −1.29   0.30   1.79
                                         9  −1.51  −0.44   0.69     18  −1.06   0.39   1.33
                                        10  −1.11  −0.18   0.62     19  −0.89   0.16   0.99
                                        12  −0.80   0.17   0.98     23  −1.01  −0.09   0.70
                                        13  −0.56   0.21   1.07     24  −1.07  −0.27   0.84
                                        14  −0.71   0.43   1.28     25  −1.18  −0.22   1.10
                                        15  −0.86   0.24   1.30     27  −0.92   0.28   1.11
                                        17  −1.41  −0.14   0.84     32  −0.88   0.16   1.25
                                        18  −1.24  −0.21   0.68     38  −0.84  −0.17   0.46
                                        22  −0.61   0.16   1.09     39  −1.12  −0.16   0.79
                                        23  −0.64   0.27   1.07     41  −0.78   0.26   1.23
                                        25  −1.23  −0.23   0.92     42  −0.76   0.10   1.16
                                        31  −0.53   0.18   0.61     47  −0.94  −0.14   0.62
                                        34  −1.21  −0.16   0.83     50  −0.61   0.13   0.85
                                        43  −1.08  −0.22   0.70     57  −0.75   0.10   0.94

the factor 2 loadings are dominantly larger for the leftward condition. The most interesting differences were captured by factor 2 in the interval 250–500 milliseconds poststimulus presentation. The largest differences were observed on the left temporal–parietal channels, where the loadings for the leftward motion were larger than for the rightward motion. These results are reported only for a single subject. However, the model is promising and can be extended to analyzing multivariate time series data from multiple subjects. In terms of the temporal dynamics of the brain process, or the evolution of the weights over time, we note that factors 1 and 2 for both conditions are more or less stationary, whereas the nonstationarity is mostly captured by factor 3. We observe that in both conditions the weights are concentrated mostly on the fronto-parietal regions starting from 50 milliseconds poststimulus presentation.

5. Conclusions
The contribution of our article is threefold: modeling, estimation, and application. First, in terms of modeling, we developed a novel statistical model for nonstationary multivariate time series that systematically integrates common information across (i) channels within a trial, and (ii) several trials from an experimental design.

Second, we developed estimation theory for our multitrial EFM. In particular, we showed that the estimated loading matrix converges to a linear transformation of the true loading matrix. To prove the consistency of the estimators, we only needed the number of channels and the time series length per trial to tend to infinity, whereas increasing the number of trials R was demonstrated to further improve the quality of the estimates.

Finally, we provided an application of our proposed model to investigate the brain network involved in a visual–motor task using EEGs. There are several potential applications of our model. It can be used to test for differences in brain connectivity by directly comparing the spatio-temporal loadings. It can be further generalized in a clinical setting (with several patient groups) to determine differences in brain networks between groups.

We note that in neuroscience investigations, several imaging modalities are utilized to study brain processes, such as functional magnetic resonance imaging (fMRI), magnetoencephalography, and EEG. Each of these modalities has its own advantages and limitations. On the one hand, fMRI offers excellent spatial resolution (on the order of 1 mm³). However, it has poor temporal resolution, because images are recorded only every 1000–2500 milliseconds; thus, fMRI does not provide adequate information on the brain's temporal properties. On the other hand, EEGs have excellent temporal resolution (with one observation per 2–4 milliseconds). However, EEGs are not highly localized in space (offering information on the cm³ rather than the mm³ scale). EEGs continue to be used in many studies because they capture transient properties of brain signals. While this current work is an important starting point for studying brain processes, future work will entail including all available spatial information (obtained from anatomical constraints and functional activation results from fMRI) directly in our EFM.

6. Summary
In this article, we developed a novel statistical model for nonstationary multivariate time series recorded over replicated trials. Our model systematically integrates common information across (i) channels within a trial and (ii) several trials from an experimental design. Estimation theory, based on time-varying PCA, is rigorously developed for our multitrial EFM. We prove consistency of the proposed nonparametric estimators of the (deterministic) time-varying covariance matrix and time-varying loadings. We also show the consistency of the estimators of the (stochastic) factors and common components. To investigate the underlying brain network implicated in this particular visual–motor task, we applied our model to multichannel EEG data recorded over replicated trials.

7. Supplementary Materials
Web Appendices, referenced in Section 3 and in the Appendix, are available with this article at the Biometrics website on Wiley Online Library.

Acknowledgements

This research was supported by a Marie Curie Intra European Fellowship within the 7th European Community Framework Programme (G.M.) and by the National Science Foundation (H.O.).

References

Bai, J. (2003). Inferential theory for factor models of large dimensions. Econometrica 71, 135–171.

Bedard, P. and Sanes, J. N. (2009). Gaze and hand position effects on finger-movement-related human brain activation. Journal of Neurophysiology 101, 834–842.

Brockmann, M., Gasser, T., and Herrmann, E. (1993). Locally adaptive bandwidth choice for kernel regression estimators. Journal of the American Statistical Association 88, 1302–1309.

Chiappa, K. H. (1997). Evoked Potentials in Clinical Medicine. Philadelphia: Lippincott-Raven.

Dahlhaus, R. (1997). Fitting time series models to nonstationary processes. The Annals of Statistics 25, 1–37.

Gasser, T., Kneip, A., and Kohler, W. (1991). A flexible and fast method for automatic smoothing. Journal of the American Statistical Association 86, 643–652.

Goldberg, G. (1985). Supplementary motor area structure and function: Review and hypothesis. Behavioral and Brain Sciences 8, 567–588.

Herrmann, E. (1997). Local bandwidth choice in kernel regression estimation. Journal of Computational and Graphical Statistics 6, 35–54.

Motta, G., Hafner, C., and von Sachs, R. (2011). Locally stationary factor models: Identification and nonparametric estimation. Econometric Theory 27, 1279–1319.

Ombao, H. and Ho, R. M. (2006). Time-dependent frequency domain principal components analysis of multichannel non-stationary signals. Computational Statistics and Data Analysis 50, 2339–2360.

Ombao, H., von Sachs, R., and Guo, W. (2005). SLEX analysis of multivariate nonstationary time series. Journal of the American Statistical Association 100, 519–531.

Prado, R., West, M., and Krystal, A. (2001). Multichannel electroencephalographic analyses via dynamic regression models with time-varying lag-lead structure. Applied Statistics 50, 95–109.

Roland, P., Larsen, B., Lassen, N. A., and Skinhoj, E. (1980). Supplementary motor area and other cortical areas in organization of voluntary movements in man. Journal of Neurophysiology 43, 118–136.

Warnke, A., Remschmidt, H., and Hennighausen, K. (1994). Verbal information processing in dyslexia: Data from a follow-up experiment of neuro-psychological aspects and EEG. Acta Paedopsychiatrica 56, 203–208.

Wise, S. (1985). The primate premotor cortex: past, present, and preparatory. Annual Review of Neuroscience 8, 1–19.

Received December 2010. Revised December 2011. Accepted December 2011.

Appendix

In this appendix, we present the consistency results for the estimated evolutionary covariance (Proposition 1), and show the properties of the evolutionary eigenvalues (Propositions 2 and 3) and eigenvectors (Theorem 1). The list of assumptions on our model and the proofs are provided in the Web Appendix.

The first result is about the consistency of our nonparametric estimator of the time-varying covariance $\Sigma_P(u)$. The consistency of the estimators of the factors and the loadings depends on the consistency of the eigenvalues (see Proposition 3) and the eigenvectors (see Theorem 1) of $\widehat{\Sigma}^{TR}_P(u)$. For clarity of presentation, in what follows we set $h \equiv h_{TR}$.

Proposition 1: Under Assumptions 1–2,
\[
\sup_{u \in (0,1)} \alpha(R, T)\,\frac{1}{P}\,\big\| \widehat{\Sigma}^{TR}_P(u; h) - \Sigma_P(u) \big\| = O_p(1),
\]
where $\alpha(R, T) = \min\big(\sqrt{RTh},\, h^{-1}\sqrt{R}\big)$, and where $\widehat{\Sigma}^{TR}_P(u; h)$ and $\Sigma_P(u)$ are defined, respectively, in (6) and (4). The proof is given in Web Appendix A.
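Proposition 1 concerns the estimator $\widehat{\Sigma}^{TR}_P(u; h)$ of (6), which is not reproduced in this excerpt. Purely as an illustration of the general recipe (average outer products across the $R$ trials, then smooth over rescaled time $t/T$ with a kernel of bandwidth $h$), here is a sketch; the Epanechnikov kernel, the toy data, and all dimensions are our assumptions and may differ from the exact weights in (6):

```python
import numpy as np

rng = np.random.default_rng(7)
P, T, R, h = 6, 200, 30, 0.1

# Toy replicated series whose variance grows smoothly within each trial.
u_grid = np.arange(T) / T
scale = 1.0 + u_grid
Y = rng.normal(size=(R, T, P)) * scale[None, :, None]

# Trial-averaged outer products at each time point.
outer = np.einsum("rtp,rtq->tpq", Y, Y) / R

def sigma_hat(u):
    """Kernel-smoothed, trial-averaged covariance at rescaled time u
    (Epanechnikov kernel of bandwidth h; illustrative only)."""
    z = (u_grid - u) / h
    w = np.where(np.abs(z) <= 1.0, 0.75 * (1.0 - z**2), 0.0)
    return np.einsum("t,tpq->pq", w / w.sum(), outer)

S_early, S_late = sigma_hat(0.2), sigma_hat(0.8)
```

Because the toy variance grows within the trial, the estimate at $u = 0.8$ has a visibly larger trace than the one at $u = 0.2$, which is exactly the kind of time variation the evolutionary estimator is designed to track.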

The following proposition shows that, under our assumptions, only $q$ eigenvalues of the evolutionary covariance matrix of the observations diverge as $P$ increases, while the remaining $P - q$ stay bounded. The proof is given in Web Appendix B.

Proposition 2: Under Assumptions 1–2, the first $q$ eigenvalues of $\Sigma_P(u)$ diverge, as $P \to \infty$, uniformly over $u \in [0, 1]$:
\[
\lim_{P \to \infty} \inf_{u \in [0,1]} v_{Pj}(u) = \infty \quad \text{for all } j = 1, \ldots, q,
\]
whereas the remaining eigenvalues are uniformly bounded by $\bar{v}$, that is,
\[
\limsup_{P \to \infty} \sup_{u \in [0,1]} v_{Pj}(u) \le \bar{v} \quad \text{for all } j > q.
\]

Proposition 3: Under Assumptions 1–2,
\[
\sup_{u \in (0,1)} \beta(R, T, P)\,\Big\| \frac{1}{P}\widehat{V}^{PTR}_q(u; h) - V(u) \Big\| = O_p(1),
\]
where $\beta(R, T, P) = \min\big(\sqrt{RTh},\, h^{-1}\sqrt{R},\, P,\, \delta_P^{-1}\big)$, $\widehat{V}^{PTR}_q(u)$ is defined in (10), $V(u)$ is the $q$-dimensional diagonal matrix containing the eigenvalues of $\Sigma_\Lambda(u)\Sigma_F$, and where
\[
\delta_P := \sup_{u \in (0,1)} \Big\| \frac{\Lambda_P(u)'\Lambda_P(u)}{P} - \Sigma_\Lambda(u) \Big\|. \tag{A.1}
\]
The proof of this proposition is similar to the proof of Proposition 2 in Motta et al. (2011) and is thus omitted. The next result shows that the scaled norm of the distance between the estimated loading matrix and a linear transformation of the true loading matrix converges to zero in probability. The proof of Theorem 1 is given in Web Appendix C.

Theorem 1: Under Assumptions 1–2,
\[
\sup_{u \in (0,1)} \gamma(R, T, P)\,\frac{1}{\sqrt{P}}\,\big\| \widehat{\Lambda}^{TR}_P(u; h) - \Lambda_P(u)H(u) \big\| = O_p(1), \tag{A.2}
\]
where $\gamma(R, T, P) = \min\big(\sqrt{RTh},\, h^{-1}\sqrt{R},\, \sqrt{P},\, \delta_P^{-1}\big)$, $\widehat{\Lambda}^{TR}_P(u; h)$ is defined in (7), $H(u) := \{\Sigma_F\}^{1/2}\,\Upsilon(u)\,\{V(u)\}^{-1/2}$, and where $\Upsilon(u)$ is the $q \times q$ matrix containing the orthonormal eigenvectors of the $q \times q$ matrix $\{\Sigma_F\}^{1/2}\,\Sigma_\Lambda(u)\,\{\Sigma_F\}^{1/2}$.