Bayesian M/EEG source reconstruction with spatio-temporal priors


www.elsevier.com/locate/ynimg

ARTICLE IN PRESS

NeuroImage xx (2007) xxx–xxx


Nelson J. Trujillo-Barreto (a,⁎), Eduardo Aubert-Vázquez (a) and William D. Penny (b)

(a) Brain Dynamics Department, Cuban Neuroscience Centre, P.O. Box 6412/6414, Ave. 25, Esq. 158, No. 15202, Cubanacán, Playa, Havana, Cuba
(b) Wellcome Department of Imaging Neuroscience, UCL, London, UK

Received 9 March 2007; revised 30 June 2007; accepted 27 July 2007

This article proposes a Bayesian spatio-temporal model for source reconstruction of M/EEG data. The usual two-level probabilistic model implicit in most distributed source solutions is extended by adding a third level which describes the temporal evolution of neuronal current sources using time-domain General Linear Models (GLMs). These comprise a set of temporal basis functions which are used to describe event-related M/EEG responses. This places M/EEG analysis in a statistical framework that is very similar to that used for PET and fMRI. The experimental design can be coded in a design matrix, effects of interest characterized using contrasts and inferences made using posterior probability maps. Importantly, as is the case for single-subject fMRI analysis, trials are treated as fixed effects and the approach takes into account between-trial variance, allowing valid inferences to be made on single-subject data. The proposed probabilistic model is efficiently inverted by using the Variational Bayes framework under a convenient mean-field approximation (VB-GLM). The new method is tested with biophysically realistic simulated data and the results are compared to those obtained with traditional spatial approaches like the popular Low Resolution Electromagnetic Tomography (LORETA) and minimum variance Beamformer. Finally, the VB-GLM approach is used to analyze an EEG data set from a face processing experiment.
© 2007 Elsevier Inc. All rights reserved.

Keywords: M/EEG source localization; Spatio-temporal priors; GLM; Bayesian models; Variational Bayes; Ensemble learning

Introduction

This article describes a model-based spatio-temporal deconvolution method for M/EEG source reconstruction. The underlying forward or “generative” model incorporates two mappings. The first specifies a time-domain General Linear Model (GLM) at each point in source space. This relates effects of interest at each generator to source activity at that generator. This is identical to the “mass-univariate” approach that is widely used in the analysis of fMRI (Frackowiak et al., 2003). Additionally, effects of interest are constrained to be similar at nearby generators through use of a spatial prior. The second mapping relates source activity to sensor activity at each time point using the usual spatial-domain lead-field matrix.

⁎ Corresponding author. Fax: +53 7 208 6707. E-mail address: [email protected] (N.J. Trujillo-Barreto). Available online on ScienceDirect (www.sciencedirect.com).

1053-8119/$ - see front matter © 2007 Elsevier Inc. All rights reserved. doi:10.1016/j.neuroimage.2007.07.062

There are two potential benefits of the approach. First, as we will show, the use of temporal (as well as spatial) priors can result in more accurate source reconstructions. This may allow signals to be found that cannot otherwise be detected. Second, it provides an analysis framework for M/EEG that is very similar to that used in PET and fMRI. The experimental design can be coded in a design matrix, the model fitted to data, and various effects of interest can be characterized using “contrasts” (Frackowiak et al., 2003). These effects can then be tested for statistically using posterior probability maps (PPMs), as described in Friston and Penny (2003). Importantly, the model does not need to be refitted to test for multiple experimental effects that are potentially present in any single data set. Source parameters are estimated once only using a spatio-temporal deconvolution rather than separately for each temporal component of interest.

The new method is to be contrasted with approaches which follow a single-pass serial processing strategy in which either (i) spatial processing first proceeds to create estimates at each source location and then temporal models are applied at these “virtual depth electrodes” (Darvas et al., 2004; Kiebel and Friston, 2004; Brookes et al., 2004), or (ii) time-series methods are applied in sensor space to identify components of interest using, e.g., time windowing (Rugg and Coles, 1995) or time–frequency estimation (Durka et al., 2005), and source reconstructions are then based on these components. The algorithm we propose comprises a multiple-pass strategy in which temporal and spatial parameter estimates are improved iteratively to provide an optimized and mutually constrained solution.

The new algorithm is similar to existing distributed source solutions in employing spatial priors but differs from the standard generative models implicit in source reconstruction by having an additional level that embodies temporal priors. The spatial prior we use is the spatial Laplacian employed in, for example, Low Resolution Electromagnetic Tomography (LORETA) (Pascual-Marqui et al., 1994). This uses an L2-norm, which embodies a belief that sources are diffuse and highly distributed. These are to be contrasted with priors based on L1-norms (Fuchs et al., 1999), Lp-norms (Auranen et al., 2005), Variable Resolution Electromagnetic Tomography (VARETA) (Valdés-Sosa et al., 2000), models with multiple priors (Mattout et al., 2006) or models employing Bayesian Model Averaging (BMA) (Trujillo-Barreto et al., 2004), which can accommodate more focal sources. In this paper, we use a single Laplacian spatial prior, as it is the simplest available and because we want to focus on the benefit of using temporal priors in addition to spatial priors. In the future, we envisage augmenting the approach with more flexible spatial priors.

The use of both spatial and temporal constraints is not unique within the source reconstruction community. Indeed, there have been a number of approaches that also make use of temporal priors. Baillet and Garnero (1997), in addition to considering edge-preserving spatial priors, have proposed temporal priors that penalize quadratic differences between neighboring time points. Schmidt et al. (2000) have extended their dipole-like modelling approach using a temporal correlation prior which encourages activity at neighboring latencies to be correlated. Similarly, Daunizeau et al. (2005) propose magnitude priors and temporal smoothness priors based on second derivatives. Galka et al. (2004) have proposed a spatio-temporal Kalman filtering approach which is implemented using linear autoregressive models with neighborhood relations. This work has been extended by Yamashita et al. (2004), who have developed a “Dynamic LORETA” algorithm in which the Kalman filtering step is approximated using a recursive penalized least squares solution. The algorithm is, however, computationally costly, taking several hours to estimate sources in even low-resolution source spaces. Compared to these approaches, our algorithm perhaps embodies stronger dynamic constraints. However, the computational simplicity of fitting GLMs, allied to the efficiency of our inference procedure, results in a relatively fast algorithm. Moreover, the GLM can accommodate damped sinusoidal and wavelet approaches that are ideal for modelling the transient and nonstationary responses in M/EEG.

The manuscript is organized as follows. In the Theory section, we describe the model and relate it to the existing literature on distributed solutions. The success of the approach rests on our ability to characterize neuronal responses, and task-related differences in them, using GLMs. We describe how this can be implemented for the analysis of evoked responses and show how the model can be inverted to produce source estimates using Variational Bayes (VB). In the Results section, the framework is applied to simulated data and data from an EEG study of face processing.

Methods

Notation

Bold and regular lowercase variable names denote vectors and scalars, respectively. Bold uppercase names denote matrices with dimensions denoted by regular uppercase names. By convention, all vectors are assumed to be column vectors; whether a vector corresponds to a row or a column of a matrix is denoted by using a dot (“·”) as a subscript indicating the non-singleton dimension. That is, $x_{i\cdot}$ ($x_{\cdot i}$) is a column vector containing the elements of the ith row (column) of matrix X. In what follows, N(x; μ, Σ) denotes a multivariate normal density over x, having mean μ and covariance Σ. The precision of a Gaussian variate is the inverse (co)variance. A gamma density over the scalar random variable x is written as Ga(x; a, b). We also use $\|x\|^2 = x^T x$, denote the trace operator as tr(X), use diag(x) to denote a diagonal matrix with diagonal entries given by the vector x and the symbol ⊗ for Kronecker’s product.

Probabilistic generative model

The aim of the M/EEG inverse problem (or source reconstruction) is to estimate the primary current density (PCD) J from M/EEG measurements Y. If we have m = 1,…,M sensors, g = 1,…,G generators, r = 1,…,R trials (repetitions) and t = 1,…,T time bins, then J and Y are multivariate time series of dimensions G×RT and M×RT, respectively. In order to keep notation simple, we will first describe a single-trial model (R = 1) and then generalize to the multiple-trial case.

The applications in this paper use a cortical source space in which PCD orientations are constrained to be perpendicular to the gray/white matter interface. Each entry in J therefore corresponds to the scalar value (magnitude and sign) of the PCD vector at a particular location and time point. This is related to sensor measurements by solving the forward problem (FP) of M/EEG, which uses Maxwell’s equations governing electromagnetic fields (Baillet et al., 2001).

Because measurements always have attached uncertainties, it is natural to take a probabilistic approach. In this case, we are not interested in a particular solution, but in the ensemble of possible solutions. That is, one always starts with a probability distribution representing a priori information, and the use of observations narrows this distribution. The solution of the inverse problem is not a particular model but the (posterior) probability distribution over the model space.

Most established distributed source reconstruction or “imaging” methods (Darvas et al., 2004) implicitly rely on the following hierarchical model

$$
\begin{aligned}
Y &= KJ + E \\
J &= Z
\end{aligned}
\tag{1}
$$

in which random fluctuations E correspond to sensor noise and source activity J is generated by random innovations Z. Here we have assumed that the signal at the sensors has been averaged over trials to give the ERP Y. This corresponds to the two-level probabilistic generative model (PGM)

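$$
\begin{aligned}
p(Y \mid J, \Omega) &= \prod_{t=1}^{T} N(y_{\cdot t};\, K j_{\cdot t},\, \Omega^{-1}) \\
p(J \mid \alpha) &= \prod_{t=1}^{T} N(j_{\cdot t};\, 0,\, \alpha^{-1} D^{-1})
\end{aligned}
\tag{2}
$$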

also shown schematically in Fig. 1, where $j_{\cdot t}$ and $y_{\cdot t}$ are the source and sensor column vectors at time t and $\Omega^{-1}$ is the sensor noise covariance. The matrix D reflects the choice of spatial prior and α is a spatial precision variable.
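For fixed Ω and α, the two-level model in Eq. (2) is linear and Gaussian, so the posterior mean of the sources has the standard closed form $(K^T \Omega K + \alpha D)^{-1} K^T \Omega Y$. The following is a minimal sketch of that estimate, not the implementation used in this paper. The function name, the toy dimensions, the random stand-in for the lead field and the identity placeholder for D are all assumptions made for illustration.

```python
import numpy as np

def two_level_posterior_mean(Y, K, Omega, D, alpha):
    """Posterior mean of J under the two-level model, for known Omega and alpha."""
    # The posterior precision is the same at every time point.
    precision = K.T @ Omega @ K + alpha * D
    # Solve precision @ J = K^T Omega Y for all columns (time points) at once.
    return np.linalg.solve(precision, K.T @ Omega @ Y)

# Toy problem with hypothetical sizes: 8 sensors, 20 generators, 5 time bins.
rng = np.random.default_rng(0)
M, G, T = 8, 20, 5
K = rng.standard_normal((M, G))   # stand-in lead field, not a real head model
Omega = 4.0 * np.eye(M)           # sensor noise precision
D = np.eye(G)                     # placeholder spatial prior matrix
alpha = 2.0
Y = rng.standard_normal((M, T))

J_hat = two_level_posterior_mean(Y, K, Omega, D, alpha)
print(J_hat.shape)
```

Solving the linear system directly avoids forming an explicit inverse, which matters once G reaches the thousands of generators typical of cortical source spaces.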

Our approach is then based on the following three-level model

$$
\begin{aligned}
Y &= KJ + E \\
J^T &= XW + Z \\
W &= R
\end{aligned}
\tag{3}
$$




Fig. 1. Graphical representation of the probabilistic generative model implicit in most distributed source reconstruction methods. It comprises two levels. The second level specifies a spatial prior on the primary current density and the first level incorporates the observation equation given by the EEG forward model.

Fig. 2. Graphical representation of the proposed probabilistic generative model. It comprises three levels. The third level specifies a spatial prior on the regression coefficients of the temporal GLM proposed for the primary current density. This temporal model is specified in the second level, whereas the first level encodes the observation equation given by the EEG forward model.


Here we have random innovations Z which are “temporal errors,” i.e., lack of fit of the temporal model, and R which are “spatial errors,” i.e., lack of fit of a spatial model. In this case, the spatial models are simply zero mean Gaussians with covariances $\alpha_k^{-1} D_k^{-1}$. That is, before observing data we believe that W = R, i.e., that the regression coefficients are given by a random variation with zero mean and spatial regularity $\alpha_k^{-1} D_k^{-1}$. This belief will be updated after observing our M/EEG data. We can regard XW as an empirical prior on the expectation of source activity.

The above equations can be re-expressed as the probabilistic generative model

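$$
p(Y \mid J, \Omega) = \prod_{t=1}^{T} N(y_{\cdot t};\, K j_{\cdot t},\, \Omega^{-1}) \tag{4}
$$

$$
p(J \mid W, \Lambda) = \prod_{t=1}^{T} N(j_{\cdot t}^{T};\, x_{t\cdot} W,\, \Lambda^{-1}) \tag{5}
$$

$$
p(W \mid \alpha) = \prod_{k=1}^{K} N(w_{k\cdot}^{T};\, 0,\, \alpha_k^{-1} D_k^{-1}) \tag{6}
$$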

The first level, Eq. (4), is identical to the standard model. In the second level, however, source activity at each generator is constrained using a T×K matrix of temporal basis functions, X. The PGM is shown schematically in Fig. 2.

The precision of the source noise is given by Λ. In this paper, Λ = diag(λ), where the diagonal element $\lambda_g$ is the noise precision at the gth generator. That is, event-related source activity is described by the time-domain GLM and remaining source activity will correspond to unmodelled responses. The quantity $\Lambda^{-1}$ can therefore be thought of as the variance of spontaneous and/or induced activity in source space. The regression coefficients W determine the weighting of the temporal basis functions.

The third level of the model is a spatial prior that reflects our prior uncertainty about W. The kth row of W, $w_{k\cdot}$, is a map of regression coefficients in source space. It provides a generator-specific weighting of the kth column of the design matrix, i.e., of the kth putative experimental effect or temporal basis function. Each regression coefficient map is constrained by setting $D_k$ to correspond to the usual L2-norm spatial prior. The spatial prior that is usually on the PCD now appears at a superordinate level. Different choices of $D_k$ result in different weights and different neighborhood relations. This lends the model a higher degree of flexibility by allowing the different effects to be assigned different spatial priors.
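To make the three levels concrete, the sketch below draws one synthetic data set by sampling Eqs. (6), (5) and (4) in that order. It is an illustration under stated assumptions rather than the simulation procedure of this paper: the function name `sample_three_level` is hypothetical, the sensor noise precision is taken to be diagonal with entries `sigma`, every $D_k$ is assumed positive definite so that a Cholesky factor exists, and the lead field and design matrix are random stand-ins.

```python
import numpy as np

def sample_three_level(K, X, D_list, alpha, lam, sigma, rng):
    """Draw (Y, J, W) from the three-level model with diagonal noise precisions."""
    M, G = K.shape
    T, n_basis = X.shape
    # Level 3: row k of W is a spatial map with precision alpha[k] * D_list[k].
    W = np.empty((n_basis, G))
    for k in range(n_basis):
        chol = np.linalg.cholesky(alpha[k] * D_list[k])  # precision = chol @ chol.T
        W[k] = np.linalg.solve(chol.T, rng.standard_normal(G))
    # Level 2: J^T = X W + Z, where generator g has noise precision lam[g].
    Z = rng.standard_normal((T, G)) / np.sqrt(lam)
    J = (X @ W + Z).T
    # Level 1: Y = K J + E, where sensor m has noise precision sigma[m].
    E = rng.standard_normal((M, T)) / np.sqrt(sigma)[:, None]
    return K @ J + E, J, W

# Toy dimensions (assumed): 4 sensors, 6 generators, 10 time bins, 2 basis functions.
rng = np.random.default_rng(1)
M, G, T, n_basis = 4, 6, 10, 2
K = rng.standard_normal((M, G))          # stand-in lead field
X = rng.standard_normal((T, n_basis))    # stand-in temporal basis set
D_list = [np.eye(G)] * n_basis           # placeholder spatial prior matrices
Y, J, W = sample_three_level(K, X, D_list, alpha=np.ones(n_basis),
                             lam=np.full(G, 10.0), sigma=np.full(M, 100.0), rng=rng)
print(Y.shape, J.shape, W.shape)
```

Sampling top-down in this way mirrors the direction of the arrows in the graphical model: W shapes the expected source time courses, and the sources in turn generate the sensor data.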

The applications in this paper use $D_k = D = L^T L$, where L is a discrete surface Laplacian as defined by Huiskamp (1991), which implements second-order differences on geodesic distances. The parameter $\alpha_k$ then controls the spatial smoothness of the kth map $w_{k\cdot}$. This is important because it allows different response components to have different spatial characteristics, e.g., response components with longer time scales may be more spatially diffuse (Buzsaki and Draguhn, 2004).
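The effect of a prior of the form $D = L^T L$ can be illustrated on a much simpler geometry than the cortical surface. The sketch below is an assumption-laden toy: it replaces the surface Laplacian of Huiskamp (1991) with a second-difference operator on a closed ring of generators (the helper `ring_laplacian` is hypothetical) and compares the quadratic penalty $w^T D w = \|Lw\|^2$ for a smooth and a rapidly oscillating map.

```python
import numpy as np

def ring_laplacian(G):
    """Second-difference operator on a closed ring of G generators (toy geometry)."""
    I = np.eye(G)
    # Row i holds 2 on the diagonal and -1 at the two ring neighbours of i.
    return 2 * I - np.roll(I, 1, axis=1) - np.roll(I, -1, axis=1)

G = 64
L = ring_laplacian(G)
D = L.T @ L                       # spatial prior matrix of the form D = L^T L

angle = 2 * np.pi * np.arange(G) / G
smooth = np.cos(angle)            # one slow cycle around the ring
rough = np.cos(16 * angle)        # sixteen cycles around the ring

penalty_smooth = smooth @ D @ smooth
penalty_rough = rough @ D @ rough
print(penalty_smooth, penalty_rough)
```

Because the Gaussian prior density decreases with $\alpha_k\, w^T D w$, the rough map is far less probable a priori than the smooth one, and increasing $\alpha_k$ sharpens that preference. Note that this toy D is singular (a constant map incurs no penalty), so it defines an improper prior on its own.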

The first level of the model assumes that there is Gaussian sensor noise $e_{\cdot t}$, with zero mean and covariance $\Omega^{-1}$. This covariance can be estimated from prestimulus or baseline periods when such data are available (Sahani and Nagarajan, 2004). Alternatively, we assume that Ω = diag(σ) where the mth element of σ is the noise precision on the mth sensor, and provide a scheme for estimating $\sigma_m$, should this be necessary. For this, we also place conjugate Gamma priors on the precision variables σ, λ and α

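$$
\begin{aligned}
p(\sigma) &= \prod_{m=1}^{M} Ga(\sigma_m;\, b_{\sigma_m},\, c_{\sigma_m}) \\
p(\lambda) &= \prod_{g=1}^{G} Ga(\lambda_g;\, b_{\lambda_g},\, c_{\lambda_g}) \\
p(\alpha) &= \prod_{k=1}^{K} Ga(\alpha_k;\, b_{\alpha_k},\, c_{\alpha_k})
\end{aligned}
\tag{7}
$$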

This allows the inclusion of further prior information into the source localization. For example, instead of using baseline periods to estimate a full covariance matrix $\Omega^{-1}$, we could use these data to estimate the noise variance at each sensor. This information could then be used to set $b_{\sigma_m}$ and $c_{\sigma_m}$, allowing noise estimates during periods of interest to be constrained softly by those from baseline periods. Similarly, we may wish to enforce stronger or weaker spatial regularization on $w_{k\cdot}$ by setting $b_{\alpha_k}$ and $c_{\alpha_k}$ appropriately. The applications in this paper, however, use uninformative gamma priors by setting all scale and shape parameters in (7) to 1000 and 0.001 (mean 1 and variance 1000), respectively. This means that σ, λ and α will be estimated solely from the data Y.
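The quoted mean and variance can be checked directly. The short derivation below assumes that in Ga(x; b, c) the first parameter b is the scale and the second parameter c is the shape, which is the reading suggested by the ordering "scale and shape ... 1000 and 0.001"; the explicit density is written out only to fix that convention.

```latex
\begin{aligned}
Ga(x;\, b,\, c) &= \frac{1}{\Gamma(c)\, b^{c}}\; x^{c-1} \exp\!\left(-\frac{x}{b}\right), \qquad x > 0 \\
\mathrm{E}[x] &= b\,c = 1000 \times 0.001 = 1 \\
\mathrm{Var}[x] &= b^{2} c = 1000^{2} \times 0.001 = 1000
\end{aligned}
```

A standard deviation of about 32 around a mean of 1 makes the prior very broad relative to its mean, which is why the precisions are effectively determined by the data.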

In summary, the addition of the superordinate level to our generative model induces a partitioning of source activity into signal and noise. This empirical Bayes perspective means that the conditional estimates of source activity J are subject to “bottom-up” constraints provided by the data, and “top-down” predictions from the third level of our model. We will use this heuristic later to understand the update equations used to estimate source activity.

Temporal models

The usefulness of the present spatio-temporal approach rests on our ability to characterize neuronal responses using GLMs. Fortunately, there is a large literature that suggests this is possible. The type of temporal model necessary will depend on the M/EEG response one is interested in. These components could be (i) single trials, (ii) evoked components (steady-state or ERPs; Rugg and Coles, 1995) or (iii) induced components (Tallon-Baudry et al., 1996). In this paper we focus on single trials and ERPs, leaving steady-state and induced components as the subject of future publications.

The basis functions will form columns in the GLM design matrix, X (see Eq. (3) and Fig. 2). Basis functions could be derived from damped sinusoids (Demiralp et al., 1998) or principal components (Trejo and Shensa, 1999; Friston et al., 2006) but in this paper we use a wavelet representation. That is, given an M/EEG signal, f

$$
f = \sum_{k=1}^{K} w_k x_k \tag{8}
$$

where $x_k$ are wavelet basis functions and $w_k$ are wavelet coefficients. Wavelets are derived by translating and dilating a mother wavelet and provide a tiling of time–frequency space that gives a balance between time and frequency resolution. The Q-factor of a filter or basis function is defined as the central frequency to bandwidth ratio. Wavelet bases are chosen to provide constant Q (Unser and Aldroubi, 1996). This makes them good models of nonstationary signals, such as ERPs and induced EEG components (Tallon-Baudry et al., 1996). If K = T, then the mapping f→w is referred to as a wavelet transform, and for K > T we have an overcomplete basis set. More typically, we have K ≤ T.

In the ERP literature, the particular subset of basis functions used is chosen according to the type of ERP component one wishes to model. Popular choices are wavelets based on B-splines (Unser and Aldroubi, 1996). In statistics, however, it is well known that an appropriate subset of basis functions can be automatically selected using a procedure known as “wavelet shrinkage” or “wavelet denoising.” This relies on the property that natural signals such as images, speech or neuronal activity can be represented using a sparse code comprising just a few large wavelet coefficients. Gaussian noise signals, however, produce Gaussian noise in wavelet space. This comprises a full set of wavelet coefficients whose size depends on the noise variance. By “shrinking” these noise coefficients to zero using a thresholding procedure (Donoho and Johnstone, 1994; Clyde et al., 1998), and transforming back into signal space, one can denoise data. This amounts to defining a temporal model. We will use this approach for the empirical work reported in this paper. We also note that it is possible to incorporate the wavelet shrinkage methods into the probabilistic generative model by modifying Eq. (7). This has been implemented for models of fMRI data with spatial wavelet priors (Flandin and Penny, 2007). In this paper, however, wavelet shrinkage is implemented outside of the model by using a standard thresholding procedure (Donoho and Johnstone, 1994).
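The shrinkage idea can be sketched in a few lines. The example below is illustrative only and makes several assumptions that the text does not specify: it uses the Haar wavelet, soft thresholding, the universal threshold $\hat{\sigma}\sqrt{2 \log T}$, and a noise level estimated from the median absolute value of the finest-scale coefficients. The signal length must be a power of two, and all function names are hypothetical.

```python
import numpy as np

def haar_forward(signal):
    """Orthonormal Haar transform of a signal whose length is a power of two."""
    approx = np.asarray(signal, dtype=float)
    details = []
    while approx.size > 1:
        even, odd = approx[0::2], approx[1::2]
        details.append((even - odd) / np.sqrt(2))   # detail coefficients
        approx = (even + odd) / np.sqrt(2)          # coarser approximation
    return approx, details                           # details run fine to coarse

def haar_inverse(approx, details):
    """Invert haar_forward by rebuilding the signal from coarse to fine scales."""
    for d in reversed(details):
        out = np.empty(2 * d.size)
        out[0::2] = (approx + d) / np.sqrt(2)
        out[1::2] = (approx - d) / np.sqrt(2)
        approx = out
    return approx

def wavelet_shrink(signal):
    """Denoise by soft-thresholding the detail coefficients."""
    approx, details = haar_forward(signal)
    noise_sd = np.median(np.abs(details[0])) / 0.6745      # robust noise estimate
    thresh = noise_sd * np.sqrt(2 * np.log(len(signal)))   # universal threshold
    shrunk = [np.sign(d) * np.maximum(np.abs(d) - thresh, 0.0) for d in details]
    return haar_inverse(approx, shrunk)

# Toy "evoked response": a boxcar buried in Gaussian noise.
rng = np.random.default_rng(3)
T = 256
clean = np.zeros(T)
clean[64:128] = 2.0
noisy = clean + 0.3 * rng.standard_normal(T)
denoised = wavelet_shrink(noisy)
print(np.sum((noisy - clean) ** 2), np.sum((denoised - clean) ** 2))
```

The coefficients that survive thresholding identify which basis functions carry signal, and in that sense the procedure selects the columns that would enter the design matrix X.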

Multiple-trials model

When considering R independent trials or repetitions, our PGM can be written as

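$$
\begin{aligned}
p(Y \mid J, \Omega) &= \prod_{r=1}^{R} \prod_{t=1}^{T} N(y_{\cdot tr};\, K j_{\cdot tr},\, \Omega^{-1}) \\
p(J \mid W, \Lambda) &= \prod_{r=1}^{R} \prod_{t=1}^{T} N(j_{\cdot tr}^{T};\, x_{rt\cdot} W,\, \Lambda^{-1}) \\
p(W \mid \alpha) &= \prod_{k=1}^{K} N(w_{k\cdot}^{T};\, 0,\, \alpha_k^{-1} D_k^{-1})
\end{aligned}
\tag{9}
$$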

where Y (M×RT) and J (G×RT) are multivariate time series obtained by concatenating all trials of the measured ERP and the estimated PCD, respectively; and $x_{rt\cdot}$ is the K×1 vector of regressors for the tth time bin and the rth trial. Here we have assumed that the effect of interest W is the same in all trials. This treats trials as fixed rather than random effects, as is the case for standard analyses of single-subject fMRI data (Frackowiak et al., 2003). Thus, the multiple-trial design matrix $\tilde{X}$ in this case is constructed by block repeating our design matrix for a single trial

$$
\tilde{X} = 1_R \otimes X \tag{10}
$$

where $1_R$ denotes a column vector of ones with length R. Note that the multiple-trial PGM and the corresponding hierarchical model have the same form as for the single-trial case, if we just use $X = \tilde{X}$ in Eq. (3), and take the index t to run over time and across trials in Eqs. (4) and (5) (t = 1,…,RT). Thus, for simplicity, we will keep the same notation as before. In summary, multiple trials are treated by forming concatenated data and design matrices.
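The block repetition in Eq. (10) maps directly onto a Kronecker product. The snippet below is a small sketch with made-up dimensions; the function name `multi_trial_design` and the toy matrix are assumptions for illustration.

```python
import numpy as np

def multi_trial_design(X, R):
    """Stack R copies of the single-trial design matrix X (T x K) vertically."""
    # kron with a column of ones repeats X block-wise, giving an (R*T) x K matrix.
    return np.kron(np.ones((R, 1)), X)

T, n_basis, R = 4, 2, 3
X = np.arange(T * n_basis, dtype=float).reshape(T, n_basis)   # toy design matrix
X_multi = multi_trial_design(X, R)
print(X_multi.shape)
```

Each block of T rows in the result is an exact copy of X, which is what encodes the assumption that the effect W is identical across trials.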

Bayesian inference

To make inferences about the sources underlying M/EEG, we need to invert our PGM to produce the posterior density p(J|Y). This is straightforward in principle and can be achieved using standard Bayesian methods (Gelman et al., 1995). For example, one could use Markov Chain Monte Carlo (MCMC) to produce samples from the posterior. This has been implemented efficiently for dipole-like inverse solutions (Schmidt et al., 1999) in which sources are parameterized as spheres of unknown number, extent and location. It is, however, computationally demanding for distributed source solutions, taking several hours for source spaces comprising G > 1000 generators (Auranen et al., 2005). In this work we adopt the computationally efficient approximate inference framework called Variational Bayes (VB) (Lappalainen and Miskin, 2000; Beal, 2003; Friston et al., 2007).

Variational Bayes framework

This is a recent development from the machine learning community and is based on the Variational Free Energy method of Feynman and Bogoliubov. The central quantity of interest is the posterior distribution p(θ|Y). This implies estimation of both the parameters and the uncertainties associated with their estimation. Given a PGM of the data, the log-evidence or marginal likelihood can be written as

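$$
\begin{aligned}
\log p(Y) &= \int q(\theta) \log p(Y)\, d\theta = \int q(\theta) \log\left[\frac{p(Y, \theta)\, q(\theta)}{q(\theta)\, p(\theta \mid Y)}\right] d\theta \\
&= F + KL[q(\theta)\,\|\,p(\theta \mid Y)]
\end{aligned}
\tag{11}
$$

where q(θ) is the approximate posterior. We have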

$$
F = \int q(\theta) \log \frac{p(Y, \theta)}{q(\theta)}\, d\theta \tag{12}
$$

which is known (to physicists) as the negative variational free energy and

$$
KL[q(\theta)\,\|\,p(\theta \mid Y)] = \int q(\theta) \log \frac{q(\theta)}{p(\theta \mid Y)}\, d\theta \tag{13}
$$

is the KL divergence (Cover and Thomas, 1991) between the approximate posterior q(θ) and the true posterior p(θ|Y).

The aim of VB learning is to maximize F and so make the approximate posterior as close as possible to the true posterior. One generic procedure for ensuring that the integrals in F are tractable is to assume that the approximating density factorizes over groups of parameters (mean-field approximation)

$$
q(\theta) = \prod_{i} q(\theta_i) \tag{14}
$$

where $\theta_i$ is the ith group of parameters.
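The decomposition in Eq. (11) can be verified numerically on a problem small enough to enumerate. The sketch below is purely illustrative: it assumes a parameter θ that takes only three values, so the integrals become sums, and the prior, likelihood and trial density q are made-up numbers.

```python
import numpy as np

# Hypothetical three-state problem: theta takes one of three values.
prior = np.array([0.5, 0.3, 0.2])          # p(theta)
likelihood = np.array([0.10, 0.40, 0.70])  # p(Y | theta) for the observed Y
joint = prior * likelihood                 # p(Y, theta)
evidence = joint.sum()                     # p(Y)
posterior = joint / evidence               # p(theta | Y)

def free_energy(q):
    """F = sum_theta q(theta) log[p(Y, theta) / q(theta)], as in Eq. (12)."""
    return np.sum(q * np.log(joint / q))

def kl_divergence(q, p):
    """KL[q || p] = sum_theta q(theta) log[q(theta) / p(theta)], as in Eq. (13)."""
    return np.sum(q * np.log(q / p))

q = np.array([0.2, 0.5, 0.3])              # an arbitrary approximate posterior
log_evidence = np.log(evidence)
F = free_energy(q)
gap = kl_divergence(q, posterior)
print(log_evidence, F + gap)
```

Since the KL term is non-negative, F is a lower bound on the log-evidence, and the bound is tight exactly when q equals the true posterior. This is why maximizing F over q drives the approximation toward p(θ|Y).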

Fig. 3. Hierarchical model representing the spatio-temporal deconvolution embodied by the VB estimator of the PCD. The estimated PCD receives contributions from two terms: (i) a “top-down” prediction from the temporal GLM and (ii) a “bottom-up” prediction from the spatial lead-field model, both weighted by their respective precisions.

Approximate posteriors

For our source reconstruction model we assume the following factorization of the approximate posterior

$$
q(J, W, \alpha, \lambda, \sigma) = q(J)\, q(W)\, q(\alpha)\, q(\sigma)\, q(\lambda) \tag{15}
$$

We also assume that the approximate posterior for the regression coefficients factorizes over generators

$$
q(W) = \prod_{g=1}^{G} q(w_{\cdot g}) \tag{16}
$$

This approximation was used in the spatio-temporal model for fMRI described in Penny et al. (2005). Because of the spatial prior (Eq. (6)), the regression coefficients in the true posterior p(W|Y) will clearly be correlated. Our perspective, however, is that this is too computationally burdensome for current personal computers to take account of. Moreover, as we shall see, updates for our approximate factorized densities $q(w_{\cdot g})$ do encourage the approximate posterior means to be similar at nearby generators, thereby achieving the desired effect of the prior.


Now that we have defined the probabilistic model and our factorization of the approximate posterior, we can use VB to derive expressions for each component of the approximate posterior. We do not present details of these derivations in this paper. Similar derivations have been published elsewhere (Penny et al., 2005). The following sections describe each distribution and the updates of its sufficient statistics required to maximize the lower bound on the model evidence, F.

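Primary current density

Updates for the sources are given by

$$q(J) = \prod_{t=1}^{T} q(j_{\cdot t}) \tag{17}$$

$$q(j_{\cdot t}) = N(j_{\cdot t};\ \hat{j}_{\cdot t},\ \hat{\Sigma}_J) \tag{18}$$

$$\hat{\Sigma}_J = (K^T \hat{\Omega} K + \hat{\Lambda})^{-1} \tag{19}$$

$$\hat{j}_{\cdot t} = \hat{\Sigma}_J (K^T \hat{\Omega}\, y_{\cdot t} + \hat{\Lambda} \hat{W}^T x_{t\cdot}^T) \tag{20}$$

where $\hat{j}_{\cdot t}$ is the tth column of $\hat{J}$ and $\hat{\Omega}$, $\hat{\Lambda}$ and $\hat{W}$ are estimated parameters defined in the following sections. We have not assumed that q(J) factorizes over time, but this ‘falls out’ of the equations, primarily because we have assumed that the additive noise E factorizes over time (i.e., IID observation noise). Given that the source covariance matrix does not change with time, Eq. (20) can be rewritten in a more compact form

$$\hat{J} = \hat{\Sigma}_J (K^T \hat{\Omega}\, Y + \hat{\Lambda} \hat{W}^T X^T) \tag{21}$$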

This expression shows that our source estimates are the result of a spatio-temporal deconvolution. The spatial contribution to the estimate is $K^T Y$ and the temporal contribution is $\hat{W}^T X^T$. From the perspective of the hierarchical model, shown in Fig. 3, these are the “bottom-up” and “top-down” predictions. Importantly, each prediction is weighted by its relative precision. Moreover, the parameters controlling the relative precisions $\hat{\Omega}$ and $\hat{\Lambda}$ are estimated from the data. This means that our source estimates derive from an automatically regularized spatio-temporal deconvolution. This property is shared by the spatio-temporal model for fMRI, described in Penny et al. (2005).
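As a rough illustration of the source update in Eqs. (19) and (21), the following NumPy sketch combines the bottom-up and top-down terms for a toy problem. The dimensions, the random inputs and the function name `update_sources` are assumptions made for the example, and the direct matrix inverse is only practical for a small number of generators.

```python
import numpy as np

rng = np.random.default_rng(0)
M, G, T, n_reg = 4, 6, 10, 2               # sensors, generators, time points, regressors (toy sizes)

K = rng.standard_normal((M, G))            # lead field
X = rng.standard_normal((T, n_reg))        # design matrix
W_hat = rng.standard_normal((n_reg, G))    # regression coefficients, one column per generator
Y = rng.standard_normal((M, T))            # sensor data
Omega = np.diag(rng.uniform(0.5, 2.0, M))  # sensor noise precisions
Lam = np.diag(rng.uniform(0.5, 2.0, G))    # temporal-model precisions

def update_sources(K, Omega, Lam, Y, W_hat, X):
    """Return the posterior mean (G x T) and covariance (G x G) of the sources."""
    Sigma_J = np.linalg.inv(K.T @ Omega @ K + Lam)   # Eq. (19)
    bottom_up = K.T @ Omega @ Y                      # precision-weighted spatial term
    top_down = Lam @ W_hat.T @ X.T                   # precision-weighted temporal term
    J_hat = Sigma_J @ (bottom_up + top_down)         # Eq. (21)
    return J_hat, Sigma_J

J_hat, Sigma_J = update_sources(K, Omega, Lam, Y, W_hat, X)
```

When the temporal precisions are very large relative to the sensor precisions, the estimate approaches the GLM prediction $\hat{W}^T X^T$, which is one way to read the precision weighting described above.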

An important characteristic of the source update in Eq. (21) is that the source estimate at each time point depends on the posterior covariance matrix between different sources, $\hat{\Sigma}_J$. This allows the deconvolution algorithm to accommodate correlation between sources and should be contrasted with the minimum variance beamformer estimate (Darvas et al., 2004)

$$\hat{j}_{g\cdot}^T = b_{\cdot g}^T Y \tag{22}$$

where the 1×T time series at generator g is given by projecting M/EEG data Y onto the spatial filter $b_{\cdot g}^T$. This projection is repeated separately for all source locations g. The spatial filter, which is derived by assuming that different sources are uncorrelated, is given by

$$b_{\cdot g} = (k_{\cdot g}^T \hat{\Omega} k_{\cdot g})^{-1} \hat{\Omega} k_{\cdot g} \tag{23}$$

As a result, the beamformer is unable to localize correlated activity (Sahani and Nagarajan, 2004). We will return to this important issue in the Results section.
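For comparison with the source update, Eqs. (22) and (23) can be sketched as a loop over generators. This is a minimal illustration under the assumption that the precision matrix is symmetric; the function name and the toy inputs are hypothetical.

```python
import numpy as np

def mv_beamformer(K, Omega, Y):
    """Apply Eqs. (22)-(23) separately at every generator.

    K: M x G lead field, Omega: M x M symmetric precision, Y: M x T data.
    Returns the G x T source time series and the M x G matrix of spatial filters.
    """
    M, G = K.shape
    B = np.empty((M, G))
    for g in range(G):
        k = K[:, g]                               # lead field of generator g
        B[:, g] = (Omega @ k) / (k @ Omega @ k)   # Eq. (23)
    return B.T @ Y, B                             # Eq. (22), one row per generator

rng = np.random.default_rng(0)
K = rng.standard_normal((5, 8))
Omega = np.diag(rng.uniform(0.5, 2.0, 5))
Y = rng.standard_normal((5, 20))
J_bf, B = mv_beamformer(K, Omega, Y)
```

Each filter passes its own lead field with unit gain, but no term in the loop couples one generator to another, which is the independence assumption discussed above.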

We end this section by noting that statistical inferences about current sources are more robust than point predictions. This property has been used to great effect with Pseudo-z beamformer statistics (Robinson and Vrba, 1999), sLORETA (Pascual-Marqui, 2002) and VARETA (Bosch-Bayard et al., 2001) source reconstructions, which divide current source estimates by their standard deviations. This approach can be adopted in the current framework as the standard deviations are readily computed from the diagonal elements of $\hat{\Sigma}_J$. Moreover, we can threshold these statistic images to create posterior probability maps (PPMs), as introduced by Friston and Penny (2003).
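The standardization described here might look as follows in NumPy. The sketch assumes a Gaussian approximate posterior for each source, and the effect-size threshold `gamma` and the probability threshold of 0.95 are illustrative choices, not values taken from this paper.

```python
import math
import numpy as np

def standardized_sources(J_hat, Sigma_J):
    """Divide each source estimate by its posterior standard deviation."""
    sd = np.sqrt(np.diag(Sigma_J))            # one standard deviation per generator
    return J_hat / sd[:, None], sd

def exceedance_probability(J_hat, sd, gamma=0.0):
    """P(j_gt > gamma) under the Gaussian posterior N(J_hat[g, t], sd[g]**2)."""
    z = (J_hat - gamma) / sd[:, None]
    normal_cdf = np.vectorize(lambda v: 0.5 * (1.0 + math.erf(v / math.sqrt(2.0))))
    return normal_cdf(z)

J_hat = np.array([[2.0, 4.0], [0.0, -3.0]])      # toy estimates: 2 generators, 2 time points
Sigma_J = np.array([[4.0, 0.5], [0.5, 9.0]])     # toy posterior covariance
z_map, sd = standardized_sources(J_hat, Sigma_J)
ppm = exceedance_probability(J_hat, sd, gamma=0.0) > 0.95   # thresholded probability map
```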

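Regression coefficients

Updates for the regression coefficients are given by

$$q(w_{\cdot g}) = N(w_{\cdot g};\ \hat{w}_{\cdot g},\ \hat{\Sigma}_{w_{\cdot g}})$$

$$\hat{\Sigma}_{w_{\cdot g}} = (\hat{\lambda}_g X^T X + \mathrm{diag}(d_{gg})\, \mathrm{diag}(\hat{\alpha}))^{-1}$$

$$\hat{w}_{\cdot g} = \hat{\Sigma}_{w_{\cdot g}} (\hat{\lambda}_g X^T \hat{j}_{g\cdot} + \mathrm{diag}(\hat{\alpha})\, r_g) \tag{24}$$

where $\hat{\alpha}$ is the estimated parameter defined later on, $d_{ij} = [d_{ij1}, \ldots, d_{ijK}]^T$ is a K×1 vector containing the (i, j)th element of all the $D_k$ matrices and $r_g$ is the weighted sum of neighboring regression coefficient estimators and is given by

$$r_g = \sum_{g'=1,\ g' \neq g}^{G} \mathrm{diag}(d_{gg'})\, \hat{w}_{\cdot g'} \tag{25}$$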

The update for $\hat{w}_{\cdot g}$ in Eq. (24) therefore indicates that the regression coefficient estimates at a given generator regress toward those at nearby generators. This is the desired effect of the spatial prior and it is preserved despite the factorization in the approximate posterior. This equation can again be thought of in terms of the hierarchical model where the regression coefficient estimate is a combination of a “bottom-up” prediction from the level below, $X^T \hat{j}_{g\cdot}$, and a “top-down” prediction from the prior, $r_g$. Again, each contribution is weighted by its relative precision.

The update for the covariance in Eq. (24) shows that the only off-diagonal contributions are due to the design matrix. If the temporal basis functions are therefore chosen to be orthogonal then this posterior covariance will be diagonal, thus making a potentially large computational saving. One benefit of the proposed framework, however, is that non-orthogonal bases can be accommodated. This may allow for a more natural and compact description of the data.
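The per-generator update of Eqs. (24) and (25) can be sketched as below. The storage of the $D_k$ matrices as a single K×G×G array, the function name and the random toy inputs are assumptions made for illustration only.

```python
import numpy as np

def update_regression(g, X, J_hat, W_hat, lam_g, alpha, D):
    """Eqs. (24)-(25) for generator g.

    X: T x K design, J_hat: G x T sources, W_hat: K x G current estimates,
    lam_g: temporal precision at g, alpha: K spatial precisions,
    D: K x G x G array, so that D[:, i, j] plays the role of the vector d_ij.
    """
    G = J_hat.shape[0]
    # Eq. (25): weighted sum of the estimates at all other generators
    r_g = sum(D[:, g, gp] * W_hat[:, gp] for gp in range(G) if gp != g)
    precision = lam_g * X.T @ X + np.diag(D[:, g, g] * alpha)
    Sigma_w = np.linalg.inv(precision)
    w_g = Sigma_w @ (lam_g * X.T @ J_hat[g] + alpha * r_g)
    return w_g, Sigma_w

rng = np.random.default_rng(1)
T, n_reg, G = 50, 3, 4
X = rng.standard_normal((T, n_reg))
J_hat = rng.standard_normal((G, T))
W_hat = rng.standard_normal((n_reg, G))
D = np.abs(rng.standard_normal((n_reg, G, G)))
w_g, Sigma_w = update_regression(0, X, J_hat, W_hat, 2.0, np.ones(n_reg), D)
```

Two limiting cases are useful sanity checks: with all spatial precisions set to zero the update reduces to ordinary least squares at each generator, and with an orthonormal design matrix the posterior covariance becomes diagonal, as noted in the text.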

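Precision of temporal models

Updates for the precision of the temporal model are given by

$$q(\lambda) = \prod_{g=1}^{G} \mathrm{Ga}(\lambda_g;\ \hat{b}_{\lambda_g},\ \hat{c}_{\lambda_g})$$

$$\frac{1}{\hat{b}_{\lambda_g}} = \frac{1}{b_{\lambda_g}} + \frac{1}{2} \sum_{t=1}^{T} \left[ (\hat{j}_{gt} - \hat{w}_{\cdot g}^T x_{t\cdot})^2 + (\hat{\Sigma}_J)_{gg} + x_{t\cdot}^T \hat{\Sigma}_{w_{\cdot g}} x_{t\cdot} \right]$$

$$\hat{c}_{\lambda_g} = \frac{T}{2} + c_{\lambda_g}$$

$$\hat{\lambda}_g = \hat{b}_{\lambda_g} \hat{c}_{\lambda_g} \tag{26}$$

where $(\hat{\Sigma}_J)_{gg}$ is the gth diagonal element of $\hat{\Sigma}_J$. In the context of ERP analysis, these expressions amount to an estimate of the variance of spontaneous and/or induced activity at generator g, $\lambda_g^{-1}$, given by the squared error between the evoked component estimate, $\hat{w}_{\cdot g}^T x_{t\cdot}$, and source estimate, $\hat{j}_{gt}$, at the given generator, averaged over time and the other approximate posteriors.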
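A possible NumPy rendering of Eq. (26) for a single generator is sketched below. Here `x_t` is taken to be the tth row of X treated as a K-vector, and the prior parameters `b0` and `c0`, the function name and the toy data are hypothetical.

```python
import numpy as np

def update_lambda(g, J_hat, Sigma_J, W_hat, Sigma_w_g, X, b0, c0):
    """Eq. (26) for generator g; b0 and c0 are the Gamma prior parameters."""
    T = X.shape[0]
    resid = J_hat[g] - X @ W_hat[:, g]                    # j_gt - w_g^T x_t for every t
    unc_w = np.einsum('tk,kl,tl->t', X, Sigma_w_g, X)     # x_t^T Sigma_w x_t for every t
    total = np.sum(resid ** 2 + Sigma_J[g, g] + unc_w)
    b_hat = 1.0 / (1.0 / b0 + 0.5 * total)
    c_hat = 0.5 * T + c0
    return b_hat * c_hat, b_hat, c_hat                    # posterior mean precision first

rng = np.random.default_rng(2)
T, n_reg, G = 200, 2, 3
X = rng.standard_normal((T, n_reg))
W_hat = rng.standard_normal((n_reg, G))
J_hat = (X @ W_hat).T + 0.5 * rng.standard_normal((G, T))  # GLM prediction plus noise
Sigma_J = 0.01 * np.eye(G)
Sigma_w_g = 0.01 * np.eye(n_reg)
lam_hat, b_hat, c_hat = update_lambda(0, J_hat, Sigma_J, W_hat, Sigma_w_g, X, b0=1e3, c0=1e-3)
```

With a very broad prior and negligible posterior uncertainty, the inverse of the returned precision reduces to the mean squared residual, which matches the interpretation given in the text.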

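Precision of forward model

Updates for the precision of the sensor noise are given by

$$q(\sigma) = \prod_{m=1}^{M} q(\sigma_m)$$

$$q(\sigma_m) = \mathrm{Ga}(\sigma_m;\ \hat{b}_{\sigma_m},\ \hat{c}_{\sigma_m})$$

$$\frac{1}{\hat{b}_{\sigma_m}} = \frac{1}{b_{\sigma_m}} + \frac{1}{2} \sum_{t=1}^{T} (y_{mt} - k_{m\cdot}^T \hat{j}_{\cdot t})^2 + \frac{1}{2} k_{m\cdot}^T \hat{\Sigma}_{j_{\cdot t}} k_{m\cdot}$$

$$\hat{c}_{\sigma_m} = \frac{T}{2} + c_{\sigma_m}$$

$$\hat{\sigma}_m = \hat{b}_{\sigma_m} \hat{c}_{\sigma_m} \tag{27}$$

These expressions amount to an estimate of observation noise variance at the mth sensor, $\sigma_m^{-1}$, given by the squared error between the forward model and sensor data, averaged over time and the other approximate posteriors.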




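Precision of spatial prior

Updates for the precision of the spatial prior are given by

$$q(\alpha) = \prod_{k=1}^{K} q(\alpha_k)$$

$$q(\alpha_k) = \mathrm{Ga}(\alpha_k;\ \hat{b}_{\alpha_k},\ \hat{c}_{\alpha_k})$$

$$\frac{1}{\hat{b}_{\alpha_k}} = \frac{1}{b_{\alpha_k}} + \| D_k \hat{w}_{k\cdot}^T \|^2 + \sum_{g=1}^{G} d_{ggk}\, (\hat{\Sigma}_{w_{\cdot g}})_{kk}$$

$$\hat{c}_{\alpha_k} = \frac{G}{2} + c_{\alpha_k}$$

$$\hat{\alpha}_k = \hat{b}_{\alpha_k} \hat{c}_{\alpha_k} \tag{28}$$

where $(\hat{\Sigma}_{w_{\cdot g}})_{kk}$ is the kth diagonal element of $\hat{\Sigma}_{w_{\cdot g}}$. These expressions amount to an estimate of the “spatial noise variance” $\alpha_k^{-1}$, given by the discrepancy between neighboring regression coefficients, averaged over space and the other approximate posteriors.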

To summarize, our source reconstruction model is fitted to data by iteratively applying the update equations until the change in the negative free energy, F, is less than some user-specified tolerance. This procedure is summarized in the pseudo-code in Fig. 4 and will lead to a local maximum of F (for a general discussion of convergence issues, see Friston et al. (2007)).

This amounts to a process in which sensor data is spatially deconvolved, time-series models are fitted in source space, and then the precisions (accuracy) of the temporal and spatial models are estimated. This process is then iterated and results in a spatio-temporal deconvolution in which all aspects of the model are optimized to maximize a lower bound on the model evidence. The algorithm can be efficiently implemented as described in Appendix A.
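The stopping rule described above can be sketched generically. The loop below is not the pseudo-code of Fig. 4: the update functions and the objective are passed in as hypothetical callables, and the toy objective is a simple concave function of two scalar "parameter groups" that stands in for F so that the coordinate-wise updates can be written in closed form.

```python
def iterate_until_converged(state, updates, free_energy, tol=1e-8, max_iter=500):
    """Cycle through the updates until the change in the objective falls below tol."""
    history = [free_energy(state)]
    for _ in range(max_iter):
        for update in updates:          # one sweep over all parameter groups
            state = update(state)
        history.append(free_energy(state))
        if abs(history[-1] - history[-2]) < tol:
            break
    return state, history

def toy_F(s):
    """Toy stand-in for F (an assumption for this sketch, not the model's free energy)."""
    a, b = s
    return -(a * a + b * b + a * b) + 3 * a + 3 * b

def update_a(s):
    return ((3 - s[1]) / 2, s[1])       # maximizes toy_F over a with b held fixed

def update_b(s):
    return (s[0], (3 - s[0]) / 2)       # maximizes toy_F over b with a held fixed

state, history = iterate_until_converged((0.0, 0.0), [update_a, update_b], toy_F)
```

Because each update maximizes the objective over one group with the others held fixed, the recorded history never decreases, which is the same property that makes the change in F a sensible convergence criterion.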

Results

This section describes the application of the approach presented here to (i) biophysically realistic simulated data and (ii) EEG from a face-processing experiment. In all cases, we used the same sensor and source spaces. The sensor space was defined using M=128 electrodes from the BioSemi ActiveTwo System. The source space then consisted of a mesh of nodes (generators) corresponding to the vertices of the triangles obtained by tessellation of the gray/white matter interface of the realistic digital brain phantom developed at the Montreal Neurological Institute (MNI) (Collins et al., 1998). The tessellation comprised 12,000 triangles and G=6004 vertices.

Fig. 4. Pseudo-code for the Variational Bayes algorithm. Iterative update of the approximate posterior components results in increasing the lower bound, F, on the model evidence.

We used the three concentric sphere model to calculate the electric lead field (Rush and Driscoll, 1969). The centre and radius of the spheres were fitted to the scalp, skull and cerebral tissue of the same brain. In what follows we refer to the spatio-temporal approach as “VB-GLM.”

ERP simulation

We used our generative model to simulate ERP-like activity by using the waveforms and spatial profiles shown in Fig. 5. As can be seen, the two waveforms are temporally correlated (Corr=0.86) with main peaks that mimic an ERP component at about t=200 ms post-stimulus. These waveforms were derived from a neural mass model describing activity in a distributed network of cortical areas (David and Friston, 2003), which lends these simulations a degree of biological plausibility. The two spatial profiles in turn consisted of Gaussian blobs with identical maximum amplitudes of 10, and identical full widths at half maximum (FWHM) of 20 mm. The spatial extent of the activated areas was constrained by taking a geodesic neighborhood of 3 nodes around the centre of each Gaussian and setting to zero the activity outside.

These temporal and spatial profiles were then used respectively as design matrix and regression coefficients to generate data from our model. Ten trials of sensor data were generated using signal-to-noise ratios (SNR) of 10 and 40 at the sensor and source levels, respectively. Here, we defined SNR as the ratio of the signal standard deviation to noise standard deviation. Signal epochs of 512 ms were then produced with a sampling period of 4 ms, giving a total of 5120 ms of EEG (1280 time bins).
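The epoch arithmetic and the SNR definition used here can be illustrated with a short sketch. The sinusoidal test signal, the function name `add_noise` and the choice to match the sample standard deviations exactly are assumptions for the example; whether the ratio was applied per channel or globally in the simulations is not stated in the text.

```python
import numpy as np

epoch_ms, dt_ms, n_trials = 512, 4, 10
samples_per_trial = epoch_ms // dt_ms       # time bins in one epoch
n_bins = samples_per_trial * n_trials       # time bins over all ten trials

def add_noise(signal, snr, rng):
    """Add white Gaussian noise so that std(signal) / std(noise) equals snr."""
    noise = rng.standard_normal(signal.shape)
    noise *= signal.std() / (snr * noise.std())   # rescale to the requested ratio
    return signal + noise, noise

rng = np.random.default_rng(0)
t_ms = np.arange(samples_per_trial) * dt_ms
signal = np.sin(2 * np.pi * t_ms / epoch_ms)      # stand-in waveform, not the neural mass output
noisy, noise = add_noise(signal, snr=10, rng=rng)
```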

We estimated the sources underlying the sample ERP (i) with an overspecified temporal model that incorporated two spurious regressors in addition to the ones used to generate the data and (ii) with a temporal model that consisted of Battle-Lemarie wavelets obtained by application of the wavelet shrinkage algorithm to the first eigenvector of the simulated sensor data.

Overspecified temporal model

The four regressors that form the overspecified temporal model for a single trial are shown in Fig. 6A. Note that Regressors 1 and 2 contain the source waveforms that were used to generate the data. These four regressors were then concatenated to form the design matrix that models the 10 simulated trials (Fig. 6B). It is important to note that this design matrix is not orthogonal because the four regressors are temporally correlated.

The model was then fitted to the data using VB-GLM. As shown in Fig. 7, the true effects (regression coefficients) are accurately recovered, whereas the spurious regression coefficients are shrunk toward zero. The shrinking effect is evident when looking at the estimated spatial precision, $\hat{\alpha}_k$, for each regression coefficient shown in the upper panel of Fig. 7. These determine how precisely the effects are constrained around zero. A large precision implies a strong shrinkage toward zero. As can be seen, the precisions corresponding to the spurious regressors are four orders of magnitude greater than the precisions corresponding to the true regressors. These results are a consequence of the spatial prior and the iterative spatio-temporal deconvolution, and demonstrate that source reconstruction with temporal priors is robust to model mis-specification. This also shows that VB-GLM, in contrast to, for instance, traditional beamforming approaches (see section “VB-GLM vs. minimum variance beamformer”), is capable of localizing temporally correlated sources.



Fig. 5. Temporal (left) and spatial (right) profiles of the biophysically realistic sources used for generating simulated data. The two waveforms incorporate a negative component at t≈200 ms.

Fig. 6. Definition of the overspecified temporal model used for source reconstruction of simulated data. (A) Four regressors used for constructing the design matrix of the temporal GLM of the PCD. Regressors 1 and 2 are the same as used for generating the data, while 3 and 4 are spurious regressors. (B) Image of the overspecified multiple-trial design matrix. Note that all regressors are temporally correlated.


Additionally, note that the reconstructed effect sizes are diminished with respect to the true ones. This could be due to the increasing source amplitude bias with depth that is inherent to all distributed inverse solutions (Trujillo-Barreto et al., 2004). The bias is less in the case of Regression Coefficient 1 because the left-frontal and left-occipital sources are closer to the sensors than the left-temporal and right-postcentral sources in Regression Coefficient 2. These depth effects can be overcome with sparse priors (Trujillo-Barreto et al., 2004). It is possible, however, that this bias is not only a depth effect but could also reflect the well-known overconfidence problem of mean-field approximations. In our case, this is most likely the consequence of assuming that the approximate posterior for the regression coefficients factorizes over generators, resulting in posterior certainties that are slightly too high.

Wavelet temporal model

The simple example we have just described, although useful for illustrative purposes, is of limited use because, in practice, we can never know which are the “true” regressors.

We now describe how the temporal model can be constructed generically using the data at hand. We first partitioned the sensor data into two halves. The first five trials were used to fit the temporal model, while the remaining five trials were used for source reconstruction. We then extracted the first eigenvector of the ERP calculated from the first five trials using a singular value decomposition (SVD) and fitted a Battle-Lemarie wavelet model to this time series. The upper panel of Fig. 8 shows the corresponding time series estimate. This employed K=33 basis functions, as determined by application of the wavelet shrinkage algorithm (Donoho and Johnstone, 1994; Clyde et al., 1998). The corresponding single-trial design matrix of our temporal model is shown in the lower panel of Fig. 8. This matrix was extended to the multiple-trial case using Eq. (10), and then used for source reconstruction.
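The first step of this procedure, splitting the trials and extracting the dominant temporal component of the ERP, can be sketched with NumPy. The array layout (trials × sensors × time), the function name and the synthetic rank-one data are assumptions; the subsequent wavelet shrinkage step is not shown.

```python
import numpy as np

def first_temporal_eigenvector(trials):
    """trials: n_trials x M x T array. Returns the unit-norm time series (length T)
    that captures the most variance of the trial-averaged ERP."""
    erp = trials.mean(axis=0)                              # M x T average over trials
    U, s, Vt = np.linalg.svd(erp, full_matrices=False)
    return Vt[0]                                           # first right singular vector

rng = np.random.default_rng(0)
M, T = 6, 128
topography = rng.standard_normal(M)                        # synthetic scalp pattern
waveform = np.sin(np.linspace(0, 2 * np.pi, T))            # synthetic time course
trials = np.stack([np.outer(topography, waveform) + 0.01 * rng.standard_normal((M, T))
                   for _ in range(10)])

fit_trials, recon_trials = trials[:5], trials[5:]          # first half fits the temporal model
v1 = first_temporal_eigenvector(fit_trials)
```

The sign of a singular vector is arbitrary, so the extracted time series may be a sign-flipped version of the underlying waveform.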

Because it is impractical to present the source estimates for all time instants, we will show results for a single time point. Fig. 9 shows the true and estimated PCD, averaged over trials and normalized to the respective maximum absolute values, for t=200 ms. Note that, at this latency, both spatial profiles are activated and correspond to negative peaks of activity. This pattern is satisfactorily recovered by VB-GLM.

Fig. 7. Results of the VB-GLM approach with an overspecified temporal model. Upper panel: Estimated spatial precisions, $\hat{\alpha}_k$, for each regression coefficient. Lower panels: Estimated regression coefficients $\hat{w}_k$. Coefficients 1 and 2 correspond to the regressors used to generate the simulated data and are correctly reconstructed (compare to Fig. 5). Coefficients 3 and 4, corresponding to the spurious regressors, are shrunk toward zero. The maximum of the scale has the following values (from left to right): 5.72, 5.72, 0.03, and 0.02. Note that the maximum of the scale for Regression Coefficient 2 has been set to the maximum of Regression Coefficient 1 for comparison.

Spatial vs. spatio-temporal approach

In order to quantitatively assess the effect of the temporal prior, we compared the results of our full VB-GLM approach to a limited version where the temporal prior is not used. This is achieved by using $X = I_T$ and $Z = 0_{M \times T}$. In this case, the three-level PGM depicted in Fig. 2 reduces to the two-level one in Fig. 1, which underlies the majority of the “instantaneous” approaches reported in the literature. In our case, given the spatial Laplacian prior that we have assumed for the regression coefficients, this reduced model can be considered to be equivalent to an “instantaneous” LORETA solution (Pascual-Marqui et al., 1994).

We used two measures for comparison. First, we calculated the receiver operating characteristic (ROC) for the two approaches. This is a plot of the sensitivity versus 1 minus the specificity, and was generated by declaring a generator to be active if the effect size was larger than some arbitrary threshold. Although ROC curves have been extensively applied to evaluate the detection accuracy of diagnostic imaging techniques, they do not provide an explicit assessment of localization accuracy. Therefore, we also calculated the distance-based localization receiver operating characteristic (DL-ROC) (Biscay-Lirio et al., 1992). This curve describes the variation of localization error over the range of arbitrary thresholds used (see Appendix A).
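The ROC construction described above, sweeping a threshold on the effect size and recording sensitivity against 1 minus specificity, can be sketched as follows. The use of the absolute effect size, the number of thresholds and the toy data are assumptions; the DL-ROC is not implemented here because its definition is not given in this section.

```python
import numpy as np

def roc_curve(effect, truth, n_thresholds=50):
    """effect: estimated effect size per generator; truth: boolean array marking
    the truly active generators. Returns (1 - specificity, sensitivity)."""
    mag = np.abs(effect)
    thresholds = np.linspace(mag.max(), 0.0, n_thresholds)
    sens, fpr = [], []
    for th in thresholds:
        detected = mag > th                        # generators declared active
        sens.append(np.sum(detected & truth) / truth.sum())
        fpr.append(np.sum(detected & ~truth) / (~truth).sum())
    return np.array(fpr), np.array(sens)

truth = np.array([True, True, False, False, False, False])
effect = np.array([5.0, -5.0, 0.1, -0.1, 0.1, 0.1])      # perfectly separated toy example
fpr, sens = roc_curve(effect, truth)
auc = np.sum(np.diff(fpr) * (sens[1:] + sens[:-1]) / 2)  # trapezoidal area under the curve
```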

The results for t=200 ms are shown in Fig. 10 (right panel). As can be seen, the VB-GLM outperforms the instantaneous approach in both detection and localization accuracy. The results also indicate that increased sensitivity can be achieved while maintaining high specificity. Additionally, the source reconstruction obtained with VB-GLM is less blurred and contains less ghosting (the curse of traditional linear inverse solutions).

Another perspective on these simulations is given by the temporal evolution of the activity for the generator of maximum amplitude at t=200 ms. This is shown in Fig. 11. We see that the estimated VB-GLM time course is much smoother than with the instantaneous approach. This is clearly a consequence of the temporal prior used.

VB-GLM vs. minimum variance beamformer

We have demonstrated that VB-GLM is capable of recovering highly correlated sources, which has been reported not to be the case for the minimum variance beamformer (MV-BF) (Sahani and Nagarajan, 2004). This motivated an empirical comparison between the two approaches. For this, Eqs. (22) and (23) for the MV-BF were implemented and applied to the simulated data described previously. Additionally, data from a simulation comprising a single left-occipital source were analyzed with both methods. The time series for this single source simulation was given by Regressor 1 in Fig. 6. The rest of the simulation settings were kept as before. The initial MV-BF estimates were found to contain erroneously large values near the centre of the sphere used for the forward calculation (Sekihara et al., 2001). This is because the norm of the lead field becomes very small in that region. To avoid these artefacts, a normalized lead-field matrix was used in Eq. (23) (Van Veen et al., 1997; Robinson and Vrba, 1999).

Fig. 8. Definition of the wavelet temporal model used for source reconstruction of simulated data. (A) Fitting of the data first eigenvector with K=33 Battle-Lemarie wavelets as calculated using wavelet shrinkage. (B) Image of the corresponding single-trial design matrix. The leftmost columns contain lower frequencies with progressively higher frequencies to the right. This design matrix is replicated over trials using Eq. (10).

The true and estimated PCDs at t=200 ms for the two simulated data sets and for both the VB-GLM and MV-BF approaches are shown in Fig. 12. All maps have been normalized to their respective maximum absolute values. As can be seen, VB-GLM outperforms MV-BF in the two cases. In the single-source case, the MV-BF reconstruction, although it shows activity in the area of the true activation, is significantly more blurred than the VB-GLM solution. Moreover, MV-BF is unable to recover multiple correlated sources, as expected.

A quantitative comparison was also carried out by calculating the ROC and DL-ROC curves for all cases. The results are shown in Fig. 13. In all cases, VB-GLM showed higher sensitivity for any level of specificity. This is critical in the case of multiple correlated sources, for which MV-BF performed very poorly. Note also that even in the single-source case, where MV-BF showed its best detection accuracy (ROC), its localization accuracy (DL-ROC) was very low.

Face ERPs

This section presents an analysis of a face processing ERP data set from Henson et al. (2003). Details of the experimental paradigm as well as the full data set can be found at www.fil.ion.ucl.ac.uk/spm/data/mmfaces.html.

Experimental paradigm

The experiment involved randomized presentation of 86 faces and 86 scrambled faces, as described in Fig. 14. Half of the faces are familiar and half unfamiliar, creating three event-types (conditions) in total, although only the basic contrast of faces vs. scrambled faces is described here. The faces condition in this case was obtained by collapsing over familiarity.

The scrambled faces were created by 2-D Fourier transformation, random phase permutation, inverse transformation and outline-masking of each face. Thus, faces and scrambled faces are closely matched for low-level visual properties such as spatial frequency power density. The subject had to judge the left–right symmetry of each stimulus (face and scrambled) around an imaginary vertical line through the centre of the image. Faces were presented for 600 ms, every 3600 ms.

The EEG data were acquired on a 128-channel BioSemi ActiveTwo system (see Fig. 5), sampled at 1024 Hz, plus electrodes on the left earlobe, right earlobe, and two each to measure HEOG and VEOG. The data were referenced to the average of left and right earlobe electrodes and epoched from −200 ms to +600 ms. These epochs were then detrended and examined for artefacts, defined as time points that exceeded an absolute threshold of 120 μV (mainly in the VEOG). A total of 29 of the 172 trials were rejected.

Data analysis

The epochs were averaged according to the two trial types faces (F) and scrambled faces (S) to produce condition-specific ERPs, for visualization purposes. The first clear difference F–S was maximal around 170 ms, appearing as an enhancement of a negative component (peak N170) at occipito-temporal channels, or enhancement of a positive peak near Cz (e.g., channel C1). These effects are shown as a differential topography and as time series in Fig. 15.

The source reconstruction method (VB-GLM) was then applied to the single-trial (unaveraged) data. Before applying the model, the data were first down-sampled by a factor of 4, and the 128 samples following stimulus onset were extracted. These steps were taken as we used WaveLab to generate the wavelet bases (for the GLM) which uses a pyramidal algorithm to compute coefficients, thus requiring the number of samples to be a power of two.

Fig. 9. True and VB-GLM source estimates for the simulated data, based on the wavelet temporal model. The left panel indicates the time point of interest (t=200 ms). At this latency, the two simulated sources show simultaneous negative peaks. The right panel shows the true PCD at t=200 ms, as well as the corresponding VB-GLM estimate, averaged across trials. The two maps have been normalized to their respective maximum absolute values.

We then extracted the first eigenvector of the ERP for each condition using SVD and fitted Battle-Lemarie wavelet models to these time series. Fig. 16 shows the corresponding time series estimates. We employed K=23 and K=30 basis functions for conditions F and S, respectively, as determined by wavelet shrinkage (Donoho and Johnstone, 1994). These functions were then used to construct the single-trial design matrices for the two conditions, each comprising the Battle-Lemarie basis (see Fig. 16). These matrices were then repeated for all trials to produce the multiple-trial condition-specific design matrices of dimensions 11,008×23 and 11,008×30 (see Eq. (10)). Finally, the full design matrix for all trials and for the two conditions was constructed as a block diagonal matrix, where each block contained the multiple-trials design matrix for each condition. This matrix, which fully integrates the experimental design, is of dimension 22,016×53.
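The dimensions quoted above can be reproduced with a short NumPy sketch. It assumes that "repeated for all trials" means stacking the single-trial matrix vertically once per trial, which may differ in detail from Eq. (10), and it uses 86 trials per condition because that is what 11,008 rows of 128 samples imply. Random matrices stand in for the wavelet bases, and the helper names are hypothetical.

```python
import numpy as np

def multi_trial_design(X_single, n_trials):
    """Stack the single-trial design matrix once per trial (Kronecker product with a column of ones)."""
    return np.kron(np.ones((n_trials, 1)), X_single)

def block_diagonal(A, B):
    """Place A and B on the diagonal of a larger matrix, with zeros elsewhere."""
    out = np.zeros((A.shape[0] + B.shape[0], A.shape[1] + B.shape[1]))
    out[:A.shape[0], :A.shape[1]] = A
    out[A.shape[0]:, A.shape[1]:] = B
    return out

rng = np.random.default_rng(0)
n_samples, n_trials = 128, 86
X_faces_single = rng.standard_normal((n_samples, 23))       # stand-in for the K=23 wavelet basis
X_scrambled_single = rng.standard_normal((n_samples, 30))   # stand-in for the K=30 wavelet basis

X_faces = multi_trial_design(X_faces_single, n_trials)           # 11,008 x 23
X_scrambled = multi_trial_design(X_scrambled_single, n_trials)   # 11,008 x 30
X_full = block_diagonal(X_faces, X_scrambled)                    # 22,016 x 53
```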

All trials for faces and scrambled faces were then concatenated to form a vector of 22,016 elements at each electrode. The sensor data matrix was then of dimension 128×22,016. The source space used was the same as for the simulations.

We then applied the source reconstruction algorithm and obtained a solution after 6 min of processing. The estimated PCDs averaged across trials for conditions F and S at t=170 ms are shown in Fig. 17. The two solutions have been normalized to the maximum of the solution for condition F. As can be seen, the spatial distribution of the sources in both cases shows bilateral activity in the fusiform area, with the cluster of maximum activation in the right hemisphere. The temporal waveforms corresponding to the generators with maximum activity in the two conditions are also shown in Fig. 17 (right panel). As expected, maximum differences between conditions are obtained for t=170 ms.

The overall effect of faces was obtained by applying the appropriate contrast to the fitted source reconstruction and is shown more clearly in Fig. 18. The image in the upper panel shows differences between conditions at each generator, normalized to the maximum positive difference. The overall activation pattern shows a number of clusters of positive and negative differences. By convention we have constrained the PCD to be perpendicular to the cortical surface and directed outward. Then positive differences can be interpreted as an increased outward or a decreased inward PCD, while negative differences can be associated with decreased outward or an increased inward PCD.

In order to better characterize the effect of faces, the positive and negative differences were normalized to their respective maximum values and thresholded at 30% and 80%. The results are shown in the lower panels of Fig. 18. At 30%, four main clusters appear at (i) right fusiform, (ii) left fusiform, (iii) right temporal, and (iv) anterior frontal regions. With an 80% threshold, only the right fusiform and right temporal activations are present. These activations are consistent with previous fMRI and MEG analyses (Henson et al., 2003) and the classical “core model” for face recognition and perception (Haxby et al., 2002; Gobbini and Haxby, 2007).

Discussion

This paper has described a model-based spatio-temporal deconvolution approach to source reconstruction. Sources are reconstructed by inverting a forward model comprising a temporal process as well as a spatial process. This approach relies on the fact that EEG and MEG signals are extended in time as well as space.

It rests on the notion that MEG and EEG reflect the neuronal activity of a spatially distributed dynamical system. Depending on the nature of the experimental task, this activity can be highly localized or highly distributed and the dynamics can be more, or less, complex. At one extreme, listening for example to simple auditory stimuli produces brain activations that are highly localized in time and space. This activity is well described by a single dipole located in the brainstem and reflecting a single burst of neuronal activity at, e.g., t=20 ms post-stimulus. More complicated tasks, such as oddball paradigms, elicit spatially distributed responses and more complicated dynamics that can appear in the ERP as damped sinusoidal responses. In this paper we have taken the view that by explicitly modelling these dynamics one can obtain better source reconstructions.

Fig. 10. Spatial effects: VB-GLM vs. instantaneous solution. The left panels show ROC and DL-ROC curves corresponding to VB-GLM and the instantaneous solution for t=200 ms. In this case, the instantaneous solution was obtained by reducing the full VB-GLM approach to the traditional two-level model of Fig. 1 (X=I, Z=0). The right panels show the corresponding PCD estimate averaged across trials. The two maps have been normalized to their respective maximum absolute values.

Fig. 11. Temporal effects: VB-GLM vs. instantaneous solution. The figure shows the time course of the VB-GLM and instantaneous estimations for the generator with maximum activation at t=200 ms. The two time courses have been normalized to the maximum absolute value of the true activity. The use of temporal priors leads to smoother estimated time courses.

Compared to previous spatio-temporal models (Baillet and Garnero, 1997; Schmidt et al., 2000; Galka et al., 2004; Yamashita et al., 2004; Daunizeau et al., 2005), our algorithm perhaps embodies stronger dynamic constraints. But the computational simplicity of fitting GLMs, allied to the efficiency of variational inference, results in a relatively fast algorithm. Also, the GLM can accommodate damped sinusoidal and wavelet approaches that are ideal for modelling transient and nonstationary responses.

The dynamic constraints implicit in our model help to regularize the solution. Indeed, with M sensors, G sources, T time points and K temporal basis functions, if K < MT/G the inverse problem is no longer underdetermined. In practice, however, spatial regularization will still be required to improve estimation accuracy.
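As a worked instance of this counting argument, the snippet below plugs in the dimensions reported for the face data set (M=128, G=6004, T=22,016, K=53). It only counts regression coefficients against sensor measurements and is not a statement about how well conditioned the resulting problem is.

```python
M, G = 128, 6004         # sensors and generators used throughout the Results
T, K = 22016, 53         # time points and regressors in the face ERP analysis

unknowns = K * G         # regression coefficients to estimate
measurements = M * T     # sensor samples available
bound = M * T / G        # largest K for which unknowns < measurements (about 469)

no_longer_underdetermined = K < bound
```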

The method proposed in the present paper embodies well-knownphenomenological descriptions of evoked responses. A similarmethod has recently been proposed in Friston et al. (2006), but theapproaches differ in a number of respects. First, in Friston et al.(2006), scalp dataYare (effectively) projected onto a temporal basis












Fig. 12. Qualitative comparison between VB-GLM andMV-BFmethods. The figure shows the true PCD spatial distribution for the single source (upper row) andthe multiple correlated sources (lower row) at t=200 ms, as well as the corresponding VB-GLM and MV-BF source reconstructions. All maps have beennormalized to their respective maximum absolute values.



set X and source reconstructions are made in this reduced space. This results in a computationally efficient procedure based on restricted maximum likelihood (ReML), but one in which the between-trial variance is not taken into account. This will result in inferences about W and J which are overconfident. If one is interested in population inferences based on summary statistics (i.e., Ŵ) from a group of subjects, then this does not matter. If, however, one wishes to make within-subject inferences, then VB-GLM is the preferred approach. Second, in Friston et al. (2006), the model has been augmented to account for trial-specific responses. This treats each trial as a “random effect” and provides a method for making inferences about induced responses. The algorithm described in this paper, however, is restricted to treating trials as fixed effects. This mirrors standard first-level analyses of fMRI in which multiple trials are treated by forming concatenated data and design matrices.

Acknowledgments

Will Penny is supported by the Wellcome Trust and Nelson J. Trujillo-Barreto is supported by a Science Link programme from the British Council. The authors would also like to thank Rik Henson for providing the EEG data, and Karl Friston for discussing similarities and differences between the algorithm in this paper and the approach described in Friston et al. (2006).

Appendix A

Implementation details

A practical difficulty with the update equations for the PCD is that the covariance matrix $\Sigma_J$ is of dimension G×G. Even low-resolution source grids typically contain G>1000 elements. This therefore presents a problem. A solution is found, however, with use of a singular value decomposition (SVD). First, we define a modified lead-field matrix $\bar{K} = \Omega^{1/2} K \Lambda^{-1/2}$ and compute its SVD

$$\bar{K} = U S V^T = U \bar{V} \tag{A1}$$

where $\bar{V} = S V^T$ is an M×G matrix, the same dimension as the lead field K. It can then be shown using the matrix inversion lemma (Golub and Van Loan, 1996) that

$$\Sigma_J = \Lambda^{-1/2} (I_G - P) \Lambda^{-1/2}, \qquad P = \bar{V}^T (I_M + S S^T)^{-1} \bar{V} \tag{A2}$$

which is simple to implement computationally, as it only requires inversion and square root of diagonal matrices.
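As a numerical check of Eqs. (A1) and (A2), the following NumPy sketch builds the covariance through the SVD route and compares it with a direct G×G inverse. It assumes that $\Omega$ and $\Lambda$ are diagonal and that the posterior covariance has the form $(K^T \Omega K + \Lambda)^{-1}$, which is the expression the matrix inversion lemma turns into (A2). The sizes and random inputs are hypothetical, and the full G×G matrices are formed here only for the purpose of the comparison.

```python
import numpy as np

rng = np.random.default_rng(0)
M, G = 8, 50                      # hypothetical sensor and source counts
K = rng.standard_normal((M, G))   # lead field
omega = rng.uniform(0.5, 2.0, M)  # diagonal of Omega (assumed diagonal)
lam = rng.uniform(0.5, 2.0, G)    # diagonal of Lambda (assumed diagonal)

# Modified lead field  K_bar = Omega^{1/2} K Lambda^{-1/2}
K_bar = np.sqrt(omega)[:, None] * K / np.sqrt(lam)[None, :]

# Economy SVD: U is M x M, s holds M singular values, Vt is M x G
U, s, Vt = np.linalg.svd(K_bar, full_matrices=False)
V_bar = s[:, None] * Vt           # V_bar = S V^T, so K_bar = U V_bar  (A1)

# P = V_bar^T (I_M + S S^T)^{-1} V_bar; only a diagonal matrix is inverted
P = V_bar.T @ (V_bar / (1.0 + s**2)[:, None])
Sigma_svd = (np.eye(G) - P) / np.sqrt(np.outer(lam, lam))   # (A2)

# Reference: direct G x G inverse (assumed form of the posterior covariance)
Sigma_direct = np.linalg.inv(K.T @ (omega[:, None] * K) + np.diag(lam))
print(np.allclose(Sigma_svd, Sigma_direct))
```

The SVD is taken of an M×G matrix, so its cost grows with the number of sensors rather than with the size of the source grid.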

Source estimates can be computed as shown in Eq. (20). In principle, this means the estimated sources over all time points and source locations are given by

$$\hat{J} = \Sigma_J K^T \Omega Y + \Sigma_J \Lambda \hat{W}^T X^T \tag{A3}$$

In practice, however, it is inefficient to work with such a large matrix during estimation. We therefore do not implement Eqs. (19) and (20) but, instead, work in the reduced space $\hat{J}_X = \hat{J} X$ which are




Fig. 14. Description of the face processing experiment. The experiment involved randomized presentation of 86 faces and 86 scrambled faces. Half of the faces were familiar and half unfamiliar, creating three event-types (conditions) in total. The subject had to judge the left–right symmetry of each stimulus (face and scrambled) around an imaginary vertical line through the centre of the image. Faces were presented for 600 ms, every 3600 ms.

Fig. 13. Quantitative comparison between VB-GLM and MV-BF methods. The figure shows the ROC and DL-ROC curves corresponding to VB-GLM and MV-BF methods for the single-source (upper row) and the multiple correlated sources (lower row) at t=200 ms.



Please cite this article as: Trujillo-Barreto, N.J., et al., Bayesian M/EEG source reconstruction with spatio-temporal priors, NeuroImage (2007), doi:10.1016/j.neuroimage.2007.07.062


Fig. 15. Face processing ERP. Left panel shows the topographic map of the ERP data at t=170 ms (N170). At this latency, the difference of faces–scrambled faces is maximum. The time courses for the two conditions at electrode C1 are shown in the right panel.

Fig. 16. Definition of the wavelet temporal model used for source reconstruction of the face processing data set. Upper panels: Fitting of the data first eigenvector for conditions faces and scrambled faces, respectively, with K=23 and K=30 Battle-Lemarie wavelets as calculated by wavelet shrinkage. Lower panels: Images of the design matrices (for a single trial) for the two conditions.




Fig. 17. Left panels show the VB-GLM source estimates for the two conditions at t=170 ms, averaged across trials. The two images have been normalized with respect to the maximum activity in the faces condition. The right panel shows the normalized average of the source time courses for the generator with maximum activity in the two conditions.



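the sources projected onto the design matrix. These projected source estimates are given by

$$\begin{aligned} \hat{J}_X &= \hat{J} X \\ &= \Sigma_J K^T \Omega Y X + \Sigma_J \Lambda \hat{W}^T X^T X \\ &= A_{KX} Y X + A_{\Lambda W} X^T X \end{aligned} \tag{A4}$$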

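where $YX$ and $X^T X$ can be pre-computed and the intermediate quantities are given by

$$\begin{aligned} A_{KX} &= \Sigma_J K^T \Omega \\ &= (\Lambda^{-1} K^T - \Lambda^{-1/2} P \Lambda^{-1/2} K^T)\, \Omega \\ A_{\Lambda W} &= \Sigma_J \Lambda \hat{W}^T \\ &= (\hat{W}^T - \Lambda^{-1/2} P \Lambda^{1/2} \hat{W}^T) \end{aligned} \tag{A5}$$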

Because these matrices are only of dimension G×M and G×K, respectively, $\hat{J}_X$ can be efficiently computed. The term $X^T \hat{j}_{g\cdot}$ in Eq. (24) is then given by the gth row of $\hat{J}_X$.
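The reduced-space computation can be sketched in NumPy as follows. The helper `sigma_times` is a hypothetical name: it applies $\Sigma_J$ to a matrix through the factorization in Eq. (A2) without ever forming a G×G array. The sketch assumes diagonal $\Omega$ and $\Lambda$, uses random stand-ins for the lead field, data, design matrix and coefficient estimates, and checks the result against the full-space route only because the toy sizes allow it.

```python
import numpy as np

rng = np.random.default_rng(1)
M, G, T, Kb = 6, 40, 30, 4            # hypothetical sizes; Kb = number of regressors
K = rng.standard_normal((M, G))       # lead field
Y = rng.standard_normal((M, T))       # sensor data
X = rng.standard_normal((T, Kb))      # design matrix
W = rng.standard_normal((Kb, G))      # stand-in for the coefficient estimates
omega = rng.uniform(0.5, 2.0, M)      # diagonal of Omega (assumed diagonal)
lam = rng.uniform(0.5, 2.0, G)        # diagonal of Lambda (assumed diagonal)

K_bar = np.sqrt(omega)[:, None] * K / np.sqrt(lam)[None, :]
U, s, Vt = np.linalg.svd(K_bar, full_matrices=False)
V_bar = s[:, None] * Vt               # M x G


def sigma_times(Z):
    """Return Sigma_J @ Z using (A2), never forming a G x G matrix."""
    Zs = Z / np.sqrt(lam)[:, None]                          # Lambda^{-1/2} Z
    PZs = V_bar.T @ ((V_bar @ Zs) / (1.0 + s**2)[:, None])  # P Lambda^{-1/2} Z
    return (Zs - PZs) / np.sqrt(lam)[:, None]


A_KX = sigma_times(K.T * omega[None, :])   # Sigma_J K^T Omega,  G x M
A_LW = sigma_times(lam[:, None] * W.T)     # Sigma_J Lambda W^T, G x Kb

YX, XtX = Y @ X, X.T @ X                   # pre-computed once
J_X = A_KX @ YX + A_LW @ XtX               # (A4), G x Kb

# Check against the full-space estimate (A3) projected onto X
Sigma = np.linalg.inv(K.T @ (omega[:, None] * K) + np.diag(lam))
J_full = Sigma @ (K.T * omega[None, :]) @ Y + Sigma @ (lam[:, None] * W.T) @ X.T
print(np.allclose(J_X, J_full @ X))
```

The largest arrays in the reduced route are G×M and G×Kb, matching the dimensions quoted for the intermediate quantities.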

The intermediate quantities can also be used to compute model predictions as

$$\begin{aligned} \hat{Y} &= K \hat{J} \\ &= K A_{KX} Y + K A_{\Lambda W} X^T \end{aligned} \tag{A6}$$


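The entry (m,t) in $\hat{Y}$ then corresponds to the $k_{m\cdot}^T \hat{j}_{\cdot t}$ term in Eq. (27). Other computational savings are as follows. For Eq. (27), we use the result

$$k_{m\cdot}^T \Sigma_{J_{\cdot t}} k_{m\cdot} = \frac{1}{\sigma_m} \sum_{m'=1}^{M} \frac{s_{m'm'}^2 u_{mm'}^2}{s_{m'm'}^2 + 1} \tag{A7}$$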

where $s_{ij}$ and $u_{ij}$ are the (i,j)th entries in S and U, respectively. For Eq. (26) we use the result

$$\{\Sigma_J\}_{gg} = \frac{1}{\lambda_g} \left( 1 - \sum_{g'=1}^{M} \frac{s_{g'g'}^2 v_{gg'}^2}{s_{g'g'}^2 + 1} \right) \tag{A8}$$

where $v_{ij}$ is the (i,j)th entry in V.
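The two diagonal shortcuts can be checked numerically with a short NumPy sketch. It assumes that $\sigma_m$ and $\lambda_g$ are the diagonal entries of $\Omega$ and $\Lambda$, that U, S and V come from the economy SVD of the modified lead field, and that the covariance equals $(K^T \Omega K + \Lambda)^{-1}$. All sizes and inputs are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(2)
M, G = 5, 30                       # hypothetical sensor and source counts
K = rng.standard_normal((M, G))    # lead field
omega = rng.uniform(0.5, 2.0, M)   # sigma_m, diagonal of Omega (assumed)
lam = rng.uniform(0.5, 2.0, G)     # lambda_g, diagonal of Lambda (assumed)

K_bar = np.sqrt(omega)[:, None] * K / np.sqrt(lam)[None, :]
U, s, Vt = np.linalg.svd(K_bar, full_matrices=False)
shrink = s**2 / (s**2 + 1.0)       # s^2 / (s^2 + 1) for each singular value

# Sensor-space quadratic form k_m^T Sigma_J k_m for every sensor m
kSk = (U**2 @ shrink) / omega
# Diagonal of Sigma_J for every source g; v_{gg'} is Vt[g', g]
diag_sigma = (1.0 - (Vt.T**2) @ shrink) / lam

# Reference values from the explicit G x G covariance
Sigma = np.linalg.inv(K.T @ (omega[:, None] * K) + np.diag(lam))
kSk_ref = np.einsum('mg,gh,mh->m', K, Sigma, K)
print(np.allclose(kSk, kSk_ref), np.allclose(diag_sigma, np.diag(Sigma)))
```

Both shortcuts need only the M singular values and the entries of U or V, so they scale with M times G rather than with G squared.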

Distance based localization receiver operating characteristic (DL-ROC)

Consider a continuous image I within which a source S is required to be localized and let D be the detection region defined by the classifier. In our case, the classifier was defined by declaring a generator to be active (included in the source) if the estimated PCD at the given generator exceeded a specified threshold. The region D then defines the labelling of each generator X by the classifier as included (X∈D) or not included (X∈I\D) in the source.



Fig. 18. Overall effect of faces. The upper panel shows the normalized differences of faces minus scrambled faces at t=170 ms. In the lower panels, positive and negative differences have been normalized to the maximum positive and negative activities, respectively. The images show the sources that survive a 30% (lower left) and an 80% (lower right) threshold.



Now denote by A\B, A∪B and A∩B the difference, union and intersection operations on sets A, B of generators in I. The sets D\S, S\D, D∩S and (I\D)∩(I\S) are formed by the generators with false positive (FP), false negative (FN), true positive (TP) and true negative (TN) classifications, respectively. Thus, the regions D\S and S\D contain all the incorrectly classified generators.

Let $D_2(X,A)$ be the distance of a generator X from a (non-empty) region A defined as:

$$D_2(X, A) = \inf_{Y \in A} d_2(X, Y) \tag{A9}$$

where $d_2$ is the geodesic distance between generators, and “inf” denotes the greatest lower bound of a set of numbers, or infimum. Let $d_I$ be the diameter of I, i.e., $d_I = \sup d_2(X,Y)$, where “sup” denotes the least upper bound, or supremum, with respect to all the pixels X and Y of I. We can then calculate the re-scaled distances $d(X,Y) = d_2(X,Y)/d_I$ and $d(X,A) = D_2(X,A)/d_I$, for any generators X, Y and any region A. Based on this, supremum measures of false positive (FPLE) and false negative localization error (FNLE), for specified regions D and S are then defined as:

$$\begin{aligned} \mathrm{FPLE}(D, S) &= \sup_{X \in D} d(X, S) \\ \mathrm{FNLE}(D, S) &= \sup_{X \in S} d(X, D) \end{aligned} \tag{A10}$$

Average and integral measures of FPLE and FNLE can also be defined (Biscay-Lirio et al., 1992).

For a given source S, the detection region D and therefore the measures FPLE(D,S) and FNLE(D,S) depend on the classifier’s decision threshold C, which determines the level of certainty used by the classifier to consider a generator X as belonging to the source (i.e., X∈D). Then, by analogy with conventional ROC methodology, the variation of the measures of localization error over the range of the classifier’s decision thresholds can be described with the curve (FPLE(C), 1−FNLE(C)). This is called the distance-based localization receiver operating characteristic (DL-ROC).
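A minimal sketch of how such a curve might be traced on a discrete source grid is given below. Several choices are assumptions made for illustration only: the function name `dl_roc`, thresholding the absolute value of the estimated PCD, Euclidean distance on a toy one-dimensional grid standing in for geodesic distance, and the convention adopted when the detection region is empty (FPLE set to 0 and FNLE set to 1).

```python
import numpy as np


def dl_roc(pcd, source_mask, dist, thresholds):
    """Return arrays (FPLE, 1 - FNLE), one entry per threshold.

    pcd         : (G,) estimated activity per generator
    source_mask : (G,) bool, True for generators in the true source S
    dist        : (G, G) pairwise distances between generators
    """
    d = dist / dist.max()                       # re-scale by the diameter of I
    d_to_S = d[:, source_mask].min(axis=1)      # d(X, S) for every generator X
    fple, one_minus_fnle = [], []
    for c in thresholds:
        D = np.abs(pcd) > c                     # detection region (assumed rule)
        if not D.any():
            # Assumed convention for an empty D: FPLE = 0 and FNLE = 1
            fple.append(0.0)
            one_minus_fnle.append(0.0)
            continue
        d_to_D = d[:, D].min(axis=1)            # d(X, D) for every generator X
        fple.append(d_to_S[D].max())            # sup over X in D of d(X, S)
        one_minus_fnle.append(1.0 - d_to_D[source_mask].max())
    return np.array(fple), np.array(one_minus_fnle)


# Toy example: 6 generators on a line, Euclidean distance as a stand-in.
pos = np.arange(6.0)
dist = np.abs(pos[:, None] - pos[None, :])
source = np.array([False, False, True, True, False, False])
pcd = np.array([0.1, 0.3, 0.9, 1.0, 0.2, 0.6])
fple, tp_like = dl_roc(pcd, source, dist, thresholds=[0.05, 0.5, 0.95, 2.0])
print(fple, tp_like)
```

On a finite grid the infimum and supremum reduce to a minimum and a maximum, which is why the sketch uses `min` and `max` directly.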

References

Auranen, T., Nummenmaa, A., Hammalainen, M., Jaaskelainen, I., Lampinen, J., Vehtari, A., Sams, M., 2005. Bayesian analysis of the neuromagnetic inverse problem with lp norm priors. NeuroImage 26 (3), 870–884.

Baillet, S., Garnero, L., 1997. A Bayesian approach to introducing anatomo-functional priors in the EEG/MEG inverse problem. IEEE Trans. Biomed. Eng. 374–385.

Baillet, S., Mosher, J.C., Leahy, R.M., November 2001. Electromagnetic brain mapping. IEEE Signal Process. Mag. 14–30.

Beal, M., 2003. Variational algorithms for approximate Bayesian inference. PhD thesis, Gatsby Computational Neuroscience Unit, University College London.

Biscay-Lirio, R., Galán-García, L., Valdés-Sosa, P., Virués-Alba, T., Neira-Blaquier, L., Rojas-Vigoa, J., 1992. Localization error in biomedical imaging. Comput. Biol. Med. 22 (4), 277–286.

Bosch-Bayard, J., Valdés-Sosa, P., Virués-Alba, E., Aubert-Vázquez, E., John, R., Harmony, T., Riera-Díaz, J., Trujillo-Barreto, N., 2001. 3D statistical parametric mapping of variable resolution electromagnetic tomography (VARETA). Clin. Electroencephalogr. 32 (2), 47–66.

Brookes, M., Gibson, A., Hall, S., Furlong, P., Barnes, G., Hillebrand, A., Singh, K., Halliday, I., Francis, S., Morris, P., 2004. A general linear model for MEG beamformer imaging. NeuroImage 23 (3), 936–946.
























Buzsaki, G., Draguhn, A., 2004. Neuronal oscillations in cortical networks. Science 304, 1926–1929.

Clyde, M., Parmigiani, G., Vidakovic, B., 1998. Multiple shrinkage and subset selection in wavelets. Biometrika 85, 391–402.

Collins, D.L., Zijdenbos, A.P., Kollokian, V., Sled, J.G., Kabani, N.J., Holmes, C.J., Evans, A.C., 1998. Design and construction of a realistic digital brain phantom. IEEE Trans. Med. Imag. 17 (3), 463–468.

Cover, T.M., Thomas, J.A., 1991. Elements of Information Theory. John Wiley, p. 22.

Darvas, F., Pantazis, D., Kucukaltun Yildirim, E., Leahy, R., 2004. Mapping human brain function with MEG and EEG: methods and validation. NeuroImage 25, 383–394.

Daunizeau, J., Mattout, J., Clonda, D., Goulard, B., Benali, H., Lina, J.M., 2005. Bayesian spatio-temporal approach for EEG source reconstruction: conciliating ECD and distributed models. IEEE Trans. Biomed. Eng. 53, 503–516.

David, O., Friston, K.J., 2003. A neural mass model for MEG/EEG: coupling and neuronal dynamics. NeuroImage 20 (3), 1743–1755.

Demiralp, T., Ademoglu, A., Istefanopoulos, Y., Gulcur, H.O., 1998. Analysis of event-related potentials (ERP) by damped sinusoids. Biol. Cybern. 78, 487–493.

Donoho, D.L., Johnstone, I.M., 1994. Ideal spatial adaptation by wavelet shrinkage. Biometrika 81, 425–455.

Durka, P.J., Martínez-Montes, E., Valdés-Sosa, P., Blinowska, J., 2005. Multichannel matching pursuit and EEG inverse solutions. J. Neurosci. Methods 148, 49–59.

Frackowiak, R.S.J., Friston, K.J., Frith, C., Dolan, R., Price, C.J., Zeki, S., Ashburner, J., Penny, W.D., 2003. Human Brain Function, 2nd edition. Academic Press.

Friston, K.J., Penny, W.D., 2003. Posterior probability maps and SPMs. NeuroImage 19 (3), 1240–1249.

Friston, K., Henson, R., Phillips, C., Mattout, J., 2006. Bayesian estimation of evoked and induced responses. Hum. Brain Mapp. 27, 722–735.

Friston, K.J., Mattout, J., Trujillo-Barreto, N.J., Ashburner, J., Penny, W., 2007. Variational free energy and the Laplace approximation. NeuroImage 34 (1), 220–234.

Fuchs, M., Wagner, M., Kohler, T., Wischman, H.A., 1999. Linear and nonlinear current density reconstructions. J. Clin. Neurophysiol. 16 (3), 267–295.

Galka, A., Yamashita, O., Ozaki, T., Biscay, R., Valdés-Sosa, P., 2004. A solution to the dynamical inverse problem of EEG generation using spatiotemporal Kalman filtering. NeuroImage 23 (2), 435–453.

Gelman, A., Carlin, J.B., Stern, H.S., Rubin, D.B., 1995. Bayesian Data Analysis. Chapman and Hall, Boca Raton.

Gobbini, M.I., Haxby, J.V., 2007. Neural systems for recognition of familiar faces. Neuropsychologia 45 (1), 32–41.

Golub, G.H., Van Loan, C.F., 1996. Matrix Computations, 3rd edition. Johns Hopkins Univ. Press.

Haxby, J.V., Hoffman, E.A., Gobbini, M.I., 2002. Human neural systems for face recognition and social communication. Biol. Psychiatry 51, 59–67.

Henson, R.N.A., Goshen-Gottstein, Y., Ganel, T., Otten, L.J., Quayle, A., Rugg, M.D., 2003. Electrophysiological and hemodynamic correlates of face perception, recognition and priming. Cereb. Cortex 13, 793–805.

Huiskamp, G., 1991. Difference formulas for the surface Laplacian on a triangulated surface. J. Comput. Phys. 95, 477–496.

Kiebel, S.J., Friston, K.J., 2004. Statistical parametric mapping for event-related potentials: II. A hierarchical temporal model. NeuroImage 22 (2), 503–520.


Lappalainen, H., Miskin, J.W., 2000. Ensemble learning. In: Girolami, M. (Ed.), Advances in Independent Component Analysis. Springer-Verlag.

Mattout, J., Phillips, C., Penny, W.D., Rugg, M., Friston, K.J., 2006. MEG source localisation under multiple constraints: an extended Bayesian framework. NeuroImage 30 (3), 753.

Pascual-Marqui, R., 2002. Standardised low resolution electromagnetic tomography (sLORETA): technical details. Methods Find. Exp. Clin. Pharmacol. 24, 5–12.

Pascual-Marqui, R.D., Michel, C.M., Lehmann, D., 1994. Low resolution electromagnetic tomography: a new method for localizing electrical activity of the brain. Int. J. Psychophysiol. 18, 49–65.

Flandin, G., Penny, W.D., 2007. Bayesian fMRI data analysis with sparse spatial basis function priors. NeuroImage 34 (3), 1108–1125.

Penny, W.D., Trujillo-Barreto, N.J., Friston, K.J., 2005. Bayesian fMRI time series analysis with spatial priors. NeuroImage 24 (2), 350–362.

Robinson, S., Vrba, J., 1999. Functional neuroimaging by synthetic aperture magnetometry (SAM). Recent Advances in Biomagnetism. Tohoku Univ. Press, Sendai, Japan.

Rugg, M.D., Coles, M.G.H., 1995. Electrophysiology of Mind: Event-related Potentials and Cognition. Oxford Univ. Press.

Rush, S., Driscoll, D., 1969. EEG electrode sensitivity - an application of reciprocity. IEEE Trans. Biomed. Eng. 16 (1), 15–22.

Sahani, M., Nagarajan, S.S., 2004. Reconstructing MEG sources with unknown correlations. In: Saul, L., Thrun, S., Schoelkopf, B. (Eds.), Advances in Neural Information Processing Systems, vol. 16. MIT, Cambridge, MA.

Schmidt, D.M., George, J.S., Wood, C.C., 1999. Bayesian inference applied to the electromagnetic inverse problem. Hum. Brain Mapp. 7, 195–212.

Schmidt, D.M., Ranken, D.M., George, J.S., Wood, C.C., 2000. Spatial–temporal Bayesian inference for MEG/EEG. 12th International Conference on Biomagnetism, Helsinki, Finland, August.

Sekihara, K., Nagarajan, S.S., Poeppel, D., Marantz, A., Miyashita, Y., 2001. Reconstructing spatio-temporal activities of neural sources using an MEG vector beamformer technique. IEEE Trans. Biomed. Eng. 48, 760–771.

Tallon-Baudry, C., Bertrand, O., Delpuech, C., Pernier, J., 1996. Stimulus specificity of phase-locked and non phase-locked 40 Hz visual responses in human. J. Neurosci. 16 (13), 4240–4249.

Trejo, L., Shensa, M.J., 1999. Feature extraction of event-related potentials using wavelets: an application to human performance monitoring. Brain Lang. 66, 89–107.

Trujillo-Barreto, N.J., Aubert-Vázquez, E., Valdés-Sosa, P.A., 2004. Bayesian model averaging in EEG/MEG imaging. NeuroImage 21 (4), 1300–1319.

Unser, M., Aldroubi, A., 1996. A review of wavelets in biomedical applications. Proc. IEEE 84, 626–638.

Valdés-Sosa, P., Marti, F., Garcia, F., Casanova, R., 2000. Variable resolution electric–magnetic tomography. In: Aine, C.J., Okada, Y., Stroink, G., Swithenby, S.J., Wood, C.C. (Eds.), Biomag 96: Proceedings of the Tenth International Conference on Biomagnetism, vol. II. Springer-Verlag, New York, pp. 373–376.

Van Veen, B.D., van Drongelen, W., Yuchtman, M., Suzuki, A., 1997. Localization of brain electrical activity via linearly constrained minimum variance spatial filtering. IEEE Trans. Biomed. Eng. 44, 867–880.

Yamashita, O., Galka, A., Ozaki, T., Biscay, R., Valdés-Sosa, P., 2004. Recursive penalised least squares solution for dynamical inverse problems of EEG generation. Hum. Brain Mapp. 21, 221–235.

























































































