
COLLOQUIUM PAPER | PSYCHOLOGICAL AND COGNITIVE SCIENCES | NEUROSCIENCE

Gaussian process linking functions for mind, brain, and behavior

Giwon Bahg a, Daniel G. Evans a, Matthew Galdo a, and Brandon M. Turner a,1

a Department of Psychology, The Ohio State University, Columbus, OH 43210

Edited by Nikolaus Kriegeskorte, Columbia University, New York, NY, and accepted by Editorial Board Member Dale Purves June 17, 2020 (received for review November 5, 2019)

The link between mind, brain, and behavior has mystified philosophers and scientists for millennia. Recent progress has been made by forming statistical associations between manifest variables of the brain (e.g., electroencephalogram [EEG], functional MRI [fMRI]) and manifest variables of behavior (e.g., response times, accuracy) through hierarchical latent variable models. Within this framework, one can make inferences about the mind in a statistically principled way, such that complex patterns of brain–behavior associations drive the inference procedure. However, previous approaches were limited in the flexibility of the linking function, which has proved prohibitive for understanding the complex dynamics exhibited by the brain. In this article, we propose a data-driven, nonparametric approach that allows complex linking functions to emerge from fitting a hierarchical latent representation of the mind to multivariate, multimodal data. Furthermore, to enforce biological plausibility, we impose both spatial and temporal structure so that the types of realizable system dynamics are constrained. To illustrate the benefits of our approach, we investigate the model's performance in a simulation study and apply it to experimental data. In the simulation study, we verify that the model can be accurately fitted to simulated data, and latent dynamics can be well recovered. In an experimental application, we simultaneously fit the model to fMRI and behavioral data from a continuous motion tracking task. We show that the model accurately recovers both neural and behavioral data and reveals interesting latent cognitive dynamics, the topology of which can be contrasted with several aspects of the experiment.

model-based cognitive neuroscience | joint modeling | Gaussian process | dimensionality reduction

In the present technological state, the mind remains a latent construct that cannot be directly observed. However, clever experimental designs as well as technological advancements have enabled the collection of many important manifest variables such as response time, the blood oxygenated level-dependent (BOLD) response in functional MRI (fMRI), and the electroencephalogram (EEG). Each of these measures ostensibly reflects signatures of mental operations, potentially revealing insight into latent cognitive dynamics. Despite these innovations, the key to understanding what the mind is and ultimately how it produces complex patterns of thought and decisions lies in our ability to explain the structure between manifest variables of the brain and manifest variables of behavior. The set of possible links connecting these variables is known as linking propositions (1–4).

Linking propositions have been a productive route forward because they facilitate quantitative assessments of the contribution of physiological variables to psychological processes. For example, Teller (2) devised a set of linking propositions by specifying logical relations among physiological variables and psychological states. Teller's families of linking propositions were specified to be axiomatic in the sense that they relied on strict equality statements, which are impossible to satisfy in the context of measurement noise (3). Recognizing these practical limitations, Schall (3) developed the concept of statistical linking functions, where probability distributions could replace axiomatic statements to accommodate the uncertainty associated with manifest variables. Finally, Forstmann et al. (5, 6) concretized statistical linking functions by performing null hypothesis tests among patterns of neural data and the parameters of computational models. Later, Forstmann et al. (7) further articulated the concept of reciprocity in linking functions, where latent representations of the mind (i.e., as instantiated by computational models) could inform the analysis of manifest variables of brain and behavior.

There now exist many techniques for imposing linking functions, varying along several dimensions such as the manner in which they impose directionality (e.g., neural data constrain a computational model, a computational model guides analysis of neural data, or equal reciprocity between all variables), the manner in which the mind is represented (e.g., a latent representation or set of transformation equations), and the complexity of their implementation (8). One promising framework that exploits the aforementioned concepts of statistical linking and reciprocity is the joint modeling framework (9) shown in Fig. 1. Constraints about the manifest variables, such as structural connectivity or experimental design, are used to specify the structure of a generative model. In turn, the generative model is specified to simultaneously describe all available manifest variables, regardless of their modality. The manner in which the latent states are connected to the manifest variables forms the linking function. By fitting the model to data using hierarchical Bayesian methods (10), one can infer the most plausible set of linking functions that connect the manifest variables.

Although joint models have proved effective in linking subject-specific (e.g., structural connectivity; ref. 9), modality-specific (e.g., combining fMRI and EEG; ref. 11), and trial-specific (e.g., single-trial BOLD response; refs. 12–14) information, so far all joint modeling applications have imposed a linearity assumption among the latent variables. While linearity may be a reasonable assumption in some cases, because the brain is highly dynamic (15), this assumption is surely violated in many realistic settings. In this article, we expand the general structure used in joint modeling with Gaussian processes to eliminate the specification of a particular parametric functional form of the linking function. This specification enables potentially complex, nonlinear linking functions to emerge directly from data.

This paper results from the Arthur M. Sackler Colloquium of the National Academy of Sciences, "Brain Produces Mind by Modeling," held May 1–3, 2019, at the Arnold and Mabel Beckman Center of the National Academies of Sciences and Engineering in Irvine, CA. NAS colloquia began in 1991 and have been published in PNAS since 1995. From February 2001 through May 2019, colloquia were supported by a generous gift from The Dame Jillian and Dr. Arthur M. Sackler Foundation for the Arts, Sciences, & Humanities, in memory of Dame Sackler's husband, Arthur M. Sackler. The complete program and video recordings of most presentations are available on the NAS website at http://www.nasonline.org/brain-produces-mind-by.

Author contributions: G.B. and B.M.T. designed research; D.G.E. collected data; G.B. and B.M.T. contributed new reagents/analytic tools; G.B., D.G.E., and M.G. analyzed data; and G.B., D.G.E., M.G., and B.M.T. wrote the paper.

The authors declare no competing interest.

This article is a PNAS Direct Submission. N.K. is a guest editor invited by the Editorial Board.

Published under the PNAS license.

1 To whom correspondence may be addressed. Email: [email protected].

This article contains supporting information online at https://www.pnas.org/lookup/suppl/doi:10.1073/pnas.1912342117/-/DCSupplemental.



Fig. 1. Framework for connecting mind, brain, and behavior. Shown is a schematic of the joint modeling framework, where experimental information and structural information specify the structure of a generative model of brain function. The generative model is used to jointly explain all available manifest variables such as response times, blood oxygenated level-dependent response, or electroencephalogram activity. In the present article, a latent Gaussian process is used to link the manifest variables, enabling the most plausible linking function to emerge from fitting the model to data.


Gaussian Process as a Linking Function

To allow for flexibility in the linking function within the joint modeling approach, we use a Gaussian process (GP) to model the latent variables that represent the cognitive states underlying observed neural and behavioral data. In so doing, this Gaussian process joint modeling (GPJM) framework can be viewed as a temporal and nonparametric extension of the covariance-based linking function (8, 14). Gaussian processes have proved effective on a variety of statistical and machine-learning problems, including problems in fMRI analyses (16, 17). In this section, we provide a brief conceptual introduction to Gaussian processes and the GPJM.

Gaussian Process Regression. Our goal is to find a function that best describes the data assuming Gaussian noise, y = f(x) + ε, where ε ∼ N(0, σ²_ε) is a normal distribution with mean 0 and variance σ²_ε. Typical linear regression methods model the function f as a sum of prespecified bases X weighted by regression coefficients β; e.g., f = Xβ.

On the contrary, Gaussian process regression directly models the function f as a sample from a distribution over functions. Specifically, the distribution over functions is assumed to be a multivariate Gaussian distribution with mean m and covariance matrix k, both defined as a function of the input vector x (18, 19). Given an input vector x = (x₁, ..., x_k)′, a function f is modeled as a Gaussian process by f(x) ∼ GP(m(x), k(x, x′)), where

\[ k(x, x') = E\big[\{f(x) - m(x)\}\{f(x') - m(x')\}\big]. \tag{1} \]

Here the covariance matrix k(x, x′) is called the kernel or covariance function.

In a Gaussian process with noisy observations assuming Gaussian-distributed error, mean predictions about unobserved input X*, denoted f*, are modeled using a joint multivariate normal distribution prior with the kernel K, the observed input X, and output y (18, 19):

\[ \begin{bmatrix} y \\ f_* \end{bmatrix} \sim N\!\left(0, \begin{bmatrix} K(X, X) + \sigma_\varepsilon^2 I & K(X, X_*) \\ K(X_*, X) & K(X_*, X_*) \end{bmatrix}\right). \tag{2} \]

The predictive mean f* and predictive covariance matrix Σ* are analytically derived as

\[ f_* = K(X_*, X)\big[K(X, X) + \sigma_\varepsilon^2 I\big]^{-1} y \tag{3} \]

\[ \Sigma_* = K(X_*, X_*) - K(X_*, X)\big[K(X, X) + \sigma_\varepsilon^2 I\big]^{-1} K(X, X_*), \tag{4} \]

where K(M, N) = [k(m_i, n_j)]_{ij} is the kernel whose elements are determined by the ith row of matrix M (denoted m_i) and the jth row of matrix N (denoted n_j). This fact suggests that an appropriate choice of kernel is important for the performance of the model.
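For concreteness, the predictive equations above can be sketched in a few lines of numpy. This is a minimal illustration of Eqs. 3 and 4 under an assumed RBF kernel and one-dimensional inputs; the function names and parameter values here are ours, not part of the original model code.

```python
import numpy as np

def rbf_kernel(a, b, variance=1.0, length_scale=1.0):
    """RBF kernel matrix between two vectors of scalar inputs."""
    sq_dists = (a[:, None] - b[None, :]) ** 2
    return variance * np.exp(-sq_dists / (2.0 * length_scale ** 2))

def gp_predict(X, y, X_star, noise_var=0.1):
    """Predictive mean (Eq. 3) and covariance (Eq. 4) of GP regression."""
    K = rbf_kernel(X, X) + noise_var * np.eye(X.size)   # K(X, X) + sigma_eps^2 I
    K_star = rbf_kernel(X_star, X)                      # K(X*, X)
    K_ss = rbf_kernel(X_star, X_star)                   # K(X*, X*)
    # A Cholesky solve would be preferred numerically; inv() keeps Eqs. 3-4 visible.
    K_inv = np.linalg.inv(K)
    f_star = K_star @ K_inv @ y                         # Eq. 3
    sigma_star = K_ss - K_star @ K_inv @ K_star.T       # Eq. 4
    return f_star, sigma_star

# Example: recover a noisy sine function
rng = np.random.default_rng(0)
X = np.linspace(0.0, 10.0, 25)
y = np.sin(X) + 0.1 * rng.standard_normal(25)
f_star, sigma_star = gp_predict(X, y, np.linspace(0.0, 10.0, 100))
```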



Fig. 2. The structure of Gaussian process joint models. (A) The generative mechanism of the Gaussian process joint models. Given the time vector t, the linking function describing latent cognitive states is modeled by Gaussian processes with a temporal kernel as a function of t (top right). The linking function, denoted X, generates a cognitive state kernel K(X, X) and is introduced as the covariance function of Gaussian processes modeling neural and behavioral data (bottom left). Depending on the nature of neural data, K(X, X) can be applied after transformation (e.g., convolution with a hemodynamic response function for fMRI data). In this study, we use automatic relevance determination (ARD) (20) so that each dimension of the latent cognitive state can contribute to the generation of data with different weights of influence. Here, kernel parameters capture sensitivity of data to each latent cognitive dimension using dimension-wise length-scale parameters (bottom right). (B) Details of the ARD-based feature selection mechanism. A small length-scale parameter means that the covariance between two input points is more penalized by their distance and therefore differences in the input values are interpreted as more important (bottom). Meanwhile, a greater length-scale parameter can make even distant input points covary and therefore minor differences in the input are treated as negligible (top).

Some popular kernels are the Matérn class, radial basis function (RBF), and rational quadratic (RQ) kernels. Each kernel generates different types of functions based on its functional form. For example, Matérn 1/2 kernels generate nondifferentiable functions, whereas RBF kernels generate infinitely differentiable, smooth functions. Hyperparameters of the kernel (which typically include a scaling factor often called "variance") also affect how covariance is evaluated across input points.† In particular, all of the kernels introduced above have a hyperparameter called "length scale" that scales the distance between two points. This hyperparameter plays an important role in determining the influence or "relevance" of each latent dimension in generating data, which is discussed later.

Gaussian Process Joint Models. GPJMs inherit the motivation of Bayesian joint modeling approaches accounting for brain–behavior reciprocity (9, 11–14, 21): imposing common statistical constraints for both neural and behavioral data when informing theoretical and mechanistic explanations of cognitive processes. However, instead of summarizing the relationship between neural and behavioral measures by collapsing across temporal information, the GPJM framework aims to learn latent temporal dynamics that simultaneously govern both neural and behavioral data. The GPJM framework is based on Gaussian process latent variable models (GPLVMs) and their hierarchical extensions (22–25), which pursue commonly shared latent representations across simultaneous observations from multiple measurement sources. GPLVMs have been successfully used in many machine-learning applications (26–31).

† To avoid confusion between the kernel variance and the variance of the noisy observations, we refer to the latter as "observation noise" and the former as "variance."

Joint models of neural and behavioral data consist of three components: a neural submodel, a behavioral submodel, and a linking function. As illustrated in Fig. 2, in the GPJM a latent variable X with a GP prior enables a linking function that constrains the generative process for neural and behavioral data, X ∼ N(0, k_T(t, t′) + σ²_T I), where 0 is a column vector of all zeros, k_T(t, t′) is a kernel defined by a time vector t = (t₁, ..., t_T)′ of length T, and σ²_T is set as a noise parameter. For example, an RBF kernel is defined as

\[ k_T(t, t') = \left[ s_t^2 \exp\!\left( -\frac{(t_i - t_j)^2}{2 l_t^2} \right) \right]_{ij}, \tag{5} \]


where i, j ∈ {1, 2, ..., T}, and s_t and l_t are the hyperparameters of the RBF kernel (variance and length scale, respectively). Ideally, the latent cognitive dynamics could be considered noise-free. Here we decided to impose a minimal constraint such that σ²_T = 10⁻⁶.

Given the latent cognitive dynamics X, a neural submodel describes the relationship between the latent dynamics at the hyperlevel and the neural data. Specifically, a neural submodel of the GPJM assumes that the neural data are described by a Gaussian process with an RBF kernel K_N(X, X):

\[ Y_N \sim N\big(0, K_N(X, X) + \sigma_N^2 I\big), \quad K_N(X, X) = \left[ s_N^2 \exp\!\left( -\frac{(x_i - x_j)^2}{2 l_N^2} \right) \right]_{ij}, \tag{6} \]

where x_k (k ∈ {1, ..., T}) indicates the kth row of X, and s_N and l_N are the hyperparameters of the RBF kernel (variance and length scale, respectively). Note that the temporal dynamics kernel K_N(X, X) requires additional adjustments depending on the nature of the neural data (GPJM and Technical Issues in Brain–Behavior Modeling). To capture the residual noise in Y_N, a noise parameter σ²_N scales an identity matrix I.

A behavioral submodel provides a formal description of how cognitive dynamics produce behavioral responses. In typical joint modeling applications, cognitive models are used to describe the computations underlying decision processes. However, for this particular study we chose to remain agnostic about the computations that underlie the behavioral data generating process and instead use a Gaussian process, derived from the cognitive dynamics, as a statistical model describing the continuous joystick trajectories. For our purposes, the behavioral submodel incorporates the temporal dynamics using a Matérn 1/2 kernel:

\[ Y_B \sim N\big(0, K_B(X, X) + \sigma_B^2 I\big), \quad K_B(X, X) = \left[ s_B^2 \exp\!\left( -\frac{\|x_i - x_j\|}{l_B} \right) \right]_{ij}, \tag{7} \]

where ‖a‖ means the norm or length of the vector a, and s_B and l_B are the hyperparameters of the Matérn 1/2 kernel (variance and length scale, respectively). As in the neural data, residual error in Y_B is accounted for by the noise parameter σ²_B.

In Eqs. 6 and 7, the length-scale parameters l_N and l_B scale the distance between pairs of input points and control their covariance. Fig. 2B illustrates the role of length-scale parameters: When the length-scale parameter is large, the covariance between two input points is weakly penalized by the distance between them. Hence, large length-scale parameters allow distant pairs of points to covary and make sampled functions relatively constant across input values (Fig. 2B). By contrast, when the length-scale parameter is small, only nearby input points can covary whereas distal points are uncorrelated. Automatic relevance determination (ARD) (20) extends this idea to each latent input by assigning one length-scale parameter per dimension. By applying ARD to the kernels of the behavioral and neural submodels, the GPJM can learn how much each dimension contributes to the data generation process, in other words, the relevance of the latent input in the context of the data (Fig. 2A, bottom right). A sketch of these kernel computations is given below.
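To make the three kernels concrete, the following is a minimal numpy sketch, under our own illustrative hyperparameter values, of how Eqs. 5–7 fit together: a latent trajectory drawn from a temporal RBF prior, and RBF and Matérn 1/2 kernels with ARD length scales evaluated on that trajectory. The GPJM itself was implemented in GPflow (see Materials and Methods); this sketch only mirrors the structure of the equations.

```python
import numpy as np

rng = np.random.default_rng(0)

def rbf_ard(A, B, s2, lengthscales):
    """RBF kernel with per-dimension (ARD) length scales, as in Eq. 6."""
    Az, Bz = A / lengthscales, B / lengthscales
    sq = ((Az[:, None, :] - Bz[None, :, :]) ** 2).sum(-1)
    return s2 * np.exp(-0.5 * sq)

def matern12_ard(A, B, s2, lengthscales):
    """Matern 1/2 (exponential) kernel with ARD length scales, as in Eq. 7."""
    Az, Bz = A / lengthscales, B / lengthscales
    d = np.sqrt(((Az[:, None, :] - Bz[None, :, :]) ** 2).sum(-1))
    return s2 * np.exp(-d)

# 1) Latent dynamics: X ~ N(0, k_T(t, t') + sigma_T^2 I), Eq. 5, two latent dimensions
T, D = 100, 2
t = np.arange(T, dtype=float).reshape(-1, 1)
K_t = rbf_ard(t, t, s2=1.0, lengthscales=np.array([10.0])) + 1e-6 * np.eye(T)
X = rng.multivariate_normal(np.zeros(T), K_t, size=D).T          # (T, D)

# 2) Submodel kernels over the latent states; a large length scale on a
#    dimension shrinks its contribution, i.e., signals low relevance (ARD)
K_N = rbf_ard(X, X, s2=1.0, lengthscales=np.array([0.5, 2.0]))
K_B = matern12_ard(X, X, s2=1.0, lengthscales=np.array([0.5, 2.0]))

# 3) Sample one neural and one behavioral channel given the latent dynamics
y_N = rng.multivariate_normal(np.zeros(T), K_N + 0.01 * np.eye(T))
y_B = rng.multivariate_normal(np.zeros(T), K_B + 0.01 * np.eye(T))
```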

GPJM and Technical Issues in Brain–Behavior Modeling. One of the major issues in joint modeling approaches is the mismatch of the task-related timeline between neural and behavioral data. This problem is more salient when applying the model to data from fMRI experiments due to the hemodynamic lag between the neural response and the experimental event. Hemodynamic activity is characterized by a delayed increase in response, attaining a maximum value approximately 5–6 s after an event (e.g., stimulus presentation), a slow decrease in the neural response, and finally a drop in the neural response below the baseline, followed by a recovery to a baseline state. Because of this temporal profile, the hemodynamic lag obscures precise temporal profiles of neural activity.

The interaction between underlying neural activities and hemodynamics in the brain is typically modeled by convolving neural activations with a hemodynamic response function (HRF), which facilitates general linear model analyses of fMRI data (32). Early joint modeling approaches have taken advantage of this assumption and used event-wise neural activation estimates (33) for trial-level summaries of neural activity (12, 13).

In the Gaussian process framework, one way to enforce temporally lagged, convolved neural signals is to convolve the GP kernel with a secondary function, an HRF in our case. This approach is justified by the fact that the convolution of a Gaussian process with another function is another Gaussian process (34–36). In the GPJM, we use a kernel convolved with an HRF as a key component of the neural submodel. Here, the latent cognitive dynamics are expected to be relatively well aligned to the real-time neural activity because the resulting kernel instantiates the hemodynamic lag.

To instantiate the hemodynamic lag, we use the canonical double-gamma HRF,

\[ h(t) = \frac{t^{a_1 - 1} b_1^{a_1} \exp(-b_1 t)}{\Gamma(a_1)} - c\, \frac{t^{a_2 - 1} b_2^{a_2} \exp(-b_2 t)}{\Gamma(a_2)}, \tag{8} \]

where t represents time and Γ(x) = (x − 1)! is the gamma function. Although estimating the shape parameters of the HRF is theoretically possible, here we rely on the canonical shape by fixing the shape parameters to values based on previous work by Glover (37) and commonly used in brain analysis software packages: a₁ = 6, a₂ = 16, b₁ = b₂ = 1, c = 1/6.
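A direct implementation of Eq. 8 with these fixed parameter values is straightforward; the sketch below also illustrates the discrete convolution of a latent signal with the HRF, which is how the lag enters the model conceptually. The sampling interval and the example latent signal are our own illustrative choices.

```python
import numpy as np
from scipy.special import gamma

def canonical_hrf(t, a1=6.0, a2=16.0, b1=1.0, b2=1.0, c=1.0 / 6.0):
    """Canonical double-gamma HRF of Eq. 8 with Glover's parameter values."""
    return (t ** (a1 - 1) * b1 ** a1 * np.exp(-b1 * t) / gamma(a1)
            - c * t ** (a2 - 1) * b2 ** a2 * np.exp(-b2 * t) / gamma(a2))

dt = 0.5                                  # sampling interval in seconds
t = np.arange(0.0, 32.0, dt)
h = canonical_hrf(t)                      # peaks near 5-6 s, undershoots after ~10 s

# Discrete approximation of HRF convolution applied to a latent signal
latent = np.sin(np.arange(0.0, 60.0, dt) / 4.0)
bold_like = np.convolve(latent, h)[: latent.size] * dt
```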

Another methodological issue in joint modeling approaches is applying plausible spatiotemporal constraints on the neural data. For computational convenience, previous joint modeling applications have assumed statistical independence between regions of interest (ROIs) (e.g., refs. 11 and 12), which has clear limitations given the structural and functional connectivity of the brain. An ideal statistical representation would incorporate both spatial (e.g., nearby and well-connected ROIs should correlate) and temporal (e.g., activity at one time point in one ROI should affect activity at later time points in other, "downstream" ROIs) dependencies among ROIs. To instantiate these constraints, we separately define kernels for space k_S(s, s′) and time k_T(t, t′) and combine them using the Kronecker product (e.g., refs. 38–40) such that

\[ \mathrm{vec}(Y_N) \sim N\big(0,\, k_S(s, s') \otimes k_T(t, t')\big), \tag{9} \]

where Y_N is an (M × V) matrix with M measurements and V voxels (or ROIs), vec(M) represents a vectorization operator stacking the columns of a matrix M into a single column vector, and ⊗ is the Kronecker-product operator. In using the Kronecker product, we are assuming that although space and time have independent influences on the neural dynamics, their conjunction can be appropriately enforced when modeling the neural data. When we cannot assume independence between the spatial and temporal aspects of our data, nonseparable kernels are useful alternative options (e.g., ref. 41).
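As a sketch of Eq. 9, the following numpy fragment builds a spatial RBF kernel over a 3 × 3 × 3 voxel grid (matching the simulation below), a temporal RBF kernel, and samples vec(Y_N) from the Kronecker-product covariance. The grid size, length scales, and variable names are illustrative assumptions on our part.

```python
import numpy as np

rng = np.random.default_rng(1)

# Spatial kernel over 27 voxels on a 3 x 3 x 3 grid (variance fixed at 1
# for identifiability, as discussed in the simulation section)
coords = np.stack(np.meshgrid(*[np.arange(3.0)] * 3), axis=-1).reshape(-1, 3)
sq_s = ((coords[:, None, :] - coords[None, :, :]) ** 2).sum(-1)
K_S = np.exp(-0.5 * sq_s)                                # (V, V)

# Temporal kernel over M measurements
M = 60
t = np.arange(M, dtype=float)
K_T = np.exp(-((t[:, None] - t[None, :]) ** 2) / (2.0 * 8.0 ** 2))  # (M, M)

# Kronecker-product covariance of vec(Y_N), Eq. 9
V = K_S.shape[0]
K = np.kron(K_S, K_T) + 1e-6 * np.eye(V * M)
y = rng.multivariate_normal(np.zeros(V * M), K)
Y_N = y.reshape(V, M).T                                  # (M, V): column v is voxel v's series
```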

In the following two sections, we examine the potential of Gaussian processes as a linking function connecting neural and behavioral measures in two ways. First, we examine our ability to correctly recover the spatiotemporal dynamics in a simulated example using the Kronecker-product method. Next, we apply our GPJM to data from a real fMRI experiment in which subjects are asked to provide a continuous report of motion coherence.

Simulation: The Kronecker Method for Spatiotemporal Modeling

Before applying the GPJM to data, it is important to evaluate our ability to accurately fit data and recover latent dynamics. In particular, we examine whether it is possible to recover accurate spatiotemporal dynamics of the Kronecker-product kernel governing the distribution of behavioral and neural data. In this section, we perform a simulation study with a similar structure to that of the experiment we report below.

The latent cognitive states C_i(t) (i = {1, 2}) were defined by two sine waves with different frequencies such that C₁(t) = sin(t/8) and C₂(t) = sin(t/4). These state vectors composed a latent state matrix X and were used to generate synthetic neural and behavioral data. For the neural data, we assumed 27 adjoining voxels in a three-dimensional (3D) space, and for the behavioral data we assumed one joystick trajectory data vector. As in the neural (temporal) kernel K_N(X, X), we used an RBF kernel for describing the spatial covariance. Note that the variance parameter of the spatial kernel (corresponding to s²_N in Eq. 6) was fixed to one to avoid identifiability issues; specifically, this constraint is necessary because the two variance parameters corresponding to the spatial and temporal kernels cannot be distinguished after applying the Kronecker product. A sketch of this generative setup is given below.
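The latent sine-wave states and their latent-state kernel (Eq. 6) can be constructed as follows; the number of time points and the length scale are illustrative choices on our part, not the exact simulation settings.

```python
import numpy as np

T = 200
t = np.arange(T, dtype=float)
X = np.stack([np.sin(t / 8.0), np.sin(t / 4.0)], axis=1)   # (T, 2) latent state matrix

# Latent-state RBF kernel K_N(X, X) from Eq. 6; combined with the spatial kernel
# (variance fixed at 1) via np.kron as in Eq. 9 to generate the 27-voxel data
sq = ((X[:, None, :] - X[None, :, :]) ** 2).sum(-1)
K_N = 1.0 * np.exp(-sq / (2.0 * 0.5 ** 2))
```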

Fig. 3A illustrates a subset of the three kernels used to generate the data (Left) and estimated from the data (Right): spatial (Top), temporal (Middle), and the resulting Kronecker product (Bottom). Fig. 3B illustrates the true latent dynamics (Top) and the recovered latent dynamics (Bottom) as a joint trajectory through a two-dimensional space. Colors gradually changing across points specify the relative position in the cyclic temporal dynamics. Fig. 3C shows the data (black) and model predictions (neural, red; behavioral, blue) for the behavioral data (Top) and two selected voxels (Middle and Bottom). These plots reveal that 1) the spatiotemporal GPJM can capture the spatial relationship among voxels using the Kronecker method and 2) the model can estimate the latent dynamics that explain the generative process of the data with high accuracy. Unlike linear models, which may only suffer from invariance to orthogonal transformations, nonlinear models such as the GPJM have many more degrees of freedom. Hence, because the latent dynamics do not have a naturally unique solution, we cannot expect our estimated latent dynamics to perfectly match the dynamics used to generate the data. Instead, to evaluate the quality of the extracted latent dynamics, one must comprehensively consider model fit to data and the topology of the recovered dynamics (e.g., the frequency of cyclical information and shape). The complexity of this issue will become more apparent in the context of experimental data.

Another issue worth considering is whether the GPJM can outperform a standard GP-based model fitted to a single modality of information. In SI Appendix, we discuss an additional simulation in which we compared the GPJM's performance to two unimodal GP models fitted to each stream of data separately. Analogous to covariance-based joint models (12), the GPJM's fits to the joint stream of data are nearly as good as both of its unimodal counterparts, while still obeying the joint statistical structure in both streams of data within one cohesive model.

fMRI Experiment

As a proof of concept, we next apply the GPJM to experimental data from an ongoing study. In the experiment, the participant was presented with a cloud of randomly moving dots, with a subset of those dots moving in a coherent direction. The proportion of dots exhibiting coherent movement had the possibility of changing every second. The proportion of dots formed the independent variable of coherence, which was defined by nine points along a left–right axis. The participant was instructed to report the average direction of the dots from the beginning of the trial to the current point in time using a joystick.

We used the GPJM defined in the simulation study above to capture dynamics of the neural and behavioral data in our task. However, to scale the model to fit the data, we made a few simplifications of the full spatiotemporal GPJM: 1) We performed an ROI-based analysis, 2) neural data were introduced into the model after averaging signals across voxels within a set of 16 ROIs, and 3) we used only a temporal kernel rather than the full spatiotemporal Kronecker-product method. Fig. 4A shows the ROIs, which were preselected using a general linear modeling (GLM) analysis (see Materials and Methods and SI Appendix for full details).


Fig. 3. Simulation results. (A) Spatial, latent temporal, and Kronecker-product kernels used to generate synthetic data (Left) and those estimated from fitting the model (Right). Only a subsection of each kernel is presented here for visual clarity. (B) Latent representations used to generate the data (Top) and estimated from fitting the model to data (Bottom). Each representation is shown as a two-dimensional trajectory, arbitrarily color coded according to moments within one of two cycles. (C) Data (black) and model predictions (blue, behavioral; red, neural) are shown for behavioral (Top) and two representative neural (Middle and Bottom) time series. For the model predictions, solid lines are the mean prediction and the surrounding bands are the 95% credible interval.


Fig. 4. Fits to experiment data. (A) Regions of interest used within the GPJM. One region of interest located in cerebellum is not shown here. (B) Selected time series data from neural (Top and Middle) and behavioral (Bottom) data, along with model fits (neural, red; behavioral, blue) and 95% credible intervals. (C) A 3D representation of the latent temporal dynamics emerging from the data after fitting the GPJM. Orange dots correspond to gray areas in D, and green, blue, and red lines correspond to projections onto different latent dimensions. (D) Temporal dynamics of the three latent dimensions extracted from the GPJM. Gray highlighted areas in D have corresponding time series information, demarcated by orange dots in C. Note that the latent dynamics in C and D are filtered using the Savitzky–Golay filter with a window size of 21 and a third-order polynomial approximation for visual clarity.

Fits to Data. Fig. 4 summarizes the results of the model fit to data. Although we tested GPJMs using three, four, five, and six latent dimensions, Fig. 4 summarizes the result of the three-dimensional model for visual clarity (but see SI Appendix for the four-dimensional model). Fig. 4A shows the ROIs that were first extracted from our GLM analysis, from which the neural time series were extracted. Fig. 4B shows some example neural (Fig. 4B, Top and Middle) and behavioral (Fig. 4B, Bottom) time series data (black) along with corresponding model fits (blue, behavior; red, neural). Fig. 4C illustrates the latent temporal dynamics that emerged in the GPJM using a 3D representation. Fig. 4D shows the time series information for each of the three latent dimensions. Some abnormal deviations of the latent dimensions are highlighted (gray) in Fig. 4D, and these deviations have corresponding paths in Fig. 4C (demarcated by orange dots).

Topology of Latent Dynamics. As a final examination of the GPJM, we can examine what the topology of the extracted latent dynamics provides as a function of aspects of our experiment. Although interpreting these latent dynamics is quite challenging, as they emerge from a complex mixture of brain, behavior, and spatiotemporal dynamics, some interesting characterizations of the latent dynamics can be made based on the data to which the model is being applied. For example, when revisiting different points in the latent dimension through time, there may be some consistency in the pattern of brain data or the type of responses being generated by the participant (e.g., ref. 12), and such consistencies could be exploited when predicting future performance (SI Appendix).

Fig. 5 illustrates how the latent dynamics are associated with a few different key aspects of the experiment. Within Fig. 5 A–C, the same latent dynamics from Fig. 4C are shown, but color coded according to one of three variables: (Fig. 5A) stimulus coherence, (Fig. 5B) joystick position or behavioral response, or (Fig. 5C) functional coactivation. Fig. 5 A and B shows that although the responses for this participant closely tracked the stimulus coherence, there are interesting differences in how the topology describes these variables. As two prominent examples, when the stimulus was exhibiting near-zero coherence and the behavioral response was strongly to the left, two significant departures from the central aspect of the topology are created, generating two orange ribbons in the top left region of Fig. 5B. Similarly, just below these two ribbons is a situation in which a strong rightward response was made to a near-zero coherence value.

We can also examine whether the topology reflects differences in the functional coactivation of the neural time series data. For this analysis, we applied a k-means clustering algorithm to a multidimensional scaling representation of the neural data to identify similar profiles of functional connectivity. Although we tested a few different cluster settings, Fig. 5C shows the results with three clusters, where the time points of the latent dynamics associated with the three clusters are colored red, yellow, and green, respectively. Although there is considerable overlap in the spatial location of these clusters, the yellow cluster tends toward the middle of the topology, whereas the green cluster tends to be associated with departures from this central structure. After the clusters were identified, Fig. 5 D and E shows the functional connectivity profiles associated with each cluster by expressing the pairwise functional correlations among sets of ROIs. For this analysis, we chose one cluster as a reference from which to visualize significant changes from one set of coactivations to another. Fig. 5E shows the functional coactivations among all ROIs within the reference cluster, cluster 3. Fig. 5D shows the coactivations in clusters 1 and 2, but here only coactivations that are significantly different from the coactivation pattern of cluster 3 are shown for visual clarity (see SI Appendix for raw coefficient values). Although there is some consistency between clusters 1 and 3, the coactivation pattern is generally stronger within cluster 1 relative to cluster 3. However, cluster 2 is characterized by a strong negative association between the left inferior frontal gyrus (left IFG; ROI C7) and many other ROIs. The left IFG is known to play a major role in the execution of inhibitory processes, such as those observed within stop signal paradigms (e.g., refs. 42 and 43). Here, a large negative coactivation of left IFG with many other brain areas suggests the inhibition or cancellation of the report of motion coherence, possibly with the intention of making momentary adjustments in the response.


Fig. 5. Inspection of latent dynamics. (A–C) The same latent dynamics estimated from the data are shown, color coded according to key properties of the data: (A) stimulus coherence (left, red; central, gray; right, green), (B) the position of the joystick (behavioral response: left, orange; central, gray; right, blue), and (C) the result of a clustering analysis on the functional coactivation matrices (clusters 1 to 3: red, yellow, and green, respectively). (D and E) Functional coactivation matrices for different pairwise region of interest combinations. For this analysis, we chose a baseline coactivation cluster, cluster 3 in E, from which to visualize coactivation changes in the remaining clusters. D shows only coactivations whose differences from cluster 3 are larger than a threshold of ±0.2, for visual clarity. SI Appendix provides a complete list of the regions of interest (SI Appendix, Tables S3 and S4), as well as the raw coactivation coefficients (SI Appendix, Fig. S12). Red lines reflect positive correlations whereas blue lines reflect negative correlations. Although the coactivation patterns are consistent between clusters 1 and 3, cluster 1 has generally larger coactivations among the regions of interest. However, cluster 2 involves a strong negative coactivation between the left inferior frontal gyrus (labeled C7) and many other regions of interest.

Discussion

In this article, we extended an existing framework for connecting mind, brain, and behavior by specifying Gaussian process linking functions for the dynamics of both latent and manifest variables. This extension allows a few key advantages that will be important in the field of model-based cognitive neuroscience (4, 7, 8, 44). First, our approach inherits all of the advantages that a hierarchical structure provides, such as statistical reciprocity through conditionally independent variables within a global model and flexibly accounting for missing observations (9, 45, 46). Second, the model's representation of the mind is a latent dynamical system. Because the mind is a latent construct, we believe it is best specified as a set of conditionally independent variables that can be estimated, rather than as a transformation of either neural or behavioral data (see ref. 8 for discussions). Third, the representations of all variables within the GPJM are dynamic, allowing us to infer periodic changes at both local and global time scales. Fourth, the GPJM can be specified to allow for both spatial and temporal structure through the Kronecker product. The fusion of space and time in the latent dynamics allows us to appreciate the spatiotemporal constraints that any biological system must obey.

We have only begun to investigate the advantages provided by the GPJM, and so at present there remain a number of important aspects of this innovation that merit further investigation. First, strategies for interpreting the representation inferred from the GPJM's fit to data will be important for understanding the cognitive dynamics that emerge from the model. In the simulation study, although the latent dynamics shown in Fig. 3B separated two cognitive states from one another, there were other estimated dynamics that were not an essential part of the ground truth. Although complex and ambiguous representations often emerge in many other machine-learning applications, the field will require some type of guideline for interpretation of these cognitive dynamics if GPJMs are to advance our understanding of the connections between mind, brain, and behavior.

Another limitation is the resolution of the data in constraining the GPJM. Although our experimental application used a continuous motion tracking task, most experiments in psychology and cognitive science use discrete, "event-related" designs rather than block-type manipulations. Although the GPJM could be used in a similar way to that of earlier joint modeling work (12) to exploit trial-to-trial changes in cognitive states (e.g., attention), additional theoretical innovation is needed to account for changes in cognitive states between two stimulus presentations.

From our perspective, the biggest challenge facing GPJMs is computation. To fully capitalize on all of the complexities that GPJMs offer, computational solutions will be needed to increase the scalability of our approach to full-brain, spatiotemporal investigations. Although there have been attempts to increase the scalability of joint models (11, 13) and Gaussian processes (47), many interesting opportunities are just out of reach. For example, hierarchical extensions for voxel-wise or group-level modeling are obvious next steps, but they are currently computationally infeasible. Alternatively, two promising directions are sparse GP approximations and variational inference (e.g., refs. 25, 47, and 48), but the tradeoffs of these approximations have yet to be well studied. For now, we propose the GPJM for its theoretical advantages and save computational innovations for future investigations.


Conclusions

In this article, we have extended the basic joint modeling framework to incorporate nonlinear dynamics by Gaussian processes. At their core, Gaussian process joint models rely on GPs that fuse temporal and spatial information, allowing complex latent dynamics to emerge from the intrinsic biological dynamics of the brain. We view the GP structure as one way to alleviate the burden of specifying the particular shape of the linking function; instead, by specifying a flexible, nonparametric functional form, complex shapes of the linking function can emerge directly from complex patterns in the data.

Materials and Methods

Model Implementation and Training. The model was implemented using GPflow (49), a Gaussian process modeling library based on TensorFlow (50). The linking function and neural submodel used RBF kernels, whereas the behavioral submodel used a Matérn 1/2 kernel.

For efficient training, the model initialized a k-dimensional latent variable using the first k principal components estimated from the neural data, following the suggestion from ref. 24. To match the number of time points between the principal components and the behavioral data, we performed principal component analysis after interpolating the neural data at all missing observation points.
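A minimal sketch of this PCA-based initialization, assuming the neural data matrix has already been interpolated onto a common time grid; the helper name is ours, not from the original code.

```python
import numpy as np

def pca_init(Y_N, k):
    """Initialize a k-dimensional latent variable from the first k principal components."""
    Yc = Y_N - Y_N.mean(axis=0)                     # center each voxel/ROI time series
    U, S, Vt = np.linalg.svd(Yc, full_matrices=False)
    return U[:, :k] * S[:k]                         # (T, k) principal-component scores
```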

For the simulation, we used L-BFGS-B (51) to obtain maximum a posteriori estimates of the model parameters. Following the default setting of GPflow, optimization steps were limited to 1,000 iterations at maximum. For the fMRI experiment, we used the stochastic optimization algorithm Adam (52). The default learning rate recommended by the authors (α = 0.001) was too small for efficient model fitting. To speed up the initial learning phase, we first initialized the learning rate parameter as α₀ = 0.20 for fitting to the complete data and α₀ = 0.25 for the cross-validation analyses. Then we updated the learning rate every 20 iterations following a decreasing function of the optimization steps:

\[ \alpha_t = \alpha_0 / \sqrt{t - (t \bmod 20) + 1}. \]

The optimization steps were limited to 600 iterations at maximum.
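The schedule can be written compactly, as below; the rate stays at α₀ for the first 20 iterations and then decays in 20-step plateaus. This is our own rendering of the formula above, not code from the original implementation.

```python
import math

def learning_rate(t, alpha0=0.20):
    """Step-decay schedule: alpha_t = alpha0 / sqrt(t - (t mod 20) + 1)."""
    return alpha0 / math.sqrt(t - (t % 20) + 1)

# e.g., learning_rate(0) == 0.20, learning_rate(19) == 0.20,
#       learning_rate(20) == 0.20 / sqrt(21)
rates = [learning_rate(t) for t in range(600)]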

Participant. The data presented here were acquired, after obtaining informed consent, from a 19-y-old male as part of a larger study that is ongoing. All participants in the dataset have normal or corrected-to-normal vision and are self-reported to be right-handed. The study was approved by The Ohio State University's Institutional Review Board.

Task and Stimuli. The participant completed four blocks of a random dot motion task in which he was instructed to continuously track the direction (left or right) and the magnitude of coherence of the dots throughout each trial. Each block contained 20 trials separated by intertrial intervals ranging from 6 to 8 s. Trials began with a 2-s fixation cross at the center of the screen followed by 30 s of the randomly moving dots stimulus centered on the screen. There were 150 dots that were each 3 × 3 pixels, and the diameter of the entire dot cloud subtended 5.4° of visual angle. The proportion of coherently moving dots ranged from −0.35 to 0.35 (possible values were −0.35, −0.25, −0.15, −0.05, 0, 0.05, 0.15, 0.25, 0.35), where negative coherence indicates leftward motion, positive coherence indicates rightward motion, and zero indicates uniform circular noise. Coherence for each trial was initialized by randomly sampling one coherence level from the set of all possible values. This first state variable was also initialized to have an "age" of zero seconds. At each new second, a shift in the value of the coherence level was stochastically considered. Specifically, we sampled a value x ∼ U(0, 1). If x < 1 − exp(−0.1 × age), the state moved to a new coherence value by randomly sampling one value from only the nearby coherence values to maximize smooth transitions; in this case, the age of the state was reset to zero. If not, the state remained the same and the age was incremented. The coherence sequences were generated prior to the experiment using the State Machine Interface Library for Experiments (SMILE) (https://smile-docs.readthedocs.io/en/latest/), a Python library for programming psychological experiments. Stimuli were displayed at a viewing distance of 74 cm on a rear-projection screen with a resolution of 1,280 × 1,024 pixels and a 60-Hz refresh rate. The participant used a Current Designs TETHYX HHSC-JOY-5 joystick to indicate the direction and magnitude of coherent motion by moving the joystick left or right along a continuous slider, which allowed him to monitor his responses in real time. Because the behavioral data were output as the pixel position of the slider, responses were later rescaled to be −1 to 1 with 0.01 precision for the neural GLM analyses and 0.00001 for the GPJM.
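A hypothetical re-implementation of this coherence random walk is sketched below. The original sequences were generated with SMILE; here "nearby" is interpreted as the immediately adjacent coherence levels, which is our reading of the text rather than a confirmed detail.

```python
import numpy as np

LEVELS = [-0.35, -0.25, -0.15, -0.05, 0.0, 0.05, 0.15, 0.25, 0.35]

def coherence_sequence(n_seconds=30, seed=0):
    """Generate one trial's per-second coherence sequence."""
    rng = np.random.default_rng(seed)
    idx = int(rng.integers(len(LEVELS)))     # random initial coherence level
    age = 0                                  # seconds since last change
    seq = []
    for _ in range(n_seconds):
        seq.append(LEVELS[idx])
        if rng.uniform() < 1.0 - np.exp(-0.1 * age):
            # jump only to an adjacent level, keeping transitions smooth (assumption)
            idx = int(np.clip(idx + rng.choice([-1, 1]), 0, len(LEVELS) - 1))
            age = 0
        else:
            age += 1
    return seq
```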

Behavioral Data Processing. SMILE recorded the cursor position with an adaptively changing sampling rate, ranging from 1 Hz (when the cursor does not move) to 332 Hz. For modeling purposes, we interpolated and down-sampled the data to 2 Hz for the neural GLM analyses and the GPJM.

The horizontal cursor position is not appropriate to model under the assumption of a Gaussian likelihood because the cursor position is bounded to the range [0, 1280]. To model the data using a GP with Gaussian noise, we rescaled the data to the range [0, 1] and logit-transformed them. As rescaled data points with a value of 0 or 1 cannot be logit-transformed, those points were adjusted to 10⁻⁵ and 1 − 10⁻⁵.
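The transform amounts to a rescale, clip, and logit, as in this short sketch (function name ours):

```python
import numpy as np

def to_logit(cursor_px, width=1280.0, eps=1e-5):
    """Rescale pixel positions to (0, 1) and logit-transform, clipping the endpoints."""
    p = np.clip(cursor_px / width, eps, 1.0 - eps)
    return np.log(p / (1.0 - p))
```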

MRI Data Analysis. fMRI data preprocessing and analysis were performed primarily with the fMRI Expert Analysis Tool (FEAT) (53) Version 6.00 in FMRIB's Software Library (FSL) (https://fsl.fmrib.ox.ac.uk/fsl/). A typical preprocessing pipeline including motion correction, fieldmap-based echo planar imaging distortion correction, removal of nonbrain structures, spatial smoothing, high-pass filtering, and normalization via grand-mean scaling was used to clean each block of the four-dimensional task data. Additionally, the T1 image was segmented into gray matter (GM), white matter (WM), and cerebrospinal fluid (CSF). EPI data were registered first to the T1-weighted image and then transformed to standard space using the same transformation matrix acquired from registering the T1 image to standard space. See SI Appendix for complete preprocessing details.

After preprocessing, FSL's general linear model tool (FILM; ref. 54) was used to calculate activity estimates for the preprocessed functional data from each block. Three primary predictors were included in the model: participant responses, nonzero stimulus coherence, and zero stimulus coherence. Participant responses and stimulus coherence were rescaled such that they ranged from −1 to 1. The predictors were convolved with a double-gamma HRF to create the main regressors. Nuisance regressors including mean WM and CSF signal as well as 24 realignment estimates (X, Y, Z, pitch, yaw, roll, and their first- and second-order temporal derivatives) were also added to the model to account for motion and other variance of no interest. The temporal derivatives of all regressors in the model were also added as confounds of no interest. Finally, the time series was prewhitened within FILM to correct for autocorrelations in the BOLD signal. Contrasts for each of the three predictors of interest were calculated as the effect of interest versus no activity (i.e., zero). Additional contrasts included nonzero coherence greater than response, nonzero coherence less than response, nonzero coherence greater than zero coherence, and nonzero coherence less than zero coherence.

Following the block-wise analyses, a fixed-effects analysis was conducted in FEAT to assess the average effects of each contrast collapsed across blocks. The resulting clusters of the fixed-effects analysis were thresholded at Z ≥ 2.33 and corrected for familywise error at P < 0.05. Unsurprisingly, many of the resulting clusters from the contrasts comparing nonzero coherence to participant response or zero coherence shared significant overlap with clusters from the main predictors. Due to the constraints of the GPJM, we selected ROIs that did not have any spatial overlap. Thus, ROIs were defined as the clusters resulting from the contrasts of nonzero coherence versus no activity and participant response versus no activity.

Data Availability. All code and data that accompany this article can be downloaded from GitHub: http://github.com/MbCN-lab/gpjm and http://github.com/GiwonBahg/gpjm. The fMRI dataset is available on the Open Science Framework (https://osf.io/vabrh/?view_only=b47ff7bedfa34f3e8ea78bd04bcb28d0).

ACKNOWLEDGMENTS. We thank Michael Shvartsman for helpful discussions that motivated the present research. This research was supported by a Faculty Early Career Development (CAREER) award from the National Science Foundation (to B.M.T.).

1. G. S. Brindley, Physiology of the Retina and the Visual Pathway (Williams and Wilkins,Oxford, England, ed. 2, 1970).

2. D. Y. Teller, Linking propositions. Vis. Res. 24, 1233–1246 (1984).3. J. D. Schall, On building a bridge between brain and behavior. Annu. Rev. Psychol.

55, 23–50 (2004).

4. B. M. Turner, J. J. Palestro, S. Miletic, B. U. Forstmann. Advances in techniques forimposing reciprocity in brain-behavior relations. Neurosci. Biobehav. Rev. 102, 327–336 (2019).

5. B. U. Forstmann et al., Striatum and pre-SMA facilitate decision-making under timepressure. Proc. Natl. Acad. Sci. U.S.A. 105, 17538–17542 (2008).

6. B. U. Forstmann et al., Cortico-striatal connections predict control over speed and accuracy in perceptual decision making. Proc. Natl. Acad. Sci. U.S.A. 107, 15916–15920 (2010).

7. B. U. Forstmann, E.-J. Wagenmakers, T. Eichele, S. Brown, J. T. Serences, Reciprocal relations between cognitive neuroscience and formal cognitive models: Opposites attract? Trends Cognit. Sci. 15, 272–279 (2011).

8. B. M. Turner, B. U. Forstmann, B. C. Love, T. J. Palmeri, L. Van Maanen, Approaches to analysis in model-based cognitive neuroscience. J. Math. Psychol. 76, 65–79 (2017).

9. B. M. Turner et al., A Bayesian framework for simultaneously modeling neural and behavioral data. NeuroImage 72, 193–206 (2013).

10. B. M. Turner, P. B. Sederberg, S. D. Brown, M. Steyvers, A method for efficiently sampling from distributions with correlated dimensions. Psychol. Methods 18, 368–384 (2013).

11. B. M. Turner, C. A. Rodriguez, T. M. Norcia, S. M. McClure, M. Steyvers, Why more is better: A method for simultaneously modeling EEG, fMRI, and behavior. NeuroImage 128, 96–115 (2016).

12. B. M. Turner, L. Van Maanen, B. U. Forstmann, Combining cognitive abstractions with neurophysiology: The neural drift diffusion model. Psychol. Rev. 122, 312–336 (2015).

13. B. M. Turner, T. Wang, E. Merkle, Factor analysis linking functions for simultaneously modeling neural and behavioral data. NeuroImage 153, 28–48 (2017).

14. J. J. Palestro et al., A tutorial on joint models of neural and behavioral measures of cognition. J. Math. Psychol. 84, 20–48 (2018).

15. J. M. Shine et al., Human cognition involves the dynamic integration of neural activity and neuromodulatory systems. Nat. Neurosci. 22, 289–296 (2019).

16. K. Friston et al., Bayesian decoding of brain images. NeuroImage 39, 181–205 (2008).

17. L. M. Harrison, W. Penny, J. Ashburner, T. Trujillo-Barreto, K. J. Friston, Diffusion-based spatial priors for imaging. NeuroImage 38, 677–695 (2007).

18. C. K. I. Williams, C. E. Rasmussen, Gaussian Processes for Machine Learning (MIT Press, Cambridge, MA, 2006).

19. E. Schulz, M. Speekenbrink, A. Krause, A tutorial on Gaussian process regression: Modelling, exploring, and exploiting functions. J. Math. Psychol. 85, 1–16 (2018).

20. R. M. Neal, Bayesian Learning for Neural Networks (Springer, New York, NY, 1996).

21. M. Shvartsman, N. Sundaram, M. C. Aoi, A. Charles, T. L. Willke, J. D. Cohen, Matrix-normal models for fMRI analysis. arXiv:1711.03058 (8 November 2017).

22. N. D. Lawrence, “Gaussian process latent variable models for visualisation of high dimensional data” in Advances in Neural Information Processing Systems 16, S. Thrun, L. K. Saul, B. Schölkopf, Eds. (MIT Press, 2004), pp. 329–336.

23. N. D. Lawrence, Probabilistic non-linear principal component analysis with Gaussian process latent variable models. J. Mach. Learn. Res. 6, 1783–1816 (2005).

24. N. Lawrence, “Hierarchical Gaussian process latent variable models” in Proceedings of the International Conference on Machine Learning, Z. Ghahramani, Ed. (ACM, New York, NY, 2007), pp. 481–488.

25. M. Titsias, N. D. Lawrence, “Bayesian Gaussian process latent variable model” in Proceedings of the Thirteenth International Conference on Artificial Intelligence and Statistics, Y. W. Teh, M. Titterington, Eds. (PMLR, 2010), pp. 844–851.

26. A. Shon, K. Grochow, A. Hertzmann, R. Rao, “Learning shared latent structure for image synthesis and robotic imitation” in Advances in Neural Information Processing Systems 18, Y. Weiss, B. Schölkopf, J. C. Platt, Eds. (MIT Press, 2006), pp. 1233–1240.

27. C. H. Ek, “Shared Gaussian process latent variable models,” PhD thesis, Oxford Brookes University, Oxford, United Kingdom (2009).

28. A. C. Damianou, C. H. Ek, M. K. Titsias, N. D. Lawrence, “Manifold relevance determination” in Proceedings of the 29th International Conference on Machine Learning, J. Langford, J. Pineau, Eds. (Omnipress, New York, 2012), pp. 145–152.

29. G. Song, S. Wang, Q. Huang, Q. Tian, “Multimodal Gaussian process latent variable models with harmonization” in Proceedings of the IEEE International Conference on Computer Vision (IEEE, Piscataway, NJ, 2017), pp. 5039–5047.

30. G. Song, S. Wang, Q. Huang, Q. Tian, Harmonized multimodal learning with Gaussian process latent variable models. IEEE Trans. Pattern Anal. Mach. Intell., 10.1109/TPAMI.2019.2942028 (2019).

31. A. Wu, N. A. Roy, S. Keeley, J. W. Pillow, “Gaussian process based nonlinear latent structure discovery in multivariate spike train data” in Advances in Neural Information Processing Systems 30, I. Guyon et al., Eds. (Curran Associates, Inc., 2017), pp. 3496–3505.

32. G. M. Boynton, J. B. Demb, G. H. Glover, D. J. Heeger, Neuronal basis of contrast discrimination. Vis. Res. 39, 257–269 (1999).

33. J. Rissman, A. Gazzaley, M. D’Esposito, Measuring functional connectivity during distinct stages of a cognitive task. NeuroImage 23, 752–763 (2004).

34. D. Higdon, “Space and space-time modeling using process convolutions” in Quantitative Methods for Current Environmental Issues, C. W. Anderson, V. Barnett, P. C. Chatwin, A. El-Shaarawi, Eds. (Springer-Verlag, London, UK, 2002), pp. 37–56.

35. M. Alvarez, N. D. Lawrence, “Sparse convolved Gaussian processes for multi-output regression” in Advances in Neural Information Processing Systems 21, D. Koller, D. Schuurmans, Y. Bengio, L. Bottou, Eds. (Curran Associates, Inc., 2009), pp. 57–64.

36. M. A. Alvarez, N. D. Lawrence, Computationally efficient convolved multiple output Gaussian processes. J. Mach. Learn. Res. 12, 1459–1500 (2011).

37. G. H. Glover, Deconvolution of impulse response in event-related BOLD fMRI. NeuroImage 9, 416–429 (1999).

38. E. V. Bonilla, K. M. Chai, C. Williams, “Multi-task Gaussian process prediction” in Advances in Neural Information Processing Systems 20, J. C. Platt, D. Koller, Y. Singer, S. T. Roweis, Eds. (Curran Associates, Inc., 2008), pp. 153–160.

39. S. Flaxman, A. Wilson, D. Neill, H. Nickisch, A. Smola, “Fast Kronecker inference in Gaussian processes with non-Gaussian likelihoods” in Proceedings of the 32nd International Conference on Machine Learning, F. Bach, D. Blei, Eds. (PMLR, 2015), pp. 607–616.

40. J. Wilzén, A. Eklund, M. Villani, Physiological Gaussian process priors for the hemodynamics in fMRI analysis. J. Neurosci. Methods 342, 108778 (2020).

41. T. Gneiting, Nonseparable, stationary covariance functions for space–time data. J. Am. Stat. Assoc. 97, 590–600 (2002).

42. D. Swick, V. Ashley, A. U. Turken, Left inferior frontal gyrus is critical for response inhibition. BMC Neurosci. 9, 1–11 (2008).

43. A. Aron, T. W. Robbins, R. A. Poldrack, Inhibition and the right inferior frontal cortex. Trends Cognit. Sci. 8, 170–177 (2004).

44. B. U. Forstmann, E.-J. Wagenmakers, “Model-based cognitive neuroscience: A conceptual introduction” in An Introduction to Model-Based Cognitive Neuroscience, B. U. Forstmann, E.-J. Wagenmakers, Eds. (Springer, 2015), pp. 139–156.

45. B. M. Turner, “Constraining cognitive abstractions through Bayesian modeling” in An Introduction to Model-Based Cognitive Neuroscience, B. U. Forstmann, E.-J. Wagenmakers, Eds. (Springer, New York, NY, 2015), pp. 199–220.

46. B. M. Turner, B. U. Forstmann, M. Steyvers, Joint Models of Neural and Behavioral Data (Springer, New York, NY, 2019).

47. S. Ahmed, M. Rattray, A. Boukouvalas, GrandPrix: Scaling up the Bayesian GPLVM for single-cell data. Bioinformatics 35, 47–54 (2018).

48. M. Galdo, G. Bahg, B. Turner, Variational Bayes methods for cognitive science. Psychol. Methods, 10.1037/met0000242 (2019).

49. A. G. de G. Matthews et al., GPflow: A Gaussian process library using TensorFlow. J. Mach. Learn. Res. 18, 1299–1304 (2017).

50. M. Abadi et al., “TensorFlow: A system for large-scale machine learning” in The 12th USENIX Symposium on Operating Systems Design and Implementation (OSDI 16) (USENIX Association, 2016), pp. 265–283.

51. C. Zhu, R. H. Byrd, P. Lu, J. Nocedal, Algorithm 778: L-BFGS-B: Fortran subroutines for large-scale bound-constrained optimization. ACM Trans. Math. Software 23, 550–560 (1997).

52. D. P. Kingma, J. Ba, Adam: A method for stochastic optimization. arXiv:1412.6980 (22 December 2014).

53. M. W. Woolrich et al., Bayesian analysis of neuroimaging data in FSL. NeuroImage 45 (suppl.), S173–S186 (2009).

54. M. W. Woolrich, B. D. Ripley, M. Brady, S. M. Smith, Temporal autocorrelation in univariate linear modeling of fMRI data. NeuroImage 14, 1370–1386 (2001).
