Modeling Clinical Time Series Using Gaussian Process Sequences
Zitao Liu, Lei Wu, Milos Hauskrecht
Department of Computer Science, University of Pittsburgh

Motivation

Development of accurate models of complex clinical time series data is critical for understanding a disease and its dynamics, and subsequently for patient management and clinical decision making.

Goal

"Develop accurate models of complex clinical time series!" Specifically, a prediction model that can:
1. Handle missing values
2. Deal with irregular time sampling intervals
3. Make accurate long-term predictions

Problem Statement

We define the time series prediction/regression function for clinical time series as

  y = g(Y_obs, t),

where Y_obs = {(y_i, t_i)}_{i=1}^n is a sequence of past observation-time pairs, y_i is a p-dimensional observation vector made at time t_i (t_{i+1} > t_i ≥ 0), n is the number of past observations, and t is the time at which we would like to predict the observation y. The series is irregularly sampled: in general, t_{i+1} − t_i ≠ t_i − t_{i−1}.

Background

• Gaussian Process (GP)

A GP is an extension of the multivariate Gaussian to distributions over functions. It is defined by two components, GP(m(x), K(x, x')):

  Mean function:       m(x) = E[f(x)]
  Covariance function: K(x, x') = E[(f(x) − m(x))(f(x') − m(x'))]

GP regression equations:

  Estimated mean:       f̄(x_*) = K(x_*, x) (K(x, x) + σ²I)^{-1} y
  Estimated covariance: Cov(f(x_*)) = K(x_*, x_*) − K(x_*, x) (K(x, x) + σ²I)^{-1} K(x, x_*)

• Discrete non-linear model (GPIL)

Y is the time series of observations and Z is the sequence of hidden states driving the dynamics:

  z_{t+1} = r(z_t) + w_t,   y_t = u(z_t) + v_t,
  z_1 ~ N(π_1, V_1),   w_t ~ N(0, Q),   v_t ~ N(0, R),

where r(·) is an unknown transition function and u(·) is an unknown measurement function.

Acknowledgement

This research work was supported by grants R01LM010019 and R01GM088224 from the National Institutes of Health.
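The GP regression equations above can be computed directly with standard linear algebra. Below is a minimal sketch, not the authors' implementation: the squared-exponential kernel and all hyperparameter values are illustrative assumptions, since the poster does not fix a kernel at this point.

```python
import numpy as np

def sq_exp_kernel(a, b, length=1.0, var=1.0):
    """Squared-exponential covariance (an illustrative choice of K)."""
    d = a[:, None] - b[None, :]
    return var * np.exp(-0.5 * (d / length) ** 2)

def gp_posterior(x_train, y_train, x_star, noise_var=1e-2):
    """Estimated mean and covariance from the GP regression equations."""
    K = sq_exp_kernel(x_train, x_train)       # K(x, x)
    K_s = sq_exp_kernel(x_star, x_train)      # K(x_*, x)
    K_ss = sq_exp_kernel(x_star, x_star)      # K(x_*, x_*)
    G = K + noise_var * np.eye(len(x_train))  # K(x, x) + sigma^2 I
    mean = K_s @ np.linalg.solve(G, y_train)
    cov = K_ss - K_s @ np.linalg.solve(G, K_s.T)
    return mean, cov
```

Using `np.linalg.solve` instead of forming the inverse explicitly is numerically preferable; for large n one would typically use a Cholesky factorization of G instead.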
Its content is solely the responsibility of the authors and does not necessarily represent the official views of the NIH.

Experiments

• Data

Time series of laboratory tests from the Complete Blood Count (CBC) panel (see Figures 2 and 3).

• Evaluation Metric

Root Mean Square Error (RMSE):

  RMSE = [ (1/n) Σ_{i=1}^{n} | y_i − ŷ_i |² ]^{1/2}

• Choice of Covariance Functions (K)

  Mean-reverting property: K_1(t, t') = σ_1² exp( −|t − t'| / ℓ_1 )
  Periodicity:             K_2(t, t') = σ_2² exp( −2 sin²( π(t − t') / p ) / ℓ_2² )
  Combined:                K = K_1 + K_2

• Results

Figure 2. Time series for six tests from the Complete Blood Count (CBC) panel for one of the patients.
Figure 3. Root Mean Square Error (RMSE) on CBC test samples.

Future Work

• Study and model dependences among multiple time series
• Extend to switching-state and controlled dynamical systems

References

• M. Hauskrecht, M. Valko, I. Batal, G. Clermont, S. Visweswaran, and G.F. Cooper, "Conditional outlier detection for clinical alerting," in AMIA Annual Symposium Proceedings, 2010, p. 286.
• C.E. Rasmussen and C.K.I. Williams, Gaussian Processes for Machine Learning, MIT Press, 2006.
• R. Turner, M.P. Deisenroth, and C.E. Rasmussen, "State-space inference and learning with Gaussian processes," in AISTATS, vol. 9, 2010, pp. 868-875.

State Space Gaussian Process

• State Space Gaussian Process (SSGP) Model

Figure 1. Graphical representation of the state-space Gaussian process model. Shaded nodes denote the (irregular) observations y_{i,j} and the times T_{i,j} associated with each observation. Each rectangle (plate) corresponds to a window, which is associated with its own local GP; s_i is the number of observations in window i, and f_{i,j} is the Gaussian field.

We consider a Gaussian process q(t) with a mean function formed by a combination of a fixed set of basis functions with coefficients β:

  q(t) = f(t) + h(t)^T β,   f(t) ~ GP(0, K(t, t'))

In this definition, f(t) is a zero-mean GP, h(t) denotes a set of fixed basis functions, and β has a Gaussian prior with mean b and covariance B. Then q(t) is itself a GP:

  q(t) ~ GP( h(t)^T b, K_q(t, t') ),   K_q(t, t') = K(t, t') + h(t)^T B h(t').
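The evaluation metric and the composite covariance K = K_1 + K_2 can be sketched in a few lines. This is not the authors' code: the hyperparameter names (sigma, ell, period) and their default values are placeholder assumptions.

```python
import numpy as np

def rmse(y_true, y_pred):
    """Root Mean Square Error: [ (1/n) sum |y_i - yhat_i|^2 ]^(1/2)."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    return float(np.sqrt(np.mean(np.abs(y_true - y_pred) ** 2)))

def k_mean_reverting(t, tp, sigma1=1.0, ell1=1.0):
    """K1: exponential (Ornstein-Uhlenbeck-style) kernel, mean-reverting."""
    return sigma1**2 * np.exp(-np.abs(t - tp) / ell1)

def k_periodic(t, tp, sigma2=1.0, ell2=1.0, period=1.0):
    """K2: periodic kernel capturing repeating structure."""
    return sigma2**2 * np.exp(-2.0 * np.sin(np.pi * (t - tp) / period) ** 2 / ell2**2)

def k_composite(t, tp):
    """K = K1 + K2: mean reversion plus periodicity."""
    return k_mean_reverting(t, tp) + k_periodic(t, tp)
```

Summing kernels is the standard way to combine properties: the resulting K is still a valid covariance function, so a single GP can exhibit both mean reversion and periodicity.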
For example, h(t) = (1, t, t²)^T, and the coefficients have a Gaussian prior, β ~ N(b, σ_β² I); q(t) is then another GP.

Background (con't)

• Linear Dynamical System (LDS)

Y is the time series of observations and Z is the sequence of hidden states driving the dynamics:

  p(z_t | z_{t−1}) = N(A z_{t−1}, Q),   p(y_t | z_t) = N(C z_t, R)

or, equivalently,

  z_t = A z_{t−1} + w_t,   y_t = C z_t + v_t,
  z_1 ~ N(π_1, V_1),   w_t ~ N(0, Q),   v_t ~ N(0, R)

• Idea Illustration

(Illustration: time-vs-value panels showing the series split into windows, each window modeled by its own local GP.)

State Space Gaussian Process

• Learning

Parameter set: Ω = { π_1, {β_i}, A, C, R, Q, V_1, Θ }, where Θ denotes the covariance function parameters.

Learn Θ: gradient-based methods on the marginal log-likelihood, using

  ∂/∂θ log p(Y | θ) = (1/2) Y^T K^{-1} (∂K/∂θ) K^{-1} Y − (1/2) Tr( K^{-1} ∂K/∂θ )

Learn Ω\Θ: EM algorithm with the expected complete-data log-likelihood E_{β,z}[ log p(β, z, Y) ].

Joint distribution:

  p(z, β, Y) = p(z_1) ∏_{i=2}^{m} p(z_i | z_{i−1}) ∏_{i=1}^{m} p(β_i | z_i) ∏_{i=1}^{m} ∏_{j=1}^{s_i} p(y_{i,j} | β_i)

• Prediction

To support the prediction inference, we need the following steps:

1. Split Y_obs and t into windows.
2. For the windows that do not contain t, extract the last values in those windows as β's and feed them into the Kalman filter to infer the most recent hidden state ẑ_k, where k is the index of the last window that does not contain t.
3. Get β_{k+1} from ẑ_k via z_{k+1} = A ẑ_k and β_{k+1} = C z_{k+1}, i.e., β_{k+1} = C A ẑ_k.
4. If t is in window k+1, use the observations (y_{k+1}, t_{k+1}) in window k+1 together with β_{k+1} to make the prediction:

     ŷ_t = β_{k+1} + K(t, t_{k+1}) K(t_{k+1}, t_{k+1})^{-1} ( y_{k+1} − β_{k+1} )

   Otherwise, find the window index i to which t belongs; the prediction at t is

     ŷ_t = C A^{i−k} ẑ_k
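Step 2 of the prediction procedure relies on a standard Kalman filter over the per-window coefficient estimates, and steps 3 and 5 project the filtered state forward through the dynamics. The sketch below illustrates those two pieces only; it is not the authors' implementation, and A, C, Q, R, and the per-window β estimates are all assumed to be given.

```python
import numpy as np

def kalman_filter(betas, A, C, Q, R, z0, V0):
    """Standard Kalman filter over per-window coefficient estimates
    (one row of `betas` per window); returns the most recent filtered
    hidden state z_k."""
    z, V = z0, V0
    for b in betas:
        # Predict: z_{i|i-1} = A z_{i-1},  V_{i|i-1} = A V A^T + Q
        z_pred = A @ z
        V_pred = A @ V @ A.T + Q
        # Update with the observed window coefficients b
        S = C @ V_pred @ C.T + R
        K_gain = V_pred @ C.T @ np.linalg.inv(S)
        z = z_pred + K_gain @ (b - C @ z_pred)
        V = (np.eye(len(z)) - K_gain @ C) @ V_pred
    return z

def predict_coefficients(z_k, A, C, steps=1):
    """Project the state forward: beta_{k+steps} = C A^steps z_k."""
    z = z_k
    for _ in range(steps):
        z = A @ z
    return C @ z
```

With `steps=1` this reproduces β_{k+1} = C A ẑ_k (step 3); larger `steps` corresponds to predicting in a later window i, as in ŷ_t = C A^{i−k} ẑ_k.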