Time Series
Karanveer Mohan Keegan Go Stephen Boyd
EE103, Stanford University
November 15, 2016
Outline
Introduction
Linear operations
Least-squares
Prediction
Time series data
I represent time series x_1, . . . , x_T as T-vector x
I x_t is value of some quantity at time (period, epoch) t, t = 1, . . . , T
I examples:
– average temperature at some location on day t
– closing price of some stock on (trading) day t
– hourly number of users on a website
– altitude of an airplane every 10 seconds
– enrollment in a class every quarter
I vector time series: x_t is an n-vector; can represent as T × n matrix
Types of time series
time series can be
I smoothly varying or more wiggly and random
I roughly periodic (e.g., hourly temperature)
I growing or shrinking (or both)
I random but roughly continuous
(these are vague labels)
Melbourne temperature
I daily measurements, for 10 years
I you can see seasonal (yearly) periodicity
[plot: Melbourne daily temperature over 10 years]
Melbourne temperature
I zoomed to one year
[plot: Melbourne daily temperature, one year]
Apple stock price
I log10 of Apple daily share price, over 30 years, 250 trading days/year
I you can see (not steady) growth
[plot: log10 of Apple daily share price over 30 years]
Log price of Apple
I zoomed to one year
[plot: log10 of Apple share price, one year]
Electricity usage in (one region of) Texas
I total in 15 minute intervals, over 1 year
I you can see variation over year
[plot: Texas electricity usage, 15-minute totals over one year]
Electricity usage in (one region of) Texas
I zoomed to 1 month
I you can see daily periodicity and weekend/weekday variation
[plot: Texas electricity usage, one month]
Linear operations
Down-sampling
I k× down-sampled time series selects every kth entry of x
I can be written as y = Ax
I for 2× down-sampling, T even,
A = \begin{bmatrix}
1 & 0 & 0 & 0 & 0 & 0 & \cdots & 0 & 0 & 0 & 0 \\
0 & 0 & 1 & 0 & 0 & 0 & \cdots & 0 & 0 & 0 & 0 \\
0 & 0 & 0 & 0 & 1 & 0 & \cdots & 0 & 0 & 0 & 0 \\
\vdots & & & & & & & & & & \vdots \\
0 & 0 & 0 & 0 & 0 & 0 & \cdots & 1 & 0 & 0 & 0 \\
0 & 0 & 0 & 0 & 0 & 0 & \cdots & 0 & 0 & 1 & 0
\end{bmatrix}
I alternative: average consecutive k-long blocks of x
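A minimal NumPy sketch of both down-sampling variants (keeping every kth entry, and averaging k-long blocks); the function names are illustrative, not from the slides.

    import numpy as np

    def downsample_select(x, k):
        # keep every kth entry: x_1, x_{1+k}, x_{1+2k}, ...
        return x[::k]

    def downsample_avg(x, k):
        # average consecutive k-long blocks (assumes len(x) is a multiple of k)
        return x.reshape(-1, k).mean(axis=1)

    x = np.arange(10.0)
    print(downsample_select(x, 2))   # [0. 2. 4. 6. 8.]
    print(downsample_avg(x, 2))      # [0.5 2.5 4.5 6.5 8.5]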
Up-sampling
I k× (linear) up-sampling interpolates between entries of x
I can be written as y = Ax
I for 2× up-sampling
A = \begin{bmatrix}
1 & & & & \\
1/2 & 1/2 & & & \\
& 1 & & & \\
& 1/2 & 1/2 & & \\
& & 1 & & \\
& & & \ddots & \\
& & & 1/2 & 1/2 \\
& & & & 1
\end{bmatrix}
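A sketch (not from the slides) that builds the 2× linear up-sampling matrix A above and applies it; the helper name upsample2_matrix is made up for illustration.

    import numpy as np

    def upsample2_matrix(T):
        # (2T-1) x T matrix: keep original samples, insert midpoint averages between them
        A = np.zeros((2 * T - 1, T))
        for i in range(T):
            A[2 * i, i] = 1.0
        for i in range(T - 1):
            A[2 * i + 1, i] = 0.5
            A[2 * i + 1, i + 1] = 0.5
        return A

    x = np.array([0.0, 2.0, 4.0])
    print(upsample2_matrix(3) @ x)   # [0. 1. 2. 3. 4.]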
Up-sampling on Apple log price
4× up-sample
[plot: Apple log price, 4× up-sampled]
Smoothing
I k-long moving average y of x is given by
y_i = (1/k)(x_i + x_{i+1} + · · · + x_{i+k−1}),   i = 1, . . . , T − k + 1
I can express as y = Ax, e.g., for k = 3,
A = \begin{bmatrix}
1/3 & 1/3 & 1/3 & & & \\
& 1/3 & 1/3 & 1/3 & & \\
& & \ddots & \ddots & \ddots & \\
& & & 1/3 & 1/3 & 1/3
\end{bmatrix}
I can also have trailing or centered smoothing
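One way to compute the k-long moving average without forming A explicitly is np.convolve; a small illustrative sketch:

    import numpy as np

    def moving_average(x, k):
        # y_i = (x_i + ... + x_{i+k-1}) / k, giving a vector of length T - k + 1
        return np.convolve(x, np.ones(k) / k, mode="valid")

    x = np.array([1.0, 2.0, 6.0, 2.0, 1.0])
    print(moving_average(x, 3))   # [3.  3.333...  3.]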
Melbourne daily temperature smoothed
I centered smoothing with window size 41
[plot: Melbourne temperature, centered smoothing with window size 41]
First-order differences
I (first-order) difference between adjacent entries
I discrete analog of derivative
I express as y = Dx, D is the (T − 1) × T difference matrix

D = \begin{bmatrix}
-1 & 1 & & & \\
& -1 & 1 & & \\
& & \ddots & \ddots & \\
& & & -1 & 1
\end{bmatrix}
I ‖Dx‖^2 (the Laplacian) is a measure of the wiggliness of x:

‖Dx‖^2 = (x_2 − x_1)^2 + · · · + (x_T − x_{T−1})^2
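A quick sketch (illustrative only) forming D and the wiggliness measure ‖Dx‖^2; np.diff gives Dx directly.

    import numpy as np

    def difference_matrix(T):
        # (T-1) x T matrix with -1 on the diagonal and 1 on the superdiagonal
        return np.eye(T - 1, T, k=1) - np.eye(T - 1, T)

    x = np.array([1.0, 3.0, 2.0, 2.0])
    D = difference_matrix(4)
    print(D @ x)                   # [ 2. -1.  0.], same as np.diff(x)
    print(np.sum(np.diff(x)**2))   # wiggliness ||Dx||^2 = 5.0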
Least-squares
De-meaning
I de-meaning a time series means subtracting its mean: x̃ = x − avg(x)1
I rms(x̃) = std(x)
I this is the least-squares fit with a constant
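A short NumPy check (illustrative) that the de-meaned series has rms equal to std(x):

    import numpy as np

    x = np.array([1.0, 2.0, 6.0, 3.0])
    x_demeaned = x - x.mean()                  # subtract the mean
    print(np.sqrt(np.mean(x_demeaned**2)))     # rms of de-meaned series
    print(x.std())                             # equals std(x)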
Straight-line fit and de-trending
I fit data (1, x_1), . . . , (T, x_T) with affine model x_t ≈ a + bt (also called straight-line fit)
I b is called the trend
I a+ bt is called the trend line
I de-trending a time series means subtracting its straight-line fit
I de-trended time series shows variations above and below the straight-line fit
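A sketch of the straight-line fit x_t ≈ a + bt and de-trending via np.linalg.lstsq; the function name detrend is illustrative.

    import numpy as np

    def detrend(x):
        T = len(x)
        t = np.arange(1, T + 1)
        A = np.column_stack([np.ones(T), t])            # columns: constant, t
        (a, b), *_ = np.linalg.lstsq(A, x, rcond=None)  # least-squares fit
        trend = a + b * t
        return trend, x - trend                         # trend line, de-trended series

    x = np.array([1.0, 2.1, 2.9, 4.2])
    trend, detrended = detrend(x)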
Straight-line fit on Apple log price
[plot: Apple log price with straight-line fit (trend line)]
[plot: de-trended residual]
Periodic time series
I let P-vector z be one period of periodic time series

x^per = (z, z, . . . , z)

(we assume T is a multiple of P)

I express as x^per = Az with

A = \begin{bmatrix} I_P \\ \vdots \\ I_P \end{bmatrix}
Extracting a periodic component
I given (non-periodic) time series x, choose z to minimize ‖x − Az‖^2
I gives best least-squares fit with periodic time series
I simple solution: average periods of original:

z = (1/k) A^T x,   k = T/P

I e.g., to get the entry of z for January 9, average all x_i's with date January 9
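Since A is the stacked identity, (1/k)A^T x just averages the k periods; a small sketch (function name is illustrative):

    import numpy as np

    def periodic_component(x, P):
        # z = (1/k) A^T x: average the k = T/P periods (assumes T is a multiple of P)
        k = len(x) // P
        z = x.reshape(k, P).mean(axis=0)
        return z, np.tile(z, k)          # one period z, and x_per = (z, ..., z)

    x = np.array([1.0, 5.0, 3.0, 7.0])   # T = 4, P = 2
    z, x_per = periodic_component(x, 2)
    print(z)       # [2. 6.]
    print(x_per)   # [2. 6. 2. 6.]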
Periodic component of Melbourne temperature
[plot: periodic component of Melbourne temperature]
Extracting a periodic component with smoothing
I can add smoothing to periodic fit by minimizing
‖x − Az‖^2 + λ‖Dz‖^2
I λ > 0 is smoothing parameter
I D is P × P circular difference matrix
D = \begin{bmatrix}
-1 & 1 & & & \\
& -1 & 1 & & \\
& & \ddots & \ddots & \\
& & & -1 & 1 \\
1 & & & & -1
\end{bmatrix}
I λ is chosen visually or by validation
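The regularized problem can be solved as one stacked least-squares problem, minimizing ‖[A; √λ D]z − [x; 0]‖^2; a sketch under that formulation (not the slides' own code):

    import numpy as np

    def smoothed_periodic(x, P, lam):
        # minimize ||x - A z||^2 + lam * ||D z||^2 with circular difference D
        k = len(x) // P
        A = np.tile(np.eye(P), (k, 1))                   # stacked identity matrices
        D = np.roll(np.eye(P), 1, axis=1) - np.eye(P)    # circular difference matrix
        M = np.vstack([A, np.sqrt(lam) * D])
        b = np.concatenate([x, np.zeros(P)])
        z, *_ = np.linalg.lstsq(M, b, rcond=None)
        return z

    x = np.array([1.0, 5.0, 3.0, 7.0, 2.0, 6.0])         # T = 6, P = 2
    print(smoothed_periodic(x, 2, lam=1.0))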
Choosing smoothing via validation
I split data into train and test sets, e.g., test set is last period (P entries)
I train model on train set, and test on the test set
I choose λ to (approximately) minimize error on the test set
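A sketch of this validation loop for λ, reusing the stacked least-squares fit from the previous sketch and holding out the last period as the test set (function names are illustrative):

    import numpy as np

    def fit_periodic(x, P, lam):
        # regularized periodic fit (same stacked least-squares problem as above)
        k = len(x) // P
        A = np.tile(np.eye(P), (k, 1))
        D = np.roll(np.eye(P), 1, axis=1) - np.eye(P)
        M = np.vstack([A, np.sqrt(lam) * D])
        return np.linalg.lstsq(M, np.concatenate([x, np.zeros(P)]), rcond=None)[0]

    def choose_lambda(x, P, lambdas):
        # train on all but the last period, test on the last period
        x_train, x_test = x[:-P], x[-P:]
        errs = [np.sqrt(np.mean((x_test - fit_periodic(x_train, P, lam))**2))
                for lam in lambdas]
        return lambdas[int(np.argmin(errs))], errs

    x = np.sin(2 * np.pi * np.arange(30) / 10) + 0.1 * np.random.randn(30)
    best_lam, errs = choose_lambda(x, 10, [0.01, 0.1, 1.0, 10.0, 100.0])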
Validation of smoothing for Melbourne temperature
trained on first 8 years; tested on last two years
[plot: test error versus smoothing parameter λ]
Periodic component of temperature with smoothing
I zoomed on test set, using λ = 30
[plot: periodic component with smoothing on the test set, λ = 30]
Prediction
I goal: predict or guess x_{t+K} given x_1, . . . , x_t
I K = 1 is one-step-ahead prediction
I prediction is often denoted x̂_{t+K}, or more explicitly x̂(t+K|t) (estimate of x_{t+K} at time t)
I x̂_{t+K} − x_{t+K} is prediction error
I applications: predict
– asset price
– product demand
– electricity usage
– economic activity
– position of vehicle
Some simple predictors
I constant: x̂_{t+K} = a
I current value: x̂_{t+K} = x_t
I linear (affine) extrapolation from last two values:

x̂_{t+K} = x_t + K(x_t − x_{t−1})

I average to date: x̂_{t+K} = avg(x_{1:t})
I (M + 1)-period rolling average: x̂_{t+K} = avg(x_{(t−M):t})
I straight-line fit to date (i.e., based on x_{1:t})
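A few of these simple predictors written as functions of the observed history x_{1:t} (an illustrative sketch, not from the slides):

    import numpy as np

    def predict_current(x_hist, K):
        # current value: xhat_{t+K} = x_t
        return x_hist[-1]

    def predict_extrapolate(x_hist, K):
        # straight-line extrapolation from the last two values
        return x_hist[-1] + K * (x_hist[-1] - x_hist[-2])

    def predict_rolling_avg(x_hist, K, M):
        # (M+1)-period rolling average
        return np.mean(x_hist[-(M + 1):])

    x_hist = np.array([1.0, 2.0, 4.0])
    print(predict_extrapolate(x_hist, K=1))   # 4 + (4 - 2) = 6.0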
Auto-regressive predictor
I auto-regressive predictor:
x̂_{t+K} = (x_t, x_{t−1}, . . . , x_{t−M})^T β

– M is memory length
– (M + 1)-vector β gives predictor weights
– can add offset v to x̂_{t+K}

I prediction x̂_{t+K} is linear function of past window x_{(t−M):t}
I (which of the simple predictors above have this form?)
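A sketch evaluating the auto-regressive predictor for given weights β (the helper name ar_predict is made up; the offset v is optional, as in the slides):

    import numpy as np

    def ar_predict(x_hist, beta, v=0.0):
        # xhat_{t+K} = beta[0]*x_t + beta[1]*x_{t-1} + ... + beta[M]*x_{t-M} + v
        M = len(beta) - 1
        window = x_hist[-1:-(M + 2):-1]   # (x_t, x_{t-1}, ..., x_{t-M})
        return float(window @ beta) + v

    x_hist = np.array([1.0, 2.0, 3.0, 4.0])
    beta = np.array([2.0, -1.0])          # memory M = 1
    print(ar_predict(x_hist, beta))       # 2*4 - 1*3 = 5.0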
Least squares fitting of auto-regressive models
I choose coefficients β via least squares (regression)
I regressors are (M + 1)-vectors
x_{1:(M+1)}, . . . , x_{(N−M):N}
I outcomes are numbers
x_{M+K+1}, . . . , x_{N+K}
I can add regularization on β
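A sketch of the least-squares fit: build the regressor matrix from overlapping (M+1)-long windows and solve for β (K = 1 and no offset assumed here; indices are zero-based):

    import numpy as np

    def fit_ar(x, M, K=1):
        # regressors: windows (x_t, ..., x_{t-M}); outcomes: x_{t+K}
        rows, y = [], []
        for t in range(M, len(x) - K):
            rows.append(x[t::-1][:M + 1])
            y.append(x[t + K])
        beta, *_ = np.linalg.lstsq(np.array(rows), np.array(y), rcond=None)
        return beta

    x = np.sin(0.3 * np.arange(200))      # toy training series
    beta = fit_ar(x, M=5)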
Evaluating predictions with validation
I for simple methods: evaluate RMS prediction error
I for more sophisticated methods:
– split data into a training set and a test set (usually sequential)
– train predictor on training data
– test on test data
Example
I predict Texas energy usage one step ahead (K = 1)
I train on first 10 months, test on last 2
Coefficients
I using M = 100
I coefficient index 0 corresponds to today
[plot: auto-regressive coefficients, M = 100]
Auto-regressive prediction results
[plot: auto-regressive prediction results]
Auto-regressive prediction results
showing the residual
[plot: prediction residual]
Auto-regressive prediction results
predictor                    RMS error
average (constant)           1.20
current value                0.119
auto-regressive (M = 10)     0.073
auto-regressive (M = 100)    0.051
Autoregressive model on residuals
I fit a model to the time series, e.g., linear or periodic
I subtract this model from the original signal to compute residuals
I apply auto-regressive model to predict residuals
I can add predicted residuals back to model to obtain predictions
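A sketch of this residual approach, assuming the in-sample predictions of some base model (e.g., the periodic fit) are available in an array base; everything here is illustrative:

    import numpy as np

    def predicted_residual(x, base, M):
        # fit a one-step AR model to the residuals r = x - base by least squares,
        # then predict the next residual from the last M+1 residuals
        r = x - base
        rows = np.array([r[t::-1][:M + 1] for t in range(M, len(r) - 1)])
        beta, *_ = np.linalg.lstsq(rows, r[M + 1:], rcond=None)
        return float(r[::-1][:M + 1] @ beta)

    # usage (hypothetical): xhat_next = base_next + predicted_residual(x, base, M=20)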
Example
I Melbourne temperature data residuals
I zoomed on 100 days in test set
[plot: Melbourne temperature residuals, 100 days in test set]
Auto-regressive prediction of residuals
[plot: auto-regressive prediction of residuals]
Prediction results for Melbourne temperature
I tested on last two years
predictor                                RMS error
average                                  4.12
current value                            2.57
periodic (no smoothing)                  2.71
periodic (smoothing, λ = 30)             2.62
auto-regressive (M = 3)                  2.44
auto-regressive (M = 20)                 2.27
auto-regressive on residual (M = 20)     2.22