Time Series
Karanveer Mohan Keegan Go Stephen Boyd
EE103, Stanford University
November 15, 2016
Outline
Introduction
Linear operations
Least-squares
Prediction
Time series data
I represent time series x_1, . . . , x_T as T-vector x
I x_t is value of some quantity at time (period, epoch) t, t = 1, . . . , T
I examples:
– average temperature at some location on day t
– closing price of some stock on (trading) day t
– hourly number of users on a website
– altitude of an airplane every 10 seconds
– enrollment in a class every quarter
I vector time series: x_t is an n-vector; can represent as T × n matrix
Types of time series
time series can be
I smoothly varying or more wiggly and random
I roughly periodic (e.g., hourly temperature)
I growing or shrinking (or both)
I random but roughly continuous
(these are vague labels)
Melbourne temperature
I daily measurements, for 10 years
I you can see seasonal (yearly) periodicity
[plot: Melbourne daily temperature over 10 years]
Melbourne temperature
I zoomed to one year
[plot: Melbourne daily temperature, one year]
Apple stock price
I log10 of Apple daily share price, over 30 years, 250 trading days/year
I you can see (not steady) growth
[plot: log10 of Apple daily share price over 30 years]
Log price of Apple
I zoomed to one year
[plot: log10 of Apple share price, one year]
Electricity usage in (one region of) Texas
I total in 15 minute intervals, over 1 year
I you can see variation over year
[plot: Texas electricity usage, 15-minute totals over one year]
Electricity usage in (one region of) Texas
I zoomed to 1 month
I you can see daily periodicity and weekend/weekday variation
[plot: Texas electricity usage, one month]
Linear operations
Down-sampling
I k× down-sampled time series selects every kth entry of x
I can be written as y = Ax
I for 2× down-sampling, T even,
A = \begin{bmatrix}
1 & 0 & 0 & 0 & 0 & 0 & \cdots & 0 & 0 & 0 & 0 \\
0 & 0 & 1 & 0 & 0 & 0 & \cdots & 0 & 0 & 0 & 0 \\
0 & 0 & 0 & 0 & 1 & 0 & \cdots & 0 & 0 & 0 & 0 \\
\vdots & & & & & & & & & & \vdots \\
0 & 0 & 0 & 0 & 0 & 0 & \cdots & 1 & 0 & 0 & 0 \\
0 & 0 & 0 & 0 & 0 & 0 & \cdots & 0 & 0 & 1 & 0
\end{bmatrix}
I alternative: average consecutive k-long blocks of x
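A minimal NumPy sketch of both down-sampling variants (keeping every kth entry, and averaging k-long blocks); the function names are illustrative, not from the slides.

    import numpy as np

    def downsample_select(x, k):
        # keep every kth entry: x_1, x_{1+k}, x_{1+2k}, ...
        return x[::k]

    def downsample_avg(x, k):
        # average consecutive k-long blocks (assumes len(x) is a multiple of k)
        return x.reshape(-1, k).mean(axis=1)

    x = np.arange(10.0)
    print(downsample_select(x, 2))   # [0. 2. 4. 6. 8.]
    print(downsample_avg(x, 2))      # [0.5 2.5 4.5 6.5 8.5]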
Up-sampling
I k× (linear) up-sampling interpolates between entries of x
I can be written as y = Ax
I for 2× up-sampling
A = \begin{bmatrix}
1 & & & & \\
1/2 & 1/2 & & & \\
& 1 & & & \\
& 1/2 & 1/2 & & \\
& & 1 & & \\
& & & \ddots & \\
& & & 1/2 & 1/2 \\
& & & & 1
\end{bmatrix}
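A sketch (not from the slides) that builds the 2× linear up-sampling matrix A above and applies it; the helper name upsample2_matrix is made up for illustration.

    import numpy as np

    def upsample2_matrix(T):
        # (2T-1) x T matrix: keep original samples, insert midpoint averages between them
        A = np.zeros((2 * T - 1, T))
        for i in range(T):
            A[2 * i, i] = 1.0
        for i in range(T - 1):
            A[2 * i + 1, i] = 0.5
            A[2 * i + 1, i + 1] = 0.5
        return A

    x = np.array([0.0, 2.0, 4.0])
    print(upsample2_matrix(3) @ x)   # [0. 1. 2. 3. 4.]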
Up-sampling on Apple log price
4× up-sample
[plot: Apple log price, 4× up-sampled]
Smoothing
I k-long moving average y of x is given by
y_i = (1/k)(x_i + x_{i+1} + · · · + x_{i+k−1}),   i = 1, . . . , T − k + 1
I can express as y = Ax, e.g., for k = 3,
A = \begin{bmatrix}
1/3 & 1/3 & 1/3 & & & \\
& 1/3 & 1/3 & 1/3 & & \\
& & \ddots & \ddots & \ddots & \\
& & & 1/3 & 1/3 & 1/3
\end{bmatrix}
I can also have trailing or centered smoothing
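One way to compute the k-long moving average without forming A explicitly is np.convolve; a small illustrative sketch:

    import numpy as np

    def moving_average(x, k):
        # y_i = (x_i + ... + x_{i+k-1}) / k, giving a vector of length T - k + 1
        return np.convolve(x, np.ones(k) / k, mode="valid")

    x = np.array([1.0, 2.0, 6.0, 2.0, 1.0])
    print(moving_average(x, 3))   # [3.  3.333...  3.]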
Melbourne daily temperature smoothed
I centered smoothing with window size 41
[plot: Melbourne temperature, centered smoothing with window size 41]
First-order differences
I (first-order) difference between adjacent entries
I discrete analog of derivative
I express as y = Dx, D is the (T − 1) × T difference matrix

D = \begin{bmatrix}
-1 & 1 & & & \\
& -1 & 1 & & \\
& & \ddots & \ddots & \\
& & & -1 & 1
\end{bmatrix}
I ‖Dx‖^2 (the Laplacian) is a measure of the wiggliness of x:

‖Dx‖^2 = (x_2 − x_1)^2 + · · · + (x_T − x_{T−1})^2
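A quick sketch (illustrative only) forming D and the wiggliness measure ‖Dx‖^2; np.diff gives Dx directly.

    import numpy as np

    def difference_matrix(T):
        # (T-1) x T matrix with -1 on the diagonal and 1 on the superdiagonal
        return np.eye(T - 1, T, k=1) - np.eye(T - 1, T)

    x = np.array([1.0, 3.0, 2.0, 2.0])
    D = difference_matrix(4)
    print(D @ x)                   # [ 2. -1.  0.], same as np.diff(x)
    print(np.sum(np.diff(x)**2))   # wiggliness ||Dx||^2 = 5.0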
Least-squares
De-meaning
I de-meaning a time series means subtracting its mean: x̃ = x − avg(x)1
I rms(x̃) = std(x)
I this is the least-squares fit with a constant
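A short NumPy check (illustrative) that the de-meaned series has rms equal to std(x):

    import numpy as np

    x = np.array([1.0, 2.0, 6.0, 3.0])
    x_demeaned = x - x.mean()                  # subtract the mean
    print(np.sqrt(np.mean(x_demeaned**2)))     # rms of de-meaned series
    print(x.std())                             # equals std(x)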
Straight-line fit and de-trending
I fit data (1, x_1), . . . , (T, x_T) with affine model x_t ≈ a + bt (also called straight-line fit)
I b is called the trend
I a+ bt is called the trend line
I de-trending a time series means subtracting its straight-line fit
I de-trended time series shows variations above and below the straight-line fit
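A sketch of the straight-line fit x_t ≈ a + bt and de-trending via np.linalg.lstsq; the function name detrend is illustrative.

    import numpy as np

    def detrend(x):
        T = len(x)
        t = np.arange(1, T + 1)
        A = np.column_stack([np.ones(T), t])            # columns: constant, t
        (a, b), *_ = np.linalg.lstsq(A, x, rcond=None)  # least-squares fit
        trend = a + b * t
        return trend, x - trend                         # trend line, de-trended series

    x = np.array([1.0, 2.1, 2.9, 4.2])
    trend, detrended = detrend(x)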
Straight-line fit on Apple log price
[plot: Apple log price with straight-line fit (trend line)]
[plot: de-trended residual]
Periodic time series
I let P-vector z be one period of periodic time series

x^per = (z, z, . . . , z)

(we assume T is a multiple of P)

I express as x^per = Az with

A = \begin{bmatrix} I_P \\ \vdots \\ I_P \end{bmatrix}
Extracting a periodic component
I given (non-periodic) time series x, choose z to minimize ‖x − Az‖^2
I gives best least-squares fit with periodic time series
I simple solution: average periods of original:

z = (1/k) A^T x,   k = T/P

I e.g., to get the entry of z for January 9, average all x_i's with date January 9
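Since A is the stacked identity, (1/k)A^T x just averages the k periods; a small sketch (function name is illustrative):

    import numpy as np

    def periodic_component(x, P):
        # z = (1/k) A^T x: average the k = T/P periods (assumes T is a multiple of P)
        k = len(x) // P
        z = x.reshape(k, P).mean(axis=0)
        return z, np.tile(z, k)          # one period z, and x_per = (z, ..., z)

    x = np.array([1.0, 5.0, 3.0, 7.0])   # T = 4, P = 2
    z, x_per = periodic_component(x, 2)
    print(z)       # [2. 6.]
    print(x_per)   # [2. 6. 2. 6.]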
Periodic component of Melbourne temperature
[plot: periodic component of Melbourne temperature]
Extracting a periodic component with smoothing
I can add smoothing to periodic fit by minimizing
‖x − Az‖^2 + λ‖Dz‖^2
I λ > 0 is smoothing parameter
I D is P × P circular difference matrix
D = \begin{bmatrix}
-1 & 1 & & & \\
& -1 & 1 & & \\
& & \ddots & \ddots & \\
& & & -1 & 1 \\
1 & & & & -1
\end{bmatrix}
I λ is chosen visually or by validation
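The regularized problem can be solved as one stacked least-squares problem, minimizing ‖[A; √λ D]z − [x; 0]‖^2; a sketch under that formulation (not the slides' own code):

    import numpy as np

    def smoothed_periodic(x, P, lam):
        # minimize ||x - A z||^2 + lam * ||D z||^2 with circular difference D
        k = len(x) // P
        A = np.tile(np.eye(P), (k, 1))                   # stacked identity matrices
        D = np.roll(np.eye(P), 1, axis=1) - np.eye(P)    # circular difference matrix
        M = np.vstack([A, np.sqrt(lam) * D])
        b = np.concatenate([x, np.zeros(P)])
        z, *_ = np.linalg.lstsq(M, b, rcond=None)
        return z

    x = np.array([1.0, 5.0, 3.0, 7.0, 2.0, 6.0])         # T = 6, P = 2
    print(smoothed_periodic(x, 2, lam=1.0))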
Choosing smoothing via validation
I split data into train and test sets, e.g., test set is last period (P entries)
I train model on train set, and test on the test set
I choose λ to (approximately) minimize error on the test set
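A sketch of this validation loop for λ, reusing the stacked least-squares fit from the previous sketch and holding out the last period as the test set (function names are illustrative):

    import numpy as np

    def fit_periodic(x, P, lam):
        # regularized periodic fit (same stacked least-squares problem as above)
        k = len(x) // P
        A = np.tile(np.eye(P), (k, 1))
        D = np.roll(np.eye(P), 1, axis=1) - np.eye(P)
        M = np.vstack([A, np.sqrt(lam) * D])
        return np.linalg.lstsq(M, np.concatenate([x, np.zeros(P)]), rcond=None)[0]

    def choose_lambda(x, P, lambdas):
        # train on all but the last period, test on the last period
        x_train, x_test = x[:-P], x[-P:]
        errs = [np.sqrt(np.mean((x_test - fit_periodic(x_train, P, lam))**2))
                for lam in lambdas]
        return lambdas[int(np.argmin(errs))], errs

    x = np.sin(2 * np.pi * np.arange(30) / 10) + 0.1 * np.random.randn(30)
    best_lam, errs = choose_lambda(x, 10, [0.01, 0.1, 1.0, 10.0, 100.0])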
Validation of smoothing for Melbourne temperature
trained on first 8 years; tested on last two years
[plot: test error versus smoothing parameter λ]
Periodic component of temperature with smoothing
I zoomed on test set, using λ = 30
[plot: periodic component with smoothing on the test set, λ = 30]
Prediction
I goal: predict or guess x_{t+K} given x_1, . . . , x_t
I K = 1 is one-step-ahead prediction
I prediction is often denoted x̂_{t+K}, or more explicitly x̂(t+K|t) (estimate of x_{t+K} at time t)
I x̂_{t+K} − x_{t+K} is prediction error
I applications: predict
– asset price
– product demand
– electricity usage
– economic activity
– position of vehicle
Some simple predictors
I constant: x̂_{t+K} = a
I current value: x̂_{t+K} = x_t
I linear (affine) extrapolation from last two values:

x̂_{t+K} = x_t + K(x_t − x_{t−1})

I average to date: x̂_{t+K} = avg(x_{1:t})
I (M + 1)-period rolling average: x̂_{t+K} = avg(x_{(t−M):t})
I straight-line fit to date (i.e., based on x_{1:t})
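A few of these simple predictors written as functions of the observed history x_{1:t} (an illustrative sketch, not from the slides):

    import numpy as np

    def predict_current(x_hist, K):
        # current value: xhat_{t+K} = x_t
        return x_hist[-1]

    def predict_extrapolate(x_hist, K):
        # straight-line extrapolation from the last two values
        return x_hist[-1] + K * (x_hist[-1] - x_hist[-2])

    def predict_rolling_avg(x_hist, K, M):
        # (M+1)-period rolling average
        return np.mean(x_hist[-(M + 1):])

    x_hist = np.array([1.0, 2.0, 4.0])
    print(predict_extrapolate(x_hist, K=1))   # 4 + (4 - 2) = 6.0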
Auto-regressive predictor
I auto-regressive predictor:
x̂_{t+K} = (x_t, x_{t−1}, . . . , x_{t−M})^T β

– M is memory length
– (M + 1)-vector β gives predictor weights
– can add offset v to x̂_{t+K}

I prediction x̂_{t+K} is linear function of past window x_{(t−M):t}
I (which of the simple predictors above have this form?)
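A sketch evaluating the auto-regressive predictor for given weights β (the helper name ar_predict is made up; the offset v is optional, as in the slides):

    import numpy as np

    def ar_predict(x_hist, beta, v=0.0):
        # xhat_{t+K} = beta[0]*x_t + beta[1]*x_{t-1} + ... + beta[M]*x_{t-M} + v
        M = len(beta) - 1
        window = x_hist[-1:-(M + 2):-1]   # (x_t, x_{t-1}, ..., x_{t-M})
        return float(window @ beta) + v

    x_hist = np.array([1.0, 2.0, 3.0, 4.0])
    beta = np.array([2.0, -1.0])          # memory M = 1
    print(ar_predict(x_hist, beta))       # 2*4 - 1*3 = 5.0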
Least squares fitting of auto-regressive models
I choose coefficients β via least squares (regression)
I regressors are (M + 1)-vectors
x_{1:(M+1)}, . . . , x_{(N−M):N}
I outcomes are numbers
x_{M+K+1}, . . . , x_{N+K}
I can add regularization on β
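A sketch of the least-squares fit: build the regressor matrix from overlapping (M+1)-long windows and solve for β (K = 1 and no offset assumed here; indices are zero-based):

    import numpy as np

    def fit_ar(x, M, K=1):
        # regressors: windows (x_t, ..., x_{t-M}); outcomes: x_{t+K}
        rows, y = [], []
        for t in range(M, len(x) - K):
            rows.append(x[t::-1][:M + 1])
            y.append(x[t + K])
        beta, *_ = np.linalg.lstsq(np.array(rows), np.array(y), rcond=None)
        return beta

    x = np.sin(0.3 * np.arange(200))      # toy training series
    beta = fit_ar(x, M=5)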
Evaluating predictions with validation
I for simple methods: evaluate RMS prediction error
I for more sophisticated methods:
– split data into a training set and a test set (usually sequential)
– train predictor on training data
– test on test data
Example
I predict Texas energy usage one step ahead (K = 1)
I train on first 10 months, test on last 2
Coefficients
I using M = 100
I coefficient index 0 corresponds to today
[plot: auto-regressive coefficients, M = 100]
Auto-regressive prediction results
[plot: auto-regressive prediction results]
Auto-regressive prediction results
showing the residual
[plot: prediction residual]
Auto-regressive prediction results
predictor                    RMS error
average (constant)           1.20
current value                0.119
auto-regressive (M = 10)     0.073
auto-regressive (M = 100)    0.051
Autoregressive model on residuals
I fit a model to the time series, e.g., linear or periodic
I subtract this model from the original signal to compute residuals
I apply auto-regressive model to predict residuals
I can add predicted residuals back to model to obtain predictions
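A sketch of this residual approach, assuming the in-sample predictions of some base model (e.g., the periodic fit) are available in an array base; everything here is illustrative:

    import numpy as np

    def predicted_residual(x, base, M):
        # fit a one-step AR model to the residuals r = x - base by least squares,
        # then predict the next residual from the last M+1 residuals
        r = x - base
        rows = np.array([r[t::-1][:M + 1] for t in range(M, len(r) - 1)])
        beta, *_ = np.linalg.lstsq(rows, r[M + 1:], rcond=None)
        return float(r[::-1][:M + 1] @ beta)

    # usage (hypothetical): xhat_next = base_next + predicted_residual(x, base, M=20)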
Example
I Melbourne temperature data residuals
I zoomed on 100 days in test set
[plot: Melbourne temperature residuals, 100 days in test set]
Auto-regressive prediction of residuals
[plot: auto-regressive prediction of residuals]
Prediction results for Melbourne temperature
I tested on last two years
predictor                                RMS error
average                                  4.12
current value                            2.57
periodic (no smoothing)                  2.71
periodic (smoothing, λ = 30)             2.62
auto-regressive (M = 3)                  2.44
auto-regressive (M = 20)                 2.27
auto-regressive on residual (M = 20)     2.22