STK3100/4100 Autumn 2014
Introduction to Generalized Linear Models (GLM) and mixed models

• Teacher: Magne Aldrin
  • professor II at UiO
  • main position at Norsk Regnesentral (Norwegian Computing Center)
• Responsible for exercises: Tonje Gulbrandsen Lien
• Slides based on previous presentations by Sven Ove Samuelsen and Geir Storvik

STK3100/4100 Autumn 2014 - Universitetet i oslo · STK3100/4100 Autumn 2014 ... Additional text book Mixed Effects Models and Extensions in Ecology with R by Alain Zuur et al. ...

Jun 18, 2020



Plan for the day

1. Introduction, literature, computer program

2. Examples

3. Informal definition of GLM

4. Mixed models

5. Plan for the course


Generalized Linear Models (GLM)

• Extension of multiple regression and ANOVA
• An important class of models
• A common framework for regression analysis of continuous, binary (binomial), categorical (multinomial) or count response variables
• Includes ordinary least squares regression (OLS), logistic regression and Poisson regression


Mixed models/random effect models

• Some regression coefficients are random

• Can account for correlations within groups of observations

• Can be combined with GLM

• Active research field, still not fully developed


Goals

• Introduction to GLM
  • learn to use these models to analyse practical problems
  • know the mathematical background for the analyses
• Knowledge of mixed models
  • learn to use these models to analyse simple practical problems
  • knowledge of approximations and challenges when using such models

The course will have both a practical and a theoretical perspective, with examples from medicine, biology, social sciences, economics and insurance.


Literature

Main textbook: Generalized Linear Models for Insurance Data by Piet de Jong and Gillian Z. Heller.
• available at Akademika
• homepage: www.actuary.mq.edu.au/research/books/GLMsforInsuranceData

Additional textbook: Mixed Effects Models and Extensions in Ecology with R by Alain Zuur et al.
• e-book, can be downloaded from the internet
• only selected chapters


R statistical package

• We will use the R package for computing
• Can be downloaded for free from http://mirrors.sunsite.dk/cran/
• Can be used under Windows, Mac and Linux operating systems
• We will mainly use routines that are already programmed in R, so you will not need to do much programming yourselves
• R homepage: http://www.r-project.org/


Data example 1: Birth weight and gestational age

                  Boys                                      Girls
Duration (weeks)   Birth weight (grams)   Duration (weeks)   Birth weight (grams)
      40                  2968                  40                  3317
      38                  2795                  36                  2729
      40                  3163                  40                  2935
      35                  2925                  38                  2754
      36                  2625                  42                  3210
      37                  2847                  39                  2817
      41                  3292                  40                  3126
      40                  3473                  37                  2539
      37                  2628                  36                  2412
      38                  3176                  38                  2991
      40                  3421                  39                  2875
      38                  2975                  40                  3231
Av.   38.33               3024.00               38.75               2911.33

We are interested in studying how birth weight depends on gestational age and gender.


Scatter plot for Ex. 1

[Figure: scatter plot of birth weight (g, 2400 to 3400) against gestational age (weeks, 36 to 42); + girls, o boys]


Typical model for Ex. 1: Linear regression

One response variable and two explanatory variables:

$Y_{jk}$ = birth weight for baby no. $k$ of gender no. $j$
$x_{jk}$ = gestational age for baby no. $k$ of gender no. $j$

for $k = 1, \ldots, 12$ and $j = 1, 2$ ($j = 1$ means boy and $j = 2$ girl)

Assumed model:

$$Y_{jk} = \alpha_j + \beta x_{jk} + \varepsilon_{jk}$$

where $\varepsilon_{jk} \sim N(0, \sigma^2)$, i.e. normally distributed with mean 0 and common variance $\sigma^2$, and also independent

$\beta$ = slope, regression coefficient
$\alpha_j$ = intercept for gender $j$


Least squares estimates

[Figure: birth weight (g) against gestational age (weeks) with the fitted least squares lines for boys and girls]

Estimates: $\hat\alpha_1 = -1610$, $\hat\alpha_2 = -1773$, $\hat\beta = 121$
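A minimal sketch of how these least squares estimates could be reproduced in R, assuming the 24 observations from the data table of Ex. 1 are typed in by hand. The data frame `birth` and its column names are illustrative and not taken from the course material. Removing the common intercept with `0 +` gives one intercept per gender, matching $\alpha_1$ and $\alpha_2$ in the model.

```r
# Data from the table in Ex. 1 (the names birth, gender, age, weight are illustrative)
birth <- data.frame(
  gender = factor(rep(c("boy", "girl"), each = 12)),
  age    = c(40, 38, 40, 35, 36, 37, 41, 40, 37, 38, 40, 38,
             40, 36, 40, 38, 42, 39, 40, 37, 36, 38, 39, 40),
  weight = c(2968, 2795, 3163, 2925, 2625, 2847, 3292, 3473, 2628, 3176, 3421, 2975,
             3317, 2729, 2935, 2754, 3210, 2817, 3126, 2539, 2412, 2991, 2875, 3231)
)

# One intercept per gender (alpha_1, alpha_2) and a common slope (beta)
fit <- lm(weight ~ 0 + gender + age, data = birth)
est <- round(coef(fit))
print(est)
```

The three coefficients are reported in the order boy intercept, girl intercept, slope.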


Equivalent model formulation

• Linearity: $E[Y_{jk}] = \mu_{jk} = \alpha_j + \beta x_{jk}$
• Constant variance: $\mathrm{Var}[Y_{jk}] = \sigma^2$
• Normality: $Y_{jk} \sim N(\mu_{jk}, \sigma^2)$
• Independent responses: the $Y_{jk}$-s are independent

Extensions in STK3100/4100:

• Linearity after transformation of $\mu$ by a link function $g()$:
  $g(\mu_{jk}) = \alpha_j + \beta x_{jk} \Leftrightarrow E[Y_{jk}] = \mu_{jk} = g^{-1}(\alpha_j + \beta x_{jk})$
• The variance depends on the expectation
• Other distributions: binomial, Poisson, gamma, ...
• Including random effects (mixed models) to account for dependencies


Data Ex. 2: Deadly dose of poison for beetles

About 60 beetles were exposed to each of 8 different concentrations of CS2, and the number killed at each concentration was recorded.

Dose (log10 CS2 mg l^-1)   Number of beetles   Number dead
1.6907                     59                  6
1.7242                     60                  13
1.7552                     62                  18
1.7842                     56                  28
1.8113                     63                  52
1.8369                     59                  53
1.8610                     62                  61
1.8839                     60                  60

Want to study how mortality depends on dose


Ex. 2: Proportion of dead beetles vs dose

[Figure: proportion of dead beetles (0.0 to 1.0) against dose (log10, 1.70 to 1.85)]


Reasonable model for Ex. 2

$Y_i$ = number of dead beetles at dose $x_i$ is binomially distributed:

$Y_i \sim \mathrm{bin}(n_i, \pi_i)$

where $\pi_i$ = probability for a beetle to die at dose $x_i$ and $n_i$ = number of beetles treated with dose $x_i$

A linear model for $\pi_i$ estimated by ordinary least squares (OLS) is problematic because
• $0 \le \pi_i \le 1$, which cannot be guaranteed by a linear expression $\alpha + \beta x_i$
• $\mathrm{Var}(Y_i) = n_i \pi_i (1 - \pi_i)$, i.e. non-constant (heteroscedastic) variance
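Both problems can be seen directly on the beetle data. The following is only an illustrative sketch: the data frame `beetle` and its column names are chosen to match the later R commands, and the straight-line fit to the observed proportions is shown as an example of what not to rely on.

```r
# Beetle data from the table in Ex. 2
beetle <- data.frame(
  dose = c(1.6907, 1.7242, 1.7552, 1.7842, 1.8113, 1.8369, 1.8610, 1.8839),
  tot  = c(59, 60, 62, 56, 63, 59, 62, 60),
  dead = c(6, 13, 18, 28, 52, 53, 61, 60)
)
beetle$prop <- beetle$dead / beetle$tot

# Straight-line fit to the observed proportions by OLS
ols <- lm(prop ~ dose, data = beetle)
print(range(fitted(ols)))   # the upper end lies above 1

# Binomial variance n * pi * (1 - pi), evaluated at the observed proportions
var_y <- with(beetle, tot * prop * (1 - prop))
print(round(var_y, 2))      # clearly not constant across doses
```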


Usual solution for Ex. 2: Logistic regression

Logistic regression model:

$$\pi_i = \frac{\exp(\alpha + \beta x_i)}{1 + \exp(\alpha + \beta x_i)}$$

Then $0 \le \pi_i \le 1$

Fit or estimate the model by Maximum Likelihood (ML).
• Takes into account that the responses are binomially distributed
• Estimates are efficient if we have enough data (better estimation methods may exist when the number of observations is small)


Logistic regression for Ex. 2

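MLE: $\hat\alpha = -60.72$, $\hat\beta = 34.27$

Predicted probabilities: $\hat\pi = \frac{\exp(\hat\alpha + \hat\beta x)}{1 + \exp(\hat\alpha + \hat\beta x)}$

[Figure: proportion of dead beetles (0.0 to 1.0) against dose (log10, 1.70 to 1.85), with the predicted probabilities]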
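As a small illustration, the predicted probabilities can be computed from the reported estimates. This sketch uses the rounded values from the slide, so the results are approximate. `plogis` is the logistic distribution function in base R, which equals exp(eta) / (1 + exp(eta)).

```r
alpha_hat <- -60.72   # rounded MLE from the slide
beta_hat  <- 34.27
dose <- c(1.6907, 1.7242, 1.7552, 1.7842, 1.8113, 1.8369, 1.8610, 1.8839)

eta    <- alpha_hat + beta_hat * dose   # linear predictor
pi_hat <- plogis(eta)                   # inverse logit of the linear predictor
print(round(pi_hat, 3))
```

All predicted values lie strictly between 0 and 1 and increase with dose, which a straight line cannot guarantee.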


Estimating parameters in logistic regression

Storvik: "Numerical optimization of likelihoods: Additional literature for STK2120" gives a Newton-Raphson routine in R to fit logistic regression to these data.

But this is already implemented in R. Use the command

glm(cbind(dead,tot-dead)~dose,data=beetle,family=binomial)

• glm = Generalised linear model
• family=binomial indicates that we have binary or binomial response data
• cbind(dead,tot-dead) is an n x 2 matrix with no. successes and no. failures in the two columns
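To connect the two approaches, here is a hedged sketch of a Newton-Raphson iteration for the binomial log-likelihood, compared with the built-in `glm` fit. It is not Storvik's routine, only a minimal version written for illustration. The start values, iteration limit and tolerance are assumptions.

```r
# Beetle data from Ex. 2
beetle <- data.frame(
  dose = c(1.6907, 1.7242, 1.7552, 1.7842, 1.8113, 1.8369, 1.8610, 1.8839),
  tot  = c(59, 60, 62, 56, 63, 59, 62, 60),
  dead = c(6, 13, 18, 28, 52, 53, 61, 60)
)

X     <- cbind(1, beetle$dose)   # design matrix with intercept column
theta <- c(0, 0)                 # assumed start values for (alpha, beta)
for (iter in 1:50) {
  p     <- plogis(drop(X %*% theta))
  score <- crossprod(X, beetle$dead - beetle$tot * p)     # gradient of the log-likelihood
  info  <- crossprod(X, X * (beetle$tot * p * (1 - p)))   # Fisher information
  step  <- drop(solve(info, score))
  theta <- theta + step
  if (max(abs(step)) < 1e-8) break
}

glmfit <- glm(cbind(dead, tot - dead) ~ dose, data = beetle, family = binomial)
print(rbind(newton = theta, glm = unname(coef(glmfit))))
```

Because the logit link is the canonical link for the binomial family, Newton-Raphson and Fisher scoring coincide here, so the two rows should agree up to numerical tolerance.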


Data Ex. 3: Number of children among pregnant women

de Jong & Heller: data on the number of previous children among 141 pregnant women of various ages.

The number of children tends to increase with age (as expected).

[Figure, two panels: number of children (0 to 7) against age (20 to 40); average number of children (0.0 to 2.5) against age (20 to 40)]


Data Ex. 3b: Number of car damages

de Jong & Heller: Data for reported car damages for 65535

policies

Explanatory variables:

• Value of car

• Age of car

• Type of car

• Gender of driver

• Age of driver

The response variable is a count variable in both examples, perhaps Poisson distributed


In Ex. 3: $Y_i$ = no. of previous children for mother no. $i$

It can be reasonable to assume that $Y_i$ is Poisson distributed with expectation $\mu_i$, where $\mu_i$ depends on $x_i$ = mother's age

Similar to Ex. 2:
• Expectation $\mu_i > 0$
• Variance of $Y_i$ equal to $\mu_i$, i.e. non-constant variance

Usual solution: Poisson regression

$Y_i \sim \mathrm{Po}(\mu_i)$ where $\mu_i = \exp(\alpha + \beta x_i)$

This is also a GLM and can be fitted by the glm function in R

Must specify that the response data are Poisson distributed by family=poisson


Poisson regression for Ex. 3

MLE for $(\alpha, \beta)$: $(\hat\alpha, \hat\beta) = (-4.0895, 0.1129)$

Gives fitted expectations $\hat\mu_i = \exp(\hat\alpha + \hat\beta x_i)$

[Figure: expected number of children (0.0 to 2.5) against age (20 to 40); o observed in 5-year groups, curve fitted with glm]
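The fitted expectations can be evaluated directly from the reported estimates. The sketch below uses the values from the slide and an illustrative grid of ages, since the underlying data set is not reproduced here.

```r
alpha_hat <- -4.0895   # MLE reported on the slide
beta_hat  <- 0.1129
age <- seq(20, 40, by = 5)   # illustrative ages, not the observed data

mu_hat <- exp(alpha_hat + beta_hat * age)   # fitted expected number of children
print(round(data.frame(age, mu_hat), 3))

# With a log link, one extra year of age multiplies the expectation by exp(beta_hat)
print(exp(beta_hat))
```

The log link makes the effect of age multiplicative: the ratio of fitted expectations between two ages depends only on the age difference.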


Definition of GLM

Independent responses: $Y_1, Y_2, \ldots, Y_n$, conditioned on explanatory variables

Vectors of explanatory variables $\mathbf{x}_1, \mathbf{x}_2, \ldots, \mathbf{x}_n$, where $\mathbf{x}_i = (x_{i1}, x_{i2}, \ldots, x_{ip})$ are $p$-dimensional

A GLM = Generalized Linear Model is defined by
• $Y_1, Y_2, \ldots, Y_n$ come from the same class of distributions from the exponential family
  (The exponential family will be defined later. It includes normal, binomial, Poisson and gamma distributions)
• Linear predictors $\eta_i = \beta_0 + \beta_1 x_{i1} + \cdots + \beta_p x_{ip}$
• Link function $g()$: $\mu_i = E[Y_i]$ is coupled to the linear predictor by $g(\mu_i) = \eta_i$
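In R, the distribution and the link function are bundled in a family object. The following sketch inspects the default links of three base R families and checks that the inverse link undoes the link. The variable names are illustrative.

```r
# Family objects from base R (package stats); each carries its default link
families <- list(normal = gaussian(), binomial = binomial(), poisson = poisson())
links <- vapply(families, function(f) f$link, character(1))
print(links)

# Link g and inverse link g^{-1} for the binomial family
fam <- binomial()
mu  <- 0.25
eta <- fam$linkfun(mu)        # g(mu) = log(mu / (1 - mu))
print(c(eta = eta, back = fam$linkinv(eta)))
```

The `family` argument of `glm` accepts such an object, and a different link can be requested explicitly, for example `binomial(link = "probit")`.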


The linear regression model is a GLM

• Responses ($Y_i$-s) are normally distributed
• Linear predictor $\eta_i = \beta_0 + \beta_1 x_{i1} + \cdots + \beta_p x_{ip}$
• $E[Y_i] = \mu_i = \eta_i$, i.e. the link function $g(\mu_i) = \mu_i$ is the identity function

The R commands lm (for linear regression) and glm do essentially the same, but with slightly different output

Linear regression is the default specification of glm


Ex 1: Birth weights

> lm(vekt~sex+svlengde)

Call:
lm(formula = vekt ~ sex + svlengde)

Coefficients:
(Intercept)          sex     svlengde
    -1447.2       -163.0        120.9

> glm(vekt~sex+svlengde)

Call:  glm(formula = vekt ~ sex + svlengde)

Coefficients:
(Intercept)          sex     svlengde
    -1447.2       -163.0        120.9

Degrees of Freedom: 23 Total (i.e. Null);  21 Residual
Null Deviance:      1830000
Residual Deviance: 658800    AIC: 321.4
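A sketch of how this output relates to the gender-specific intercepts, assuming that `sex` is coded numerically as 1 = boy and 2 = girl. The coding is not shown in the output, but it is consistent with the estimates $-1610$ and $-1773$ given earlier. The data are those of Ex. 1, with `vekt` as birth weight and `svlengde` as gestational age.

```r
# Data from Ex. 1: vekt = birth weight (g), svlengde = gestational age (weeks)
vekt     <- c(2968, 2795, 3163, 2925, 2625, 2847, 3292, 3473, 2628, 3176, 3421, 2975,
              3317, 2729, 2935, 2754, 3210, 2817, 3126, 2539, 2412, 2991, 2875, 3231)
svlengde <- c(40, 38, 40, 35, 36, 37, 41, 40, 37, 38, 40, 38,
              40, 36, 40, 38, 42, 39, 40, 37, 36, 38, 39, 40)
sex      <- rep(c(1, 2), each = 12)   # assumed coding: 1 = boy, 2 = girl

fit_lm  <- lm(vekt ~ sex + svlengde)
fit_glm <- glm(vekt ~ sex + svlengde)   # family = gaussian is the default
print(all.equal(coef(fit_lm), coef(fit_glm)))

# Gender-specific intercepts: alpha_j = intercept + j * (coefficient of sex)
b     <- coef(fit_lm)
alpha <- unname(b["(Intercept)"] + c(1, 2) * b["sex"])
print(round(alpha))

# For the normal family the residual deviance is the residual sum of squares
print(all.equal(deviance(fit_glm), sum(resid(fit_lm)^2)))
```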


The logistic regression model is a GLM

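• Responses ($Y_i$-s) are binomially distributed, $\mathrm{Bin}(n_i, \pi_i)$
• Linear predictor $\eta_i = \beta_0 + \beta_1 x_{i1} + \cdots + \beta_p x_{ip}$
• $E[Y_i]/n_i = \pi_i = \frac{\exp(\eta_i)}{1 + \exp(\eta_i)}$.
  Gives the link function $g(\pi_i) = \log\left(\frac{\pi_i}{1 - \pi_i}\right) = \eta_i$

$g(\pi) = \log\left(\frac{\pi}{1 - \pi}\right) = \mathrm{logit}(\pi)$ is called the logit function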
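The step from the expression for $\pi_i$ to the logit link is a short rearrangement, written out here for $0 < \pi_i < 1$:

```latex
\begin{align*}
\pi_i = \frac{\exp(\eta_i)}{1 + \exp(\eta_i)}
  &\iff \pi_i \bigl(1 + \exp(\eta_i)\bigr) = \exp(\eta_i) \\
  &\iff \pi_i = (1 - \pi_i)\exp(\eta_i) \\
  &\iff \exp(\eta_i) = \frac{\pi_i}{1 - \pi_i} \\
  &\iff \eta_i = \log\!\left(\frac{\pi_i}{1 - \pi_i}\right) = \operatorname{logit}(\pi_i).
\end{align*}
```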


Logistic regression in R

> glmfit = glm(cbind(dead,tot-dead)~dose,data=beetle,family=binomial)
> print(glmfit)

Call:  glm(formula = cbind(dead, tot - dead) ~ dose, family = binomial,
    data = beetle)

Coefficients:
(Intercept)         dose
     -60.72        34.27

Degrees of Freedom: 7 Total (i.e. Null);  6 Residual
Null Deviance:      284.2
Residual Deviance: 11.23     AIC: 41.43


The Poisson regression model is a GLM

• Responses $Y_i \sim \mathrm{Po}(\mu_i)$
• Linear predictor $\eta_i = \beta_0 + \beta_1 x_{i1} + \cdots + \beta_p x_{ip}$
• $E[Y_i] = \mu_i = \exp(\eta_i)$, i.e. the link function $g(\mu_i) = \log(\mu_i)$ is the (natural) logarithm

> glm(children~age,family=poisson)

Call:  glm(formula = children ~ age, family = poisson)

Coefficients:
(Intercept)          age
    -4.0895       0.1129

Degrees of Freedom: 140 Total (i.e. Null);  139 Residual
Null Deviance:      194.4
Residual Deviance: 165       AIC: 290


Ex. 4

Weights of 30 rats measured weekly for 5 weeks

[Figure: weight (150 to 350) against days (10 to 35)]


Ordinary linear model

Response $Y_{i,j}$ is the weight of rat $i$ in week $j$.

Individual differences in level can be handled by one intercept per rat.

Possible model:

$$Y_{i,j} = \alpha_i + \beta x_j + \varepsilon_{i,j}, \quad \varepsilon_{i,j} \sim N(0, \sigma^2)$$

where $x_j$ is the number of days. Can estimate $\alpha_1, \ldots, \alpha_{30}, \beta, \sigma^2$ by ordinary linear regression


Ex. 4 cont.

The 30 rats are a sample from a population, and we are interested in the whole population.

We therefore assume a distribution for $\alpha_i$ for all rats in the population.

Specifically, we assume $\alpha_i \sim N(\alpha, \sigma_a^2)$, where $\alpha$ and $\sigma_a$ are parameters

This is an example of a mixed model

This mixed model can alternatively be formulated as

$$Y_{i,j} = \alpha + a_i + \beta x_j + \varepsilon_{i,j}, \quad \varepsilon_{i,j} \sim N(0, \sigma^2)$$

where $a_i \sim N(0, \sigma_a^2)$.
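The random intercept is what makes measurements on the same rat correlated. A short worked derivation, under the additional assumption that the $a_i$ and the $\varepsilon_{i,j}$ are all mutually independent:

```latex
\begin{align*}
\operatorname{Var}(Y_{i,j}) &= \operatorname{Var}(a_i) + \operatorname{Var}(\varepsilon_{i,j})
  = \sigma_a^2 + \sigma^2, \\
\operatorname{Cov}(Y_{i,j}, Y_{i,j'}) &= \operatorname{Var}(a_i) = \sigma_a^2
  \qquad (j \neq j'), \\
\operatorname{Corr}(Y_{i,j}, Y_{i,j'}) &= \frac{\sigma_a^2}{\sigma_a^2 + \sigma^2}
  \qquad (j \neq j'), \\
\operatorname{Cov}(Y_{i,j}, Y_{i',j'}) &= 0 \qquad (i \neq i').
\end{align*}
```

Measurements from different rats remain uncorrelated, while the correlation within a rat grows with the share of the total variance that is due to $\sigma_a^2$.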


Ex 4: Estimation in R

lme(y~x,random=~1|id,data=d)

Linear mixed-effects model fit by REML
  Data: d
       AIC      BIC    logLik
  1145.302 1157.290 -568.6508

Random effects:
 Formula: ~1 | id
        (Intercept) Residual
StdDev:    14.03351 8.203811

Fixed effects: y ~ x
                Value Std.Error  DF  t-value p-value
(Intercept) 106.56762 3.0379720 119 35.07854       0
x             6.18571 0.0676639 119 91.41824       0
 Correlation:
  (Intr)
x -0.49
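Since the rat data are not reproduced in these slides, the call can only be tried on simulated data. The sketch below generates data from the random intercept model with made-up parameter values of roughly the same size as the estimates above. The measurement days and all numerical values are assumptions, so the output will not match the slide.

```r
library(nlme)   # recommended package that ships with standard R installations

set.seed(1)
n_rat <- 30
days  <- c(8, 15, 22, 29, 36)   # assumed measurement days, one per week

# Simulated data only: 30 rats x 5 weeks with a random intercept per rat
d <- expand.grid(x = days, id = seq_len(n_rat))
a <- rnorm(n_rat, mean = 0, sd = 14)   # random intercepts a_i ~ N(0, sigma_a^2)
d$y  <- 106 + a[d$id] + 6.2 * d$x + rnorm(nrow(d), mean = 0, sd = 8)
d$id <- factor(d$id)

fit <- lme(y ~ x, random = ~ 1 | id, data = d)
print(fixef(fit))     # estimates of alpha and beta
print(VarCorr(fit))   # estimated variances and standard deviations (sigma_a, sigma)
```

In the output, the `(Intercept)` row under the random effects corresponds to $\sigma_a$ and `Residual` to $\sigma$.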


Some extensions

Other GLM-s:
• Count data with $\mathrm{Var}(Y) > E(Y)$ (overdispersion): negative binomial distribution
• Continuous, non-normal response: gamma or inverse Gaussian distributions

Extensions of GLM:
• Multinomial responses (STK3100)
• Mixed models (STK3100, STK4070)
• Dependent responses (STK3100, STK4060/STK4150)
• Survival data (STK4080)
• Generalized Additive Models (GAM) (STK4030)


Overview of the book of de Jong & Heller

• Ch. 1: Introduction, data examples
• Ch. 2: Various distributions (most of these should be known)
• Ch. 3: Exponential family, ML estimation
• Ch. 4: Linear modelling (mostly known from STK1110/STK2120)
• Ch. 5: GLM
• Ch. 6: Count data (Poisson regression, overdispersion)
• Ch. 7: Categorical responses (binomial and multinomial)
• Ch. 8: Continuous responses

Ch. 1, 2 and 4 will not be taught in detail. Read them yourselves!


Overview of the book of Zuur et al.

• Ch. 5: Linear mixed models

• Ch. 8: Exponential family (supplement to de Jong & Heller)

• Ch. 13: GLM and mixed models

• Perhaps other chapters


Plan for the course

de Jong & Heller

• Will mainly follow chapters 3, 5, 6, 7 and 8 of the book

Zuur et al.

• will mostly look at models and examples

The lecture slides will be published on the home page of this course, together with an overview of planned lectures
