Modeling Correlated/Clustered Multinomial Data Justin Newcomer Department of Mathematics and Statistics University of Maryland, Baltimore County Probability.

Modeling Correlated/Clustered Multinomial Data

Justin Newcomer

Department of Mathematics and Statistics

University of Maryland, Baltimore County

Probability and Statistics Day, April 28, 2007

Joint Research with Professor Nagaraj K. Neerchal, UMBC and Jorge G. Morel, PhD, P&G Pharmaceuticals, Inc.


Motivation

In the analysis of forest pollen, counts of the frequency of occurrence of different kinds of pollen grains are made at various levels of a sediment core

An attempt is then made to reconstruct the past vegetation changes in the area from which the core was taken

Example – Forest Pollen Count, Mosimann (1962)


Motivation

Four arboreal types of fossil forest pollen (pine, fir, oak and alder) were counted in the Bellas Artes core from the Valley of Mexico

At various levels of the core, pollen was classified in clusters of 100 pollen grains

The Data:

Core Level (Cluster)   Pine   Fir   Oak   Alder
 1                      94     0     5      1
 2                      75     2    14      9
 3                      81     2    13      4
 4                      95     2     3      0
...
72                      80     0    14      6
73                      85     3     9      3

Example – Forest Pollen Count, Mosimann (1962)
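As a small illustration of the data layout, the six core levels visible on the slide can be held as one tuple per cluster. This is only a sketch: the names `POLLEN_ROWS` and `cluster_sizes` are made up for illustration, and the elided levels are not reproduced.

```python
# The six core levels shown on the slide: (level, pine, fir, oak, alder).
# The levels elided on the slide ("...") are deliberately not filled in.
POLLEN_ROWS = [
    (1, 94, 0, 5, 1),
    (2, 75, 2, 14, 9),
    (3, 81, 2, 13, 4),
    (4, 95, 2, 3, 0),
    (72, 80, 0, 14, 6),
    (73, 85, 3, 9, 3),
]


def cluster_sizes(rows):
    """Return the total grain count for each core level."""
    return [sum(counts) for _level, *counts in rows]


# Every cluster should contain exactly 100 classified grains.
print(cluster_sizes(POLLEN_ROWS))
```

Each row total equals the cluster size m = 100 stated on the slide.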


Motivation

The probability function:

$$\Pr(\mathbf{T} = \mathbf{t}) = \frac{m!}{t_1!\, t_2! \cdots t_k!}\, \pi_1^{t_1} \pi_2^{t_2} \cdots \pi_k^{t_k}$$

Key assumptions:

Each observation can be classified by exactly one of k possible outcomes, with probabilities $\pi_1, \ldots, \pi_k$

All observations are independent of each other

In our example, since each pollen count comes from a cluster of 100 pollen grains, the individual observations within a cluster can be expected to be correlated

The possible correlations are a violation of the multinomial model assumptions!

The Multinomial Model
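The multinomial probability function can be evaluated directly. The sketch below is a minimal illustration; the function name `multinomial_pmf` and the example probabilities are assumptions, not part of the slides.

```python
from math import factorial, prod


def multinomial_pmf(t, pi):
    """Pr(T = t) for a count vector t and category probabilities pi."""
    m = sum(t)
    # Multinomial coefficient m! / (t_1! ... t_k!), kept as an exact integer.
    coef = factorial(m)
    for t_i in t:
        coef //= factorial(t_i)
    return coef * prod(p ** t_i for p, t_i in zip(pi, t))


# Hypothetical example: m = 4 observations over k = 3 categories.
print(multinomial_pmf((2, 1, 1), (0.5, 0.3, 0.2)))
```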


Motivation

How can we properly model these data and estimate the proportions of pollen grains?

What are the effects of using the wrong model?

Problem Statement


Overdispersion (Extra Variation)

Data exhibit variances larger than that permitted by the multinomial model

Usually caused by a lack of independence or clustering of experimental units

“Overdispersion is not uncommon in practice. In fact, some would maintain that over-dispersion is the norm in practice and nominal dispersion the exception.” McCullagh and Nelder (1989)

Overview


Overdispersion (Extra Variation)

Usually characterized by the first two moments:

$$E(\mathbf{T}) = m\boldsymbol{\pi}, \qquad \mathrm{Var}(\mathbf{T}) = m\{\mathrm{Diag}(\boldsymbol{\pi}) - \boldsymbol{\pi}\boldsymbol{\pi}'\}\{1 + \rho^2(m - 1)\}$$

The quantity $\{1 + \rho^2(m - 1)\}$ is known as the design effect (Kish, 1965).

The parameter $\rho^2$ is known as the “intra class” or “intra cluster” correlation

We use $\rho^2$ to denote a positive intra cluster correlation, which corresponds to overdispersion

Multinomial Overdispersion
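To make the two moments concrete, the sketch below computes the design effect 1 + ρ²(m − 1) and the overdispersed covariance matrix m{Diag(π) − ππ'}{1 + ρ²(m − 1)} with plain Python lists. The function names and the example values are illustrative assumptions.

```python
def design_effect(rho, m):
    """Design effect 1 + rho^2 (m - 1) for clusters of size m."""
    return 1.0 + rho ** 2 * (m - 1)


def overdispersed_cov(pi, rho, m):
    """Var(T) = m {Diag(pi) - pi pi'} {1 + rho^2 (m - 1)} as a nested list."""
    deff = design_effect(rho, m)
    k = len(pi)
    return [
        [m * deff * ((pi[i] if i == j else 0.0) - pi[i] * pi[j]) for j in range(k)]
        for i in range(k)
    ]


# Hypothetical values: rho = 0.5 and m = 5 double the multinomial variance.
print(design_effect(0.5, 5))
print(overdispersed_cov((0.5, 0.3, 0.2), 0.5, 5))
```

With ρ = 0 the design effect is 1 and the covariance reduces to the ordinary multinomial covariance.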


Parameter Estimation

How can we properly model these data and estimate the proportions of pollen grains?

Moment Based: Quasi-Likelihood, Generalized Estimating Equations (Easily implemented in SAS – Proc Genmod)

Likelihood Based: Finite Mixture Distribution, Dirichlet Multinomial Distribution (Not currently in SAS – Must write your own code)


Quasi-Likelihood Estimation

Here we assume that overdispersion occurs by inflation of variances by a constant factor $\phi$:

$$E(Y_j) = \mu_j, \qquad \mathrm{Var}(Y_j) = \phi\, \mathrm{Var}(\mu_j)$$

where $\mathrm{Var}(\mu_j)$ is the variance under the nominal model

Estimate systematic structure of the model via maximum likelihood procedures

Inflate the variance by a suitable constant

Wedderburn (1974), Cox and Snell (1989)
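The two quasi-likelihood steps can be sketched on the six core levels shown earlier. The slides do not say how the constant is chosen; the sketch assumes the common Pearson chi-square divided by its degrees of freedom, (n − 1)(k − 1), and the names `ROWS` and `quasi_likelihood_fit` are illustrative only.

```python
from math import sqrt

# Counts (pine, fir, oak, alder) for the six core levels shown on the slides.
ROWS = [
    (94, 0, 5, 1), (75, 2, 14, 9), (81, 2, 13, 4),
    (95, 2, 3, 0), (80, 0, 14, 6), (85, 3, 9, 3),
]


def quasi_likelihood_fit(rows):
    """Pooled proportions, Pearson-based dispersion, naive and inflated SEs."""
    n, k = len(rows), len(rows[0])
    total = sum(sum(row) for row in rows)
    # Step 1: fit the systematic part as if the data were multinomial.
    pi_hat = [sum(row[i] for row in rows) / total for i in range(k)]

    # Step 2: estimate the inflation constant (assumed Pearson X^2 / df).
    pearson = 0.0
    for row in rows:
        m = sum(row)
        for i in range(k):
            expected = m * pi_hat[i]
            pearson += (row[i] - expected) ** 2 / expected
    phi_hat = pearson / ((n - 1) * (k - 1))

    naive_se = [sqrt(p * (1 - p) / total) for p in pi_hat]
    inflated_se = [sqrt(phi_hat) * se for se in naive_se]
    return pi_hat, phi_hat, naive_se, inflated_se


pi_hat, phi_hat, naive_se, inflated_se = quasi_likelihood_fit(ROWS)
print(pi_hat, phi_hat)
```

The point estimates are unchanged by the inflation; only the standard errors grow, by a factor of the square root of the estimated constant.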


Generalized Estimating Equations (GEE)

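Liang and Zeger (1986), Zeger and Liang (1986)

Extension of Quasi-likelihood to clustered and longitudinal data:

$$E(\mathbf{y}_j) = \boldsymbol{\mu}_j, \qquad \mathrm{Cov}(y_{jr}, y_{js}) = v_{jrs}(\boldsymbol{\mu}_j, \alpha)$$

$$\boldsymbol{\mu}_j = (\mu_{j1}, \mu_{j2}, \ldots, \mu_{jm_j})', \qquad \mathbf{V}_j = (v_{jrs})$$

The Generalized Estimating Equations are:

$$\mathbf{U}(\boldsymbol{\beta}) = \sum_{j=1}^{n} \mathbf{H}_j(\boldsymbol{\beta})\, \mathbf{V}_j^{-1}\, \{\mathbf{y}_j - \boldsymbol{\mu}_j(\boldsymbol{\beta})\} = \mathbf{0}$$

where $\mathbf{H}_j(\boldsymbol{\beta})$ is the matrix of derivatives of $\boldsymbol{\mu}_j$ with respect to $\boldsymbol{\beta}$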
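A minimal sketch of the GEE idea in its simplest case: one parameter (the pine proportion), one count per cluster, and an independence working variance. Under these assumptions the estimating equation is solved by the pooled proportion, and the robust "sandwich" variance reduces to the sum of squared cluster residuals divided by (nm)². The pine counts are the six shown on the slides; the function name is hypothetical.

```python
from math import sqrt

PINE_COUNTS = [94, 75, 81, 95, 80, 85]  # six core levels shown on the slides
M = 100                                 # grains per cluster


def gee_pine_proportion(counts, m):
    """Intercept-only GEE for one proportion with a robust standard error."""
    n = len(counts)
    # Solving sum_j (y_j - m * pi) = 0 gives the pooled proportion.
    pi_hat = sum(counts) / (n * m)
    # Model-based (naive binomial) variance of pi_hat.
    naive_var = pi_hat * (1 - pi_hat) / (n * m)
    # Sandwich variance: cluster-level residuals replace the model variance.
    robust_var = sum((y - m * pi_hat) ** 2 for y in counts) / (n * m) ** 2
    return pi_hat, sqrt(naive_var), sqrt(robust_var)


print(gee_pine_proportion(PINE_COUNTS, M))
```

On these six clusters the robust standard error is roughly twice the naive one, which is the pattern overdispersion would be expected to produce.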


Likelihood Models for Correlated Multinomial

Multinomial Distribution with a Dirichlet Prior:

$$\mathbf{T} \mid \mathbf{P} \sim \text{Multinomial}(\mathbf{P}; m), \qquad \mathbf{P} \sim \text{Dirichlet}(C\pi_1, \ldots, C\pi_k)$$

Dirichlet Multinomial Distribution, Mosimann (1962)


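It can be shown that

$$\Pr(\mathbf{T} = \mathbf{t}) = \frac{m!}{t_1! \cdots t_k!}\, \frac{\Gamma(C)}{\Gamma(m + C)} \prod_{i=1}^{k} \frac{\Gamma(t_i + C\pi_i)}{\Gamma(C\pi_i)}$$

If we let $C = (1 - \rho^2)/\rho^2$, then the moments of the Dirichlet Multinomial distribution are given by

$$E(\mathbf{T}) = m\boldsymbol{\pi}, \qquad \mathrm{Var}(\mathbf{T}) = m\{\mathrm{Diag}(\boldsymbol{\pi}) - \boldsymbol{\pi}\boldsymbol{\pi}'\}\{1 + \rho^2(m - 1)\}$$

Dirichlet Multinomial Distribution, Mosimann (1962)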
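The Dirichlet Multinomial probability function is easiest to evaluate on the log scale with log-gamma functions. The sketch below assumes the parametrization C = (1 − ρ²)/ρ² with 0 < ρ < 1 and all π_i > 0; the function name `dm_log_pmf` and the example values are illustrative.

```python
from math import exp, lgamma


def dm_log_pmf(t, pi, rho):
    """log Pr(T = t) under the Dirichlet Multinomial with C = (1 - rho^2) / rho^2."""
    m = sum(t)
    c = (1.0 - rho ** 2) / rho ** 2
    # log of the multinomial coefficient m! / (t_1! ... t_k!)
    log_p = lgamma(m + 1) - sum(lgamma(t_i + 1) for t_i in t)
    # log of Gamma(C) / Gamma(m + C)
    log_p += lgamma(c) - lgamma(c + m)
    # log of prod_i Gamma(t_i + C pi_i) / Gamma(C pi_i)
    log_p += sum(lgamma(t_i + c * p) - lgamma(c * p) for t_i, p in zip(t, pi))
    return log_p


# Hypothetical check: the probabilities over all t with m = 4, k = 3 sum to one.
pi = (0.5, 0.3, 0.2)
total = sum(
    exp(dm_log_pmf((a, b, 4 - a - b), pi, 0.3))
    for a in range(5) for b in range(5 - a)
)
print(total)
```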




Can be represented as: $\mathbf{T} = \mathbf{Y}N + (\mathbf{X} \mid N)$

$N \sim \text{Binomial}(\rho, m)$, $\mathbf{Y} \sim \text{Multinomial}(\boldsymbol{\pi}, 1)$, $N \perp \mathbf{Y}$, $(\mathbf{X} \mid N) \sim \text{Multinomial}(\boldsymbol{\pi}, m - N)$ if $N < m$

Finite Mixture of Multinomials, Morel & Neerchal (1993)

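[Figure: two panels, (a) and (b), each showing a cluster of m units split into N units labeled YN and m − N units labeled X given N. In panel (a) the YN units are all 1s, in panel (b) all 0s; the X given N units are shown as "?".]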
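The representation T = YN + (X | N) translates directly into a simulation: N of the m units copy a single draw Y, and the remaining m − N units are classified independently. This is a sketch only; the function name and the example values of π, ρ and m are assumptions.

```python
import random


def simulate_fm_cluster(pi, rho, m, rng):
    """Draw one cluster of counts from the finite mixture representation."""
    k = len(pi)
    # N ~ Binomial(rho; m): number of units that copy the common outcome Y.
    n_copies = sum(rng.random() < rho for _ in range(m))
    # Y ~ Multinomial(pi; 1): a single category shared by those N units.
    y = rng.choices(range(k), weights=pi)[0]
    counts = [0] * k
    counts[y] += n_copies
    # (X | N) ~ Multinomial(pi; m - N): the remaining units are independent.
    for category in rng.choices(range(k), weights=pi, k=m - n_copies):
        counts[category] += 1
    return counts


rng = random.Random(2007)  # fixed seed so the sketch is reproducible
print(simulate_fm_cluster((0.5, 0.3, 0.2), 0.4, 10, rng))
```

Larger values of ρ make more units share the same category, which is what produces the extra variation.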



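It can be shown that:

$$\Pr(\mathbf{T} = \mathbf{t}) = \sum_{i=1}^{k} \pi_i \Pr(\mathbf{X}_i = \mathbf{t}) = \sum_{i=1}^{k} \pi_i\, \frac{m!}{t_1! \cdots t_k!} \prod_{j=1}^{k} p_{ij}^{t_j}$$

If

$$\mathbf{p}_i = (1 - \rho)\boldsymbol{\pi} + \rho\,\mathbf{e}_i, \qquad \mathbf{e}_i \text{ is the } i\text{th column of } \mathbf{I}_{k-1}$$

and,

$$\mathbf{p}_k = (1 - \rho)\boldsymbol{\pi}$$

Then the moments of the Finite Mixture distribution are given by,

$$E(\mathbf{T}) = m\boldsymbol{\pi}, \qquad \mathrm{Var}(\mathbf{T}) = m\{\mathrm{Diag}(\boldsymbol{\pi}) - \boldsymbol{\pi}\boldsymbol{\pi}'\}\{1 + \rho^2(m - 1)\}$$

Finite Mixture of Multinomials, Morel & Neerchal (1993)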
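The mixture form of the probability function can be coded as a weighted sum of k multinomial probabilities. For convenience the sketch works with full k-dimensional probability vectors, so component i uses (1 − ρ)π + ρe_i with e_i of length k; this is assumed to be equivalent to the (k − 1)-dimensional notation above. The function name `fm_pmf` is hypothetical.

```python
from math import factorial, prod


def fm_pmf(t, pi, rho):
    """Pr(T = t) under the finite mixture of multinomials."""
    m, k = sum(t), len(pi)
    # Multinomial coefficient shared by all k mixture components.
    coef = factorial(m)
    for t_j in t:
        coef //= factorial(t_j)
    total = 0.0
    for i in range(k):
        # Component i shifts probability mass rho toward category i.
        p_i = [(1.0 - rho) * pi[j] + (rho if j == i else 0.0) for j in range(k)]
        total += pi[i] * coef * prod(p ** t_j for p, t_j in zip(p_i, t))
    return total


# Hypothetical check: the probabilities over all t with m = 5, k = 3 sum to one.
pi = (0.5, 0.3, 0.2)
print(sum(fm_pmf((a, b, 5 - a - b), pi, 0.4) for a in range(6) for b in range(6 - a)))
```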


Maximum Likelihood Estimation

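Computed using the Fisher Scoring Algorithm:

$$\hat{\boldsymbol{\theta}}^{(i+1)} = \hat{\boldsymbol{\theta}}^{(i)} + \mathbf{I}^{-1}\big(\hat{\boldsymbol{\theta}}^{(i)}\big)\, \frac{\partial L\big(\hat{\boldsymbol{\theta}}^{(i)}\big)}{\partial \boldsymbol{\theta}}$$

Fisher Information Matrix plays an important role:

$$\mathbf{I}(\boldsymbol{\theta}) = -E\left\{\frac{\partial^2 \log P(\mathbf{t})}{\partial\boldsymbol{\theta}\,\partial\boldsymbol{\theta}'}\right\} = -\sum_{\mathbf{t}} \frac{\partial^2 \log P(\mathbf{t})}{\partial\boldsymbol{\theta}\,\partial\boldsymbol{\theta}'}\, P(\mathbf{t})$$

Can be computationally challenging: if m = 100 and k = 4, then the sum over $\mathbf{t}$ has 176,851 elements!

Approximations are available

Dirichlet Multinomial FIM can be computed using marginal Beta-Binomial moments

Overview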
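The count of 176,851 is the number of ways to split m = 100 grains among k = 4 categories, which is the binomial coefficient C(m + k − 1, k − 1). A short check, with a hypothetical helper name:

```python
from math import comb


def multinomial_support_size(m, k):
    """Number of count vectors (t_1, ..., t_k) with non-negative entries summing to m."""
    return comb(m + k - 1, k - 1)


print(multinomial_support_size(100, 4))  # 176851
```

Every Fisher scoring step that evaluates the exact information matrix has to visit each of these points, which is why approximations are attractive.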



Maximum Likelihood Estimation results under the Finite Mixture and Dirichlet Multinomial Distributions

The naïve model underestimates the standard errors

The FM model gives smaller standard errors for the estimates of π

Example – Forest Pollen Count, Mosimann (1962)

              Naïve (Multinomial)    Dirichlet Multinomial    Finite Mixture
Parameter     Estimate   S.E.        Estimate   S.E.          Estimate   S.E.
π1 (pine)     0.8627     0.0040      0.8621     0.0065        0.8684     0.0048
π2 (fir)      0.0141     0.0014      0.0164     0.0022        0.0151     0.0015
π3 (oak)      0.0906     0.0034      0.0888     0.0053        0.0863     0.0040
ρ                                    0.1278     0.0109        0.0897     0.0139

π4 (alder) = 1 − (π1 + π2 + π3)
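A rough sanity check of the pine row, not a reproduction of the maximum likelihood fit: the naive standard error follows from the binomial variance, and scaling it by the square root of the design effect 1 + ρ²(m − 1) lands close to the Dirichlet Multinomial standard error. The sketch assumes 73 clusters (the last core level listed in the data) of m = 100 grains; the function names are hypothetical.

```python
from math import sqrt

N_CLUSTERS = 73  # assumption: core levels run from 1 to 73 with none missing
M = 100          # grains per cluster, as stated on the slides


def naive_se(pi_hat, n_clusters=N_CLUSTERS, m=M):
    """Multinomial (independence) standard error of one estimated proportion."""
    return sqrt(pi_hat * (1.0 - pi_hat) / (n_clusters * m))


def inflated_se(pi_hat, rho_hat, n_clusters=N_CLUSTERS, m=M):
    """Naive SE scaled by the square root of the design effect."""
    design_effect = 1.0 + rho_hat ** 2 * (m - 1)
    return naive_se(pi_hat, n_clusters, m) * sqrt(design_effect)


print(round(naive_se(0.8627), 4))             # 0.004
print(round(inflated_se(0.8621, 0.1278), 4))  # 0.0065
```

This moment-based scaling is only an approximation to the likelihood-based standard errors in the table.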



Simulation Study

What are the effects of using the wrong model?

Simulate 5,000 datasets from: Finite Mixture

Likelihood model fitted to each dataset:

Finite Mixture: Calculate an estimate of π and its SE under the FM model. Calculate the determinant of the estimated inverse FIM

Dirichlet Multinomial: Calculate an estimate of π and its SE under the DM model. Calculate the determinant of the estimated inverse FIM

After each simulation, we calculate the average of the determinants from each model

A comparison of these averages gives us insight as to which model may be more efficient




The Joint Asymptotic Relative Efficiency (JARE) can be used to summarize the simulation results as it indicates which estimate would have a smaller asymptotic variance

For a vector parameter, JARE is the ratio of the determinants of the asymptotic variance-covariance matrices

$$\mathrm{JARE}(\hat{\boldsymbol{\pi}}_{FM}, \hat{\boldsymbol{\pi}}_{DM}) = \frac{\det\{\mathrm{var}_{DM}(\hat{\boldsymbol{\pi}}_{DM})\}}{\det\{\mathrm{var}_{FM}(\hat{\boldsymbol{\pi}}_{FM})\}}$$

                                     Value of π
Value of ρ   Simulated Data From     (0.3)      (0.5)      (0.1, 0.3)'   (0.1, 0.5)'
0.3          FM                      1.16028    1.24236    1.11770       1.15731
0.3          DM                      1.15604    1.23019    1.20241       1.22824
0.7          FM                      2.20322    2.28815    2.60584       2.67401
0.7          DM                      2.13496    2.19185    3.52726       3.48980
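A sketch of the JARE computation as a ratio of determinants. It assumes the Dirichlet Multinomial matrix goes in the numerator, so values above 1 favor the Finite Mixture estimate; that orientation is an assumption, and the two covariance matrices below are made-up numbers, not results from the study.

```python
def det(matrix):
    """Determinant by Laplace expansion along the first row (fine for small matrices)."""
    n = len(matrix)
    if n == 1:
        return matrix[0][0]
    total = 0.0
    for col in range(n):
        minor = [row[:col] + row[col + 1:] for row in matrix[1:]]
        total += (-1) ** col * matrix[0][col] * det(minor)
    return total


def jare(var_dm, var_fm):
    """Ratio det(var_dm) / det(var_fm); the orientation is an assumption."""
    return det(var_dm) / det(var_fm)


# Hypothetical 2 x 2 asymptotic covariance matrices, for illustration only.
var_fm = [[2.0, -0.5], [-0.5, 3.0]]
var_dm = [[2.5, -0.5], [-0.5, 3.5]]
print(jare(var_dm, var_fm))
```

For a scalar parameter the determinants are the variances themselves, so the same function covers the one-dimensional columns of the table.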


Conclusions

If we observe correlated/clustered multinomial data, use of the naïve multinomial model causes the standard errors to be underestimated, which leads to erroneous inferences and inflated Type-I error rates

If the data truly come from a Finite Mixture distribution, then estimation using this model clearly outperforms the Dirichlet Multinomial in terms of efficiency

If we are unsure of the distribution, the FM model may underestimate the standard errors and the Dirichlet Multinomial model provides a safe alternative


Future Work

Covariates can be included and linked to the model parameters through “link” functions as in the Generalized Linear Model (GLM) framework

Obtain the expressions for the efficiency of likelihood models relative to GEE

Use simulations to see if gains in efficiency of the likelihood models can be achieved over GEE

Does the inclusion of covariates change our conclusions? Does the choice of link function have an influence?

Extension to Include Covariates



References

Cox, D.R. and Snell, E.J. (1989) Analysis of Binary Data. 2nd Ed. New York: Chapman and Hall.

Kish, L. (1965) Survey Sampling. New York: John Wiley & Sons.

Liang, K.Y. and Zeger, S.L. (1986) “Longitudinal data analysis using generalized linear models.” Biometrika 73: 13-22.

McCullagh, P. and Nelder, J.A. (1989) Generalized Linear Models. 2nd Ed. London: Chapman and Hall.

Morel, J.G. and Nagaraj, N.K. (1993) “A finite mixture distribution for modelling multinomial extra variation.” Biometrika 80: 363-371.

Mosimann, J.E. (1962) “On the compound multinomial distribution, the multivariate β-distribution, and correlation among proportions.” Biometrika 49: 65-82.

Neerchal, N.K. and Morel, J.G. (1998) “Large cluster results for two parametric multinomial extra variation models.” Journal of the American Statistical Association 93: 1078-1087.

Wedderburn, R.W.M. (1974) “Quasi-likelihood functions, generalized linear models, and the Gauss-Newton method.” Biometrika 61: 439-447.

Zeger, S.L. and Liang, K.Y. (1986) “Longitudinal data analysis for discrete and continuous outcomes.” Biometrics 42: 121-130.
