Sampling based approximation of confidence intervals for functions of genetic covariance matrices Karin Meyer 1 David Houle 2 1 Animal Genetics and Breeding Unit, University of New England, Armidale NSW 2351 2 Department of Biological Science, Florida State University, Tallahassee, FL 32306-4295 AAABG 2013

Sampling based approximation of confidence intervals for functions of genetic covariance matrices

Aug 07, 2015

Page 1: Sampling based approximation of confidence intervals for functions of genetic covariance matrices

Sampling based approximation of

confidence intervals for functions of

genetic covariance matrices

Karin Meyer 1 David Houle 2

1 Animal Genetics and Breeding Unit, University of New England, Armidale NSW 2351

2 Department of Biological Science, Florida State University, Tallahassee, FL 32306-4295

AAABG 2013

Sampling standard errors | Introduction

REML sampling variances

REML estimates of covariance components

– multivariate normal distribution: $\hat{\boldsymbol\theta} \sim N\left(\boldsymbol\theta, \mathbf{I}(\boldsymbol\theta)^{-1}\right)$
– inverse of information matrix −→ sampling errors
– large sample theory; asymptotic lower bounds

Linear functions of estimates

– sampling variances readily obtained

Non-linear functions (“Delta method”)

– obtain 1st order Taylor series expansion
– evaluate sampling variance of linear approximation
– needs partial derivatives w.r.t. all variables
  −→ can be complicated / tedious
  −→ options for evaluating in REML software limited

Confidence intervals: $\pm z_\alpha$ s.e.

– misleading at boundary of parameter space?

K. M. | 2 / 12
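
For reference, the first-order approximation behind the Delta method can be written generically as below. This is the standard textbook form rather than a formula taken from the slides; $f$ stands for any differentiable function of the parameters and $\mathbf{g}$ for its gradient, both symbols introduced here purely for illustration. In practice $\mathbf{g}$ and the information matrix are evaluated at $\hat{\boldsymbol\theta}$.

```latex
\[
f(\hat{\boldsymbol\theta}) \approx f(\boldsymbol\theta)
  + \mathbf{g}'\,(\hat{\boldsymbol\theta} - \boldsymbol\theta),
\qquad
\mathbf{g} = \frac{\partial f(\boldsymbol\theta)}{\partial \boldsymbol\theta}
\]
\[
\operatorname{Var}\bigl(f(\hat{\boldsymbol\theta})\bigr)
  \approx \mathbf{g}'\, \mathbf{I}(\boldsymbol\theta)^{-1}\, \mathbf{g}
\]
```

The tedious part is $\mathbf{g}$: every function of interest needs its own set of partial derivatives with respect to all parameters.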

Sampling standard errors | Introduction

Alternatives

Dealing with boundary conditions

– Derive confidence intervals from profile likelihood
– Bayesian estimation

General procedure

– Sample data, repeat analysis −→ distribution over reps
– slow & laborious!

Objectives

1 Propose new scheme

– sample from (theoretical) distribution of estimates
– simple & fast

2 Examine quality of approximation of sampling errors

Sampling standard errors | Method

Sampling scheme

Large sample theory

– (RE)ML estimates have MVN distribution
– Sampling covariance ∝ inverse of information matrix

Sample from this distribution

$\tilde{\boldsymbol\theta} \sim N\left(\hat{\boldsymbol\theta}, \mathbf{H}(\hat{\boldsymbol\theta})^{-1}\right)$, where $\mathbf{H}(\hat{\boldsymbol\theta})$ is the information matrix

– Use same parameterisation as REML analysis −→ eliminate linear approx., account for constraints
– Evaluate function(s) of interest for $\tilde{\boldsymbol\theta}$
– Examine distribution over replicates

Mandel, M. (2013) Simulation-based confidence intervals for functions with complicated derivatives. American Statistician 67, 76–81.
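
The sampling scheme on this slide can be sketched in a few lines of Python. This is a minimal illustration and not the WOMBAT implementation: the function name `sample_function_of_estimates`, the two-parameter example and all numerical values are made up, and the information matrix is assumed to be positive definite so that it can be inverted.

```python
import numpy as np

def sample_function_of_estimates(theta_hat, info_matrix, func, n_rep=10_000, seed=0):
    """Draw theta ~ N(theta_hat, info_matrix^{-1}) and evaluate func on each draw."""
    rng = np.random.default_rng(seed)
    cov = np.linalg.inv(info_matrix)           # large-sample sampling covariance
    draws = rng.multivariate_normal(theta_hat, cov, size=n_rep)
    return np.array([func(theta) for theta in draws])

# Hypothetical example: ratio of two parameters (all values are made up).
theta_hat = np.array([4.0, 10.0])
info_matrix = np.array([[2.0, 0.0],
                        [0.0, 0.5]])
values = sample_function_of_estimates(theta_hat, info_matrix,
                                      lambda theta: theta[0] / theta[1])

se = values.std(ddof=1)                        # approximate sampling standard error
ci = np.percentile(values, [2.5, 97.5])        # percentile-based 95% interval
```

No derivatives of the function are needed; changing the function of interest only means passing a different `func`.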

Sampling standard errors | Simulation

Does it work?

Simulate two data sets

– 4000 animals, 6 traits
– $h^2 = 2 \times (0.2, 0.3, 0.4)$
– $\sigma^2_P = 100$
– $r_E = 0.3$
– a) $r_G = 0.5$, b) $r_G = |0.7|^{|i-j|}$

REML analysis

– AI algorithm
– Cholesky factor

Estimates

– $\hat{\boldsymbol\theta}$
– $\mathbf{H}(\hat{\boldsymbol\theta})$

Compare estimates of sampling variances

– REML: based on $\mathbf{H}(\hat{\boldsymbol\theta})$, “Delta” method
– Empirical: re-sample data using estimates as population values, repeat analysis; 10000 replicates
– Approx.: sample from MVN distribution, $N\left(\hat{\boldsymbol\theta}, \mathbf{H}(\hat{\boldsymbol\theta})^{-1}\right)$; 200000 replicates
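
As a rough illustration of the population values listed above, the genetic covariance matrices for the two cases could be built as follows. Several readings here are assumptions and not stated on the slide: that "2 × (0.2, 0.3, 0.4)" means two traits at each heritability, that the traits are ordered as shown, and that case b is 0.7 raised to the power |i−j|.

```python
import numpy as np

# Assumption: two traits at each heritability, in this order.
h2 = np.array([0.2, 0.2, 0.3, 0.3, 0.4, 0.4])
sigma2_p = 100.0                               # phenotypic variance
sd_g = np.sqrt(h2 * sigma2_p)                  # genetic standard deviations

idx = np.arange(6)
r_a = np.full((6, 6), 0.5)                     # case a: uniform genetic correlation
np.fill_diagonal(r_a, 1.0)
r_b = 0.7 ** np.abs(idx[:, None] - idx[None, :])   # case b: assumed 0.7^|i-j|

# Covariance = correlation scaled by the standard deviations.
sigma_g_a = r_a * np.outer(sd_g, sd_g)
sigma_g_b = r_b * np.outer(sd_g, sd_g)

# Eigenvalues of the population matrices (ascending order).
eig_a = np.linalg.eigvalsh(sigma_g_a)
eig_b = np.linalg.eigvalsh(sigma_g_b)
```

These are population matrices only; the eigenvalues of the REML estimates from a simulated data set will differ from them.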

Sampling standard errors | Results

Sampling covariances for $\hat{\boldsymbol\Sigma}_G$, case a*

[Figure: three scatter plots comparing estimates of sampling (co)variances, all axes running from 0 to 15. Panels: Empirical vs. REML; Approximate vs. REML; Approximate vs. Empirical.]

6 traits, 21 (co)variance components, 231 sampling (co)variances; plot symbols distinguish variances from covariances (◦ covariance)

* Case a: all genetic eigenvalues > 0

Sampling standard errors | Results

Sampling covariances for $\hat{\boldsymbol\Sigma}_G$, case b†

[Figure: two scatter plots of approximate (vertical axis) against empirical (horizontal axis) sampling (co)variances, all axes running from 0 to 15. Panels: Rank 6; Rank 5.]

Approximation unreliable if model is over-parameterised

† Case b: one genetic eigenvalue ≈ 0

Sampling standard errors | Results

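Delta method for $\hat{r}_{ij}$

Estimate elements of Cholesky factor $\mathbf{L}$ of $\boldsymbol\Sigma = \mathbf{L}\mathbf{L}'$

– $\mathbf{H}(\hat{\boldsymbol\theta})^{-1}$ gives $\mathrm{Cov}(\hat{l}_{ij}, \hat{l}_{mn})$
– covariances between $\sigma_{ij}$

$$
\mathrm{Cov}(\hat\sigma_{ij}, \hat\sigma_{km}) \approx \sum_{t=1}^{f(i,j)} \sum_{s=1}^{f(k,m)}
\Bigl[ \hat{l}_{jt}\hat{l}_{ms}\,\mathrm{Cov}(\hat{l}_{it}, \hat{l}_{ks})
+ \hat{l}_{jt}\hat{l}_{ks}\,\mathrm{Cov}(\hat{l}_{it}, \hat{l}_{ms})
+ \hat{l}_{it}\hat{l}_{ms}\,\mathrm{Cov}(\hat{l}_{jt}, \hat{l}_{ks})
+ \hat{l}_{it}\hat{l}_{ks}\,\mathrm{Cov}(\hat{l}_{jt}, \hat{l}_{ms}) \Bigr]
$$

For $\hat{r}_{ij} = \hat\sigma_{ij} / \sqrt{\hat\sigma_i^2 \hat\sigma_j^2}$

$$
\mathrm{Var}(\hat{r}_{ij}) \approx \Bigl[
4\hat\sigma_i^4\hat\sigma_j^4\,\mathrm{Var}(\hat\sigma_{ij})
+ \hat\sigma_{ij}^2\hat\sigma_j^4\,\mathrm{Var}(\hat\sigma_i^2)
+ \hat\sigma_{ij}^2\hat\sigma_i^4\,\mathrm{Var}(\hat\sigma_j^2)
- 4\hat\sigma_{ij}\hat\sigma_i^2\hat\sigma_j^4\,\mathrm{Cov}(\hat\sigma_{ij}, \hat\sigma_i^2)
- 4\hat\sigma_{ij}\hat\sigma_i^4\hat\sigma_j^2\,\mathrm{Cov}(\hat\sigma_{ij}, \hat\sigma_j^2)
+ 2\hat\sigma_{ij}^2\hat\sigma_i^2\hat\sigma_j^2\,\mathrm{Cov}(\hat\sigma_i^2, \hat\sigma_j^2)
\Bigr] \Big/ \bigl( 4\hat\sigma_i^6\hat\sigma_j^6 \bigr)
$$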
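
To show where the variance expression for the correlation comes from, the sketch below computes it twice: once as gradient' × covariance × gradient, and once term by term as written on the slide. The function names and all numbers are hypothetical; `cov` is assumed to hold the sampling covariance matrix of the covariance and the two variances, in that order.

```python
import numpy as np

def delta_var_correlation(s_ij, s_ii, s_jj, cov):
    """First-order variance of r = s_ij / sqrt(s_ii * s_jj) via the gradient."""
    grad = np.array([
        1.0 / np.sqrt(s_ii * s_jj),                # d r / d s_ij
        -0.5 * s_ij / np.sqrt(s_ii**3 * s_jj),     # d r / d s_ii
        -0.5 * s_ij / np.sqrt(s_ii * s_jj**3),     # d r / d s_jj
    ])
    return grad @ cov @ grad

def expanded_var_correlation(s_ij, s_ii, s_jj, cov):
    """Same quantity, written out term by term over the common denominator."""
    num = (4 * s_ii**2 * s_jj**2 * cov[0, 0]
           + s_ij**2 * s_jj**2 * cov[1, 1]
           + s_ij**2 * s_ii**2 * cov[2, 2]
           - 4 * s_ij * s_ii * s_jj**2 * cov[0, 1]
           - 4 * s_ij * s_ii**2 * s_jj * cov[0, 2]
           + 2 * s_ij**2 * s_ii * s_jj * cov[1, 2])
    return num / (4 * s_ii**3 * s_jj**3)

# Made-up estimates and sampling covariance matrix of (s_ij, s_ii, s_jj).
cov = np.array([[4.0, 1.0, 1.0],
                [1.0, 9.0, 2.0],
                [1.0, 2.0, 9.0]])
v_grad = delta_var_correlation(14.0, 20.0, 20.0, cov)
v_expanded = expanded_var_correlation(14.0, 20.0, 20.0, cov)
```

Both functions return the same value, which is a convenient check when coding such expressions by hand.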

Sampling standard errors | Results

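Approximation for $\hat{r}_{ij}$

Let $\boldsymbol\Sigma = \mathbf{L}\mathbf{L}'$ and $\boldsymbol\theta = \mathrm{vech}(\mathbf{L})$

For many replicates

– Sample $\tilde{\boldsymbol\theta} \sim N\left(\hat{\boldsymbol\theta}, \mathbf{H}(\hat{\boldsymbol\theta})^{-1}\right)$
– Construct $\tilde{\mathbf{L}}$ from $\tilde{\boldsymbol\theta}$
– Calculate $\tilde{\boldsymbol\Sigma} = \tilde{\mathbf{L}}\tilde{\mathbf{L}}'$
– Calculate correlation $\tilde{r}_{ij} = \tilde\sigma_{ij} / \sqrt{\tilde\sigma_i^2 \tilde\sigma_j^2}$

Evaluate $\mathrm{Var}(\hat{r}_{ij})$ as empirical variance of $\tilde{r}_{ij}$ across replicates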
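
The replicate loop above might look as follows in Python. This is a sketch under stated assumptions, not the authors' code: the function name `sample_correlation` is hypothetical, the two-trait estimates and the sampling covariance of the Cholesky elements are made up, and the elements of θ are assumed to be ordered as NumPy's `tril_indices` orders the lower triangle (row by row), which has to match the ordering used by the REML software.

```python
import numpy as np

def sample_correlation(theta_hat, cov_theta, i, j, q, n_rep=10_000, seed=0):
    """Sample Cholesky elements and return the implied correlations r_ij."""
    rng = np.random.default_rng(seed)
    rows, cols = np.tril_indices(q)            # lower-triangle positions, row by row
    draws = rng.multivariate_normal(theta_hat, cov_theta, size=n_rep)
    r = np.empty(n_rep)
    for k, theta in enumerate(draws):
        L = np.zeros((q, q))
        L[rows, cols] = theta                  # rebuild the sampled Cholesky factor
        sigma = L @ L.T                        # sampled covariance matrix
        r[k] = sigma[i, j] / np.sqrt(sigma[i, i] * sigma[j, j])
    return r

# Made-up two-trait example with a correlation of 0.7.
sigma_hat = np.array([[20.0, 14.0],
                      [14.0, 20.0]])
theta_hat = np.linalg.cholesky(sigma_hat)[np.tril_indices(2)]
cov_theta = 0.01 * np.eye(3)                   # hypothetical H^{-1}

r_draws = sample_correlation(theta_hat, cov_theta, i=0, j=1, q=2)
var_r = r_draws.var(ddof=1)                    # empirical variance across replicates
ci = np.percentile(r_draws, [2.5, 97.5])       # percentile-based interval
```

Because every sampled matrix is formed as a product of a factor with its transpose, each sampled correlation lies between −1 and 1, so percentile intervals cannot cross the boundary of the parameter space.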

Sampling standard errors | Results

Distribution of $\hat{r}_{G12}$, case b

[Figure: two histograms, Empirical and Approximate; horizontal axis: Correlation, 0.5 to 1.0.]

                   REML     Empirical   Approxim.
$\hat{r}_{G12}$    0.897    0.873       0.866
s.e.               0.059    0.066       0.063

Sampling standard errors | Results

Distribution of second eigenvalue

[Figure: two histograms, Empirical and Approximate; horizontal axis: Eigenvalue, 20 to 40.]

                   REML     Empirical   Approxim.
$\hat\lambda_2$    32.93    33.25       33.84
s.e.               –        3.27        3.30

Sampling standard errors | Results | Conclusions

Conclusions

Sampling from MVN distribution

– accommodates arbitrary functions
– yields good approximation of sampling variances
– easier than Delta method for complicated derivatives
– more appropriate confidence interval at boundary of parameter space
– but:
  −→ relies on large sample theory
  −→ information matrix needs to be safely p.d.
  −→ assumes $\hat{\boldsymbol\theta} \approx \boldsymbol\theta$

Simple but useful addition to our toolkit

– implemented in WOMBAT
