Sampling based approximation of confidence intervals for functions of genetic covariance matrices
Karin Meyer 1, David Houle 2
1 Animal Genetics and Breeding Unit, University of New England, Armidale NSW 2351
2 Department of Biological Science, Florida State University, Tallahassee, FL 32306-4295
AAABG 2013
Sampling standard errors | Introduction
REML sampling variances
REML estimates of covariance components
– multivariate normal distribution: θ̂ ∼ N(θ, I(θ)⁻¹)
– inverse of information matrix −→ sampling errors
– large sample theory; asymptotic lower bounds
Linear functions of estimates
– sampling variances readily obtained
Non-linear functions
– obtain 1st-order Taylor series expansion (the “Delta method”)
– evaluate sampling variance of linear approximation
– needs partial derivatives w.r.t. all variables
−→ can be complicated / tedious
−→ options for evaluating in REML software limited
Confidence intervals: ± zα s.e.
– misleading at boundary of parameter space?
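The Delta method sketched above can be made concrete in a few lines of NumPy. This is a minimal illustration only: the estimates, their sampling covariance matrix `V`, and the choice of heritability h² = σ²A / (σ²A + σ²E) as the non-linear function are all invented numbers, not values from the analyses in this talk.

```python
import numpy as np

# Hypothetical REML estimates of additive and residual variance
var_a, var_e = 40.0, 60.0               # σ²_A, σ²_E
h2 = var_a / (var_a + var_e)            # heritability: a non-linear function

# Hypothetical sampling covariance of (σ²_A, σ²_E),
# i.e. the relevant block of the inverse information matrix
V = np.array([[25.0, -10.0],
              [-10.0, 20.0]])

# Partial derivatives of h² w.r.t. (σ²_A, σ²_E)
# for the 1st-order Taylor series expansion
s = var_a + var_e
grad = np.array([var_e / s**2, -var_a / s**2])

# Delta-method approximation of the sampling variance of ĥ²
var_h2 = grad @ V @ grad
se_h2 = np.sqrt(var_h2)
print(h2, se_h2)
```

Even for this two-parameter example the derivatives must be worked out by hand; for more complicated functions of a covariance matrix this is exactly the tedium the talk refers to.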
Sampling standard errors | Introduction
Alternatives
Dealing with boundary conditions
– Derive confidence intervals from profile likelihood
– Bayesian estimation
General procedure
– Sample data, repeat analysis −→ distribution over reps
– slow & laborious!
Objectives
1 Propose new scheme
– sample from (theoretical) distribution of estimates
– simple & fast
2 Examine quality of approximation of sampling errors
Sampling standard errors | Method
Sampling scheme
Large sample theory
– (RE)ML estimates have MVN distribution
– Sampling covariance ∝ inverse of information matrix
Sample from this distribution: θ̃ ∼ N(θ̂, H(θ̂)⁻¹)
Information matrix
– Use same parameterisation as REML analysis −→ eliminate linear approximation, account for constraints
– Evaluate function(s) of interest for θ̃
– Examine distribution over replicates
Mandel, M. (2013) Simulation-based confidence intervals for functions with complicated derivatives. American Statistician 67, 76–81.
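The sampling scheme above can be sketched in a few lines of NumPy. The estimates θ̂ and the inverse average-information matrix below are hypothetical placeholders, and heritability again stands in for the "function of interest":

```python
import numpy as np

rng = np.random.default_rng(42)

# Hypothetical REML estimates and inverse information matrix H(θ̂)⁻¹
theta_hat = np.array([40.0, 60.0])      # (σ²_A, σ²_E)
H_inv = np.array([[25.0, -10.0],
                  [-10.0, 20.0]])       # sampling covariance of θ̂

# Draw replicate parameter vectors θ̃ ∼ N(θ̂, H(θ̂)⁻¹)
theta_tilde = rng.multivariate_normal(theta_hat, H_inv, size=200_000)

# Evaluate the (non-linear) function of interest for each replicate: h²
h2_reps = theta_tilde[:, 0] / theta_tilde.sum(axis=1)

# Approximate s.e. and 95% confidence interval
# from the empirical distribution over replicates
se = h2_reps.std(ddof=1)
lo, hi = np.percentile(h2_reps, [2.5, 97.5])
print(se, lo, hi)
```

No derivatives are needed: any function of θ̃, however complicated, is simply evaluated once per replicate.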
Sampling standard errors | Simulation
Does it work?
Simulate two data sets
– 4000 animals, 6 traits
– h² = 2 × (0.2, 0.3, 0.4)
– σ²P = 100
– rE = 0.3
– a) rG = 0.5, b) rG = 0.7^|i−j|
REML analysis
– AI algorithm
– Cholesky factor
Estimates
– θ̂
– H(θ̂)
Compare estimates of sampling variances
REML: based on H(θ̂), “Delta” method
Empirical: re-sample data using estimates as population values, repeat analysis; 10 000 replicates
Approx.: sample from MVN distribution N(θ̂, H(θ̂)⁻¹); 200 000 replicates
Sampling standard errors | Results
Sampling covariances for Σ̂G, case a
[Figure: scatter plots of sampling covariances — Empirical vs. REML, Approximate vs. REML, Approximate vs. Empirical]
– accommodates arbitrary functions
– yields good approximation of sampling variances
– easier than Delta method for complicated derivatives
– more appropriate confidence interval at boundary of parameter space
– but:
−→ relies on large sample theory
−→ information matrix needs to be safely p.d.
−→ assumes θ̂ ≈ θ