G under Drift, Comparisons of G, Dimensionality of G

Dec 22, 2015

Drift and G

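Recall there is a simple expression for the expected change in G given drift: with purely additive gene action, E[G_t] = (1 - f_t) G_0, where f_t = 1 - (1 - 1/(2Ne))^t is the inbreeding coefficient after t generations.

Complications: There can be considerable variation about the expected value of G!

Complications: If nonadditive variances are present, the expected change in G is complex.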
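
As a rough illustration of the additive expectation, E[G_t] = (1 - f_t) G_0, the sketch below shrinks a made-up 3-trait G and checks that only the eigenvalues change. The matrix values, Ne, and t are hypothetical, chosen only for illustration.

```python
import numpy as np

# Hypothetical base-population G for 3 traits (illustrative values only).
G0 = np.array([[1.0, 0.4, 0.2],
               [0.4, 0.8, 0.1],
               [0.2, 0.1, 0.5]])

def expected_G(G0, Ne, t):
    """Expected G after t generations of drift, assuming purely additive gene action."""
    f_t = 1.0 - (1.0 - 1.0 / (2.0 * Ne)) ** t   # inbreeding coefficient at generation t
    return (1.0 - f_t) * G0, f_t

Gt, f_t = expected_G(G0, Ne=10, t=5)

# Eigenvalues shrink by the common factor (1 - f_t); eigenvectors are unchanged.
vals0, vecs0 = np.linalg.eigh(G0)
valst, vecst = np.linalg.eigh(Gt)
print("f_t =", f_t)
print("eigenvalue ratios:", valst / vals0)
```

All eigenvalue ratios equal 1 - f_t, which is the "proportional" change in G that the expectation implies.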

High variance about E[G] under inbreeding

Phillips, Whitlock, and Fowler (2001) examined 52 lines of Drosophila melanogaster that had been inbred for one generation of brother-sister mating and then expanded to a large population size by two generations of random mating. They estimated G for all 52 lines using 6 wing traits.

The mean G (the average over all 52 lines) was quite consistent with the theory, showing a proportional reduction (eigenvectors unchanged, eigenvalues reduced).

[Figure: 95% confidence ellipsoids for 2 traits, with major axes, for G from the outbred population and the average G over the inbred lines.]

While the mean G agreed with theory, there was massive variation among the particular realizations.

[Figure: major axes of G for the individual lines.]

Example of hidden pleiotropy in their data

Below are plots for two sets of traits, both of which had a genetic covariance of zero in the outbred population.

[Figure, first trait pair] Little spread in the major axes: the traits are still uncorrelated among the samples.

[Figure, second trait pair] Significant spread in the major axes: the traits are positively and negatively correlated among the various sampled G.

Changes in G under nonadditive variance

When nonadditive (dominance, epistasis) variance is present, genetic variances can actually increase (for a time) under inbreeding.

The simplest example is additive x additive epistasis, for which G does not change proportionately with f.

Expected Eigenstructure of G under drift

Griswold et al. showed that the distribution of eigenvalues is highly nonuniform under a drift and mutation model, being close to an exponential distribution.

This skewing arises for genealogical reasons: drift imposes a dependence structure on the alleles in the sample due to shared common ancestry, and this in turn results in the distribution of the eigenvalues of G being highly nonuniform.

General issues in comparing G

One immediate issue with G is that it is typically not estimated as a product-moment covariance.

P can be estimated directly.

(Co)variances of breeding values, on the other hand, are estimated indirectly.

Hence, there is less precision on individual elements of G than of P.
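
To make the direct versus indirect distinction concrete, here is a minimal sketch that assumes a balanced full-sib design (so G is twice the among-family covariance component). The true G and E, the family sizes, and the scaling factor of 2 are all assumptions made for this illustration.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical true parameters for 3 traits (illustrative values only).
G_true = np.array([[1.0, 0.3, 0.0],
                   [0.3, 0.8, 0.2],
                   [0.0, 0.2, 0.6]])
E_true = np.eye(3)

# Simulate F full-sib families of n sibs: sibs share a family effect with
# covariance G/2; the other G/2, plus E, is within-family variation.
F, n, p = 100, 10, 3
fam = rng.multivariate_normal(np.zeros(p), G_true / 2, size=F)
res = rng.multivariate_normal(np.zeros(p), G_true / 2 + E_true, size=(F, n))
y = fam[:, None, :] + res                      # shape (F, n, p)

# P: an ordinary product-moment covariance over all individuals.
P_hat = np.cov(y.reshape(F * n, p), rowvar=False)

# G: a scaled DIFFERENCE of mean-square matrices (a variance component).
fam_means = y.mean(axis=1)
dev_b = fam_means - fam_means.mean(axis=0)
MS_B = n * dev_b.T @ dev_b / (F - 1)           # among-family mean squares
dev_w = (y - fam_means[:, None, :]).reshape(F * n, p)
MS_W = dev_w.T @ dev_w / (F * (n - 1))         # within-family mean squares
G_hat = 2.0 * (MS_B - MS_W) / n                # factor 2 assumes full sibs

print(P_hat.round(2))
print(G_hat.round(2))
```

Because G_hat is a difference of two estimated matrices, its elements carry the sampling error of both, and nothing forces the result to be a valid covariance matrix.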

Estimated G may not be a covariance matrix

There is no guarantee that the estimated G is non-negative definite (contains no negative eigenvalues).

If G contains at least one negative eigenvalue, then there is some combination of traits, with weights a_i, such that Var(Σ a_i g_i) < 0.

Indeed (Hill and Thompson), the probability of a negative eigenvalue for a sample estimate of G is very high.
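
A small numerical sketch of the point about negative eigenvalues: the matrix below is a made-up "estimate" whose individual variances are all positive, yet the weights given by the eigenvector of its negative eigenvalue define an index with negative "variance". The values are hypothetical.

```python
import numpy as np

# Hypothetical "estimated G": every genetic variance is positive, but the
# covariances are jointly inconsistent, so this is not a covariance matrix.
G_hat = np.array([[ 1.0, 0.9, -0.9],
                  [ 0.9, 1.0,  0.9],
                  [-0.9, 0.9,  1.0]])

eigvals, eigvecs = np.linalg.eigh(G_hat)   # eigenvalues in ascending order
a = eigvecs[:, 0]                          # weights for the offending combination
var_index = a @ G_hat @ a                  # implied "variance" of sum_i a_i g_i

print("eigenvalues:", eigvals)
print("Var(sum a_i g_i) =", var_index)
```

The implied variance equals the smallest eigenvalue, which is negative here.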

Power:

The basic sampling unit when constructing G is the family.

Thus, power depends more on the number of families than on the number of individuals.

Further, distribution theory for tests of matrix differences is usually built around product-moment, as opposed to variance-component, matrices.

“Robust” statistical approaches

Many matrix comparison approaches use “robust” methods (some of these are called distribution-free or nonparametric).

Most powerful: Randomization tests. The basic idea (due to Fisher) is to compute our test statistic for matrix difference on the original sample.

One then randomizes the independent sampling units (unrelated families) over groups and recomputes the test statistic. This is done thousands of times, generating a distribution under the null of equality.

Suppose n of our N randomization test statistics are more extreme than that of our sample. The p value for equality is just (n+1)/(N+1).

When one can identify independent sampling units, randomization tests are bullet-proof.
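
The randomization scheme can be sketched as follows. Several choices here are assumptions, not part of the original: the test statistic (Frobenius norm of the difference between the two estimated G), the helper `manova_G` (balanced one-way MANOVA with a full-sib scaling of 2), the within-group centering, and the simulated data.

```python
import numpy as np

def manova_G(y, c=2.0):
    """Hypothetical helper: G from balanced families, y of shape (families, sibs, traits)."""
    F, n, _ = y.shape
    fam_means = y.mean(axis=1)
    dev_b = fam_means - fam_means.mean(axis=0)
    MS_B = n * dev_b.T @ dev_b / (F - 1)
    dev_w = (y - fam_means[:, None, :]).reshape(F * n, -1)
    MS_W = dev_w.T @ dev_w / (F * (n - 1))
    return c * (MS_B - MS_W) / n

def randomization_p(y1, y2, n_perm=999, seed=0):
    rng = np.random.default_rng(seed)
    # Center each group so mean differences do not leak into G after shuffling.
    y1 = y1 - y1.mean(axis=(0, 1))
    y2 = y2 - y2.mean(axis=(0, 1))

    def stat(a, b):
        return np.linalg.norm(manova_G(a) - manova_G(b))   # Frobenius norm

    observed = stat(y1, y2)
    pooled = np.concatenate([y1, y2], axis=0)
    F1 = y1.shape[0]
    n_extreme = 0
    for _ in range(n_perm):
        idx = rng.permutation(pooled.shape[0])   # shuffle whole families, not individuals
        if stat(pooled[idx[:F1]], pooled[idx[F1:]]) >= observed:
            n_extreme += 1                       # counted as "at least as extreme"
    return (n_extreme + 1) / (n_perm + 1)

# Simulated example: two groups drawn with the same G, 5 sibs per family.
rng = np.random.default_rng(1)
y1 = rng.standard_normal((30, 5, 3)) + rng.standard_normal((30, 1, 3))
y2 = rng.standard_normal((25, 5, 3)) + rng.standard_normal((25, 1, 3))
p_value = randomization_p(y1, y2, n_perm=199)
print("p =", p_value)
```

The key design point is that entire families are reassigned to groups; shuffling individuals would break the family structure that G is estimated from.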

While randomization can give p values, what about standard errors and approximate confidence intervals? Two resampling procedures are widely used: the bootstrap and the jackknife. CARE must be taken when using these, as they do not always work!

The idea behind the bootstrap is that the sample itself provides information on the sampling variance.

Suppose n families are used in constructing G, and we want a CI on det(G).

A bootstrap sample for G is obtained by sampling with replacement from the original families to generate a sample of n families. G is constructed and det(G) found.

The variance of det(G) over the bootstrap samples is the estimate of the true sampling variance, and the 2.5% and 97.5% quantiles of the bootstrap values set the 95% CI.
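
A minimal sketch of the family-level bootstrap for det(G), under stated assumptions: the estimator `manova_G` (balanced one-way MANOVA, full-sib scaling of 2) and the simulated data are hypothetical stand-ins for whatever design is actually used.

```python
import numpy as np

def manova_G(y, c=2.0):
    """Hypothetical helper: G from balanced families, y of shape (families, sibs, traits)."""
    F, n, _ = y.shape
    fam_means = y.mean(axis=1)
    dev_b = fam_means - fam_means.mean(axis=0)
    MS_B = n * dev_b.T @ dev_b / (F - 1)
    dev_w = (y - fam_means[:, None, :]).reshape(F * n, -1)
    MS_W = dev_w.T @ dev_w / (F * (n - 1))
    return c * (MS_B - MS_W) / n

def bootstrap_det(y, n_boot=1000, seed=0):
    rng = np.random.default_rng(seed)
    F = y.shape[0]
    dets = np.empty(n_boot)
    for b in range(n_boot):
        idx = rng.integers(0, F, size=F)          # draw F families with replacement
        dets[b] = np.linalg.det(manova_G(y[idx]))
    return dets.var(ddof=1), np.percentile(dets, [2.5, 97.5])

# Simulated data: 40 families of 5 sibs, 3 traits.
rng = np.random.default_rng(2)
y = rng.standard_normal((40, 5, 3)) + rng.standard_normal((40, 1, 3))
var_det, (ci_lo, ci_hi) = bootstrap_det(y)
print("bootstrap variance:", var_det, " 95% CI:", (ci_lo, ci_hi))
```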

Again, the sampling unit here is the family

Jackknife methods also involve resampling. Here one removes each observation in turn and constructs n “pseudovalues”: with θ the estimate from all n observations and θ(-i) the estimate with observation i removed, the ith pseudovalue is nθ - (n - 1)θ(-i).

The mean of the n pseudovalues is the jackknife estimate of the parameter, their variance (divided by n) is the estimate of the sampling variance, and t-tests are used for hypothesis testing given the estimated mean and variance.
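
The jackknife recipe can be sketched generically. The function below uses the standard pseudovalue definition, n θ - (n - 1) θ(-i), and takes any statistic `theta` that accepts an array whose first axis indexes the sampling units (families, in the G setting). The toy data are hypothetical.

```python
import numpy as np

def jackknife(units, theta):
    """Jackknife over the first axis of `units` for the statistic `theta`."""
    n = units.shape[0]
    theta_all = theta(units)
    pseudo = np.array([
        n * theta_all - (n - 1) * theta(np.delete(units, i, axis=0))
        for i in range(n)
    ])
    estimate = pseudo.mean()
    std_err = np.sqrt(pseudo.var(ddof=1) / n)   # standard error of the estimate
    return estimate, std_err, pseudo

# Toy check: for the sample mean, the pseudovalues are the observations themselves.
x = np.array([1.0, 2.0, 4.0, 7.0])
est, se, pseudo = jackknife(x, np.mean)
print(est, se, pseudo)
```

For a G-based statistic, `units` would be the array of families and `theta` a function that builds G and returns, say, its determinant; whether the jackknife behaves well for such a statistic needs checking case by case.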

Comparing two G matrices

The most obvious approach is element-by-element.

One could use standard tests (or robust methods) to test, separately, the equality of each element.

For example, are the heritabilities of trait 1 the same in the two groups we are comparing? What about trait 2, etc.?

Multiple comparison issues: is the collection of tests, as a whole, significant?

Example 32.1: Paulsen found 9 of 45 comparisons of heritabilities between 2 species of buckeye butterflies significant at the 5% level. Is this a significant difference in G?
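
One rough way to approach the question in Example 32.1 is a binomial tail probability: if all 45 null hypotheses were true and the tests were independent, how often would 9 or more be significant at the 5% level? Independence is an assumption made here for illustration; elements of G are generally correlated, so this is only a first approximation.

```python
from math import comb

def binom_tail(k, n, p):
    """P(X >= k) for X ~ Binomial(n, p)."""
    return sum(comb(n, j) * p**j * (1 - p)**(n - j) for j in range(k, n + 1))

# 9 or more "significant" results out of 45 tests at the 5% level.
p_value = binom_tail(9, 45, 0.05)
print("expected number significant under the null:", 45 * 0.05)
print("P(9 or more of 45) =", p_value)
```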

What does it mean if matrices are not “equal”?

What does it mean if a test for equality fails?

A matrix is a complex geometric object, and two matrices can be very similar geometrically, but not equal.

Key: We really want to compare elements of matrix geometry.

We want to compare eigenstructure: do they share common eigenvectors or eigenvalues?

[Figure: a target matrix for comparison and four alternatives. A: identical. B: proportional. C: same orientation, different scaling. D: same scaling, different orientation.]

Random Skewers

One approach for comparing matrices is to compare the similarity of their responses to random directions of selection, R = Gβ.

The method of random skewers generates a large number of random (unit-length) vectors β, projects these through both G matrices (R_i = G_i β), and then measures the angle or distance between the two responses.
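
A minimal sketch of random skewers, using the mean cosine of the angle between the two response vectors as the summary. The number of skewers, the way random directions are drawn (normalized Gaussian vectors), and the example matrices are assumptions for illustration.

```python
import numpy as np

def random_skewers(G1, G2, n_skewers=1000, seed=0):
    """Mean cosine between responses R1 = G1 beta and R2 = G2 beta over random unit beta."""
    rng = np.random.default_rng(seed)
    p = G1.shape[0]
    beta = rng.standard_normal((n_skewers, p))
    beta /= np.linalg.norm(beta, axis=1, keepdims=True)   # unit-length skewers
    R1 = beta @ G1.T                                       # row i is G1 @ beta_i
    R2 = beta @ G2.T
    cos = np.sum(R1 * R2, axis=1) / (
        np.linalg.norm(R1, axis=1) * np.linalg.norm(R2, axis=1))
    return cos.mean()

# Hypothetical matrices: Gb is proportional to Ga, Gc has a different orientation.
Ga = np.array([[1.0, 0.6], [0.6, 1.0]])
Gb = 2.0 * Ga
Gc = np.array([[1.0, -0.6], [-0.6, 1.0]])
sim_prop = random_skewers(Ga, Gb)
sim_diff = random_skewers(Ga, Gc)
print(sim_prop, sim_diff)
```

Proportional matrices give identical response directions, so their mean cosine is 1; a reoriented matrix gives a lower value.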

Flury’s CPC

A more formal approach to comparing aspects of shared geometry is the method of common principal components (CPC), proposed by Flury.

The idea is that there is a hierarchy of relatedness: at the bottom the matrices are unrelated, then they share 1 PC, share 2 PCs, …, share all PCs, are proportional, and finally are equal.

LR tests are used. One can step-up, jump-up, or use model comparison (AIC).

[Figures: the step-up, jump-up, and model comparison (smallest AIC) approaches.]

Krzanowski Subspace comparison

Two matrices may have most of their variation in the same subspace, but have very different eigenvectors.

CPC would score this as no relationship, but there clearly is one when sets of PCs, as opposed to individual PCs, are used.

Krzanowski proposed a method for comparing such subspaces.

Compute a subspace projection matrix for each of the two matrices by taking its first k < n/2 eigenvectors as columns; call these A and B.

Next, compute the matrix S = A^T B B^T A.
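
A sketch of the subspace comparison, assuming the usual form of Krzanowski's statistic: with A and B holding the first k eigenvectors of the two matrices as columns, S = A^T B B^T A, and the sum of the eigenvalues of S runs from 0 (orthogonal subspaces) to k (identical subspaces). The example matrices are hypothetical.

```python
import numpy as np

def krzanowski(G1, G2, k):
    """Return S = A^T B B^T A and the sum of its eigenvalues (between 0 and k)."""
    def leading(G):
        vals, vecs = np.linalg.eigh(G)     # ascending eigenvalues
        return vecs[:, ::-1][:, :k]        # first k eigenvectors as columns
    A, B = leading(G1), leading(G2)
    S = A.T @ B @ B.T @ A
    return S, np.linalg.eigvalsh(S).sum()

# Hypothetical 5-trait example: G2 has different leading eigenvectors than G1
# (rotated 45 degrees), but they span the same 2-dimensional subspace.
c = s = 1.0 / np.sqrt(2.0)
R = np.eye(5)
R[:2, :2] = [[c, -s], [s, c]]
G1 = np.diag([3.0, 2.0, 0.3, 0.2, 0.1])
G2 = R @ np.diag([3.0, 1.0, 0.3, 0.2, 0.1]) @ R.T
G3 = np.diag([0.1, 0.2, 0.3, 2.0, 3.0])   # leading subspace orthogonal to that of G1

_, shared_12 = krzanowski(G1, G2, k=2)
_, shared_13 = krzanowski(G1, G3, k=2)
print(shared_12, shared_13)
```

G1 and G2 share no individual eigenvector among their first two, yet the statistic reports full subspace sharing, which is the case CPC would miss.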

Dimensionality of G

A final issue in the comparison of G is its rank (number of positive eigenvalues).

As mentioned, estimated G matrices are expected to be of less than full rank.

Also, much of the variation in G is often concentrated in the first few PCs.

This leads to the related issue of reduced-rank estimates of G.

Estimation of Eigenvalues

Leading eigenvalues tend to be overestimated, minor eigenvalues underestimated.

This arises because fitting the first eigenvector tries to account for as much variation as possible, leading to overfitting. This is compensated for by underfitting of the minor eigenvalues.

[Figure: example for a 6 x 6 matrix.]
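
The spreading of estimated eigenvalues is easy to see by simulation. The sketch below is not the example from the slide: it assumes a 6 x 6 true covariance matrix equal to the identity (all true eigenvalues equal to 1) and uses ordinary product-moment sample covariances from 20 individuals, which is simpler than estimating G but shows the same effect.

```python
import numpy as np

rng = np.random.default_rng(0)
p, n, reps = 6, 20, 500          # traits, individuals per sample, replicate samples

eigs = np.empty((reps, p))
for r in range(reps):
    x = rng.standard_normal((n, p))            # true covariance is the identity
    S = np.cov(x, rowvar=False)
    eigs[r] = np.linalg.eigvalsh(S)[::-1]      # sorted, largest first

mean_eigs = eigs.mean(axis=0)
print("true eigenvalues: all 1.0")
print("mean estimated eigenvalues:", mean_eigs.round(2))
print("mean total (should stay near 6):", mean_eigs.sum().round(2))
```

The total variance is estimated without bias, so the inflation of the leading eigenvalues is balanced by deflation of the minor ones.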

Bias with bootstrap CI for eigenvalues

One approach to estimating rank is to compute the confidence intervals of the eigenvalues, and then declare rank as the number of such intervals excluding zero.

Bootstrapping has been used to estimate these CIs.

Here, a bootstrap sample is generated by resampling families with replacement, and the eigenvalues are computed for this sample.

Assigning these eigenvalues and repeating this over thousands of bootstrap samples generates approximate CIs.

The problem arises in the ASSIGNMENT of eigenvalues. Eigenvectors are fairly unstable over bootstrap samples, so one approach is to use rank-ordering to assign: the largest eigenvalue in the sample corresponds to eigenvalue 1, the next to eigenvalue 2, etc.

Problem: Rank ordering generates CIs that are too small.

Better approach: Projection

Obtain the bootstrap estimate for the ith eigenvalue by projecting the bootstrap-sample G onto the ith original eigenvector e_i, that is, e_i^T G e_i.
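
The two assignment rules can be compared side by side. In this sketch the projected value for eigenvalue i in a bootstrap sample is e_i^T G* e_i, with e_i the ith eigenvector of the original estimate and G* the bootstrap-sample estimate. The estimator `manova_G` (balanced one-way MANOVA, full-sib scaling of 2) and the simulated data are assumptions for illustration.

```python
import numpy as np

def manova_G(y, c=2.0):
    """Hypothetical helper: G from balanced families, y of shape (families, sibs, traits)."""
    F, n, _ = y.shape
    fam_means = y.mean(axis=1)
    dev_b = fam_means - fam_means.mean(axis=0)
    MS_B = n * dev_b.T @ dev_b / (F - 1)
    dev_w = (y - fam_means[:, None, :]).reshape(F * n, -1)
    MS_W = dev_w.T @ dev_w / (F * (n - 1))
    return c * (MS_B - MS_W) / n

def bootstrap_eigs(y, n_boot=500, seed=0):
    rng = np.random.default_rng(seed)
    F = y.shape[0]
    _, vecs = np.linalg.eigh(manova_G(y))
    E = vecs[:, ::-1]                                # original eigenvectors, leading first
    ranked = np.empty((n_boot, E.shape[1]))
    projected = np.empty_like(ranked)
    for b in range(n_boot):
        Gb = manova_G(y[rng.integers(0, F, size=F)])  # resample families
        ranked[b] = np.linalg.eigvalsh(Gb)[::-1]      # assign by rank order
        projected[b] = np.einsum('ji,jk,ki->i', E, Gb, E)   # e_i^T Gb e_i
    return ranked, projected

rng = np.random.default_rng(3)
y = rng.standard_normal((40, 5, 4)) + rng.standard_normal((40, 1, 4))
ranked, projected = bootstrap_eigs(y)
for name, vals in (("rank-ordered", ranked), ("projected", projected)):
    lo, hi = np.percentile(vals[:, -1], [2.5, 97.5])
    print(f"{name}: 95% interval for the smallest eigenvalue = ({lo:.2f}, {hi:.2f})")
```

Both rules preserve the trace within each bootstrap sample; they differ in how that total is divided among the eigenvalues.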

A cleaner bootstrap estimate of rank is simply to look at the distribution of rank in the samples.

Again, generate a set of bootstrap samples G1, G2, …, G5000. This generates 5000 bootstrap values for rank, from which appropriate CIs and sampling variances can be obtained.

Diagonalization of Sample G

Balanced one-way ANOVA/MANOVA

Recall that diagonalization of a matrix decomposes it into a diagonal matrix of eigenvalues and matrices of associated eigenvectors.

Such a diagonalization for the sample estimate of G provides insight into how negative eigenvalues occur, offers a test for dimension, and generates a reduced-rank estimate of G.

For G estimated by one-way MANOVA, negative eigenvalues arise from eigenvalues of Q less than one.

Hence, one test for rank is the number of eigenvalues of Q that are significantly greater than one (much easier to test than significantly greater than zero). Amemiya’s LR test can be used here.

Amemiya’s reduced-rank estimator of G

Hence, we can write the estimate of G as the sum of a positive-definite part (eigenvalues of Q > 1) and a part with all zero or negative eigenvalues.

The first part is a reduced-rank estimator of G.

We can choose k as the number of eigenvalues of Q observed to be > 1. This is equivalent to setting any negative eigenvalues of the estimated G to zero.
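
The link between eigenvalues of Q and the sign of the eigenvalues of the estimated G can be checked numerically. The definition of Q used here is an assumption, since the original formulas are not available: with MS_W = L L^T (Cholesky factor L), take Q = L^{-1} MS_B L^{-T}, so that G_hat = (c/n) L (Q - I) L^T. The mean-square matrices, n, and the scaling c are hypothetical, and the reduced-rank matrix below is a sketch in the spirit of the split into two parts, not necessarily Amemiya's exact estimator.

```python
import numpy as np

# Hypothetical mean-square matrices from a balanced one-way MANOVA
# with n sibs per family; c = 2 assumes full sibs.
n, c = 5, 2.0
MS_W = np.array([[1.0, 0.2, 0.1],
                 [0.2, 1.0, 0.3],
                 [0.1, 0.3, 1.0]])
MS_B = np.array([[3.0, 0.8, 0.2],
                 [0.8, 1.5, 0.3],
                 [0.2, 0.3, 0.7]])

L = np.linalg.cholesky(MS_W)            # MS_W = L L^T
L_inv = np.linalg.inv(L)
Q = L_inv @ MS_B @ L_inv.T              # assumed definition of Q
q, U = np.linalg.eigh(Q)

G_hat = c * (MS_B - MS_W) / n           # equals (c/n) L (Q - I) L^T
keep = q > 1.0
k = int(keep.sum())

# Part built from eigenvalues of Q greater than one: a rank-k, non-negative definite matrix.
G_k = (c / n) * L @ U[:, keep] @ np.diag(q[keep] - 1.0) @ U[:, keep].T @ L.T
remainder = G_hat - G_k                 # part with only zero or negative eigenvalues

print("eigenvalues of Q:", q.round(3))
print("eigenvalues of G_hat:", np.linalg.eigvalsh(G_hat).round(3))
print("k =", k)
```

The number of eigenvalues of Q above one matches the number of positive eigenvalues of G_hat, because the two matrices Q - I and G_hat are related by a congruence transformation.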

Alternatively, the number of included dimensions can be less than the observed rank of G.

For example, k could be the number of statistically supported eigenvalues of Q > 1.

Likewise, we might fix the number of dimensions to consider (since the majority of variation is often restricted to the first few PCs).

Problem with the last approach: it has a bias that does not decrease with increasing sample size.

Factor-analytic modeling: Direct estimation of PCs of G

Kirkpatrick and Meyer proposed that, instead of estimating G first and then extracting the first few eigenvalues, one could estimate them directly (without going through an estimated G).

Motivation: Consider the spectral decomposition of a matrix A, A = Σ λ_i e_i e_i^T, where λ_i and e_i are the ith eigenvalue and its (unit-length) eigenvector.

Setting f_i = sqrt(λ_i) e_i, we can write this as A = Σ f_i f_i^T.

Note that f_i is the direction of PC i, while the square of its length is the amount of variation it accounts for, as f_i^T f_i = λ_i e_i^T e_i = λ_i.

Thus, if we can directly estimate the first k PCs, a reduced-rank estimator of G, guaranteed to be non-negative definite, is given by G_k = f_1 f_1^T + … + f_k f_k^T.
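
The construction can be checked on a known matrix. This sketch takes the scaled eigenvectors f_i = sqrt(λ_i) e_i from an eigendecomposition (in practice they would be estimated directly) and builds the rank-k matrix as the sum of f_i f_i^T over the first k. The example G is hypothetical.

```python
import numpy as np

# Hypothetical positive-definite G for 3 traits (illustrative values only).
G = np.array([[1.0, 0.4, 0.2],
              [0.4, 0.8, 0.1],
              [0.2, 0.1, 0.5]])

vals, vecs = np.linalg.eigh(G)
order = np.argsort(vals)[::-1]               # leading eigenvalue first
lam, E = vals[order], vecs[:, order]

F_mat = E * np.sqrt(lam)                     # column i is f_i = sqrt(lambda_i) e_i
sq_lengths = (F_mat ** 2).sum(axis=0)        # squared length of each f_i

k = 2
G_k = F_mat[:, :k] @ F_mat[:, :k].T          # sum of f_i f_i^T over the first k PCs

print("lambda:", lam.round(3))
print("squared lengths of f_i:", sq_lengths.round(3))
print("eigenvalues of G_k:", np.linalg.eigvalsh(G_k).round(3))
```

Any matrix of the form F F^T has no negative eigenvalues, which is why an estimator built this way cannot produce them.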

Question: How do we directly estimate the f_i?

Consider the animal model. One can write the vector of breeding values a_i for individual i as the sum of the contributions over the f_j, each f_j weighted by that individual’s breeding value for PC j. Substituting this sum for a_i gives the animal model in terms of the f_j.

Hence, one can start by estimating the first PC alone, fitting a model in which the only genetic term is f_1 weighted by individual i’s breeding value for the first PC.

One can test for improvement of fit over the no-PC model and, if the first PC is significant, move to a model with 2 PCs, adding f_2 weighted by individual i’s breeding value for PC 2. This is repeated until the addition of new factors does not improve fit.

If k factors are fitted, the reduced-rank estimate becomes the sum of f_i f_i^T over the k estimated factors.

Caution is needed when using this approach! The factors change values as new ones are added, and the eigenvalue for the last PC estimated is typically estimated with bias.

It can happen that “PC1” is really some other PC, and only by the addition of extra factors does this become clear.

A “step-down” approach is best: start with m factors and see if there is a significant loss of fit moving to m - 1.

Another check is to make sure that the trace is well-behaved.

Other eigenvalue-based measures of matrix dimension

Rank (number of positive eigenvalues) is in some sense a crude measure of dimension.

Measures of the dispersion of the eigenvalues attempt to quantify dimension more finely, as the more spread among the λ_i, the more potential constraints.

Wagner considered a measure based on the variance of the eigenvalues of the correlation matrix, dim = n - Var(λ).

The motivation is that for a diagonal correlation matrix, all eigenvalues are one, hence the variance is zero and dim = n.

With only a single positive eigenvalue, Var(λ) = n - 1, and dim = 1.
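
The two limiting cases can be verified directly. This sketch assumes the measure takes the form n - Var(λ), with the variance computed using divisor n, which is the form that reproduces both limits stated above.

```python
import numpy as np

def wagner_dimension(R):
    """Effective dimension n - Var(lambda) for a correlation matrix R (variance with divisor n)."""
    lam = np.linalg.eigvalsh(R)
    return R.shape[0] - lam.var()

n = 4
dim_uncorrelated = wagner_dimension(np.eye(n))        # all eigenvalues equal to 1
dim_single_axis = wagner_dimension(np.ones((n, n)))   # one eigenvalue n, the rest 0

# Hypothetical intermediate case: all pairwise correlations equal to 0.5.
R_mid = np.full((n, n), 0.5) + 0.5 * np.eye(n)
dim_mid = wagner_dimension(R_mid)
print(dim_uncorrelated, dim_single_axis, dim_mid)
```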

Kirkpatrick: Mean-standardize the traits, and then look at the fraction of variation accounted for by PC1, giving the effective dimension n_D = (Σ λ_i) / λ_1.

Motivation: Total variance can be written as λ_1 n_D.

With this metric, Kirkpatrick found that (for the traits he examined) all had effective dimensions less than 2!
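
A short sketch of this measure, assuming the effective dimension is the total variance divided by the leading eigenvalue, n_D = (Σ λ_i) / λ_1, computed on the mean-standardized G (each element G_ij divided by the product of the trait means). The G matrix and trait means below are hypothetical.

```python
import numpy as np

def mean_standardize(G, means):
    """Divide each element G_ij by the product of the trait means m_i * m_j."""
    return G / np.outer(means, means)

def kirkpatrick_dimension(G):
    """Effective dimension: total variance divided by the leading eigenvalue."""
    lam = np.linalg.eigvalsh(G)
    return lam.sum() / lam.max()

# Hypothetical G and trait means (illustrative values only).
G = np.array([[4.0, 1.5, 0.5],
              [1.5, 2.0, 0.3],
              [0.5, 0.3, 1.0]])
means = np.array([10.0, 5.0, 2.0])

n_D = kirkpatrick_dimension(mean_standardize(G, means))
print("effective dimension:", n_D)
```

The measure equals 1 when all variation lies along a single axis and equals the number of traits when all eigenvalues are equal.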
