Analytical and Bioanalytical Chemistry
Electronic Supplementary Material
An international assessment of the metrological equivalence of higher-order measurement services for creatinine in serum
Johanna E. Camara, Katrice A. Lippa, David L. Duewer, Hugo Gasca-Aragon, Blaza Toman
Repeatability Measurements. Table S1 lists the repeatability measurements for the CCQM-K80
study. All values are formally expressed in arbitrary units.
Variance Components and Uncertainty Estimation. The sources of measurement variability
(instrumental, sample preparation and between-campaign) can be described using the
experimental design in Figure 1 as:
$$R_{ijkl} \sim N\left(\mu_i + \gamma_{ij} + \delta_{ijk},\ \zeta_{r,i}^2\right)$$ [S1]
where i indexes materials, j indexes units, k indexes independent aliquots per unit, l indexes
independent replicates per aliquot, “~ N” indicates “is distributed as Normal (i.e., Gaussian) with
the specified mean and variance,” μi represents the unknowable true value of the measurand in
the material (i.e., creatinine in serum), γij are between-unit differences, δijk are between-aliquot
differences, and ζr,i is the true instrumental repeatability and is assumed to be the same across all
units. The γij are assumed to be
$$\gamma_{ij} \sim N\left(0,\ \zeta_{c,i}^2\right)$$ [S2]
where ζc,i reflects the true between-campaign variability for the material. The δijk are assumed to
be
$$\delta_{ijk} \sim N\left(0,\ \zeta_{a,i}^2\right)$$ [S3]
where ζa,i reflects the true within-unit (between-aliquot) material variability and is assumed to be
the same for all units of a given material. While potentially over-simplified, this model with
these assumptions is likely fit-for-purpose given that HOVAMs are designed to be stable and
homogeneous. Use of more complex models, e.g., allowing ζr,i or ζa,i to vary between units,
generally will require more data in order to provide reliable estimates.
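To make the nested structure of [S1] to [S3] concrete, the following minimal Python sketch simulates the measurements for one material. The function name, argument names, and default design sizes are illustrative assumptions and are not part of the study's analysis.

```python
import random

def simulate_material(mu, sd_c, sd_a, sd_r, n_c=2, n_a=2, n_r=2, seed=None):
    """Simulate one material as nested lists: data[j][k][l].

    j indexes units (campaigns), k aliquots per unit, l replicates per aliquot.
    sd_c, sd_a and sd_r are hypothetical between-campaign, between-aliquot and
    repeatability standard deviations.
    """
    rng = random.Random(seed)
    data = []
    for _ in range(n_c):
        gamma = rng.gauss(0.0, sd_c)      # between-unit offset, as in [S2]
        unit = []
        for _ in range(n_a):
            delta = rng.gauss(0.0, sd_a)  # between-aliquot offset, as in [S3]
            # replicate measurements scatter around mu + gamma + delta, as in [S1]
            unit.append([rng.gauss(mu + gamma + delta, sd_r) for _ in range(n_r)])
        data.append(unit)
    return data
```

Calling `simulate_material(100.0, 0.8, 0.7, 1.0, n_a=3, seed=1)` would mimic a design with two campaigns, three aliquots per unit, and duplicate replicates; the numerical values here are arbitrary.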
For completely balanced measurement designs, an appropriate estimate for μi is the grand mean,
Ri, of all the measurements for each material. The usual ANOVA-type estimate of the standard
uncertainty of Ri is:
$$u(R_i) = \sqrt{\frac{n_a n_r \zeta_{c,i}^2 + n_r \zeta_{a,i}^2 + \zeta_{r,i}^2}{n_c n_a n_r}}$$ [S4]
where nc is the number of campaigns, na is the number of aliquots taken from the unit used in
each campaign, and nr is the number of replicates of each aliquot.
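As a numerical check on [S4], the sketch below combines three variance components (supplied as standard deviations) into the standard uncertainty of the grand mean for a balanced nested design. The function and argument names are hypothetical.

```python
import math

def standard_uncertainty(sd_c, sd_a, sd_r, n_c, n_a, n_r):
    """Standard uncertainty of the grand mean for a balanced nested design.

    sd_c, sd_a, sd_r: between-campaign, between-aliquot and repeatability
    standard deviations; n_c, n_a, n_r: campaigns, aliquots per unit,
    replicates per aliquot.
    """
    numerator = n_a * n_r * sd_c**2 + n_r * sd_a**2 + sd_r**2
    return math.sqrt(numerator / (n_c * n_a * n_r))
```

With only repeatability present (`sd_c = sd_a = 0`), the result reduces to `sd_r / sqrt(n_c * n_a * n_r)`; with only between-campaign variability, it reduces to `sd_c / sqrt(n_c)`, so extra aliquots or replicates do not reduce that contribution.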
In CCQM-K80, nc and nr are always 2 and na is either 2 or 3. Given that these serum-based
materials may not be of uniform composition throughout a given unit, analyzing duplicate
aliquots (or triplicate in the case of LGC materials) within each unit as well as evaluating at least
two separate units can help ensure a representative assessment of each material. Hence, this
necessitates the use of the three-level nested measurement design as depicted in Figure 1 of the
main text.
Estimates for the variance components, symbolized $\hat{\zeta}_{c,i}$, $\hat{\zeta}_{a,i}$, and $\hat{\zeta}_{r,i}$, can be obtained using
linear mixed model analysis systems such as the SAS MIXED [1] and the R “lmer” [2]
procedures. The variance component and standard uncertainty estimates are listed in Table S2.
All non-zero estimates of relative $\hat{\zeta}_{r,i}$ were pooled and resulted in a value of 0.99 %, which
estimates the instrumental sources of variance. There are no significant differences in the
estimates based on duplicate analyses of two aliquots per unit versus those based on duplicates of three
aliquots. Since many of the $\hat{\zeta}_{a,i}$ are estimated as zero, the 0.68 % pooled relative standard
deviation provides only a worst-case bound on the aliquot preparation variance. Since many of
the $\hat{\zeta}_{c,i}$ are likewise estimated as zero, the 0.77 % pooled relative standard deviation also
provides only a worst-case bound on the sample preparation variance sources.
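The mixed-model procedures named above are typically fit by (restricted) maximum likelihood. As a rough, self-contained illustration of how three nested variance components can be separated, the sketch below instead uses the classical method-of-moments (ANOVA) estimator for a balanced design, with negative estimates truncated to zero. This is a swapped-in technique chosen for brevity, not the procedure used in the study, and its results need not match SAS MIXED or lmer output when a component is truncated. All names are hypothetical.

```python
from statistics import fmean

def nested_anova_components(data):
    """Method-of-moments variance components for a balanced nested design.

    data[j][k][l]: campaign j, aliquot k, replicate l.
    Returns (sd_c, sd_a, sd_r, grand_mean), components as standard deviations.
    """
    n_c, n_a, n_r = len(data), len(data[0]), len(data[0][0])
    aliquot_means = [[fmean(reps) for reps in unit] for unit in data]
    unit_means = [fmean(means) for means in aliquot_means]
    grand = fmean(unit_means)  # equals the mean of all values when balanced

    # Sums of squares at each level of the hierarchy
    ss_r = sum((x - aliquot_means[j][k]) ** 2
               for j, unit in enumerate(data)
               for k, reps in enumerate(unit)
               for x in reps)
    ss_a = n_r * sum((aliquot_means[j][k] - unit_means[j]) ** 2
                     for j in range(n_c) for k in range(n_a))
    ss_c = n_a * n_r * sum((m - grand) ** 2 for m in unit_means)

    # Mean squares
    ms_r = ss_r / (n_c * n_a * (n_r - 1))
    ms_a = ss_a / (n_c * (n_a - 1))
    ms_c = ss_c / (n_c - 1)

    # Equate mean squares to their expectations; truncate negatives to zero
    var_r = ms_r
    var_a = max(0.0, (ms_a - ms_r) / n_r)
    var_c = max(0.0, (ms_c - ms_a) / (n_a * n_r))
    return var_c ** 0.5, var_a ** 0.5, var_r ** 0.5, grand
```

With only two campaigns and two or three aliquots per unit, the higher-level mean squares rest on very few degrees of freedom, which is one way to see why many between-aliquot and between-campaign estimates come out as zero.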
Estimating 95 % level of confidence coverage intervals U95(Ri) for each Ri from a small number
of measurements can be much more complicated than multiplying the standard uncertainties
u(Ri) obtained from equation [S4] by a coverage factor of 2. The approach used herein is to first
expand the standard uncertainties into U95(Ri) form and then revert to “large sample standard
uncertainties,” by dividing by a factor of 2. We refer to this quantity as u∞(Ri):
u∞(Ri) = U95(Ri)/2 [S5]
The u∞(Ri) estimates provide the same statistical interpretation as the U95(Vi)/2 in equation 3 and
thus provide a consistent description for both the assigned values, V, and the measurement
responses, R.
Two methods were used to estimate the U95(Ri): classical long-term frequency (“frequentist”)
expansion and constrained empirical Bayesian analysis. The frequentist method recommended
in the GUM [3] expands the standard uncertainty u(Ri) using an appropriate Student’s t coverage
factor to generate an expanded uncertainty:
$$U_{95}(R_i)_{\mathrm{f}} = t_{95\,\%,\,v_i}\; u(R_i)$$ [S6]
where vi are the degrees of freedom. Depending on the relative magnitude of the estimated
variance components $\hat{\zeta}_r$, $\hat{\zeta}_a$, and $\hat{\zeta}_c$, the vi for the CCQM-K80 materials range from 7 to 1. This
generates two-tailed $t_{95\,\%,\,v_i}$ that range from 2.4 to 12.7, the latter of which translates into
unrealistically large U95(Ri)f values. The vi and U95(Ri)f values are listed in Table S2.
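The frequentist expansion in [S6] followed by the reversion in [S5] can be sketched in a few lines. The lookup table holds the standard two-tailed 95 % Student's t values for 1 to 7 degrees of freedom, which is the range quoted above; the function names are hypothetical.

```python
# Two-tailed 95 % Student's t coverage factors, keyed by degrees of freedom
T95 = {1: 12.706, 2: 4.303, 3: 3.182, 4: 2.776, 5: 2.571, 6: 2.447, 7: 2.365}

def expanded_uncertainty_f(u, dof):
    """Frequentist expanded uncertainty: t coverage factor times u, as in [S6]."""
    return T95[dof] * u

def large_sample_u(u95):
    """'Large sample' standard uncertainty: expanded uncertainty over 2, as in [S5]."""
    return u95 / 2.0
```

For a hypothetical u(Ri) of 1.0, seven degrees of freedom give a large-sample standard uncertainty of about 1.18, while one degree of freedom gives about 6.35, which illustrates how strongly the single-degree-of-freedom case inflates the result.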
Bayesian analysis is based on a somewhat different definition of probability than the frequentist
interpretation that underpins classical statistical inference. Under the Bayesian paradigm,
parameters such as the measurand value and variance components have probability distributions
that quantify our knowledge about them. The estimation process starts with quantification of
prior knowledge about the parameters followed by specification of the statistical model that
relates the parameters to the data. The statistical model is called a likelihood function and is the
same as described in Equation S1. The components of these models are combined via Bayes'
theorem to obtain posterior distributions for the parameters. These distributions update our
knowledge about the parameters based on the evidence provided by the data. This analysis can
produce a probability distribution for each of the μi (the true value of the measurand quantity
estimated by the measurement mean, Ri) which encompasses all of the information and
variability present in the data but is confined by bounds based on prior knowledge. The process
yields a probability interval which is interpretable as an uncertainty interval as defined in JCGM
101:2008 “Evaluation of measurement data — Supplement 1 to the GUM — Propagation of
distributions using a Monte Carlo method (GUM Supplement 1)” [4].
While the probability distributions for the μi may not be available in closed form, with these
priors and using empirical Bayesian methods, it is possible to estimate the desired coverage
intervals, U95(Ri)B. Software systems suitable for computing the intervals, such as WinBUGS
[5], are freely available and (relatively) easy-to-use. The U95(Ri)B are listed in Table S2. The
U95(Ri)B are smaller than the U95(Ri)f for materials associated with one degree of freedom and
tend to be somewhat larger than the U95(Ri)f for materials associated with more than one degree
of freedom.
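The effect of bounding the prior can be illustrated with a deliberately simplified toy calculation; it is not the WinBUGS model used in the study. The sketch assumes a handful of values distributed as Normal about μ with unknown standard deviation s, a flat prior on μ, and a uniform prior on s between 0 and a hypothetical upper bound `s_max`. The posterior for μ is evaluated on a grid and its central 95 % interval is read from the cumulative sum. All names, priors, and numbers are assumptions made for illustration.

```python
import math

def bayes_interval(y, s_max, n_grid=400):
    """Central 95 % posterior interval for mu, given y ~ N(mu, s^2).

    Assumed priors: mu flat over the grid, s uniform on (0, s_max].
    """
    n = len(y)
    ybar = sum(y) / n
    half_range = 6.0 * s_max  # grid much wider than the posterior can be
    mus = [ybar - half_range + 2.0 * half_range * i / (n_grid - 1)
           for i in range(n_grid)]
    sds = [s_max * (j + 1) / n_grid for j in range(n_grid)]  # excludes s = 0

    post = []
    for mu in mus:
        ss = sum((v - mu) ** 2 for v in y)
        # marginal posterior of mu: sum the Normal likelihood over the s grid
        post.append(sum(s ** (-n) * math.exp(-ss / (2.0 * s * s)) for s in sds))

    total = sum(post)
    cum, lo, hi = 0.0, None, None
    for mu, p in zip(mus, post):
        cum += p / total
        if lo is None and cum >= 0.025:
            lo = mu
        if hi is None and cum >= 0.975:
            hi = mu
    return lo, hi
```

For two hypothetical values, 99 and 101, the standard uncertainty of the mean is 1.0 and the one-degree-of-freedom t expansion gives a half-width of about 12.7. Calling `bayes_interval([99.0, 101.0], s_max=3.0)` gives a much narrower interval because the bounded prior rules out very large s, which is qualitatively the behaviour described above for materials with one degree of freedom.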
As discussed in the main text, “Leave-one-out” (LOO) cross-validation was used to evaluate whether the GDR
function was strongly influenced by materials having relatively small U95(Vi) and/or U95(Ri) or
very low or very high {Vi, Ri}. LOO is an established and routine tool for evaluating the
predictive utility of a model [6] and is an efficient, if empirical, approach to establishing which, if
any, materials are sufficiently influential to distort the consensus estimation of the GDR
function. Figure S1 compares the “exact” uncertainty-scaled distances, εi, calculated using all
materials with the LOO-estimated εi for each material. Circles with crosses that are not
substantially on the diagonal line indicate materials that strongly influence the GDR. Circles
outside the square bounded by the red lines are potentially anomalous. Only materials B-1 and E-1 appear either
anomalous or strongly influential using the u∞(Ri)f. None of the materials appear particularly
problematic when the u∞(Ri)B are used. This reflects the somewhat more realistic estimates of
repeatability measurement uncertainty provided by the Bayesian procedure.
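The LOO comparison can be sketched with a much simpler stand-in for the GDR fit. The code below uses an ordinary least-squares straight line and vertical residuals divided by a per-point uncertainty; the GDR function itself and its orthogonal, uncertainty-weighted distances are not reproduced here. All function names and data are hypothetical.

```python
def fit_line(xs, ys):
    """Ordinary least-squares intercept and slope (stand-in for the GDR fit)."""
    n = len(xs)
    xbar, ybar = sum(xs) / n, sum(ys) / n
    sxx = sum((x - xbar) ** 2 for x in xs)
    slope = sum((x - xbar) * (y - ybar) for x, y in zip(xs, ys)) / sxx
    return ybar - slope * xbar, slope

def loo_distances(xs, ys, us):
    """Return (exact, loo): uncertainty-scaled residuals for every point.

    exact uses the fit to all points; loo refits without the point in question.
    """
    a, b = fit_line(xs, ys)
    exact = [abs(y - (a + b * x)) / u for x, y, u in zip(xs, ys, us)]
    loo = []
    for i in range(len(xs)):
        a_i, b_i = fit_line(xs[:i] + xs[i + 1:], ys[:i] + ys[i + 1:])
        loo.append(abs(ys[i] - (a_i + b_i * xs[i])) / us[i])
    return exact, loo
```

Plotting `loo` against `exact` gives a diagram analogous to Figure S1: points near the diagonal have little influence on the fit, while a point whose LOO distance is much larger than its exact distance is pulling the fitted line toward itself.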
After careful consideration, it was decided by NIST to exclude material “E-1” from the
estimation of the GDR parameters on the grounds that 1) U95(Vi) is suspiciously small, 2) the
process used to establish the U95(Vi) does not reflect their current practice, and 3) this CRM is
completely sold out and no longer available for re-evaluation, let alone sale. After E-1’s
exclusion from the GDR model, none of the remaining 16 materials were anomalous or
influential with either set of repeatability measurement uncertainties.
Degrees of Equivalence. The degree of equivalence for a given material expressed as a relative
percent, %di, can be estimated from the signed orthogonal distance of the certified and measured
values (Vi, Ri) relative to the estimates provided from the GDR analysis ($\hat{V}_i$, $\hat{R}_i$, and the two GDR function parameters):
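$$\%d_i = 100 \cdot \frac{\mathrm{SIGN}\left(V_i - \hat{V}_i\right)\sqrt{\left(V_i - \hat{V}_i\right)^2 + \left(R_i - \hat{R}_i\right)^2}}{\left(\hat{V}_i + \hat{R}_i\right)/2}$$ [S7]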
The function SIGN returns the sign (±1) of its argument and defines whether the observed {Vi,
Ri} pair is “above” or “below” the GDR function. The measurement-related terms are
transformed to have the same scale as the assigned values.
Given the covariances among the two parameters of the GDR function and the
{Vi ± u(Vi), Ri ± u(Ri)} used in their estimation, the expanded uncertainties for the %di,
U95(%di), were estimated via a parametric bootstrap Monte Carlo (PBMC) approach [7]. For a
sufficiently large number of samples from the bootstrap analysis and assuming an approximately
symmetric distribution, the percentiles of the empirical distribution of %di provide the desired
a Results for Aliquot3 were obtained using the “routine” sample preparation protocol used for all other materials; the Aliquot1 and Aliquot2 results for these three materials are for aliquots prepared using the certificate-specified minimum sample volume.
b Results adjusted for the measured fill-weights of each unit.
Table S2. Variance components, means and standard uncertainties together with the 95 % level of confidence intervals from both a frequentist-type expansion and a Bayesian estimation
Materials Numbers a Variance Components b Means and Uncertainties c
NMI Label Code Nt Nr Na Nc $\hat{\zeta}_r$ $\hat{\zeta}_a$ $\hat{\zeta}_c$ R u(R) v U95(R)f U95(R)B
a Nr is the number of replicates per aliquot, Na is the number of aliquots per campaign, and Nc is the number of campaigns. Nt is the total number of measurements, which is equal to Nr × Na × Nc.
b $\hat{\zeta}_r$, $\hat{\zeta}_a$, and $\hat{\zeta}_c$ are the estimated between-replicate, between-aliquot, and between-campaign components of variance expressed as standard deviations.
c R is the mean of the measured creatinine responses, u(R) the standard uncertainty for R, v the number of degrees of freedom associated with u(R), U95(R)f a frequentist 95 % confidence estimate for R, and U95(R)B a Bayesian 95 % confidence estimate for R.
d Estimated using only the Aliquot1 and Aliquot2 measurements.
Fig. S1. “Leave One Out” (LOO) Analysis in the Identification of Potentially Anomalous,
Strongly Influential Materials.
[Figure S1: two panels, each plotting the LOO Uncertainty-Weighted Distance, εi (vertical axis, 0 to 7) against the "Exact" Uncertainty-Weighted Distance, εi (horizontal axis, 0 to 7). Materials E-1 and B-1 are labeled.]
The panel to the left presents results using the frequentist u∞(Ri)f, the panel to the right presents results using the Bayesian u∞(Ri)B. Each open circle represents estimates for a particular material; the cross represents the estimated 95 % level of confidence intervals on the estimates. The red lines bound the distances expected for materials compatible with the GDR function with a 95 % level of confidence.
References
1. SAS/STAT 9.2 User’s Guide (2008) SAS Institute Inc. Cary, NC USA.
2. Bates D, Maechler M (2009) lme4: Linear mixed-effects models using S4 classes. R package version 0.999375-32 http://cran.us.r-project.org/web/packages/lme4/
3. JCGM 100:2008 (2008) Guide to the Expression of Uncertainty in Measurement (GUM). BIPM, Sèvres, France. http://www.bipm.org/utils/common/documents/jcgm/JCGM_100_2008_E.pdf.
4. JCGM 101:2008 (2008) Evaluation of measurement data — Supplement 1 to the “Guide to the expression of uncertainty in measurement” — Propagation of distributions using a Monte Carlo method. BIPM, Sèvres, France. http://www.bipm.org/utils/common/documents/jcgm/JCGM_101_2008_E.pdf.
5. Lunn DJ, Thomas A, Best N, Spiegelhalter D (2000) WinBUGS - a Bayesian modelling framework: concepts, structure, and extensibility. Stat Comput 10:325-337
6. Picard RR, Cook RD (1984) Cross-validation of regression models. J Am Stat Assoc 79(387):575-583
7. Duewer DL, Kowalski BR, Fasching JL (1976) Improving reliability of factor-analysis of chemical data by utilizing measured analytical uncertainty. Anal Chem 48:2002-2010