7/25/2019 IUPAC-CITAC Guide Draft 0PT Schemes2019.10.09
INTERNATIONAL UNION OF PURE AND APPLIED CHEMISTRY
ANALYTICAL CHEMISTRY DIVISION*
INTERDIVISIONAL WORKING PARTY FOR HARMONIZATION
OF QUALITY ASSURANCE SCHEMES

COOPERATION ON INTERNATIONAL TRACEABILITY
IN ANALYTICAL CHEMISTRY (CITAC)

IUPAC/CITAC GUIDE
SELECTION AND USE OF PROFICIENCY TESTING SCHEMES
FOR A LIMITED NUMBER OF PARTICIPANTS
CHEMICAL ANALYTICAL LABORATORIES

(IUPAC Technical Report)

Prepared for publication by
ILYA KUSELMAN¹ AND ALEŠ FAJGELJ²
¹The National Physical Laboratory of Israel, Givat Ram, Jerusalem 91904, Israel;
²International Atomic Energy Agency, Wagramer Strasse 5, P.O. Box 100, Vienna A-1400, Austria

Corresponding author: e-mail: [email protected]
*Membership of the Analytical Chemistry Division during the final preparation of this report was as follows:
President: A. Fajgelj (IAEA); Vice-President: W. Lund (Norway); Past-President: R. Lobinski (France); Secretary: D.B. Hibbert (Australia); Titular Members: M.F. Camões (Portugal); Z. Chai (China); P. De Bièvre (Belgium); J. Labuda (Slovakia); Z. Mester (Canada); S. Motomizu (Japan); Associate Members: P. De Zorzi (Italy); A. Felinger (Hungary); M. Jarosz (Poland); D.E. Knox (USA); P. Minkkinen (Finland); P.M. Pingarrón (Spain); National Representatives: S.K. Aggarwal (India); R. Apak (Turkey); M.S. Iqbal (Pakistan); H. Kim (Korea); T.A. Maryutina (Russia); R.M. Smith (UK); N. Trendafilova (Bulgaria)

Membership of the Task Group:
Chairman: A. Fajgelj (IAEA); Members: I. Kuselman (Israel); M. Belli (Italy); S.L.R. Ellison (UK); U. Sansone (IAEA); W. Wegscheider (Austria)

ACKNOWLEDGEMENTS
The Task Group would like to thank P. Fisicaro (France) and M. Koch (Germany) for the data they provided and for their help in preparing Examples 1 and 2, respectively, in Annex B of the Guide; H. Emons (IRMM) for helpful discussions; and Springer, Heidelberg (www.springer.com) and the Royal Society of Chemistry, London (www.rsc.org) for permission to use material from the published papers cited in the Guide.
IUPAC/CITAC Guide
Selection and Use of Proficiency Testing Schemes for a Limited Number of Participants – Chemical Analytical Laboratories
(IUPAC Technical Report)

Abstract: A metrological background for the implementation of proficiency testing (PT) schemes for a limited number of participating laboratories (fewer than 30) is discussed. Such schemes should be based on the use of certified reference materials with traceable property values, serving as proficiency test items whose composition is unknown to the participants. It is shown that achieving quality of PT results within the framework of the concept "tested once, accepted everywhere" requires both metrological comparability and metrological compatibility of these results.
The possibility of assessing the collective (group) performance of PT participants by comparing the PT consensus value (mean or median of the PT results) with the certified value of the test items is analyzed. Tabulated criteria for this assessment are proposed.
Practical examples are described to illustrate the issues discussed.

Keywords: proficiency testing, sample size, metrological traceability, measurement uncertainty, metrological comparability and compatibility
ABBREVIATIONS AND SYMBOLS

A - critical value for the numbers N+ and/or N−
AAS - atomic absorption spectrometry
ai - empirical sensitivity coefficient of the i-th component
AN - acid number
AS - adequacy score
α - probability equivalent to the area under the tail(s) of a distribution
bcf - buoyancy correction factor
β - probability of type 2 error
c1, c2 - measurement/test results corresponding to the crossing points of two probability density functions
ccert - certified (assigned) value of a particular property of a CRM
ci - measurement/test result of the i-th laboratory participating in PT
ci,s - value of a particular property of routine samples
CP - criterion power
cPT - population (theoretical) mean of PT results
cPT/avg - observed/experimental mean of PT results (consensus value)
CRM - certified reference material
- ratio cert/PT
- permissible bias of MPT from ccert
and - parameters
EMD - Ecole des Mines de Douai
F - frequency of a c-value
f - probability density function
GC-MS - gas chromatography-mass spectrometry
GF-AAS - graphite furnace atomic absorption spectrometry
H0 - null hypothesis
H1 - alternative hypothesis
hand - hand preparation of a sample
HPLC - high performance liquid chromatography
i, j, n - index numbers
ICP-MS - inductively coupled plasma mass spectrometry
ICP-OES - inductively coupled plasma optical emission spectrometry
ID-ICP-MS - isotope dilution inductively coupled plasma mass spectrometry
IHRM - in-house reference material
INPL - National Physical Laboratory of Israel
ISO - International Organization for Standardization
K - kelvin
LNE - Laboratoire National de Métrologie et d'Essais
MCL - maximum contaminant level
mAs2O3 - mass of a sample of arsenic oxide
mdil - mass of the diluted solution (a sample)
mdil/t - total mass of the diluted solution
mlot - total mass of the final lot
MPT - population median of PT results
mss - mass of the stock solution (a sample)
mss/t - total mass of the stock solution
N− - number of PT results ci < ccert
N - size of a statistical sample of measurement results of PT participants
N* - number of potentiometric titration results
N+ - number of PT results ci > ccert
NIST SRM - standard (certified) reference material developed by the National Institute of Standards and Technology, USA
NMR - nuclear magnetic resonance
Np - size of the population of PT participants
P - probability
pc - purity of chemicals
Pe - probability of an event
pH-metr. - pH-metric method
Pot. titr. - potentiometric titration
PT - proficiency testing
pAs/As2O3 - proportion of the atomic weights of As and As2O3
Π - symbol of multiplication (product)
Qest - questionable
RAN - limit of the difference between two results of AN determination (range)
Ri - ratio of the minimum to the maximum of two concentrations
RL - reference laboratory
ρlot - density of a lot of an aqueous IHRM
s - observed sample standard deviation
SADCMET - Southern African Cooperation in Measurement Traceability
sbsi and sisi - between-sample and intra-sample standard deviations
SI - International System of Units
sPT - observed sample standard deviation of PT results
σPT - population standard deviation of PT results
σPT/av - standard deviation of the sample mean cPT/avg of PT results
σtarg - target standard deviation of PT results
t1−α/2 - percentile of the one-tailed Student's distribution at level of confidence 1 − α/2
TP - test power
u(ci) and U(ci) - standard and expanded uncertainties of ci, respectively
ucert and Ucert - standard and expanded uncertainties of ccert, respectively
ucomb - combined standard uncertainty
umLP - standard measurement uncertainty declared by a laboratory participating in PT
umRL - standard measurement uncertainty declared by the reference laboratory
USN - ultrasonic nebulization
UV - ultraviolet
vibr - sample preparation with a vibrating table
VIM3 - International Vocabulary of Metrology, 3rd ed.
xj - normalized value of the j-th PT result
χ²{α, N−1} - 100α percentile of the χ² distribution at N − 1 degrees of freedom
Φ - normalized normal distribution function
Φ(xj) - value of the normalized normal distribution function for xj
- fraction of the statistical sample of size N from the population of size Np
ω² - empirical value of the Cramér-von Mises criterion
z, ζ and En - scores for assessment of the proficiency of a laboratory participating in PT
CONTENTS
1. INTRODUCTION
1.1. Scope and field of application
1.2. Terminology
2. APPROACH
2.1. Properties of PT consensus values: dependence on the statistical sample size
2.2. Measurement uncertainty use for interpretation of PT results
2.3. What is a metrological approach to PT?
3. VALUE ASSIGNMENT
3.1. Metrological traceability of a CRM property value and of PT results
3.1.1. Commutability of the CRMs and routine samples
3.1.2. Three scenarios
3.2. Scenario I: Use of adequate CRM
3.3. Scenario II: No closely matched CRMs
3.4. Scenario III: Appropriate CRMs are not available
4. INDIVIDUAL LABORATORY PERFORMANCE EVALUATION AND SCORING
4.1. Single (external) criterion for all laboratories participating in a PT
4.2. Own criterion for every laboratory
5. METROLOGICAL COMPARABILITY & COMPATIBILITY OF PT RESULTS
6. EFFECT OF SMALL LABORATORY POPULATION ON SAMPLE ESTIMATES
7. OUTLIERS
8. EFFECTIVENESS OF APPROACHES TO PT
ANNEX A. CRITERIA FOR ASSESSMENT OF METROLOGICAL COMPATIBILITY OF PT RESULTS
ANNEX B. EXAMPLES
ANNEX C. REFERENCES

1. INTRODUCTION
The International Harmonized Protocol for the proficiency testing (PT) of analytical chemistry laboratories, adopted by IUPAC in 1993 [1], was revised in 2006 [2]. Statistical methods for use in PT [3] have been published as a complementary standard to ISO/IEC Guide 43, which describes PT schemes based on interlaboratory comparisons [4]. General requirements for PT are updated in the new standard [5]. International Laboratory Accreditation Cooperation (ILAC) guidelines define requirements for the competence of PT providers [6]. Guidelines for PT in specific sectors, such as clinical laboratories, are also widely available [7]; in some other sectors they are under development.
These documents are, however, oriented mostly towards PT schemes for a relatively large number N of laboratories or participants (greater than or equal to 30), henceforth referred to as "large schemes". This matters statistically: with N below 30, evaluations by statistical methods become increasingly unreliable, especially for N < 20. For example, uncertainties in estimates of location (such as mean and median) become small enough to be neglected in scoring as N increases to approximately 30, but they cannot safely be neglected with N < 20. Deviations from the normal distribution are harder to identify if N is small. Robust statistics, too, are not usually recommended when N < 20. Therefore, the assigned/certified value of the proficiency test items ccert cannot safely be calculated as a consensus value from the measurement results obtained by the participants (PT results). The uncertainty of such a consensus value becomes large enough to affect scores in "small schemes", that is, schemes with small numbers of participants (N < 20).
Moreover, if the size Np of the population of laboratories participating in PT is finite, and the size N of the statistical sample is greater than 5 % to 10 % of Np, the value of the sample fraction N/Np may need to be taken into account.
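The check described above can be sketched in Python. The guide does not state at this point how the sample fraction should be taken into account. This sketch therefore assumes the common finite population correction for sampling without replacement, which scales the standard error σ/√N of a mean by √((Np − N)/(Np − 1)). The function names and the default 5 % threshold (the lower end of the 5 to 10 % range quoted above) are illustrative choices, not part of the guide.

```python
from math import sqrt


def sample_fraction(n: int, n_pop: int) -> float:
    """Fraction N/Np of the laboratory population represented by the N PT results."""
    if not 0 < n <= n_pop:
        raise ValueError("require 0 < N <= Np")
    return n / n_pop


def finite_population_correction(n: int, n_pop: int) -> float:
    """Assumed factor sqrt((Np - N)/(Np - 1)) applied to sigma/sqrt(N).

    Assumption: the N participants are drawn without replacement from a
    finite population of Np laboratories.
    """
    sample_fraction(n, n_pop)  # validates the arguments
    if n_pop < 2:
        raise ValueError("require Np >= 2")
    return sqrt((n_pop - n) / (n_pop - 1))


def fraction_matters(n: int, n_pop: int, threshold: float = 0.05) -> bool:
    """True when N exceeds the chosen fraction (default 5 %) of Np."""
    return sample_fraction(n, n_pop) > threshold


# Hypothetical example: 12 participants out of 40 laboratories able to take part.
print(sample_fraction(12, 40), fraction_matters(12, 40),
      round(finite_population_correction(12, 40), 3))
```

With 12 of 40 laboratories taking part, the fraction is 30 %, well above the threshold. Under this assumption, the standard error of the mean would shrink by a factor of about 0.85.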
Thus, implementation of small PT schemes is not always a routine task. Such schemes are quite often required for quality assurance of environmental analysis specific to a local region, for analysis of specific materials in an industry (e.g. materials under development), for the purposes of a regulator or a laboratory accreditation body, etc. [8].
1.1. Scope and field of application
This Guide is developed for implementation of simultaneous participation schemes when the number of laboratories is smaller than 30. This includes: 1) selection of a scheme based on simultaneous distribution of test items to participants for concurrent quantitative testing; 2) use of certified reference materials (CRMs) as test items unknown to the participants; 3) assessment of individual laboratory performance, and assessment of the metrological comparability and compatibility of the measurement results of the laboratories taking part in the PT scheme as a collective (group) of participants.
The document is intended for PT providers and PT participants (chemical analytical laboratories), as well as for accreditation bodies, laboratory customers, regulators, quality managers, metrologists and analysts.
1.2. Terminology
Terminology used in this Guide corresponds to ISO standards 17043 [5] and 3534 [9], and ISO Guide 99 (VIM) [10].
2. APPROACH
2.1. Properties of PT consensus values: dependence on the statistical sample size
The difference between the population parameters and the corresponding sample estimates increases with decreasing sample size N. In particular, a sample mean cPT/avg of N PT results can differ from the population mean cPT by up to $1.96\,\sigma_{PT}/\sqrt{N}$ with 95 % probability. Here 1.96 is the appropriate percentile of the normal distribution for a two-sided 95 % interval, and σPT is the population standard deviation of the results. The dependence of the upper limit of the interval for the expected bias |cPT/avg − cPT| on N is shown (in units of σPT) in Fig. 1, where the range N = 20 to 30 is indicated by the grey bar. Even for N = 30 the bias may reach 0.36σPT at the 95 % level of confidence.
Similarly, the sample standard deviation sPT is expected, with probability 95 %, to lie in the range
$$\sigma_{PT}\left[\frac{\chi^2_{\{0.025,\,N-1\}}}{N-1}\right]^{1/2} \le s_{PT} \le \sigma_{PT}\left[\frac{\chi^2_{\{0.975,\,N-1\}}}{N-1}\right]^{1/2},$$
where χ²{α, N − 1} is the 100α percentile of the χ² distribution at N − 1 degrees of freedom. The dependence of the range limits for sPT on N is shown in Fig. 2 (again in σPT units), also with the range N = 20 to 30 marked by the grey bar. For example, for N = 30 the upper 95 % limit for sPT is 1.26σPT. In other words, for N = 30, sPT can differ from σPT by over 25 % (relative) at the 0.95 level of confidence. For N < 30 the difference between the sample and population characteristics increases with decreasing N. The increase is especially dramatic for the standard deviation when N < 20.
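The two limits quoted above (0.36σPT for the bias and 1.26σPT for sPT at N = 30) can be reproduced with a short Python sketch that uses only the standard library. The standard library has no chi-square quantile function, so the sketch substitutes the Wilson-Hilferty approximation. This approximation is introduced here and is not part of the guide. It agrees with tabulated χ² percentiles to about three significant figures at N = 30, but it is less accurate for very small N. Where SciPy is available, `scipy.stats.chi2.ppf` gives exact percentiles instead.

```python
from math import sqrt
from statistics import NormalDist


def z_quantile(p: float) -> float:
    """Standard normal percentile, e.g. z_quantile(0.975) is about 1.96."""
    return NormalDist().inv_cdf(p)


def chi2_quantile_wh(p: float, df: int) -> float:
    """Approximate chi-square percentile (Wilson-Hilferty transformation)."""
    k = float(df)
    c = 2.0 / (9.0 * k)
    return k * (1.0 - c + z_quantile(p) * sqrt(c)) ** 3


def bias_limit(n: int, level: float = 0.95) -> float:
    """Upper limit of |cPT/avg - cPT| in units of sigma_PT."""
    return z_quantile(1.0 - (1.0 - level) / 2.0) / sqrt(n)


def s_pt_limits(n: int, level: float = 0.95) -> tuple[float, float]:
    """Lower and upper limits of s_PT in units of sigma_PT."""
    alpha = 1.0 - level
    df = n - 1
    lower = sqrt(chi2_quantile_wh(alpha / 2.0, df) / df)
    upper = sqrt(chi2_quantile_wh(1.0 - alpha / 2.0, df) / df)
    return lower, upper


for n in (5, 10, 20, 30):
    lo, hi = s_pt_limits(n)
    print(f"N={n:3d}  bias <= {bias_limit(n):.2f}  s_PT in [{lo:.2f}, {hi:.2f}] sigma_PT")
```

For N = 30 the sketch gives a bias limit of about 0.36σPT and an sPT range of about 0.74σPT to 1.26σPT, consistent with the values in the text.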
[Fig. 1 plot: Bias/σPT (0.1 to 1.3) versus N (0 to 100)]
Fig. 1. Dependence of the upper limit of the bias |cPT/avg − cPT| (in units of σPT) on the number N of PT results; reproduced from ref. [8] by permission of Springer. The line is the 97.5th percentile, corresponding to the upper limit of the two-sided 95 % interval for the expected bias. The range N = 20 to 30, intermediate between small and large sample sizes, is shown by the grey bar.
While consensus mean values are less affected than observed standard deviations, uncertainties in consensus means are relatively large in small schemes. They will practically never meet the guidelines for unqualified scoring that the IUPAC Harmonized Protocol [2] suggests for cases where the uncertainties are negligible. It follows that scoring for small schemes should usually avoid simple consensus values. Methods of obtaining traceable assigned values ccert should be used wherever possible to provide comparable PT results [11, 12].
The high variability of dispersion estimates in small statistical samples has special implications for scoring based on the observed participant standard deviation sPT. This practice is already not recommended even for large schemes [3], on the grounds that it does not provide consistent interpretation of scores from one round (or scheme) to the next. For small schemes, the variability of sPT magnifies the problem.
[Fig. 2 plot: sPT/σPT (0.0 to 2.0) versus N (0 to 100)]
Fig. 2. Dependence of the limits of the sample standard deviation sPT (in units of σPT) on the number N of PT results; reproduced from ref. [8] by permission of Springer. Solid lines show the 2.5th (lower line) and 97.5th (upper line) percentiles for sPT. The dashed line is at sPT/σPT = 1.0 for reference. The grey bar shows the range of intermediate sample sizes (N = 20 to 30).
It follows that scores based on the observed participant standard deviation should not be applied in small schemes. Suppose a PT provider can set an external, fit-for-purpose, normative or target standard deviation σtarg. Then z-scores, which compare the bias of a result from the assigned value with σtarg, can be calculated in a small scheme in the same manner as recommended in refs. [1-5] for a large scheme. The only condition is that the standard uncertainty ucert of the assigned/certified value is insignificant in comparison to σtarg, i.e. $u_{cert}^2 < 0.1\,\sigma_{targ}^2$.
2.2. Measurement uncertainty use for interpretation of PT results
The information needed to set σtarg may not be available, and/or ucert may not be negligible. In that case, the information contained in the measurement uncertainty u(ci) of the result ci reported by the i-th laboratory is helpful for performance assessment using zeta-scores and/or En numbers [2, 3]. It may also be important for a small scheme that laboratories working according to their own fitness-for-purpose criteria (for example, in conditions of competition) can be judged by individual criteria based on their declared measurement uncertainty values.
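The two uncertainty-based scores mentioned above can be computed as in the following Python sketch. It uses the zeta-score and En-number formulas as given in Section 4.2. The input values are hypothetical, and the expanded uncertainties are taken as 2u, which is an assumption (coverage factor k = 2).

```python
from math import sqrt


def zeta_score(c_i: float, c_cert: float, u_i: float, u_cert: float) -> float:
    """zeta = (c_i - c_cert) / sqrt(u(c_i)^2 + u_cert^2), standard uncertainties."""
    denom = sqrt(u_i**2 + u_cert**2)
    if denom == 0:
        raise ValueError("at least one uncertainty must be non-zero")
    return (c_i - c_cert) / denom


def en_number(c_i: float, c_cert: float, U_i: float, U_cert: float) -> float:
    """En = (c_i - c_cert) / sqrt(U(c_i)^2 + U_cert^2), expanded uncertainties."""
    denom = sqrt(U_i**2 + U_cert**2)
    if denom == 0:
        raise ValueError("at least one uncertainty must be non-zero")
    return (c_i - c_cert) / denom


# Hypothetical result: c_i = 10.4, c_cert = 10.0, u(c_i) = 0.15, u_cert = 0.05;
# expanded uncertainties assumed to be 2u (k = 2).
zeta = zeta_score(10.4, 10.0, 0.15, 0.05)
en = en_number(10.4, 10.0, 0.30, 0.10)
print(f"zeta = {zeta:.2f}, En = {en:.2f}")
```

With k = 2 the two scores differ by exactly that factor. Here ζ ≈ 2.5 falls in the questionable band, and En ≈ 1.26 exceeds the usual acceptance limit of ±1.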
2.3. What is a metrological approach to PT?
One approach rests on two elements: metrological traceability of the assigned value of the test items, which provides comparability of PT results, and scoring of PT results that takes into account the uncertainty of the assigned value and the uncertainties of the measurement results. This approach has been described as a "metrological approach" [13]. Two main steps are common to any PT scheme using it. 1) Establishment of a metrologically traceable assigned value ccert of the analyte concentration in the test items/reference material, and quantification of the standard uncertainty ucert of this value, including components arising from the homogeneity and stability of the material during the PT round. 2) Calculation of fitness-for-purpose performance statistics and assessment of laboratory performance, taking into account the laboratory measurement uncertainty. For the second step it may additionally be necessary to take into account the small population size of laboratories able to take part in the PT. These issues are considered below.
3. VALUE ASSIGNMENT
3.1. Metrological traceability of a CRM property value and of PT results
Since the approach to PT for a limited number N of participants is based on the use of CRMs as test items unknown to the participants, metrological traceability of a CRM property value is key to understanding the metrological comparability and compatibility of the PT results. The interrelations of these parameters are shown in Fig. 3.
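[Fig. 3 diagram. Left pyramid (calibration hierarchy): Primary CRM (NMIs); Secondary CRM (CRM producers); Working CRM/IHRM (testing labs and other users); "Ref. meas. stand." links between levels; uncertainty pointer. Right, inverted pyramid: traceability of the assigned value and measurement result to SI units (kg, K, mol, others). Further pointers: CRM commutability, comparability, compatibility.]
Fig. 3. A scheme of calibration hierarchy, traceability and commutability (adequacy or match) of reference materials used for PT, and of comparability and compatibility of PT results; reproduced from ref. [16] by permission of Springer.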
The left pyramid in Fig. 3 illustrates the calibration hierarchy of CRMs as measurement standards or calibrators [10]. The levels are ranked by increasing uncertainty of the supplied property values. At the top are primary CRMs (mostly pure substances developed by National Metrology Institutes, NMIs). Below them are secondary CRMs (e.g. a matrix CRM traceable to primary CRMs). At the bottom are working CRMs (certified in-house reference materials, IHRMs, developed by testing/analytical laboratories, PT providers and other users) [14, 15]. When a CRM of a higher level is used to certify a reference material of a lower level by comparing the two (for example, to certify an IHRM), the former plays the role of a reference measurement standard; this is shown in Fig. 3 by semicircular pointers. Since the uncertainty of CRM property values increases down the hierarchy, the uncertainty pointer is directed from the top of the pyramid to the bottom.
The same CRM can be used for calibration of a measurement system and for PT, i.e. for two different purposes: as a calibrator and as a quality control material (test items). It cannot, however, serve both purposes at the same time, in the same measurement or in the same test [17].
The right-side, inverted pyramid in Fig. 3 shows traceability chains from a reference material certified value and the corresponding measurement/analysis/test results to SI units. As a rule, a result must be traceable to the definition of its unit. At the same time, several influence quantities also need to be traceable to the definitions of their own units:
- to the mole of the analyte entities per mass of sample (i.e. for the concentrations in the calibration solutions);
- to the kilogram, because the size of a sample under analysis is quantified by mass or volume;
- to the kelvin, when the temperature influences the results obtained for the main quantity; etc.
Thus, the traceability pointer has the direction opposite to that of the measurement uncertainty pointer. Of course, unlike in the left-side pyramid, the width of the inverted pyramid is not correlated with the uncertainty values.
Understanding the traceability of measurement/analysis/test and PT results to the mole (realized through the chain of CRMs according to their hierarchy) is often not simple and requires reliable information about the measurement uncertainty. The problem is that the uncertainty of analytical results may increase when the chemical composition of the matrix CRM (used for calibration of the measurement system) deviates from that of the routine samples under analysis. Similarly, the difference between the certified value of the matrix reference material (applied in a PT as test items) and the result of a laboratory participating in the PT may increase when the CRM has a chemical composition different from that of the routine samples. This is known as the problem of CRM commutability (adequacy or match) to a sample under analysis [18]. It is shown in Fig. 3 as an additional pointer above the uncertainty pointer. Commutability is discussed in Section 3.1.1 below. The metrological comparability and compatibility pointers, also shown in Fig. 3, are discussed in Section 5.
3.1.1. Commutability of the CRMs and routine samples
Differences in property values and matrices between a CRM and routine samples influence the measurement uncertainty in PT. The chemical compositions of the measurement standard (the CRM used as test items) and of the routine samples of the test object should therefore be as close as possible. An algorithm for a priori evaluation of CRM adequacy can be based on an adequacy score:
$$AS\,\% = 100 \prod_{i=1}^{n} R_i^{a_i},$$
where:
- Π is the symbol of multiplication;
- i = 1, 2, …, n is the number of a component or of a physico-chemical parameter;
- $R_i = \min(c_{i,s}, c_{i,cert})/\max(c_{i,s}, c_{i,cert})$ is the ratio of the minimum to the maximum of ci,s and ci,cert;
- ci,s and ci,cert are the concentrations of the i-th component, or the values of the i-th physico-chemical parameter, in the sample and certified in the CRM, respectively;
- $0 \le a_i \le 1$ is the empirical sensitivity coefficient. It allows the influence of a component or parameter on the score value to be decreased if that component or parameter is less important for the analysis than others.
According to this score, ideal adequacy (AS = 100 %) is achieved when the composition and properties of the sample and of the CRM coincide. Adequacy is absent (AS = 0 %) when the sample and the CRM are different substances or materials, and/or the analyte is absent in the CRM (ci,cert = 0). Intermediate cases, for example for two components under control, are shown in Fig. 4. There, the ratios R1 and R2 that give adequacy score values AS = 70, 80 and 90 % form curves 1, 2 and 3, respectively.
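The adequacy score formula above translates directly into a short Python sketch. The function name and the example concentrations and coefficients are hypothetical. Rejecting a component that is absent from both the sample and the CRM is an assumption of this sketch, because the ratio Ri is then undefined.

```python
from math import prod


def adequacy_score(c_sample, c_cert, a):
    """Adequacy score AS % = 100 * prod(R_i ** a_i).

    c_sample[i], c_cert[i]: value of the i-th component/parameter in the routine
    sample and certified in the CRM; a[i]: sensitivity coefficient, 0 <= a_i <= 1.
    """
    if not (len(c_sample) == len(c_cert) == len(a)):
        raise ValueError("inputs must have the same length")
    factors = []
    for cs, cc, ai in zip(c_sample, c_cert, a):
        if cs < 0 or cc < 0:
            raise ValueError("values must be non-negative")
        if not 0.0 <= ai <= 1.0:
            raise ValueError("a_i must lie in [0, 1]")
        largest = max(cs, cc)
        if largest == 0:
            raise ValueError("component absent from both sample and CRM")
        r_i = min(cs, cc) / largest  # R_i; equals 0 if the analyte is absent in one material
        factors.append(r_i ** ai)    # a_i = 0 removes the component's influence
    return 100.0 * prod(factors)


# Hypothetical two-component case: R1 = 1.0/1.2 with a1 = 1, R2 = 4.0/5.0 with a2 = 0.5.
print(round(adequacy_score([1.0, 5.0], [1.2, 4.0], [1.0, 0.5]), 1))
```

In this example the less important second component (a2 = 0.5) pulls the score down less than its ratio of 0.8 alone would. The result, about 74.5 %, lies between curves 1 and 2 of Fig. 4.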
Fig. 4. Adequacy score AS values as a function of the ratios R1 and R2 of the concentrations of two components in a sample under analysis and in a CRM; reproduced from ref. [16] by permission of Springer. Curves 1, 2 and 3 correspond to AS = 70, 80 and 90 %, respectively. The dotted pointer shows the direction of increasing adequacy.
The adequacy score may be helpful when choosing a CRM as a calibrator, since direct use of a CRM with a low adequacy score can lead to an incorrect or broken traceability chain. Such a CRM applied in PT will decrease the reliability of the laboratory performance assessment. Therefore, CRM commutability in PT, and a score allowing its evaluation, are also important. However, the adequacy score does not properly quantify the measurement uncertainty contribution caused by insufficient commutability (AS < 100 %); this requires a special study.
More details of AS calculations are given in Annex B, Example 5.
3.1.2. Three scenarios
Thus, the task of value assignment is divided into the following three scenarios: I) an adequate matrix CRM with a traceable property value is available for use as test items; II) available matrix CRMs are not directly applicable, but a CRM can be used in formulating a spiked material with traceable property values; III) only an IHRM with a limited traceability chain of the property value is available (for example, because of instability of the material under analysis).
3.2. Scenario I: Use of adequate CRM
The ideal case is when the test items distributed among the laboratories participating in the PT are portions of a purchased adequate matrix CRM (primary or secondary measurement standard). However, the CRMs available on the market may be too expensive for direct use as test items in PT. In that case a corresponding IHRM (working measurement standard) should be developed. Characterization of an IHRM whose property value is traceable to the CRM value by comparison, and application of the IHRM for PT, are described in refs. [3, 19-21].
The characterization can be carried out effectively by analyzing the two materials in pairs, each pair consisting of one portion of the IHRM and one portion of the CRM. A pair is analyzed practically simultaneously, by the same analyst and method, in the same laboratory and under the same conditions. In this design, the analyte concentration in the IHRM under characterization is compared with the certified value of the CRM, and it is calculated from the differences between the results of the analyte determinations in the pairs. The standard uncertainty of the IHRM certified value is evaluated as a combination of two terms: the CRM standard uncertainty, and the standard uncertainty of the differences (the standard deviation of the mean of the differences). The uncertainty of the IHRM certified value includes the homogeneity uncertainties of both the CRM and the IHRM. This is because the differences in the results are caused not only by measurement uncertainties but also by fluctuations of the analyte concentrations in the test portions. When more than one unit of IHRM is prepared for PT, care still needs to be taken to include the IHRM between-unit homogeneity term in evaluating the uncertainty.
In this scenario the CRM and IHRM have similar matrices and close chemical compositions. Under similar processing, packaging and transportation conditions, their stability characteristics during PT are therefore assumed to be identical unless there is information to the contrary. The CRM uncertainty forms part of the IHRM uncertainty budget and is expected to include any necessary stability-related uncertainty; therefore no additional stability term is included in the IHRM uncertainty.
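The paired characterization described above can be sketched in Python. The sketch assumes an additive difference model, c_IHRM = ccert + mean(d) with d = x_IHRM − x_CRM for each pair; a ratio model would be equally possible. It combines the CRM standard uncertainty with the standard deviation of the mean of the differences. It also accepts an optional between-unit homogeneity term for the case of several IHRM units. The function name, parameter names and data are hypothetical.

```python
from math import sqrt
from statistics import mean, stdev


def ihrm_value_from_pairs(ihrm, crm, c_cert, u_cert, u_between_units=0.0):
    """Sketch of IHRM value assignment from paired analyses against a CRM.

    Assumed additive model: c_IHRM = c_cert + mean(d), d_k = x_IHRM,k - x_CRM,k.
    Standard uncertainty: sqrt(u_cert^2 + s_d^2 / n + u_between_units^2), where
    s_d / sqrt(n) is the standard deviation of the mean of the differences and
    u_between_units is an optional IHRM between-unit homogeneity term.
    """
    if len(ihrm) != len(crm) or len(ihrm) < 2:
        raise ValueError("need at least two complete pairs")
    d = [x - y for x, y in zip(ihrm, crm)]
    n = len(d)
    value = c_cert + mean(d)
    u = sqrt(u_cert**2 + stdev(d) ** 2 / n + u_between_units**2)
    return value, u


# Hypothetical data: four pairs, each analyzed by the same analyst and method.
value, u = ihrm_value_from_pairs([10.3, 10.5, 10.2, 10.4], [10.0, 10.1, 9.9, 10.1],
                                 c_cert=10.05, u_cert=0.05)
print(f"c_IHRM = {value:.3f}, u = {u:.3f}")
```

No separate stability term is added, following the reasoning above that the CRM uncertainty already covers it.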
The criterion for a fit-for-purpose uncertainty of the property value of a reference material applied for PT is formulated according to the task. For example, for PT in the field of water analysis in Israel [22], expanded uncertainty values should be negligible in comparison to the maximum contaminant level (MCL), i.e. the maximum permissible analyte concentration in water delivered to any user of the public water system. In this example, the uncertainty was limited to 2ucert
A related scenario is based on traceable quantitative elemental analysis and qualitative information on the purity/degradation of the analyte under characterization in the IHRM. For example, IHRMs for the determination of inorganic polysulfides in water have been developed in this way [24]. The determination involved derivatization of the polysulfides with a methylation agent, followed by GC-MS or HPLC analysis of the difunctionalized polysulfides. The IHRMs were therefore synthesized in the form of dimethylated polysulfides containing four to eight sulfur atoms. The composition of the compounds was confirmed by NMR and by the dependence of the HPLC retention time of the dimethylpolysulfides on the number of sulfur atoms in the molecule. The stability of the IHRMs was studied by HPLC with UV detection. Total sulfur content was determined by oxidizing the IHRMs with perchloric acid in high-pressure vessels (bombs), followed by determination of the sulfate formed using ICP-OES. The IHRM certified values were traceable to two references. One was NIST SRM 682, through the Anion Multi-Element Standard II from Merck (containing a certified concentration of sulfate ions), which was used for the ICP-OES calibration. The other was the SI kilogram, since all the test portions were quantified by weight.
A more detailed example is given in Annex B, Example 2.
3.4. Scenario III: Appropriate CRMs are not available
This scenario can arise when a component or an impurity of an object/material under analysis is unstable, or the matrix is unstable, and no CRMs (primary or secondary measurement standards) are available. The PT scheme proposed for such a case is based on preparing an individual IHRM sample for every participant under the same conditions, provided by a reference laboratory (RL). This allows each participant to start the measurement/test process immediately after sample preparation. In this scheme IHRM instability is not relevant as a source of measurement/test uncertainty. The intra-sample and between-sample inhomogeneity parameters are evaluated from the results of RL testing of samples taken at the beginning, the middle and the end of the PT experiment. For example, such a PT scheme was used for concrete testing; more details are given in Annex B, Example 3.
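The evaluation of between-sample and intra-sample inhomogeneity from RL results can be illustrated with a standard one-way analysis of variance. This Python sketch assumes a balanced design, in which each sample taken at the beginning, middle and end of the experiment is tested the same number of times by the RL. The procedure actually used in Annex B, Example 3 may differ. The data and names are hypothetical.

```python
from math import sqrt
from statistics import mean


def inhomogeneity_sds(groups):
    """One-way ANOVA estimates of between-sample and intra-sample SDs.

    groups: list of samples, each a list of r replicate RL results
    (balanced design assumed). Returns (s_between, s_within).
    """
    g = len(groups)
    r = len(groups[0]) if groups else 0
    if g < 2 or r < 2 or any(len(grp) != r for grp in groups):
        raise ValueError("need >= 2 samples with the same number (>= 2) of replicates")
    means = [mean(grp) for grp in groups]
    grand = mean(means)  # equals the overall mean for a balanced design
    ms_between = r * sum((m - grand) ** 2 for m in means) / (g - 1)
    ms_within = sum((x - m) ** 2 for grp, m in zip(groups, means) for x in grp) / (g * (r - 1))
    s_between = sqrt(max(0.0, (ms_between - ms_within) / r))  # truncated at zero
    return s_between, sqrt(ms_within)


# Hypothetical duplicate RL results for samples taken at the beginning,
# middle and end of the PT experiment.
s_bs, s_is = inhomogeneity_sds([[30.1, 30.5], [31.0, 31.4], [30.4, 30.6]])
print(f"between-sample SD = {s_bs:.3f}, intra-sample SD = {s_is:.3f}")
```

Truncating the between-sample variance at zero is a common convention when the between-sample mean square happens to be smaller than the within-sample one.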
6
4. INDIVIDUAL LABORATORY PERFORMANCE EVALUATION AND7
SCORING8
4.1. Single (external) criterion for all laboratories participated in a PT9
The present IUPAC Harmonized Protocol [2] recommends that z-score values

$$z_i = \frac{c_i - c_{cert}}{\sigma_{targ}}$$

are considered acceptable when |z| ≤ 2, unacceptable when |z| > 3, and questionable for intermediate values (the grounds for this are discussed thoroughly elsewhere [2]). This score provides the simplest and most direct answer to the question: Is the laboratory performing to the quantitative requirement (σtarg) set for the particular scheme? The laboratory's quoted uncertainty is not directly relevant to this particular question, so it is not included in the score. Over the longer term, however, a laboratory will be scored poorly if its real (as opposed to estimated) uncertainty is too large for the job, whether the problem is caused by unacceptable bias or unacceptable variability. This scoring, based on an externally set value σtarg (without explicitly taking the uncertainties of the assigned value and of the participants into account), remains applicable to small schemes, provided that the laboratories share a common purpose for which a single value of σtarg can be determined for each round. Examples of σtarg setting and z-score use are given in Annex B, Examples 1-2.
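As an illustration of how the z-score and the ±2/±3 limits combine, here is a minimal Python sketch. The function names and the numbers used (ccert = 12.35 %, σtarg = 0.38 %, and the three participant results) are illustrative assumptions only, not data prescribed by the Harmonized Protocol.

```python
def z_score(c_i: float, c_cert: float, sigma_targ: float) -> float:
    """z-score of a participant result c_i against the certified value c_cert."""
    return (c_i - c_cert) / sigma_targ


def classify_z(z: float) -> str:
    """Interpret |z|: <= 2 acceptable, > 3 unacceptable, otherwise questionable."""
    if abs(z) <= 2.0:
        return "acceptable"
    if abs(z) > 3.0:
        return "unacceptable"
    return "questionable"


if __name__ == "__main__":
    c_cert, sigma_targ = 12.35, 0.38  # illustrative values, % (mass fraction)
    for c_i in (12.60, 13.30, 13.60):  # hypothetical participant results
        z = z_score(c_i, c_cert, sigma_targ)
        print(f"c_i = {c_i:.2f}  z = {z:+.2f}  -> {classify_z(z)}")
```

Note that only σtarg appears in the denominator: the participant's own uncertainty plays no role, which is exactly the property discussed above.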
4.2. Own criterion for every laboratory
Often, however, a small group of laboratories has sufficiently different requirements that a single criterion is not appropriate. It may then (as well as generally) be of interest to consider a somewhat different question about performance: Are the participants' results consistent with their own quoted uncertainties? For this purpose, zeta (ζ) scores and En numbers are appropriate. The scores are calculated as

$$\zeta_i = \frac{c_i - c_{cert}}{\sqrt{u^2(c_i) + u_{cert}^2}} \qquad \text{and} \qquad E_n = \frac{c_i - c_{cert}}{\sqrt{U^2(c_i) + U_{cert}^2}},$$

where u(ci) and U(ci) are the standard and expanded uncertainties of the i-th participant result ci, respectively, and ucert and Ucert are the standard and expanded uncertainties of the certified (or otherwise assigned) value ccert. Zeta-score values are typically interpreted in the same way as z-score values (see Annex B, Example 3). The En number differs from the zeta score in the use of expanded uncertainties, and En values are usually considered acceptable when |En| ≤ 1. The advantages of zeta scoring are that i) it takes explicit account of the laboratory's reported uncertainty; and ii) it provides feedback on both the laboratory result and the laboratory's uncertainty estimation procedures. The main disadvantages are that i) it cannot be directly related to an independent criterion of fitness for purpose; ii) pessimistic (overestimated) uncertainties lead to consistently good zeta scores irrespective of whether they are fit for a particular task; and iii) the PT provider has no way of checking that the reported uncertainties are the same as those given to customers, although a customer or accreditation body is able to check this if necessary. The En number shares these characteristics, but adds two more. First, it additionally evaluates the laboratory's choice of coverage factor for converting standard to expanded uncertainty; this is an advantage. Second, unless the level of confidence is set in advance, En is sensitive to the level of confidence chosen both by the participant and by the provider in calculating U(ci) and Ucert. It is obviously important to ensure consistency in the use of coverage factors if En numbers are to be compared.
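The two formulas above can be sketched in a few lines of Python. The values below (ccert = 12.35, ucert = 0.07, ci = 12.60, u(ci) = 0.10, coverage factor k = 2 used by both parties) are hypothetical and serve only to show how the scores relate.

```python
import math


def zeta_score(c_i, u_ci, c_cert, u_cert):
    """Zeta score: bias divided by the combined *standard* uncertainty."""
    return (c_i - c_cert) / math.sqrt(u_ci**2 + u_cert**2)


def en_number(c_i, U_ci, c_cert, U_cert):
    """E_n number: bias divided by the combined *expanded* uncertainty."""
    return (c_i - c_cert) / math.sqrt(U_ci**2 + U_cert**2)


if __name__ == "__main__":
    c_cert, u_cert = 12.35, 0.07  # hypothetical assigned value and u_cert
    c_i, u_ci = 12.60, 0.10       # hypothetical participant result and u(c_i)
    k = 2.0                       # coverage factor assumed for both parties
    zeta = zeta_score(c_i, u_ci, c_cert, u_cert)
    en = en_number(c_i, k * u_ci, c_cert, k * u_cert)
    print(f"zeta = {zeta:.2f} (|zeta| <= 2 acceptable)")
    print(f"En   = {en:.2f} (|En| <= 1 acceptable)")
```

When both uncertainties are expanded with the same k, En = ζ/k, so |En| ≤ 1 with k = 2 corresponds to |ζ| ≤ 2. If the participant and the provider use different coverage factors, this simple relation no longer holds, which is the consistency issue noted above.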
It is clear that a single score cannot provide simultaneous information on whether laboratories meet external criteria (z-scores apply best here) and on whether they meet their own criteria (zeta scores or En numbers apply best).

5. METROLOGICAL COMPARABILITY & COMPATIBILITY OF PT RESULTS
Metrological comparability of PT results means that, being traceable to the same metrological reference, they are comparable independently of the result values and of the associated measurement uncertainties. Since scoring laboratory proficiency in the small PT schemes discussed here is based on evaluation of the bias ci − ccert of the i-th laboratory result ci from the certified property value ccert of the test items, both the PT results and the CRM certification (measurement) data should be comparable, i.e. traceable to the same metrological reference. The same holds for different rounds of the PT scheme when laboratory scores obtained in these rounds are compared. Since metrological comparability is a consequence of metrological traceability, the comparability pointer in Fig. 3 is directed like the traceability one.
Metrological compatibility can be interpreted for PT results as the property satisfied by each pair of PT results such that the absolute value of the difference between them is smaller than some chosen multiple of the standard measurement uncertainty of that difference. Moreover, successful PT scoring means that the absolute value of the bias ci − ccert is smaller than the corresponding chosen multiple of the standard uncertainty of the bias. In other words, a PT result is successful when it is compatible with the certified value of the CRM (test item). Therefore compatibility is shown in Fig. 3 by a horizontal pointer uniting the direct and the inverted pyramids.
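The compatibility condition above can be written as |ca − cb| ≤ κ·u(ca − cb). The minimal Python sketch below assumes that the two results are uncorrelated, so that u(ca − cb) = [u²(ca) + u²(cb)]^1/2, and uses κ = 2 purely as an example multiple. Both are assumptions for illustration, not values prescribed by this Guide.

```python
import math


def are_compatible(c_a, u_a, c_b, u_b, kappa=2.0):
    """Check metrological compatibility of two results.

    Assumes the results are uncorrelated, so the standard uncertainty of
    their difference is the root-sum-of-squares of u_a and u_b.
    kappa is the chosen multiple (an assumption; 2 is only an example).
    """
    u_diff = math.sqrt(u_a**2 + u_b**2)
    return abs(c_a - c_b) <= kappa * u_diff


# A hypothetical PT result compared with the certified value of the test item:
print(are_compatible(12.60, 0.10, 12.35, 0.07))  # False: 0.25 > 2 * 0.122
```

Comparing a PT result with the certified value, as in the call above, is the special case that corresponds to successful PT scoring.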
Thus, achieving the quality of measurement/analysis/test and PT results within the framework of the concept "tested once, accepted everywhere" [11, 25] requires both comparability and compatibility of the results.
When PT is based on the metrological approach, there are two key parameters for assessment of the comparability and compatibility of results [26]: 1) the position of the CRM sent to the participants in the calibration hierarchy of measurement standards, and 2) the closeness of the distribution of PT results to the distribution of the CRM data.
The position of a CRM in the calibration hierarchy depends on the top measurement standard in the traceability chain. For example, if a CRM property value is traceable to SI units (scenarios I and II), world-wide comparability of PT results can be confirmed. Any PT scheme based on the use of an IHRM with a limited traceability chain of the property value (not traceable to SI units: scenario III) provides the possibility of confirming local comparability only. The same situation existed in the classical fields of mass and length measurement before the Metre Convention, when measurement results in different countries were traceable to different national (local) measurement standards.
Whatever the traceability of the CRM property value used, the closeness of the distributions of the PT results and of the CRM data is important for the compatibility of results and for the performance assessment. Since laboratory performance is assessed individually for each PT participant, compatibility of all the PT results (i.e. a group performance characteristic of the laboratories participating in PT) still remains unassessed, even when the performance of the majority of them is found to be successful.
The situation is illustrated in Fig. 5, where the distribution density functions f of the PT results (curve 1) and of the CRM data (curve 2) are both shown as normal. The vertical lines are the centers of these distributions: cPT and ccert, respectively. The common shaded area P under the density function curves is the probability of the obtained PT results belonging to the population of the CRM data. It can be considered as a parameter of compatibility. The value of P tends to zero when the difference between cPT and ccert is significantly larger than the standard deviations σPT and ucert of both distributions. The closer cPT is to ccert (shown by the semicircular pointers in Fig. 5), the higher the P value.
Fig. 5. Probability density functions f of PT results, curve 1, and of CRM data, curve 2; reproduced from ref. [16] by permission of Springer. Vertical lines are the centers of these distributions: cPT and ccert, respectively. The common shaded area under the density function curves is the probability P of the obtained PT results belonging to the population of the CRM data. The semicircular pointers show the direction of increasing compatibility.

The distributions, P values, and hypotheses necessary for assessment of the compatibility of the results of a limited number N of PT participants as a group, and suitable criteria for that assessment based on the statistical sample characteristics (average cPT/av, standard deviation sPT, etc.), are discussed in detail in Annex A.
In principle, cPT/av and sPT are consensus values, which cannot be used for a reliable assessment of an individual laboratory's performance when the number of laboratories participating in the PT scheme is limited. Here, however, the consensus values are used for another purpose: for comparison of the PT results, as a statistical sample, with the CRM data (see Examples 1-4 in Annex B). The compatibility of the PT results of a group of laboratories can be low if one or more laboratories in the group perform badly. Analysis of the reasons leading to such a situation, as well as of ways to correct it, is a task for the corresponding accreditation body and/or the regulator responsible for these laboratories and interested in the comparability and compatibility of the results.

6. EFFECT OF SMALL LABORATORY POPULATION ON SAMPLE ESTIMATES
The population of possible laboratory participants is not usually infinite. For example, the population of possible PT participants in motor oil testing organized by the Israel Forum of Managers of Oil Laboratories was only NP = 12 laboratories, while the statistical sample size, i.e. the number of participants that agreed to take part in the PT in different years, was N = 6 to 10 (see Annex B, Example 4). In such cases the sample fraction φ = N/NP = 6/12 to 10/12 ≈ 0.5 to 0.8 (i.e. 50 to 80 %) is not negligible, and corrections for finite population size are necessary in the statistical data analysis. The corrections concern the standard deviation (standard uncertainty) of the sample mean cPT/av of N PT results, equal to $\sigma_{PT/av} = \sigma_{PT}\{[(N_P - N)/(N_P - 1)]/N\}^{1/2}$, and the standard deviation of a PT result, equal to $s_{PT} = \sigma_{PT}[N_P/(N_P - 1)]^{1/2}$.
After simple transformations, the following formula for the standard deviation of the sample mean can be obtained: $\sigma_{PT/av}/(\sigma_{PT}/\sqrt{N}) = [(N_P - N)/(N_P - 1)]^{1/2} = [(1 - \varphi)/(1 - 1/N_P)]^{1/2}$. The dependence of σPT/av on φ is shown (in units of σPT/√N) in Fig. 6 for populations of NP = 10, 20 and 100 laboratories, curves 1, 2 and 3, respectively.
Fig. 6. Dependence of the standard deviation of the sample mean σPT/av (in units of σPT/√N) on the sample fraction φ; reproduced from ref. [8] by permission of Springer. Curves 1, 2 and 3 are for populations of NP = 10, 20 and 100 laboratories, respectively. The grey bar shows the intermediate range of sample fraction values φ = 5 to 10 % (at φ < 5 %, corrections for a finite population size are, as a rule, negligible).

Since at least two PT results are necessary for calculation of a standard deviation (i.e. the minimal sample size is N = 2), curve 1 is shown for φ ≥ 20 %, curve 2 for φ ≥ 10 %, and curve 3 for φ ≥ 2 %. The population size has much less influence here than the sample fraction value.
The dependence of sPT on φ, given by the formula $s_{PT}/\sigma_{PT} = [1/(1 - \varphi/N)]^{1/2}$, is weak in comparison with the dependence shown in Fig. 6, since the correction factor values are only 0.96 to 1.00 for any case with a sample size of N = 10 to 100 PT results. As NP increases and φ decreases, $(N_P - N)/(N_P - 1) \to 1$ and $1/(1 - \varphi/N) \to 1$, and the corrections for finite population size disappear: $\sigma_{PT/av} \to \sigma_{PT}/\sqrt{N}$ and $s_{PT} \to \sigma_{PT}$. Therefore, the corrections are negligible for φ values up to around 5 to 10 % (shown by the grey bar in Fig. 6).
These corrections should, however, be applied with care, and only when the population is really finite.
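The two correction factors above can be evaluated directly. The minimal Python sketch below simply codes the formulas for σPT/av/(σPT/√N) and sPT/σPT given in the text and evaluates them for the motor-oil scheme (NP = 12, N = 6 to 10). The function names are hypothetical.

```python
import math


def sd_of_mean_factor(n, n_pop):
    """sigma_PT/av in units of sigma_PT/sqrt(N): [(N_P - N)/(N_P - 1)]**0.5."""
    if not 2 <= n <= n_pop:
        raise ValueError("need 2 <= N <= N_P")
    return math.sqrt((n_pop - n) / (n_pop - 1))


def sd_of_result_factor(n_pop):
    """s_PT in units of sigma_PT: [N_P/(N_P - 1)]**0.5."""
    return math.sqrt(n_pop / (n_pop - 1))


if __name__ == "__main__":
    n_pop = 12  # population of possible participants in the motor-oil PT
    for n in (6, 8, 10):
        phi = n / n_pop
        print(f"N = {n:2d}, phi = {phi:.2f}: "
              f"sd-of-mean factor = {sd_of_mean_factor(n, n_pop):.3f}, "
              f"s_PT factor = {sd_of_result_factor(n_pop):.3f}")
```

For N = 6 of NP = 12, for example, the standard deviation of the mean is only about 0.74 of the infinite-population value σPT/√N, whereas the factor for sPT stays close to 1.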

7. OUTLIERS
Since the number of PT results (the sample size N) is limited, it is also important to treat extreme results correctly if they are not caused by a known gross error or miscalculation. Even at large N, extreme results can provide valuable information to the PT provider and should not be disregarded entirely in the analysis of PT results without due consideration. When N is small, extreme results usually cannot be identified as outliers by the known statistical tests because of the low power of these tests. Fortunately, the metrological approach for small schemes makes outlier handling less important, since assigned values are not calculated by consensus, and scores are not expected to be based on observed standard deviations. Accordingly, outliers affect scoring only for the laboratory reporting the outlying results, and they matter to the PT provider seeking the underlying causes of such problems.
8. EFFECTIVENESS OF APPROACHES TO PT
While traditional approaches to PT (using consensus values for assessment of laboratory performance) are not acceptable for N < 30, the metrological approach (based on the use of a CRM) is acceptable from statistical and metrological points of view for any N, including N ≥ 30. However, the PT cost, which increases with N, should also be taken into account in any correct PT scheme design.
ANNEX A. CRITERIA FOR ASSESSMENT OF METROLOGICAL COMPATIBILITY OF PT RESULTS

CONTENTS
1. RELATIONSHIP BETWEEN THE DISTRIBUTION OF CRM ASSIGNED VALUE DATA AND THE DISTRIBUTION OF PT RESULTS
2. NULL AND ALTERNATIVE HYPOTHESES
3. A CRITERION FOR PT RESULTS BEING NORMALLY DISTRIBUTED
3.1. Example
3.2. Reliability of the assessment
4. A NON-PARAMETRIC TEST FOR PT RESULTS WITH AN UNKNOWN DISTRIBUTION
4.1. Reliability of the test
4.2. Example
4.3. Limitations

1. RELATIONSHIP BETWEEN THE DISTRIBUTION OF CRM ASSIGNED VALUE DATA AND THE DISTRIBUTION OF PT RESULTS
The data used for calculation of the CRM assigned value and the measurement/analysis results of the laboratories participating in PT can be considered as independent random events. Therefore, the relation between them can be characterized by the common area P under the density function curves for the CRM data and for the PT results. The P value is the probability of joint events and, therefore, the probability of the obtained PT results belonging to the population of CRM data.
For the sake of simplicity, both distributions are assumed to be normal, with parameters ccert, σcert and cPT, σPT, as shown in Fig. 7. The figure refers to a simulated example of aluminum determination in coal fly ashes using a CRM developed by NIST, USA: SRM 2690 with ccert = 12.35 % and σcert = 0.14 % (as mass fraction) [27].
Fig. 7. Probability density functions f of the PT results and of the CRM data when cPT = 12.25 % and σPT = 0.34 %; reproduced from ref. [27] by permission of RSC. Values c1 and c2 are the measurement/test results corresponding to the crossing points of the f curves.

Since both density functions, fcert of the CRM data and fPT of the PT results, are equal at the values c1 and c2, one can write

$$f_{cert} = \frac{1}{\sqrt{2\pi}\,\sigma_{cert}}\exp\left[-\frac{(c - c_{cert})^2}{2\sigma_{cert}^2}\right] = \frac{1}{\sqrt{2\pi}\,\sigma_{PT}}\exp\left[-\frac{(c - c_{PT})^2}{2\sigma_{PT}^2}\right] = f_{PT} \qquad (1)$$

As shown in ref. [27], after transformation of expression (1), c1 and c2 can be calculated by the following formula:
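
$$c_{1,2} = \frac{c_{cert}\,\sigma_{PT}^2 - c_{PT}\,\sigma_{cert}^2 \pm \sigma_{cert}\,\sigma_{PT}\sqrt{D}}{\sigma_{PT}^2 - \sigma_{cert}^2} \qquad (2)$$

where

$$D = (c_{PT} - c_{cert})^2 + 2(\sigma_{PT}^2 - \sigma_{cert}^2)\ln\frac{\sigma_{PT}}{\sigma_{cert}} \qquad (3)$$

When c1 and c2 are known, the probability P is conveniently calculated (for σPT > σcert, as in Fig. 7, with c1 < c2) by the following formula:

$$P = \int_{-\infty}^{c_1} f_{cert}\,dc + \int_{c_1}^{c_2} f_{PT}\,dc + \int_{c_2}^{\infty} f_{cert}\,dc = 1 + \Phi\left(\frac{c_1 - c_{cert}}{\sigma_{cert}}\right) - \Phi\left(\frac{c_2 - c_{cert}}{\sigma_{cert}}\right) + \Phi\left(\frac{c_2 - c_{PT}}{\sigma_{PT}}\right) - \Phi\left(\frac{c_1 - c_{PT}}{\sigma_{PT}}\right) \qquad (4)$$

where Φ stands for the normalized (standard) normal distribution function. For example, calculations by formulas (2)-(4) for the case shown in Fig. 7 yield c1 = 12.16 %, c2 = 12.58 % and P = 0.58.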
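Formulas (2)-(4) translate directly into code. The stdlib-only Python sketch below assumes σPT > σcert (the case of Fig. 7) and uses `statistics.NormalDist` for Φ. It reproduces the numbers of the Fig. 7 example and also evaluates P at the boundary of the null hypothesis (5) of Section 2 (λ = 0.4, bias equal to 1.25σcert). The function names are hypothetical.

```python
import math
from statistics import NormalDist

PHI = NormalDist().cdf  # standard normal distribution function


def crossing_points(c_cert, s_cert, c_pt, s_pt):
    """Crossing points c1 < c2 of two normal densities, formulas (2)-(3).

    Assumes s_pt > s_cert (as in Fig. 7).
    """
    d = (c_pt - c_cert) ** 2 + 2 * (s_pt**2 - s_cert**2) * math.log(s_pt / s_cert)
    centre = (c_cert * s_pt**2 - c_pt * s_cert**2) / (s_pt**2 - s_cert**2)
    half_width = s_cert * s_pt * math.sqrt(d) / (s_pt**2 - s_cert**2)
    return centre - half_width, centre + half_width


def overlap_probability(c_cert, s_cert, c_pt, s_pt):
    """Common area P under both density curves, formula (4)."""
    c1, c2 = crossing_points(c_cert, s_cert, c_pt, s_pt)
    return (1.0
            + PHI((c1 - c_cert) / s_cert) - PHI((c2 - c_cert) / s_cert)
            + PHI((c2 - c_pt) / s_pt) - PHI((c1 - c_pt) / s_pt))


if __name__ == "__main__":
    c1, c2 = crossing_points(12.35, 0.14, 12.25, 0.34)
    p = overlap_probability(12.35, 0.14, 12.25, 0.34)
    print(f"c1 = {c1:.2f}, c2 = {c2:.2f}, P = {p:.2f}")  # c1 = 12.16, c2 = 12.58, P = 0.58
    # Boundary of hypothesis (5): sigma_PT = 0.35 (lambda = 0.4), bias = 1.25 * 0.14
    print(f"P = {overlap_probability(12.35, 0.14, 12.35 - 0.175, 0.35):.2f}")  # about 0.53
```

The second call illustrates the statement in Section 2 that P is about 0.53 when the bias reaches the right-hand side of expression (5).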
Information on the distributions of both the PT results and the CRM data is limited by the experimental statistical sample sizes. Therefore, the common area P under the probability density function curves of the distributions (the probability of the obtained PT results belonging to the population of the CRM data) can adequately characterize metrological compatibility only insofar as the goodness of fit of the empirical and theoretical distributions is high. However, the P value is of practical importance, since it allows one to choose a suitable null hypothesis for a yes/no criterion for assessment of the metrological compatibility of a relatively small (not infinite) number of PT results.

2. NULL AND ALTERNATIVE HYPOTHESES
The chosen null hypothesis H0 states that the metrological compatibility is satisfactory if the bias |cPT − ccert| exceeds σcert only by a value that is insignificant in comparison with random interlaboratory errors:

$$H_0: \quad |c_{PT} - c_{cert}| \le [\sigma_{cert}^2 + (0.3\sigma_{PT})^2]^{1/2} \qquad (5)$$

where the coefficient 0.3 is used according to the known metrological rule that one standard deviation is insignificant in comparison with another when the former does not exceed 1/3 of the latter (i.e. the first variance is smaller than the second by an order of magnitude). Under this hypothesis, the probability P of considering the PT results as belonging to the population of CRM data is P ≥ 0.53 for the ratio λ = σcert/σPT ≥ 0.4 (as shown in Fig. 7), when the right-hand side of expression (5) reaches the value of 1.25σcert.
The alternative hypothesis H1 assumes that the metrological compatibility is not satisfactory and that the bias |cPT − ccert| exceeds σcert significantly, for example:

$$H_1: \quad |c_{PT} - c_{cert}| = 2.0[\sigma_{cert}^2 + (0.3\sigma_{PT})^2]^{1/2} \qquad (6)$$

etc.

3. A CRITERION FOR PT RESULTS BEING NORMALLY DISTRIBUTED
The criterion for not rejecting H0 for a statistical sample of size N, i.e. for the results of N laboratories participating in the PT, is

$$|c_{PT/av} - c_{cert}| + t_{1-\alpha/2}\,s_{PT}/\sqrt{N} \le [\sigma_{cert}^2 + (0.3\sigma_{PT})^2]^{1/2} \qquad (7)$$

where cPT/av and sPT are the sample estimates of cPT and σPT calculated from the same N results as the sample average and standard deviation, respectively; the left-hand side of the expression represents the upper limit of the confidence interval for the bias |cPT − ccert|; t1−α/2 is the percentile of the one-tailed Student's distribution for N − 1 degrees of freedom; and 1 − α/2 is the probability of the bias not exceeding the upper limit of its confidence interval.
By substituting the ratio λ = σcert/σPT and $s_{PT}/\sigma_{PT} = [\chi^2_{\alpha/2}/(N-1)]^{1/2}$, where χ²α/2 is the 100α/2 percentile of the χ² distribution for N − 1 degrees of freedom, into formula (7), the following transformation of the criterion is obtained:

$$\frac{|c_{PT/av} - c_{cert}|}{s_{PT}} \le \left[\frac{(\lambda^2 + 0.09)(N-1)}{\chi^2_{\alpha/2}}\right]^{1/2} - \frac{t_{1-\alpha/2}}{\sqrt{N}} \qquad (8)$$

Table 1 gives the numerical values of the right-hand side of the criterion at α = 0.05.
Table 1
The bias norms in sPT units by criterion (8)
λ      N = 5   10     15     20     30     40     50
0.4    0.20    0.20   0.23   0.26   0.30   0.32   0.34
0.7    0.95    0.68   0.65   0.64   0.65   0.66   0.67
1.0    1.76    1.19   1.09   1.06   1.03   1.02   1.02

These values are the norms for the bias of the average PT result from the analyte concentration certified in the CRM (in sPT units). The value of λ should be set based on the requirements for the analytical results, taking into account the PT fitness-for-purpose value σPT, which is equal either to the standard analytical/measurement uncertainty or to the target standard deviation σtarg calculated using the Horwitz curve [2, 3] or another database.

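To show where the entries of Table 1 come from, the Python sketch below evaluates the right-hand side of criterion (8) at α = 0.05. To stay within the standard library it does not compute the percentiles; it takes t0.975 and the lower 2.5 % χ² percentile for N − 1 degrees of freedom as inputs, copied from standard statistical tables for N = 5, 15 and 30. With SciPy available, `scipy.stats.t.ppf(0.975, N - 1)` and `scipy.stats.chi2.ppf(0.025, N - 1)` could supply them instead.

```python
import math

# Student t (0.975, one-tailed) and lower 2.5 % chi-square percentiles for
# N - 1 degrees of freedom, copied from standard statistical tables.
T_975 = {5: 2.776, 15: 2.145, 30: 2.045}
CHI2_025 = {5: 0.4844, 15: 5.629, 30: 16.047}


def bias_norm(lam, n, t_q, chi2_q):
    """Right-hand side of criterion (8): the bias norm in s_PT units."""
    return math.sqrt((lam**2 + 0.09) * (n - 1) / chi2_q) - t_q / math.sqrt(n)


if __name__ == "__main__":
    print("lambda  N=5   N=15  N=30")
    for lam in (0.4, 0.7, 1.0):
        row = [bias_norm(lam, n, T_975[n], CHI2_025[n]) for n in (5, 15, 30)]
        print(f"{lam:.1f}    " + "  ".join(f"{v:.2f}" for v in row))
```

The printed rows agree with the corresponding columns of Table 1 (for example 0.20, 0.23 and 0.30 for λ = 0.4).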
3.1. Example
According to the ASTM standard [29], the means of the results of duplicate aluminum determinations in coal fly ashes carried out by different laboratories on riffled splits of the analysis sample should not differ by more than 2.0 % for Al2O3, i.e. 1.06 % for aluminum. Since the range of two laboratory results is limited by the standard, σPT = 1.06/2.77 = 0.38 %, where 2.77 is the 95 % percentile of the range distribution. For the SRM 2690 discussed here, with σcert = 0.14 %, the value of λ is 0.14/0.38 ≈ 0.4. Simulated statistical samples of the PT results are given in Table 2. The metrological compatibility of the results of the first 15 laboratories can be assessed as satisfactory by the norm in Table 1 for λ = 0.4 (0.23), since |cPT/av − ccert| = |12.30 − 12.35| = 0.05 % < 0.23 sPT = 0.23 × 0.34 = 0.08 % (as mass fraction). The same is true for the metrological compatibility of the results of all 30 laboratories (the norm in Table 1 is 0.30): |cPT/av − ccert| = |12.38 − 12.35| = 0.03 % < 0.30 sPT = 0.30 × 0.35 = 0.11 %.
Further detailed examples are given in Annex B, Examples 3 and 4.

Table 2
PT results of aluminum determination in SRM 2690 (simulated, in % as mass fraction)

Lab. No. i   100 ci     Lab. No. i   100 ci
1            12.76      16           12.60
2            12.19      17           12.81
3            12.68      18           12.39
4            12.21      19           11.96
5            12.96      20           11.91
6            12.27      21           11.86
7            11.96      22           12.32
8            12.03      23           12.53
9            11.88      24           12.84
10           11.97      25           12.67
11           12.23      26           12.86
12           12.48      27           12.75
13           12.69      28           12.66
14           12.21      29           11.99
15           11.98      30           12.61
cPT/av (labs 1-15)   12.30     cPT/av (labs 1-30)   12.38
sPT (labs 1-15)      0.34      sPT (labs 1-30)      0.35
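The Example 3.1 check can be reproduced from the Table 2 data with a short Python sketch using only the standard library. The norms 0.23 (N = 15) and 0.30 (N = 30) are taken from Table 1 for λ = 0.4. The function and variable names are hypothetical.

```python
from statistics import mean, stdev

C_CERT = 12.35  # SRM 2690 certified value, % (mass fraction)

# Table 2: simulated PT results, % (mass fraction)
LABS_1_15 = [12.76, 12.19, 12.68, 12.21, 12.96, 12.27, 11.96, 12.03,
             11.88, 11.97, 12.23, 12.48, 12.69, 12.21, 11.98]
LABS_16_30 = [12.60, 12.81, 12.39, 11.96, 11.91, 11.86, 12.32, 12.53,
              12.84, 12.67, 12.86, 12.75, 12.66, 11.99, 12.61]


def compatibility_check(results, norm):
    """Apply criterion (8): |c_PT/av - c_cert| <= norm * s_PT."""
    c_av, s_pt = mean(results), stdev(results)  # stdev uses the N - 1 divisor
    bias = abs(c_av - C_CERT)
    return c_av, s_pt, bias, bias <= norm * s_pt


if __name__ == "__main__":
    # Norms for lambda = 0.4 from Table 1: N = 15 -> 0.23, N = 30 -> 0.30
    for label, data, norm in (("labs 1-15", LABS_1_15, 0.23),
                              ("labs 1-30", LABS_1_15 + LABS_16_30, 0.30)):
        c_av, s_pt, bias, ok = compatibility_check(data, norm)
        print(f"{label}: c_PT/av = {c_av:.2f}, s_PT = {s_pt:.2f}, "
              f"bias = {bias:.2f} vs {norm * s_pt:.2f} -> compatible: {ok}")
```

The summary statistics agree with the last two rows of Table 2, and both groups pass the criterion, as stated in the example.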

3.2. Reliability of the assessment
The reliability of such a metrological compatibility assessment is determined by the probabilities of not rejecting the null hypothesis H0 when it is true, and of rejecting it when it is false (i.e. when the alternative hypothesis H1 is true). With criterion (8), hypothesis H0 is not rejected with probability 1 − α/2 when it is true. The probability of an error of type 1 by this criterion (rejecting H0 when it is true) is α/2. The probability of rejecting H0 when it is false, i.e. when the alternative hypothesis H1 is actually true (the criterion power, CP), is:
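
$$CP = \Phi\left[\frac{\delta - t_{1-\alpha/2}}{\left[1 + t_{1-\alpha/2}^2/(2(N-1))\right]^{1/2}}\right], \qquad (9)$$

where

$$\delta = \frac{|c_{PT} - c_{cert}| - (0.09 + \lambda^2)^{1/2}\,\sigma_{PT}}{\sigma_{PT}/\sqrt{N}}. \qquad (10)$$
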
The value of the deviation parameter δ is calculated by substituting the bias |cPT − ccert| in equation (10) by its value corresponding to the alternative hypothesis. For hypothesis H1 by formula (6) the substitution is 2.0[σcert² + (0.3σPT)²]^1/2 and, therefore, δ = [(0.09 + λ²)N]^1/2. The probability of an error of type 2 (not rejecting H0 when it is false) equals β = 1 − CP. Both operational characteristics of the criterion, CP and β, are shown in Fig. 8 at α = 0.05 for different λ values and different numbers N of PT participants.
Thus, the reliability of the compatibility assessment using the hypothesis H0 against H1 for the PT scheme for aluminum determination in coal fly ashes (where λ = 0.4) can be characterized by 1) the probability 1 − α/2 = 0.975 of correctly assessing the compatibility as successful (i.e. not rejecting the null hypothesis H0 when it is true) for any number N of laboratories participating in PT, and 2) the probability CP = 0.42 of correctly assessing the compatibility as unsuccessful (i.e. rejecting H0 when the alternative hypothesis H1 is true) for N = 15, and CP = 0.75 for N = 30 results. The probability α/2 of a type 1 error is 0.025 for any N, while the probability β of a type 2 error is 0.58 for N = 15 and 0.25 for N = 30.
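The power values quoted above can be checked with formulas (9)-(10) as written here. In the minimal Python sketch below, Φ is taken from `statistics.NormalDist`, and the Student percentiles t0.975 for 14 and 29 degrees of freedom are supplied from standard tables rather than computed. The function name is hypothetical.

```python
import math
from statistics import NormalDist


def criterion_power(lam, n, t_q):
    """Power CP of criterion (8) against H1 of formula (6), via (9)-(10).

    t_q is the Student percentile t_{1-alpha/2} for N - 1 degrees of freedom.
    """
    delta = math.sqrt((0.09 + lam**2) * n)  # formula (10) with the H1 bias
    z = (delta - t_q) / math.sqrt(1 + t_q**2 / (2 * (n - 1)))
    return NormalDist().cdf(z)              # formula (9)


if __name__ == "__main__":
    # t_{0.975} for 14 and 29 degrees of freedom (standard tables)
    for n, t_q in ((15, 2.145), (30, 2.045)):
        cp = criterion_power(0.4, n, t_q)
        print(f"N = {n}: CP = {cp:.2f}, beta = {1 - cp:.2f}")
```

For λ = 0.4 this gives CP ≈ 0.42 (β ≈ 0.58) at N = 15 and CP ≈ 0.75 (β ≈ 0.25) at N = 30, matching the values in the text.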

Fig. 8. Power CP of the criterion and probability β of an error of type 2 as functions of the number N of laboratories participating in PT, for probability α/2 = 0.025 of an error of type 1; reproduced from ref. [28] by permission of Springer. Curve 1 is for λ = 0.4, and curve 2 for λ = 1.0.

The power of criterion (8) is high (CP > 0.5) for a number of PT participants N ≥ 20.

4. A NON-PARAMETRIC TEST FOR PT RESULTS WITH AN UNKNOWN DISTRIBUTION
When the distribution is unknown and differs from the normal one, the median is more robust than the average, i.e. better reproduced in repeated experiments, being less sensitive to extreme results/outliers. Therefore, the null hypothesis, which assumes here that the bias of the PT results exceeds σcert by a value that is insignificant in comparison with random interlaboratory errors, has the following form:

$$H_0: \quad |M_{PT} - c_{cert}| \le [\sigma_{cert}^2 + (0.3\sigma_{PT})^2]^{1/2} = \Delta \qquad (11)$$

where MPT is the median of the PT results of a hypothetically infinite number N of participants, i.e. the population median.
If MPT ≥ ccert, the null hypothesis H0 implies that the probability Pe of the event that a result ci of the i-th PT-participating laboratory exceeds the value ccert + Δ is Pe{ci > ccert + Δ} ≤ 0.5, according to the definition of the median. If MPT < ccert, the probability of ci falling below ccert − Δ is likewise Pe{ci < ccert − Δ} ≤ 0.5. The alternative hypothesis assumes that the bias exceeds σcert significantly and that the probabilities of the events described above are Pe > 0.5, for example:

$$H_1: \quad |M_{PT} - c_{cert}| = 2\Delta \qquad (12)$$

where Δ is the same as in expression (11). The probabilities Pe of these events under the alternative hypothesis H1 for a normal distribution (depending on the permissible bias Δ in σPT units at different λ values) are shown in Table 3.
Table 3
Probability Pe according to alternative hypothesis H1
λ      Δ/σPT   Pe
0.4    0.50    0.69
0.7    0.75    0.77
1.0    1.04    0.85

Since the population median is unknown in practice, and the results of N laboratories participating in PT form an N-size statistical sample from the population, hypothesis H0 is not rejected when the upper limit of the confidence interval of the median does not exceed ccert + Δ, or the lower limit does not fall below ccert − Δ. The limits can be evaluated by the simplest non-parametric test, the sign test [30]. According to this test, the number N+ of results ci > ccert + Δ or the number N− of results ci < ccert − Δ should not exceed the critical value A (the bias norm) in order not to reject H0. A values are available, for example, in ref. [31]. For N from 5 to 50 PT participants and levels of confidence 0.975 (α/2 = 1 − 0.975 = 0.025) and 0.95 (α/2 = 0.05), these values are shown in Table 4. The A value for fewer than six participants at α/2 = 0.025 cannot be determined and is therefore not given in Table 4 for N = 5.
Table 4
The bias norms A by the sign test
α/2      N = 5   10   15   20   30   40   50
0.025    -       1    3    5    9    13   17
0.05     0       1    3    5    10   14   18

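Table 4 can be reproduced under the assumption that A is the usual lower-tail critical value of the sign test: the largest count whose cumulative Binomial(N, 0.5) probability does not exceed α/2. The stdlib-only Python sketch below makes that assumption explicit (the function name is hypothetical). It returns `None` where no such count exists, which corresponds to the "-" entry for N = 5 at α/2 = 0.025.

```python
from math import comb


def sign_test_critical_value(n, half_alpha):
    """Largest A with P(X <= A) <= alpha/2 for X ~ Binomial(n, 0.5).

    Returns None when even A = 0 exceeds alpha/2 (the '-' entry in Table 4).
    """
    cumulative, a_crit = 0.0, None
    for k in range(n + 1):
        cumulative += comb(n, k) / 2**n
        if cumulative > half_alpha:
            break
        a_crit = k
    return a_crit


if __name__ == "__main__":
    sizes = (5, 10, 15, 20, 30, 40, 50)
    for half_alpha in (0.025, 0.05):
        row = [sign_test_critical_value(n, half_alpha) for n in sizes]
        print(half_alpha, ["-" if a is None else a for a in row])
```

Under this reading, the exact binomial tails give every entry of Table 4.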
4.1. Reliability of the test
With this test, hypothesis H0 is not rejected with a probability of 1 − α/2 when it is true. The probability of an error of type 1 by this test (rejecting H0 when it is true) is α/2. The probability of rejecting the null hypothesis when it is false, i.e. when the alternative hypothesis is actually true (the test power, TP), is tabulated in ref. [31]. The probability of a type 2 error (not rejecting H0 when it is false) equals β = 1 − TP. The operational characteristics of the test (TP and β) are shown in Fig. 9 at α = 0.05 for the alternative hypothesis H1 at different λ values and different numbers N of PT participants.
Fig. 9. Power TP of the test and probability β of an error of type 2 as functions of the number N of laboratories participating in PT, when the probability of an error of type 1 is α/2 = 0.025; reproduced from ref. [30] by permission of Springer. The null hypothesis H0 is tested against the alternative hypothesis H1 at λ = 0.4 and λ = 1.0, shown by curves 1 and 2, respectively.
4.2. Example
The hypothesis of a normal distribution of the PT results in the example shown in Table 2 was not tested because of the small size of the statistical samples. Therefore, the sample size is increased here to N = 50; the simulated data are presented in Table 5 (the simulation was performed by the known method of successive approximations). Such a sample size allows testing the hypothesis of a normal distribution of the data by applying the Cramer-von Mises ω² criterion, which is powerful for statistical samples of small size [32]:

$$\omega^2 = -N - 2\sum_{j=1}^{N}\left\{\frac{2j-1}{2N}\ln\Phi(x_j) + \left[1 - \frac{2j-1}{2N}\right]\ln[1 - \Phi(x_j)]\right\} \qquad (13)$$

where j = 1, 2, …, N is the rank of the PT result cj in the statistical sample ordered by increasing value (c1 ≤ c2 ≤ … ≤ cN); xj = (cj − cPT/av)/sPT is the normalized value of the j-th result, which is distributed with a mean of 0 and a standard deviation of 1; and Φ(xj) is the value of the normalized normal distribution function at xj.
The value ω² = 1.95 calculated by formula (13) for the data in Table 5 exceeds the critical value 1.94 for N = 50, which corresponds to a probability of 0.10 of being exceeded by chance [31]. Therefore, the hypothesis of a normal distribution of these data should be rejected at the 0.90 level of confidence. The corresponding empirical histogram and the theoretical (normal) distribution are shown in Fig. 10. The empirical distribution is clearly bimodal; therefore, no normal distribution can fit it. Since other known distributions are also not suitable here, the proposed non-parametric test is applied for the compatibility assessment of the results.
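Formula (13) can be evaluated with a short Python sketch that follows it literally, estimating cPT/av and sPT from the same sample. Two remarks are hedged observations rather than statements of this Guide. First, the computational form (13) is the expression commonly known as the Anderson-Darling statistic. Second, the sketch does not guard against Φ(xj) reaching exactly 0 or 1, which would happen only for extremely distant results. The function name is hypothetical, and the usage line applies it to the first 15 results of Table 2 purely as a demonstration.

```python
import math
from statistics import NormalDist, mean, stdev


def omega2_statistic(results):
    """Statistic of formula (13); normal parameters are estimated from the sample."""
    c = sorted(results)                  # c_1 <= c_2 <= ... <= c_N
    n = len(c)
    c_av, s = mean(c), stdev(c)          # c_PT/av and s_PT
    phi = NormalDist().cdf
    total = 0.0
    for j, cj in enumerate(c, start=1):
        f = phi((cj - c_av) / s)         # Phi(x_j), x_j = (c_j - c_PT/av)/s_PT
        w = (2 * j - 1) / (2 * n)
        total += w * math.log(f) + (1 - w) * math.log(1 - f)
    return -n - 2 * total


if __name__ == "__main__":
    labs_1_15 = [12.76, 12.19, 12.68, 12.21, 12.96, 12.27, 11.96, 12.03,
                 11.88, 11.97, 12.23, 12.48, 12.69, 12.21, 11.98]
    print(f"omega^2 = {omega2_statistic(labs_1_15):.3f}")
```

Because the results are normalized with the sample mean and standard deviation, the statistic does not change under a change of units (shift and positive scaling of all results). The value obtained must be compared with critical values tabulated for the same form of the statistic, as done above with ref. [31].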
Table 5
PT results of aluminum determination in SRM 2690 (simulated, in % as mass fraction) ranked in order of increasing value

No. j  100 cj  100(cj − ccert)  Sign    No. j  100 cj  100(cj − ccert)  Sign    No. j  100 cj  100(cj − ccert)  Sign
1      11.86   -0.49            -       18     12.44   0.09             0       35     12.53   0.18             0
2      11.88   -0.47            -       19     12.44   0.09             0       36     12.55   0.20             +
3      11.90   -0.45            -       20     12.45   0.10             0       37     12.56   0.21             +
4      11.91   -0.44            -       21     12.46   0.11             0       38     12.57   0.22             +
5      11.93   -0.42            -       22     12.46   0.11             0       39     12.60   0.25             +
6      11.96   -0.39            -       23     12.47   0.12             0       40     12.61   0.26             +
7      11.96   -0.39            -       24     12.48   0.13             0       41     12.64   0.29             +
8      11.97   -0.38            -       25     12.49   0.14             0       42     12.66   0.31             +
9      11.98   -0.37            -       26     12.49   0.14             0       43     12.67   0.32             +
10     11.99   -0.36            -       27     12.50   0.15             0       44     12.68   0.33             +
11     12.03   -0.32            -       28     12.50   0.15             0       45     12.69   0.34             +
12     12.07   -0.28            -       29     12.51   0.16             0       46     12.76   0.41             +
13     12.17   -0.18            0       30     12.51   0.16             0       47     12.81   0.46             +
14     12.19   -0.16            0       31     12.52   0.17             0       48     12.84   0.49             +
15     12.20   -0.15            0       32     12.52   0.17             0       49     12.90   0.55             +
16     12.34   -0.01            0       33     12.53   0.18             0       50     12.96   0.61             +
17     12.43   0.08             0       34     12.53   0.18             0
N− = 12; N+ = 15

Taking into account ccert = 12.35 %, σcert = 0.14 %, σPT = 0.38 % and λ = 0.14/0.38 ≈ 0.4, one can calculate Δ = 0.50 × 0.38 = 0.19 % (Table 3), ccert + Δ = 12.54 % and ccert − Δ = 12.16 %. There are N+ = 15 results cj > 12.54 %, N− = 12 results cj < 12.16 %, and N − N+ − N− = 23 values in the range ccert ± Δ. The sample median is (c25 + c26)/2 = 12.49 % > ccert = 12.35 %, and N+ > N−. However, N+ is lower than the critical value A = 17 at α/2 = 0.025 and N = 50 (Table 4). Therefore, the null hypothesis H0 of successful metrological compatibility of the results is not rejected.
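The counting step of this example can be checked directly from Table 5. The stdlib-only Python sketch below reproduces only the counts N+ and N−, the number of values inside ccert ± Δ, and the sample median. These quantities are then compared with the critical value A of Table 4 as described above. DELTA = 0.19 % is taken from the example, and the variable names are hypothetical.

```python
from statistics import median

C_CERT, DELTA = 12.35, 0.19  # % (mass fraction); Delta = 0.50 * 0.38 (Table 3)

# Table 5: simulated PT results (N = 50), % (mass fraction), ranked
RESULTS = [
    11.86, 11.88, 11.90, 11.91, 11.93, 11.96, 11.96, 11.97, 11.98, 11.99,
    12.03, 12.07, 12.17, 12.19, 12.20, 12.34, 12.43, 12.44, 12.44, 12.45,
    12.46, 12.46, 12.47, 12.48, 12.49, 12.49, 12.50, 12.50, 12.51, 12.51,
    12.52, 12.52, 12.53, 12.53, 12.53, 12.55, 12.56, 12.57, 12.60, 12.61,
    12.64, 12.66, 12.67, 12.68, 12.69, 12.76, 12.81, 12.84, 12.90, 12.96,
]

upper, lower = C_CERT + DELTA, C_CERT - DELTA
n_plus = sum(c > upper for c in RESULTS)   # results above c_cert + Delta
n_minus = sum(c < lower for c in RESULTS)  # results below c_cert - Delta
n_inside = len(RESULTS) - n_plus - n_minus
print(f"N = {len(RESULTS)}, median = {median(RESULTS):.2f}")
print(f"N+ = {n_plus}, N- = {n_minus}, inside c_cert +/- Delta = {n_inside}")
```

The output reproduces N+ = 15, N− = 12, the 23 values inside ccert ± Δ, and the median of 12.49 % quoted in the example.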

Fig. 10. Histogram of PT results (frequency F of a result value c, %), solid line, and the fitted normal distribution, dotted line; reproduced from ref. [30] by permission of Springer.

The reliability of the assessment with hypothesis H0 against H1 in this case can be characterized by: 1) the probability 1 − α/2 = 0.975 of correctly assessing the compatibility as successful (not rejecting the null hypothesis when it is true) for any number N ≥ 6 of PT participants, and 2) the probability TP = 0.73 of correctly assessing the compatibility of N = 50 PT results as unsuccessful (rejecting H0 when the alternative hypothesis H1 is true). The probability α/2 of a type 1 error is 0.025 for any N ≥ 6, while the probability β of a type 2 error is 0.27 for N = 50.
For additional examples of the use of the sign test, see Annex B, Examples 1 and 2; for an application of the χ² criterion, see Example 3.

4.3. Limitations
Since the critical values A of the sign test are defined only for N ≥ 4–8, depending on the probability α, and the test power is also calculated only for N ≥ 6–8, the proposed metrological compatibility assessment cannot be performed for smaller sample sizes. The power efficiency of the sign test relative to the t-test (the ratio of the sizes N of statistical samples from normal populations giving the same power) ranges from 0.96 for N = 5 to 0.64 for infinite N. For example, practically the same power (0.73 and 0.75) was achieved in the sign test of the compatibility of PT results for aluminum determination in coal fly ashes at N = 50, discussed above, and in the t-test for the same purpose at N = 30, discussed in para. 3. The power efficiency here is approximately 30/50 = 0.6. On the other hand, when information about the distribution of PT results is limited (N < 50), it is difficult to evaluate the goodness of fit between the empirical and the theoretical (normal) distributions, and hence the decrease in t-test power, and the corresponding decrease in the reliability of the compatibility assessment, caused by deviation of the empirical distribution from the normal one.
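The lower limit on N can be seen directly: with N results, the most extreme outcome (all signs alike) still has probability (1/2)^N, so a critical value exists only when (1/2)^N ≤ α/2. The sketch below evaluates this for a few illustrative α/2 values (assumed here, not necessarily those tabulated in Table 4) and also prints the large-sample efficiency 2/π of the sign test relative to the t-test for normal data, for comparison with the 30/50 ratio of the example.

```python
import math


def minimum_sample_size(alpha_half):
    """Smallest N for which (1/2)**N <= alpha_half, i.e. a sign-test critical value exists."""
    n = 1
    while 0.5 ** n > alpha_half:
        n += 1
    return n


# Illustrative one-sided probabilities (assumed, not copied from Table 4)
for alpha_half in (0.10, 0.05, 0.025, 0.005):
    print(f"alpha/2 = {alpha_half}: N >= {minimum_sample_size(alpha_half)}")

# Limiting efficiency of the sign test relative to the t-test for normal data
are_limit = 2 / math.pi
print(f"2/pi = {are_limit:.2f}; ratio in the example 30/50 = {30 / 50:.1f}")
```

With these α/2 values the minimum sample sizes come out as 4, 5, 6 and 8, which matches the range N ≥ 4–8 quoted above, and 2/π ≈ 0.64 is the limiting efficiency for infinite N.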

ANNEX B. EXAMPLES

CONTENTS
EXAMPLE 1. SCENARIO 1: PT FOR LEAD DETERMINATION IN AIRBORNE PARTICLES
1.1. Aim of the PT
1.2. Procedure for preparation of the IHRM
1.3. Analytical methods used and raw data
1.4. Statistical analysis of the data
1.4.1. Metrological compatibility assessment
EXAMPLE 2. SCENARIO 2: PT FOR ARSENIC DETERMINATION IN WATER
2.1. Aim of the PT
2.2. Procedure for preparation of the IHRM
2.3. Analytical methods used and raw data
2.4. Statistical analysis of the data
2.4.1. Metrological compatibility assessment
EXAMPLE 3. SCENARIO 3: PT FOR DETERMINATION OF CONCRETE COMPRESSIVE STRENGTH
3.1. Aim of the PT
3.2. Procedure for preparation of the IHRM
3.2.1. IHRM homogeneity, certified value and its uncertainty
3.3. Methods used and raw data
3.4. Statistical analysis of the data
3.4.1. Metrological compatibility assessment
EXAMPLE 4. A LIMITED POPULATION OF PT PARTICIPANTS: PT FOR ACID NUMBER DETERMINATION IN USED MOTOR OILS
4.1. Aim of the PT
4.2. Procedure for preparation of the IHRM
4.2.1. Characterization of the IHRM
4.3. Methods used and raw data
4.4. Statistical analysis of the data
4.4.1. Metrological compatibility assessment
EXAMPLE 5. SELECTION OF THE MOST COMMUTABLE (ADEQUATE) CRM FOR PT OF CEMENTS
5.1. Twelve components
5.2. Six components
5.3. One component
5.4. Sensitivity coefficient

EXAMPLE 1. SCENARIO 1: PT FOR LEAD DETERMINATION IN AIRBORNE PARTICLES
1.1. Aim of the PT
The objectives of this PT were to determine whether the quality criteria described in the European Directives [33, 34] concerning the analysis of As, Cd, Ni and Pb in airborne particles are met, and to identify the most important sources of uncertainty. The standard [35] divides the measurement method into two main parts: first, sampling in the field, and second, analysis in the laboratory. During sampling, particles are collected by drawing a measured volume of air through a filter mounted in a sampler designed to collect the fraction of suspended particulate matter of less than 10 μm (PM10) [36]. The sample filter is transported to the laboratory, and the analytes are taken into solution by closed-vessel microwave digestion using nitric acid and hydrogen peroxide. The resulting solution is analysed by established analytical methods. Once the quantity of an analyte in the solution has been measured, its concentration can be expressed in ng m-3 of sampled air.
The PT was organized in 2005 and focused on the second (analytical) part of the method. The PT provider was the Ecole des Mines de Douai (EMD), supported by the Laboratoire National de Métrologie et d'Essais (LNE). Ten laboratories (N = 10) of the Associations Agréées de Surveillance de la Qualité de l'Air participated in this trial.
For brevity, only the results for lead are discussed below.

1.2. Procedure for preparation of the IHRM
The PM10 fraction of suspended particulate matter was collected by EMD at an industrial site according to the standard [36]. Sampling was performed on 20 quartz filters (diameter 50 mm) over one week at a flow rate of 1 m3 h-1, i.e. a total of 168 m3 of air. The dust on the filters was then digested with 5 ml HNO3 + 1 ml H2O2 in a closed microwave oven.
The LNE was in charge of preparing one liter of solution from the digestion residue for use in the PT as an IHRM. The assigned/certified value of the lead content in the solution, ccert = 26.72 μg l-1, provided by LNE, was obtained with a primary method: isotope dilution inductively coupled plasma mass spectrometry (ID-ICP-MS). This content corresponds to 26.72 × 1000/168 = 159 ng m-3 Pb in the sampled air. The expanded measurement uncertainty of the certified value was Ucert = 0.77 μg l-1 at a level of confidence of 0.95 with a coverage factor of 2. No stability tests were conducted, since the laboratories used the solution immediately after preparation. The uncertainty due to inhomogeneity of the one-liter solution was considered negligible. Note that the standard uncertainty was ucert = 0.77/2 ≈ 0.38 μg l-1, i.e. 1.4 % of the certified value.
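The arithmetic linking the solution to the sampled air, and the conversion from expanded to standard uncertainty, can be written out as follows. This sketch assumes, as the text implies, that the 1 l of solution contains all the lead collected from the 168 m3 of air (one week at 1 m3 h-1); the variable names are illustrative.

```python
flow_rate = 1.0                  # m3/h
sampling_time = 7 * 24           # h, one week
air_volume = flow_rate * sampling_time            # m3

c_cert = 26.72                   # ug/l, Pb in the IHRM solution
solution_volume = 1.0            # l, assumed to hold all Pb from the sampled air
pb_in_air = c_cert * solution_volume * 1000 / air_volume   # ng/m3 (1 ug = 1000 ng)

U_cert, k = 0.77, 2              # expanded uncertainty (ug/l) and coverage factor
u_cert = U_cert / k              # standard uncertainty, ug/l
u_cert_rel = 100 * u_cert / c_cert                # % of ccert

print(f"air volume = {air_volume:.0f} m3, Pb in air = {pb_in_air:.0f} ng/m3")
print(f"ucert = {u_cert:.3f} ug/l = {u_cert_rel:.1f} % of ccert")
```

It gives 168 m3, about 159 ng m-3 and ucert = 0.385 μg l-1 (1.4 %), consistent with the values above.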
Each laboratory received a 50 ml bottle of this solution (for all analytes).

1.3. Analytical methods used and raw data
The list of participating laboratories was confidential. All of them followed the standard [35]. The methods used were inductively coupled plasma mass spectrometry (ICP-MS), graphite furnace atomic absorption spectrometry (GF-AAS), and inductively coupled plasma optical emission spectroscopy with ultrasonic nebulization (ICP-OES-USN). The measurement results ci of the i-th laboratory, i = 1, 2, …, N = 10, are shown in Table 6.

1.4. Statistical analysis of the data
There was no statistically significant dependence of the results on the analytical method used. The robust value of the experimental standard deviation sPT of a laboratory result ci, calculated by the LNE from the data shown in Table 6 using Algorithm A of the standards [3, 37], was 3.93 μg l-1, i.e. 14.7 % of the certified value. Since the expanded uncertainty stated for lead in the European Directives [33, 34] and in the standard [35, p. 30] is 25 %, the target standard deviation of a laboratory result in the PT was σtarg = 25/2 = 12.5 %, or 3.34 μg l-1.
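For readers who want to see how such a robust standard deviation is obtained, the following is a minimal sketch of Algorithm A in its commonly published form: start from the median and 1.483 times the median absolute deviation, pull values lying beyond x* ± 1.5s* back to those limits, update x* as the mean and s* as 1.134 times the standard deviation of the adjusted values, and iterate. It is not LNE's implementation, and the tolerance and iteration limit are assumptions. On the Table 6 data it converges to s* ≈ 3.97 μg l-1, close to but not identical with the 3.93 μg l-1 reported; a different stopping rule or intermediate rounding in the original calculation is a plausible but unverified explanation.

```python
import statistics


def algorithm_a(values, tol=1e-6, max_iter=100):
    """Robust mean x* and standard deviation s* by iterative winsorization (Algorithm A form)."""
    x = statistics.median(values)
    s = 1.483 * statistics.median([abs(v - x) for v in values])
    for _ in range(max_iter):
        delta = 1.5 * s
        # Pull values lying outside x* +/- 1.5 s* back to the nearest limit
        clipped = [min(max(v, x - delta), x + delta) for v in values]
        x_new = statistics.mean(clipped)
        s_new = 1.134 * statistics.stdev(clipped)
        done = abs(x_new - x) < tol and abs(s_new - s) < tol
        x, s = x_new, s_new
        if done:
            break
    return x, s


# Lead results c_i of Table 6, ug/l
table6 = [20.12, 20.28, 30.34, 29.00, 25.00, 28.40, 27.80, 25.70, 28.20, 25.51]
c_cert = 26.72
x_star, s_star = algorithm_a(table6)
print(f"x* = {x_star:.3f} ug/l, s* = {s_star:.2f} ug/l "
      f"({100 * s_star / c_cert:.1f} % of ccert)")
```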
Table 6
Results of the PT for lead content determination in the solution

Lab No., i   Method        ci, μg l-1   ci - ccert, μg l-1   zi      Sign
1            ICP-MS        20.12        -6.60                -1.98   -
2            ICP-MS        20.28        -6.44                -1.93   -
3            ICP-OES-USN   30.34         3.62                 1.08   +
4            GF-AAS        29.00         2.28                 0.68   +
5            ICP-MS        25.00        -1.72                -0.51   -
6            GF-AAS        28.40         1.68                 0.50   +
7            ICP-MS        27.80         1.08                 0.32   +
8            ICP-MS        25.70        -1.02                -0.31   -
9            GF-AAS        28.20         1.48                 0.44   +
10           ICP-MS        25.51        -1.21                -0.36   -

The uncertainty of the certified value, ucert = 1.4 %, was negligible in comparison with σtarg, so the z-score based on the target value σtarg was applicable for the proficiency assessment. The calculated z-scores are shown in Table 6. All of them lie between −2 and +2 and were therefore interpreted as satisfactory.
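The z-scores of Table 6 follow from zi = (ci − ccert)/σtarg with σtarg = 12.5 % of ccert. The sketch below recomputes them and applies the |z| ≤ 2 criterion used above; the dictionary layout and names are illustrative only.

```python
c_cert = 26.72                      # ug/l
sigma_targ = 0.125 * c_cert         # 12.5 % of ccert, i.e. 3.34 ug/l

# Lead results c_i of Table 6, keyed by laboratory number
results = {1: 20.12, 2: 20.28, 3: 30.34, 4: 29.00, 5: 25.00,
           6: 28.40, 7: 27.80, 8: 25.70, 9: 28.20, 10: 25.51}


def z_score(c, c_ref, sigma):
    """z = (c - c_ref) / sigma."""
    return (c - c_ref) / sigma


z_scores = {lab: z_score(c, c_cert, sigma_targ) for lab, c in results.items()}
for lab, z in z_scores.items():
    verdict = "satisfactory" if abs(z) <= 2 else "not satisfactory"
    print(f"lab {lab:2d}: z = {z:+.2f} ({verdict})")
```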

1.4.1. Metrological compatibility assessment
Since normality of the PT results was not assumed, the compatibility of the results (as a group) was tested using non-parametric statistics, as shown in Annex A, para. 4.
As the standard uncertainty of the certified value, ucert = 1.4 %, was insignificant in comparison with the target standard deviation of PT results, σtarg = 12.5 %, the permissible bias of the median of the PT results from the certified value was δ = 0.3σtarg = 3.75 %, or 1.00 μg l-1. Therefore, ccert + δ = 27.72 μg l-1 and ccert − δ = 25.72 μg l-1. There were N+ = 5 results ci > 27.72 μg l-1 and N− = 5 results ci < 25.72 μg l-1; they are marked in Table 6 with the signs "+" and "-", respectively. Both N+ and N− are higher than the critical value A = 1 in Table 4 (N = 10). Therefore, the null hypothesis H0 concerning compatibility of this group of results should be rejected, in spite of the satisfactory z-score of every participating laboratory. The probability of a type 1 error of this decision (rejecting the hypothesis when it is true) is 0.025, while the probability of a type 2 error (not rejecting the hypothesis when it is false) is above 0.85 according to Fig. 9.
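The sign assignment and the decision for this example can be reproduced as follows. As in the text, δ = 0.3σtarg; the critical value is computed from the binomial distribution on the assumption that Table 4 contains the standard lower critical values of the two-sided sign test, and the rejection rule (the larger of N+ and N− exceeding A) is our reading of the worked examples. Function and variable names are hypothetical.

```python
from math import comb


def sign_test_critical_value(n, alpha_half):
    """Largest A with P(X <= A) <= alpha_half for X ~ Binomial(n, 1/2); None if none exists."""
    critical, cumulative = None, 0.0
    for k in range(n + 1):
        cumulative += comb(n, k) / 2 ** n
        if cumulative > alpha_half:
            break
        critical = k
    return critical


c_cert = 26.72                                  # ug/l
sigma_targ = 0.125 * c_cert                     # 12.5 % of ccert
delta = 0.3 * sigma_targ                        # permissible bias, about 1.00 ug/l
upper, lower = c_cert + delta, c_cert - delta
table6 = [20.12, 20.28, 30.34, 29.00, 25.00, 28.40, 27.80, 25.70, 28.20, 25.51]

# "+" above ccert + delta, "-" below ccert - delta, "0" inside the range
signs = ["+" if c > upper else "-" if c < lower else "0" for c in table6]
n_plus, n_minus = signs.count("+"), signs.count("-")
a = sign_test_critical_value(len(table6), 0.025)
rejected = max(n_plus, n_minus) > a

print(f"bounds {lower:.2f} to {upper:.2f} ug/l; signs {' '.join(signs)}")
print(f"N+ = {n_plus}, N- = {n_minus}, A = {a}, H0 rejected: {rejected}")
```

The computed signs match the last column of Table 6, and with N = 10 the critical value is A = 1, so H0 is rejected.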

EXAMPLE 2. SCENARIO 2: PT FOR ARSENIC DETERMINATION IN WATER
2.1. Aim of the PT
The aim of the PT was to support water testing laboratories from the Southern African Development Community (SADC) and from the East African Community in their efforts to improve the quality of measurement results. The PT round was organized in 2006 within the Water PT Scheme of SADCMET (SADC Cooperation in Measurement Traceability). The organizers were the Water Quality Services, Windhoek, Namibia, in cooperation with the Universität Stuttgart, Germany, with financial support from the Physikalisch-Technische Bundesanstalt, Braunschweig, Germany. The analytes were Ca, Mg, Na, K, Fe, Mn, Al, Pb, Cu, Zn, Cr, Ni, Cd, As, SO₄²⁻, Cl⁻, F⁻, NO₃⁻ and PO₄³⁻ in synthetic water modeling drinking/ground water. Three IHRMs with different analyte concentrations were prepared and distributed among the participating laboratories for analysis.
In the following, only the determination of the arsenic concentration in one IHRM is described as an example.

2.2. Procedure for preparation of the IHRM
The IHRM was formulated on the basis of analytical-grade water spiked with pure chemicals. Arsenic(III) oxide from Sigma-Aldrich (purity pc = 99.995 %) was used to prepare the stock solution, with an As content of about 0.4 mg g-1. The mass mAs2O3 of the oxide was measured on an analytical balance (Sartorius RC 210D), and the total mass mss/t of the stock solution was determined by difference weighing on a Sartorius BA3100P balance. About mss = 100 g of the stock solution was diluted to about mdil/t = 1000 g, also on the Sartorius BA3100P balance. Finally, about mdil = 200 g of the diluted solution (weighed on the same balance) was diluted to about mlot = 49900 g. The total mass mlot of this lot was determined by difference weighing on a Sartorius F150S balance.
The assigned/certified value of the As concentration in the IHRM was assessed from the preparation procedure, taking into account the proportion pAs/As2O3 derived from the atomic weights (from IUPAC publications), the purity of the As2O3 used, the density ρlot of the final lot and a buoyancy correction factor bcf. The density of the final lot was measured gravimetrically using a 100 ml pycnometer. The certified value ccert of the mass concentration of As in the final lot was calculated by the following formula:
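$$ c_{\mathrm{cert}} = \frac{m_{\mathrm{As_2O_3}}\, p_{\mathrm{As/As_2O_3}}\, p_{\mathrm{c}}\, m_{\mathrm{ss}}\, m_{\mathrm{dil}}\, \rho_{\mathrm{lot}}}{m_{\mathrm{ss/t}}\, m_{\mathrm{dil/t}}\, m_{\mathrm{lot}}\, b_{\mathrm{cf}}} \qquad (14) $$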
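Formula (14) is a plain product/quotient of the preparation quantities, so it translates directly into a function. In this sketch the atomic weights are standard values assumed for illustration, and all input masses, the density and the buoyancy factor are hypothetical numbers chosen only to match the nominal amounts in the text (about 0.4 mg g-1 in the stock solution, 100 g diluted to 1000 g, 200 g diluted to 49900 g); the result is therefore close to, but not equal to, the certified value.

```python
# Standard atomic weights (assumed here for illustration)
A_AS, A_O = 74.921595, 15.999
P_AS_AS2O3 = 2 * A_AS / (2 * A_AS + 3 * A_O)    # mass fraction of As in As2O3


def c_cert_formula_14(m_as2o3, p_c, m_ss_t, m_ss, m_dil_t, m_dil, m_lot,
                      rho_lot, b_cf, p_as_as2o3=P_AS_AS2O3):
    """Mass concentration of As in the lot following formula (14).

    Masses in g and rho_lot in g/ml give a result in g/ml.
    """
    numerator = m_as2o3 * p_as_as2o3 * p_c * m_ss * m_dil * rho_lot
    denominator = m_ss_t * m_dil_t * m_lot * b_cf
    return numerator / denominator


# Hypothetical inputs, consistent only with the nominal amounts given in the text
inputs = dict(m_as2o3=0.0792, p_c=0.99995, m_ss_t=150.0, m_ss=100.0,
              m_dil_t=1000.0, m_dil=200.0, m_lot=49900.0,
              rho_lot=0.998, b_cf=1.0)
c_g_per_ml = c_cert_formula_14(**inputs)
print(f"p(As/As2O3) = {P_AS_AS2O3:.5f}; ccert = {c_g_per_ml * 1e6:.4f} mg/l")
```

Because every input enters only as a factor or a divisor, doubling mdil doubles ccert, which is a quick sanity check of the implementation.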

Formula (14) also enables calculation of the uncertainty budget of the certified value. The uncertainties of the masses were derived from precision experiments, which deliver the standard uncertainty directly, and from the linearity tolerances given by the manufacturer (treated as rectangular distributions). The uncertainty of the purity was derived from the manufacturer's information. The uncertainty of the buoyancy correction factor was estimated from the possible variations in atmospheric pressure, air humidity and temperature [38]. For the estimation of the uncertainty of the density, a separate budget was calculated, taking into account the uncertainties of the weighing and of the temperature measurement. The uncertainties of the atomic weights and of the stability and homogeneity of the solution were neglected.
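Since formula (14) contains only products and quotients, the GUM law of propagation of uncertainty reduces to adding relative standard uncertainties in quadrature, and a linearity tolerance ±a treated as a rectangular distribution contributes a/√3. The sketch below shows these two steps; every number in the budget is hypothetical, and, as in the text, the atomic weights, stability and homogeneity are left out.

```python
import math


def u_rectangular(half_width):
    """Standard uncertainty for a rectangular distribution of half-width a: a / sqrt(3)."""
    return half_width / math.sqrt(3)


def u_weighing(u_precision, linearity_tolerance):
    """Combine a precision-experiment standard uncertainty with a linearity tolerance."""
    return math.hypot(u_precision, u_rectangular(linearity_tolerance))


def relative_combined_uncertainty(budget):
    """Relative standard uncertainty of a pure product/quotient model.

    budget maps each input name to (value, standard uncertainty); for such a model
    the relative uncertainties add in quadrature.
    """
    return math.sqrt(sum((u / x) ** 2 for x, u in budget.values()))


# Hypothetical budget: values and uncertainties are illustrative only
budget = {
    "m_As2O3": (0.0792, u_weighing(0.00002, 0.00002)),
    "p_c": (0.99995, u_rectangular(0.00005)),
    "m_ss_t": (150.0, u_weighing(0.002, 0.02)),
    "m_ss": (100.0, u_weighing(0.002, 0.02)),
    "m_dil_t": (1000.0, u_weighing(0.002, 0.02)),
    "m_dil": (200.0, u_weighing(0.002, 0.02)),
    "m_lot": (49900.0, u_weighing(0.1, 0.5)),
    "rho_lot": (0.998, 0.0001),
    "b_cf": (1.0, 0.00002),
}
u_rel = relative_combined_uncertainty(budget)
k = 2
print(f"u_rel = {100 * u_rel:.3f} %, U_rel (k = 2) = {100 * k * u_rel:.3f} %")
for name, (x, u) in budget.items():
    print(f"  {name:8s} contributes {100 * u / x:.4f} % (relative)")
```

Listing the relative contribution of each input, as the loop does, makes it easy to see which weighing dominates the budget.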
The assigned/certified value of the As content in the IHRM and its expanded uncertainty were ccert ± Ucert = 0.1706 ± 0.0001 mg l-1 at a level of confidence of 0.95 with a coverage factor of 2. Note that the expanded uncertainty was 0.07 % of the reference value.
Each laboratory received a 1 l bottle of this IHRM (for all analytes).

2.3. Analytical methods used and raw data
Nine participating laboratories (N = 9) reported results for the As concentration; they are shown in Table 7. One of the major problems in water analysis in Africa at present is the absence of any common standard for analytical methods. The methods used were inductively coupled plasma optical emission spectrometry (ICP-OES), atomic absorption spectrometry (AAS), and others.

2.4. Statistical analysis of the data
At a workshop organized for representatives of the participating laboratories prior to this PT round, high standard deviations of the results from the certified value (above 20 % of the value) were expected. Therefore, it was decided to use a target standard deviation σtarg of 20 % of the certified value whenever the experimental standard deviation sPT > 20 %. Since in the As case the robust sPT value, calculated from the data shown in Table 7 by Algorithm A of the standards [3, 37], was 50.5 % (0.086 mg l-1), the stated target value σtarg = 20 % (0.034 mg l-1) was applied for the proficiency assessment with the z-score. The z-score values are shown in Table 7 with the comments: satisfactory (Yes) when they were between −2 and +2, questionable (Quest) for 2 < |z| < 3, and unsatisfactory (No) for |z| ≥ 3.

Table 7
Results of the PT for arsenic content determination in water

Lab No., i   Method    ci, mg l-1   ci - ccert, mg l-1   zi      Comment   Sign
4            AAS       0.03         -0.1406              -4.12   No        -
10           other     0.20          0.0294               0.86   Yes       +
18           ICP-OES   0.20          0.0294               0.86   Yes       +
19           ICP-OES   0.12         -0.0506              -1.48   Yes       -
26           ICP-OES   0.12         -0.0506              -1.48   Yes       -
34           AAS       0.169        -0.0016              -0.05   Yes       0
35           AAS       0.08         -0.0906              -2.66   Quest     -
37           ICP-OES   0.789         0.6184              18.12   No        +
38           other     0.258         0.0874               2.56   Quest     +

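The z-scores and comments of Table 7 can be recomputed from ccert = 0.1706 mg l-1 and σtarg = 20 % of ccert. This sketch assumes the conventional interpretation limits (|z| ≤ 2 satisfactory, 2 < |z| < 3 questionable, |z| ≥ 3 unsatisfactory); names are illustrative.

```python
c_cert = 0.1706                    # mg/l
sigma_targ = 0.20 * c_cert         # 20 % of ccert, about 0.034 mg/l

# As results c_i of Table 7, keyed by laboratory number
table7 = {4: 0.03, 10: 0.20, 18: 0.20, 19: 0.12, 26: 0.12,
          34: 0.169, 35: 0.08, 37: 0.789, 38: 0.258}


def comment(z):
    """Assumed conventional limits: |z| <= 2 Yes, 2 < |z| < 3 Quest, |z| >= 3 No."""
    if abs(z) <= 2:
        return "Yes"
    if abs(z) < 3:
        return "Quest"
    return "No"


rows = {}
for lab, c in table7.items():
    diff = c - c_cert
    z = diff / sigma_targ
    rows[lab] = (round(diff, 4), round(z, 2), comment(z))
    print(f"lab {lab:2d}: c - ccert = {diff:+.4f} mg/l, z = {z:+.2f}, {comment(z)}")
```

It also shows that for laboratory 34 the difference ci − ccert is −0.0016 mg l-1, consistent with its z-score of −0.05.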
Therefore, the permissib