CONSENSUS MULTIVARIATE CALIBRATION OR MAINTENANCE WITHOUT REFERENCE SAMPLES USING TIKHONOV TYPE REGULARIZATION APPROACHES
John Kalivas, Josh Ottaway, Jeremy Farrell, Parviz Shahbazikah
Department of Chemistry, Idaho State University, Pocatello, Idaho 83209 USA
Outline
• Multivariate calibration
• Tikhonov regularization (TR)
• TR calibration maintenance with reference samples to form full-wavelength or sparse models
  – Selecting "a" model
  – Selecting a collection of models
  – Comparison to PLS
• TR calibration or maintenance without reference samples
  – Examples with comparison to PLS
• Summary of TR variant equations
Spectral Multivariate Calibration
• y = Xb
  – y = m × 1 vector of analyte reference values for m calibration samples
  – X = m × n matrix of spectra measured at n wavelengths
  – b = n × 1 regression (model) vector
• MLR solution: requires m ≥ p selected wavelengths (hence wavelength selection)
• Biased regression solutions such as TR, RR (a TR variant), PLS, and PCR
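The biased-regression idea can be illustrated with RR, the TR variant with an identity penalty. A minimal NumPy sketch, assuming the penalty is written as η²‖b‖₂² (as in the TR equations in this deck) and using random synthetic data; `ridge_b` is a hypothetical helper, not the authors' code. With η near 0 the RR vector approaches the MLR solution, and a larger η shrinks ‖b‖₂ at the cost of bias.

```python
import numpy as np

def ridge_b(X, y, eta):
    # RR (TR with L = I): solve (X^T X + eta^2 I) b = X^T y
    n = X.shape[1]
    return np.linalg.solve(X.T @ X + eta**2 * np.eye(n), X.T @ y)

rng = np.random.default_rng(0)
m, n = 30, 10                      # m samples, n wavelengths (m >= n so MLR is defined)
X = rng.normal(size=(m, n))
b_true = rng.normal(size=n)
y = X @ b_true + 0.01 * rng.normal(size=m)

b_mlr = np.linalg.lstsq(X, y, rcond=None)[0]   # MLR (ordinary least squares)
b_small = ridge_b(X, y, eta=1e-6)              # tiny eta: essentially the MLR vector
b_big = ridge_b(X, y, eta=10.0)                # larger eta: shrunken, biased vector

print(np.allclose(b_small, b_mlr, atol=1e-6))
print(np.linalg.norm(b_big) < np.linalg.norm(b_mlr))
```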
Objective
• Using laboratory-produced tablets as the primary calibration set
  – Determine active pharmaceutical ingredient (API) concentration in new tablets produced in full production (secondary condition)
Primary calibration space: 30 random lab batch samples, 15 each from types 1 and 2
Secondary calibration space: 30 random full batch samples, 15 each from types 1 and 2
Standardization set M: 4 random full batch samples, 2 each from types 1 and 2
Validation space: the remaining 30 full batch samples of types 1 and 2
• Other batch type combinations studied
$$\min_{\mathbf b}\ \|\mathbf{Xb}-\mathbf y\|_2^2+\eta^2\|\mathbf b\|_2^2+\lambda^2\|\mathbf{Mb}-\mathbf y_M\|_2^2$$
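Because every term in this objective is a squared 2-norm, the minimizer can be found with one ordinary least-squares solve on stacked matrices. A minimal sketch, assuming NumPy and random synthetic X, y, M, and y_M (named `yM` in code); `tr2_b` is a hypothetical helper. The normal equations (XᵀX + η²I + λ²MᵀM)b = Xᵀy + λ²Mᵀy_M give the same vector, which the last lines check.

```python
import numpy as np

def tr2_b(X, y, M, yM, eta, lam):
    # Minimize ||Xb - y||^2 + eta^2 ||b||^2 + lam^2 ||Mb - yM||^2
    # by stacking [X; eta*I; lam*M] b ~= [y; 0; lam*yM]
    n = X.shape[1]
    A = np.vstack([X, eta * np.eye(n), lam * M])
    rhs = np.concatenate([y, np.zeros(n), lam * yM])
    return np.linalg.lstsq(A, rhs, rcond=None)[0]

rng = np.random.default_rng(1)
X = rng.normal(size=(30, 50))   # primary calibration spectra (more wavelengths than samples)
y = rng.normal(size=30)
M = rng.normal(size=(4, 50))    # a few secondary standardization samples
yM = rng.normal(size=4)

eta, lam = 0.5, 5.0
b = tr2_b(X, y, M, yM, eta, lam)

# Cross-check with the normal equations of the same objective
b_ne = np.linalg.solve(X.T @ X + eta**2 * np.eye(50) + lam**2 * M.T @ M,
                       X.T @ y + lam**2 * M.T @ yM)
print(np.allclose(b, b_ne))
```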
Example Model Merit Landscapes
[Figure: RMSEC and RMSEM landscapes plotted against η and λ]
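The landscapes are obtained by solving the model at every (η, λ) pair on a grid and recording each merit. A minimal sketch of that loop, assuming NumPy, random synthetic data, RMSE computed as the root mean squared residual over the samples in each set, and an arbitrary log-spaced grid; none of these values come from the slides, and `tr2_b` and `rmse` are hypothetical helpers.

```python
import numpy as np

def tr2_b(X, y, M, yM, eta, lam):
    # Normal-equation solution of min ||Xb - y||^2 + eta^2 ||b||^2 + lam^2 ||Mb - yM||^2
    n = X.shape[1]
    lhs = X.T @ X + eta**2 * np.eye(n) + lam**2 * M.T @ M
    return np.linalg.solve(lhs, X.T @ y + lam**2 * M.T @ yM)

def rmse(A, b, target):
    # Root mean squared prediction error over the rows (samples) of A
    return np.sqrt(np.mean((A @ b - target) ** 2))

rng = np.random.default_rng(2)
X, y = rng.normal(size=(30, 40)), rng.normal(size=30)   # primary calibration set
M, yM = rng.normal(size=(4, 40)), rng.normal(size=4)    # secondary standardization set

etas = np.logspace(-2, 2, 9)      # hypothetical grids
lams = np.logspace(-2, 2, 9)
RMSEC = np.empty((etas.size, lams.size))
RMSEM = np.empty_like(RMSEC)
BNORM = np.empty_like(RMSEC)
for i, eta in enumerate(etas):
    for j, lam in enumerate(lams):
        b = tr2_b(X, y, M, yM, eta, lam)
        RMSEC[i, j] = rmse(X, b, y)
        RMSEM[i, j] = rmse(M, b, yM)
        BNORM[i, j] = np.linalg.norm(b)
# Each array is one landscape (rows: eta, columns: lambda) for a contour or surface plot
```

Increasing λ at fixed η can only lower (or keep) RMSEM, since it puts more weight on the M term; the test checks that property on the grid.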
Model Merit Landscapes
[Figure: RMSEC and RMSEM landscapes plotted against η and λ]
Convergence at small λ
• Secondary conditions are not included in the new model
• Amounts to using the primary RR model with local centering, where secondary validation samples are centered to the mean of M
• Best locally centered models
• A tradeoff region: prediction of the primary degrades while prediction of the secondary improves
Model Merit Landscapes
[Figure: RMSEC and RMSEM landscapes plotted against η and λ]
Further tradeoffs
• Tradeoff region between ‖b‖₂, RMSEC, and RMSEM
• Can use an L-curve at a fixed λ value
[Figure: ‖b‖₂ landscape over η and λ, marking where ‖b‖₂ is too large and the tradeoff region where prediction of the primary degrades while prediction of the secondary improves; L-curve of ‖b‖₂ at a fixed λ]
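An L-curve at fixed λ traces ‖b‖₂ against RMSEC as η varies. A minimal sketch, assuming NumPy, synthetic data, and a hypothetical λ = 10; the λ-dependent part of the normal equations is formed once and only the η²I term changes inside the loop.

```python
import numpy as np

rng = np.random.default_rng(3)
X, y = rng.normal(size=(30, 40)), rng.normal(size=30)   # primary calibration set
M, yM = rng.normal(size=(4, 40)), rng.normal(size=4)    # standardization set

lam = 10.0                              # fixed lambda (hypothetical)
etas = np.logspace(-1, 2, 20)
A = X.T @ X + lam**2 * M.T @ M          # eta-independent part of the normal equations
c = X.T @ y + lam**2 * M.T @ yM

norms, rmsec = [], []
for eta in etas:
    b = np.linalg.solve(A + eta**2 * np.eye(X.shape[1]), c)
    norms.append(np.linalg.norm(b))
    rmsec.append(np.sqrt(np.mean((X @ b - y) ** 2)))
norms, rmsec = np.array(norms), np.array(rmsec)
# Plotting rmsec (x) against norms (y), often on log axes, gives the L-curve
```

Because every eigencomponent of b is divided by (aⱼ + η²), ‖b‖₂ falls steadily as η grows, which is what gives the curve its vertical arm.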
• Multiple merits can be used to assess the tradeoff
  – Respective RMSEC and RMSEM landscapes for R², slope, and intercept
  – L-curves at selected η and λ values
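Model Merit Evaluations
$$H_i=\left(\frac{\|\hat{\mathbf b}_i\|_2}{\max\|\hat{\mathbf b}\|_2}\right)^2+\left(\frac{\mathrm{RMSEM}_i}{\max\mathrm{RMSEM}}\right)^2$$
[Figure: H and RMSEV plotted against η and λ, with slices at λ = 54.29]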
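The H expression can be evaluated directly from the per-model merits. A minimal sketch, assuming NumPy, hypothetical merit values, and that the preferred model is the one with the smallest H (the slides plot H but do not state the selection rule); `h_values` is a hypothetical helper. Substituting RMSEN or RMSEC for RMSEM gives the other H forms used later in this deck.

```python
import numpy as np

def h_values(b_norms, rmse_vals):
    # H_i = (||b_i|| / max ||b||)^2 + (RMSE_i / max RMSE)^2; smaller means a better balance
    b_norms = np.asarray(b_norms, dtype=float)
    rmse_vals = np.asarray(rmse_vals, dtype=float)
    return (b_norms / b_norms.max()) ** 2 + (rmse_vals / rmse_vals.max()) ** 2

# Hypothetical ||b||2 and RMSEM values for three candidate (eta, lambda) models
H = h_values([1.0, 2.0, 4.0], [4.0, 2.0, 1.0])
best = int(np.argmin(H))
print(H, best)
```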
Model Updating Results
Method               ‖b‖₂   RMSEC   RMSEM   RMSEV   R²      η       λ
TR2                  2.10   0.731   0.014   0.264   0.966   1.701   54.29
RR local centering   2.70   0.468   0.245   0.487   0.935   0.588   -
Consensus Mean Model Updating Results
Updating Primary Lab Batch Types 1 and 2 to Predict Secondary Full Batch Types 1 and 2
Method   No. Models   ‖b‖₂    RMSEC   RMSEM   RMSEV   R²      η           λ
TR2      348          4.32    0.552   0.016   0.284   0.955   0.591       207
TR2-1    628          16.32   0.580   0.007   0.274   0.958   0.379       1619
PLS      1            3.29    0.658   0.024   0.266   0.964   3 factors   19.31
• The one PLS model predicts best
• PLS is limited to discrete factors, whereas TR allows 0 ≤ η < ∞ to more fully resolve the landscape
• Fewer PLS models are selected because the discrete-factor nature of PLS makes its landscapes sharper
• The number of "good" models can be increased by reducing the increment sizes of η and λ
[Figure: the 628 TR2-1 models over λ and η (models ordered by increasing η), with a companion panel over PLS factors]
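A consensus mean model can be formed by keeping every model whose merits pass chosen thresholds and averaging the kept regression vectors. A minimal sketch, assuming NumPy; the merit names, threshold values, and `consensus_mean` helper are hypothetical placeholders, not the criteria behind the 348 TR2 and 628 TR2-1 models.

```python
import numpy as np

def consensus_mean(B, merits, thresholds):
    # B: n x k matrix whose columns are candidate regression vectors
    # merits: dict of name -> per-model values (lower is better)
    # thresholds: dict of name -> maximum allowed value
    keep = np.ones(B.shape[1], dtype=bool)
    for name, limit in thresholds.items():
        keep &= np.asarray(merits[name]) <= limit
    return B[:, keep].mean(axis=1), keep

B = np.array([[1.0, 3.0, 100.0],
              [2.0, 4.0, -50.0]])            # three candidate models (columns)
merits = {"RMSEC": np.array([0.5, 0.6, 0.4]),
          "RMSEM": np.array([0.02, 0.01, 0.30])}
b_bar, kept = consensus_mean(B, merits, {"RMSEC": 0.7, "RMSEM": 0.05})
y_hat = np.array([[1.0, 1.0]]) @ b_bar       # prediction for one new spectrum
```

The third model fails the RMSEM threshold, so the consensus vector is the mean of the first two.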
Consensus Models and Correlations
[Figure: for TR2 (348 models) and TR2-1 (628 models), the model vectors b̂_i plotted against wavenumber (8000–10000 cm-1), with histograms of the correlations among the b̂_i]
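The correlations among consensus model vectors can be summarized from the model-by-model correlation matrix. A minimal sketch, assuming NumPy, synthetic model vectors stored as columns, and that each histogram counts one correlation per distinct model pair (an assumption about how the figure was built).

```python
import numpy as np

rng = np.random.default_rng(4)
base = rng.normal(size=200)                             # shared shape across models
B = base[:, None] + 0.3 * rng.normal(size=(200, 50))    # 50 similar model vectors (columns)

R = np.corrcoef(B, rowvar=False)            # 50 x 50 correlations between model vectors
iu = np.triu_indices_from(R, k=1)           # each distinct pair once, diagonal excluded
pair_corr = R[iu]
counts, edges = np.histogram(pair_corr, bins=20, range=(0.0, 1.0))
```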
Summary
• Only a few samples are needed for M, given appropriate weighting
• The same samples measured in both primary and secondary conditions are not needed
  – Avoids long-term stability issues
• Can select "a" model or a collection of models
  – Natural target values (thresholds) with the model merits R², slope, and intercept for the primary and secondary standardization sets
  – Work in progress
• Requires reference values for y_M
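Without Reference Samples
Beer’s law: $\mathbf x = y_a\mathbf k_a + \sum_i y_i\mathbf k_i + \mathbf m + \mathbf n$
  k_a = pure component (PC) analyte spectrum
  k_i = PC spectrum of the ith interferent (drift, background, etc.)
  m = rest of the sample matrix
  n = spectral noise
$$\hat y=\mathbf x^{\mathrm T}\hat{\mathbf b}=y_a\mathbf k_a^{\mathrm T}\hat{\mathbf b}+\sum_i y_i\mathbf k_i^{\mathrm T}\hat{\mathbf b}+\mathbf m^{\mathrm T}\hat{\mathbf b}+\mathbf n^{\mathrm T}\hat{\mathbf b}$$
• Ideal situation:
  WHEN:
  1. $\mathbf k_a^{\mathrm T}\hat{\mathbf b}=1$
  2. $\mathbf k_i^{\mathrm T}\hat{\mathbf b}=0$ and $\mathbf m^{\mathrm T}\hat{\mathbf b}=0$
  3. $\|\hat{\mathbf b}\|_2$ small enough that $\mathbf n^{\mathrm T}\hat{\mathbf b}\approx 0$
  THEN:
  4. $\hat y=y_a$
• Cannot simultaneously satisfy 1, 2, and 3 to obtain 4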
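The ideal conditions can be checked numerically with noise-free synthetic spectra. A minimal sketch, assuming NumPy, a random hypothetical k_a and two k_i, m = 0, and the minimum-norm b̂ that meets conditions 1 and 2 exactly (via the pseudoinverse). Adding noise then shows that the prediction error is nᵀb̂, which is why condition 3 asks for a small ‖b̂‖₂.

```python
import numpy as np

rng = np.random.default_rng(5)
n = 100
ka = rng.random(n)                 # hypothetical PC analyte spectrum
K = rng.random((n, 2))             # two hypothetical interferent spectra k_i (columns)

# Minimum-norm b with ka^T b = 1 and k_i^T b = 0 (conditions 1 and 2)
C = np.column_stack([ka, K]).T     # 3 x n constraint matrix
b = np.linalg.pinv(C) @ np.array([1.0, 0.0, 0.0])

ya, yi = 0.7, np.array([2.0, -1.0])
x = ya * ka + K @ yi               # noise-free Beer's-law spectrum with m = 0
y_hat = x @ b                      # equals ya because conditions 1 and 2 hold

noise = 0.01 * rng.normal(size=n)
y_hat_noisy = (x + noise) @ b      # error is noise^T b, limited by a small ||b||
```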
Compromise PCTR2 Model
$$\min_{\mathbf b}\ \|1-\mathbf k_a^{\mathrm T}\mathbf b\|_2^2+\eta^2\|\mathbf b\|_2^2+\lambda^2\|\mathbf{Nb}-\mathbf 0\|_2^2$$
N = spectra without analyte, e.g., k_i
• Minimizing the sum requires a tradeoff between the three conditions
  – The more closely the three conditions are met, the more likely ŷ = y_a
• Updates the non-matrix-affected PC k_a to predict in current conditions (spanned by N)
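The PCTR2 objective is again a sum of squared 2-norms, so it can be solved as one stacked least-squares problem with target 1 for k_a and targets 0 for the η and N blocks. A minimal sketch, assuming NumPy, a random hypothetical k_a and N, and arbitrary η and λ; `pctr2_b` is a hypothetical helper. Setting the gradient to zero gives (k_a k_aᵀ + η²I + λ²NᵀN)b = k_a, which is used as a cross-check.

```python
import numpy as np

def pctr2_b(ka, N, eta, lam):
    # Minimize (1 - ka^T b)^2 + eta^2 ||b||^2 + lam^2 ||N b - 0||^2 via stacked least squares
    n = ka.size
    A = np.vstack([ka[None, :], eta * np.eye(n), lam * N])
    rhs = np.concatenate([[1.0], np.zeros(n), np.zeros(N.shape[0])])
    return np.linalg.lstsq(A, rhs, rcond=None)[0]

rng = np.random.default_rng(6)
n = 60
ka = rng.random(n)                 # PC analyte spectrum (hypothetical)
N = rng.random((5, n))             # non-analyte spectra at current conditions (hypothetical)

eta, lam = 1e-2, 10.0
b = pctr2_b(ka, N, eta, lam)
pc_pred = ka @ b                   # should be close to 1
n_pred = N @ b                     # should be close to 0

# Normal-equation cross-check
lhs = np.outer(ka, ka) + eta**2 * np.eye(n) + lam**2 * N.T @ N
b_ne = np.linalg.solve(lhs, ka)
```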
Sources of N
• PC interferent spectra
  – Reference values are 0
• Matrix-affected samples without the analyte
  – Reference values are 0
• Constant-analyte samples
  – Reference values are 0 after the spectra are mean centered
• Estimate using samples with reference values:
$$\mathbf N=\left(\mathbf I-\frac{\mathbf y\mathbf y^{\mathrm T}}{\mathbf y^{\mathrm T}\mathbf y}\right)\mathbf X$$
• Samples for N need to be measured at current conditions
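The projection removes the y direction from the sample space, so yᵀN = 0: across samples, every column of N is orthogonal to the reference values. A minimal sketch, assuming NumPy and random synthetic X and y.

```python
import numpy as np

rng = np.random.default_rng(7)
X = rng.normal(size=(12, 80))      # spectra of samples with known analyte reference values
y = rng.normal(size=12)

# N = (I - y y^T / (y^T y)) X : project the y direction out of the sample space
P = np.eye(y.size) - np.outer(y, y) / (y @ y)
N = P @ X
```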
Extra Virgin Olive Oil Adulteration
• Goicoechea et al. Chemom. Intell. Lab. Syst. 56 (2001) 73-81
• Synchronous fluorescence spectra from 270 to 340 nm at Δλ = 20 nm
[Figure: synchronous fluorescence intensity versus excitation wavelength (nm) for sunflower oil and EVOO]
Model Merit Landscapes
[Figure: RMSEV, RMSEN, RMSEPC, and ‖b‖₂ landscapes plotted against η and λ]
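H Values
$$H_i=\left(\frac{\|\hat{\mathbf b}_i\|_2}{\max\|\hat{\mathbf b}\|_2}\right)^2+\left(\frac{\mathrm{RMSEN}_i}{\max\mathrm{RMSEN}}\right)^2$$
$$H_i=\left(\frac{\|\hat{\mathbf b}_i\|_2}{\max\|\hat{\mathbf b}\|_2}\right)^2+\left(\frac{\mathrm{RMSEC}_i}{\max\mathrm{RMSEC}}\right)^2$$
[Figure: H curves in three panels: PCTR2 at η = 9.1e3 (versus λ), RR full cal (versus η), and RR with PCTR2 cal samples (versus η)]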
Model Updating From PC Sunflower
Method (No. Samples)         ‖b‖₂     RMSEV   R²      η       λ
PCTR2 (26)                   2.6e-7   0.031   0.882   9.1e3   0.0036
RR (56)                      4.0e-7   0.028   0.649   1.9e5   -
RR with PCTR2 samples (26)   2.3e-7   0.077   0.787   1.6e6   -
• The updated PC predicts better than a full calibration (R² of 0.882 for PCTR2 versus 0.649 for the full RR calibration)
[Figure: predicted ŷ_i versus reference y_i for RR and PCTR2, with least-squares lines y_i = 0.807x_i − 0.0074 and y_i = 0.422x_i + 0.048 and the line of equality]
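Slope, intercept, and R² of predicted versus reference values summarize plots like this one. A minimal sketch, assuming NumPy, that the least-squares line is fit with the reference values on the x axis, and that R² is the squared correlation coefficient; both are assumptions, since the slides do not define them. `prediction_merits` is a hypothetical helper.

```python
import numpy as np

def prediction_merits(y_ref, y_pred):
    # Least-squares line y_pred = slope * y_ref + intercept, and R^2 as squared correlation
    y_ref = np.asarray(y_ref, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    slope, intercept = np.polyfit(y_ref, y_pred, 1)
    r2 = np.corrcoef(y_ref, y_pred)[0, 1] ** 2
    return slope, intercept, r2

# Hypothetical reference values and predictions lying exactly on a line
y_ref = np.array([0.02, 0.05, 0.09, 0.12, 0.15])
y_pred = 0.8 * y_ref - 0.005
slope, intercept, r2 = prediction_merits(y_ref, y_pred)
```

Ideal predictions give slope 1, intercept 0, and R² 1, which is why these merits make natural target values.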
• Wülfert et al., Anal. Chem. 70 (1998) 1761-1767
• http://www.models.life.ku.dk/datasets; Dept. of Food Sciences, Univ. of Copenhagen
• Water, 2-propanol, and ethanol (analyte)
• 850 to 1049 nm at 1 nm intervals at 30, 40, 50, 60, and 70°C
• Calibration: 13 mixtures from 0% to 67% at 30°C
• Validation: 6 mixtures from 16% to 66% at 70°C
• Primary: PC ethanol at 30°C
• Non-analyte matrix (standardization set) N at 70°C
  – PC interferents water and 2-propanol (2 samples)
  – Blanks (3 samples)
  – Constant analyte (CA, 5 samples)
Updating analyte PC at 30°C to 70°C using interferent PC and blanks
• PLS and PCTR2 predict similarly
[Figure: PCTR2 RMSEV landscape over η and λ, and PLS RMSEV landscape over factors and λ]
PCTR2 Consensus Modeling Temperature
• On-going work
  1. Cannot use R², slope, and intercept for the respective predicted values of the PC and N
     – Set thresholds for RMSEN, RMSEPC, and ‖b̂‖₂ based on preliminary inspection of the landscapes
        • Tradeoff needed between ‖b̂‖₂, RMSEN, and RMSEPC
     – Can further filter based on predicted values
        • Majority vote
        • Remove outliers
  2. Combine the predicted value of the analyte pure component sample with the predicted non-analyte samples to obtain R², slope, and intercept
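One way to implement the threshold step and the outlier filter is sketched below. Assumptions: NumPy; lower is better for ‖b̂‖₂, RMSEN, and RMSEPC; thresholds are supplied by hand after inspecting the landscapes; and outliers are predictions more than k scaled median absolute deviations from the median, with the kept predictions averaged. The majority-vote option is not shown, and `filter_models` and `robust_consensus` are hypothetical helpers.

```python
import numpy as np

def filter_models(b_norm, rmsen, rmsepc, max_norm, max_rmsen, max_rmsepc):
    # True for models that pass all three (hypothetical) thresholds
    return (b_norm <= max_norm) & (rmsen <= max_rmsen) & (rmsepc <= max_rmsepc)

def robust_consensus(preds, k=3.0):
    # Drop predictions more than k scaled MADs from the median, then average the rest
    preds = np.asarray(preds, dtype=float)
    med = np.median(preds)
    mad = 1.4826 * np.median(np.abs(preds - med))
    if mad == 0.0:
        return float(med)
    kept = preds[np.abs(preds - med) <= k * mad]
    return float(np.mean(kept))

# Hypothetical merits for four candidate PCTR2 models
b_norm = np.array([1.0, 1.2, 5.0, 0.9])
rmsen = np.array([0.01, 0.02, 0.01, 0.50])
rmsepc = np.array([0.05, 0.04, 0.03, 0.02])
mask = filter_models(b_norm, rmsen, rmsepc, max_norm=2.0, max_rmsen=0.1, max_rmsepc=0.1)

# Separate hypothetical example: one sample predicted by five retained models
preds = np.array([0.50, 0.52, 0.49, 0.51, 2.00])
y_hat = robust_consensus(preds)    # the 2.00 prediction is removed as an outlier
```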
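PCTR Variants (Calibration or Maintenance)
1. No reference values:
$$\min_{\mathbf b}\ \|1-\mathbf k_a^{\mathrm T}\mathbf b\|_2^2+\eta^2\|\mathbf b\|_2^2+\lambda^2\|\mathbf{Nb}-\mathbf 0\|_2^2$$
2. With current-condition sample reference values:
$$\min_{\mathbf b}\ \|1-\mathbf k_a^{\mathrm T}\mathbf b\|_2^2+\eta^2\|\mathbf b\|_2^2+\lambda^2\|\mathbf{Mb}-\mathbf y_M\|_2^2$$
3. A combination of N and M
4. Replace $\|\mathbf b\|_2^2$ with $\|\mathbf b\|_1$ or $\|\mathbf{Lb}\|_2^2$ to obtain sparse models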
Summary
• PCTR2 calibrates (updates) to current conditions without reference samples
• Only a few new samples are needed
• Can predict better than a full calibration
  – More focused on orthogonalizing to the sample matrix
• Requires the PC analyte spectrum
  – Does not have to be matrix-affected
• Requires non-analyte samples
  – Can be estimated with reference samples
$$\hat y=\underbrace{y_a\mathbf k_a^{\mathrm T}\hat{\mathbf b}+\sum_i y_i\mathbf k_i^{\mathrm T}\hat{\mathbf b}+\mathbf m^{\mathrm T}\hat{\mathbf b}}_{\text{bias}}+\underbrace{\mathbf n^{\mathrm T}\hat{\mathbf b}}_{\text{variance}}$$
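Other TR Variants
Expression / Comments
$$\min_{\mathbf b}\ \|\mathbf{Xb}-\mathbf y\|_2^2+\eta^2\|\mathbf{Lb}\|_2^2$$
  RR when L = I; includes variable selection when L = diag and approximates a 1-norm
$$\min_{\mathbf b}\ \|\mathbf{Xb}-\mathbf y\|_2^2+\eta^2\|\mathbf{Lb}\|_2^2+\lambda^2\|\mathbf{Mb}-\mathbf y_M\|_2^2$$
  Model updating; includes variable selection
$$\min_{\mathbf b}\ \|\mathbf{Xb}-\mathbf y\|_2^2+\eta\|\mathbf{Lb}\|_1+\lambda^2\|\mathbf{Mb}-\mathbf y_M\|_2^2$$
  Model updating with variable selection; approximates a 0-norm when L = diag
$$\min_{\mathbf b}\ \|\mathbf{Xb}-\mathbf y\|_2^2+\eta^2\|\mathbf{Lb}\|_2^2+\lambda\|\mathbf{Mb}-\mathbf y_M\|_1$$
  Model updating with robustness to the standardization set M
$$\min_{\mathbf b}\ \|1-\mathbf k_a^{\mathrm T}\mathbf b\|_2^2+\eta^2\|\mathbf{Lb}\|_2^2+\lambda^2\|\mathbf{Nb}-\mathbf 0\|_2^2$$
  Calibration or updating without reference samples
$$\min_{\mathbf b}\ \|\mathbf{Xb}-\mathbf y\|_2^2+\eta^2\|\mathbf L(\mathbf b-\mathbf b^*)\|_2^2$$
  Calibration to target model b*
$$\min_{\mathbf b}\ \|\mathbf{Xb}-\mathbf y\|_2^2+\eta\|\mathbf{Lb}\|_1$$
  Adaptive LASSO, and LASSO when L = I (Claerbout JF, Muir F. Geophysics 1973; 38: 826-844)
$$\min_{\mathbf b}\ \|\mathbf{Xb}-\mathbf y\|_2^2+\eta\|\mathbf b\|_1+\lambda^2\|\mathbf b\|_2^2$$
  Elastic net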
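The slides do not say how the 1-norm variants were solved. As one generic option, the LASSO form min ‖Xb − y‖₂² + η‖b‖₁ can be minimized with iterative soft thresholding (ISTA), a standard proximal-gradient method swapped in here for illustration. A minimal NumPy sketch; `lasso_ista` and `soft_threshold` are hypothetical helpers. With X = I the exact minimizer is y soft-thresholded at η/2, which the usage lines reproduce.

```python
import numpy as np

def soft_threshold(v, t):
    # Proximal operator of t * ||v||_1
    return np.sign(v) * np.maximum(np.abs(v) - t, 0.0)

def lasso_ista(X, y, eta, n_iter=500):
    # Minimize ||Xb - y||_2^2 + eta * ||b||_1 with a fixed step of 1 / Lipschitz constant
    step = 1.0 / (2.0 * np.linalg.norm(X, 2) ** 2)
    b = np.zeros(X.shape[1])
    for _ in range(n_iter):
        grad = 2.0 * X.T @ (X @ b - y)          # gradient of the squared-error term
        b = soft_threshold(b - step * grad, step * eta)
    return b

# Check: with X = I the minimizer is soft_threshold(y, eta / 2)
X = np.eye(3)
y = np.array([3.0, -0.2, 1.0])
b = lasso_ista(X, y, eta=1.0)
```

The small middle coefficient is set exactly to zero, which is the variable-selection behavior the 1-norm variants rely on.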
• In addition to combining a set of models from one method, the sets of model predictions from TR2, PLS, PCTR2, … can be combined