8/12/2019 FTIR Topic_Undesired Phenomena
Error structure of spectroscopic data (NIR, FTIR, etc.) and how to deal with it
Harald Martens and Achim Kohler
Centre for Biospectroscopy and Data Modelling, Nofima Food, Ås, Norway
CIGENE Centre for Integrative Genetics, Norwegian University of Life Sciences, Ås, Norway
Department of Mathematical Sciences and Technology (IMT), Norwegian University of Life Sciences, Ås, Norway
My own field: Measurements and modelling in systems biology
[Diagram] Levels: DNA, mRNA, proteome, metabolome, biological structure, other phenotypes; all subject to environment and human activity
Measurement techniques: sequencing, SNP, AFLP; real-time PCR, micro-array; 1D and 2D electrophoresis, MALDI-TOF, LC-MS; GC, LC(-MS); NIR, FT-IR, Raman, fluorescence; serotyping
Other phenotypes: disease incidence, virulence, drug sensitivity, biofilm formation, sensory science, economy
Data analysis: integrating different types of bio-data
Look for common variation patterns
Make quantitative predictions and forecasts
Identify outliers
Now the real fun starts: feed-back!
[Diagram] Levels: DNA, mRNA, proteome, metabolome, biological structure, other phenotypes (disease incidence, virulence, drug sensitivity, biofilm formation, sensory science, economy); environment and human activity; measured by sequencing, SNP, AFLP, real-time PCR, micro-array, 1D and 2D electrophoresis, MALDI-TOF, LC-MS, GC, LC(-MS), NIR, FT-IR, Raman, fluorescence, serotyping
Models: high-dimensional dynamic, non-linear ODEs; spatial PDEs
Possible, since we are now getting relevant and reliable high-throughput, high-dimensional instrumentation
Biospectroscopy
Wavelength ranges: UV-Vis (2500 nm Raman Scattering - - Fluorescence: (mainly
Biospectroscopy
Errors in measurements:
White noise: random measurement errors (usually heteroscedastic: higher numbers have higher errors)
Coloured noise: systematic errors
Several undesired, but unavoidable interferants
From measurement: sample thickness, temperature effects
From samples: light scattering (simple, complicated), constituent interactions
Several analytes, with overlapping spectra
Model-based pre-processing: identify and correct for systematic errors. Turn systematic errors into valuable sources of information.
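The distinction between white and coloured noise can be made concrete with a small simulation. The following is only a sketch under stated assumptions: the band positions, the values of the multiplicative effect `b`, the offset `a`, the sloping baseline and the noise levels are all hypothetical numbers chosen for illustration, not values from the talk.

```python
import numpy as np

rng = np.random.default_rng(0)
wavenumber = np.linspace(1000.0, 3500.0, 500)

# Hypothetical "true" absorbance: two Gaussian bands (illustration only).
z_true = (0.8 * np.exp(-0.5 * ((wavenumber - 1650.0) / 40.0) ** 2)
          + 0.5 * np.exp(-0.5 * ((wavenumber - 2900.0) / 60.0) ** 2))

# Coloured noise: systematic effects that vary smoothly along the spectrum.
b = 1.3                                            # multiplicative effect (assumed)
a = 0.1                                            # additive baseline shift (assumed)
slope = 5e-5 * (wavenumber - wavenumber.mean())    # wavenumber-dependent baseline
systematic = b * z_true + a + slope

# White noise: independent per channel and heteroscedastic,
# i.e. larger absorbance values get larger random errors.
sigma = 0.002 + 0.01 * np.abs(systematic)
z = systematic + rng.normal(0.0, sigma)


def lag1_autocorr(x):
    """Lag-1 autocorrelation; near 1 for smooth (coloured) signals, near 0 for white noise."""
    x = x - x.mean()
    return float(np.dot(x[:-1], x[1:]) / np.dot(x, x))


white_part = z - systematic          # random error only
coloured_part = systematic - z_true  # systematic error only
print(lag1_autocorr(white_part), lag1_autocorr(coloured_part))
```

The systematic part is strongly correlated between neighbouring channels, which is what makes it possible to model and remove it, while the white part is not.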
Examples of undesired phenomena in FTIR
[Figure: absorption vs. wavenumber (cm$^{-1}$, 1000 to 3500), illustrating baseline shift, multiplicative effect, wavenumber-dependent effects, water variations in tissues, Mie scattering and dispersive artefact]
Principle of model-based pre-processing: Mie scattering of individual liver cancer cells in synchrotron FTIR
[Diagram: measured spectra are split by the pre-processing model into chemical absorption and physical contribution]
Example: Light microscopy of muscle, one wavelength in visible range
Hyperspectral FTIR microscopy of same sample:
Traditional chemical image at the best wavenumber (1240 cm$^{-1}$) - the UNIVARIATE TRADITION!
Like playing complex music on a grand piano with one finger at a time
Hyperspectral FTIR microscopy of same sample:
Chemical image at the same wavenumber after pre-processing
Like playing SIMPLE music on a grand piano with one finger at a time
Hyperspectral FTIR microscopy of same sample:
Chemical image from pre-processing parameters, based on all wavelengths
Like playing complex music on a grand piano with all fingers and toes (+ nose)
Analysing/Visualising estimated parameters/scatter effects
Estimated parameters can be used for making physical images:
b, proportional to the effective optical path length, is estimated for each pixel spectrum
Kohler A, Bertrand D, Martens H, Hannesson K, Kirschner K, and Ofstad R (2007) Multivariate image analysis of a set of FTIR microspectroscopy images of aged bovine tissue combining image and design information. Analytical and Bioanalytical Chemistry 389, 1143-1153.
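A physical image of b can be sketched as follows. This is a minimal illustration, assuming a simple per-pixel model z = a + b·ref fitted by least squares. The function name `path_length_image`, the synthetic cube and the reference spectrum are hypothetical and not taken from the cited paper.

```python
import numpy as np


def path_length_image(cube, ref):
    """Fit z = a + b * ref for every pixel spectrum.

    cube: array (rows, cols, channels); ref: array (channels,).
    Returns two images (rows, cols): the offsets a and the multiplicative
    parameters b (b being proportional to the effective optical path length).
    """
    rows, cols, n_chan = cube.shape
    X = np.column_stack([np.ones(n_chan), ref])     # regressors: constant and ref
    Z = cube.reshape(-1, n_chan).T                  # one column per pixel
    coef, *_ = np.linalg.lstsq(X, Z, rcond=None)    # coef has shape (2, n_pixels)
    return coef[0].reshape(rows, cols), coef[1].reshape(rows, cols)


# Synthetic 4 x 5 pixel image with known a and b per pixel (illustration only).
rng = np.random.default_rng(1)
ref = 0.2 + np.exp(-0.5 * ((np.arange(60) - 30.0) / 6.0) ** 2)
b_true = rng.uniform(0.5, 2.0, size=(4, 5))
a_true = rng.uniform(-0.1, 0.1, size=(4, 5))
cube = b_true[:, :, None] * ref + a_true[:, :, None]

a_img, b_img = path_length_image(cube, ref)
```

Because all pixels share the same regressors, a single least-squares call handles the whole image, and `b_img` can be displayed directly as a map.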
Pre-processing
Model-based pre-processing: parameterize the problems
Combine knowledge-driven and data-driven modelling
Use linear data models (fast, simple, robust), but use both additive and multiplicative operators
Complicated non-linear mathematical models replaced by bilinear, compressed summaries of model behaviour
Notation for model-based pre-processing:
ref = a reference spectrum
z = an input sample spectrum (EXAMPLE: $z \neq z_{\mathrm{True}}$! But $z_{\mathrm{True}} = \mathrm{ref}$)
m = mean of z, ref (and possibly some others)
Error model: 1) $m \approx z_{\mathrm{True}}$ 2) $z = f(m) + \text{random noise}$
$f()$ is estimated from input spectra z and m
Error correction: $z_{\mathrm{corr}} = \hat{z}_{\mathrm{True}} = f^{-1}(z)$
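As a worked instance of this error model, assume the simplest case where $f$ is affine, $f(m) = b\,m + a$. The index $k$ over spectral channels and the bars for channel means are notation introduced here for illustration. Ordinary least squares then gives the estimates and the correction:

```latex
\hat{b} = \frac{\sum_k (m_k - \bar{m})(z_k - \bar{z})}{\sum_k (m_k - \bar{m})^2},
\qquad
\hat{a} = \bar{z} - \hat{b}\,\bar{m},
\qquad
z_{\mathrm{corr}} = f^{-1}(z) = \frac{z - \hat{a}}{\hat{b}}
```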
Simple error types; assume $z_{\mathrm{True}} = \mathrm{ref}$
[Figure panels: input spectra (z and ref vs. wavelength); visualization tools (mean and difference vs. wavelength; z vs. ref); corrected spectra ($z_{\mathrm{corr}}$ and ref vs. wavelength)]
Additive offset: $z = \mathrm{ref} + a$, corrected by $z_{\mathrm{corr}} = z - a$
Simple error types
[Figure panels for each error type: spectra z and ref vs. wavelength; mean and difference vs. wavelength; z vs. ref; $z_{\mathrm{corr}}$ and ref vs. wavelength]
Additive: $z = \mathrm{ref} + a$, corrected by $z_{\mathrm{corr}} = z - a$
Multiplicative: $z = \mathrm{ref} \cdot b$, corrected by $z_{\mathrm{corr}} = z / b$
Both: $z = \mathrm{ref} \cdot b + a$, corrected by $z_{\mathrm{corr}} = (z - a) / b$
Method: Multiplicative Signal Correction (MSC) or Standard Normal Variates (SNV)
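The two methods named above can be sketched in a few lines. This is an illustrative implementation under common textbook definitions, not the authors' code: MSC regresses each spectrum on a reference (here the mean spectrum unless `ref` is given) and inverts the fit, whereas SNV centres and scales each spectrum by its own mean and standard deviation. The function names and the synthetic data are assumptions.

```python
import numpy as np


def msc(Z, ref=None):
    """Multiplicative Signal Correction: fit z = a + b * ref, return (z - a) / b."""
    Z = np.asarray(Z, dtype=float)
    m = Z.mean(axis=0) if ref is None else np.asarray(ref, dtype=float)
    X = np.column_stack([np.ones_like(m), m])
    coef, *_ = np.linalg.lstsq(X, Z.T, rcond=None)   # row 0: a, row 1: b
    a, b = coef
    return (Z - a[:, None]) / b[:, None], a, b


def snv(Z):
    """Standard Normal Variate: centre and scale each spectrum individually."""
    Z = np.asarray(Z, dtype=float)
    mean = Z.mean(axis=1, keepdims=True)
    std = Z.std(axis=1, ddof=1, keepdims=True)
    return (Z - mean) / std


# Three copies of one spectrum with different offsets a and scalings b.
x = np.linspace(0.0, 3.0, 50)
ref = np.sin(x) + 1.0
b_true = np.array([0.5, 1.0, 2.0])
a_true = np.array([0.3, 0.0, -0.2])
Z = b_true[:, None] * ref + a_true[:, None]

Z_msc, a_hat, b_hat = msc(Z, ref)
Z_snv = snv(Z)
```

In this noise-free example both methods make the three spectra coincide. MSC returns them on the scale of the reference, while SNV returns them on a unit-variance scale and needs no reference.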
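Multiplicative Signal Correction (MSC) and its extension (EMSC)

MSC:
Model: $z = b\,z_{\mathrm{True}} + a + \varepsilon$
Assumption: $z_{\mathrm{True}} = m$
i.e. $z = b\,m + a + \varepsilon$
Regression gives $b, a$
$z_{\mathrm{corr}} = (z - a) / b$

EMSC:
Model: $z = b\,z_{\mathrm{True}} + a + \varepsilon$
Assumption: $z_{\mathrm{True}} = m + c\,K_{\mathrm{analytes}} + d\,G_{\mathrm{interferants}}$
i.e. $z = b\,(m + c\,K_{\mathrm{analytes}} + d\,G_{\mathrm{interferants}}) + a + \varepsilon$
i.e. $z = b\,m + h\,K_{\mathrm{analytes}} + g\,G_{\mathrm{interferants}} + a + \varepsilon$, with $h = b\,c$ and $g = b\,d$
Regression gives $b, h, g, a$
$z_{\mathrm{corr}} = (z - a - g\,G_{\mathrm{interferants}}) / b$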
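The EMSC regression and correction can be sketched as one least-squares fit per spectrum. This is a minimal illustration, not the patented or toolbox implementation. The function name `emsc`, the coefficient names `h` and `g` for the analyte and interferant terms, and the choice of a linear and a quadratic baseline as interferant spectra are assumptions made for the example.

```python
import numpy as np


def emsc(Z, m, K=None, G=None):
    """Fit z = a + b*m + h*K + g*G per spectrum and return (z - a - g*G) / b.

    Z: (n_samples, n_chan) spectra; m: (n_chan,) reference or mean spectrum;
    K: (n_K, n_chan) analyte spectra to keep; G: (n_G, n_chan) interferants to remove.
    """
    Z = np.atleast_2d(np.asarray(Z, dtype=float))
    n_chan = Z.shape[1]
    K = np.zeros((0, n_chan)) if K is None else np.atleast_2d(K)
    G = np.zeros((0, n_chan)) if G is None else np.atleast_2d(G)
    X = np.column_stack([np.ones(n_chan), m, K.T, G.T])
    coef, *_ = np.linalg.lstsq(X, Z.T, rcond=None)   # (2 + n_K + n_G, n_samples)
    a, b = coef[0], coef[1]
    g = coef[2 + K.shape[0]:]                        # interferant coefficients
    Z_corr = (Z - a[:, None] - g.T @ G) / b[:, None]
    return Z_corr, coef


# Synthetic example: one analyte band, polynomial baseline as interferants.
x = np.linspace(-1.0, 1.0, 100)
m = np.exp(-x ** 2 / 0.05)
K = np.exp(-(x - 0.5) ** 2 / 0.01)[None, :]
G = np.vstack([x, x ** 2])
z = 1.5 * (m + 0.3 * K[0]) + 0.2 + 0.1 * x - 0.05 * x ** 2

Z_corr, coef = emsc(z, m, K, G)
```

The analyte term is estimated but deliberately left in the corrected spectrum, so the chemical information survives while offset, baseline and scaling are removed.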
H. Martens is co-owner of the EMSC patent, but academic use is of course free.
Algorithms for EMSC are available in Matlab Toolbox etc. and in The Unscrambler, for free research use.
Example: Model FTIR effects of varying sample temperature in aqueous samples
Input spectra: water at different temperatures
Simple EMSC: $G_{\mathrm{interferants}}$ = wavelength-dependent baseline
EMSC with model of water, $K_{\mathrm{analytes}}$, and its temperature effects, $G_{\mathrm{interferants}}$
[Figure annotation: outside instrument range]
Example of EMSC: Pre-processing of NIR spectra of powder mixtures
Mixtures of protein and starch powders, 850 to 1050 nm, absorbance log(1/T)
[Figure: response vs. channel # (0 to 100). Input (EMSCZ.MAT) and output (DataCase=155, EMSC, opt. an extra bad spectrum, in addition to input), each shown as raw and as mean-centred spectra]
More nasty error types
[Figure panels for each error type: z = ref and non-linear stray light vs. wavelength; mean and difference vs. wavelength; z vs. ref; $z_{\mathrm{corr}}$ and ref vs. wavelength]
Response curvature, e.g. stray light or detector saturation: $z = f(z_{\mathrm{True}})$, corrected by $z_{\mathrm{corr}} = f^{-1}(z)$
Sideways shift (from instrument or sample): $z_{\mathrm{corr}} = f^{-1}(z)$
Random noise, heteroscedastic: $z_{\mathrm{corr}} = \mathrm{filt}(z)$
Method: Non-linear parameter estimation or Extended Multiplicative Signal Correction (EMSC)
Estimating baseline and multiplicative effect and pre-processing
[Figure: raw spectra and MSC/EMSC (basic) corrected spectra, absorbance vs. wavenumber (cm$^{-1}$, 1000 to 3000); below, raw spectra vs. mean and corrected spectra vs. mean (absorbance vs. absorbance)]
Examples for EMSC replicate correction (Ed Stark)
[Figure: three columns, Raw, EMSC (basic) and EMSC rep.; top row absorbance vs. wavenumber (cm$^{-1}$, 1000 to 3000); bottom row spectra plotted against the mean (absorbance vs. absorbance)]
Examples for EMSC replicate correction
[Figure: six panels of absorbance vs. wavenumber (cm$^{-1}$, 1000 to 3000) for Raw, EMSC (basic) and EMSC rep.]
Kohler A, Böcker U, Warringer J, Blomberg A, Omholt SW, Stark E, Martens H (2008) Reducing inter-replicate variation in FTIR spectroscopy by extended multiplicative signal correction (EMSC). Applied Spectroscopy.
How to obtain more advanced pre-processing models
1. By estimating unwanted variation from the data itself
2. By estimating unwanted variation from mathematical models about known scatter effects, instrumental information etc.
But how to mix complicated mathematical models and simple, linear pre-processing models?
Solution, e.g. for Mie light scattering ("lens effects") of individual cells in synchrotron FTIR microscopy
Estimating Mie scattering
[Diagram: theory feeds an EMSC subspace model, which yields the estimated Mie scattering and the corrected spectra]
Kohler A, Sulé-Suso J, Sockalingum GD, Tobin M, Bahrami F, Yang Y, Pijanka J, Dumas P, Cotte M, Martens H (2008) Estimating and correcting Mie scattering in synchrotron based microscopic FTIR spectra by extended multiplicative signal correction (EMSC). Applied Spectroscopy, 62, 259-266.
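One way to turn scattering theory into a linear subspace model can be sketched as follows. This is an illustration under stated assumptions, not the procedure of the cited paper: it uses the van de Hulst approximation for the extinction efficiency of a non-absorbing sphere, simulates curves over a hypothetical grid of cell radii and refractive index ratios, and compresses them by SVD into a few loading spectra that could then serve as interferant spectra in an EMSC model. The parameter ranges and the number of components are arbitrary choices.

```python
import numpy as np


def q_ext(wavenumber_cm, radius_um, n_ratio):
    """Van de Hulst approximation of the extinction efficiency of a sphere.

    rho = 4 * pi * r * (n - 1) * wavenumber, with the radius converted to cm.
    """
    rho = 4.0 * np.pi * (radius_um * 1e-4) * (n_ratio - 1.0) * wavenumber_cm
    return 2.0 - (4.0 / rho) * np.sin(rho) + (4.0 / rho ** 2) * (1.0 - np.cos(rho))


wn = np.linspace(1000.0, 4000.0, 300)       # wavenumbers in cm^-1
radii = np.linspace(2.0, 8.0, 10)           # hypothetical radii in micrometres
ratios = np.linspace(1.1, 1.5, 10)          # hypothetical refractive index ratios

# Exhaustive simulation over the parameter grid: one scattering curve per combination.
curves = np.array([q_ext(wn, r, n) for r in radii for n in ratios])

# Bilinear compression: a few loadings summarise the behaviour of the non-linear model.
U, s, Vt = np.linalg.svd(curves, full_matrices=False)
n_comp = 6
G_mie = Vt[:n_comp]                          # candidate interferant spectra for EMSC
explained = float((s[:n_comp] ** 2).sum() / (s ** 2).sum())
print(f"{n_comp} components explain {explained:.4f} of the simulated variation")
```

The design choice here is that the non-linear physics is evaluated only once, offline. The per-spectrum correction afterwards stays a fast linear regression.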
Using Mie scattering model for new samples
[Diagram: measured spectra are split by the pre-processing model into chemical absorption and physical contribution]
Large-scale FTIR-bioscreening project in Norway
and functional genomics for optimized milk and meat quality
Milk FTIR spectra: 6 million milk spectra/year
Routine milk analysis: cal. models give pred. fatty acids etc.
Calibration milk samples: reference measurements, fatty acids (GC-MS); feeding experiments
Background knowledge: QTLs etc.? 20K SNPs; heritability, feeding effects etc.
[Figure: calibration model coefficients (-0.02 to 0.01) vs. wavenumber (about 3056 to 1107 cm$^{-1}$): cal. models, FA combinations, other components]
Now the real fun starts: feed-back!
[Diagram] Levels: DNA, mRNA, proteome, metabolome, biological structure, other phenotypes (disease incidence, virulence, drug sensitivity, biofilm formation, sensory science, economy); environment and human activity; measured by sequencing, SNP, AFLP, real-time PCR, micro-array, 1D and 2D electrophoresis, MALDI-TOF, LC-MS, GC, LC(-MS), NIR, FT-IR, Raman, fluorescence, serotyping
Models: dynamic, non-linear ODEs; spatial PDEs
Different feedback control (Jacobi matr.) in different parts of state space
10000-dimensional input data
Eigenvalues vs singular values of the Jacobi matr.
Identify outliers
Monitoring dynamic processes by biospectroscopy
A fermentation process in dairy industry monitored by FTIR (ATR) for 26 hours
[Figure: input spectra, FTIR light absorbance (0 to 0.16) vs. wavenumber of the FTIR light (1000 to 1600)]
Three first principal component scores
[Figure: 3D score plot with axes PC 1 (89.6 % variance), PC 2 (8.7 % variance) and PC 3 (0.9 % variance); the process trajectory runs through the time points t = 0, 6 hrs, 19 hrs, 21.5 hrs and 26 hrs, with segments labelled k1 to k5]
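Score trajectories of this kind can be computed with a plain PCA via the SVD of the mean-centred spectra. The sketch below uses synthetic data: the three Gaussian "state" spectra and their time courses are hypothetical stand-ins for the fermentation spectra, which are not available here.

```python
import numpy as np


def pca_scores(X, n_comp=3):
    """PCA by SVD of the column-centred data; rows are time points, columns are channels."""
    Xc = X - X.mean(axis=0)
    U, s, Vt = np.linalg.svd(Xc, full_matrices=False)
    scores = U[:, :n_comp] * s[:n_comp]
    loadings = Vt[:n_comp]
    explained = s[:n_comp] ** 2 / np.sum(s ** 2)
    return scores, loadings, explained


# Synthetic process: three states whose relative amounts change over 26 hours.
t = np.linspace(0.0, 26.0, 40)
wn = np.linspace(1000.0, 1600.0, 120)
states = np.array([np.exp(-0.5 * ((wn - c) / 30.0) ** 2)
                   for c in (1100.0, 1300.0, 1500.0)])
first = np.exp(-t / 5.0)
second = (t / 5.0) * np.exp(-t / 5.0)
amounts = np.column_stack([first, second, 1.0 - first - second])
X = amounts @ states

scores, loadings, explained = pca_scores(X)
# Plotting scores[:, 0], scores[:, 1], scores[:, 2] in time order gives the trajectory.
```

Because the score rows keep the time order of the spectra, connecting consecutive points shows how the process moves through state space.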
Semi-soft modelling of the process
[Figure: state fingerprints (s1, s2-s1, s3-s2, s4-s3, s5-s4) vs. wavenumber, cm$^{-1}$ (1000 to 1600), and state amounts (c1 to c5, scaled 0 to 1) vs. time, hrs (0 to 25)]
Non-linear dynamic model identification
My other activity in CIGENE:
Cell differentiation model: computer simulation, sensory analysis of mathematical solutions
The Physiome Project: human heart
Individual heart muscle cell, 36 state variables, 72 param.
Sets of adjacent, interacting cells
Assessing large non-linear dynamic models too complex for theory
Nominal-level (Leiden-school!) PLSR of rates vs states
Study local Jacobians and their eigenvalues vs singular values
Represent/replace a mathematical form by its behavioural repertoire, by exhaustive simulation (factorial designs to chosen resolution), in a compressed data base.
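Why eigenvalues and singular values of a Jacobian can tell different stories may be seen in a two-state toy example. The matrix below is hypothetical and not taken from any of the models mentioned. Its eigenvalues are small and negative, indicating a locally stable system, while its largest singular value is large, indicating that some perturbations are strongly amplified in the short term.

```python
import numpy as np

# Hypothetical local Jacobian with strong one-way coupling between two states.
J = np.array([[-1.0, 50.0],
              [0.0, -2.0]])

eigvals = np.linalg.eigvals(J)                    # long-term (asymptotic) behaviour
singvals = np.linalg.svd(J, compute_uv=False)     # worst-case instantaneous gain

print("eigenvalues:    ", np.sort(eigvals.real))
print("singular values:", singvals)
```

For symmetric Jacobians the two sets agree in magnitude. The gap between them is a measure of how non-normal the local dynamics are.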
Conclusions
Many error-types are in fact sources of valuable information.
Model-based pre-processing: identify, quantify and separate out systematic error-types.
Model-based pre-processing in biospectroscopy requires an understanding of the different errors that create the unwanted variation.
As usual:
It is better to be approximately right than precisely wrong
It is better to be aggressive/humble, than to be passive/arrogant.
Acknowledgements
People who contributed:
Centre for Integrative Genetics (CIGENE), Norw. U. Life Sci.: Stig Omholt, Erik Plahte, Arne Gjuvsland, Sigbjørn Lien, Hanne Gro Olsen, Åshild Randby
NOFIMA/Matforsk: Achim Kohler, Ulrike Böcker, Nils Kristian Afseth, Martin Høy
TINE: Kjetil Jørgensen
GENO: Morten Svendsen