Design and Analysis of Physical and Computational Experiments in Chemical Engineering
Marco Reis ([email protected])
SAS® Fórum PORTUGAL 2015, Lisbon, 10 November 2015
Outline
• I. Introduction
• II. Applications: JMP-SAS
– Laboratory
– Industry
– Computer experiments
• III. Discussion & Conclusion
Introduction
[Figure: Big Data (Data, Technology, Analytics)]
Introduction
data + computation power = success ?
Applications
Laboratory Industry In silico
JMP-SAS in …
Laboratory
Laboratory: the “before” and the “after”…
“before”
• Time-consuming methods
• Affordable equipment
• Direct manipulation of samples, equipment and data
• Few data per experiment
• Few replicates
“after”
• Fast methods
• Expensive equipment
• Interaction through computer interfaces and electronics
• High volumes of data per experiment
• More replicates and samples can be processed
Laboratory: the “before” and the “after”…
“Hyphenated instruments” (GC-MS, emission-excitation fluorescence, …)
[Figures: chromatogram (wine); NIR; GC-MS; UV-VIS spectra, absorbance plotted over a horizontal axis from 250 to 750]
Laboratory: the “before” and the “after”…
[Figure: univariate data versus multivariate / megavariate data (λ1, λ2, λ3, λ4, …)]
Laboratory: the “before” and the “after”…
• “Data rich but information poor!” …
• Design of Experiments (DOE)
• Chemometrics: PCA, PLS, …
• Visualization
• Multivariate Statistics
Applications: Laboratory
• Optimal operation of advanced analytical instrumentation
– Vicinal diketones (VDKs) are responsible for off-flavours in beer
– Optimize quantification of VDKs:
• Diacetyl (DC)
• Pentanedione (PN)
– Headspace solid-phase microextraction (HS-SPME)
– Gas chromatography coupled with mass spectrometry detection (GC-MS)
Applications: Laboratory
• Current Practice
– Best guess & experience
– Empirical approaches
– Change-one-factor-at-a-time
• Approach followed
– D-Optimal Design of Experiments
– SAS-JMP®: Custom Design
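Model: $Y = F\beta + \varepsilon$
D-optimality criterion: $\max_{F} \left| F^{T} F \right|$
Covariance of the estimates: $\mathrm{Cov}(\hat{\beta}) = \sigma_{\varepsilon}^{2} \left( F^{T} F \right)^{-1}$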
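As a rough illustration of the D-optimality criterion, the sketch below compares det(FᵀF) and the coefficient variances for two 4-run designs in two coded factors. The two designs, the first-order model and σ² = 1 are illustrative assumptions; they are not the design used in the study.

```python
import numpy as np

def model_matrix(design):
    """First-order model matrix F: intercept column plus one column per factor."""
    design = np.asarray(design, dtype=float)
    return np.column_stack([np.ones(len(design)), design])

def d_criterion(design):
    """D-criterion: determinant of the information matrix F'F."""
    F = model_matrix(design)
    return float(np.linalg.det(F.T @ F))

def coef_variances(design, sigma2=1.0):
    """Diagonal of Cov(beta_hat) = sigma^2 (F'F)^-1."""
    F = model_matrix(design)
    return sigma2 * np.diag(np.linalg.inv(F.T @ F))

# Hypothetical designs in coded units (-1, +1)
factorial = [(-1, -1), (1, -1), (-1, 1), (1, 1)]
# Baseline, change x1 only, change x2 only, repeat the baseline
one_at_a_time = [(-1, -1), (1, -1), (-1, 1), (-1, -1)]

for name, design in [("factorial", factorial), ("one-at-a-time", one_at_a_time)]:
    print(name, d_criterion(design), coef_variances(design))
```

With the same number of runs, the factorial design has the larger determinant and the smaller coefficient variances, which is the argument against changing one factor at a time.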
Applications: Laboratory
• Preliminary screening of factors and definition of experimental conditions
• Optimal DOE on selected factors
• Processing of samples in randomized order
• Data analysis and model fitting
• Validation and complete characterization of the analytical procedure conducted at the optimal factor levels
• Factors
Factor                         Type           Levels / range
Type of fiber                  Qualitative    L1-DVB/PDMS, L2-Car/PDMS, L3-DVB/Car/PDMS
Sample volume (ml)             Qualitative    {5, 10}
Pre-incubation time (min)      Quantitative   [0, 10]
Extraction time (min)          Quantitative   [5, 25]
Extraction temperature (°C)    Quantitative   [30, 50]
Agitation                      Qualitative    {L1-Yes, L2-No}
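To show how a D-optimal design could be searched over these factors, here is a minimal point-exchange sketch. It is not a reproduction of the JMP Custom Design algorithm. The three-point grids for the quantitative factors, the main-effects model, the 12-run budget and all function names are assumptions made for illustration.

```python
import itertools
import numpy as np

# Levels from the factor table; the three-point grids for the quantitative
# factors are an assumption of this sketch.
levels = {
    "fiber": ["DVB/PDMS", "Car/PDMS", "DVB/Car/PDMS"],
    "volume_ml": [5, 10],
    "preinc_min": [0, 5, 10],
    "ext_time_min": [5, 15, 25],
    "ext_temp_C": [30, 40, 50],
    "agitation": ["yes", "no"],
}
candidates = list(itertools.product(*levels.values()))

def code(lo, hi, x):
    """Scale a value from [lo, hi] to the coded range [-1, +1]."""
    return 2.0 * (x - lo) / (hi - lo) - 1.0

def model_row(run):
    """Main-effects model row: intercept, 2 fiber dummies, 5 coded columns."""
    fiber, vol, preinc, t_ext, temp, agit = run
    return [
        1.0,
        1.0 if fiber == "Car/PDMS" else 0.0,
        1.0 if fiber == "DVB/Car/PDMS" else 0.0,
        code(5, 10, vol),
        code(0, 10, preinc),
        code(5, 25, t_ext),
        code(30, 50, temp),
        1.0 if agit == "yes" else -1.0,
    ]

F_all = np.array([model_row(r) for r in candidates])

def log_det(idx):
    """log det(F'F) for the runs in idx; -inf if the design is singular."""
    F = F_all[idx]
    sign, value = np.linalg.slogdet(F.T @ F)
    return value if sign > 0 else -np.inf

def exchange(n_runs=12, seed=0, max_passes=20):
    """Swap design points for candidates while log det(F'F) keeps increasing."""
    rng = np.random.default_rng(seed)
    idx = list(rng.choice(len(candidates), size=n_runs, replace=False))
    start = best = log_det(idx)
    for _ in range(max_passes):
        improved = False
        for i in range(n_runs):
            for c in range(len(candidates)):
                trial = idx.copy()
                trial[i] = c
                value = log_det(trial)
                if value > best + 1e-9:
                    idx, best, improved = trial, value, True
        if not improved:
            break
    return idx, start, best

design_idx, start_logdet, design_logdet = exchange()
for i in design_idx:
    print(candidates[i])
```

The search only guarantees a local optimum from one random start; in practice several starts would be compared, and the selected runs would then be processed in randomized order.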
Applications: Laboratory
• Preliminary analysis of the design
[Figure: plot with one entry per model term: main effects (fiber coating [L1], fiber coating [L2], incubation time (t1), extraction temperature, extraction time (t2), agitation, sample volume) and all two-factor interactions between these factors]
Applications: Laboratory
• Results
[Figure: scaled estimates (axis from 500000 to 4500000) by fiber coating (Car/PDMS, DVB/Car/PDMS, DVB/PDMS) and sample volume (5 ml, 10 ml)]
Important few:
• Fiber coating
• Sample volume
Applications: Laboratory
• Profiler: optimal solution
[Figure: prediction profiler. Response: Soma100ppb = 4944465 [4361217, 552713]; desirability = 0.883729. Settings shown: fiber coating L2; sample volume 5 ml; incubation time (t1) 5 min; extraction temperature 30 ºC; extraction time (t2) 25 min.]
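The desirability reported by the profiler maps the predicted response onto a 0 to 1 scale. As a minimal sketch of the idea, the code below implements a larger-is-better desirability of the Derringer-Suich type. The bounds, the weight and the function names are assumptions; the profiler in JMP uses its own desirability definition, which may differ.

```python
def desirability_maximize(y, low, high, weight=1.0):
    """Larger-is-better desirability: 0 at or below low, 1 at or above high."""
    if high <= low:
        raise ValueError("high must exceed low")
    if y <= low:
        return 0.0
    if y >= high:
        return 1.0
    return ((y - low) / (high - low)) ** weight

def overall_desirability(values):
    """Geometric mean of individual desirabilities (0 if any of them is 0)."""
    product = 1.0
    for d in values:
        product *= d
    return product ** (1.0 / len(values))

# Hypothetical bounds for a peak-area response
print(desirability_maximize(3_000_000, low=1_000_000, high=5_000_000))
```

Maximizing the overall desirability over the factor settings is then an ordinary optimization problem on top of the fitted model.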
Applications: Laboratory
• Solution:
Factor                         Optimal level
Type of fiber                  L2-Car/PDMS
Sample volume                  5 ml
Pre-incubation time (min)      5 min
Extraction time (min)          25 min
Extraction temperature (°C)    30 ºC
Agitation                      L1-Yes
Applications: Laboratory
• Validation
Parameter                         Diacetyl            Pentanedione
Linear regression (y = mx + b)    0.0026x + 0.0538    0.0057x - 0.0316
Linear concentration range        10-300 μg L-1       10-200 μg L-1
R²                                0.9999              0.9997
LOD (μg L-1)                      0.9                 3.3
LOQ (μg L-1)                      2.8                 10.0
Recovery (%)
LB + 50 μg L-1 SA                 91                  102
LB + 100 μg L-1 SA                97                  99
LB + 200 μg L-1 SA                94                  91
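The calibration lines in the validation table can be inverted to turn an instrument response into a concentration, and a recovery percentage follows from a spiked and an unspiked measurement. The sketch below uses the diacetyl regression coefficients from the table. The function names and the numbers in the recovery example are hypothetical, and the recovery formula is the usual definition, assumed rather than taken from the slides.

```python
def concentration(signal, slope, intercept):
    """Invert the calibration line y = slope * x + intercept."""
    return (signal - intercept) / slope

def recovery_percent(found_spiked, found_unspiked, added):
    """Percentage of the added amount that is found again in the spiked sample."""
    return 100.0 * (found_spiked - found_unspiked) / added

# Diacetyl calibration from the validation table: y = 0.0026x + 0.0538
DIACETYL = {"slope": 0.0026, "intercept": 0.0538}

# Response that the calibration line predicts for 100 μg/L, then inverted back
signal = DIACETYL["slope"] * 100 + DIACETYL["intercept"]
print(concentration(signal, **DIACETYL))

# Hypothetical spiking experiment: 50 μg/L found before, 147 μg/L after adding 100 μg/L
print(recovery_percent(147.0, 50.0, 100.0))
```

Concentrations computed this way are only meaningful inside the linear range given in the table.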
Applications: Laboratory
• Application
Sample    Diacetyl                 Pentanedione
          Cc (μg L−1)    SD        Cc (μg L−1)    SD
LB 1      n.d.           ---       n.q.           ---
LB 2      40.7           2.4       17.0           0.6
LB 3      207.8          3.3       27.7           0.2
LB 4      294.9          21.4      30.6           3.5
LB 5      34.5           0.7       20.6           1.0
LB 6      83.8           3.2       23.8           0.1
LB 7      61.6           3.6       17.6           0.6
LB 8      n.d.           ---       17.8           0.7
LB 9      6.5            1.6       17.8           0.2
LB 10     n.d.           ---       15.4           0.7
LB 11     34.7           0.6       52.9           3.0
LB 12     74.9           4.0       76.4           3.0
LB 13     118.7          1.4       141.0          3.9
LB 14     n.d.           ---       20.4           1.0
LB 15     n.d.           ---       18.3           0.4
LB 16     n.d.           ---       18.6           0.1
LB – lager beer; Cc – Concentration; SD - standard deviation; n.q. - not quantified; n.d. – not detected
Conclusions
“Before” “After”
Conclusions
• Planning
– MSA, R&R
– Design of experiments:
• Factorial, Fractional Factorial
• Plackett-Burman
• Mixture designs
• D-optimal, A-optimal, …
• Response surface
• Split-plot designs
• Taguchi robust designs
• Super-saturated designs (Definitive Screening Designs)
JMP-SAS in …
Industry
Applications: Industry
• Identify the most influential variables for the partition of phenolic byproducts in a binary mixture at equilibrium
• Mixture
– Organic phase (org): MNB
– Acid phase (ac): H2SO4
– By-products to analyze: 2,4-DNF, TNF
Applications: Industry
• Approach: First Principles / Mechanistic
– 2+ PhDs
– European project: 5.6+ M€
– Several Industry-University projects
… reaction network still unknown!
Applications: Industry
• Sequential Data-Driven Approach
– Establish a research question
– Gather knowledge
– Design experiment
– Analyze data
– Act on the results
Applications: Industry
• Repeatability & Reproducibility (R&R) study
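As a simplified sketch of what an R&R study estimates, the code below splits measurement variation into repeatability (within-operator) and reproducibility (between-operator) components for a balanced layout where several operators measure the same sample repeatedly. This one-way layout, the function name and the data are assumptions; a full gauge R&R study would also include parts and the operator-by-part interaction.

```python
import numpy as np

def gauge_rr(measurements):
    """Variance components for a balanced layout: rows = operators, columns = repeats."""
    x = np.asarray(measurements, dtype=float)
    n_ops, n_rep = x.shape
    # Mean square within operators: pooled variance of the repeats
    ms_within = x.var(axis=1, ddof=1).mean()
    # Mean square between operators, from the operator means
    ms_between = n_rep * x.mean(axis=1).var(ddof=1)
    repeatability = ms_within
    # One-way random-effects estimate, truncated at zero
    reproducibility = max(0.0, (ms_between - ms_within) / n_rep)
    return {
        "repeatability": repeatability,
        "reproducibility": reproducibility,
        "total_rr": repeatability + reproducibility,
    }

# Hypothetical data: 3 operators, 2 repeats each, same sample
data = [[10.0, 10.2], [10.4, 10.6], [9.8, 10.0]]
result = gauge_rr(data)
print(result)
```

If the total R&R variance is large relative to the effects one hopes to detect, the measurement system should be improved before running the designed experiment.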
Applications: Industry
• Design of experiments
– Randomized complete factorial
– No replicates
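A randomized complete factorial without replicates can be sketched as follows: generate all runs of a two-level design, shuffle the run order, and estimate effects from contrasts. Since there are no replicates to estimate pure error, Lenth's pseudo standard error is one common way to judge which effects stand out. The two-level coding, the restriction to main effects and two-factor interactions, and the simulated response are assumptions of this sketch; the slides do not state which analysis was used.

```python
import itertools
import random
import statistics

def full_factorial(k):
    """All 2^k runs in coded units (-1, +1)."""
    return list(itertools.product((-1, 1), repeat=k))

def randomized(runs, seed=None):
    """Return a copy of the runs in random order."""
    runs = list(runs)
    random.Random(seed).shuffle(runs)
    return runs

def effects(runs, y):
    """Main effects and two-factor interactions as contrast averages."""
    k, n = len(runs[0]), len(runs)
    out = {}
    for size in (1, 2):
        for cols in itertools.combinations(range(k), size):
            contrast = 0.0
            for run, yi in zip(runs, y):
                sign = 1
                for c in cols:
                    sign *= run[c]
                contrast += sign * yi
            out[cols] = contrast / (n / 2)
    return out

def lenth_pse(effect_values):
    """Lenth's pseudo standard error for unreplicated factorials."""
    abs_e = [abs(e) for e in effect_values]
    s0 = 1.5 * statistics.median(abs_e)
    trimmed = [e for e in abs_e if e < 2.5 * s0]
    if not trimmed:  # degenerate case, e.g. noise-free data
        return s0
    return 1.5 * statistics.median(trimmed)

runs = randomized(full_factorial(3), seed=1)
# Simulated, noise-free response with hypothetical coefficients
y = [10 + 4 * a - 2 * a * c for (a, b, c) in runs]
est = effects(runs, y)
print(est)
```

Effects are keyed by the tuple of factor indices involved, so `est[(0,)]` is the main effect of the first factor and `est[(0, 2)]` its interaction with the third.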
Applications: Industry
• Experiments
Applications: Industry
• Analysis
[Figure: most significant effects for 2,4-DNF and TNF]
The equilibrium temperature does not have a significant effect on the response.
Applications: Industry
• Modeling
Applications: Industry
• Conclusions
– Concentrations of by-products in the acid phase depend on just two parameters
– Processes provide you with the answers you need, if you ask them politely!
– Ask the right questions = design the right experiments
George E.P. Box
JMP-SAS in …
In silico
Big Data, Batch Processes
[Figure: three-way data array X (I × J × K) built from quantitative and qualitative variables, sensor data, spectra and images, together with quality parameters]
Scope & Motivation
• Batch processes
– Widely used in industry (high added-value specialties, but also commodities)
• Semiconductor (~s, min)
• Chemical and Petrochemical (~hr)
• Pharmaceutical (~days)
• Food & Drinks (~hr, weeks, years)
• (…)
– Flexible (multipurpose, many degrees of freedom for intervention, scalable to different production ranges)
Scope & Motivation
• Many Batch Process Monitoring (BPM) methods and variants have been proposed:
– 2-Way
• Batch-Wise unfolding (Nomikos & MacGregor, 1994, 1995)
• Variable-Wise unfolding (Wold et al., 1987, 1998)
– 3-Way
• PARAFAC (Bro, 1997; Westerhuis et al., 1999)
• TUCKER3 (Geladi, 1989; Louwerse & Smilde, 2000)
– Dynamic
• ARPCA (Choi et al., 2008)
• BDPCA (Chen & Liu, 2002)
– Hierarchical (Rännar, MacGregor & Wold, 1998)
– Local, Evolving (Ramaker et al., 2005)
– Kernel methods (Lee, J.-M. et al., 2004; Jia et al., 2010)
– Multiscale (Rato et al., 2015)
– (…)
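To make the 2-way approach concrete, the sketch below unfolds a three-way batch array batch-wise and computes the usual PCA monitoring statistics (Hotelling T² and Q) for a complete batch. The axis convention (I batches, J variables, K time points), the autoscaling, the number of components and the random data are assumptions; control limits and the infilling needed for online use are left out.

```python
import numpy as np

def batchwise_unfold(X):
    """Unfold an array (I batches x J variables x K times) into I x (K*J)."""
    I, J, K = X.shape
    # Columns ordered time slice by time slice: all J variables at k=0, then k=1, ...
    return X.transpose(0, 2, 1).reshape(I, K * J)

def fit_pca(X2d, n_components):
    """Autoscale the unfolded data and fit a PCA model via SVD."""
    mean = X2d.mean(axis=0)
    std = X2d.std(axis=0, ddof=1)
    std[std == 0] = 1.0  # guard against constant columns
    Z = (X2d - mean) / std
    _, s, Vt = np.linalg.svd(Z, full_matrices=False)
    P = Vt[:n_components].T
    score_var = s[:n_components] ** 2 / (Z.shape[0] - 1)
    return {"mean": mean, "std": std, "P": P, "score_var": score_var}

def monitor(model, x_row):
    """Hotelling T^2 and Q (squared prediction error) for one unfolded batch."""
    z = (x_row - model["mean"]) / model["std"]
    t = z @ model["P"]
    residual = z - t @ model["P"].T
    return float(np.sum(t ** 2 / model["score_var"])), float(residual @ residual)

# Hypothetical reference data: 20 batches, 3 variables, 5 time points
rng = np.random.default_rng(0)
X = rng.normal(size=(20, 3, 5))
unfolded = batchwise_unfold(X)
model = fit_pca(unfolded, n_components=2)
stats = [monitor(model, row) for row in unfolded]
print(stats[0])
```

Variable-wise unfolding would instead stack the time slices as rows, giving an (I·K) × J matrix, which changes what the PCA model treats as an observation.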
Proposed Methodology
• Number of independently generated simulated testing scenarios
5×3×50 + 3×6×50 = 750 + 900 = 1650
Proposed Methodology
• Methods and variants
– Synchronization
• None, indicator variable (IV), dynamic time warping (DTW)
– Modelling
• 2-way: batch-wise unfolding
– Infilling: zero deviation, current deviation, projection (missing data)
– Window size (Q statistics): 1, 3, 5
• 3-way: PARAFAC, Tucker3
– Infilling: zero deviation, current deviation, projection (missing data)
• Dynamic: ARPCA, BDPCA
– ARPCA: without (1) / with (2) normalization
– BDPCA: without (1) / with (2) normalization, and using DPCA-DR (3)
60 different methods / versions
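Of the synchronization options listed, dynamic time warping is the least self-explanatory. The sketch below is the textbook DTW recursion for two univariate sequences, returning only the accumulated distance. Real batch trajectories are multivariate and synchronization also needs the warping path, so this is an illustration of the principle under those simplifying assumptions, not the variant used in the study.

```python
def dtw_distance(a, b):
    """Classic dynamic time warping distance between two 1-D sequences."""
    n, m = len(a), len(b)
    inf = float("inf")
    # D[i][j]: minimal accumulated cost of aligning a[:i] with b[:j]
    D = [[inf] * (m + 1) for _ in range(n + 1)]
    D[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = abs(a[i - 1] - b[j - 1])
            D[i][j] = cost + min(D[i - 1][j], D[i][j - 1], D[i - 1][j - 1])
    return D[n][m]

# Two hypothetical trajectories with the same shape but different durations
reference = [1, 2, 3]
slower_batch = [1, 2, 2, 3]
print(dtw_distance(reference, slower_batch))
```

Because the second trajectory only lingers longer at one level, warping aligns it with the reference at zero cost, which a point-by-point comparison of equal-length windows could not do.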
Analysis of Results
[Figure: Multi-Vari Chart for Score by Sync - Models (SEMIEX). Score (axis from 20 to 90) by model class (2W, 3W, DYN, …) and synchronization (NONE, IV, DTW)]
Conclusions
• Conclusions
– The results are case-dependent to a great extent
– More consistent methods:
• 2-Way (Synchronization NONE or DTW / Window 1 or 3)
• Dynamic: ARPCA (ORIG or NORM)
– Synchronization
• IV tends to present a comparatively worse performance
– Infilling
• 2-Way: MD (missing data projection)
• 3-Way: CD (current deviation)
data + computation power + science = success
[science = domain knowledge + scientific method]
Conclusions & Discussion
The future …
Acknowledgements
SAS-JMP, Jos van der Velden, Volker Kraft
Ana Cristina Pereira
José Carlos Marques
Ana Leonor
Cristina Gaudêncio
Tiago Rato
Ricardo Rendall
Veronique Medeiros
http://www.eq.uc.pt/~marco/research/pclab/
http://www.enbis.org/ (European Network for Business and Industrial Statistics)