Statistics for Engineering
Section 6: Latent variable modelling
Kevin Dunn
Copyright, and all rights reserved, Kevin Dunn, 2011
[email protected]
April 2011
What we will cover
I What is a latent variable?
I Applications of latent variable methods
Extracting value from data
Ankit - 4C3 student in 2010 - now at Tenova:
I Now, having worked for over a year, I find myself referring back to my notes all the time and appreciating the concepts about how to look at data and represent the data in the best possible manner, especially since on a daily basis I look at a gigantic amount of data and am required to make sense of it.
I I think what I loved most about the course was the emphasis on the thinking and process of getting to a solution instead of the final solution itself, which has been an important attribute to becoming a good engineer or a problem solver/troubleshooter.
Extracting value from data
Engineers deal with large quantities of data from many different sources. We can do some interesting and profitable things with these data:
1. Improve process understanding
2. Troubleshooting process problems
3. Improving, optimizing and controlling processes
4. Predictive modelling (inferential sensors)
5. Process monitoring
What is a latent variable?
Your health
I No single measurement of “health”
  I blood pressure
  I cholesterol
  I weight
  I various length/circumference measurements
  I various ratios: BMI; waist/hip; etc
  I blood sugar
  I temperature, etc
I Combine these in some way? A trained doctor does this mentally.
Health is a latent (hidden) variable
What is a latent variable?
Temperature in this room
I What drives the movement up and down?
I Correlation of thermometers with the driving force.
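The thermometer idea can be tried out numerically. The sketch below is only an illustration under assumed numbers: four thermometers, 200 time points, and noise levels picked for the example (none of these come from the slides). Each thermometer reads a hidden "true" room temperature plus its own error, so every reading correlates strongly with the hidden driving force, and a simple average of the readings already tracks it closely.

```python
import math
import random

random.seed(0)  # fixed seed so the sketch is repeatable

N = 200  # number of time points (assumed)
# Hidden driving force: the true room temperature, never measured directly
true_temp = [21.0 + random.gauss(0.0, 2.0) for _ in range(N)]

# Four thermometers: each reads the true temperature plus its own small error
thermometers = [[t + random.gauss(0.0, 0.3) for t in true_temp] for _ in range(4)]

def correlation(a, b):
    """Pearson correlation coefficient between two equal-length lists."""
    mean_a = sum(a) / len(a)
    mean_b = sum(b) / len(b)
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    return cov / math.sqrt(var_a * var_b)

# Crude estimate of the latent variable: average the four readings at each time
latent_estimate = [sum(readings) / 4 for readings in zip(*thermometers)]

corr_with_driver = [correlation(th, true_temp) for th in thermometers]
corr_estimate = correlation(latent_estimate, true_temp)
print([round(c, 3) for c in corr_with_driver], round(corr_estimate, 3))
```

In real data the driving force is not available for comparison; it is included here only because the data are simulated.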
What is a latent variable?
Temperature in this room: geometrically
I Each measurement in time is one point
What is a latent variable?
I Rotating plot demo
Why do we need latent variable methods?
I Shewhart chart for two variables, x1 and x2
I e.g. final product quality from lab values
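A small simulation suggests why two separate Shewhart charts can miss a problem. This is a sketch under assumptions: the data are simulated with a correlation of roughly 0.95, the limits are the usual 3 standard deviations, and the joint limit of 11.83 is a chi-squared approximation chosen for this example. A new point that is 2 standard deviations high on x1 and 2 low on x2 passes both individual charts, yet it lies far from the joint pattern of the data.

```python
import math
import random

random.seed(1)

# Simulated in-control history: x2 closely follows x1 (correlation around 0.95)
x1 = [random.gauss(0.0, 1.0) for _ in range(500)]
x2 = [0.95 * a + math.sqrt(1 - 0.95 ** 2) * random.gauss(0.0, 1.0) for a in x1]

def mean(v):
    return sum(v) / len(v)

def stdev(v):
    m = mean(v)
    return math.sqrt(sum((a - m) ** 2 for a in v) / (len(v) - 1))

m1, m2, s1, s2 = mean(x1), mean(x2), stdev(x1), stdev(x2)
r = sum((a - m1) * (b - m2) for a, b in zip(x1, x2)) / ((len(x1) - 1) * s1 * s2)

def inside_shewhart(value, centre, sd):
    """True if the value lies within the usual +/- 3 standard deviation limits."""
    return abs(value - centre) <= 3 * sd

def joint_distance(a, b):
    """Squared Mahalanobis distance of the point (a, b) from the data centre."""
    z1, z2 = (a - m1) / s1, (b - m2) / s2
    return (z1 ** 2 - 2 * r * z1 * z2 + z2 ** 2) / (1 - r ** 2)

# New point: x1 is 2 sd high while x2 is 2 sd low, which breaks the correlation
new_x1, new_x2 = m1 + 2 * s1, m2 - 2 * s2
ok_on_both_charts = inside_shewhart(new_x1, m1, s1) and inside_shewhart(new_x2, m2, s2)
distance = joint_distance(new_x1, new_x2)
JOINT_LIMIT = 11.83  # assumed: 99.73% point of chi-squared with 2 degrees of freedom
print(ok_on_both_charts, distance > JOINT_LIMIT)
```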
Why LVM? For process monitoring
LVM for regression
Multiple linear regression model: y = b1x1 + b2x2
I We get stable estimates for b1 and b2 when the plane is “well supported” by the measured points
I Think of DOE: we intentionally move to the corner points to fit the model
I Stable estimates are desirable:
  I for learning about the process
  I for accurate predictions in the future
LVM for regression
I But what if the two x-variables are strongly correlated?
I The plane rotates for small changes in x-variables
I The slope coefficients change (can even change sign!)
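The instability can be seen in a small simulation. The sketch below assumes a true model y = x1 + x2 with added noise, 50 points per data set and 20 repeated data sets; all of these numbers are chosen for illustration. It fits the two-slope model by solving the 2-by-2 normal equations directly. When x2 is nearly a copy of x1, the estimate of b1 swings widely from one data set to the next, while the sum b1 + b2 stays close to 2.

```python
import random

def fit_two_slopes(x1, x2, y):
    """Least-squares estimates of b1, b2 in y = b1*x1 + b2*x2 (no intercept)."""
    s11 = sum(a * a for a in x1)
    s22 = sum(b * b for b in x2)
    s12 = sum(a * b for a, b in zip(x1, x2))
    s1y = sum(a * c for a, c in zip(x1, y))
    s2y = sum(b * c for b, c in zip(x2, y))
    det = s11 * s22 - s12 * s12  # close to zero when x1 and x2 are collinear
    return (s22 * s1y - s12 * s2y) / det, (s11 * s2y - s12 * s1y) / det

def repeated_fits(correlated, repeats=20, n=50):
    """Refit the model on fresh simulated data sets; the true b1 = b2 = 1."""
    fits = []
    for seed in range(repeats):
        rng = random.Random(seed)
        x1 = [rng.gauss(0, 1) for _ in range(n)]
        if correlated:
            x2 = [a + rng.gauss(0, 0.05) for a in x1]  # nearly a copy of x1
        else:
            x2 = [rng.gauss(0, 1) for _ in range(n)]   # unrelated to x1
        y = [a + b + rng.gauss(0, 0.5) for a, b in zip(x1, x2)]
        fits.append(fit_two_slopes(x1, x2, y))
    return fits

for label, correlated in [("independent x", False), ("correlated x", True)]:
    b1_values = [b1 for b1, _ in repeated_fits(correlated)]
    print(label, "b1 ranges from", round(min(b1_values), 2), "to", round(max(b1_values), 2))
```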
LVM for regression
I What can I do about it?
I Suboptimal solution
  I Recognize that x1 and x2 are correlated
  I Choose either x1 or x2 in the model:
    I y = b0 + b1x1
    I y = b0 + b2x2
I Problems with correlated data
  I which variables do you choose to keep or throw out?
  I can I use an average of the two correlated variables?
  I how do you know when a correlation is “too strong”, before it’s problematic?
I Variable selection is not easy ...
  I Demo on a small example
I Solution: don’t select variables!
LVM for troubleshooting
I ∼300 measurements
I 3.5 months of data
I How to uncover problems and visualize this much data?
I Carol Slama, Masters Thesis, McMaster University, 1991
I http://digitalcommons.mcmaster.ca/opendissertations/3301
LVM for troubleshooting
Monomer recovery flowsheet
I 447 tags measured (i.e. a 447 dimensional data cube)
I Data on about 500 days of operation
LVM for troubleshooting
I Reduction in monomer recovery ∼ day 400. Target recovery = 92%
I Engineers looked at various time series plots for several weeks
I 100 days elapsed without finding the cause
LVM for troubleshooting
I A latent variable model with 2 latent variables was built
I Compressed the 447 variables to 2 variables
I Retains most of the information
I Interrogate the latent variables to see what changed ...
LVM for troubleshooting: contribution plot
I Shows difference between two points in the score plot
I 207: temperature on tray 129 in distillation column #3
I 158: a tag from distillation column #3
I 33 and 277: related to concentration of feed A
I Suggests: bad temperature control on tray 129 when feed concentration is high
I Fixed the controller and recovery went back to normal
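One common way to compute such a contribution plot is to take the difference between two observations, tag by tag, and weight it by the loading of each tag; the contributions then add up to the change in score. The sketch below assumes that definition and uses made-up loadings and observations for four generic tags. None of the numbers come from the monomer case study.

```python
# Hypothetical values for illustration only; none come from the monomer case study.
tags = ["tag_1", "tag_2", "tag_3", "tag_4"]
loadings = [0.5, 0.5, -0.5, 0.5]   # loadings of one latent variable (unit length)
x_before = [0.1, -0.2, 0.0, 0.3]   # scaled observation from normal operation
x_after = [0.2, 2.8, -0.1, 0.2]    # scaled observation after the problem started

def score(x):
    """Score of an observation: the loading-weighted sum of its scaled values."""
    return sum(value * p for value, p in zip(x, loadings))

def contributions(x_from, x_to):
    """How much each tag contributes to the change in score between two points."""
    return [(b - a) * p for a, b, p in zip(x_from, x_to, loadings)]

contrib = contributions(x_before, x_after)
largest = max(range(len(tags)), key=lambda k: abs(contrib[k]))
print(tags[largest], [round(c, 2) for c in contrib])
```

The tag with the largest bar is where to start looking; it points to what changed, not necessarily to the root cause.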
Types of data we deal with
1920s to 1950s:
I small number of columns
I scatter plots
I time-series plots for each column
I Shewhart and EWMA charts
I multiple linear regression (MLR)
I carefully chose which columns to measure
  I independent
  I low error
Types of data we deal with
I Small N and small K
  I expensive measurement, low frequency
  I use scatterplots, linear regression, etc.
I Small N and large K
  I Cannot use MLR directly: K > N
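The reason MLR breaks down when K > N is that the least-squares solution requires inverting X^T X, a K-by-K matrix whose rank can be at most N. A tiny made-up example with N = 2 observations and K = 3 variables shows the determinant of X^T X is exactly zero, so the inverse does not exist.

```python
# Made-up data: N = 2 observations (rows), K = 3 variables (columns)
X = [[1, 2, 3],
     [4, 5, 6]]

def xtx(X):
    """Form the K-by-K matrix X^T X."""
    k = len(X[0])
    return [[sum(row[i] * row[j] for row in X) for j in range(k)] for i in range(k)]

def det3(m):
    """Determinant of a 3-by-3 matrix by cofactor expansion along the first row."""
    (a, b, c), (d, e, f), (g, h, i) = m
    return a * (e * i - f * h) - b * (d * i - f * g) + c * (d * h - e * g)

M = xtx(X)
determinant = det3(M)
print(M, determinant)
```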
Types of data we deal with
I Large N and small K
  I Refinery, most chemical plants
  I 2000 to 5000 variables (called tags) every second
  I 50 to 100 Mb per second
35 temperatures, 5 to 10 flow rates, 10 pressures, 5 derived values
Types of data we deal with
I X and Y matrices
I Predict one or more variables
I Could use MLR; fails for highly correlated data
Types of data we deal with
I 3D data sets and higher dimensions
I Very common situation now
I Image data (medical imaging)
I 4th dimension: time
I Very high redundancy: neighbouring pixels are similar (spatially and in time)
Types of data we deal with
I Batch data sets
Issues faced with engineering data
I Size of the data
  I rows: we can deal with this
  I columns: K(K − 1)/2 pairs of scatterplots
I Lack of independence
  I X^T X becomes singular
  I make-shift approach: pick a reduced set of columns
I Low signal to noise ratio
  I aim to keep our processes constant
  I little signal and high noise
  I data collected is mostly uninformative: constant, noisy, has drift and error
  I Called “happenstance data”
Issues faced with engineering data
Non-causal data
I Happenstance data is non-causal
  I Only see correlation effects
  I Good enough in many cases
I Opposite case: a designed experiment
  I cause-and-effect
Issues faced with engineering data
I Errors in the data
  I Least squares: assumes no error in X
  I Not realistic in most cases
I Missing data
Issues faced with engineering data
Tools that we require:
I extract relevant information from data
I deal with missing data
I 3-D, 4-D and higher data sets
I combine data from different sources (same object)
I handle collinearity (low signal to noise ratio)
I handle error in recorded data
Latent variable methods are a suitable tool that meets these requirements.
Examples of interesting data sets: Millau Viaduct
In France: expressway connecting Paris and Barcelona
I Pylons, deck, masts have anemometers, accelerometers, inclinometers, extensometers, and temperature sensors
I Detect movement (micrometer level), monitor for oscillations, stress/strain
I Piezoelectric sensors gather traffic data: weight, speed, density of traffic
I Can distinguish between fourteen different types of vehicle
I 100 readings per second from the main pylon
I Data transmitted via ethernet and fibre optic cables
Other data sources
I Chemical plants are moving to wireless sensors and networks
I More and more data available and accessible to engineers than ever before
I Prior to about 2005: data recorded, but not easily available
The “large data set” trap
It’s not about the size of your data ... it’s what you do with it.
I Many rows?
  I use a for loop
  I use parallel computing
  I Amazon EC2:
    I Simple CPU rent: $0.17/hour
    I 23 GB memory, 4-core CPU, 1.7 TB storage, 64-bit: $1.60/hour
I Many columns?
  I are they really all independent?
  I use latent variable methods
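As one possible reading of "use a for loop", the sketch below processes rows one at a time and keeps only running totals, so the full data set never has to be in memory. It uses Welford's online algorithm for the mean and variance of each column, which is a technique chosen for this illustration and not something named in the slides; the four rows of data are made up.

```python
def running_mean_and_variance(rows):
    """One pass over the rows (any iterable), using Welford's online algorithm.

    Returns the per-column means and sample variances (divisor n - 1).
    Needs at least two rows.
    """
    count = 0
    means = None
    sum_sq = None  # running sum of squared deviations from the current mean
    for row in rows:
        if means is None:
            means = [0.0] * len(row)
            sum_sq = [0.0] * len(row)
        count += 1
        for j, value in enumerate(row):
            delta = value - means[j]
            means[j] += delta / count
            sum_sq[j] += delta * (value - means[j])
    variances = [s / (count - 1) for s in sum_sq]
    return means, variances

# Made-up rows; a generator stands in for a file or database read row by row
rows = ([float(i), 10.0 * i] for i in range(1, 5))
col_means, col_variances = running_mean_and_variance(rows)
print(col_means, col_variances)
```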
Principal component analysis (PCA)
Main aim: data reduction
Principal component analysis (PCA)
I PCA considers a single matrix: X
I N observations
I K variables
I What goes in X?
  I Any variable we measure
  I Calculations about the process
  I Theoretical knowledge: e.g. dimensionless numbers
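To make the N-by-K matrix concrete, here is a minimal sketch that extracts the first principal component of a small made-up X (N = 6, K = 3) using the NIPALS iteration. Two assumptions are made that the slide does not state: each column is mean-centred and scaled to unit variance before the model is fitted, and NIPALS is used as the fitting method. The result is one score per observation (t) and one loading per variable (p).

```python
import math

# Made-up data: 6 observations (rows) of 3 strongly related variables (columns)
X_raw = [[1.0, 2.1, 6.2], [2.0, 3.9, 4.9], [3.0, 6.2, 4.1],
         [4.0, 7.8, 2.8], [5.0, 10.1, 2.1], [6.0, 12.0, 0.9]]

def autoscale(X):
    """Centre each column to mean 0 and scale it to unit standard deviation."""
    n, k = len(X), len(X[0])
    means = [sum(row[j] for row in X) / n for j in range(k)]
    sds = [math.sqrt(sum((row[j] - means[j]) ** 2 for row in X) / (n - 1))
           for j in range(k)]
    return [[(row[j] - means[j]) / sds[j] for j in range(k)] for row in X]

def first_component(X, tol=1e-12, max_iter=500):
    """NIPALS iteration for the first principal component of a scaled X."""
    n, k = len(X), len(X[0])
    t = [row[0] for row in X]  # starting guess for the scores: first column
    for _ in range(max_iter):
        tt = sum(v * v for v in t)
        p = [sum(X[i][j] * t[i] for i in range(n)) / tt for j in range(k)]
        length = math.sqrt(sum(v * v for v in p))
        p = [v / length for v in p]  # loadings, normalized to unit length
        t_new = [sum(X[i][j] * p[j] for j in range(k)) for i in range(n)]
        change = sum((a - b) ** 2 for a, b in zip(t_new, t))
        t = t_new
        if change < tol:
            break
    return t, p

X = autoscale(X_raw)
t, p = first_component(X)

# Fraction of the total variation in X explained by this one component
ss_total = sum(v * v for row in X for v in row)
ss_resid = sum((X[i][j] - t[i] * p[j]) ** 2 for i in range(len(X)) for j in range(len(p)))
r2 = 1 - ss_resid / ss_total
print([round(v, 3) for v in p], round(r2, 3))
```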
Food texture example
5 quality attributes are measured from pastries:
1. Percentage oil
2. Density
3. Crispiness measurement: from 7 (soft) to 15 (crispy)
4. Fracture angle
5. Hardness: force required before it breaks
I 54 measurements on these 5 variables
I How to visualize?
  I Scatter plot matrix (good when K < 10)
Food texture example: PCA model
I 87% of variance of 5 variables captured with 2 variables (13
I We cannot independently move the 5 quality variables
Improved process understanding
I Learn which variables are correlated
I Competitor has much less variability!
I Can we reproduce the competitor’s product? Yes
Troubleshooting process problems
I Yield was declining
I 6 measurements: 3 size-related, 3 from the lab
Process monitoring
Monitoring with latent variables:
I We have K variables (tags)
I Reduce this to A scores (latent variables)
I Combine these A scores into a single value: Hotelling’s T^2
I Residual errors are combined into a single value: SPE (squared prediction error)
Advantages:
I The scores are orthogonal
I Fewer scores than original variables
I Monitor anywhere that there is real-time data
I Don’t have to wait for the lab’s final measurement
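The two monitoring statistics can be sketched for a single new observation. The example assumes a model with K = 3 tags and A = 2 latent variables whose loadings and score variances are made-up numbers, and it assumes the observation has already been centred and scaled. T^2 sums the squared scores, each divided by its variance; SPE is the sum of squared differences between the observation and its reconstruction from the scores.

```python
import math

# Hypothetical model: K = 3 tags, A = 2 latent variables (numbers are made up)
P = [[1 / math.sqrt(3)] * 3,                       # loadings, latent variable 1
     [1 / math.sqrt(2), -1 / math.sqrt(2), 0.0]]   # loadings, latent variable 2
score_variances = [2.5, 0.4]  # variance of each score in the model-building data

def monitor(x):
    """Return (T^2, SPE) for one observation x that is already centred and scaled."""
    scores = [sum(p_k * x_k for p_k, x_k in zip(p, x)) for p in P]
    t2 = sum(t * t / v for t, v in zip(scores, score_variances))
    # Reconstruct x from its scores; whatever is left over is the residual
    x_hat = [sum(scores[a] * P[a][k] for a in range(len(P))) for k in range(len(x))]
    spe = sum((x_k - xh_k) ** 2 for x_k, xh_k in zip(x, x_hat))
    return t2, spe

# Follows the model's correlation pattern: non-zero T^2, SPE essentially zero
print(monitor([1.0, 1.0, 1.0]))
# Breaks the correlation pattern: T^2 is zero, yet SPE is large
print(monitor([1.0, 1.0, -2.0]))
```

The second observation shows why both values are tracked: its scores look perfectly ordinary, and only the SPE reveals that it does not fit the model.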
Industrial case study: Dofasco
I ArcelorMittal in Hamilton (formerly called Dofasco) has used multivariate process monitoring tools since the 1990s
I Over 80 latent variable applications used daily
I Most well known is their casting monitoring application, Caster SOS (Stable Operation Supervisor)
I It is a multivariate monitoring system
Dofasco case study: slabs of steel
All screenshots with permission of Dr. John MacGregor
Dofasco case study: casting
Dofasco case study: breakout
Dofasco case study: monitoring for breakouts
I Stability Index 1 and 2:
  I Hotelling’s T^2
  I SPE
I When an alarm occurs: shows contribution plots
I Shows real-time raw data, as operator requires it
Predictive modelling (inferential sensors)
I MLR has some serious disadvantages
  I Cannot handle missing data
  I Cannot handle strong correlations
  I MLR requires N > K
  I Only one y-variable at a time
  I Assumes no noise in X, which is never true
Predictive modelling (inferential sensors)
I Inferential sensor = soft sensor
I Image data used as X
I Snackfood example: http://dx.doi.org/10.1021/ie020941f
I Work by Honglu Yu
  I http://digitalcommons.mcmaster.ca/opendissertations/866/
Figure 2 from the paper
Predictive modelling (inferential sensors)
Figure 8 from the paper: http://dx.doi.org/10.1021/ie020941f
Predictive modelling (inferential sensors)
Figure 10 from the paper: http://dx.doi.org/10.1021/ie020941f
Summary: Extracting value from data
1. Improve process understanding
I Pastry example
I Competitor example
2. Troubleshooting process problems
I Monomer example
I Poor vs Adequate yield
3. Improving, optimizing and controlling processes
I Pastry example
4. Predictive modelling (inferential sensors)
I Snackfood example
5. Process monitoring
I Dofasco example
I Batch example