Data Analysis Examples Anthony E. Butterfield CH EN 4903-1
Jan 11, 2016
#1: The Normal PDF
• Your coworker tells you that the outlet temperature of a certain coal gasifier averages 1304 K and stays within 12 K of that mean for 95% of her measurements, over months of operation. If we assume the temperature measurements are normally distributed, what is the standard deviation, and what is the probability that a temperature measurement would be above 1310 K?
• T = 1304 ± 12 K (95% Confidence Level)
Normal Distribution
• Probability density function (PDF):
$$\mathrm{PDF}(x) = \frac{1}{\sigma\sqrt{2\pi}} \exp\left(-\frac{(x-\mu)^2}{2\sigma^2}\right)$$
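• Confidence interval (C.I.) from the confidence level (C.L.):
$$\mathrm{C.I.} = \sigma\sqrt{2}\,\mathrm{erf}^{-1}(\mathrm{C.L.})$$
$$\sigma = \frac{12}{\sqrt{2}\,\mathrm{erf}^{-1}(0.95)} = \frac{12}{\sqrt{2}\,(1.3859)} = 6.123\ \mathrm{K}$$
• Cumulative distribution function (CDF):
$$\mathrm{CDF}(x) = \int_{-\infty}^{x} \mathrm{PDF}\,dx = \frac{1}{2}\left[1 + \mathrm{erf}\left(\frac{x-\mu}{\sigma\sqrt{2}}\right)\right]$$
$$P(T > 1310) = 100\%\,\left[1 - \mathrm{CDF}(1310)\right] = 100\%\left[1 - \frac{1}{2}\left(1 + \mathrm{erf}\left(\frac{1310-1304}{6.123\sqrt{2}}\right)\right)\right] = 16.3\%$$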
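The arithmetic for Example #1 can be checked with a short Python sketch. It uses `statistics.NormalDist` from the standard library rather than an explicit inverse error function, relying on the identity that the standard normal quantile at 0.5 + C.L./2 equals √2·erf⁻¹(C.L.). The variable names are illustrative and not part of the original slides.

```python
from statistics import NormalDist

mean_K = 1304.0        # reported mean outlet temperature
half_width_K = 12.0    # +/- range that holds 95% of the measurements
confidence = 0.95

# z = sqrt(2) * erfinv(C.L.), obtained from the standard normal quantile
z = NormalDist().inv_cdf(0.5 + confidence / 2)
sigma_K = half_width_K / z

# Probability that a single measurement exceeds 1310 K
p_above = 1.0 - NormalDist(mean_K, sigma_K).cdf(1310.0)

print(f"sigma = {sigma_K:.3f} K")
print(f"P(T > 1310 K) = {100 * p_above:.1f} %")
```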
#2: Error Propagation
• In a falling bead viscometer, the viscosity may be found by the following equation:
$$\mu = \frac{2 g r^2 (\rho_B - \rho_F)}{9V}$$
• Here r is the bead radius, g is gravitational acceleration, V is the terminal velocity, ρB is the bead density and ρF is the fluid density. Suppose we find, within a 95% confidence level, that the bead density is 2 ± 0.1 g/cm³, the radius is 3 ± 0.1 mm, the fluid density is 1.1 ± 0.2 g/cm³, and, after terminal velocity is achieved, the bead falls 10 ± 0.2 cm in 12 ± 0.5 seconds. What is the calculated viscosity and the uncertainty in its value? Which measurement is the greatest source of error?
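• A couple options:
– Perturb each variable by its uncertainty and combine the resulting changes:
$$f_0 = f(x_1, x_2, x_3, \ldots, x_n)$$
$$f_i = f(x_1, x_2, x_3, \ldots, x_{i-1}, x_i + \delta_{x_i}, x_{i+1}, \ldots, x_n)$$
$$\delta_f^2 = \sum_{i=1}^{n} (f_0 - f_i)^2$$
– Use the partial derivatives:
$$G = f(x_1, x_2, x_3, \ldots, x_n)$$
$$\delta_G^2 = \sum_{i=1}^{n} \left(\frac{\partial G}{\partial x_i}\,\delta_{x_i}\right)^2$$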
      As given                    In cgs units                 Perturbed result
      Value    CI    Units        Value    CI    Units         f_i   (g/cm/s)
g     9.80665  0     m/s^2        980.665  0     cm/s^2        f1    21.18236
ρB    2        0.1   g/cm^3       2        0.1   g/cm^3        f2    23.53596
ρF    1.1      0.2   g/cm^3       1.1      0.2   g/cm^3        f3    16.47517
r     3        0.1   mm           0.3      0.01  cm            f4    22.61806
d     10       0.2   cm           10       0.2   cm            f5    20.76702
t     12       0.5   s            12       0.5   s             f6    22.06496

f0 = 21.18236 g/cm/s

i     (f0-fi)^2
1     0
2     5.539414
3     22.15766
4     2.061216
5     0.172508
6     0.77898
sum       30.70977
sum^.5    5.54164

Viscosity = 21.18236 ± 5.54164 g/cm/s
The largest term is i = 3, so the fluid density is the greatest source of error.
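The perturbation option worked through in the table can be sketched in Python as follows. This is a minimal illustration that assumes the cgs values from the table; the function and dictionary names are hypothetical and chosen only for this example.

```python
import math

def viscosity(g, rho_b, rho_f, r, d, t):
    """Falling-bead viscosity in g/cm/s, with terminal velocity V = d / t."""
    velocity = d / t
    return 2.0 * g * r**2 * (rho_b - rho_f) / (9.0 * velocity)

# Measured values and 95% confidence intervals, all in cgs units
values = {"g": 980.665, "rho_b": 2.0, "rho_f": 1.1, "r": 0.3, "d": 10.0, "t": 12.0}
errors = {"g": 0.0, "rho_b": 0.1, "rho_f": 0.2, "r": 0.01, "d": 0.2, "t": 0.5}

f0 = viscosity(**values)

# Shift one input at a time by its uncertainty and record (f0 - fi)^2
contributions = {}
for name, err in errors.items():
    shifted = dict(values)
    shifted[name] += err
    contributions[name] = (f0 - viscosity(**shifted)) ** 2

total_error = math.sqrt(sum(contributions.values()))
largest_source = max(contributions, key=contributions.get)

print(f"viscosity = {f0:.5f} +/- {total_error:.5f} g/cm/s")
print(f"largest contribution: {largest_source}")
```

Perturbing one input at a time avoids deriving the partial derivatives by hand, at the cost of one extra function evaluation per variable.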
#3: Log Normal
• You find the following particle size distribution from a spray dryer experiment (see the table of data). If we were to assume this distribution of particle sizes is log-normal, what would be the mean and standard deviation for the log-normal PDF?
• Nonlinear fitting problem, like #6.
$$\mathrm{PDF}(x) = \frac{1}{x\,\sigma\sqrt{2\pi}} \exp\left(-\frac{(\ln x - \mu)^2}{2\sigma^2}\right)$$
Range Max (µm)   Count   Percentage (cumulative)
0                0       0
0.5000           300     15.8898
1.0000           426     38.4534
1.5000           352     57.0975
2.0000           257     70.7097
2.5000           182     80.3496
3.0000           129     87.1822
3.5000           92      92.0551
4.0000           66      95.5508
4.5000           48      98.0932
5.0000           36      100.0000
5.5              0       100.0000
$$\mathrm{CDF}(x) = \frac{1}{2} + \frac{1}{2}\,\mathrm{erf}\left(\frac{\ln(x) - \mu}{\sigma\sqrt{2}}\right)$$
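One way to carry out the fit is to minimize the squared difference between the log-normal CDF and the cumulative percentages in the table. The Python sketch below does this with a simple coarse-to-fine grid search, chosen only to stay within the standard library. The slides do not state which solver was used, so the search bounds, helper names, and the choice to fit the CDF rather than the binned counts are all assumptions.

```python
import math

size_max = [0.0, 0.5, 1.0, 1.5, 2.0, 2.5, 3.0, 3.5, 4.0, 4.5, 5.0, 5.5]
cum_pct = [0.0, 15.8898, 38.4534, 57.0975, 70.7097, 80.3496,
           87.1822, 92.0551, 95.5508, 98.0932, 100.0, 100.0]

def lognormal_cdf(x, mu, sigma):
    """Log-normal CDF; defined as zero for x <= 0."""
    if x <= 0.0:
        return 0.0
    return 0.5 + 0.5 * math.erf((math.log(x) - mu) / (sigma * math.sqrt(2.0)))

def sse(mu, sigma):
    """Sum of squared errors between the model CDF and the measured fractions."""
    return sum((lognormal_cdf(x, mu, sigma) - p / 100.0) ** 2
               for x, p in zip(size_max, cum_pct))

def fit(mu_lo=-2.0, mu_hi=2.0, s_lo=0.05, s_hi=3.0, rounds=8, steps=20):
    """Coarse-to-fine grid search: shrink the window around the best point each round."""
    mu = sigma = None
    for _ in range(rounds):
        d_mu = (mu_hi - mu_lo) / steps
        d_s = (s_hi - s_lo) / steps
        grid = ((sse(mu_lo + i * d_mu, s_lo + j * d_s), mu_lo + i * d_mu, s_lo + j * d_s)
                for i in range(steps + 1) for j in range(steps + 1))
        _, mu, sigma = min(grid)
        mu_lo, mu_hi = mu - d_mu, mu + d_mu
        s_lo, s_hi = max(sigma - d_s, 1e-6), sigma + d_s
    return mu, sigma

mu_fit, sigma_fit = fit()
print(f"mu = {mu_fit:.4f}, sigma = {sigma_fit:.4f} (parameters of ln x, x in um)")
```

Here `mu_fit` and `sigma_fit` are the mean and standard deviation of ln(x), which are the two parameters of the log-normal PDF.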
#4: Hypothesis Testing
• On a certain stage of a distillation column, theory predicts the ethanol concentration should be 27%. You take the following measurements over several runs:
Percent Ethanol: 24.6, 27.6, 21.7, 24.1, 22.6, 24.5, 33.2, 21.7, 17.7, 27.5
• What is the likelihood that your measurements match theory?
• Student’s T-Test.
• Mean = 24.52
• StDev = 4.2163
• Degrees of Freedom: v = na – 1 = 10 – 1 = 9
$$v = \frac{\left(\dfrac{\sigma_a^2}{n_a} + \dfrac{\sigma_b^2}{n_b}\right)^2}{\dfrac{(\sigma_a^2/n_a)^2}{n_a - 1} + \dfrac{(\sigma_b^2/n_b)^2}{n_b - 1}}$$
• T-Statistic:
$$t = \frac{\bar{a} - \bar{b}}{\sqrt{\dfrac{\sigma_a^2}{n_a} + \dfrac{\sigma_b^2}{n_b}}}$$
$$t = \frac{24.52 - 27}{\sqrt{\dfrac{4.2163^2}{10} + \dfrac{0^2}{1}}} = -1.86$$
• Use t-statistic in CDF to find probability.
Answer = 9.6%
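The numbers in Example #4 can be reproduced with the Python sketch below. The standard library has no Student's t CDF, so the sketch integrates the t PDF numerically with Simpson's rule. The helper names `t_pdf` and `t_cdf` are hypothetical, and the two-tailed reading of "match theory" is an assumption.

```python
import math
import statistics

ethanol = [24.6, 27.6, 21.7, 24.1, 22.6, 24.5, 33.2, 21.7, 17.7, 27.5]
theory = 27.0

def t_pdf(t, v):
    """Student's t probability density with v degrees of freedom."""
    coeff = math.gamma((v + 1) / 2) / (math.sqrt(v * math.pi) * math.gamma(v / 2))
    return coeff * (1 + t * t / v) ** (-(v + 1) / 2)

def t_cdf(t, v, n=2000):
    """Student's t CDF from Simpson's rule on [0, |t|]; n must be even."""
    h = abs(t) / n
    total = t_pdf(0.0, v) + t_pdf(abs(t), v)
    for k in range(1, n):
        total += (4 if k % 2 else 2) * t_pdf(k * h, v)
    area = total * h / 3
    return 0.5 + area if t >= 0 else 0.5 - area

mean = statistics.mean(ethanol)
stdev = statistics.stdev(ethanol)   # sample standard deviation (n - 1)
v = len(ethanol) - 1
t_stat = (mean - theory) / (stdev / math.sqrt(len(ethanol)))
p_two_tail = 2 * t_cdf(-abs(t_stat), v)

print(f"mean = {mean:.2f}, stdev = {stdev:.4f}, t = {t_stat:.2f}")
print(f"two-tailed probability = {100 * p_two_tail:.1f} %")
```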
#5: Hypothesis Testing 2
• You are measuring the effectiveness of a new catalyst on a reaction with a great deal of normally distributed variability. You measure the time to 99% conversion of your reactants with both your new and old catalyst for several experimental runs and find the following data:
Old (min): 9.67, 11.51, 11.43, 9.76, 10.41, 10.82, 10.05, 10.27, 10.52, 8.66, 10.13, 12.78, 9.54, 9.92, 8.00, 10.63, 8.35, 10.81, 11.18, 10.85, 10.38, 9.90
New (min): 8.90, 9.94, 10.52, 8.70, 9.52, 10.80, 9.42, 10.23, 10.61, 9.33, 9.40, 9.14, 8.55, 11.14, 9.41, 10.35, 9.86, 9.57, 8.05, 6.47
• Given this data, what is the probability that the new catalyst is more effective than the old? What is the probability that they are equally effective?
• Mean A = 10.25, Mean B = 9.50
• StDev A = 1.071, StDev B = 1.066
• Number A = 22, Number B = 20
• Degrees of Freedom: v = na + nb – 2 = 40
$$v = \frac{\left(\dfrac{\sigma_a^2}{n_a} + \dfrac{\sigma_b^2}{n_b}\right)^2}{\dfrac{(\sigma_a^2/n_a)^2}{n_a - 1} + \dfrac{(\sigma_b^2/n_b)^2}{n_b - 1}}$$
• T-Statistic:
$$t = \frac{\bar{a} - \bar{b}}{\sqrt{\dfrac{\sigma_a^2}{n_a} + \dfrac{\sigma_b^2}{n_b}}}$$
$$t = \frac{10.25 - 9.50}{\sqrt{\dfrac{1.071^2}{22} + \dfrac{1.066^2}{20}}} = 2.2950$$
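The summary statistics and t-statistic for Example #5 can be recomputed from the raw data with the Python sketch below. It also evaluates the general degrees-of-freedom expression for comparison with the simpler v = na + nb – 2. The split of the data into "old" and "new" lists assumes the first 22 extracted values belong to the old catalyst, which reproduces the means and standard deviations quoted on the slide. The variable names are illustrative.

```python
import math
import statistics

old = [9.67, 11.51, 11.43, 9.76, 10.41, 10.82, 10.05, 10.27, 10.52, 8.66, 10.13,
       12.78, 9.54, 9.92, 8.00, 10.63, 8.35, 10.81, 11.18, 10.85, 10.38, 9.90]
new = [8.90, 9.94, 10.52, 8.70, 9.52, 10.80, 9.42, 10.23, 10.61, 9.33,
       9.40, 9.14, 8.55, 11.14, 9.41, 10.35, 9.86, 9.57, 8.05, 6.47]

mean_a, mean_b = statistics.mean(old), statistics.mean(new)
s_a, s_b = statistics.stdev(old), statistics.stdev(new)   # sample standard deviations
n_a, n_b = len(old), len(new)

# Squared standard errors of each mean
var_a, var_b = s_a**2 / n_a, s_b**2 / n_b

t_stat = (mean_a - mean_b) / math.sqrt(var_a + var_b)

# Degrees of freedom: simple form and the general (unequal-variance) form
v_simple = n_a + n_b - 2
v_general = (var_a + var_b) ** 2 / (var_a**2 / (n_a - 1) + var_b**2 / (n_b - 1))

print(f"mean A = {mean_a:.2f}, mean B = {mean_b:.2f}")
print(f"stdev A = {s_a:.3f}, stdev B = {s_b:.3f}")
print(f"t = {t_stat:.4f}, v = {v_simple} (general form: {v_general:.1f})")
```

Because the two standard deviations are nearly equal, the general expression gives a value very close to 40, so either choice leads to essentially the same probability.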
• Simple rule:
– Greater-than or less-than tests use one tail (two unequal areas), and you can easily tell which percentage you want by looking at the means.
– An equality test uses two equal tails.
• For the T-CDF with v = 40 at a t-statistic of -2.295 (new minus old), the lower-tail area is P ≈ 1.35%; the two tails together hold 2.7%.
• The probability that the new catalyst is more effective is a one-tail test.
• More effective (one tail) = 100% - 1.35% = 98.65%
• Equal (two tail) = 2 × 1.35% = 2.7%
#6: Non-Linear Fit
• The rates of population growth in a bacteria culture are found to be:
Time (hr)   Rate (SRU)
0           0.0078
0.3158      0.2993
0.6316      0.1895
0.9474      0.3645
1.2632      0.3097
1.5789      0.2532
1.8947      0.3469
2.2105      0.3726
2.5263      0.0260
2.8421      -0.0107
3.1579      -0.0246
3.4737      -0.0623
3.7895      -0.2936
4.1053      -0.3387
4.4211      -0.2570
4.7368      -0.4667
5.0526      -0.2095
5.3684      -0.1778
5.6842      -0.2522
6.0000      -0.0271
• It is thought that this data could be fit to the equation:
Rate = b1*sin(b2*t)
where b1 and b2 are constants to be determined and t is time. Determine the least squares estimates of b1 and b2 and give an appropriate confidence interval for each at a confidence level of 90%. Also, what would you anticipate the rate to be at 24 hr? What would the confidence interval at 24 hr be for a 95% confidence level?
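• Least squares: for a model with m parameters and n data points,
$$Y = f(X, c_1, c_2, \ldots, c_m)$$
$$S(c_1, c_2, \ldots, c_m) = \sum_{i=1}^{n} \left[y_i - Y(x_i, c_1, c_2, \ldots, c_m)\right]^2$$
$$\frac{\partial S}{\partial c_1} = 0,\quad \frac{\partial S}{\partial c_2} = 0,\ \ldots,\ \frac{\partial S}{\partial c_m} = 0$$
• For Rate = b1*sin(b2*t):
$$S = \sum_{i=1}^{n} \left[y_i - b_1 \sin(b_2 t_i)\right]^2$$
$$\frac{\partial S}{\partial b_1} = -2\sum_{i=1}^{n} \left[y_i - b_1 \sin(b_2 t_i)\right]\sin(b_2 t_i) = 0$$
$$\frac{\partial S}{\partial b_2} = -2\sum_{i=1}^{n} \left[y_i - b_1 \sin(b_2 t_i)\right] b_1 t_i \cos(b_2 t_i) = 0$$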
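The two conditions above cannot be solved for b1 and b2 in closed form, so they are solved iteratively. The Python sketch below uses a Gauss-Newton iteration with step halving on the tabulated data. It is a stand-in for illustration only and is not the algorithm inside MATLAB's `nlinfit`. The starting guess [1, 1] matches the MATLAB script; the function names are hypothetical, and the confidence intervals are not computed here.

```python
import math

time_hr = [0.0, 0.3158, 0.6316, 0.9474, 1.2632, 1.5789, 1.8947, 2.2105, 2.5263, 2.8421,
           3.1579, 3.4737, 3.7895, 4.1053, 4.4211, 4.7368, 5.0526, 5.3684, 5.6842, 6.0]
rate = [0.0078, 0.2993, 0.1895, 0.3645, 0.3097, 0.2532, 0.3469, 0.3726, 0.0260, -0.0107,
        -0.0246, -0.0623, -0.2936, -0.3387, -0.2570, -0.4667, -0.2095, -0.1778, -0.2522,
        -0.0271]

def sse(b1, b2):
    """Sum of squared residuals S for Rate = b1*sin(b2*t)."""
    return sum((y - b1 * math.sin(b2 * t)) ** 2 for t, y in zip(time_hr, rate))

def gauss_newton(b1=1.0, b2=1.0, iterations=100):
    """Solve (J^T J) d = J^T r at each step, halving the step until S does not increase."""
    for _ in range(iterations):
        a11 = a12 = a22 = g1 = g2 = 0.0
        for t, y in zip(time_hr, rate):
            resid = y - b1 * math.sin(b2 * t)
            j1 = math.sin(b2 * t)            # d(model)/d(b1)
            j2 = b1 * t * math.cos(b2 * t)   # d(model)/d(b2)
            a11 += j1 * j1
            a12 += j1 * j2
            a22 += j2 * j2
            g1 += j1 * resid
            g2 += j2 * resid
        det = a11 * a22 - a12 * a12
        if abs(det) < 1e-300:
            break
        d1 = (a22 * g1 - a12 * g2) / det
        d2 = (a11 * g2 - a12 * g1) / det
        step = 1.0
        while step > 1e-8 and sse(b1 + step * d1, b2 + step * d2) > sse(b1, b2):
            step /= 2.0
        b1 += step * d1
        b2 += step * d2
        if abs(step * d1) + abs(step * d2) < 1e-12:
            break
    return b1, b2

b1_fit, b2_fit = gauss_newton()
rate_24 = b1_fit * math.sin(b2_fit * 24.0)
print(f"b1 = {b1_fit:.4f}, b2 = {b2_fit:.4f}, predicted rate at 24 hr = {rate_24:.4f}")
```

The prediction at 24 hr is very sensitive to b2, because a small change in b2 is multiplied by 24 inside the sine. This is consistent with the wide prediction interval reported at that time.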
% Anthony Butterfield 2009
% Example of nonlinear fit with CIs
clear
close all
b(1) = 1/3;
b(2) = 1;
re = 0.1;                   % random noise strength
x  = linspace(0,6,20)';     % x data for fitting
x2 = linspace(0,6,100)';    % x data for plotting
n  = length(x);
% Model function. The original script called an external nlinfitsin.m that is not shown;
% this anonymous function assumes it implemented y = b1*sin(b2*x).
model = @(b,x) b(1)*sin(b(2)*x);
y  = model(b,x) + re*randn(n,1);   % y data for fitting, with random error added to make it realistic
yt = model(b,x2);                  % theoretical y data for plotting
[beta, r, J] = nlinfit(x, y, model, [1 1]);                    % numerically performs a nonlinear fit
bci = nlparci(beta, r, 'Jacobian', J);                         % returns the c.i. for the parameters, beta
[ypred, delta] = nlpredci(model, x2, beta, r, 'Jacobian', J);  % returns a predicted y and the c.i. for each y
disp('Fit to equation: y = b1 sin(b2 * x)')
disp('  x data  y data')
for i = 1:n
    txt = sprintf('  %5.3f  %5.3f', x(i), y(i));
    disp(txt)
end
txt = sprintf('b1 was %3.1f, and is estimated to be: %f ± %f (95%% CL)', b(1), beta(1), abs(beta(1)-bci(1,1)));
disp(txt)
txt = sprintf('b2 was %3.1f, and is estimated to be: %f ± %f (95%% CL)', b(2), beta(2), abs(beta(2)-bci(2,1)));
disp(txt)
figure(1)
hold on
grid on
scatter(x, y, 10, 'r')
plot(x2, yt, 'Color', [1 0.5 0])   % example of how to change the line color to something not preset
plot(x2, ypred, 'b', x2, ypred+delta, 'b:', x2, ypred-delta, 'b:')
hold off
nlparci
• In “theory” b1 = 1/3 (≈ 0.33); estimated b1 = 0.35 ± 0.05 (90% CL)
• In “theory” b2 = 1.0; estimated b2 = 1.04 ± 0.04 (90% CL)
nlpredci
• At 24 hr “theory” predicts: Rate = -0.3019
• Fit predicts: Rate = -0.1090 ± 0.3839 (95% CL)