Data Analysis Examples Anthony E. Butterfield CH EN 4903-1.


#1: The Normal PDF

• Your coworker tells you that the outlet temperature of a certain coal gasifier fluctuates around an average of 1304 K. Over months of operation, 95% of her measurements stayed within 12 K of that mean. If we assume the temperature measurements are normally distributed, what is the standard deviation, and what is the probability that a temperature measurement will be above 1310 K?

• T = 1304 ± 12 K (95% Confidence Level)

• Normal distribution probability density function (PDF):

$$ \mathrm{PDF}(x) = \frac{1}{\sigma\sqrt{2\pi}} \exp\left[-\frac{(x-\mu)^2}{2\sigma^2}\right] $$

#1: The Normal PDF

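• Relating the confidence interval (CI) to σ at confidence level CL:

$$ \mathrm{CI} = \sqrt{2}\,\sigma\,\mathrm{erf}^{-1}(\mathrm{CL}) $$

$$ \int_{\mu-12}^{\mu+12} \frac{1}{\sigma\sqrt{2\pi}} \exp\left[-\frac{(x-\mu)^2}{2\sigma^2}\right] dx = \mathrm{erf}\left(\frac{12}{\sqrt{2}\,\sigma}\right) = 0.95 $$

$$ \sigma = \frac{12}{\sqrt{2}\,\mathrm{erf}^{-1}(0.95)} = \frac{12}{1.95996} = 6.123\ \mathrm{K} $$

• Probability of a measurement above 1310 K, from the cumulative distribution function (CDF):

$$ \mathrm{CDF}(x) = \int_{-\infty}^{x} \mathrm{PDF}\,dx = \frac{1}{2}\left[1 + \mathrm{erf}\left(\frac{x-\mu}{\sqrt{2}\,\sigma}\right)\right] $$

$$ P(T > 1310\ \mathrm{K}) = 100\% \times \left\{1 - \frac{1}{2}\left[1 + \mathrm{erf}\left(\frac{1310 - 1304}{\sqrt{2}\,(6.123)}\right)\right]\right\} \approx 16.35\% $$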
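The same two steps can be checked in a minimal Python sketch. It uses the standard library's `statistics.NormalDist` instead of an erf table. It assumes the ±12 K band is a two-sided 95% interval, so the z-value is the 97.5th percentile of the standard normal, which equals √2·erf⁻¹(0.95).

```python
from statistics import NormalDist

mean_T = 1304.0      # K, average outlet temperature
half_width = 12.0    # K, 95% of measurements fall within mean_T +/- half_width
cl = 0.95            # confidence level

# Two-sided interval: half_width = z * sigma, with z = sqrt(2) * erfinv(CL) = Phi^-1((1 + CL) / 2)
z = NormalDist().inv_cdf((1 + cl) / 2)
sigma = half_width / z

# P(T > 1310 K) = 1 - CDF(1310)
p_above = 1 - NormalDist(mean_T, sigma).cdf(1310.0)

print(f"z = {z:.5f}, sigma = {sigma:.3f} K, P(T > 1310 K) = {100 * p_above:.2f}%")
```

This should reproduce σ ≈ 6.12 K and a probability of roughly 16%.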

#2: Error Propagation

• In a falling-bead viscometer, the viscosity may be found from the following equation:

$$ \mu = \frac{2 g r^2 (\rho_B - \rho_F)}{9 V} $$

• Here r is the bead radius, g is gravitational acceleration, V is the terminal velocity, ρB is the bead density, and ρF is the fluid density. Suppose we find, at a 95% confidence level:
– the bead density is 2 ± 0.1 g/cm³;
– the radius is 3 ± 0.1 mm;
– the fluid density is 1.1 ± 0.2 g/cm³;
– once terminal velocity is reached, the bead falls 10 ± 0.2 cm in 12 ± 0.5 seconds.

What is the calculated viscosity, and what is the uncertainty in its value? Which measurement is the greatest source of error?


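• A couple of options:

– Perturb each measured variable by its uncertainty, one at a time, and combine the changes in the result:

$$ f_0 = f(x_1, x_2, x_3, \ldots, x_n), \qquad f_i = f(x_1, \ldots, x_i + \Delta x_i, \ldots, x_n) $$

$$ \Delta f = \sqrt{\sum_{i=1}^{n} (f_0 - f_i)^2} $$

– Use partial derivatives:

$$ G = f(x_1, x_2, x_3, \ldots, x_n), \qquad \Delta G = \sqrt{\sum_{i=1}^{n} \left(\frac{\partial G}{\partial x_i}\,\Delta x_i\right)^2} $$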
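The partial-derivative option can be sketched in Python for the viscometer equation. The helper names `viscosity` and `propagate` are hypothetical. The derivatives are taken numerically by central differences rather than analytically. All inputs are in cgs units, and V = d/t, where d is the fall distance and t is the fall time.

```python
import math


def viscosity(g, rho_b, rho_f, r, d, t):
    """Falling-bead viscosity mu = 2 g r^2 (rho_b - rho_f) / (9 V), with V = d / t."""
    return 2.0 * g * r**2 * (rho_b - rho_f) / (9.0 * (d / t))


def propagate(f, values, errors, rel_step=1e-6):
    """Delta_G = sqrt(sum((dG/dx_i * Delta_x_i)^2)), with dG/dx_i from central differences."""
    total = 0.0
    for name, dx in errors.items():
        h = rel_step * abs(values[name]) or rel_step   # small step relative to the value
        up, down = dict(values), dict(values)
        up[name] += h
        down[name] -= h
        dfdx = (f(**up) - f(**down)) / (2.0 * h)
        total += (dfdx * dx) ** 2
    return math.sqrt(total)


# Nominal values and 95% half-widths in cgs units (g in cm/s^2, r, d in cm, t in s)
values = {"g": 980.665, "rho_b": 2.0, "rho_f": 1.1, "r": 0.3, "d": 10.0, "t": 12.0}
errors = {"g": 0.0, "rho_b": 0.1, "rho_f": 0.2, "r": 0.01, "d": 0.2, "t": 0.5}

mu = viscosity(**values)
d_mu = propagate(viscosity, values, errors)
print(f"mu = {mu:.3f} +/- {d_mu:.3f} g/cm/s")
```

For these inputs the derivative method gives about 5.54 g/cm/s. That is close to the one-at-a-time perturbation result, because the equation is nearly linear over perturbations this small.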

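(d is the fall distance and t is the fall time, so V = d/t. Each f_i is the viscosity recomputed with variable i shifted by its CI.)

Variable | Value | CI | Units | Value (cgs) | CI (cgs) | Units (cgs) | i | f_i (g/cm/s)
g | 9.80665 | 0 | m/s² | 980.665 | 0 | cm/s² | 1 | 21.18236
ρB | 2 | 0.1 | g/cm³ | 2 | 0.1 | g/cm³ | 2 | 23.53596
ρF | 1.1 | 0.2 | g/cm³ | 1.1 | 0.2 | g/cm³ | 3 | 16.47517
r | 3 | 0.1 | mm | 0.3 | 0.01 | cm | 4 | 22.61806
d | 10 | 0.2 | cm | 10 | 0.2 | cm | 5 | 20.76702
t | 12 | 0.5 | s | 12 | 0.5 | s | 6 | 22.06496

f0 = 21.18236 g/cm/s

i | (f0 − fi)²
1 | 0
2 | 5.539414
3 | 22.15766
4 | 2.061216
5 | 0.172508
6 | 0.77898
Sum | 30.70977
Sum^0.5 | 5.54164

Viscosity = 21.18236 ± 5.54164 g/cm/s

The fluid density ρF (i = 3) contributes most of the sum (22.16 of 30.71), so it is the greatest source of error.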
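The perturbation table can be reproduced with a short Python sketch. `viscosity` is a hypothetical helper name. As in the table, each variable is shifted by its full 95% half-width, one at a time.

```python
import math


def viscosity(g, rho_b, rho_f, r, d, t):
    """Stokes falling-bead viscosity, mu = 2 g r^2 (rho_b - rho_f) / (9 V), with V = d / t."""
    return 2.0 * g * r**2 * (rho_b - rho_f) / (9.0 * (d / t))


# Nominal values and 95% half-widths in cgs units
values = {"g": 980.665, "rho_b": 2.0, "rho_f": 1.1, "r": 0.3, "d": 10.0, "t": 12.0}
errors = {"g": 0.0, "rho_b": 0.1, "rho_f": 0.2, "r": 0.01, "d": 0.2, "t": 0.5}

f0 = viscosity(**values)
contributions = {}
for name, dx in errors.items():
    perturbed = dict(values)
    perturbed[name] += dx                       # shift one variable by its CI
    contributions[name] = (f0 - viscosity(**perturbed)) ** 2

uncertainty = math.sqrt(sum(contributions.values()))
largest = max(contributions, key=contributions.get)
print(f"mu = {f0:.5f} +/- {uncertainty:.5f} g/cm/s; largest source: {largest}")
```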


#3: Log Normal

• You find the following particle size distribution from a spray dryer experiment (table of data below). If we assume this distribution of particle sizes is log-normal, what are the mean and standard deviation of the log-normal PDF?

• Nonlinear fitting problem, like #6.

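• Log-normal PDF:

$$ \mathrm{PDF}(x) = \frac{1}{x\,\sigma\sqrt{2\pi}} \exp\left[-\frac{(\ln x - \mu)^2}{2\sigma^2}\right] $$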

#3: Log Normal

Range Max (µm) | Count | Cumulative Percentage
0.0 | 0 | 0.0000
0.5 | 300 | 15.8898
1.0 | 426 | 38.4534
1.5 | 352 | 57.0975
2.0 | 257 | 70.7097
2.5 | 182 | 80.3496
3.0 | 129 | 87.1822
3.5 | 92 | 92.0551
4.0 | 66 | 95.5508
4.5 | 48 | 98.0932
5.0 | 36 | 100.0000
5.5 | 0 | 100.0000

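• Fit the cumulative percentage (as a fraction) at each range maximum to the log-normal CDF:

$$ \mathrm{CDF}(x) = \frac{1}{2}\left[1 + \mathrm{erf}\left(\frac{\ln x - \mu}{\sqrt{2}\,\sigma}\right)\right] $$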
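The fit can be sketched in Python without external libraries. This sketch uses a Gauss-Newton least-squares iteration with step halving, swapped in for a library routine such as MATLAB's `nlinfit`. It makes three assumptions:
- each bin's upper edge is used as x;
- the x = 0 row is dropped, because ln 0 is undefined;
- the cumulative percentage is fitted as a fraction.

The helper names are hypothetical. μ and σ are the mean and standard deviation of ln x, so the median particle size is exp(μ).

```python
import math

# Spray-dryer data: upper edge of each size bin (um) and cumulative fraction of particles.
# The x = 0 row is omitted because ln(0) is undefined (the log-normal CDF is 0 there anyway).
x = [0.5, 1.0, 1.5, 2.0, 2.5, 3.0, 3.5, 4.0, 4.5, 5.0, 5.5]
y = [0.158898, 0.384534, 0.570975, 0.707097, 0.803496, 0.871822,
     0.920551, 0.955508, 0.980932, 1.0, 1.0]


def lognormal_cdf(xi, mu, sigma):
    """Log-normal CDF: 0.5 * [1 + erf((ln x - mu) / (sqrt(2) sigma))]."""
    return 0.5 * (1.0 + math.erf((math.log(xi) - mu) / (math.sqrt(2.0) * sigma)))


def sse(mu, sigma):
    """Sum of squared residuals between the data and the model CDF."""
    return sum((yi - lognormal_cdf(xi, mu, sigma)) ** 2 for xi, yi in zip(x, y))


def fit_lognormal(mu=0.0, sigma=1.0, iterations=50):
    """Gauss-Newton least squares on (mu, sigma) with step halving."""
    for _ in range(iterations):
        a11 = a12 = a22 = b1 = b2 = 0.0
        for xi, yi in zip(x, y):
            z = (math.log(xi) - mu) / sigma
            phi = math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)
            j_mu = -phi / sigma          # dCDF/dmu
            j_sigma = -phi * z / sigma   # dCDF/dsigma
            r = yi - lognormal_cdf(xi, mu, sigma)
            a11 += j_mu * j_mu
            a12 += j_mu * j_sigma
            a22 += j_sigma * j_sigma
            b1 += j_mu * r
            b2 += j_sigma * r
        det = a11 * a22 - a12 * a12
        d_mu = (a22 * b1 - a12 * b2) / det       # solve the 2x2 normal equations
        d_sigma = (a11 * b2 - a12 * b1) / det
        current, step = sse(mu, sigma), 1.0
        while step > 1e-6:
            new_mu, new_sigma = mu + step * d_mu, sigma + step * d_sigma
            if new_sigma > 0 and sse(new_mu, new_sigma) <= current:
                break
            step /= 2.0
        else:
            break                                # no improving step left: converged
        mu, sigma = new_mu, new_sigma
    return mu, sigma


mu_hat, sigma_hat = fit_lognormal()
print(f"mu = {mu_hat:.4f}, sigma = {sigma_hat:.4f}, SSE = {sse(mu_hat, sigma_hat):.5f}")
```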

#4: Hypothesis Testing

• On a certain stage of a distillation column, theory predicts the ethanol concentration should be 27%. You take the following measurements over several runs:

• What is the likelihood that your measurements match theory?

Percent Ethanol: 24.6, 27.6, 21.7, 24.1, 22.6, 24.5, 33.2, 21.7, 17.7, 27.5

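#4: Hypothesis Testing

• Student's t-test.
• Mean = 24.52
• StDev = 4.2163
• Degrees of freedom:

v = n_a − 1 = 10 − 1 = 9

This follows from the general (Welch-Satterthwaite) formula below. The theoretical value b has no variance (s_b = 0), so the formula reduces to n_a − 1:

$$ v = \frac{\left(\frac{s_a^2}{n_a} + \frac{s_b^2}{n_b}\right)^2}{\frac{\left(s_a^2/n_a\right)^2}{n_a - 1} + \frac{\left(s_b^2/n_b\right)^2}{n_b - 1}} $$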

#4: Hypothesis Testing

• T-statistic:

$$ t = \frac{\bar{x}_a - \bar{x}_b}{\sqrt{\frac{s_a^2}{n_a} + \frac{s_b^2}{n_b}}} = \frac{24.52 - 27}{4.2163/\sqrt{10}} = -1.86 $$

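#4: Hypothesis Testing

• Use the t-statistic in the Student's t CDF to find the probability. Matching theory is an equality test, so use both tails: P = 2 × CDF(−1.86; v = 9).

Answer = 9.6%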
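A Python sketch of this one-sample test follows. The standard library has no Student's t CDF, so this sketch evaluates one by Simpson's-rule integration of the t PDF, standing in for a t-table or MATLAB's `tcdf`. The helper names `t_pdf` and `t_cdf` are hypothetical.

```python
import math
import statistics

ethanol = [24.6, 27.6, 21.7, 24.1, 22.6, 24.5, 33.2, 21.7, 17.7, 27.5]   # percent
theory = 27.0


def t_pdf(x, v):
    """Student's t probability density with v degrees of freedom."""
    c = math.gamma((v + 1) / 2) / (math.sqrt(v * math.pi) * math.gamma(v / 2))
    return c * (1 + x * x / v) ** (-(v + 1) / 2)


def t_cdf(t, v, steps=2000):
    """CDF(t) = 0.5 + integral of the pdf from 0 to t, by Simpson's rule (steps must be even)."""
    h = t / steps
    total = t_pdf(0.0, v) + t_pdf(t, v)
    for k in range(1, steps):
        total += (4 if k % 2 else 2) * t_pdf(k * h, v)
    return 0.5 + total * h / 3


n = len(ethanol)
mean = statistics.mean(ethanol)
s = statistics.stdev(ethanol)               # sample standard deviation (n - 1 denominator)
t_stat = (mean - theory) / (s / math.sqrt(n))
v = n - 1
p_two_tail = 2 * t_cdf(-abs(t_stat), v)     # equality test: area in both tails
print(f"mean = {mean:.2f}, s = {s:.4f}, t = {t_stat:.2f}, v = {v}, P = {100 * p_two_tail:.1f}%")
```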

#5: Hypothesis Testing 2

• You are measuring the effectiveness of a new catalyst on a reaction with a great deal of normally distributed variability. You measure the time to 99% conversion of your reactants with both your new and old catalyst for several experimental runs and find the following data:

• Given these data, what is the probability that the new catalyst is more effective than the old? What is the probability that they are equally effective?

Old (min), 22 runs: 9.67, 11.51, 11.43, 9.76, 10.41, 10.82, 10.05, 10.27, 10.52, 8.66, 10.13, 12.78, 9.54, 9.92, 8.00, 10.63, 8.35, 10.81, 11.18, 10.85, 10.38, 9.90

New (min), 20 runs: 8.90, 9.94, 10.52, 8.70, 9.52, 10.80, 9.42, 10.23, 10.61, 9.33, 9.40, 9.14, 8.55, 11.14, 9.41, 10.35, 9.86, 9.57, 8.05, 6.47

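#5: Hypothesis Testing 2

• Mean A = 10.25, Mean B = 9.50 (A = old catalyst, B = new catalyst)
• StDev A = 1.071, StDev B = 1.066
• Number A = 22, Number B = 20
• Degrees of freedom:

v = n_a + n_b − 2 = 40

The Welch-Satterthwaite formula below gives v ≈ 39.7 for these data, essentially the same:

$$ v = \frac{\left(\frac{s_a^2}{n_a} + \frac{s_b^2}{n_b}\right)^2}{\frac{\left(s_a^2/n_a\right)^2}{n_a - 1} + \frac{\left(s_b^2/n_b\right)^2}{n_b - 1}} $$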

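#5: Hypothesis Testing 2

• T-statistic:

$$ t = \frac{\bar{x}_a - \bar{x}_b}{\sqrt{\frac{s_a^2}{n_a} + \frac{s_b^2}{n_b}}} = \frac{10.25 - 9.50}{\sqrt{\frac{1.071^2}{22} + \frac{1.066^2}{20}}} = 2.2950 $$

(2.2950 uses the unrounded means, 10.2532 and 9.4955. The rounded means give 2.27.)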

#5: Hypothesis Testing 2

• Simple rule:
– Greater-than or less-than tests use one tail (two unequal areas). You can easily tell which percentage you want by looking at the means.
– Equality tests use two equal tails.
• For the t CDF with v = 40 at a t-statistic of −2.295, P ≈ 1.35%. This is the area in one tail; 2.7% is the combined area of both tails.
• The probability that the new catalyst is more effective is a one-tail test.
• More effective (one tail) = 100% − 1.35% ≈ 98.6%
• Equal (two tails) = 2 × 1.35% ≈ 2.7%
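The two-sample calculation can be sketched in Python. This assumes A = old and B = new, matching the means above. As on the slides, it uses v = n_a + n_b − 2 for the probabilities and also reports the Welch-Satterthwaite value for comparison. The Simpson's-rule t CDF and the helper names `t_pdf` and `t_cdf` are hypothetical stand-ins for a t-table or `tcdf`.

```python
import math
import statistics

old = [9.67, 11.51, 11.43, 9.76, 10.41, 10.82, 10.05, 10.27, 10.52, 8.66, 10.13,
       12.78, 9.54, 9.92, 8.00, 10.63, 8.35, 10.81, 11.18, 10.85, 10.38, 9.90]   # A, min
new = [8.90, 9.94, 10.52, 8.70, 9.52, 10.80, 9.42, 10.23, 10.61, 9.33,
       9.40, 9.14, 8.55, 11.14, 9.41, 10.35, 9.86, 9.57, 8.05, 6.47]             # B, min


def t_pdf(x, v):
    """Student's t probability density with v degrees of freedom."""
    c = math.gamma((v + 1) / 2) / (math.sqrt(v * math.pi) * math.gamma(v / 2))
    return c * (1 + x * x / v) ** (-(v + 1) / 2)


def t_cdf(t, v, steps=2000):
    """CDF(t) = 0.5 + integral of the pdf from 0 to t, by Simpson's rule (steps must be even)."""
    h = t / steps
    total = t_pdf(0.0, v) + t_pdf(t, v)
    for k in range(1, steps):
        total += (4 if k % 2 else 2) * t_pdf(k * h, v)
    return 0.5 + total * h / 3


na, nb = len(old), len(new)
var_a = statistics.variance(old) / na        # s_a^2 / n_a
var_b = statistics.variance(new) / nb        # s_b^2 / n_b
t_stat = (statistics.mean(old) - statistics.mean(new)) / math.sqrt(var_a + var_b)

v = na + nb - 2                              # degrees of freedom used on the slides
v_welch = (var_a + var_b) ** 2 / (var_a ** 2 / (na - 1) + var_b ** 2 / (nb - 1))

p_one_tail = t_cdf(-abs(t_stat), v)          # area in one tail
p_more = 1 - p_one_tail                      # new catalyst faster (one-tail test)
p_two_tail = 2 * p_one_tail                  # equality test (two tails)
print(f"t = {t_stat:.3f}, v = {v} (Welch {v_welch:.1f})")
print(f"more effective: {100 * p_more:.1f}%, equal: {100 * p_two_tail:.1f}%")
```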

#6: Non-Linear Fit

• The rates of population growth in a bacteria culture are found to be as tabulated below.
• It is thought that these data could be fit to the equation:

Rate = b1·sin(b2·t)

Here b1 and b2 are constants to be determined and t is time.
– Determine the least-squares estimates of b1 and b2, and give a confidence interval for each at a 90% confidence level.
– What would you anticipate the rate to be at 24 hr?
– What would the confidence interval at 24 hr be for a 95% confidence level?

Time (hr) | Rate (SRU)
0.0000 | 0.0078
0.3158 | 0.2993
0.6316 | 0.1895
0.9474 | 0.3645
1.2632 | 0.3097
1.5789 | 0.2532
1.8947 | 0.3469
2.2105 | 0.3726
2.5263 | 0.0260
2.8421 | -0.0107
3.1579 | -0.0246
3.4737 | -0.0623
3.7895 | -0.2936
4.1053 | -0.3387
4.4211 | -0.2570
4.7368 | -0.4667
5.0526 | -0.2095
5.3684 | -0.1778
5.6842 | -0.2522
6.0000 | -0.0271

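#6: Non-Linear Fit

• General least squares: for a model Y = f(X; c1, c2, c3, …), minimize

$$ S(c_1, c_2, c_3, \ldots) = \sum_{i=1}^{n} \left[y_i - f(x_i; c_1, c_2, c_3, \ldots)\right]^2, \qquad \frac{\partial S}{\partial c_1} = 0,\ \frac{\partial S}{\partial c_2} = 0,\ \ldots $$

• For Rate = b1 sin(b2 t):

$$ S = \sum_{i=1}^{n} \left[y_i - b_1 \sin(b_2 t_i)\right]^2 $$

$$ \frac{\partial S}{\partial b_1} = -2 \sum_{i=1}^{n} \left[y_i - b_1 \sin(b_2 t_i)\right] \sin(b_2 t_i) = 0 $$

$$ \frac{\partial S}{\partial b_2} = -2 \sum_{i=1}^{n} \left[y_i - b_1 \sin(b_2 t_i)\right] b_1 t_i \cos(b_2 t_i) = 0 $$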

%Anthony Butterfield 2009
%Example of nonlinear fit with CIs
clear
close all
b(1)=1/3; b(2)=1; %true parameters
re=0.1; %random noise strength
x=linspace(0,6,20)'; %x data for fitting
x2=linspace(0,6,100)'; %x data for plotting
n=length(x);
y=b(1)*sin(b(2)*x)+re*randn(n,1); %y data for fitting, note the random error added in to make it realistic
yt=b(1)*sin(b(2)*x2); %theoretical y data for plotting
[beta,r,J]=nlinfit(x,y,@nlinfitsin,[1 1]); %numerically performs a nonlinear fit
bci=nlparci(beta,r,'jacobian',J,'alpha',0.10); %returns the 90% c.i. for the parameters, beta
[ypred,delta]=nlpredci(@nlinfitsin,x2,beta,r,'jacobian',J); %returns a predicted y and the 95% c.i. for each y
[y24,d24]=nlpredci(@nlinfitsin,24,beta,r,'jacobian',J); %prediction and 95% c.i. at t = 24 hr
disp('Fit to equation: y = b1 sin(b2 * x)')
disp('   x data    y data')
for i=1:n
    txt=sprintf('   %5.3f    %5.3f',x(i),y(i));
    disp(txt)
end
txt=sprintf('b1 was %5.3f, and is estimated to be: %f ± %f (90%% CL)',b(1),beta(1),abs(beta(1)-bci(1,1)));
disp(txt)
txt=sprintf('b2 was %5.3f, and is estimated to be: %f ± %f (90%% CL)',b(2),beta(2),abs(beta(2)-bci(2,1)));
disp(txt)
txt=sprintf('At 24 hr, theory gives %f; the fit predicts %f ± %f (95%% CL)',b(1)*sin(b(2)*24),y24,d24);
disp(txt)
figure(1)
hold on
grid on
scatter(x,y,10,'r')
plot(x2,yt,'Color',[1 0.5 0]) %just wanted to give you an example of how to change the line color to something not preset
plot(x2,ypred,'b',x2,ypred+delta,'b:',x2,ypred-delta,'b:')
hold off

%model function used by nlinfit and nlpredci (local functions in scripts need R2016b or later; otherwise save it as nlinfitsin.m)
function y=nlinfitsin(b,x)
y=b(1)*sin(b(2)*x);
end

#6: Non-Linear Fit

nlparci
• In “theory” b1 = 1/3 ≈ 0.33; estimated b1 = 0.35 ± 0.05 (90% CL)
• In “theory” b2 = 1.0; estimated b2 = 1.04 ± 0.04 (90% CL)

nlpredci
• At 24 hr, “theory” predicts: Rate = −0.3019
• The fit predicts: Rate = −0.1090 ± 0.3839 (95% CL)
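Here is a dependency-free Python sketch of the same calculation on the tabulated data. Gauss-Newton iterations drive the two normal equations of the least-squares derivation to zero. Linearized, Jacobian-based confidence intervals then follow, using the covariance s²(JᵀJ)⁻¹ with s² = S/(n − 2).

As far as I understand, this is the same linearization that `nlparci` and `nlpredci` apply by default. The code is a swapped-in sketch, however, not their implementation. The t quantiles for v = n − 2 = 18 (1.734 for 90%, 2.101 for 95%, two-sided) are hard-coded from a t-table. The helper names are hypothetical.

```python
import math

# Tabulated data (time in hr, rate in SRU)
t = [0.0, 0.3158, 0.6316, 0.9474, 1.2632, 1.5789, 1.8947, 2.2105, 2.5263, 2.8421,
     3.1579, 3.4737, 3.7895, 4.1053, 4.4211, 4.7368, 5.0526, 5.3684, 5.6842, 6.0]
rate = [0.0078, 0.2993, 0.1895, 0.3645, 0.3097, 0.2532, 0.3469, 0.3726, 0.0260, -0.0107,
        -0.0246, -0.0623, -0.2936, -0.3387, -0.2570, -0.4667, -0.2095, -0.1778, -0.2522, -0.0271]


def sse(b1, b2):
    """Sum of squared residuals S(b1, b2) for the model rate = b1*sin(b2*t)."""
    return sum((y - b1 * math.sin(b2 * ti)) ** 2 for ti, y in zip(t, rate))


def normal_equations(b1, b2):
    """Entries of J^T J (a11, a12, a22) and J^T r (g1, g2) at (b1, b2)."""
    a11 = a12 = a22 = g1 = g2 = 0.0
    for ti, y in zip(t, rate):
        j1 = math.sin(b2 * ti)              # dF/db1
        j2 = b1 * ti * math.cos(b2 * ti)    # dF/db2
        r = y - b1 * j1
        a11 += j1 * j1
        a12 += j1 * j2
        a22 += j2 * j2
        g1 += j1 * r
        g2 += j2 * r
    return a11, a12, a22, g1, g2


# Gauss-Newton with step halving, starting from [1, 1] like the MATLAB script
b1, b2 = 1.0, 1.0
for _ in range(100):
    a11, a12, a22, g1, g2 = normal_equations(b1, b2)
    det = a11 * a22 - a12 * a12
    d1, d2 = (a22 * g1 - a12 * g2) / det, (a11 * g2 - a12 * g1) / det
    step = 1.0
    while step > 1e-10 and sse(b1 + step * d1, b2 + step * d2) > sse(b1, b2):
        step /= 2.0                         # shrink the step until S does not increase
    b1, b2 = b1 + step * d1, b2 + step * d2

# Linearized parameter covariance C = s^2 (J^T J)^-1, with s^2 = S / (n - p)
n, p = len(t), 2
s2 = sse(b1, b2) / (n - p)
a11, a12, a22, _, _ = normal_equations(b1, b2)
det = a11 * a22 - a12 * a12
T90, T95 = 1.734, 2.101                     # two-sided t quantiles for v = 18 (t-table)
ci_b1 = T90 * math.sqrt(s2 * a22 / det)
ci_b2 = T90 * math.sqrt(s2 * a11 / det)

# Prediction at 24 hr with a 95% interval on the fitted curve (delta method)
t_new = 24.0
pred = b1 * math.sin(b2 * t_new)
h1, h2 = math.sin(b2 * t_new), b1 * t_new * math.cos(b2 * t_new)
var_pred = s2 * (h1 * h1 * a22 - 2 * h1 * h2 * a12 + h2 * h2 * a11) / det
ci_pred = T95 * math.sqrt(var_pred)

print(f"b1 = {b1:.3f} +/- {ci_b1:.3f}, b2 = {b2:.3f} +/- {ci_b2:.3f} (90% CL)")
print(f"rate(24 hr) = {pred:.4f} +/- {ci_pred:.4f} (95% CL)")
```

Because 24 hr is far outside the fitted range of 0 to 6 hr, small errors in b2 are multiplied by t = 24. This is why the prediction interval there is wide.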
