Data Analysis Examples Anthony E. Butterfield CH EN 4903-1

Jan 11, 2016


Data Analysis Examples

Anthony E. Butterfield
CH EN 4903-1


#1: The Normal PDF

• Your coworker tells you that the outlet temperature of a certain coal gasifier averages 1304 K and stays within 12 K of that mean for 95% of her measurements, taken over months of operation. If we assume the temperature measurements are normally distributed, what is the standard deviation, and what is the probability that a temperature measurement would be above 1310 K?

• T = 1304 ± 12 K (95% Confidence Level)

Normal Distribution

• Probability density function (PDF):

$$\mathrm{PDF}(x) = \frac{1}{\sigma\sqrt{2\pi}}\exp\left(-\frac{(x-\mu)^2}{2\sigma^2}\right)$$
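As a quick sanity check on the normal PDF, the sketch below integrates it numerically. The function name `normpdf_fn` and the value of `sigma` are illustrative assumptions, not values from the slides; the mean is borrowed from problem #1.

```matlab
% Normal PDF written as an anonymous function
% (normpdf_fn is a hypothetical name, chosen to avoid clashing with toolbox functions)
normpdf_fn = @(x, mu, sigma) exp(-(x - mu).^2 ./ (2*sigma.^2)) ./ (sigma*sqrt(2*pi));

mu    = 1304;   % mean, K (from problem #1)
sigma = 6;      % assumed illustrative standard deviation, K

% The total area under the PDF should be 1
area_total = integral(@(x) normpdf_fn(x, mu, sigma), mu - 10*sigma, mu + 10*sigma);

% The area within +/- 1.96 standard deviations should be close to 0.95
area_95 = integral(@(x) normpdf_fn(x, mu, sigma), mu - 1.96*sigma, mu + 1.96*sigma);

fprintf('Total area: %.4f\n', area_total);
fprintf('Area within 1.96 sigma: %.4f\n', area_95);
```

The second integral is the reason a 95% interval corresponds to roughly 1.96 standard deviations on either side of the mean.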

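#1: The Normal PDF

$$\mathrm{CDF}(x) = \int_{-\infty}^{x}\mathrm{PDF}(x)\,dx = \frac{1}{2}\left[1+\operatorname{erf}\left(\frac{x-\mu}{\sigma\sqrt{2}}\right)\right]$$

$$\mathrm{C.I.} = \sigma\sqrt{2}\,\operatorname{erf}^{-1}(\mathrm{C.L.})$$

$$\sigma = \frac{12}{\sqrt{2}\,\operatorname{erf}^{-1}(0.95)} = \frac{12}{\sqrt{2}\,(1.3859)} = 6.123$$

$$P(x > 1310) = 100\%\,\left[1-\mathrm{CDF}(1310)\right] = 100\%\left[1-\frac{1}{2}\left(1+\operatorname{erf}\left(\frac{1310-1304}{6.123\sqrt{2}}\right)\right)\right] = 16.3\%$$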
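The same two steps can be checked numerically with the built-in `erf` and `erfinv` functions. This is a minimal sketch of the calculation for problem #1; the variable names are illustrative.

```matlab
T_mean = 1304;   % mean outlet temperature, K
ci     = 12;     % half-width of the interval, K
cl     = 0.95;   % confidence level

% Standard deviation from C.I. = sigma*sqrt(2)*erfinv(C.L.)
sigma = ci / (sqrt(2)*erfinv(cl));

% Normal CDF evaluated at 1310 K, then the probability of exceeding it
cdf_1310 = 0.5*(1 + erf((1310 - T_mean)/(sigma*sqrt(2))));
p_above  = 1 - cdf_1310;

fprintf('sigma = %.3f K\n', sigma);
fprintf('P(T > 1310 K) = %.1f%%\n', 100*p_above);
```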

#2: Error Propagation

• In a falling bead viscometer, the viscosity may be found by the following equation:

$$\mu = \frac{2\,(\rho_B-\rho_F)\,g\,r^2}{9\,V}$$

• Where r is the bead radius, g is gravitational acceleration, V is the terminal velocity, ρB is the bead density and ρF is the fluid density. Suppose we find, at a 95% confidence level, that the bead density is 2 ± 0.1 g/cm³, the radius is 3 ± 0.1 mm, the fluid density is 1.1 ± 0.2 g/cm³, and, after terminal velocity is achieved, the bead falls 10 ± 0.2 cm in 12 ± 0.5 seconds. What is the calculated viscosity and the uncertainty in its value? Which measurement is the greatest source of error?

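#2: Error Propagation

• A couple options:

– Perturb each input by its uncertainty, one at a time:

$$f_0 = f(x_1, x_2, x_3, \ldots, x_n)$$

$$f_i = f(x_1, x_2, \ldots, x_{i-1},\, x_i+\Delta x_i,\, x_{i+1}, \ldots, x_n)$$

$$\Delta f_0 = \sqrt{\sum_{i=1}^{n}\left(f_0-f_i\right)^2}$$

– Use the partial derivatives:

$$G = f(x_1, x_2, x_3, \ldots, x_n)$$

$$\Delta G = \sqrt{\sum_{i=1}^{n}\left(\frac{\partial G}{\partial x_i}\right)^2 \Delta x_i^2}$$

Here Δx_i is the uncertainty (confidence interval) in x_i.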
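For the viscometer problem, the partial-derivative option can be worked through in a few lines. The sketch below assumes the Stokes-type form of the viscometer equation with V = d/t substituted in, and converts everything to cgs units; the variable names are illustrative.

```matlab
% Nominal values and 95% uncertainties from the problem statement, in cgs units
g    = 980.665;              % cm/s^2, taken as exact
rhoB = 2.0;  d_rhoB = 0.1;   % bead density, g/cm^3
rhoF = 1.1;  d_rhoF = 0.2;   % fluid density, g/cm^3
r    = 0.3;  d_r    = 0.01;  % bead radius, cm (3 +/- 0.1 mm)
d    = 10;   d_d    = 0.2;   % fall distance, cm
t    = 12;   d_t    = 0.5;   % fall time, s

% Viscosity with V = d/t substituted into the viscometer equation
mu = 2*(rhoB - rhoF)*g*r^2*t / (9*d);

% Analytical partial derivatives of mu with respect to each measured input
dmu_drhoB =  mu/(rhoB - rhoF);
dmu_drhoF = -mu/(rhoB - rhoF);
dmu_dr    =  2*mu/r;
dmu_dd    = -mu/d;
dmu_dt    =  mu/t;

% Each term is (partial derivative * uncertainty); combine them in quadrature
terms = [dmu_drhoB*d_rhoB, dmu_drhoF*d_rhoF, dmu_dr*d_r, dmu_dd*d_d, dmu_dt*d_t];
d_mu  = sqrt(sum(terms.^2));

fprintf('Viscosity = %.2f +/- %.2f g/cm/s\n', mu, d_mu);
```

Because the partial derivatives are a linearization, this result differs slightly from the one-at-a-time perturbation option when an input enters nonlinearly (here, r and d).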

#2: Error Propagation

      Value     CI    Units     Value     CI     Units      f_i (g/cm/s)
g     9.80665   0     m/s^2     980.665   0      cm/s^2     f1   21.18236
ρB    2         0.1   g/cm^3    2         0.1    g/cm^3     f2   23.53596
ρF    1.1       0.2   g/cm^3    1.1       0.2    g/cm^3     f3   16.47517
r     3         0.1   mm        0.3       0.01   cm         f4   22.61806
d     10        0.2   cm        10        0.2    cm         f5   20.76702
t     12        0.5   s         12        0.5    s          f6   22.06496
                                                            f0   21.18236

i     (f0-fi)^2
1     0
2     5.539414
3     22.15766
4     2.061216
5     0.172508
6     0.77898
sum   30.70977
sum^.5  5.54164

Viscosity = f0 ± sum^.5 = 21.18236 ± 5.54164 g/cm/s

The fluid density term (i = 3) is by far the largest contribution to the sum, so the fluid density is the greatest source of error.
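The one-at-a-time perturbation calculation is easy to script. The sketch below is one way to do it, assuming the inputs are ordered g, ρB, ρF, r, d, t and expressed in cgs units; the function handle `visc` and the other names are illustrative.

```matlab
% Viscosity as a function of x = [g, rhoB, rhoF, r, d, t], all in cgs units
visc = @(x) 2*(x(2) - x(3))*x(1)*x(4)^2 / (9*(x(5)/x(6)));

x0    = [980.665, 2.0, 1.1, 0.3,  10,  12 ];   % nominal values
ci    = [0,       0.1, 0.2, 0.01, 0.2, 0.5];   % 95% uncertainties
names = {'g', 'rhoB', 'rhoF', 'r', 'd', 't'};

f0 = visc(x0);              % unperturbed viscosity
sq = zeros(size(x0));       % squared deviations (f0 - fi)^2
for i = 1:numel(x0)
    xi    = x0;
    xi(i) = xi(i) + ci(i);          % perturb only the i-th input
    sq(i) = (f0 - visc(xi))^2;
end

df = sqrt(sum(sq));         % combined uncertainty
[~, worst] = max(sq);       % input with the largest contribution

fprintf('Viscosity = %.5f +/- %.5f g/cm/s\n', f0, df);
fprintf('Largest contribution: %s\n', names{worst});
```

Keeping the squared deviations in a vector makes it straightforward to see which measurement would be most worth improving.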

#3: Log Normal

• You find the following particle size distribution from a spray dryer experiment:

[Table of data]

If we assume this distribution of particle sizes is log-normal, what are the mean and standard deviation for the log-normal PDF?

• Nonlinear fitting problem, like #6.

$$\mathrm{PDF}(x) = \frac{1}{x\,\sigma\sqrt{2\pi}}\exp\left(-\frac{(\ln x-\mu)^2}{2\sigma^2}\right)$$

#3: Log Normal

Range Max (μm)   Count   Percentage (cumulative)
0                0       0
0.5000           300     15.8898
1.0000           426     38.4534
1.5000           352     57.0975
2.0000           257     70.7097
2.5000           182     80.3496
3.0000           129     87.1822
3.5000           92      92.0551
4.0000           66      95.5508
4.5000           48      98.0932
5.0000           36      100.0000
5.5              0       100.0000

$$\mathrm{CDF}(x) = \frac{1}{2}+\frac{1}{2}\operatorname{erf}\left(\frac{\ln(x)-\mu}{\sigma\sqrt{2}}\right)$$
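One way to carry out this nonlinear fit is to minimize the squared difference between the log-normal CDF and the cumulative percentages. The sketch below uses `fminsearch` rather than a toolbox fitting routine; that choice, the starting guess, and the decision to fit log(σ) so that σ stays positive are all assumptions made for illustration. The row at 0 μm is left out because ln(0) is undefined, and the empty 5.5 μm bin adds no information.

```matlab
% Upper edge of each size bin (um) and cumulative percentage, from the table
x_um     = 0.5:0.5:5.0;
cum_pct  = [15.8898 38.4534 57.0975 70.7097 80.3496 ...
            87.1822 92.0551 95.5508 98.0932 100.0000];
cdf_data = cum_pct/100;

% Log-normal CDF; p(1) = mu and p(2) = log(sigma), which keeps sigma positive
cdf_model = @(p, x) 0.5 + 0.5*erf((log(x) - p(1)) ./ (exp(p(2))*sqrt(2)));

% Sum of squared residuals between the model CDF and the data
sse = @(p) sum((cdf_data - cdf_model(p, x_um)).^2);

% Minimize, starting from mu = 0 and sigma = 1 (assumed initial guess)
p_fit     = fminsearch(sse, [0, 0]);
mu_fit    = p_fit(1);
sigma_fit = exp(p_fit(2));

fprintf('mu = %.3f, sigma = %.3f (parameters of ln(x))\n', mu_fit, sigma_fit);
```

Note that μ and σ here describe ln(x), not x itself. The data are truncated at 5 μm, so the upper tail fits less well than the middle of the distribution.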

#4: Hypothesis Testing

• On a certain stage of a distillation column, theory predicts the ethanol concentration should be 27%. You take the following measurements over several runs:

Percent Ethanol: 24.6, 27.6, 21.7, 24.1, 22.6, 24.5, 33.2, 21.7, 17.7, 27.5

• What is the likelihood that your measurements match theory?

#4: Hypothesis Testing

• Student’s T-Test.
• Mean = 24.52
• StDev = 4.2163
• Degrees of Freedom

v = n_a – 1 = 10 – 1 = 9

$$v = \frac{\left(\dfrac{\sigma_a^2}{n_a}+\dfrac{\sigma_b^2}{n_b}\right)^2}{\dfrac{\left(\sigma_a^2/n_a\right)^2}{n_a-1}+\dfrac{\left(\sigma_b^2/n_b\right)^2}{n_b-1}}$$

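#4: Hypothesis Testing

• T-Statistic:

$$\sigma_{ab} = \sqrt{\frac{\sigma_a^2}{n_a}+\frac{\sigma_b^2}{n_b}}, \qquad t = \frac{\mu_a-\mu_b}{\sigma_{ab}}$$

The theoretical value carries no variance, so σ_b = 0:

$$t = \frac{24.52-27}{\sqrt{\dfrac{4.2163^2}{10}+0}} = -1.86$$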


#4: Hypothesis Testing

• Use t-statistic in CDF to find probability.

Answer = 9.6%
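The whole test for problem #4 can be reproduced with base functions. The sketch below writes the two-tailed Student's t probability in terms of the regularized incomplete beta function `betainc`, so it does not need the Statistics Toolbox; with the toolbox, `tcdf` would be the more usual route. Variable names are illustrative.

```matlab
x      = [24.6 27.6 21.7 24.1 22.6 24.5 33.2 21.7 17.7 27.5];  % percent ethanol
theory = 27;                                                    % predicted value

n = numel(x);
v = n - 1;                                   % degrees of freedom

% t-statistic: the theoretical value has no variance of its own
t = (mean(x) - theory) / (std(x)/sqrt(n));

% Two-tailed probability of a |t| at least this large under Student's t
p_two = betainc(v/(v + t^2), v/2, 0.5);

fprintf('mean = %.2f, stdev = %.4f\n', mean(x), std(x));
fprintf('t = %.2f, two-tailed P = %.1f%%\n', t, 100*p_two);
```

The two-tailed value is the relevant one here because the question is whether the measurements equal the theory, not whether they fall on one particular side of it.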

#5: Hypothesis Testing 2

• You are measuring the effectiveness of a new catalyst on a reaction with a great deal of normally distributed variability. You measure the time to 99% conversion of your reactants with both your new and old catalyst for several experimental runs and find the following data:

Old (min): 9.67, 11.51, 11.43, 9.76, 10.41, 10.82, 10.05, 10.27, 10.52, 8.66, 10.13, 12.78, 9.54, 9.92, 8.00, 10.63, 8.35, 10.81, 11.18, 10.85, 10.38, 9.90

New (min): 8.90, 9.94, 10.52, 8.70, 9.52, 10.80, 9.42, 10.23, 10.61, 9.33, 9.40, 9.14, 8.55, 11.14, 9.41, 10.35, 9.86, 9.57, 8.05, 6.47

• Given this data, what is the probability that the new catalyst is more effective than the old? What is the probability that they are equally effective?

#5: Hypothesis Testing 2

• Mean A = 10.25, Mean B = 9.50
• StDev A = 1.071, StDev B = 1.066
• Number A = 22, Number B = 20
• Degrees of Freedom

v = n_a + n_b – 2 = 40

$$v = \frac{\left(\dfrac{\sigma_a^2}{n_a}+\dfrac{\sigma_b^2}{n_b}\right)^2}{\dfrac{\left(\sigma_a^2/n_a\right)^2}{n_a-1}+\dfrac{\left(\sigma_b^2/n_b\right)^2}{n_b-1}}$$
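To see how the two degrees-of-freedom expressions compare for this data set, the general formula can be evaluated with the summary statistics quoted on the slide. This is a small sketch with illustrative variable names.

```matlab
% Summary statistics for the old (a) and new (b) catalyst runs
sa = 1.071;  na = 22;
sb = 1.066;  nb = 20;

% Variance of each sample mean
va = sa^2/na;
vb = sb^2/nb;

% General (unequal-variance) degrees of freedom
v_general = (va + vb)^2 / (va^2/(na - 1) + vb^2/(nb - 1));

% Simple pooled count
v_pooled = na + nb - 2;

fprintf('General formula: v = %.2f\n', v_general);
fprintf('Pooled count:    v = %d\n', v_pooled);
```

With nearly equal standard deviations and sample sizes, the two values differ by well under one degree of freedom, so using v = 40 changes the resulting probability very little.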

#5: Hypothesis Testing 2

• T-Statistic:

$$\sigma_{ab} = \sqrt{\frac{\sigma_a^2}{n_a}+\frac{\sigma_b^2}{n_b}}, \qquad t = \frac{\mu_a-\mu_b}{\sigma_{ab}}$$

$$t = \frac{10.25-9.50}{\sqrt{\dfrac{1.071^2}{22}+\dfrac{1.066^2}{20}}} = 2.2950$$

#5: Hypothesis Testing 2

• Simple rule:
– Greater-than or less-than tests use one tail (two unequal areas), and you can easily tell which percentage you want by looking at the means.
– Equal tests use two equal tails.
• For the T-CDF with v = 40 at a t-statistic of -2.295, the one-tail area is P = 1.35%.
• P that the new catalyst is more effective is a one-tail test.
• More effective (one tail) = 100% - 1.35% = 98.6%
• Equal (two tail) = 2*1.35% = 2.7%
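The one-tail and two-tail areas can be computed directly from the raw data as a check. The sketch below again uses `betainc` in place of the toolbox `tcdf`, and assumes the pooled count v = n_a + n_b – 2 for the degrees of freedom. Note that `betainc` in this form returns the area in both tails, so the single-tail area is half of it; mixing the two up is an easy way to be off by a factor of two.

```matlab
old_min = [9.67 11.51 11.43 9.76 10.41 10.82 10.05 10.27 10.52 8.66 10.13 ...
           12.78 9.54 9.92 8.00 10.63 8.35 10.81 11.18 10.85 10.38 9.90];
new_min = [8.90 9.94 10.52 8.70 9.52 10.80 9.42 10.23 10.61 9.33 ...
           9.40 9.14 8.55 11.14 9.41 10.35 9.86 9.57 8.05 6.47];

na = numel(old_min);
nb = numel(new_min);
v  = na + nb - 2;                       % degrees of freedom

% t-statistic for the difference of the two means
t = (mean(old_min) - mean(new_min)) / sqrt(var(old_min)/na + var(new_min)/nb);

p_two = betainc(v/(v + t^2), v/2, 0.5); % area in both tails
p_one = p_two/2;                        % area in a single tail

% The new catalyst has the shorter mean time (t > 0), so "more effective"
% corresponds to everything except the single opposite tail
p_new_better = 1 - p_one;

fprintf('t = %.3f\n', t);
fprintf('P(new more effective) = %.1f%%\n', 100*p_new_better);
fprintf('P(equally effective)  = %.1f%%\n', 100*p_two);
```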

#6: Non-Linear Fit

• The rate of population growth in a bacteria culture is found to be:

Time (hr)   Rate (SRU)
0           0.0078
0.3158      0.2993
0.6316      0.1895
0.9474      0.3645
1.2632      0.3097
1.5789      0.2532
1.8947      0.3469
2.2105      0.3726
2.5263      0.0260
2.8421      -0.0107
3.1579      -0.0246
3.4737      -0.0623
3.7895      -0.2936
4.1053      -0.3387
4.4211      -0.2570
4.7368      -0.4667
5.0526      -0.2095
5.3684      -0.1778
5.6842      -0.2522
6.0000      -0.0271

• It is thought that this data could be fit to the equation:

Rate = b1*sin(b2*t)

where b1 and b2 are constants to be determined and t is time. Determine the least squares estimates for b1 and b2 and give an appropriate confidence interval for a confidence level of 90%. Also, what would you anticipate the rate to be at 24 hr? What would the confidence interval for a 95% confidence level be at 24 hr?

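#6: Non-Linear Fit

$$Y = f(X, c_1, c_2, \ldots, c_n)$$

$$S(c_1, c_2, \ldots, c_n) = \sum_{i=1}^{n}\left[y_i - Y(x_i, c_1, c_2, \ldots, c_n)\right]^2$$

$$\frac{\partial S}{\partial c_1} = 0,\quad \frac{\partial S}{\partial c_2} = 0,\quad \ldots,\quad \frac{\partial S}{\partial c_n} = 0$$

For Rate = b1*sin(b2*t):

$$S = \sum_{i=1}^{n}\left[y_i - b_1\sin(b_2 t_i)\right]^2$$

$$\frac{\partial S}{\partial b_1} = -2\sum_{i=1}^{n}\left[y_i - b_1\sin(b_2 t_i)\right]\sin(b_2 t_i) = 0$$

$$\frac{\partial S}{\partial b_2} = -2\sum_{i=1}^{n}\left[y_i - b_1\sin(b_2 t_i)\right] b_1 t_i\cos(b_2 t_i) = 0$$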
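The condition ∂S/∂b1 = 0 can be solved for b1 in closed form, b1 = Σ y_i sin(b2 t_i) / Σ sin²(b2 t_i), which leaves a one-dimensional search over b2. The sketch below applies that idea to the tabulated data using base functions only. The search bracket of 0.5 to 1.5 for b2 is an assumption, and this is an alternative illustration rather than the `nlinfit` approach; it does not produce confidence intervals.

```matlab
% Tabulated data: times are evenly spaced from 0 to 6 hr (20 points)
t_hr = linspace(0, 6, 20);
rate = [0.0078 0.2993 0.1895 0.3645 0.3097 0.2532 0.3469 0.3726 0.0260 -0.0107 ...
        -0.0246 -0.0623 -0.2936 -0.3387 -0.2570 -0.4667 -0.2095 -0.1778 -0.2522 -0.0271];

% From dS/db1 = 0: for any given b2, the best b1 has a closed form
b1_of = @(b2) sum(rate .* sin(b2*t_hr)) / sum(sin(b2*t_hr).^2);

% Sum of squares as a function of b2 alone, with b1 eliminated
S_of = @(b2) sum((rate - b1_of(b2)*sin(b2*t_hr)).^2);

% One-dimensional minimization over an assumed bracket around b2 = 1
b2 = fminbnd(S_of, 0.5, 1.5);
b1 = b1_of(b2);

% Extrapolated rate at 24 hr
rate_24 = b1*sin(b2*24);

fprintf('b1 = %.3f, b2 = %.3f\n', b1, b2);
fprintf('Predicted rate at 24 hr = %.4f SRU\n', rate_24);
```

Because 24 hr lies far outside the 0 to 6 hr data range, a small error in b2 shifts the phase of the sine substantially, which is why the prediction interval at 24 hr is so wide.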

%Anthony Butterfield 2009
%Example of nonlinear fit with CIs
clear
close all
b(1)=1/3;
b(2)=1;
re=0.1; %random noise strength
x=linspace(0,6,20)'; %x data for fitting
x2=linspace(0,6,100)'; %x data for plotting
n=length(x);
%model function; the original calls a separate nlinfitsin file that is not shown,
%so this handle is an assumed equivalent based on the equation being fit
nlinfitsin=@(b,x) b(1)*sin(b(2)*x);
y=b(1)*sin(b(2)*x)+re*randn(n,1); %y data for fitting, note the random error added in to make it realistic
yt=b(1)*sin(b(2)*x2); %theoretical y data for plotting
[beta,r,J]=nlinfit(x,y,nlinfitsin,[1 1]); %numerically performs a nonlinear fit
bci = nlparci(beta,r,J); %returns the c.i. for the parameters, beta
[ypred,delta] = nlpredci(nlinfitsin,x2,beta,r,J); %returns a predicted y and the c.i. for each y
disp('Fit to equation: y = b1 sin(b2 * x)')
disp(' x data y data')
for i=1:n
    txt=sprintf(' %5.3f %5.3f',x(i),y(i));
    disp(txt)
end
txt=sprintf('b1 was %3.1f, and is estimated to be: %f ± %f (95%% CL)',b(1),beta(1),abs(beta(1)-bci(1,1)));
disp(txt)
txt=sprintf('b2 was %3.1f, and is estimated to be: %f ± %f (95%% CL)',b(2),beta(2),abs(beta(2)-bci(2,1)));
disp(txt)
figure(1)
hold on
grid on
scatter(x,y,10,'r')
plot(x2,yt,'Color',[1 0.5 0]) %just wanted to give you an example of how to change the line color to something not preset
plot(x2,ypred,'b',x2,ypred+delta,'b:',x2,ypred-delta,'b:')
hold off

#6: Non-Linear Fit

nlparci
• In “theory” b1 = 0.3; estimated b1 = 0.35 ± 0.05 (90% CL)
• In “theory” b2 = 1.0; estimated b2 = 1.04 ± 0.04 (90% CL)

nlpredci
• At 24 hr “theory” predicts: Rate = -0.3019
• Fit predicts: Rate = -0.1090 ± 0.3839 (95% CL)
