Data Analysis II Anthony E. Butterfield CH EN 4903-1 "There is a theory which states that if ever anybody discovers exactly what the Universe is for and why it is here, it will instantly disappear and be replaced by something even more bizarre and inexplicable. There is another theory which states that this has already happened.” ~ Douglas Adams, Hitchhiker's Guide to the Galaxy

Dec 26, 2015


Luke Stewart

Data Analysis II

Anthony E. Butterfield

CH EN 4903-1

“There is a theory which states that if ever anybody discovers exactly what the Universe is for and why it is here, it will instantly disappear and be replaced by something even more bizarre and inexplicable. There is another theory which states that this has already happened.”

~ Douglas Adams, Hitchhiker's Guide to the Galaxy


Data Analysis II

• Review of Data Analysis I.
• Hypothesis testing.
  – Types of errors.
  – Types of tests.
  – Student’s T-Test.
• Fitting lines to data.

http://www.che.utah.edu/~geoff/writing/index.html


Quick Review of PDFs and CDFs

• What is the probability of measuring a value between -0.5 and 1.5, with μ = 0 and σ = 1?
• What is the probability of measuring a value between -0.5 and 1.5 or between -2 and -1?
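
Both review questions can be checked numerically. The sketch below assumes a normal distribution with μ = 0 and σ = 1 and builds the CDF from the error function; the helper names `normal_cdf` and `prob_between` are illustrative.

```python
import math

def normal_cdf(x, mu=0.0, sigma=1.0):
    """Cumulative distribution function of a normal distribution."""
    return 0.5 * (1.0 + math.erf((x - mu) / (sigma * math.sqrt(2.0))))

def prob_between(lo, hi, mu=0.0, sigma=1.0):
    """P(lo < X < hi), written as a difference of two CDF values."""
    return normal_cdf(hi, mu, sigma) - normal_cdf(lo, mu, sigma)

p_first = prob_between(-0.5, 1.5)
p_second = prob_between(-2.0, -1.0)
# The two intervals do not overlap, so their probabilities simply add.
p_either = p_first + p_second

print(f"P(-0.5 < x < 1.5)        = {p_first:.4f}")
print(f"P(-2 < x < -1)           = {p_second:.4f}")
print(f"P(either of the ranges)  = {p_either:.4f}")
```

The first probability comes out near 0.62 and the combined one near 0.76.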


Hypothesis Testing

• How do we know if one hypothesis is more likely to be true than the alternatives?

• Null Hypothesis (H0) – The hypothesis being tested (often that the observed data are the result of random chance).

• Alternative Hypothesis (Hi) – A hypothesis that may be found to be the more probable source of the observations if the null hypothesis is rejected (often that the observations are the result of more than chance, a real effect).


Possible Types of Error in Tests

• Type I Error:
  – Rejecting a true null hypothesis (probability α, the significance level).
• Type II Error:
  – Failing to reject a false null hypothesis (probability β = 1 − the test’s power).
• Tradeoff between α and β.
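
The tradeoff can be made concrete with a small calculation. This sketch assumes a one-tailed z-test with a known σ (a simplification, not the t-test that follows later) and a specific alternative mean; the function name and all numbers are hypothetical.

```python
from statistics import NormalDist

def type_ii_error(alpha, effect, sigma, n):
    """Beta for a one-tailed z-test of H0: mu = mu0 against H1: mu = mu0 + effect."""
    std_err = sigma / n ** 0.5
    z_crit = NormalDist().inv_cdf(1.0 - alpha)  # reject H0 when z > z_crit
    # Under H1 the z statistic is normal with mean effect/std_err and unit variance,
    # so beta is the chance that it still falls below the rejection threshold.
    return NormalDist().cdf(z_crit - effect / std_err)

alphas = [0.10, 0.05, 0.01]
betas = [type_ii_error(a, effect=0.5, sigma=1.0, n=10) for a in alphas]
for a, b in zip(alphas, betas):
    print(f"alpha = {a:.2f}  beta = {b:.3f}  power = {1 - b:.3f}")
```

Tightening α raises β for a fixed sample size; only more data (a larger n) lowers both.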


Testing Alternatives, Tail Tests

• One Tail (One-Sided) Test.
  – H0: μ = μ0. “Our new drug is no better than the old drug.”
    H1: μ > μ0. “Our new drug works better than the old one.”
  – H0: μ = μ0. “The catalytic converter is just as effective as it was when new.”
    H1: μ < μ0. “The catalytic converter has fouled.”
• Two Tail (Two-Sided) Test.
  – H0: μ = μ0. “Our liquid is a Newtonian fluid.”
    H1: μ ≠ μ0. “Our liquid is a non-Newtonian fluid.”


Student’s T-Test

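• T-distribution (probability density, with v degrees of freedom):

$$f(t) = \frac{\Gamma\left(\frac{v+1}{2}\right)}{\sqrt{v\pi}\,\Gamma\left(\frac{v}{2}\right)} \left(1 + \frac{t^2}{v}\right)^{-\frac{v+1}{2}}$$

• Used for small data sets, where the population standard deviation is unknown.

• As the degrees of freedom, v, go to ∞, the t-distribution becomes the normal distribution.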


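Student’s T-Test

• Can be used to determine the likelihood of two means being the same.

• Test statistic:

$$t = \frac{\bar{x}_a - \bar{x}_b}{s_{ab}}, \qquad s_{ab} = \sqrt{\frac{s_a^2}{n_a} + \frac{s_b^2}{n_b}}$$

• Degrees of freedom:

$$v = \frac{\left(\dfrac{s_a^2}{n_a} + \dfrac{s_b^2}{n_b}\right)^2}{\dfrac{\left(s_a^2/n_a\right)^2}{n_a - 1} + \dfrac{\left(s_b^2/n_b\right)^2}{n_b - 1}}$$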
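
A minimal sketch of this two-sample comparison, assuming the unequal-variance (Welch) form: t = (x̄a − x̄b)/sab with sab = sqrt(sa²/na + sb²/nb), and the Welch-Satterthwaite estimate for the degrees of freedom v. The function name `welch_t` and the sample values are made up for illustration.

```python
import math
from statistics import mean, variance

def welch_t(a, b):
    """Return (t, s_ab, v) for two samples with possibly unequal variances."""
    n_a, n_b = len(a), len(b)
    var_a, var_b = variance(a), variance(b)  # sample variances (n - 1 denominator)
    s_ab = math.sqrt(var_a / n_a + var_b / n_b)
    t = (mean(a) - mean(b)) / s_ab
    v = (var_a / n_a + var_b / n_b) ** 2 / (
        (var_a / n_a) ** 2 / (n_a - 1) + (var_b / n_b) ** 2 / (n_b - 1)
    )
    return t, s_ab, v

# Hypothetical measurements, for illustration only.
sample_a = [9.8, 10.1, 10.0, 9.9, 10.2, 10.0]
sample_b = [10.3, 10.5, 10.1, 10.4, 10.6, 10.2]
t, s_ab, v = welch_t(sample_a, sample_b)
print(f"t = {t:.3f}, s_ab = {s_ab:.3f}, v = {v:.1f}")
```

The estimated v always lies between the smaller of (n − 1) and na + nb − 2, and is generally not an integer.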


T Statistics Example

• The test statistic puts the data in question onto a scale on which we can use the T-distribution.
• Is μa = μb, or μa ≠ μb, or μa > μb, or μa < μb?


T Statistics Example

$$v = 38$$

$$s_{ab} = \sqrt{\frac{s_a^2}{n_a} + \frac{s_b^2}{n_b}} = 0.324$$

$$t = \frac{\bar{x}_a - \bar{x}_b}{s_{ab}} = -1.53$$


Student’s T-Test Example

• Two sets of data, 10 measurements each, with different variances and with means separated by an increasing value.

• Note the error.
• What if we take 100 measurements?
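
One way to approach the last question: the combined standard error sab = sqrt(sa²/n + sb²/n) shrinks as 1/sqrt(n), so the same separation of means gives a larger t. The sketch below uses hypothetical standard deviations and mean separation, and assumes both sets have the same number of measurements.

```python
import math

def t_from_summary(mean_diff, s_a, s_b, n):
    """t statistic for two groups of n measurements each, from summary values."""
    s_ab = math.sqrt(s_a ** 2 / n + s_b ** 2 / n)
    return mean_diff / s_ab

# Hypothetical numbers: standard deviations 1.0 and 2.0, means 0.5 apart.
t_10 = t_from_summary(0.5, 1.0, 2.0, n=10)
t_100 = t_from_summary(0.5, 1.0, 2.0, n=100)
print(f"n = 10:  t = {t_10:.3f}")
print(f"n = 100: t = {t_100:.3f}")
```

Going from 10 to 100 measurements multiplies t by sqrt(10), provided the sample standard deviations stay about the same.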


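Student’s T-Test for Our π Data

$$\bar{x}_a = 3.1382, \quad s_a = 0.0521, \qquad \bar{x}_b = \pi, \quad s_b = 0$$

$$v = 16 - 1 = 15$$

$$|t| = \frac{\left|3.1383 - \pi\right|}{\sqrt{\dfrac{0.0521^2}{16} + \dfrac{0^2}{1}}} = 0.2518$$

• Use the t statistic and the CDF to find the probability.
• Two-tailed test (P × 2).
• Would need t = 0.064 for 95% confidence.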
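
A sketch of how a t statistic and the CDF give a probability, taking t = 0.2518 and v = 15 from this slide. Python's standard library has no t-distribution, so the CDF here is a Simpson's-rule integration of the t density; `t_pdf`, `t_cdf` and `two_tailed_p` are illustrative names, not library functions.

```python
import math

def t_pdf(t, v):
    """Probability density of Student's t-distribution with v degrees of freedom."""
    log_coef = math.lgamma((v + 1) / 2) - math.lgamma(v / 2)
    coef = math.exp(log_coef) / math.sqrt(v * math.pi)
    return coef * (1 + t * t / v) ** (-(v + 1) / 2)

def t_cdf(t, v, steps=2000):
    """CDF by Simpson's rule on [0, |t|], using the symmetry of the density."""
    upper = abs(t)
    h = upper / steps  # steps must be even for Simpson's rule
    total = t_pdf(0.0, v) + t_pdf(upper, v)
    for k in range(1, steps):
        total += (4 if k % 2 else 2) * t_pdf(k * h, v)
    half_area = total * h / 3  # P(0 < T < |t|)
    return 0.5 + half_area if t >= 0 else 0.5 - half_area

def two_tailed_p(t, v):
    """P(|T| >= |t|): the one-tail area doubled."""
    return 2 * (1 - t_cdf(abs(t), v))

p = two_tailed_p(0.2518, 15)
print(f"two-tailed P = {p:.3f}")
```

For these inputs the two-tailed probability is roughly 0.8. With a very large v the same `t_cdf` reproduces normal-distribution values, for example about 0.975 at t = 1.96.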


Linear Fitting

• How to best fit a straight line, Y=b+mx, to data?

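• Minimize the sum of squared residuals, S:

$$S = \sum_{i=1}^{n} \left(y_i - b - m x_i\right)^2$$

• Set the derivatives with respect to b and m to zero:

$$\frac{dS}{db} = -2 \sum_{i=1}^{n} \left(y_i - b - m x_i\right) = 0$$

$$\frac{dS}{dm} = -2 \sum_{i=1}^{n} x_i \left(y_i - b - m x_i\right) = 0$$

• Two linear equations in the two unknowns b and m:

$$n b + m \sum_{i=1}^{n} x_i = \sum_{i=1}^{n} y_i$$

$$b \sum_{i=1}^{n} x_i + m \sum_{i=1}^{n} x_i^2 = \sum_{i=1}^{n} x_i y_i$$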
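
The least-squares line Y = b + mx comes from two linear equations in b and m: n·b + m·Σx = Σy and b·Σx + m·Σx² = Σxy. A minimal sketch that solves them by Cramer's rule, with an illustrative function name and made-up data:

```python
def fit_line(xs, ys):
    """Solve the two least-squares equations for intercept b and slope m."""
    n = len(xs)
    sum_x = sum(xs)
    sum_y = sum(ys)
    sum_xx = sum(x * x for x in xs)
    sum_xy = sum(x * y for x, y in zip(xs, ys))
    # n*b + m*sum_x = sum_y   and   b*sum_x + m*sum_xx = sum_xy
    denom = n * sum_xx - sum_x ** 2
    m = (n * sum_xy - sum_x * sum_y) / denom
    b = (sum_y * sum_xx - sum_x * sum_xy) / denom
    return b, m

# Hypothetical data scattered about a straight line.
xs = [0.0, 1.0, 2.0, 3.0, 4.0]
ys = [1.1, 2.9, 5.2, 7.1, 8.8]
b, m = fit_line(xs, ys)
print(f"b = {b:.3f}, m = {m:.3f}")
```

The denominator is zero only when every x value is identical, in which case no unique line exists.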


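Linear Fit Quality

• Coefficient of Determination (R²):

$$SS_{Total} = \sum_{i=1}^{n} \left(y_i - \bar{y}\right)^2$$

$$SS_{Error} = \sum_{i=1}^{n} \left(y_i - m x_i - b\right)^2$$

$$R^2 = 1 - \frac{SS_{Error}}{SS_{Total}}$$

• The closer R² is to 1, the better the fit.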
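
A short sketch of the coefficient of determination for a line y = mx + b, computed as R² = 1 − SS_Error/SS_Total, where SS_Total sums squared deviations from the mean of y and SS_Error sums squared residuals about the line. The function name and the data are illustrative.

```python
def r_squared(xs, ys, b, m):
    """Coefficient of determination for the line y = m*x + b."""
    y_bar = sum(ys) / len(ys)
    ss_total = sum((y - y_bar) ** 2 for y in ys)
    ss_error = sum((y - m * x - b) ** 2 for x, y in zip(xs, ys))
    return 1 - ss_error / ss_total

xs = [0.0, 1.0, 2.0, 3.0]
ys = [1.0, 3.0, 5.0, 7.0]
r2_exact = r_squared(xs, ys, b=1.0, m=2.0)   # the line passes through every point
r2_offset = r_squared(xs, ys, b=0.0, m=2.0)  # same slope, wrong intercept
print(f"exact line:  R^2 = {r2_exact:.3f}")
print(f"offset line: R^2 = {r2_offset:.3f}")
```

A line through every point gives R² = 1; shifting the intercept by one unit drops it to 0.8 for these points.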


Nonlinear Fits

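• Linearized fits.
  – Prone to problems.

$$f(Y) = b + m X$$

$$n b + m \sum_{i=1}^{n} x_i = \sum_{i=1}^{n} f(y_i)$$

$$b \sum_{i=1}^{n} x_i + m \sum_{i=1}^{n} x_i^2 = \sum_{i=1}^{n} x_i f(y_i)$$

• Nonlinear fits.
  – Best for nonlinear equations.
  – End up with n nonlinear equations and n unknowns.

$$Y = f\left(X, c_1, c_2, \ldots, c_n\right)$$

$$S\left(c_1, c_2, \ldots, c_n\right) = \sum_{i} \left[y_i - Y\left(x_i, c_1, c_2, \ldots, c_n\right)\right]^2 \quad \text{(sum over the data points)}$$

$$\frac{\partial S}{\partial c_1} = 0, \quad \frac{\partial S}{\partial c_2} = 0, \quad \ldots, \quad \frac{\partial S}{\partial c_n} = 0$$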


Fitting Example

• Equation: [garbled in this transcript; an exponential model in x with constants a and b]
• Linearized fit puts inordinate emphasis on data taken at larger values of x, in this case.


C.I. For Fitted Constants

• Method uses Student’s T-Test, residuals and the Jacobian (matrix of partial derivatives with respect to the parameters for each data point).

• You may use a statistics program.
• For example, Matlab:
  – nlinfit – get fit parameters, residuals, and Jacobian.
  – nlparci – find the CI for parameters.
  – nlpredci – find the CI for predicted values.

• Open the functions, though, to see how they function (“>> open nlparci” and “>> help nlparci”).


C.I. For Fitted Constants, Example

• Put code for this example online, here.

>> nlinfitex2
Fit to equation: y = b1 + b2 * exp(-b3 * x)

  x data    y data
  0.000     3.022
  0.222     2.002
  0.444     1.644
  0.667     1.241
  0.889     0.888
  1.111     1.052
  1.333     1.043
  1.556     1.104
  1.778     1.055
  2.000     0.800

b1 was 1.0, and is estimated to be: 0.949577 ± 0.158716 (95% CL)
b2 was 2.0, and is estimated to be: 2.073648 ± 0.317758 (95% CL)
b3 was 3.0, and is estimated to be: 2.903019 ± 1.056934 (95% CL)
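
For readers without Matlab, the sketch below uses the same ingredients (residuals, Jacobian, and a t value) in plain Python. It is not the code behind nlinfit or nlparci: it assumes a basic Gauss-Newton iteration with step halving, the usual linearized covariance estimate MSE·(JᵀJ)⁻¹, and a hard-coded two-sided 95% t value for 7 degrees of freedom. The data are the ten points printed on this slide, and all function names are illustrative.

```python
import math

# Data as printed on the slide; model y = b1 + b2 * exp(-b3 * x).
X = [0.000, 0.222, 0.444, 0.667, 0.889, 1.111, 1.333, 1.556, 1.778, 2.000]
Y = [3.022, 2.002, 1.644, 1.241, 0.888, 1.052, 1.043, 1.104, 1.055, 0.800]
T_CRIT = 2.3646  # two-sided 95% t value for 10 points - 3 parameters = 7 d.o.f.

def model(x, c):
    return c[0] + c[1] * math.exp(-c[2] * x)

def jacobian(c):
    """One row per data point: partial derivatives with respect to b1, b2, b3."""
    rows = []
    for x in X:
        e = math.exp(-c[2] * x)
        rows.append([1.0, e, -c[1] * x * e])
    return rows

def residuals(c):
    return [y - model(x, c) for x, y in zip(X, Y)]

def sse(c):
    return sum(r * r for r in residuals(c))

def invert(a):
    """Inverse of a small square matrix by Gauss-Jordan elimination."""
    n = len(a)
    aug = [list(row) + [float(i == j) for j in range(n)] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        p = aug[col][col]
        aug[col] = [v / p for v in aug[col]]
        for r in range(n):
            if r != col:
                f = aug[r][col]
                aug[r] = [vr - f * vc for vr, vc in zip(aug[r], aug[col])]
    return [row[n:] for row in aug]

def jtj_inverse(J):
    k = len(J[0])
    return invert([[sum(row[i] * row[j] for row in J) for j in range(k)] for i in range(k)])

def fit(c, iterations=50):
    """Gauss-Newton with step halving; returns parameters and 95% half-widths."""
    k = len(c)
    for _ in range(iterations):
        J, r = jacobian(c), residuals(c)
        inv = jtj_inverse(J)
        jtr = [sum(row[i] * ri for row, ri in zip(J, r)) for i in range(k)]
        step = [sum(inv[i][j] * jtr[j] for j in range(k)) for i in range(k)]
        scale = 1.0
        while scale > 1e-6:  # shrink the step until the sum of squares does not rise
            trial = [ci + scale * si for ci, si in zip(c, step)]
            if sse(trial) <= sse(c):
                c = trial
                break
            scale /= 2
    mse = sse(c) / (len(X) - k)
    inv = jtj_inverse(jacobian(c))
    half_widths = [T_CRIT * math.sqrt(mse * inv[i][i]) for i in range(k)]
    return c, half_widths

start = [1.0, 2.0, 3.0]
params, ci = fit(start)
for name, p, h in zip(("b1", "b2", "b3"), params, ci):
    print(f"{name} = {p:.6f} ± {h:.6f} (95% CL)")
```

The printed estimates can be compared with the Matlab output above. Small differences are to be expected, since the slide lists the data to only three decimals and the two routines iterate differently.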


Data Analysis Conclusions

• Data analysis is necessary for nearly any objective use of measurements.
• Must have a basic grasp of statistics.
• All data and calculated values should come with some confidence interval at some probability.
• You can reject data under some circumstances, but avoid doing so.
• Use Student’s T-Test and fitting techniques to judge if your data match theory.
