STAT 231 Final
5/12/2018 Stat 231 Final Slides - slidepdf.com
http://slidepdf.com/reader/full/stat-231-final-slides 1/100
Outline
• Chapter 1
– Data types (discrete, continuous, categorical)
– Problem (3 different aspects)
– Populations (target, study, sample)
– Representations of data
• Graphical: histograms, CDFs, box plots
• Numerical: mean, standard deviation, IQR
– Bivariate Data
• Relative risk
• Correlation coefficient
• Chapter 2
– Review of probability distributions
– Random PPDAC examples…
• Chapter 3
– Binomial Model
– Response Model
– Regression Model
– Maximum Likelihood Estimation
• Chapter 4
– Sampling distributions for estimators
– Introduction to new distributions
• Gaussian
• Chi‐squared
• t
– Confidence Interval
– Hypothesis Testing
– Confidence Intervals and Hypothesis Testing with the likelihood function
• Chapter 5
– Testing for independence with categorical variates
– Model checking and assessment for assumptions
• Chapter 6
– Comparison
• 2 sample t‐tests
• Paired t‐test
– Causality
• Testing for association
• Blocking
• Randomization and repetition
• Matching
– Prediction
• Prediction intervals for response
• Prediction intervals for regression
Confidence Intervals using the Relative Likelihood Function
Define the likelihood function:
$L(\pi) = \prod_{i=1}^{n} f(x_i; \pi)$
Define the relative likelihood function as:
$R(\pi) = \dfrac{L(\pi)}{L(\hat{\pi})}$
Confidence Intervals using the Relative Likelihood Function
Graph the relative likelihood function.
Draw a horizontal line at 0.1; the x‐coordinates of the two points where the line crosses the curve are the endpoints of an approximate 95% confidence interval. (Under the usual chi‐squared approximation for a single parameter, a cutoff of 0.1 corresponds to roughly 97% coverage, and a cutoff near 0.15 to 95%.)
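The link between a relative likelihood cutoff and a confidence level can be checked numerically. This is a minimal sketch, assuming the usual approximation that $-2 \ln R(\theta)$ follows a chi‐squared distribution with one degree of freedom (a single parameter). The function names `coverage` and `cutoff` are illustrative and do not come from the course notes.

```python
import math
from statistics import NormalDist


def coverage(p):
    """Approximate confidence level of the interval {theta : R(theta) >= p}.

    Assumes -2 ln R(theta) ~ chi-squared(1), so that
    P(chi2_1 <= -2 ln p) = P(|Z| <= sqrt(-2 ln p)) = erf(sqrt(-ln p)).
    """
    return math.erf(math.sqrt(-math.log(p)))


def cutoff(level):
    """Relative likelihood cutoff whose interval has the given approximate level."""
    z = NormalDist().inv_cdf(0.5 + level / 2)  # two-sided Gaussian quantile
    return math.exp(-z * z / 2)


print(round(coverage(0.1), 3))  # approximate coverage of the 0.1 cutoff
print(round(cutoff(0.95), 3))   # cutoff that gives an approximate 95% interval
```

Under this approximation the 0.1 cutoff is somewhat conservative for a 95% interval, which is why it is often described as "approximately 95%".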
Hypothesis Testing using the Likelihood Function
1) Define the null hypothesis and the alternate hypothesis
2) Define the test statistic, identify the distribution, calculate the observed value
3) Calculate the p‐value
The test statistic: $D = 2[l(\tilde{\theta}) - l(\theta_0)]$
Distribution of D: $D \sim \chi^2_{n-p}$
Observed value of D: $d = 2[l(\hat{\theta}) - l(\theta_0)]$
P‐value: $P(D \ge d)$
Example
The observed value of the test statistic: $d = 2[l(\hat{\theta}) - l(\theta_0)]$
$l(\theta) = n \ln(\theta + 1) + \theta \sum_{i=1}^{n} \ln x_i$
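The likelihood ratio calculation for this log‐likelihood can be sketched in a few lines. The example's data are not recoverable from the slides, so the sample below is hypothetical. The sketch also assumes the density behind the log‐likelihood is $f(x; \theta) = (\theta + 1)x^{\theta}$ on (0, 1), and that D is compared with a chi‐squared distribution with one degree of freedom (one parameter fixed by the null hypothesis).

```python
import math


def log_lik(theta, xs):
    """l(theta) = n ln(theta + 1) + theta * sum(ln x_i)."""
    return len(xs) * math.log(theta + 1) + theta * sum(math.log(x) for x in xs)


def mle(xs):
    """Solve l'(theta) = n / (theta + 1) + sum(ln x_i) = 0 for theta."""
    return -len(xs) / sum(math.log(x) for x in xs) - 1


def lrt(xs, theta0):
    """Return (theta_hat, d, p_value) for H0: theta = theta0."""
    theta_hat = mle(xs)
    d = max(2 * (log_lik(theta_hat, xs) - log_lik(theta0, xs)), 0.0)
    # Assumption: D ~ chi-squared(1), for which P(D >= d) = erfc(sqrt(d / 2)).
    p_value = math.erfc(math.sqrt(d / 2))
    return theta_hat, d, p_value


# Hypothetical observations in (0, 1); not taken from the slides.
sample = [0.62, 0.81, 0.45, 0.93, 0.70, 0.88, 0.56, 0.77]
theta_hat, d, p_value = lrt(sample, theta0=1.0)
print(theta_hat, d, p_value)
```

A small p‐value would be read as evidence against the hypothesized value of $\theta$.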
Model Assessment
• We’ve been assuming that the data we collect fit a specific model (Binomial, Response, etc.)
• With these models come many assumptions, including independence
• In this chapter, we analyze our data to see whether these models are actually appropriate for our data
Independence with Binary Variates
• We want to see if we can assume two binary variates (represented by 2 random variables X and Y) are independent
• This is essentially another type of hypothesis test
• Since a binary variate is just a categorical variate with 2 categories, this test can be extended to two categorical variates
Independence with Binary Variates
Define:
Let X represent the binary variate gender (Male = 0, Female = 1)
Let Y represent the binary variate smoker (Non‐Smoker = 0, Smoker = 1)
Let n be the sample size
Let us collect our observed data and present it in the following frequency table:
                   Male (X=0)   Female (X=1)   Total
Non-Smoker (Y=0)   a            b              a + b
Smoker (Y=1)       c            d              c + d
Total              a + c        b + d          n = a + b + c + d
Independence with Binary Variates
If X and Y are independent then:
Expected frequency of male smokers is $n \cdot P(X=0) \cdot P(Y=1)$
Expected frequency of male non‐smokers is $n \cdot P(X=0) \cdot P(Y=0)$
Expected frequency of female smokers is $n \cdot P(X=1) \cdot P(Y=1)$
Expected frequency of female non‐smokers is $n \cdot P(X=1) \cdot P(Y=0)$
Independence with Binary Variates
Using the observed frequency table
                   Male (X=0)   Female (X=1)   Total
Non-Smoker (Y=0)   a            b              a + b
Smoker (Y=1)       c            d              c + d
Total              a + c        b + d          n = a + b + c + d
we estimate each probability from the corresponding marginal total:
$P(X=0) = \frac{a+c}{n}$
$P(X=1) = \frac{b+d}{n}$
$P(Y=0) = \frac{a+b}{n}$
$P(Y=1) = \frac{c+d}{n}$
Independence with Binary Variates
Creating our expected frequency table
                   Male (X=0)                              Female (X=1)                            Total
Non-Smoker (Y=0)   $e_1 = n \cdot P(X=0) \cdot P(Y=0)$     $e_2 = n \cdot P(X=1) \cdot P(Y=0)$     a + b
Smoker (Y=1)       $e_3 = n \cdot P(X=0) \cdot P(Y=1)$     $e_4 = n \cdot P(X=1) \cdot P(Y=1)$     c + d
Total              a + c                                   b + d                                   n = a + b + c + d
Independence with Binary Variates
As with any other hypothesis testing question, we need to define the test statistic.
Test Statistic: $S = \sum_{i=1}^{n} \frac{(o_i - e_i)^2}{e_i}$, where the sum runs over the cells of the table (so n here counts cells, not sampled units)
Distribution of the test statistic: $S \sim \chi^2_{(r-1)(c-1)}$, where r and c are the numbers of rows and columns
Observed value: $s = \sum_{i=1}^{n} \frac{(o_i - e_i)^2}{e_i}$
Independence with Binary Variates
p‐value $= P(S \ge s)$
Make your conclusion:
Reject: there is evidence that X and Y are not independent
Do not reject: the data are consistent with X and Y being independent
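The steps of the independence test for a 2×2 table can be sketched as follows. The counts are hypothetical, and the function name `independence_test` is illustrative. For a 2×2 table the statistic has $(r-1)(c-1) = 1$ degree of freedom, which lets the p‐value be computed with the complementary error function from the standard library.

```python
import math


def independence_test(a, b, c, d):
    """Chi-squared test of independence for a 2x2 table laid out as
        [[a, b],   # Y = 0: (X = 0, X = 1)
         [c, d]]   # Y = 1: (X = 0, X = 1)
    Returns (expected, s, p_value)."""
    n = a + b + c + d
    row_totals = [a + b, c + d]
    col_totals = [a + c, b + d]
    observed = [[a, b], [c, d]]
    # e = n * P(row) * P(column) = row_total * col_total / n
    expected = [[r * col / n for col in col_totals] for r in row_totals]
    s = sum((observed[i][j] - expected[i][j]) ** 2 / expected[i][j]
            for i in range(2) for j in range(2))
    # One degree of freedom: P(S >= s) = erfc(sqrt(s / 2)).
    p_value = math.erfc(math.sqrt(s / 2))
    return expected, s, p_value


expected, s, p_value = independence_test(20, 30, 30, 20)  # hypothetical counts
print(expected, s, round(p_value, 4))
```

Larger tables need a general chi‐squared tail probability, which the standard library does not provide; a statistics package or a chi‐squared table would be used instead.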
Example
Observed value: $s = \sum_{i=1}^{n} \frac{(o_i - e_i)^2}{e_i}$
P‐value: $P(S \ge s)$
Model Assessment
For the regression model, we have the following assumptions when fitting our data
1) The expectation of Y is a linear function of the explanatory variate
2) The model used is Gaussian
3) Yi’s are independent
4) The model has a constant variance
Model Assessment
The expectation of Y is a linear function of the explanatory variate
• The model assumes that E[Yi] is a linear function of xi
• If we plot Yi vs. xi we should see a linear relationship
Model Assessment
The model used is Gaussian
• In the model, we assume $R \sim G(0, \sigma)$ and thus $Y \sim G(\alpha + \beta x, \sigma)$
• How do we check if this assumption is reasonable?
Residuals
• Rearranging the model, $R = Y - (\alpha + \beta x)$
• A realization of R becomes $r_i = y_i - (\alpha + \beta x_i)$
• An estimated residual is $\hat{r}_i = y_i - (\hat{\alpha} + \hat{\beta} x_i) = y_i - \hat{y}_i$
• Graphically, $\hat{r}_i$ is the distance from the line of best fit to our observed response variate
Model Assessment
• We can check the Gaussian assumption with a QQ plot
• Plot the sample quantiles of the estimated residuals against the theoretical Gaussian quantiles; if the points fall along a relatively straight line, then the Gaussian assumption is reasonable
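The estimated residuals and the points of a QQ plot can be computed directly. This sketch uses the height and weight pairs from the regression prediction example in these slides. The plotting positions $(i - 0.5)/n$ are one common convention and are an assumption here, not something the slides specify.

```python
from statistics import NormalDist, mean

# Height (cm) and weight (kg) pairs from the regression example in these slides.
x = [172, 162, 180, 170, 174]
y = [60, 54, 72, 65, 64]

x_bar, y_bar = mean(x), mean(y)
s_xx = sum((xi - x_bar) ** 2 for xi in x)
s_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
beta_hat = s_xy / s_xx
alpha_hat = y_bar - beta_hat * x_bar  # intercept of the unshifted model

fitted = [alpha_hat + beta_hat * xi for xi in x]
residuals = [yi - fi for yi, fi in zip(y, fitted)]

# QQ pairs: standard Gaussian quantiles against the sorted residuals.
n = len(residuals)
theoretical = [NormalDist().inv_cdf((i - 0.5) / n) for i in range(1, n + 1)]
qq_pairs = list(zip(theoretical, sorted(residuals)))
for q, r in qq_pairs:
    print(f"{q:7.3f} {r:7.3f}")
```

Plotting the pairs and looking for an approximately straight line is the graphical check; plotting `fitted` against `residuals` gives the residual plot used for the independence and constant variance checks.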
Model Assessment
Yi’s are independent
• We will check these assumptions by plotting the fitted response, $\hat{y}_i = \hat{\alpha} + \hat{\beta} x_i$, against the estimated residuals, $\hat{r}_i$
• If our assumptions are true, we should see a random pattern centered around 0
Model Assessment
Yi’s have Constant Variance
• If Yi’s have constant variance, we should see residuals evenly distributed around zero
Non‐constant variance: funnel shaped
Comparison
Recall that in Chapter 1 we learned there were three different aspects (types of problem)
• Descriptive
• Causative
• Predictive
Chapter 6 looks at techniques for solving each of the 3 problems
Comparison
• The descriptive aspect of the problem could involve comparing two different populations
• In this section, we will learn how to conduct hypothesis tests that allow us to conclude whether there’s a difference between 2 populations
– The question asked is ‘is there a difference between the mean values of the 2 populations?’
• Essentially, the hypothesis tested is whether the parameters of the two populations are equal: $H_0: \mu_1 = \mu_2$
Comparison
2 sample t‐tests (Response Model)
• Two populations: $Y_{1j} = \mu_1 + R_{1j}$ and $Y_{2j} = \mu_2 + R_{2j}$
• The estimator for each population is $\tilde{\mu}_1 = \frac{1}{n_1}\sum_{j=1}^{n_1} Y_{1j}$ and $\tilde{\mu}_2 = \frac{1}{n_2}\sum_{j=1}^{n_2} Y_{2j}$
• The sampling distribution for each estimator is $\tilde{\mu}_1 \sim G\left(\mu_1, \frac{\sigma}{\sqrt{n_1}}\right)$ and $\tilde{\mu}_2 \sim G\left(\mu_2, \frac{\sigma}{\sqrt{n_2}}\right)$
Comparison
• In the hypothesis tests, we want to see if the two parameters $\mu_1$ and $\mu_2$ are equal, so let’s look at the r.v. $\tilde{\mu}_1 - \tilde{\mu}_2$
• What is the sampling distribution of $\tilde{\mu}_1 - \tilde{\mu}_2$ under the assumption $\mu_1 = \mu_2$?
$\tilde{\mu}_1 \sim G\left(\mu_1, \frac{\sigma}{\sqrt{n_1}}\right)$, $\tilde{\mu}_2 \sim G\left(\mu_2, \frac{\sigma}{\sqrt{n_2}}\right)$
Comparison
$\tilde{\mu}_1 - \tilde{\mu}_2 \sim G\left(0, \sigma\sqrt{\frac{1}{n_1} + \frac{1}{n_2}}\right)$
Standardize: $\frac{\tilde{\mu}_1 - \tilde{\mu}_2}{\sigma\sqrt{\frac{1}{n_1} + \frac{1}{n_2}}} \sim G(0, 1)$
Replace $\sigma$ with its estimate: $\frac{\tilde{\mu}_1 - \tilde{\mu}_2}{\tilde{\sigma}\sqrt{\frac{1}{n_1} + \frac{1}{n_2}}} \sim t_{n_1 + n_2 - 2}$
Comparison
The pooled estimate of $\sigma$: $\hat{\sigma} = \sqrt{\frac{(n_1 - 1)\hat{\sigma}_1^2 + (n_2 - 1)\hat{\sigma}_2^2}{n_1 + n_2 - 2}}$
The test statistic: $T = \frac{\tilde{\mu}_1 - \tilde{\mu}_2}{\tilde{\sigma}\sqrt{\frac{1}{n_1} + \frac{1}{n_2}}} \sim t_{n_1 + n_2 - 2}$
Example
$\hat{\mu}_1 = 71.3$, $\hat{\mu}_2 = 68.7$, $\hat{\sigma}_1 = 10.2$, $\hat{\sigma}_2 = 11.3$, $n_1 = 47$, $n_2 = 36$
$\hat{\sigma} = \sqrt{\frac{(n_1 - 1)\hat{\sigma}_1^2 + (n_2 - 1)\hat{\sigma}_2^2}{n_1 + n_2 - 2}} = \sqrt{\frac{(47 - 1)10.2^2 + (36 - 1)11.3^2}{47 + 36 - 2}} = 10.6892$
$t = \frac{71.3 - 68.7}{10.6892\sqrt{\frac{1}{47} + \frac{1}{36}}} = 1.097$
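The arithmetic in this example can be reproduced with a short script. This is a sketch using only the rounded summary values shown on the slide; the function names `pooled_sd` and `two_sample_t` are illustrative.

```python
import math


def pooled_sd(s1, n1, s2, n2):
    """Pooled estimate of the common standard deviation."""
    return math.sqrt(((n1 - 1) * s1 ** 2 + (n2 - 1) * s2 ** 2) / (n1 + n2 - 2))


def two_sample_t(mu1, s1, n1, mu2, s2, n2):
    """Observed value of the two-sample t statistic under H0: mu1 = mu2."""
    sp = pooled_sd(s1, n1, s2, n2)
    return (mu1 - mu2) / (sp * math.sqrt(1 / n1 + 1 / n2))


sp = pooled_sd(10.2, 47, 11.3, 36)
t = two_sample_t(71.3, 10.2, 47, 68.7, 11.3, 36)
print(round(sp, 4), round(t, 3))
```

With the rounded inputs the pooled estimate matches the slide and the statistic comes out at about 1.10; the small difference from the slide's 1.097 is presumably due to rounding of the summary values. The p‐value is then found by comparing |t| with the t distribution on $n_1 + n_2 - 2 = 81$ degrees of freedom.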
Paired T‐Tests
• In the prior pages, we looked at two sample t‐tests
• A stronger test is called the paired t‐test
• This test only works if the two samples we collect are actually data for the same group of n units, but at different times
• The paired t‐test involves simplifying the two data sets into one by finding the difference of each pair of data, and working with this single dataset
• Then we conduct a usual t‐test/hypothesis test on this single dataset of differences
Causation
• The causative aspect of a problem looks at the relationship between the explanatory and response variates
• Recall that in Chapter 1 we looked at 2 concepts that describe the relationship between X and Y
– Relative Risk
– Association
• Association involves calculating the correlation coefficient
$\hat{\rho} = r = \frac{S_{XY}}{\sqrt{S_{XX} S_{YY}}} = \frac{\sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})}{\sqrt{\sum_{i=1}^{n} (x_i - \bar{x})^2 \sum_{i=1}^{n} (y_i - \bar{y})^2}}$
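As a quick illustration of the correlation coefficient formula, the sketch below evaluates it on the height and weight pairs used in the regression prediction example in these slides. The function name `correlation` is illustrative.

```python
import math
from statistics import mean


def correlation(xs, ys):
    """Sample correlation r = S_XY / sqrt(S_XX * S_YY)."""
    x_bar, y_bar = mean(xs), mean(ys)
    s_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    s_xx = sum((x - x_bar) ** 2 for x in xs)
    s_yy = sum((y - y_bar) ** 2 for y in ys)
    return s_xy / math.sqrt(s_xx * s_yy)


heights = [172, 162, 180, 170, 174]  # cm
weights = [60, 54, 72, 65, 64]       # kg
r = correlation(heights, weights)
print(round(r, 4))
```

A value of r close to 1 indicates a strong positive linear association; on its own it says nothing about causation.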
Causation
• In this course, we only have the skills to test for association
• This involves testing the hypothesis $H_0: \beta = 0$ in the regression model
• If $\beta = 0$, then we can say there is no association between X and Y
Example
$t = \frac{\hat{\beta} - \beta_0}{SE(\tilde{\beta})}$
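The observed value of this statistic can be sketched for $H_0: \beta = 0$. The example's own data are not recoverable from the slides, so this sketch borrows the height and weight pairs from the regression prediction example. It assumes $SE(\tilde{\beta}) = \hat{\sigma}/\sqrt{S_{XX}}$, which follows from the sampling distribution of $\tilde{\beta}$ given in the Prediction slides, with $\hat{\sigma}$ estimated on n − 2 degrees of freedom.

```python
import math
from statistics import mean

x = [172, 162, 180, 170, 174]  # height (cm)
y = [60, 54, 72, 65, 64]       # weight (kg)
n = len(x)

x_bar, y_bar = mean(x), mean(y)
s_xx = sum((xi - x_bar) ** 2 for xi in x)
s_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
s_yy = sum((yi - y_bar) ** 2 for yi in y)

beta_hat = s_xy / s_xx
# Residual sum of squares, then the estimate of sigma with n - 2 degrees of freedom.
sse = s_yy - beta_hat * s_xy
sigma_hat = math.sqrt(sse / (n - 2))

beta0 = 0.0                            # hypothesized slope under H0
se_beta = sigma_hat / math.sqrt(s_xx)  # estimated standard error of the slope
t = (beta_hat - beta0) / se_beta
print(round(sigma_hat, 2), round(t, 3))
```

The observed |t| would then be compared with the t distribution on n − 2 = 3 degrees of freedom to obtain the p‐value.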
Causation
• Association does NOT imply causation
• The course notes talk about why this is the case and how we can avoid making the wrong assumption using three techniques
– Blocking
– Repetition and Randomization
– Matching
Causation
Confounding
• Association does not imply causation
• There could be a third hidden variate that is related to both the explanatory and response variates and produces the apparent causal relationship: this is called confounding
• The difficulty with confounding variates is identifying them in the first place; if we miss one, we will draw a wrong conclusion about the relationship between the explanatory and response variates
• If we can identify the confounding variates, then there are tools we can use when designing experimental plans to account for these variates
Causation
Blocking
• If we’ve identified the confounding variate, we neutralize its effect by collecting samples where the units have the same value for the confounding variate
• The Chicken Example:
– Response variate: growth rate of chickens
– Explanatory variate: protein in diet
– Confounding variate: gender of the chickens
– Blocking: look at samples of only male chickens and samples of only female chickens
– This eliminates the gender effect and the experimenter is able to look at the effects of protein in diet on the growth rate of chickens
Causation
Replication and Randomization
• If we cannot identify or control the confounding variate, we can also try to neutralize its effects by randomly allocating our controlled variate in the experimental plan
• The Medicine Example:
– Response variate: survival rate
– Explanatory variate: type of treatment
– Confounding variates: medical history/health of the patient
– Using randomization and replication to assign the treatment type to each unit will result in two groups that are well balanced in terms of their health/medical history
– This will eliminate the effect of the confounding variates as much as possible
Causation
Matching and Observational Plans
• In observational plans, the experimenter cannot control the variates
• The method of matching is used where each unit being observed is compared with a control unit that has very similar characteristics to the unit in the plan (this is similar to blocking)
• Thus if there is a difference in the value observed between the sampled unit and the control unit, the difference is less likely to be due to a confounding variate
Prediction
• The predictive aspect of a problem involves using our collected data to estimate a value for a unit to be randomly selected from the population
• We will look at prediction intervals for
– Response
– Regression
Prediction
The Model: $Y = \mu + R$, so $Y \sim G(\mu, \sigma)$
The predicted unit: $Y_0$
Since $Y_0$ follows the response model, then $Y_0 \sim G(\mu, \sigma)$
Prediction
What would be a logical choice to use as our predicted value?
• The average
We need the estimator for the mean parameter, $\tilde{\mu}$:
From MLE: $\tilde{\mu} = \frac{1}{n}\sum_{i=1}^{n} Y_i$
Sampling Distribution: $\tilde{\mu} \sim G\left(\mu, \frac{\sigma}{\sqrt{n}}\right)$
Prediction
If we look at the difference between the unit to be predicted and the estimator of the population average, then we have the random variable $Y_0 - \tilde{\mu}$
$Y_0 \sim G(\mu, \sigma)$, $\tilde{\mu} \sim G\left(\mu, \frac{\sigma}{\sqrt{n}}\right)$
Prediction
$Y_0 - \tilde{\mu} \sim G\left(0, \sigma\sqrt{1 + \frac{1}{n}}\right)$
Standardizing gives $\frac{Y_0 - \tilde{\mu}}{\sigma\sqrt{1 + \frac{1}{n}}} \sim G(0, 1)$
Replacing $\sigma$ with an estimator gives $\frac{Y_0 - \tilde{\mu}}{\tilde{\sigma}\sqrt{1 + \frac{1}{n}}} \sim t_{n-1}$
Prediction
Constructing a 95% Prediction Interval for $Y_0$ ($\sigma$ unknown)
Our ultimate goal: $a \le Y_0 \le b$
Since $\frac{Y_0 - \tilde{\mu}}{\tilde{\sigma}\sqrt{1 + \frac{1}{n}}} \sim t_{n-1}$ we can make the probability statement:
$P\left(-c \le \frac{Y_0 - \tilde{\mu}}{\tilde{\sigma}\sqrt{1 + \frac{1}{n}}} \le c\right) = 0.95$
Example
Let Y be the response variate representing body weight (kg). The following sample is collected:
60 54 72 65 64
Construct a 95% prediction interval for the body weight of someone we randomly select from the population.
$\hat{\mu} \pm c \cdot \hat{\sigma}\sqrt{1 + \frac{1}{n}}$
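The interval for this example can be worked out numerically. This is a sketch: the critical value c = 2.776 is the 0.975 quantile of the t distribution with n − 1 = 4 degrees of freedom, taken from a standard t table rather than computed.

```python
import math
from statistics import mean, stdev

weights = [60, 54, 72, 65, 64]  # body weight (kg), from the slide
n = len(weights)
mu_hat = mean(weights)
sigma_hat = stdev(weights)      # sample standard deviation (n - 1 denominator)

c = 2.776  # 0.975 quantile of t with 4 degrees of freedom (table value)
half_width = c * sigma_hat * math.sqrt(1 + 1 / n)
lower, upper = mu_hat - half_width, mu_hat + half_width
print(round(lower, 2), round(upper, 2))
```

The interval is wide because it has to cover the variability of a single new unit as well as the uncertainty in the estimated mean.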
Prediction
The Model: $Y = \alpha + \beta x_i + R$
But for our purposes, we will use a shifted version of the model: $Y = \alpha + \beta(x_i - \bar{x}) + R$
Prediction
The Model: $Y = \alpha + \beta(x_i - \bar{x}) + R$
The predicted unit: $Y_0$
We want to predict $Y_0$ given the subgroup $x_i = x_0$
Since $Y_0$ follows the regression model, then $Y_0 \sim G(\alpha + \beta(x_0 - \bar{x}), \sigma)$
Prediction
What would be a logical choice to use as our predicted value?
• The average given the subgroup $x_i = x_0$, which we will denote $\tilde{\mu}(x_0)$
Regression Model: $Y = \alpha + \beta(x_i - \bar{x}) + R$
Average of the subgroup $x_i = x_0$: $\tilde{\mu}(x_0) = E[Y | x_0] = \tilde{\alpha} + \tilde{\beta}(x_0 - \bar{x})$
Prediction
Using Maximum Likelihood Estimation we obtain the estimators
$\tilde{\alpha} = \frac{1}{n}\sum_{i=1}^{n} Y_i$, $\tilde{\beta} = \frac{\sum_{i=1}^{n} (Y_i - \bar{Y})(x_i - \bar{x})}{\sum_{i=1}^{n} (x_i - \bar{x})^2} = \frac{S_{XY}}{S_{XX}}$
The sampling distributions of these two estimators are
$\tilde{\alpha} \sim G\left(\alpha, \frac{\sigma}{\sqrt{n}}\right)$, $\tilde{\beta} \sim G\left(\beta, \frac{\sigma}{\sqrt{S_{XX}}}\right)$
Prediction
What is the sampling distribution of $\tilde{\mu}(x_0) = \tilde{\alpha} + \tilde{\beta}(x_0 - \bar{x})$?
$\tilde{\alpha} \sim G\left(\alpha, \frac{\sigma}{\sqrt{n}}\right)$, $\tilde{\beta} \sim G\left(\beta, \frac{\sigma}{\sqrt{S_{XX}}}\right)$
$\tilde{\mu}(x_0) \sim G\left(\alpha + \beta(x_0 - \bar{x}), \sigma\sqrt{\frac{1}{n} + \frac{(x_0 - \bar{x})^2}{S_{XX}}}\right)$
Prediction
If we look at the difference between the unit to be predicted and the estimator of the subgroup average, then we have the random variable $Y_0 - \tilde{\mu}(x_0)$
The obvious next step would be to determine the sampling distribution of $Y_0 - \tilde{\mu}(x_0)$
$Y_0 \sim G(\alpha + \beta(x_0 - \bar{x}), \sigma)$, $\tilde{\mu}(x_0) \sim G\left(\alpha + \beta(x_0 - \bar{x}), \sigma\sqrt{\frac{1}{n} + \frac{(x_0 - \bar{x})^2}{S_{XX}}}\right)$
Prediction
$Y_0 - \tilde{\mu}(x_0) \sim G\left(0, \sigma\sqrt{1 + \frac{1}{n} + \frac{(x_0 - \bar{x})^2}{S_{XX}}}\right)$
Standardizing gives $\frac{Y_0 - \tilde{\mu}(x_0)}{\sigma\sqrt{1 + \frac{1}{n} + \frac{(x_0 - \bar{x})^2}{S_{XX}}}} \sim G(0, 1)$
Estimating sigma gives $\frac{Y_0 - \tilde{\mu}(x_0)}{\tilde{\sigma}\sqrt{1 + \frac{1}{n} + \frac{(x_0 - \bar{x})^2}{S_{XX}}}} \sim t_{n-2}$
Prediction
Constructing a 95% Prediction Interval for $Y_0$ ($\sigma$ unknown)
Our ultimate goal: $a \le Y_0 \le b$
Since $\frac{Y_0 - \tilde{\mu}(x_0)}{\tilde{\sigma}\sqrt{1 + \frac{1}{n} + \frac{(x_0 - \bar{x})^2}{S_{XX}}}} \sim t_{n-2}$ we can make the probability statement:
$P\left(-c \le \frac{Y_0 - \tilde{\mu}(x_0)}{\tilde{\sigma}\sqrt{1 + \frac{1}{n} + \frac{(x_0 - \bar{x})^2}{S_{XX}}}} \le c\right) = 0.95$
$P\left(-c \cdot \tilde{\sigma}\sqrt{1 + \frac{1}{n} + \frac{(x_0 - \bar{x})^2}{S_{XX}}} \le Y_0 - \tilde{\mu}(x_0) \le c \cdot \tilde{\sigma}\sqrt{1 + \frac{1}{n} + \frac{(x_0 - \bar{x})^2}{S_{XX}}}\right) = 0.95$
$P\left(\tilde{\mu}(x_0) - c \cdot \tilde{\sigma}\sqrt{1 + \frac{1}{n} + \frac{(x_0 - \bar{x})^2}{S_{XX}}} \le Y_0 \le \tilde{\mu}(x_0) + c \cdot \tilde{\sigma}\sqrt{1 + \frac{1}{n} + \frac{(x_0 - \bar{x})^2}{S_{XX}}}\right) = 0.95$
Prediction
$\hat{\alpha} + \hat{\beta}(x_0 - \bar{x}) \pm c \cdot \hat{\sigma}\sqrt{1 + \frac{1}{n} + \frac{(x_0 - \bar{x})^2}{S_{XX}}}$
Upper and Lower bounds of a regression prediction interval
Example
Let Y be the response variate representing body weight (kg) and X be the explanatory variate representing body height (cm). The following sample is collected:
i    1    2    3    4    5
xi   172  162  180  170  174
yi   60   54   72   65   64
Construct a 95% prediction interval for the body weight of someone we randomly select from the population whose height is 175 cm. Use $\hat{\sigma} = 2.97$
$\hat{\alpha} + \hat{\beta}(x_0 - \bar{x}) \pm c \cdot \hat{\sigma}\sqrt{1 + \frac{1}{n} + \frac{(x_0 - \bar{x})^2}{S_{XX}}}$
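The regression prediction interval for this example can be computed step by step. This is a sketch: $\hat{\sigma} = 2.97$ is the value given on the slide, and c = 3.182 is the 0.975 quantile of the t distribution with n − 2 = 3 degrees of freedom, taken from a standard t table rather than computed.

```python
import math
from statistics import mean

x = [172, 162, 180, 170, 174]  # height (cm)
y = [60, 54, 72, 65, 64]       # weight (kg)
x0 = 175                       # height of the unit to be predicted
sigma_hat = 2.97               # given on the slide
c = 3.182                      # 0.975 quantile of t with 3 degrees of freedom (table value)

n = len(x)
x_bar, y_bar = mean(x), mean(y)
s_xx = sum((xi - x_bar) ** 2 for xi in x)
s_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))

alpha_hat = y_bar              # in the shifted model, alpha-hat is the mean response
beta_hat = s_xy / s_xx

prediction = alpha_hat + beta_hat * (x0 - x_bar)
half_width = c * sigma_hat * math.sqrt(1 + 1 / n + (x0 - x_bar) ** 2 / s_xx)
lower, upper = prediction - half_width, prediction + half_width
print(round(prediction, 2), round(lower, 2), round(upper, 2))
```

The interval is noticeably narrower than the response‐model interval for the same weights, because height explains much of the variation in weight.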
Outline
• Chapter 1
– Data types (discrete, continuous, categorical)
– Problem (3 different aspects)
– Populations (target, study, sample)
– Representations of data
• Graphical: histograms, CDFs, box plots
• Numerical: mean, standard deviation, IQR
– Bivariate Data
• Relative risk
• Correlation coefficient
• Chapter 2
– Review of probability distributions
– Random PPDAC examples…
PPDAC
Draw a frequency histogram of the Flash data, with bins given by the intervals (45 – 49.9), (50 – 54.9), etc.
First make a frequency table with the bin widths
Interval Frequency
(45 – 49.9) 1
(50 – 54.9) 1
(55 – 59.9) 2
(60 – 64.9) 5
(65 – 69.9) 5
(70 – 74.9) 1
(75 – 79.9) 1
(80 – 84.9) 1
(85 – 89.9) 2
(90 – 94.9) 1
Concept Review
• From the previous example:
– Target population, study population, sample, unit
– Response vs. explanatory variates
– Aspects
• Descriptive
• Causative
• Predictive
– Histograms
• Bin Width
• Frequency histogram
Outline
• Chapter 3
– Binomial Model
– Response Model
– Regression Model
– Maximum Likelihood Estimation
MLE
$L(\theta) = \prod_{i=1}^{n} f(x_i; \theta)$
$l(\theta) = n \ln\theta + (\theta - 1)\sum_{i=1}^{n} \ln(x_i)$
Concept Review
• From the previous example:
– Maximum Likelihood Estimation Method
• Define likelihood function
• Define log likelihood function
• Differentiate with respect to the parameter
• Set to zero
• Solve for the parameter
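The steps above can be sketched for the log‐likelihood $l(\theta) = n \ln\theta + (\theta - 1)\sum \ln(x_i)$ from the MLE example. The sketch assumes the underlying density is $f(x; \theta) = \theta x^{\theta - 1}$ on (0, 1), which is inferred from the log‐likelihood rather than stated on the slides. The sample values are hypothetical.

```python
import math


def log_lik(theta, xs):
    """l(theta) = n ln(theta) + (theta - 1) * sum(ln x_i)."""
    return len(xs) * math.log(theta) + (theta - 1) * sum(math.log(x) for x in xs)


def score(theta, xs):
    """Derivative of the log-likelihood: l'(theta) = n / theta + sum(ln x_i)."""
    return len(xs) / theta + sum(math.log(x) for x in xs)


def mle(xs):
    """Setting the derivative to zero and solving gives theta_hat = -n / sum(ln x_i)."""
    return -len(xs) / sum(math.log(x) for x in xs)


sample = [0.35, 0.80, 0.62, 0.91, 0.47, 0.73]  # hypothetical data in (0, 1)
theta_hat = mle(sample)
print(theta_hat, score(theta_hat, sample))
```

The derivative evaluated at the estimate should be zero up to rounding error, which is a quick check that the algebra was done correctly.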
Outline
• Chapter 4
– Sampling distributions for estimators
– Introduction to new distributions
• Gaussian
• Chi‐squared
• t
– Confidence Interval
– Hypothesis Testing
– Confidence Intervals and Hypothesis Testing with the likelihood function
Confidence Intervals
Concepts Review
• From the previous example:
– Confidence Intervals for the response model, sigma unknown
– Structure of a symmetric confidence interval
Hypothesis Testing
For a paired t‐test, we create a new set of data
i      1     2     3     4     5      6     7      8
Diff   0.48  0.53  0.52  0.21  -0.05  0.44  0.41   0.68
i      9     10    11    12    13     14    15     16
Diff   0.46  0.76  3.09  0.26  0.34   0.32  -0.07  0.33
Hypothesis Testing
Test statistic: $T = \frac{\tilde{\mu}_D - \mu_0}{\tilde{\sigma}_D / \sqrt{n}} \sim t_{n-1}$
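The observed value of the paired test statistic can be computed from the differences listed above. This sketch assumes the null value $\mu_0 = 0$ (no mean difference), which the surviving slide text does not state explicitly.

```python
import math
from statistics import mean, stdev

diffs = [0.48, 0.53, 0.52, 0.21, -0.05, 0.44, 0.41, 0.68,
         0.46, 0.76, 3.09, 0.26, 0.34, 0.32, -0.07, 0.33]

mu0 = 0.0                  # assumed null value: no mean difference
n = len(diffs)
mu_hat = mean(diffs)
sigma_hat = stdev(diffs)   # sample standard deviation (n - 1 denominator)
t = (mu_hat - mu0) / (sigma_hat / math.sqrt(n))
print(round(mu_hat, 4), round(sigma_hat, 4), round(t, 3))
```

The p‐value then comes from comparing |t| with the t distribution on n − 1 = 15 degrees of freedom. Note that the difference of 3.09 is much larger than the others and has a strong influence on both the mean and the standard deviation.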
Hypothesis Testing
P‐value
Hypothesis Testing
For a 2 sample t‐test, we have two populations, with 2 sets of data
Hypothesis Testing
Test statistic: $T = \frac{\tilde{\mu}_1 - \tilde{\mu}_2}{\tilde{\sigma}\sqrt{\frac{1}{n_1} + \frac{1}{n_2}}} \sim t_{n_1 + n_2 - 2}$
Hypothesis Testing
$\hat{\sigma} = \sqrt{\frac{(n_1 - 1)\hat{\sigma}_1^2 + (n_2 - 1)\hat{\sigma}_2^2}{n_1 + n_2 - 2}} = \sqrt{\frac{(16 - 1)2.48^2 + (16 - 1)2.91^2}{16 + 16 - 2}} = 2.704$
Observed value of the test statistic: $t = \frac{\hat{\mu}_1 - \hat{\mu}_2}{\hat{\sigma}\sqrt{\frac{1}{n_1} + \frac{1}{n_2}}}$
Hypothesis Testing
P‐value
Concepts Review
• From the previous example:
– Hypothesis Testing
• Define the null hypothesis
• Define the test statistic, identify the distribution, calculate
the observed value of the test statistic
• Calculate the p‐value
– 2 sample t test
– Paired t test