Basic assumptions about you
Many elementary concepts have been skipped. At this stage, it is assumed that you should know them well. In particular, you MUST know how to do HATPC for each of the 8½ hypothesis tests.
Only important things, or those that inter-connect several topics together, are elaborated here.
You have ABSOLUTELY NO hope of passing STAT170 if you do not know the 8½ HATPCs. This PP file will NOT push you from F to P.
The contents of this file will only help the P or above students, given the presumed basic knowledge.
Binding things together
Review of:
• 5 types of graphics
• 5 types of research questions
• 8½ statistical tests
• 8 or MORE types of reports
Displaying Data: 5 types of graphics
DATA | categorical | numerical
categorical | clustered bar chart | comparative box plots
numerical | comparative box plots | scatter plot
(one variable alone) | bar chart or pie chart | histogram or stem-and-leaf plot
Displaying Data: 5 types of graphics
(The following table conveys the same information as the previous slide.)
Combination of variable(s) | Graphic
One categorical (Lecture 2, 11) | Bar chart, pie chart
One numerical (Lecture 2, 7) | Histogram, stem-&-leaf
Two categorical (Lecture 2, 11, 12) | Clustered bar chart
Two numerical (Lectures 2, 9 & 10) | Scatter plot
One categorical and one numerical (Lecture 2, 8) | Comparative box plots
5 types of graphics
STAT170 is restricted to only 5 types of combinations of variables, 5 different types of graphics, and 5 possible research questions.
The most important step is correctly identifying the types of variables: NUMERICAL vs CATEGORICAL. Surprisingly, many students have difficulty in this very first step.
The correct/wrong identification of variables would lead you to the correct/wrong:
• Type of graphic
• Research question, and
• Statistical test.
How to comment on graphics:
1. Comments on a single bar chart (seldom asked)
Comment depends on whether variable is ordinal or nominal
• Ordinal: comment similar to histogram
• Nominal: comment on which categories have the highest and lowest frequencies
[Figure: bar chart of count by diet (meat, vegetarian, vegan)]
Skewed to the right.
This doesn’t make any sense!
2. Comments on a single histogram (or stem-and-leaf plot)
1. Comment on shape (skewed left/right, normal)
2. Range from xxxx to xxxx
3. Majority (high frequencies) of data about xxxx
4. Comment on outliers (if present)
5. Comment on any unusual features (if present)
[Figure: histogram of Assessment, with Freq. on the vertical axis]
[Figure: histogram of Individual Days (0 to 12), with Freq. on the vertical axis]
Example:
• U-shaped, high frequencies near both ends, lowest frequencies near the centre
• U-shaped, but slightly skewed left
• Range from 0 to 12
3. Comments on comparative boxplots
• Compare medians
• Compare spread (IQR)
• Compare outliers
(Even when there are no outliers, say “no outliers”.)
[Figure: comparative box plots of Age by Class (day, evening)]
4. Comments on scatter plot
• Comment on linear/curved? Positive or negative slope?
• Comment on amount of scatter (big or small?)
• Comment on outliers, if any
• Comment on residuals
– Symmetric on both sides of the line/normal?
– Constant SD?
[Figure: three scatter plots with axes Median Age and Birth Rate; husband age and age marriage; GPA and UAI]
5. Comments on clustered bar charts
Compare the shapes of the clusters, NOT the sizes.
Shapes (not sizes) similar ⇒ the 2 variables are independent (ie have no association), since the % are the same.
Shapes (not sizes) not similar ⇒ the 2 variables are not independent (ie have an association), because the % are not the same.
Comments on clustered bar charts: explanation
Never compare the actual frequencies (sizes).
Only compare % (or proportions) (shapes).
Since proportions are almost the same, ie about 1/3 and 2/3 for smokers and non-smokers,
smoking status is independent of Activity Level (no association)
Comments on clustered bar charts: explanation
Never compare the actual frequencies (sizes).
Only compare % (or proportions) (shapes).
Since percentages of smokers and non-smokers are obviously different for males and females, there is an association between smoking status and gender.
You only need to be able to identify between numerical and categorical.
No need to further classify into continuous or discrete(=integer), nor further classify into nominal or ordinal.
If you cannot distinguish between nominal and ordinal, you’ll only lose a few marks in Q.1.
But how about numerical vs categorical ? See next slide.
Example: Numerical vs Categorical
Age: age in years
⇒ Numeric (continuous)
⇒ Histogram / stem-leaf
=> z-test or t-test
Age: 0-12 children (1),
13-18 teenager (2),
> 18 adult (3),
⇒ Categorical (ordinal)
⇒ bar chart /pie chart
=> Chi sq test of proportions (GOF test)
A mistake will cost you at least 6 marks in HATPC, plus other marks in subsequent parts of the question.
The key is to look at the definition, not the meaning we use in daily language. Read the question! The results are unchanged if we use the names ABC or XYZ instead of AGE.
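The point about definitions can be sketched in a few lines of Python. This is only an illustration: the raw ages are made up, and the function name `age_group` is hypothetical. It shows the same name "Age" giving a numerical variable under one definition and a categorical one under the other.

```python
from collections import Counter


def age_group(age_years):
    """Recode age in years into the coded groups used on the slide."""
    if age_years <= 12:
        return 1  # children (0-12)
    if age_years <= 18:
        return 2  # teenager (13-18)
    return 3      # adult (> 18)


# Hypothetical raw data: Age defined as "age in years" => numerical.
ages_in_years = [5, 12, 13, 18, 19, 40]
mean_age = sum(ages_in_years) / len(ages_in_years)  # a mean is meaningful here

# Same people, Age defined as a coded group => categorical (ordinal).
age_codes = [age_group(age) for age in ages_in_years]
code_counts = Counter(age_codes)  # only counts/proportions are meaningful here

print("numerical Age, mean:", round(mean_age, 2))
print("categorical Age, counts per code:", dict(code_counts))
```

The codes 1, 2, 3 look like numbers, but they are only labels, which is why the second definition leads to a bar chart and a chi-sq test of proportions rather than a histogram and a t-test.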
No one can help you …
How many such mistakes can you afford to make in exam? 3 such mistakes => you’ll fail in STAT170
You have absolutely no hope of passing STAT170 if you cannot distinguish between numerical and categorical variables – since the whole philosophy of STAT170 is based on classifying categorical and numerical variables. (This is unlike other 1st-year stat courses in other universities.)
Absolute bottom line:
1. HOW MANY variables?
2. Are the variables numerical or categorical?
Answering these 2 questions correctly will lead you to one of the 5 cases, and almost certainly to the correct test. The HATPC is then, hopefully, bookwork.
How students fail?
But many students already have trouble with the first question: how many variables are there?
For example,
[Figure: bar chart titled “Who do you find it easier to make friends with?”, showing frequency by response (same sex, opposite sex, either)]
How many variables are there? 3 or 1?
Think of the survey. How many questions? 3 or 1?
How many columns do you need to store the data? 3 or 1?
You are doomed if you choose “3 variables”. In fact there is no test in STAT170 that involves 3 variables.
How students fail?
Another example:
       | Smoker | Non-smoker
Male   | 4      | 11
Female | 5      | 8
How many variables are there? 1, 2 or 4?
You are doomed if you choose “4 variables”.
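One way to see the answer is to undo the cross-tabulation and ask how many columns the raw data needs. The sketch below does that for the 2×2 table of counts; the variable names are illustrative only.

```python
# Counts from the 2x2 table: each cell is (value of Sex, value of Smoking status).
table = {
    ("Male", "Smoker"): 4,
    ("Male", "Non-smoker"): 11,
    ("Female", "Smoker"): 5,
    ("Female", "Non-smoker"): 8,
}

# Rebuild the raw data: one row per person, one column per variable.
records = [
    (sex, smoking)
    for (sex, smoking), count in table.items()
    for _ in range(count)
]

n_people = len(records)         # number of rows = sample size
n_variables = len(records[0])   # number of columns = number of variables

print("rows (people):", n_people)
print("columns (variables):", n_variables)
```

The 4 cells are counts of people, not variables: the raw data has 28 rows and only 2 columns (Sex and Smoking status), so there are 2 categorical variables.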
Getting a pass in STAT170
You need to be able to do ALL of the following:
1. Count how many variables
2. Identify the variables as numerical or categorical
3. Do ALL 8½ hypothesis tests
You will fail in STAT170 if you cannot do just ONE of them!
(In fact, if you can do ALL of them well, a Cr is guaranteed.)
How to determine the appropriate test
Variable(s) | Graphics | Research Question (e.g.) | Answering the research Q: Formal stat test
One categorical | Bar chart, pie chart | Is the proportion of smokers equal to 0.3? Are the proportions of meat-eaters, vegetarians & vegans equal to 0.8, 0.15 & 0.05? | z-test of proportion (Lect 7), 2 categories only; χ2 test of proportions (GOF) (Lect 11), 2 or more categories
One numerical | Hist, stem-leaf, boxplot | Is the mean equal to …? | z and t-tests of mean (Lect 7)
Two categorical | Clustered bar chart | Is there an association between … and …? | Chi sq test of association (Lect 11, 12) or Odds ratio
Two numerical | Scatter plot | Is there a relation between … and …? | Regression analysis: Test of slope (Lect 9, 10)
One categ (binary) & one numeric | Comparative boxplots | Is there a diff in heights between males and females? | 2-sample t-test (Lect 8)
Note:
1. There is the paired t-test, which doesn’t fit in any of the 5 cases above; perhaps it fits best in the 2nd case (one-sample t-test).
2. 7½ tests above + paired t-test = 8½ hypothesis tests in STAT170
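The table above is essentially a lookup from "how many variables, and of what type" to a graphic and a test. A minimal sketch of that lookup follows; the dictionary and the function name `choose_graphic_and_test` are made up for illustration, and the paired t-test is deliberately left out because it does not fit the 5 cases.

```python
# Lookup keyed by the sorted tuple of variable types.
CASES = {
    ("categorical",): (
        "bar chart or pie chart",
        "z-test of proportion (2 categories only) or chi-sq test of proportions (GOF)",
    ),
    ("numerical",): (
        "histogram or stem-and-leaf plot",
        "one-sample z-test or t-test of mean",
    ),
    ("categorical", "categorical"): (
        "clustered bar chart",
        "chi-sq test of association",
    ),
    ("numerical", "numerical"): (
        "scatter plot",
        "regression: test of slope",
    ),
    ("categorical", "numerical"): (
        "comparative box plots",
        "2-sample t-test",
    ),
}


def choose_graphic_and_test(variable_types):
    """Return (graphic, test) for a list of 1 or 2 variable types."""
    key = tuple(sorted(variable_types))
    if key not in CASES:
        raise ValueError("not one of the 5 cases: check the variable count and types")
    return CASES[key]


print(choose_graphic_and_test(["numerical", "categorical"]))
print(choose_graphic_and_test(["categorical", "categorical"]))
```

Any list of 3 types raises an error, which matches the point that no test in the unit involves 3 variables.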
Beware of the paired t-test
The paired t-test may be mistaken as:
• 2-sample t-test
• Regression
Read the given Research Question
If you see “relation” or “predict” => regression
If you see “difference” => 2-sample t or paired t. Then think!
Eg: Weight loss program? Y1=Wt before, Y2=Weight after
How to determine the appropriate test
Method 1
The ONLY SURE way to determine the correct test is to identify the variable types correctly!
Method 2
IF you cannot do (1), then you may look for keywords in the research questions. But be warned it is NOT 100% fool-proof.
• “association” => Chi-sq test of association
• “relation”, “predict” => Regression (with t-test on slope)
• “difference” => 2-sample t-test, or paired t-test
• “Proportion” (singular!), “percentage” => Z-test of proportion
• “Proportions” (plural), “percentages” => Chi-sq test of proportions(GoF)
• “mean”, “average” => One-sample z-test or t-test
See the underlined keywords in the previous slide.
NOT 100% fool-proof! Eg: Are proportions of smokers the same for males and females? => Chi-sq test of association
(Method 1 is 100% certain.)
How to determine the appropriate test (continued)
Method 3 (Easiest for you)
Look at the given graphic, then deduce the appropriate test. This is almost certain, but many questions do NOT show graphs!
• ONE histogram/stem-leaf => z-test or t-test or paired t
• Bar chart/pie chart => chi-sq test of proportions (GOF)
(if binary, GOF or z-test of proportion)
• Clustered bar chart => chi-sq test of association
• Scatter plot => regression: test of slope
• TWO histograms/stem-leafs and/OR comparative box plots => 2-sample t
3 types of statistical tests involving categorical data
Statistical test | Keywords in Res. Q | Ho | Assumptions | Test statistic
z-test of proportion | Proportion, % | Ho: π = π0 | nπ0 ≥ 5, n(1−π0) ≥ 5 | $z = \frac{p - \pi_0}{\sqrt{\pi_0(1-\pi_0)/n}}$
Chi sq goodness of fit (chi sq test of proportions) | Proportions, percentages (plural) | Ho: π1=…, π2=…, π3=… | Ei = n*πi ≥ 5 | $\chi^2 = \sum_j \frac{(O_j - E_j)^2}{E_j}$, df = c−1
Chi sq test of independence (no association) | Association, independent, proportions | X and Y are independent | $E_{ij} = \frac{\text{row tot} \times \text{col tot}}{\text{grand total}} \ge 5$ | $\chi^2 = \sum_{ij} \frac{(O_{ij} - E_{ij})^2}{E_{ij}}$, df = (r−1)(c−1)
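As a worked illustration of the chi sq test of independence formulas, here is a short sketch applied to the 2×2 Sex by Smoking status counts that appear elsewhere in these slides. It is only a sketch of the arithmetic (expected counts, test statistic, df and the "all E ≥ 5" check); the p-value is not computed here, since in practice it would be read from computer output.

```python
# Observed counts: rows = Male, Female; columns = Smoker, Non-smoker.
observed = [
    [4, 11],
    [5, 8],
]

row_totals = [sum(row) for row in observed]
col_totals = [sum(col) for col in zip(*observed)]
grand_total = sum(row_totals)

# Expected count in each cell: row total * column total / grand total.
expected = [
    [r * c / grand_total for c in col_totals]
    for r in row_totals
]

# Chi-square statistic: sum over all cells of (O - E)^2 / E.
chi_sq = sum(
    (o - e) ** 2 / e
    for o_row, e_row in zip(observed, expected)
    for o, e in zip(o_row, e_row)
)

df = (len(observed) - 1) * (len(observed[0]) - 1)

# Assumption: every expected count should be at least 5.
assumption_ok = all(e >= 5 for row in expected for e in row)

print("expected counts:", [[round(e, 2) for e in row] for row in expected])
print("chi-square:", round(chi_sq, 4), "df:", df)
print("all expected counts >= 5?", assumption_ok)
```

For this small table two of the expected counts fall below 5, so the assumption check fails; that is exactly the kind of thing the Assumptions column asks you to verify before trusting the test.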
3 types of statistical tests involving categorical data (CONTINUED)
Ho | 95% C.I. | Conclusion (NOT reject Ho) | Conclusion (reject Ho)
Ho: π = π0 | $p \pm 1.96\sqrt{p(1-p)/n}$ | Proportion π could be equal to π0 | Proportion π is higher/lower than π0.
Ho: π1=…, π2=…, π3=… | Read from computer output | The proportions π1=…, π2=…, π3=… COULD be correct. | The proportions π1=…, π2=…, π3=… are NOT correct.
X and Y are independent | ----------- | X and Y COULD be independent (not associated) | X and Y are dependent (associated)
Rule when Ho is NOT rejected: Copy Ho + “could be”
Rule when Ho is rejected: Opposite of Ho + is higher/lower
5 types of statistical tests involving continuous data
Statistical test | Keywords in Res. Q. | Ho | Assumptions | Test statistic
1-sample z-test of mean | Mean, average | Ho: µ=µ0 (σ known) | Normal population, or n ≥ 25 (CLT) | $z = \frac{\bar{y} - \mu_0}{\sigma/\sqrt{n}}$
1-sample t-test of mean | Mean, average | Ho: µ=µ0 (σ unknown) | Normal population, or n ≥ 25 (CLT) | $t = \frac{\bar{y} - \mu_0}{s/\sqrt{n}}$, df = n−1
Paired t-test | difference | Ho: µd=µ0 | Difference from normal popn, or n ≥ 25 (CLT) | $t = \frac{\bar{y}_d - \mu_d}{s_d/\sqrt{n}}$, df = n−1
2-sample t-test | difference | Ho: µ1=µ2 | Both groups from normal popn, same SD | $t = \frac{\bar{y}_1 - \bar{y}_2}{s_p\sqrt{\frac{1}{n_1} + \frac{1}{n_2}}}$, df = n1+n2−2
Test of linear relation between 2 variables | Relation, predict | Ho: β=0 | Linear; Res normal; Res const SD | t = b/SEb, df = n−2
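To make the 2-sample t formula concrete, here is a small sketch computed from summary statistics. All the numbers are hypothetical, and the pooled-SD formula used for `s_p` is the standard one for the equal-SD case; it is not written out on the slide, so treat it as an assumption about what the formula sheet gives.

```python
from math import sqrt

# Hypothetical summary statistics for two independent groups.
n1, mean1, sd1 = 10, 70.0, 8.0
n2, mean2, sd2 = 12, 65.0, 10.0

# Pooled standard deviation (assumes both groups share the same SD).
pooled_var = ((n1 - 1) * sd1 ** 2 + (n2 - 1) * sd2 ** 2) / (n1 + n2 - 2)
s_p = sqrt(pooled_var)

# 2-sample t statistic and its degrees of freedom.
standard_error = s_p * sqrt(1 / n1 + 1 / n2)
t_stat = (mean1 - mean2) / standard_error
df = n1 + n2 - 2

print("pooled SD:", round(s_p, 4))
print("t statistic:", round(t_stat, 4), "with df =", df)
```

The t statistic is calculated from the data; the p-value or critical value would then be looked up for df = n1+n2−2.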
5 types of statistical tests involving continuous data (CONTINUED)
Ho | 95% CI | Conclusion (NOT reject Ho) | Conclusion (Reject Ho)
Ho: µ=µ0 (σ known) | $\bar{y} \pm 1.96\,\frac{\sigma}{\sqrt{n}}$ | Ave xxx COULD be equal to µ0 | Ave xxx is higher/lower than µ0
Ho: µ=µ0 (σ unknown) | $\bar{y} \pm t_{n-1}\,\frac{s}{\sqrt{n}}$ | Ave xxx COULD be equal to µ0 | Ave xxx is higher/lower than µ0
Ho: µd=µ0 (paired t) | $\bar{y}_d \pm t_{n-1}\,\frac{s_d}{\sqrt{n}}$ | The difference COULD be µ0 on ave | The difference is higher/lower than µ0 on ave
Ho: µ1=µ2 (2-sample t) | $(\bar{y}_1 - \bar{y}_2) \pm t_{\nu}\, s_p\sqrt{\frac{1}{n_1} + \frac{1}{n_2}}$, with ν = n1+n2−2 | There COULD be no difference between ave xxx and ave xxx | Ave xxx is higher/lower than ave xxx
Ho: β=0 | $b \pm t_{n-2}\, SE_b$ | There COULD be no relation between X & Y | There is a positive/negative relation.
In ALL hypothesis tests, include CI in the conclusion.
Rule when Ho is NOT rejected: Copy Ho + “could be”
Rule when Ho is rejected: Opposite of Ho + is higher/lower
Examples of the 8 HATPCs?
It is assumed that you know them well at this stage. There are tons of examples of EACH in Lecture and Tutorial notes.
You have absolutely no hope of passing STAT170 if you cannot do the 8 HATPCs – since hypothesis tests, and related questions, span more than 60% of exam materials.
8 types of Simple Reports – reports that involve only 1 hypothesis test
1. One sample t-test (See Tutorial 8)
2. One-sample z-test
3. Paired t-test
4. 2-sample t-test
5. Z-test of proportion
6. Regression
7. Chi-sq test of proportions
8. Chi-sq test of independence (See Lect 13)
Key points to write in the Simple report
(Check list) – 1-hypothesis-test only
Introduction
*What this study is about, and why this study – if known
*Research question – any wording is OK
*Target population
Method
*How the sample was collected (why random and representative)
*Define variables
*Statistical method used
*Null hypothesis
*Justify assumptions [put under Method or Result, depending on the type of test]
Results (NO HATPC; NO calculations)
*Test statistic
*P-val, decision (reject/not reject null)
Conclusion
*Decision in words: There is evidence/no evidence …
[Check that the research question is answered.]
*Your conclusion should be almost the same (several sentences) as the conclusion you have in the proper hypothesis test (HATPC), e.g. 95% CI if appropriate.
Note: It is most important that you identify the correct statistical method used (how???). For example, if it is a chi-sq test and you mention t-test, then the rest does not make sense, and you’ll lose most of the marks – and your time!
Complex Reports: Involve several hypothesis tests
Reports involving hypothesis tests of the same type:
• SIBT 2008B, 2009A – regressions
• MQC 2009A, 2009C, 2010B, 2010C – regressions
• SIBT 2009C, MQC 2010A – chi squares
• University 2007, Term 2 – 2-sample t
Reports involving hypothesis tests of different types:
• SIBT 2008C, 2009B – 2-sample t & chi squares
• MQC 2009B, 2011A, 2011B – regressions & 2-sample t
Note: No matter how complicated it may appear (many X’s), there should only be ONE Y. (Several Y’s would bring you to post-graduate level!)
Since so many (at least 5) cases are possible, it is stupid to copy a sample report (eg the one in Tute 8) onto your crib sheet, since there are
• 8½ possible simple reports
• at least 5 complex reports
1st Example: SIBT 2008B exam (report question)
(I do not have a copy of the exam paper.)
Given 6 regressions (6 tables and 6 scatter plots):
Y vs x1, y vs x2, … y vs x6
Research Question: Which variables X1, … X6 are significant predictors for Y, and which best predicts Y?
2nd Example: SIBT 2008C exam
Research question: Which variables X1, X2, X3 and X4 affect Y?
[Figure: four panels: Y and X1, Y and X2, Y and X3, Y and X4]
1st General Rule for COMPLEX report
Discard the bad variables:
• those where assumptions are violated – not valid.
• those whose p-val > 0.05 (ie those where Ho is NOT rejected, because null hypothesis represents no effect)
(eg no difference in 2-sample t, no relation in regression, no association in chi-sq test)
Variable | P-val | Significant variable? (Reject Ho?) | Result
X1 | 0.01 | Yes (Reject Ho) | Keep X1
X2 | 0.08 | No (Not reject Ho) | Discard X2
X3 | 0.02 | Yes (Reject Ho) | Keep X3
1st General rule for COMPLEX report
Warning: Common mistake:
• P-val<0.05 => reject Ho => WRONG: “reject the variable X”. CORRECT: Keep X
• P-val>0.05 => not reject Ho => WRONG: “not reject variable X”. CORRECT: Discard X
Golden rule: You may avoid mentioning Ho!
• p-val<0.05 => Keep X (Small prob (<5%), alarm bell rings)
• P-val>0.05 => Discard X
Warning: If you misunderstand the above, the conclusion of your report will be exactly opposite of what it should be, and you will lose MANY marks!
2nd General rule for complex report
Sometimes the question may ask for the BEST variable that determines Y. Choose the best one within each group. Do NOT compare the p-val of one type of graph with the p-val of another type of graph. (Compare an apple with an apple; compare an orange with an orange.)
Regressions → choose best X
2-sample t’s → choose best X
Chi squares → choose best X
What is the “best” X and how to choose it?
• In EACH set, choose the variable with the smallest p-val (ie the one that most strongly rejects Ho) – EXCEPT regression.
• For regression, choose the largest r2, not the smallest p-val
Example of choosing/discarding variables
An example on regression to illustrate 1st general rule:
Variable | Assumptions satisfied? | P-val | Significant variable? (Reject Ho?) | r2 | Result
X1 | No | ----- | ----- | ----- | -----
X2 | Yes | 0.006 | Yes | 0.53 |
X3 | Yes | 0.000 | Yes | 0.67 | Best
X4 | No | ----- | ----- | ----- | -----
X5 | Yes | 0.07 | No (p-val>5%) | ----- | -----
(The r2 column is needed for choosing the BEST variable(s).)
Hence only X2 and X3 are significant (important) variables affecting Y. And X3 is the best predictor for Y.
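The two general rules can be written as a short filter-then-pick procedure. The sketch below uses the values from the regression example table; the dictionary layout and names are just one illustrative way to hold them.

```python
# One entry per candidate X variable, taken from the example table.
# p_value and r2 are None where the slide shows "-----".
results = [
    {"name": "X1", "assumptions_ok": False, "p_value": None, "r2": None},
    {"name": "X2", "assumptions_ok": True, "p_value": 0.006, "r2": 0.53},
    {"name": "X3", "assumptions_ok": True, "p_value": 0.000, "r2": 0.67},
    {"name": "X4", "assumptions_ok": False, "p_value": None, "r2": None},
    {"name": "X5", "assumptions_ok": True, "p_value": 0.07, "r2": None},
]

# 1st general rule: discard variables whose assumptions are violated
# or whose p-value is above 0.05.
kept = [
    r for r in results
    if r["assumptions_ok"] and r["p_value"] < 0.05
]

# 2nd general rule, regression case: among the kept variables,
# the best predictor is the one with the largest r2.
best = max(kept, key=lambda r: r["r2"])

print("significant variables:", [r["name"] for r in kept])
print("best predictor:", best["name"])
```

Note that the code never mentions Ho: a small p-value simply means "keep X", which is the golden rule for avoiding the keep/discard mix-up.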
2nd Example: SIBT 2008C exam
Research question: Which variables (X1, X2, X3 and X4) affect Y?
[Figure: four panels: Y and X1, Y and X2, Y and X3, Y and X4]
Compare:
• Y vs WT : p-val = 0.00055
• Y vs STARTS: p-val = 0.0012
Both p-val< 0.05 => both Wt and Starts affect Y, but Wt has a stronger effect (because of smaller p-val).
[Figure: two panels: Y vs Wt, Y vs Starts]
Compare:
• Y vs WIN: p-val=0.5641
• Y vs PAYOUT: p-val=0.0000
Hence WIN has no effect on Y. Payout has an effect.
[Figure: two panels: Y vs WIN, Y vs Payout]
Key points to write in the COMPLEX report (No rigid rules!)
Introduction
*Research question
*Target population
Method
*How the sample was collected (why random and representative)
*Define the Y and X variable(s)
*List ALL statistical methods used
*Check assumptions [put under Method or Result, depending on the type of test] in EACH case. (But AVOID lengthy repetitive checking the assumptions one by one.)
Results (NO HATPC; NO calculations)
*Discard poor ones (assumptions violated, or p-val>0.05)
(AVOID lengthy repetitive checking p-val one by one.)
*IF required by the question, pick the best one within each group.
• Which of the variables X1, x2, …. affect variable Y?
• Which of the variables X1, x2, …. BEST affect variable Y?
Hints and Tips: normal tables
1. 2-tailed normal table vs. 1-tailed normal table:
• 1-tailed – probability calculations
• 2-tailed – hypothesis testing
Suggestions:
The FIRST thing you should do in exam, before you start writing anything, is (on the two z-tables):
(This applies to the HD students as well!)
Hints and Tips: t and tcrit
2. T statistic and the “tν” in C.I.
(This applies to ALL t tests: 1-sample t, 2-sample t, t in regression slope, paired t.)
• The t-statistic is calculated (not read from tables)
• The “tν” in 95% CI is read from table (row ν and column 0.05)
The SECOND thing you should do in exam, before you start writing anything, is (on the t-table):
(This applies to the HD students as well!)
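The difference between the calculated t-statistic and the tabulated “tν” can be seen in a small one-sample example. The summary statistics below are hypothetical; the value 2.131 is the usual t-table entry for 15 degrees of freedom at the two-tailed 0.05 level, typed in by hand because it comes from the table, not from the data.

```python
from math import sqrt

# Hypothetical one-sample summary: testing Ho: mu = 50.
n, y_bar, s = 16, 52.0, 8.0
mu_0 = 50.0

standard_error = s / sqrt(n)

# 1) The t-statistic is CALCULATED from the data.
t_stat = (y_bar - mu_0) / standard_error

# 2) The t value used in the 95% CI is READ from the t-table
#    (row df = n - 1 = 15, column 0.05 two-tailed).
df = n - 1
t_table = 2.131

ci_low = y_bar - t_table * standard_error
ci_high = y_bar + t_table * standard_error

print("calculated t-statistic:", t_stat, "with df =", df)
print("95% CI:", (round(ci_low, 3), round(ci_high, 3)))
```

Mixing the two up (for example, putting the calculated t-statistic into the CI formula) is the mistake this tip is warning about.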
Hints and Tips: chi sq table
3. You should only use the top few rows of chi sq.
Hints and Tips: y and y-bar
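4. In probability calculations:
Look for the keyword “mean” or “average” => y-bar.
$z = \frac{y - \mu}{\sigma}$ vs. $z = \frac{\bar{y} - \mu}{\sigma/\sqrt{n}}$
Note that there are NO such formulas:
$z = \frac{\bar{y} - \mu}{\sigma}$ vs. $z = \frac{y - \mu}{\sigma/\sqrt{n}}$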
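A quick numerical sketch shows how much the choice between y and y-bar matters. It uses the IQ population from these slides (mean 100, SD 15); the cut-off of 110 and the sample size of 25 are made up for illustration.

```python
from math import sqrt
from statistics import NormalDist

mu, sigma = 100, 15   # population mean and SD (IQ example)
cutoff = 110          # hypothetical cut-off value
n = 25                # hypothetical sample size

standard_normal = NormalDist()  # mean 0, SD 1

# One individual: z = (y - mu) / sigma
z_single = (cutoff - mu) / sigma
p_single = 1 - standard_normal.cdf(z_single)

# Sample mean of n individuals: z = (y_bar - mu) / (sigma / sqrt(n))
z_mean = (cutoff - mu) / (sigma / sqrt(n))
p_mean = 1 - standard_normal.cdf(z_mean)

print("P(one person's IQ > 110):  z =", round(z_single, 3), " prob =", round(p_single, 4))
print("P(mean IQ of 25 > 110):    z =", round(z_mean, 3), " prob =", round(p_mean, 5))
```

About a quarter of individuals exceed 110, but a mean of 25 people almost never does, so reading "mean" or "average" in the question changes the answer completely.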
Hints and Tips: 2-sample t and paired t
5. Paired-t test vs. 2-sample t-test
No rules!
1st clue:
Different n1 & n2 => CANNOT be paired t-test; must be 2-sample t-test
If n1=n2 => either test is possible.
2nd clue:
Ask yourself “Can I move the values of one variable without moving the corresponding values of the other variable?”
• Can move values of one variable => independent data => 2-sample t
• Cannot move values of one variable (need to move BOTH variables) => dependent data => paired-t
2nd clue (continued):
From Lect 13: Age difference between husband and wife
Baby ID | Mother’s age | Father’s age
21 | 28 | 33
22 | 34 | 40
23 | 24 | 26
24 | 34 | 45
25 | 32 | 35
26 | 24 | 27
27 | 30 | 39
28 | 29 | 27
29 | 37 | 34
30 | 41 | 46
Can we swap the fathers’ ages “33” and “46” WITHOUT moving the wives and the babies?
• Move alone => independent => 2-sample t
• Move pairwise together => paired t
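Since each father belongs with one particular mother, the data are paired, and the paired t-statistic is just a one-sample t on the differences. The sketch below works this out for the 10 couples in the table, testing Ho: µd = 0 (the choice of 0 as the hypothesised difference is an assumption for illustration).

```python
from math import sqrt
from statistics import mean, stdev

# Ages from the table, in Baby ID order 21..30.
mother_age = [28, 34, 24, 34, 32, 24, 30, 29, 37, 41]
father_age = [33, 40, 26, 45, 35, 27, 39, 27, 34, 46]

# 1st clue: paired data must have the same number of values in each column.
assert len(mother_age) == len(father_age)

# Paired analysis: work with the within-couple differences only.
differences = [f - m for f, m in zip(father_age, mother_age)]

n = len(differences)
d_bar = mean(differences)   # mean difference
s_d = stdev(differences)    # sample SD of the differences
mu_d = 0                    # hypothesised mean difference under Ho

t_stat = (d_bar - mu_d) / (s_d / sqrt(n))
df = n - 1

print("differences:", differences)
print("mean difference:", d_bar, " SD:", round(s_d, 4))
print("paired t-statistic:", round(t_stat, 3), "with df =", df)
```

Shuffling the fathers' column on its own would change every difference, which is the practical meaning of "cannot move one variable without moving the other".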
Hints and Tips: z and t tests
6. Z-test vs. t-test
• Know population standard deviation σ => z-test
• Do NOT know σ => t-test
Clues:
* “It is known that SD=xxx” => likely σ => z-test
* Given numerical summary of data (MUST be sample):
n | mean | StDev
xxxx | xxxx | xxxx
The SD from a data set (sample) MUST be s, never σ => t-test
* Do watch out if both σ and s are given. Once we have σ, s is useless => use z-test.
Hints and Tips: CLT
7. This is a common mistake: “When sample size is large (n≥25), the sample is approximately normally distributed.”
The statement means that if we make a histogram of the sample (n≥25), then the histogram should be approximately bell-shaped. This is NOT CLT; it is WRONG!
We know that as sample size n becomes larger and larger, the sample histogram looks more and more like the population, which could be anything.
The correct statement of CLT is: “When sample size is large (n≥25), the sample mean (y-bar) is approximately normally distributed.” This applies to the one-sample z or t test, the 2-sample t-test and the paired t-test.
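A small simulation can make the distinction visible. This is only a sketch under assumed settings: the population is taken to be exponential (strongly right-skewed), and the seed, number of repetitions and the simple skewness measure are arbitrary choices for illustration.

```python
import random
from statistics import mean


def skewness(values):
    """Simple moment-based skewness: about 0 for a symmetric, bell-shaped set."""
    m = mean(values)
    n = len(values)
    second_moment = sum((v - m) ** 2 for v in values) / n
    third_moment = sum((v - m) ** 3 for v in values) / n
    return third_moment / second_moment ** 1.5


rng = random.Random(170)  # fixed seed so the run is repeatable

# A LARGE sample from a skewed population: the sample itself stays skewed.
big_sample = [rng.expovariate(1.0) for _ in range(2000)]

# Many samples of size n = 25: the SAMPLE MEANS are much closer to normal.
sample_means = [
    mean([rng.expovariate(1.0) for _ in range(25)])
    for _ in range(2000)
]

print("skewness of one big sample:", round(skewness(big_sample), 2))
print("skewness of the sample means:", round(skewness(sample_means), 2))
print("average of the sample means:", round(mean(sample_means), 3))
```

The big sample looks like the skewed population, however large it gets; it is the distribution of y-bar that becomes roughly bell-shaped, which is what the CLT actually says.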
Tips and Hints: Which condition?
8. nπ≥5 and n(1-π)≥5, or np≥5 and n(1-p)≥5?
Lect 5 (prob calculation on p) or Lect 7 (z-test on π): $z = \frac{p - \pi}{\sqrt{\pi(1-\pi)/n}}$. Check nπ≥5 and n(1-π)≥5.
Lect 6 (CI for π): $p \pm 1.96\sqrt{\frac{p(1-p)}{n}}$. Check np≥5 and n(1-p)≥5.
Rule: p goes with p, π goes with π; p NEVER goes together with π.
Note that although the above 2 formulas are in the formula sheet, the 2 corresponding conditions are not. You have to know which one is the correct condition for checking.
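The pairing rule can be sketched side by side for the z-test and the CI. The sample size, sample proportion and hypothesised proportion below are hypothetical numbers chosen only to show which symbol goes into which formula and which condition.

```python
from math import sqrt

n = 100       # hypothetical sample size
p = 0.38      # hypothetical sample proportion
pi_0 = 0.30   # hypothesised population proportion under Ho

# z-test on pi: the standard error AND the condition both use pi_0.
z_condition_ok = n * pi_0 >= 5 and n * (1 - pi_0) >= 5
z_stat = (p - pi_0) / sqrt(pi_0 * (1 - pi_0) / n)

# 95% CI for pi: the standard error AND the condition both use p.
ci_condition_ok = n * p >= 5 and n * (1 - p) >= 5
margin = 1.96 * sqrt(p * (1 - p) / n)
ci_low, ci_high = p - margin, p + margin

print("z-test: condition ok?", z_condition_ok, " z =", round(z_stat, 4))
print("CI: condition ok?", ci_condition_ok,
      " interval =", (round(ci_low, 4), round(ci_high, 4)))
```

In the test, Ho supplies π0 so everything under the square root uses π0; in the CI there is no hypothesised value, so everything uses p.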
Tips and Hints: pth percentile (including LQ, UQ)
9. Find pth percentile
(a) Given ANY sample of size n, use the formula:
n*p/100 (Lect 2)
Then check whether the result is an integer or a non-integer, etc.
Eg AGE: 12, 17, 28, 32, 33, 40, 40, 67 (MUST be sorted first!)
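The slide does not spell out what to do after computing n*p/100, so the sketch below ASSUMES one common textbook convention: if n*p/100 is an integer k, average the k-th and (k+1)-th sorted values; otherwise round up and take that value. Check this against the Lecture 2 rule before relying on it. The function name `sample_percentile` is made up.

```python
import math


def sample_percentile(values, p):
    """pth percentile of a sample, under the ASSUMED rule described above."""
    if not 0 < p < 100:
        raise ValueError("p must be strictly between 0 and 100")
    data = sorted(values)            # MUST be sorted first
    position = len(data) * p / 100   # n*p/100
    if position == int(position):
        k = int(position)
        # Integer: average the k-th and (k+1)-th values (1-based positions).
        return (data[k - 1] + data[k]) / 2
    # Non-integer: round up and take the value at that position.
    k = math.ceil(position)
    return data[k - 1]


age = [12, 17, 28, 32, 33, 40, 40, 67]

lower_quartile = sample_percentile(age, 25)
median = sample_percentile(age, 50)
upper_quartile = sample_percentile(age, 75)
percentile_90 = sample_percentile(age, 90)

print("LQ:", lower_quartile, " median:", median, " UQ:", upper_quartile)
print("90th percentile:", percentile_90)
```

Other software packages use slightly different interpolation rules, so small differences from computer output are expected.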
pth percentile
(b) Given population (of infinite size) of known (given) µ and σ:
(i) Given normal:
Find z from the given area p (1-tailed)
Then find y = µ+σ*z
Eg: “It is known that IQ is normally distributed with mean 100 and SD 15. What is the 10th percentile?” What is the LQ?
[Figure: normal curve with µ = 100, σ = 15]
(ii) non-normal (or unknown distribution)
CANNOT do it!
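The IQ example can be checked numerically. The sketch below uses Python's standard-library `statistics.NormalDist` in place of the printed z-table, so the z values carry more decimals than a table lookup would give.

```python
from statistics import NormalDist

mu, sigma = 100, 15
standard_normal = NormalDist()  # mean 0, SD 1

# 10th percentile: find z with area 0.10 to its left, then y = mu + sigma * z.
z_10 = standard_normal.inv_cdf(0.10)
iq_10th_percentile = mu + sigma * z_10

# Lower quartile (25th percentile): same steps with area 0.25.
z_25 = standard_normal.inv_cdf(0.25)
iq_lower_quartile = mu + sigma * z_25

print("z for 10th percentile:", round(z_10, 4), " IQ =", round(iq_10th_percentile, 2))
print("z for LQ:", round(z_25, 4), " IQ =", round(iq_lower_quartile, 2))
```

Both z values are negative because both percentiles lie below the mean; forgetting the minus sign is an easy way to end up above 100 by mistake.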
Hints and Tips: Association
10. A common mistake: “No association/association between males and females.” “No association/association between smokers and non-smokers.”
(In fact, males, females, smokers and non-smokers are NOT variables.)
It should be: “Could be no association/There is association between Sex and Smoking Status.”
       | Smoker | Non-smoker
Male   | 4      | 11
Female | 5      | 8
Hints and Tips: Writing conclusion when Ho is NOT rejected
11. Many versions, hence students are confused.
Eg in 2-sample t-test:
(1) There could be no difference …; (there is no strong evidence to indicate otherwise.)
(2) There is probably no difference …
(3) There is no significant difference …
(4) There is no evidence to indicate a difference …
All of the above are correct!
Please stick to (1), which is easiest! (3) and (4) are double negatives, with which you may make mistakes, (3) being the worst. Keep things simple!
Note that in (2) or (3), if you miss out ‘probably’ or ‘significant’, then “There is no difference …” is wrong (accepting the null hypothesis).
Try the chi sq test of association:
Ho: There is NO association between X and Y
Suppose we do NOT reject Ho.
Conclusion:
(1) There could be no association …; (there is no strong evidence to indicate otherwise.)
(2) There is probably no association …
(3) There is no significant association …
(4) There is no evidence of an association …
Again all of them are correct.
Writing conclusion in HATPC: the rules:
Eg 2-sample t: “Ho: There is no difference in exam marks on average for boys and girls.”
Eg chi sq test of association: “Ho: There is NO association between X and Y”
Eg regression: “Ho: β=0” (No relation between X and Y)
P-val<0.05 => Reject Ho
• Negate (make opposite) Ho
• Be certain, use the verb “is”
• Also give further info: “is greater/less than”, “is longer/shorter” (eg one-sample or 2-sample t) – except chi sq
P-val>0.05 => Do not reject Ho
• Copy Ho
• Change the verb “is” to “could be”.
Writing conclusion in HATPC: Example 1
Eg 2-sample t: “Ho: There is no difference in exam marks on average between boys and girls.”
P-val<0.05 => Reject Ho
“There is a difference in exam marks between boys and girls, with girls having a higher average than boys.”
P-val>0.05 => Do not reject Ho
“There could be no difference in average exam marks between boys and girls.”
Writing conclusion in HATPC: Example 2
Eg chi sq test of association: “Ho: There is NO association between sex and smoking status”
P-val<0.05 => Reject Ho
“There is an association between sex and smoking status.”
P-val>0.05 => Do not reject Ho
“There could be no association between sex and smoking status.”
Hints and Tips: Symbols – their writings and meanings
12. Last, but not least, MANY students have lots of problems here. Surprisingly, it is not much more difficult than Primary 1!!!
(a) Confusion of symbols of similar meanings:
1st yr Uni, STAT170:
p = sample proportion
π = population proportion
µ and y-bar
s and σ
A confusion between p and π, µ and y-bar, and s and σ will cost you dearly in exam.
Primary 1:
This is my book; this is your book; this is Mary’s book.
My book, your book and Mary’s book are not the same.
You will be in big trouble if you regard Mary’s money as the same as yours.
Hints and Tips: Symbols – their writings and meanings
(b) Confusion of look-alike symbols:
1st yr Uni, STAT170:
µ and u
σ and θ
β and B
ω and w
Σ and E
Primary 1:
i, j
g, p, q
a, o, e, c
d, b
h, k
m, n
l, 1
u, v
z, 2
Which is more difficult? Surprisingly, students find the symbols in STAT170 more difficult than the 26 English letters in Primary 1. If you have problems with the STAT170 symbols (the first list), you will be in big trouble. You will NOT lose “just a few marks”, but many!
Predicting the future
The following happened in past semesters without exceptions, and WILL likewise occur in the future in this semester (prob=0.99999):
1. Someone will write u instead of µ.
2. Someone will copy a sample report (from past exam papers or Tute 8) onto the crib (pink) sheet.
3. Someone will leave the whole page blank on the hypothesis test on slope in regression, which is the easiest HATPC.
4. Someone will not know the meaning of r2.
5. Someone will write “There is an association between males and females”. This makes no sense at all.
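Predicting the future
6. Someone will write $z = \frac{y - \mu}{\sigma/\sqrt{n}}$ and $z = \frac{\bar{y} - \mu}{\sigma}$
7. Someone will use the “formula” $t = \frac{\bar{y}_d - \mu_d}{s_d/\sqrt{n}} = \frac{(\bar{y}_1 - \bar{y}_2) - 0}{(s_1 - s_2)/\sqrt{n}}$ for 2-sample t or paired-t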
Ask yourself …
“How many hours did I spend on STAT170 each week, on average?”
Macquarie University recommends (minimum):
3 credit points * 3 hours
= 9 hours
= 4 hours in class + 5 hours on your own at home
Every WEEK.
Profile of students who fail – Failure check list
The following are common characteristics of those who fail:
• Low class attendance
• Did not study on a weekly basis
• No/few attempts of online quizzes
• #Can do at most one hypothesis test in exam
• #Cannot do t-test on regression
• #Cannot count how many variables
• #Cannot distinguish between categorical and numerical
• Do not know parameters vs statistics
• Do not know the symbols µ, σ, π, β and ω
• Mix up p and π, y-bar and µ, s and σ
Failure check list (continued!)
• Did not do the exercises on the tutorial sheets
• Gave up assignment(s)
• Do not know how to use calculator to find SD
• Low marks in Practical Test
• Copy past exam solutions, word by word, onto crib sheet
• Copy report(s), word by word, onto crib sheet
• Do not read past exam papers
How many ticks do you have in the above list? ____
Unfortunately, even just ONE tick, eg “Can do just one hypothesis test”, can (and will) make a failure!