Page 1: Preamble

Preamble

OBL 302 Statistics: What seem to be the problems? Students' view

OBL 302 Lecture 1-3

S.M.S. Massomo

Enhanced Face to Face, Dec 2009, Mpwapwa

Page 2: Preamble

Observations from the course tutor

It seems that some of you DO NOT know:

1. How to use a scientific calculator to obtain the mean, sum of squares, standard deviation, etc.
2. Which test should be used in a particular situation (question).
3. How to do simple calculations, so you fail to arrive at the correct answer even when you know the correct procedure.
4. That in hypothesis testing you cannot arrive at a conclusion without comparing the value of the test statistic with the critical value, which must be read from the specific, appropriate table (you are allowed to bring statistical tables into examination rooms).

In fact, many students do not know how to read tables to obtain critical values, and hence fail to make correct conclusion(s) when answering questions.

Page 3: Preamble

Weaknesses among most of my students

6. You waste time copying the whole question into your answer scripts. This is not necessary.

7. You waste time trying to answer questions that you are not sure of, instead of starting with the questions you are confident about.

8. It is important to follow instructions.

9. It is not good practice to attempt a paper when you are not fully conversant with the course.

Page 4: Preamble

You are advised to:

Read as many references as possible, including those on the internet; do not rely solely on the course outline.

Read other Open University of Tanzania study materials that are similar to the OBL 302 course outline, for example:

Course code | Course name | Target students
OED 215 | Educational Statistics | Education
OEC 123 | Introduction to Statistics and Mathematics for Economists | BA Economics
OMT 153 | Probability and Statistics | Mathematics
OMT 254 | Advanced Statistics, Design and Analysis of Experiments | Mathematics
OHE 352 | Statistics and Research Methods for Home Economics | Home Economics

Page 5: Preamble

Use the internet: do not be shy.

Form discussion groups; do not study alone.

Ask for help.

Use our web-based OBL 302 discussion forum or contact me by email at [email protected]

Do plenty of exercises. You can start with past paper questions, textbook examples and so on.

Page 6: Preamble

Depth of material

• Course outline
• Extended course outline
• Compendium
• Study material
• Reference materials

Page 7: Preamble

OBL 302 Biostatistics

Dr Said M.S. Massomo, FSTES, Morogoro

Page 8: Preamble

1.0 Introduction

What is statistics?

• The word has two meanings:

1. Numerical information. The singular form is "statistic"; a collection of more than one figure is "statistics".

2. A branch of science within applied mathematics that deals with a collection of methods/techniques for planning experiments, collecting data, and then organising, summarising, presenting, analysing and interpreting data so as to assist in making more effective decisions.

Page 9: Preamble

Introduction cont..

Statistics?

• Statistics is taught as a core course in most programmes (e.g. at the OUT; see the related courses listed earlier).

• The main differences between these courses are in the scope of coverage and the examples used.

• It is applied mathematics; a basic knowledge of simple algebra is sufficient to master the course.

Page 10: Preamble

Introduction cont..

Why study Statistics?

• Numerical information is everywhere; how do we determine whether the conclusions drawn from it are reasonable?

• Decisions based on numerical information affect our daily lives and personal welfare, e.g. which drugs to use and at what dosage.

• Studying statistics helps you understand why decisions are made and gives you a better understanding of how they affect you.

Page 11: Preamble

Introduction cont..

• You will always be required to make informed decisions. The questions will be:

• Is the information adequate or not?
• If additional information is needed, will it provide results that are not misleading?
• How do you summarise the information in a useful and informative way?
• How do you analyse the information?
• How do you draw conclusions and make inferences while assessing the risk of an incorrect conclusion?

Page 12: Preamble

Introduction cont..

• Biostatistics, also known as biometry, is the application of statistics to solve biological problems.

What is an experiment?

• A planned enquiry/activity designed to obtain new information, or to confirm or deny previous information.

Page 13: Preamble

Course Coverage

1. Introduction
2. Describing data: measures of location
3. Describing data: measures of dispersion
4. The normal probability distribution
5. Confidence intervals
6. Tests of hypotheses: small samples
7. Tests of hypotheses: large samples
8. Linear regression and correlation
9. Chi-square test
10. ANOVA
11. Other important notes

Page 14: Preamble

Types of Statistics

There are two types of statistics: (i) descriptive statistics and (ii) inferential statistics.

Stages of a statistical study (grouped on the original slide under the headings "Descriptive statistics" and "Inferential statistics"):

• Planning experiments
• Collecting data
• Organising data
• Summarising data
• Presenting data
• Analysing data
• Interpreting data
• Conclusions

Page 15: Preamble

Introduction cont..

Descriptive statistics: statistical procedures that describe, organise and summarise the main characteristics of sample data.

• Organising and summarising data: how?
  – Means, range, standard deviations, frequency tables, etc.

• Presenting data:
  – Histograms, charts, tables, etc.
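As a rough illustration of the summaries listed above, the Python sketch below computes the sample size, mean, range, standard deviation and a frequency table using only the standard library. It reuses the chicken weights from the one-sample t-test example later in these notes; the helper name `describe` is hypothetical.

```python
from collections import Counter
import statistics

# Chicken weights (kg) from the one-sample t-test example in these notes
weights = [2.21, 2.19, 2.17, 2.18, 2.15, 2.20, 2.18, 2.19, 2.20, 2.20]


def describe(data):
    """Return basic descriptive summaries of a sample (hypothetical helper)."""
    return {
        "n": len(data),
        "mean": statistics.fmean(data),
        "range": max(data) - min(data),
        "sd": statistics.stdev(data),                      # sample SD, divisor n - 1
        "frequency": dict(sorted(Counter(data).items())),  # value -> count
    }


summary = describe(weights)
for name, value in summary.items():
    print(f"{name:>9}: {value}")
```

A histogram or bar chart for presenting the data would be drawn from the same frequency table.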

Page 16: Preamble

Introduction cont..


Standard Deviation & Standard error of the mean

• Calculations

• Short cut method

• Use of scientific calculators

• Implications
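The short-cut (calculator) method for the standard deviation needs only n, Σx and Σx². These are the sums a scientific calculator typically accumulates in statistics mode. The minimal Python sketch below uses the same chicken weights. It compares the short-cut result with the definitional formula and then computes the standard error of the mean, s/√n. The function names are hypothetical.

One caution applies to the short-cut formula: it subtracts two nearly equal numbers, so Σx² must be carried to enough decimal places (here 47.8325). Rounding it too early ruins the result.

```python
import math

# Chicken weights (kg) from the one-sample t-test example in these notes
weights = [2.21, 2.19, 2.17, 2.18, 2.15, 2.20, 2.18, 2.19, 2.20, 2.20]


def sd_definitional(xs):
    """Sample SD from squared deviations: sqrt(sum((x - mean)^2) / (n - 1))."""
    n = len(xs)
    mean = sum(xs) / n
    return math.sqrt(sum((x - mean) ** 2 for x in xs) / (n - 1))


def sd_shortcut(xs):
    """Short-cut formula: sqrt((sum(x^2) - sum(x)^2 / n) / (n - 1))."""
    n = len(xs)
    sum_x = sum(xs)
    sum_x2 = sum(x * x for x in xs)
    return math.sqrt((sum_x2 - sum_x ** 2 / n) / (n - 1))


def standard_error(xs):
    """Standard error of the mean: s / sqrt(n)."""
    return sd_definitional(xs) / math.sqrt(len(xs))


print("sum x   =", round(sum(weights), 4))
print("sum x^2 =", round(sum(x * x for x in weights), 4))
print("s (definitional) =", round(sd_definitional(weights), 4))
print("s (short cut)    =", round(sd_shortcut(weights), 4))
print("standard error   =", round(standard_error(weights), 4))
```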

Page 17: Preamble

Comparing Qualities of Measurement Scales (after Dunn, 2001)

The scales are listed from more qualitative (providing less information) to more quantitative (providing more information).

Scale name | Defining features | Permissible operations | Examples
Nominal | Names, labels, categories | =, ≠ | Gender (1 = male, 2 = female), ethnicity or religion of a person, smoker vs non-smoker
Ordinal | Observations ordered or ranked | <, > | Class rank (1st, 2nd, ...), ranks such as low, high
Interval | Order or ranking, equal intervals between observations, no true zero point | +, − | Fahrenheit temperature, IQ score
Ratio | Order or ranking, equal intervals between observations, true zero point | +, −, ×, ÷ | Weight, height, reaction time, speed, etc.

Page 18: Preamble

Normal Probabilities

• Comprehension of this table is vital to success in the course!

• Standard normal probabilities must be looked up in a table. The z-score is broken into two parts. The whole number and tenths are looked up down the left-hand side, and the hundredths across the top. The value at the intersection of the row and column is the area under the curve between zero and the z-score looked up. For example, for z = 1.96, go down to row 1.9 and across to column 0.06 to read 0.4750.

• Because the normal distribution is symmetric, look up the absolute value of any z-score.
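Where no printed table is at hand, the same areas can be computed. The sketch below is a minimal Python version that assumes only the standard library. It uses the identity Φ(z) − 0.5 = ½·erf(z/√2) to reproduce the "area between 0 and z" values in the table. The function name `table_area` is hypothetical.

```python
import math


def table_area(z):
    """Area under the standard normal curve between 0 and |z|.

    Uses Phi(z) - 0.5 = 0.5 * erf(z / sqrt(2)); abs() applies the symmetry rule,
    so a negative z-score gives the same area as the positive one.
    """
    return 0.5 * math.erf(abs(z) / math.sqrt(2))


for z in (0.50, 1.00, 1.96, -1.96, 2.58):
    print(f"z = {z:5.2f}   area from 0 to z = {table_area(z):.4f}")
```

Rounded to four decimals, the results match the table entries, for example 0.3413 for z = 1.00 and 0.4750 for z = 1.96.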

Page 19: Preamble

Normal Probabilities

• There are several different situations that can arise when you are asked to find normal probabilities.

Situation | Instructions
Between zero and any number | Look up the area in the table.
Between two positives, or between two negatives | Look up both areas in the table and subtract the smaller from the larger.
Between a negative and a positive | Look up both areas in the table and add them together.
Less than a negative, or greater than a positive | Look up the area in the table and subtract it from 0.5000.
Greater than a negative, or less than a positive | Look up the area in the table and add it to 0.5000.

Page 20: Preamble

Normal Probabilities

• These situations can be shortened into two rules:

• If only one z-score is given, use 0.5000 for the second area; otherwise, look up both z-scores in the table.

• If the two numbers have the same sign, subtract; if they have different signs, add. If there is only one z-score, use the inequality to determine the sign of the second (< is negative, > is positive).
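The two rules can be written as a small function and checked against the cumulative normal distribution. The Python sketch below assumes the standard library only. An infinite bound stands for "only one z-score given" and contributes 0.5000. Zero is treated as lying on the same side as the other bound. The names `table_area` and `prob_between` are hypothetical.

```python
import math


def table_area(z):
    """Area between 0 and |z| under the standard normal curve (the z-table value)."""
    return 0.5 * math.erf(abs(z) / math.sqrt(2))


def prob_between(lo, hi):
    """P(lo < Z < hi) using the two table rules.

    Pass lo=-math.inf or hi=math.inf when only one z-score is given;
    an infinite bound contributes the area 0.5000 (rule 1).
    """
    if lo > hi:
        raise ValueError("lo must not exceed hi")
    area_lo = 0.5 if math.isinf(lo) else table_area(lo)
    area_hi = 0.5 if math.isinf(hi) else table_area(hi)
    # Rule 2: both bounds on the same side of zero -> subtract; opposite sides -> add.
    if lo >= 0 or hi <= 0:
        return abs(area_hi - area_lo)
    return area_lo + area_hi


print(round(prob_between(1.0, 2.0), 4))        # between two positives
print(round(prob_between(-1.0, 2.0), 4))       # between a negative and a positive
print(round(prob_between(1.0, math.inf), 4))   # greater than a positive
print(round(prob_between(-math.inf, 1.0), 4))  # less than a positive
```

The results can differ from hand calculations in the fourth decimal place, because the printed table rounds each area before the areas are added or subtracted.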

Page 21: Preamble

Normal Probabilities

Finding z-scores from probabilities

• This is more difficult, and requires you to use the table inversely. Look up the area between zero and the value in the body of the table, then read the z-score from the margins (the row and column headings).

• Finally, decide whether the z-score should be positive or negative, based on whether it lies on the left side or the right side of the mean. Remember: z-scores can be negative, but areas (probabilities) cannot.

Page 22: Preamble

Normal Probabilities

Situation | Instructions
Area between 0 and a value | Look up the area in the table. Make the z-score negative if it is on the left side.
Area in one tail | Subtract the area from 0.5000. Look up the difference in the table. Make negative if in the left tail.
Area including one complete half (less than a positive or greater than a negative) | Subtract 0.5000 from the area. Look up the difference in the table. Make negative if on the left side.
Within z units of the mean | Divide the area by 2. Look up the quotient in the table. Use both the positive and negative z-scores.
Two tails with equal area (more than z units from the mean) | Subtract the area from 1.0000. Divide the result by 2. Look up the quotient in the table. Use both the positive and negative z-scores.
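The inverse lookups in this table can be sketched with `statistics.NormalDist.inv_cdf` from the Python standard library (Python 3.8 or later). Each hypothetical helper below mirrors one row of the table. Each one converts the given area into a "between 0 and z" area and then inverts it. Critical values such as 1.96 and 1.645 come out directly.

```python
from statistics import NormalDist

STD_NORMAL = NormalDist()  # mean 0, standard deviation 1


def z_from_table_area(area):
    """Inverse z-table lookup: the positive z with P(0 < Z < z) == area."""
    if not 0 <= area < 0.5:
        raise ValueError("a table area must lie in [0, 0.5)")
    return STD_NORMAL.inv_cdf(0.5 + area)


def z_one_tail(tail_area, left=False):
    """'Area in one tail': subtract from 0.5000, look up the difference."""
    z = z_from_table_area(0.5 - tail_area)
    return -z if left else z


def z_within(central_area):
    """'Within z units of the mean': divide the area by 2, look it up, use +/- z."""
    z = z_from_table_area(central_area / 2)
    return -z, z


def z_two_tails(total_tail_area):
    """'Two tails with equal area': subtract from 1, then proceed as for z_within."""
    return z_within(1.0 - total_tail_area)


print(round(z_one_tail(0.05), 3))    # one-tailed critical value at the 0.05 level
print(z_two_tails(0.05))             # two-tailed critical values at the 0.05 level
```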

Page 23: Preamble

The Normal Probabilities..


• The values in the table are the areas between zero and the z-score.

• That is, P(0 < Z < z-score)

• See tables

Page 24: Preamble


Standard Normal Probabilities (Area under normal curve)

z 0.00 0.01 0.02 0.03 0.04 0.05 0.06 0.07 0.08 0.09

0.0 0.0000 0.0040 0.0080 0.0120 0.0160 0.0199 0.0239 0.0279 0.0319 0.0359

0.1 0.0398 0.0438 0.0478 0.0517 0.0557 0.0596 0.0636 0.0675 0.0714 0.0753

0.2 0.0793 0.0832 0.0871 0.0910 0.0948 0.0987 0.1026 0.1064 0.1103 0.1141

0.3 0.1179 0.1217 0.1255 0.1293 0.1331 0.1368 0.1406 0.1443 0.1480 0.1517

0.4 0.1554 0.1591 0.1628 0.1664 0.1700 0.1736 0.1772 0.1808 0.1844 0.1879

0.5 0.1915 0.1950 0.1985 0.2019 0.2054 0.2088 0.2123 0.2157 0.2190 0.2224

0.6 0.2257 0.2291 0.2324 0.2357 0.2389 0.2422 0.2454 0.2486 0.2517 0.2549

0.7 0.2580 0.2611 0.2642 0.2673 0.2704 0.2734 0.2764 0.2794 0.2823 0.2852

0.8 0.2881 0.2910 0.2939 0.2967 0.2995 0.3023 0.3051 0.3078 0.3106 0.3133

0.9 0.3159 0.3186 0.3212 0.3238 0.3264 0.3289 0.3315 0.3340 0.3365 0.3389

1.0 0.3413 0.3438 0.3461 0.3485 0.3508 0.3531 0.3554 0.3577 0.3599 0.3621

1.1 0.3643 0.3665 0.3686 0.3708 0.3729 0.3749 0.3770 0.3790 0.3810 0.3830

1.2 0.3849 0.3869 0.3888 0.3907 0.3925 0.3944 0.3962 0.3980 0.3997 0.4015

1.3 0.4032 0.4049 0.4066 0.4082 0.4099 0.4115 0.4131 0.4147 0.4162 0.4177

Page 25: Preamble


1.4 0.4192 0.4207 0.4222 0.4236 0.4251 0.4265 0.4279 0.4292 0.4306 0.4319

1.5 0.4332 0.4345 0.4357 0.4370 0.4382 0.4394 0.4406 0.4418 0.4429 0.4441

1.6 0.4452 0.4463 0.4474 0.4484 0.4495 0.4505 0.4515 0.4525 0.4535 0.4545

1.7 0.4554 0.4564 0.4573 0.4582 0.4591 0.4599 0.4608 0.4616 0.4625 0.4633

1.8 0.4641 0.4649 0.4656 0.4664 0.4671 0.4678 0.4686 0.4693 0.4699 0.4706

1.9 0.4713 0.4719 0.4726 0.4732 0.4738 0.4744 0.4750 0.4756 0.4761 0.4767

2.0 0.4772 0.4778 0.4783 0.4788 0.4793 0.4798 0.4803 0.4808 0.4812 0.4817

2.1 0.4821 0.4826 0.4830 0.4834 0.4838 0.4842 0.4846 0.4850 0.4854 0.4857

2.2 0.4861 0.4864 0.4868 0.4871 0.4875 0.4878 0.4881 0.4884 0.4887 0.4890

2.3 0.4893 0.4896 0.4898 0.4901 0.4904 0.4906 0.4909 0.4911 0.4913 0.4916

2.4 0.4918 0.4920 0.4922 0.4925 0.4927 0.4929 0.4931 0.4932 0.4934 0.4936

2.5 0.4938 0.4940 0.4941 0.4943 0.4945 0.4946 0.4948 0.4949 0.4951 0.4952

2.6 0.4953 0.4955 0.4956 0.4957 0.4959 0.4960 0.4961 0.4962 0.4963 0.4964

2.7 0.4965 0.4966 0.4967 0.4968 0.4969 0.4970 0.4971 0.4972 0.4973 0.4974

2.8 0.4974 0.4975 0.4976 0.4977 0.4977 0.4978 0.4979 0.4979 0.4980 0.4981

2.9 0.4981 0.4982 0.4982 0.4983 0.4984 0.4984 0.4985 0.4985 0.4986 0.4986

3.0 0.4987 0.4987 0.4987 0.4988 0.4988 0.4989 0.4989 0.4989 0.4990 0.4990

Page 26: Preamble
Page 27: Preamble

Inferential statistics

Inferential statistics extend the scope of descriptive statistics by examining the relationships within a set of data. In particular, inferential statistics enable the researcher to make inferences (conclusions, deductions, judgements) about the population based on the relationships within the sample data.

Unlike descriptive statistics, inferential statistics make inferences about a population based on sample data.

Page 28: Preamble

Population Vs Sample

• Population: all subjects possessing a common characteristic that is being studied. Example?

• Sample: a subgroup or subset of the population. Examples?

• Why do we work with samples?

Page 29: Preamble

Inferential Statistics

Why do we work with samples?

1. Cost

2. Practicability, e.g. destructive sampling

3. Time constraints

4. It is possible to draw correct conclusions if sampling is done properly

Page 30: Preamble

Inferential statistics...

• Random sample: a sample drawn from a population in such a way that each individual in the population has an equal chance of being selected.

• Central limit theorem: the theorem which states that, as the sample size increases, the sampling distribution of the sample mean becomes approximately normal, whatever the shape of the population distribution (provided the population variance is finite).

• Sampling error: the difference between a sample statistic and the corresponding population parameter, which arises because the sample is not a perfect representation of the population.
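A small simulation can make the central limit theorem and sampling error concrete. The sketch below uses the Python standard library only. It builds a hypothetical, strongly right-skewed (exponential, not normal) population, draws many random samples of size 2 and 30, and reports the mean and spread of the sample means alongside σ/√n. Each difference between a sample mean and µ is a sampling error. The "coverage" column is the fraction of sample means lying within µ ± 1.96 σ/√n. This fraction should be close to 0.95 once the sample means are approximately normal. The population, sample sizes, repetitions and seeds are all arbitrary assumptions.

```python
import random
import statistics


def sample_means(population, n, reps, seed=1):
    """Means of `reps` random samples of size n, drawn without replacement."""
    rng = random.Random(seed)
    return [statistics.fmean(rng.sample(population, n)) for _ in range(reps)]


# Hypothetical, strongly right-skewed population (exponential), NOT normal
rng = random.Random(0)
population = [rng.expovariate(1.0) for _ in range(10_000)]
mu = statistics.fmean(population)
sigma = statistics.pstdev(population)

for n in (2, 30):
    means = sample_means(population, n, reps=2000)
    se_theory = sigma / n ** 0.5
    # fraction of sample means within mu +/- 1.96 SE (about 0.95 if the means are ~normal)
    coverage = sum(abs(m - mu) <= 1.96 * se_theory for m in means) / len(means)
    print(f"n={n:2d}  mean of means={statistics.fmean(means):.3f}  "
          f"SD of means={statistics.stdev(means):.3f}  sigma/sqrt(n)={se_theory:.3f}  "
          f"coverage={coverage:.3f}")
```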

Page 31: Preamble

Inferential statistics: Test of hypotheses

Test of hypotheses (also called test of significance)

• Definition: a statistical test that examines a set of sample data and, on the basis of an expected distribution of the data (e.g. Z, t, F or χ²) at a specified level of significance, leads to a decision about whether or not to reject the null hypothesis.

Page 32: Preamble

Test of hypotheses...

• General form of a test statistic: the difference between sample means divided by an error term (within-group error).

• Statistical significance: refers to whether a test detected a reliable difference between two or more groups, i.e. a difference caused by the effect of an independent variable on a dependent measure rather than by chance.

Page 33: Preamble

Test of hypotheses..

• A hypothesis: a statement about a population that is subject to testing. A hypothesis may be null or alternative.

• Null hypothesis: the statement about a population that is under test, denoted Ho. Ho states that there is no difference between means, or that there is no effect. It always includes the equals sign (=).

• Alternative hypothesis: a statement that is true when Ho is false, denoted Hi. Hi determines whether the test is left-tailed, right-tailed or two-tailed. It is characterised by an inequality sign (<, > or ≠).

Page 34: Preamble

Test of hypotheses: Type I error and Type II error

Type I error: rejecting Ho when it is true. This is usually the more serious error.

Type II error: accepting (failing to reject) Ho when it is false, that is, saying it is true when it is false (examples...).

• In court, defendants are presumed innocent until proven guilty. The purpose of a trial is to see whether the null hypothesis of innocence is rejected by the weight of the data (evidence).

• The null hypothesis: Ho = the person is innocent.

• The alternative hypothesis: Hi = the person is guilty.

Which is the more serious error: convicting an innocent person or letting a guilty person go free?

Page 35: Preamble

Test of hypotheses...

Level of significance (α): the probability of rejecting the null hypothesis when it is true, i.e. the Type I error rate we are willing to accept. α = 0.05 and α = 0.01 are common in biological studies. The level of significance is the complement of the level of confidence used in estimation (α = 1 − confidence level; e.g. α = 0.05 corresponds to 95% confidence). If no level of significance is given, use α = 0.05.

α is not the same as the p-value. The p-value is the probability, computed assuming Ho is true, of obtaining a test statistic at least as extreme as the one observed. Ho is rejected when the p-value is less than or equal to α.

Test statistic: a value, determined from sample information, used to decide whether to reject the null hypothesis, e.g. a Z, t or F value.
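The meaning of α can be checked by simulation. If samples are repeatedly drawn from a population in which Ho is actually true, a test at level α should wrongly reject Ho (a Type I error) in about a fraction α of the samples. The Python sketch below assumes a hypothetical normal population with known σ. It applies a two-tailed one-sample z test and uses only the standard library. The function name, sample size, number of repetitions and seed are illustrative choices.

```python
import random
from statistics import NormalDist, fmean


def type_one_error_rate(alpha=0.05, n=25, reps=10_000, seed=42):
    """Simulate samples for which Ho is TRUE and return the fraction in which
    a two-tailed one-sample z test (sigma known) wrongly rejects Ho."""
    rng = random.Random(seed)
    mu0, sigma = 0.0, 1.0                          # hypothetical population; Ho: mu = mu0 holds
    z_crit = NormalDist().inv_cdf(1 - alpha / 2)   # about 1.96 when alpha = 0.05
    rejections = 0
    for _ in range(reps):
        sample = [rng.gauss(mu0, sigma) for _ in range(n)]
        z = (fmean(sample) - mu0) / (sigma / n ** 0.5)
        if abs(z) > z_crit:                        # test statistic falls in the rejection region
            rejections += 1
    return rejections / reps


for alpha in (0.05, 0.01):
    print(f"alpha = {alpha}: simulated Type I error rate = {type_one_error_rate(alpha):.3f}")
```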

Page 36: Preamble

Test of hypotheses...

A critical value: the value(s) that separate the critical region from the non-critical region.

• The critical values are determined independently of the sample statistics.

• They are read from the appropriate distribution tables.

Critical region: also called the rejection region, the set of all values of the test statistic that would cause us to reject Ho. If the test statistic falls in the rejection region, Ho is rejected.

Page 37: Preamble

Test of hypotheses...

Arrive at a decision: a statement about the null hypothesis. It is either 'reject the null hypothesis' or 'fail to reject the null hypothesis'. We NEVER 'accept' the null hypothesis.

Conclusion: a statement that indicates the level of evidence (sufficient or insufficient) at the specified level of significance, and whether the original claim is rejected (null) or supported (alternative).
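The decision step compares the test statistic with the critical value read from a table, using the rejection region implied by Hi. The minimal Python sketch below shows this step. The function `decide` and its `tail` argument are hypothetical. The critical value is passed as a positive number, as it appears in the table. The sample calls use values from the worked examples in these notes.

```python
def decide(test_stat, critical, tail="two"):
    """Compare a test statistic with a positive critical value read from a table.

    tail: "two" (Hi uses !=), "right" (Hi uses >) or "left" (Hi uses <).
    """
    if tail == "two":
        in_rejection_region = abs(test_stat) > critical
    elif tail == "right":
        in_rejection_region = test_stat > critical
    elif tail == "left":
        in_rejection_region = test_stat < -critical
    else:
        raise ValueError("tail must be 'two', 'right' or 'left'")
    return ("reject the null hypothesis" if in_rejection_region
            else "fail to reject the null hypothesis")


print(decide(1.25, 1.833, "right"))   # fail to reject the null hypothesis
print(decide(2.167, 1.645, "right"))  # reject the null hypothesis
print(decide(8.78, 1.96, "two"))      # reject the null hypothesis
```

The function deliberately returns "fail to reject" rather than "accept", in line with the rule that Ho is never accepted.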

Page 38: Preamble

Coverage OBL 302 part II

1. Test of hypothesis: small samples using the t test
  a) The t test for small samples
  b) The t test for independent samples
  c) The t test for dependent samples
2. Test of hypothesis: large samples using the Z test
3. Chi-square test
4. Linear regression and correlation
5. ANOVA
6. Other important notes

Page 39: Preamble

Test of hypotheses: small samples using the t test

Why use the t distribution? The t test is used for small samples (n < 30) because the Z distribution provides unreliable estimates of differences between samples when fewer than 30 observations are available.

Remember that the t distribution is flatter (with heavier tails) than the Z distribution.

Page 40: Preamble

The t test: application

• The t test was created to deal with small samples when the parameters and variability of the larger parent population are unknown.

• The t tests are used to compare one or two sample means, but not more than two means.

• The t test detects a significant difference between means when the:
  – difference is large,
  – sample standard deviation is small, and/or
  – sample size is large.

Page 41: Preamble

Variation of the t test: 1. Single or one sample t test

• This is used to compare the observed mean of one sample with a hypothesised value assumed to represent a population.

• The t and Z tests both use similar formulas:
  Test statistic = (difference between the sample mean and the hypothesised mean) / (standard error of the mean)

• It tries to answer the question: is it likely that a sample with a given mean could have come from a population with the proposed µ?

• It is usually used to determine whether a set of scores or observations deviates from some established pattern. Examples?

Page 42: Preamble

Variation of the t test: 1. Single or one sample t test

• If the population standard deviation, σ, is unknown (and the population is normal), then the standardised sample mean (x̄ − µ)/(s/√n) follows Student's t distribution, and you will be using the t-score formula for sample means.

• The test statistic is very similar to that for the z-score, except that σ has been replaced by s and z has been replaced by t.

• The critical value is obtained from the t table. The degrees of freedom for this test are n − 1.

Page 43: Preamble

1. Single or one sample t test

• A poultry farm produces chickens with a mean weight of 2.18 kg at the age of 5 months. The weights are normally distributed. In an effort to increase their weight, a special additive was mixed with the chicken feed. The subsequent weights of a sample of ten five-month-old chickens were (in kg) 2.21, 2.19, 2.17, 2.18, 2.15, 2.20, 2.18, 2.19, 2.20 and 2.20.

• At the 0.05 level of significance, determine whether the special additive has increased the weight of the chickens. (26 marks)

Page 44: Preamble

1. Single or one sample t test

Step 1. Calculate the sample standard deviation and mean

$$ s = \sqrt{\frac{\sum x^2 - (\sum x)^2/n}{n-1}} = \sqrt{\frac{47.8325 - (21.87)^2/10}{10-1}} = \sqrt{\frac{47.8325 - 47.82969}{9}} = \sqrt{\frac{0.00281}{9}} = \sqrt{0.000312} \approx 0.0177 $$

$$ \bar{x} = \frac{\sum x}{n} = \frac{21.87}{10} = 2.187 $$

Page 45: Preamble

Variation of the t test: 1. Single or one sample t test

Hypotheses: Ho: µ = 2.18 kg; Hi: µ > 2.18 kg. The test is one-tailed because the question asks whether the additive has increased the weight.

Critical value from the t table: t0.05 at 9 d.f. (one-tailed) = 1.833. For a two-tailed test, t0.05/2 at 9 d.f. = 2.262; the conclusion below is the same.

Step 2. Calculate the value of the test statistic (t)

$$ t = \frac{\bar{x} - \mu}{s/\sqrt{n}} = \frac{2.187 - 2.18}{0.0177/\sqrt{10}} = \frac{0.007}{0.0177/3.162} = \frac{0.007}{0.0056} \approx 1.25 $$

Conclusion: Since the value of the test statistic (t = 1.25) is less than the critical value (1.833), we fail to reject the null hypothesis Ho: µ = 2.18 kg. There is insufficient evidence at the 0.05 level to support the alternative hypothesis Hi: µ > 2.18 kg. In other words, the sample mean (2.187 kg) is not significantly greater than the population mean (2.18 kg).
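The calculation above can be checked with a short Python sketch that uses only the standard library. The critical value has to be typed in from a t table, because the standard library has no t-distribution quantile function. Here it is 1.833 for the one-tailed test (Hi: µ > 2.18 kg); a two-tailed test would use 2.262 and reach the same decision. The helper name `one_sample_t` is hypothetical. Computing without intermediate rounding gives s ≈ 0.0177 and t ≈ 1.25.

```python
import math
import statistics

# Data and hypothesis from the chicken-weight question
weights = [2.21, 2.19, 2.17, 2.18, 2.15, 2.20, 2.18, 2.19, 2.20, 2.20]
MU0 = 2.18        # Ho: mu = 2.18 kg
T_CRIT = 1.833    # one-tailed t(0.05) at 9 d.f., typed in from a t table


def one_sample_t(xs, mu0):
    """Return (mean, sample SD, t statistic, degrees of freedom)."""
    n = len(xs)
    mean = statistics.fmean(xs)
    s = statistics.stdev(xs)               # divisor n - 1
    t = (mean - mu0) / (s / math.sqrt(n))  # (x-bar - mu) / (s / sqrt(n))
    return mean, s, t, n - 1


mean, s, t, df = one_sample_t(weights, MU0)
decision = "reject Ho" if t > T_CRIT else "fail to reject Ho"   # right-tailed test
print(f"mean = {mean:.3f} kg, s = {s:.4f}, t = {t:.3f}, d.f. = {df}: {decision}")
```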

Page 46: Preamble

2. The t test for independent groups (two-sample test)

• Independent samples: samples are independent when they are not related.

• Independent samples may or may not have the same sample size.

• The test is designed to detect a significant difference between a control group and an experimental group.

• It tries to answer the question: is x̄1 different from x̄2, or could the two sample means have come from identical populations?

• Examples: see the two-sample Z test. Why?

• The test statistic is very similar to that for the z-score, except that σ has been replaced by s and z has been replaced by t.

Page 47: Preamble

3. Dependent samples t test (paired samples t test)

• Dependent samples: samples in which the subjects are paired or matched in some way.

• Dependent samples must have the same sample size, but it is possible to have the same sample size without being dependent.

• A common type of dependent sample is characterised by a measurement, an intervention of some type, then another measurement. In other words, a paired t test is designed to detect the presence of measurable change in the average attitude/behaviour of a group from one point in time to another.

• It tries to answer the question: is mean one (x̄1) different from mean two (x̄2)?

• It involves matching or pairing of observations.

Page 48: Preamble

The t test: Dependent samples

To measure the effect of a fitness campaign, five students were randomly sampled and their weights (in kg) were recorded before and after the exercise, as presented in the following table. Using the 0.05 level of significance, determine whether the campaign had any significant effect on the students. (18 marks)

Student | A | B | C | D | E
Before (kg) | 88.45 | 76.65 | 83.00 | 70.30 | 76.20
After (kg) | 89.35 | 73.93 | 81.65 | 68.04 | 72.57

Page 49: Preamble

The t test: Dependent samples

Student | Before | After | d = Before − After | d²
A | 88.45 | 89.35 | −0.90 | 0.81
B | 76.65 | 73.93 | 2.72 | 7.40
C | 83.00 | 81.65 | 1.35 | 1.82
D | 70.30 | 68.04 | 2.26 | 5.11
E | 76.20 | 72.57 | 3.63 | 13.18
Total | | | 9.06 | 28.32
Mean | | | 1.81 |

Page 50: Preamble

The t test: Dependent samples

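$$ s_d = \sqrt{\frac{\sum d^2 - (\sum d)^2/n}{n-1}} = \sqrt{\frac{28.32 - (9.06)^2/5}{5-1}} = \sqrt{\frac{28.32 - 16.417}{4}} = \sqrt{2.976} \approx 1.725 $$

$$ t = \frac{\bar{d}}{s_d/\sqrt{n}} = \frac{1.812}{1.725/\sqrt{5}} = \frac{1.812}{0.771} \approx 2.35 $$

Marking scheme:
2 marks for SD formula
4 marks for SD value
2 marks for table t value
3 marks for correct conclusion
2 marks for t formula
2 marks for t value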

Page 51: Preamble

The t test: Dependent samples

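Hypotheses: Ho: µd = 0 (no change in weight); Hi: µd > 0 (weight reduced), where d = before − after.

Critical value (one-tailed, 0.05) at 4 d.f. = 2.132

Conclusion: Since the test statistic (calculated t value: 2.35) > the critical value (2.132), we reject the null hypothesis and accept the alternative hypothesis.

Conclude that the fitness campaign significantly reduced the weights of students.

Note: 2.132 is the one-tailed critical value. If the question is read as two-tailed ("any significant effect", Hi: µd ≠ 0), the critical value at 4 d.f. is 2.776. Since 2.35 < 2.776, we would then fail to reject Ho.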
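A minimal Python sketch of the paired t test follows, assuming only the standard library. The critical value 2.132 (one-tailed, 0.05, 4 d.f.) is typed in from a t table rather than computed. The helper name `paired_t` is hypothetical. Without intermediate rounding of Σd², the sketch gives s_d ≈ 1.7247 and t ≈ 2.35.

```python
import math
import statistics

before = [88.45, 76.65, 83.00, 70.30, 76.20]   # kg, students A to E
after = [89.35, 73.93, 81.65, 68.04, 72.57]
T_CRIT = 2.132   # one-tailed t(0.05) at 4 d.f., typed in from a t table


def paired_t(first, second):
    """Paired t test on d = first - second; returns (mean d, s_d, t, d.f.)."""
    d = [a - b for a, b in zip(first, second)]
    n = len(d)
    mean_d = statistics.fmean(d)
    s_d = statistics.stdev(d)                # divisor n - 1
    t = mean_d / (s_d / math.sqrt(n))
    return mean_d, s_d, t, n - 1


mean_d, s_d, t, df = paired_t(before, after)
decision = "reject Ho" if t > T_CRIT else "fail to reject Ho"   # Hi: mean d > 0
print(f"mean d = {mean_d:.3f}, s_d = {s_d:.4f}, t = {t:.2f}, d.f. = {df}: {decision}")
```

Working with the differences d turns the two related samples into a single sample, which is why the test has n − 1 = 4 degrees of freedom rather than 8.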

Page 52: Preamble


Page 53: Preamble

The Z test: one sample

Q3. A manufacturing process has produced thousands of units of a certain laboratory chemical with a mean shelf life of 1,200 hours and a standard deviation of 300 hours. A new process is tried, and a sample of 100 units produced has a sample average of 1,265 hours.

• At the 0.05 level of significance, determine whether the new manufacturing process is better than the old one. (18 marks)

• Is this a one-tailed or two-tailed test? Why? (7 marks)

Page 54: Preamble

The Z test: one sample

(a) Formula

$$ Z = \frac{\bar{x} - \mu}{\sigma/\sqrt{n}} = \frac{1265 - 1200}{300/\sqrt{100}} = \frac{65}{30} \approx 2.167 $$

Critical value from the table (one-tailed, 0.05) = 1.645. The two-tailed value would be 1.96; the conclusion is the same.

Conclusion: Since the value of the test statistic (Z = 2.167) exceeds the critical value (1.645), we reject the null hypothesis Ho: µ = 1200 hrs and accept the alternative hypothesis Hi: µ > 1200 hrs.

In other words, the sample mean (1265 hrs) is significantly greater than the population mean (1200 hrs).

Marking scheme:
2 marks for Z formula
6 marks for Z value
5 marks for table value
5 marks for correct conclusion

Page 55: Preamble

The Z test: one sample

(b) This is a one-tailed test (3 marks).

Why? Because the intention of the study was to improve the production process, and Hi specifies a directional increase (4 marks).
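The one-sample Z test, together with its one-tailed p-value, can be sketched in a few lines of Python using only the standard library. The upper-tail probability P(Z > z) is computed as ½·erfc(z/√2). The critical value 1.645 is the one-tailed 5% value, consistent with part (b). The helper names are hypothetical.

```python
import math

MU0, SIGMA, N = 1200, 300, 100   # old process: mean and SD of shelf life (hours); sample size
X_BAR = 1265                     # sample mean of the new process (hours)
Z_CRIT = 1.645                   # one-tailed critical value at alpha = 0.05


def one_sample_z(x_bar, mu0, sigma, n):
    """Z = (x-bar - mu) / (sigma / sqrt(n)) when the population SD is known."""
    return (x_bar - mu0) / (sigma / math.sqrt(n))


def upper_tail_p(z):
    """P(Z > z) for the standard normal, via the complementary error function."""
    return 0.5 * math.erfc(z / math.sqrt(2))


z = one_sample_z(X_BAR, MU0, SIGMA, N)
p = upper_tail_p(z)
print(f"Z = {z:.3f}, one-tailed p = {p:.4f}, reject Ho: {z > Z_CRIT}")
```

The p-value (about 0.015) is below α = 0.05, which agrees with the critical-value comparison.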

Page 56: Preamble

The Z test: two sample

• A random sample of 120 workers in one large farm took an average of 22.0 minutes to complete a task, with a variance of 4. A random sample of 120 workers in a second large plant took an average of 19.0 minutes to complete the task, with a variance of 10.

• Using an appropriate test at the 5% level, determine whether there is a significant difference between the two population mean completion times. (11 marks)

Page 57: Preamble

The Z test: two sample

$$ Z = \frac{\bar{x}_1 - \bar{x}_2}{\sqrt{s_1^2/n_1 + s_2^2/n_2}} = \frac{22.0 - 19.0}{\sqrt{4/120 + 10/120}} = \frac{3.0}{\sqrt{0.1167}} = \frac{3.0}{0.3416} \approx 8.78 $$

Critical value from the table (two-tailed, 0.05) = 1.96

Marking scheme:
2 marks for Z formula
4 marks for Z value
2 marks for table value
3 marks for correct conclusion

Note: A similar procedure is used with the t test, except that the critical value is read from the t distribution. It is usually larger than the corresponding Z value.

Page 58: Preamble

The Z test: two sample

Conclusion:
• Since the test statistic (calculated Z value: 8.78) > the critical value (1.96), we reject the null hypothesis and accept the alternative hypothesis.
• That is, the two mean completion times are significantly different.

Page 59: Preamble

The Z test: two sample

Is this a one-tailed or two-tailed test? Why? (4 marks)
• It is a two-tailed test (2 marks),
• because the intention was to find whether the two means differ. The alternative hypothesis Hi: µ1 ≠ µ2 does not indicate a direction (2 marks).

What is the p-value? (3 marks)
• The test statistic (8.78) is far greater than the largest z value in the table (3.09, area 0.4990). Each tail area is therefore less than 0.5000 − 0.4990 = 0.0010, so the two-tailed p-value is less than 2 × 0.0010 = 0.002. In fact, it is far smaller, well below 0.001.
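When a test statistic lies beyond the end of the printed table, the p-value can still be computed. The Python sketch below uses only the standard library. It evaluates the two-sample Z statistic and the two-tailed p-value 2·P(Z > |z|), which equals erfc(|z|/√2). The helper names are hypothetical. Evaluating the formula without intermediate rounding gives Z ≈ 8.78 and a p-value many orders of magnitude below 0.001.

```python
import math


def two_sample_z(mean1, var1, n1, mean2, var2, n2):
    """Z = (x1-bar - x2-bar) / sqrt(s1^2/n1 + s2^2/n2) for two large independent samples."""
    return (mean1 - mean2) / math.sqrt(var1 / n1 + var2 / n2)


def two_tailed_p(z):
    """2 * P(Z > |z|) for the standard normal, which equals erfc(|z| / sqrt(2))."""
    return math.erfc(abs(z) / math.sqrt(2))


z = two_sample_z(22.0, 4, 120, 19.0, 10, 120)
p = two_tailed_p(z)
print(f"Z = {z:.2f}, two-tailed p = {p:.2e}, significant at 5%: {abs(z) > 1.96}")
```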

OBL 302Lecture 1-3

S.M.S. Massomo59

Enhanced Face to Face Dec 2009Mpwapwa

Page 60: Preamble


Page 61: Preamble

The Chi-square test: definitions

• Chi-square distribution: the distribution obtained by multiplying the ratio of the sample variance to the population variance by the degrees of freedom, χ² = (n − 1)s²/σ², when random samples are selected from a normally distributed population.

• Contingency table: data arranged in table form for the chi-square independence test.

• Expected frequency: the frequencies obtained by calculation.

• Goodness-of-fit test: a test to see whether a sample comes from a population with the given distribution.

• Independence test: a test to see whether the row and column variables are independent.

• Observed frequency: the frequencies obtained by observation. These are the sample frequencies.

Page 62: Preamble

(χ²) Chi-square

Properties of the χ² distribution:

• Chi-square is non-negative. It is the ratio of two non-negative values (multiplied by the degrees of freedom), so it must be non-negative itself.

• Chi-square is non-symmetric (skewed to the right).

• There are many different chi-square distributions, one for each number of degrees of freedom.

• The degrees of freedom when working with a single population variance are n − 1.

Page 63: Preamble

Application of the χ2 test
1. Goodness-of-fit test
2. Test for independence


Page 64: Preamble

Variation of the χ2 test: 1. Goodness-of-fit Test

• The idea behind the chi-square goodness-of-fit test is to see whether the sample comes from a population with the claimed distribution. Another way of looking at it is to ask whether the frequency distribution fits a specific pattern.

• Two values are involved: the observed frequency, which is the frequency of a category in the sample, and the expected frequency, which is calculated from the claimed distribution.

• If the observed frequencies are close to the claimed (expected) frequencies, the squared deviations, each divided by its expected frequency and summed over the categories, will be small; a large total is evidence against the claimed distribution.


Page 65: Preamble

Variation of the χ2 test: 1. Goodness-of-fit Test

Question: According to the Mendelian genetic model, a certain garden pea plant should produce offspring with white, pink and red flowers in the proportions 25%, 50% and 25%. In a sample of 1000 such offspring, the flower colours were: white 21%, red 27%, pink 52%.

• Using an appropriate test, can you reject the Mendelian hypothesis at the 5% level? (25 marks)


Page 66: Preamble

Variation of the χ2 test: 1. Goodness-of-fit Test

• Calculations

Colour   Observed (%)   Obs (No.)   Exp (No.)   Difference (d)   d²     d²/Exp   Marks
White    21             210         250         -40              1600   6.4      2
Pink     52             520         500         20               400    0.8      2
Red      27             270         250         20               400    1.6      2
Totals                  1000                    0                       8.8      4

Formula: $\chi^2 = \sum \frac{(\text{Obs} - \text{Exp})^2}{\text{Exp}}$ (5 marks)

Page 67: Preamble

Variation of the χ2 test: 1. Goodness-of-fit Test

• Calculated value = 8.8
• χ2 table value (critical value) at 0.05 with 3 − 1 = 2 degrees of freedom = 5.991 (5 marks)
• Conclusion (5 marks): Since the calculated value of the test statistic (8.8) is greater than the critical value at 0.05 (5.991), we reject the null hypothesis (Ho) and accept the alternative hypothesis (Hi). The observed frequencies deviate significantly from the expected 25:50:25 ratio, so the Mendelian hypothesis is rejected at the 5% level.
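The goodness-of-fit calculation in the table can be reproduced in a few lines. This is a minimal sketch; the function name `chi_square_gof` is hypothetical, and the critical value 5.991 is copied from the χ2 table rather than computed, because Python's standard library has no chi-square quantile function.

```python
def chi_square_gof(observed, expected_props):
    """Chi-square goodness-of-fit statistic: sum of (Obs - Exp)^2 / Exp."""
    total = sum(observed)
    expected = [p * total for p in expected_props]
    chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
    df = len(observed) - 1  # number of categories minus one
    return chi2, df, expected

# Counts from the question: 1000 offspring, white 21%, pink 52%, red 27%
observed = [210, 520, 270]      # white, pink, red
mendelian = [0.25, 0.50, 0.25]  # claimed 25:50:25 ratio

chi2, df, expected = chi_square_gof(observed, mendelian)
critical = 5.991  # chi-square table value for df = 2 at the 0.05 level
print(f"chi2 = {chi2:.2f}, df = {df}, reject Ho: {chi2 > critical}")
```

Running it prints `chi2 = 8.80, df = 2, reject Ho: True`, matching the hand calculation.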


Page 68: Preamble

Variation of the χ2 test: 2.  Test for independence

• In the test for independence, the claim is that the row and column variables are independent of each other. This is the null hypothesis.

• The multiplication rule states that if two events are independent, the probability of both occurring is the product of the probabilities of each occurring.

• This is the key to the test for independence: under Ho, the expected frequency of each cell is (row total × column total) / grand total. If you end up rejecting the null hypothesis, the assumption of independence must have been wrong, and the row and column variables are dependent. Remember, all hypothesis testing is done under the assumption that the null hypothesis is true.
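The multiplication-rule argument translates directly into code: under Ho each expected cell count is the grand total times the product of the row and column proportions, which simplifies to row total × column total / grand total. The sketch below (function name hypothetical) applies it to the 2 × 2 plant-species table from the example that follows.

```python
def expected_counts(table):
    """Expected cell counts under independence.

    Under Ho, P(row i and column j) = P(row i) * P(column j), so
    E_ij = N * (R_i / N) * (C_j / N) = R_i * C_j / N.
    """
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    return [[r * c / n for c in col_totals] for r in row_totals]

# Rows: species A present / absent; columns: species B present / absent
observed = [[90, 181], [66, 113]]
for row in expected_counts(observed):
    print([round(e, 2) for e in row])
```

This prints `[93.95, 177.05]` and `[62.05, 116.95]`.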


Page 69: Preamble

Variation of the χ2 test: 2. Test for independence...

• The test statistic and the principle behind the test for independence are the same as for the goodness-of-fit test. The test for independence is always a right-tailed test.

• In fact, you can think of the test for independence as a goodness-of-fit test where the data are arranged in table form. This table is called a contingency table.

• The test statistic has a chi-square distribution, with (rows − 1) × (columns − 1) degrees of freedom, when the following assumptions are met:
  • The data are obtained from a random sample.
  • The expected frequency of each cell must be at least 5.


Page 70: Preamble

Variation of the χ2 test: 2.  Test for independence

An ecological study was carried out to determine the association between two plant species in the Serengeti plains. The researchers randomly threw a 1 m × 1 m sampling frame several times and recorded the presence or absence of the two plant species in each sample as follows:

                     Plant species B
Plant species A      Present      Absent       Totals
Present              90 (A)       181 (B)      271
Absent               66 (C)       113 (D)      179
Totals               156          294          450

Page 71: Preamble

Variation of the χ2 test: 2.  Test for independence

What is random sampling? (6 marks)
• Random sampling: sampling that gives each individual in the population a known or equal chance of being selected in the sample.
• Extras: it avoids bias and is likely to produce a representative sample.

Perform an appropriate test, using the 0.05 level of significance, to determine whether there is an association between the two species (14 marks).


Page 72: Preamble

Variation of the χ2 test: 2.  Test for independence

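Shortcut method

Formula: $\chi^2 = \frac{N(AD - BC)^2}{(A+B)(C+D)(A+C)(B+D)} = \frac{450\,(90 \times 113 - 181 \times 66)^2}{271 \times 179 \times 156 \times 294} \approx 0.638$ (0.639 when the rounded values of the traditional method are used)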
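For a 2 × 2 table, the shortcut formula can be evaluated exactly with Python's `fractions.Fraction`, so no rounding happens until the end. This sketch assumes the uncorrected form of the formula (no Yates continuity correction), which reproduces the value on the slide; the arguments a, b, c, d correspond to cells A, B, C, D.

```python
from fractions import Fraction

def chi_square_2x2_shortcut(a, b, c, d):
    """Shortcut chi-square for a 2 x 2 table, no continuity correction:
    N (ad - bc)^2 / ((a + b)(c + d)(a + c)(b + d))."""
    n = a + b + c + d
    return Fraction(n * (a * d - b * c) ** 2,
                    (a + b) * (c + d) * (a + c) * (b + d))

chi2 = chi_square_2x2_shortcut(90, 181, 66, 113)
print(f"chi2 = {float(chi2):.3f}")
```

This prints `chi2 = 0.638`. The 0.639 on the slides comes from rounding the deviations to 3.95 before squaring; either way the value is far below the critical value of 3.841.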

Page 73: Preamble

Variation of the χ2 test: 2.  Test for independence

Traditional method

Formula: $\chi^2 = \sum \frac{(\text{Obs} - \text{Exp})^2}{\text{Exp}}$, with Exp = (row total × column total) / grand total (2 marks)

Calculations (award 7 marks; see last column)

Cell   Observed (Obs)   Expected (Exp)   Difference (d)   d²      d²/Exp   Marks
A      90               93.95            -3.95            15.60   0.166    1
B      181              177.05           3.95             15.60   0.088    1
C      66               62.05            3.95             15.60   0.251    1
D      113              116.95           -3.95            15.60   0.133    1
Totals                                   0.00                     0.639    3
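The traditional cell-by-cell calculation generalises to any r × c contingency table. This is a minimal sketch with a hypothetical function name; it keeps full precision throughout, so it gives about 0.638 rather than the rounded 0.639, and it reports df = (rows − 1)(columns − 1).

```python
def chi_square_independence(table):
    """Chi-square test of independence: sum over all cells of
    (Obs - Exp)^2 / Exp, with Exp = row total * column total / N."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    chi2 = 0.0
    for i, row in enumerate(table):
        for j, obs in enumerate(row):
            exp = row_totals[i] * col_totals[j] / n
            chi2 += (obs - exp) ** 2 / exp
    df = (len(table) - 1) * (len(table[0]) - 1)
    return chi2, df

chi2, df = chi_square_independence([[90, 181], [66, 113]])
print(f"chi2 = {chi2:.3f} with df = {df}")
```

This prints `chi2 = 0.638 with df = 1`; for a 2 × 2 table it agrees with the shortcut formula.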

Page 74: Preamble

Variation of the χ2 test: 2.  Test for independence

• Calculated χ2 value (test statistic) = 0.639
• χ2 table value (critical value) at 0.05 with (2 − 1) × (2 − 1) = 1 degree of freedom = 3.841 (2 marks)

Conclusion (3 marks):
• Since the calculated value of the test statistic (0.639) is less than the critical value at 0.05 (3.841), we do not reject the null hypothesis (Ho) and cannot accept the alternative hypothesis (Hi). We conclude that there is no significant association between the two species.


Page 75: Preamble

Summary of the χ2 test

• Chi-square test
• Formula: $\chi^2 = \sum \frac{(\text{Obs} - \text{Exp})^2}{\text{Exp}}$
• Deals with observed and expected frequencies
• A non-parametric test commonly used for:
  • Goodness-of-fit test: fixed-ratio hypotheses or categorical data
  • Test of independence: whether or not frequencies are associated

Page 76: Preamble

Regression Analysis

Q4. The following data show the grain yield of rice following the application of different rates of nitrogen fertilizer in the field.

(a) Compute the simple linear regression equation between the two variables, i.e. nitrogen rate and grain yield (22 marks).

(b) What is the mean yield of rice when 136 kg/ha of fertilizer is applied? (8 marks)

Nitrogen rate (kg/ha)   0     50    100   150
Grain yield (t/ha)      4.2   5.4   6.6   7.1

Page 77: Preamble

Regression Analysis

Regression analysis:
• A procedure used to predict the value of one dependent variable (Y) from another, independent variable (X)
• Regression equation: y = a + bx

Correlation analysis:
• A test that determines the nature and strength of association between two or more variables
• Correlation can be zero, positive or negative
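The slides do not name a specific correlation coefficient. Pearson's r is the usual measure of the strength and direction of a linear association, so the sketch below assumes it and applies it to the rice data from Q4 (function and variable names are illustrative). A value near +1 indicates a strong positive linear correlation, near −1 a strong negative one, and near 0 no linear correlation.

```python
import math

def pearson_r(x, y):
    """Pearson correlation coefficient r = Sxy / sqrt(Sxx * Syy)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

nitrogen = [0, 50, 100, 150]    # kg/ha, from Q4
yield_t = [4.2, 5.4, 6.6, 7.1]  # t/ha, from Q4
print(f"r = {pearson_r(nitrogen, yield_t):.3f}")
```

This prints `r = 0.985`: grain yield rises strongly and almost linearly with the nitrogen rate.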


Page 78: Preamble

Correlation Analysis
• An increase in one variable results in a decrease in the other variable, and vice versa

• This is an example of ......... linear correlation


Page 79: Preamble

Regression Analysis

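Summary values (n = 4): $\sum x = 300$, $\bar{x} = 75$, $\sum y = 23.3$, $\bar{y} = 5.825$, $\sum x^2 = 35{,}000$, $\sum y^2 = 140.77$, $\sum xy = 1995$

(a) Formula for b (2 marks) and value of b (6 marks):

$b = \frac{\sum xy - (\sum x)(\sum y)/n}{\sum x^2 - (\sum x)^2/n} = \frac{1995 - (300)(23.3)/4}{35{,}000 - 300^2/4} = \frac{247.5}{12{,}500} = 0.0198 \approx 0.02$

Formula for a (2 marks) and value of a (6 marks):

$a = \bar{y} - b\bar{x} = 5.825 - 0.02 \times 75 = 4.325 \approx 4.33$

Correct regression equation (6 marks).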

Page 80: Preamble

Regression Analysis

Simple linear regression equation (Y = a + bX): Y = 4.33 + 0.02X

(b) Yield of rice when 136 kg/ha of fertilizer is applied (8 marks: award 2 marks for the formula and 6 for the correct answer).

Formula: Y = a + bX

Y = 4.33 + (0.02 × 136)

Y = 4.33 + 2.72 = 7.05 t/ha
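Both parts of Q4 can be checked with a short least-squares sketch (function name hypothetical). It uses the same sum-of-products formula for b and a = ȳ − b x̄, and prints the prediction at 136 kg/ha twice: once with full-precision coefficients and once with the rounded a = 4.33 and b = 0.02 used above.

```python
def least_squares(x, y):
    """Fit y = a + b x by ordinary least squares.

    b = (sum xy - sum x * sum y / n) / (sum x^2 - (sum x)^2 / n)
    a = mean(y) - b * mean(x)
    """
    n = len(x)
    sx, sy = sum(x), sum(y)
    sxy = sum(xi * yi for xi, yi in zip(x, y))
    sxx = sum(xi * xi for xi in x)
    b = (sxy - sx * sy / n) / (sxx - sx * sx / n)
    a = sy / n - b * sx / n
    return a, b

nitrogen = [0, 50, 100, 150]        # kg/ha
grain_yield = [4.2, 5.4, 6.6, 7.1]  # t/ha

a, b = least_squares(nitrogen, grain_yield)
print(f"full precision: Y = {a:.3f} + {b:.4f}X, Y(136) = {a + b * 136:.2f} t/ha")

a_r, b_r = 4.33, 0.02  # rounded coefficients from the worked answer
print(f"rounded:        Y = {a_r} + {b_r}X, Y(136) = {a_r + b_r * 136:.2f} t/ha")
```

Carrying full precision gives a = 4.34 and b = 0.0198, and a predicted yield of about 7.03 t/ha; the 7.05 t/ha in the worked answer reflects rounding b to 0.02 before computing a.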

Page 81: Preamble

The F test
• Tests whether two samples come from populations with equal variances
• Used to compare several means simultaneously in analysis of variance (ANOVA)


Page 82: Preamble

ANOVA
• Analysis of variance, abbreviated as ANOVA, is a technique used to test a hypothesis concerning the means of three or more populations

• It determines whether there are statistically significant differences among the means

• It is based on the F test, which tests whether two independent variances are equal

• It is a right-tailed test
• The F distribution is used


Page 83: Preamble

ANOVA

Study the following yield data obtained from a certain fertilizer experiment. Prepare an ANOVA table and, at the 0.05 level of significance, determine (i) whether fertilizers differ and (ii) whether replications differ (18 marks).

Fertilizer applied     Replication                  Totals
                       1      2      3      4
No fertilizer          6.0    6.4    6.5    5.5    24.4
Fertilizer A           6.9    7.5    7.0    6.6    28.0
Fertilizer B           7.2    7.4    7.8    6.8    29.2
Totals                 20.1   21.3   21.3   18.9   81.6

Page 84: Preamble

ANOVA


Grand total (G) = 81.6

Page 85: Preamble

ANOVA

Correction factor: $CF = \frac{G^2}{rt} = \frac{81.6^2}{4 \times 3} = \frac{6658.56}{12} = 554.88$ (1.5 marks)

$\text{Total SS} = \sum x^2 - CF = 559.56 - 554.88 = 4.68$

$\text{Rep SS} = \frac{20.1^2 + \dots + 18.9^2}{3} - CF = \frac{1668.60}{3} - CF = 556.20 - 554.88 = 1.32$

$\text{Treat SS} = \frac{24.4^2 + 28.0^2 + 29.2^2}{4} - CF = \frac{2232.00}{4} - CF = 558.00 - 554.88 = 3.12$

$\text{Error SS} = \text{Total SS} - (\text{Rep SS} + \text{Treat SS}) = 4.68 - (1.32 + 3.12) = 0.24$
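The sums of squares for this treatments × replications layout can be reproduced with plain Python. This sketch follows the same steps as the worked answer: correction factor, total SS, replication SS, treatment SS, and error SS by subtraction. The dictionary layout is just one convenient way to hold the data.

```python
# Yields from the fertilizer experiment: rows = treatments, columns = replications
data = {
    "No fertilizer": [6.0, 6.4, 6.5, 5.5],
    "Fertilizer A": [6.9, 7.5, 7.0, 6.6],
    "Fertilizer B": [7.2, 7.4, 7.8, 6.8],
}

rows = list(data.values())
t, r = len(rows), len(rows[0])          # 3 treatments, 4 replications
grand_total = sum(sum(row) for row in rows)
cf = grand_total ** 2 / (t * r)         # correction factor G^2 / (rt)

total_ss = sum(x * x for row in rows for x in row) - cf
rep_totals = [sum(col) for col in zip(*rows)]
rep_ss = sum(c * c for c in rep_totals) / t - cf       # each rep total has t values
treat_ss = sum(sum(row) ** 2 for row in rows) / r - cf  # each treatment total has r values
error_ss = total_ss - (rep_ss + treat_ss)

for name, value in [("CF", cf), ("Total SS", total_ss), ("Rep SS", rep_ss),
                    ("Treat SS", treat_ss), ("Error SS", error_ss)]:
    print(f"{name:9s}{value:8.2f}")
```

The printed values are CF = 554.88, Total SS = 4.68, Rep SS = 1.32, Treat SS = 3.12 and Error SS = 0.24.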

Page 86: Preamble

ANOVA cont..

Source of variation   Degrees of freedom   Sum of squares   Mean square   F value   Table F value (0.05)
Rep.                  3                    1.32             0.44          11.00     4.76
Treat.                2                    3.12             1.56          39.00     5.14
Error                 6                    0.24             0.04
Total                 11                   4.68

Conclusions (4 marks: award 2 marks for each correct conclusion)

(i) Since the calculated F value for replications (11.0, the test statistic) is greater than the table value (4.76, the critical value), we reject Ho, accept Hi and conclude that the replication means differ significantly.

(ii) Since the calculated F value for treatments (39.0, the test statistic) is greater than the table value (5.14, the critical value), we reject Ho, accept Hi and conclude that the treatment (fertilizer) means differ significantly.
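The mean squares and F ratios in the table follow from dividing each sum of squares by its degrees of freedom, and each mean square by the error mean square. In this sketch the SS, df and tabulated critical values are copied from the ANOVA table, since Python's standard library has no F quantile function.

```python
# Degrees of freedom, sums of squares and tabulated F(0.05) values from the table
sources = {
    "Rep.": (3, 1.32, 4.76),    # critical value F(3, 6)
    "Treat.": (2, 3.12, 5.14),  # critical value F(2, 6)
}
error_df, error_ss = 6, 0.24
error_ms = error_ss / error_df  # mean square for error

f_values = {}
for name, (df, ss, f_crit) in sources.items():
    ms = ss / df             # mean square = SS / df
    f_value = ms / error_ms  # F = MS(source) / MS(error)
    f_values[name] = f_value
    decision = "reject Ho" if f_value > f_crit else "do not reject Ho"
    print(f"{name:7s} MS = {ms:.2f}  F = {f_value:.2f}  F(0.05) = {f_crit:.2f}  {decision}")
```

Both F values (11.00 and 39.00) exceed their critical values, so both null hypotheses are rejected, in agreement with the conclusions above.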